diff --git a/doc/src/JPG/tutorial_merged.png b/doc/src/JPG/tutorial_merged.png
index d3961afae..069c00735 100644
Binary files a/doc/src/JPG/tutorial_merged.png and b/doc/src/JPG/tutorial_merged.png differ
diff --git a/doc/src/JPG/tutorial_reverse_pull_request7.png b/doc/src/JPG/tutorial_reverse_pull_request7.png
new file mode 100644
index 000000000..89ea82363
Binary files /dev/null and b/doc/src/JPG/tutorial_reverse_pull_request7.png differ
diff --git a/doc/src/Manual.txt b/doc/src/Manual.txt
index 3e7c2fb20..b422d6ee1 100644
--- a/doc/src/Manual.txt
+++ b/doc/src/Manual.txt
@@ -1,340 +1,340 @@
<!-- HTML_ONLY -->
<HEAD>
<TITLE>LAMMPS Users Manual</TITLE>
-<META NAME="docnumber" CONTENT="9 Jan 2017 version">
+<META NAME="docnumber" CONTENT="17 Jan 2017 version">
<META NAME="author" CONTENT="http://lammps.sandia.gov - Sandia National Laboratories">
<META NAME="copyright" CONTENT="Copyright (2003) Sandia Corporation. This software and manual is distributed under the GNU General Public License.">
</HEAD>
<BODY>
<!-- END_HTML_ONLY -->
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
<H1></H1>
LAMMPS Documentation :c,h3
-9 Jan 2017 version :c,h4
+17 Jan 2017 version :c,h4
Version info: :h4
The LAMMPS "version" is the date when it was released, such as 1 May
2010. LAMMPS is updated continuously. Whenever we fix a bug or add a
feature, we release it immediately, and post a notice on "this page of
the WWW site"_bug. Every 2-4 months one of the incremental releases
is subjected to more thorough testing and labeled as a {stable} version.
Each dated copy of LAMMPS contains all the
features and bug-fixes up to and including that version date. The
version date is printed to the screen and logfile every time you run
LAMMPS. It is also in the file src/version.h and in the LAMMPS
directory name created when you unpack a tarball, and at the top of
the first page of the manual (this page).
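For example, the first line of the screen and logfile output looks
similar to the following (the exact date will match your own copy of
LAMMPS):

LAMMPS (17 Jan 2017) :pre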
If you browse the HTML doc pages on the LAMMPS WWW site, they always
describe the most current version of LAMMPS. :ulb,l
If you browse the HTML doc pages included in your tarball, they
describe the version you have. :l
The "PDF file"_Manual.pdf on the WWW site or in the tarball is updated
about once per month. This is because it is large, and we don't want
it to be part of every patch. :l
There is also a "Developer.pdf"_Developer.pdf file in the doc
directory, which describes the internal structure and algorithms of
LAMMPS. :l
:ule
LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel
Simulator.
LAMMPS is a classical molecular dynamics simulation code designed to
run efficiently on parallel computers. It was developed at Sandia
National Laboratories, a US Department of Energy facility, with
funding from the DOE. It is an open-source code, distributed freely
under the terms of the GNU General Public License (GPL).
The current core group of LAMMPS developers is at Sandia National
Labs and Temple University:
"Steve Plimpton"_sjp, sjplimp at sandia.gov :ulb,l
Aidan Thompson, athomps at sandia.gov :l
Stan Moore, stamoore at sandia.gov :l
"Axel Kohlmeyer"_ako, akohlmey at gmail.com :l
:ule
Past core developers include Paul Crozier, Ray Shan and Mark Stevens,
all at Sandia. The [LAMMPS home page] at
"http://lammps.sandia.gov"_http://lammps.sandia.gov has more information
about the code and its uses. Interaction with external LAMMPS developers,
bug reports and feature requests are mainly coordinated through the
"LAMMPS project on GitHub."_https://github.com/lammps/lammps
The lammps.org domain, currently hosting "public continuous integration
testing"_https://ci.lammps.org/job/lammps/ and "precompiled Linux
RPM and Windows installer packages"_http://rpm.lammps.org, is hosted
at Temple University and managed by Richard Berger,
richard.berger at temple.edu.
:link(bug,http://lammps.sandia.gov/bug.html)
:link(sjp,http://www.sandia.gov/~sjplimp)
:link(ako,http://goo.gl/1wk0)
:line
The LAMMPS documentation is organized into the following sections. If
you find errors or omissions in this manual or have suggestions for
useful information to add, please send an email to the developers so
we can improve the LAMMPS documentation.
Once you are familiar with LAMMPS, you may want to bookmark "this
page"_Section_commands.html#comm at Section_commands.html#comm since
it gives quick access to documentation for all LAMMPS commands.
"PDF file"_Manual.pdf of the entire manual, generated by
"htmldoc"_http://freecode.com/projects/htmldoc
<!-- RST
.. toctree::
:maxdepth: 2
:numbered:
:caption: User Documentation
:name: userdoc
:includehidden:
Section_intro
Section_start
Section_commands
Section_packages
Section_accelerate
Section_howto
Section_example
Section_perf
Section_tools
Section_modify
Section_python
Section_errors
Section_history
.. toctree::
:caption: Index
:name: index
:hidden:
tutorials
commands
fixes
computes
pairs
bonds
angles
dihedrals
impropers
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
END_RST -->
<!-- HTML_ONLY -->
"Introduction"_Section_intro.html :olb,l
1.1 "What is LAMMPS"_intro_1 :ulb,b
1.2 "LAMMPS features"_intro_2 :b
1.3 "LAMMPS non-features"_intro_3 :b
1.4 "Open source distribution"_intro_4 :b
1.5 "Acknowledgments and citations"_intro_5 :ule,b
"Getting started"_Section_start.html :l
2.1 "What's in the LAMMPS distribution"_start_1 :ulb,b
2.2 "Making LAMMPS"_start_2 :b
2.3 "Making LAMMPS with optional packages"_start_3 :b
2.4 "Building LAMMPS via the Make.py script"_start_4 :b
2.5 "Building LAMMPS as a library"_start_5 :b
2.6 "Running LAMMPS"_start_6 :b
2.7 "Command-line options"_start_7 :b
2.8 "Screen output"_start_8 :b
2.9 "Tips for users of previous versions"_start_9 :ule,b
"Commands"_Section_commands.html :l
3.1 "LAMMPS input script"_cmd_1 :ulb,b
3.2 "Parsing rules"_cmd_2 :b
3.3 "Input script structure"_cmd_3 :b
3.4 "Commands listed by category"_cmd_4 :b
3.5 "Commands listed alphabetically"_cmd_5 :ule,b
"Packages"_Section_packages.html :l
4.1 "Standard packages"_pkg_1 :ulb,b
4.2 "User packages"_pkg_2 :ule,b
"Accelerating LAMMPS performance"_Section_accelerate.html :l
5.1 "Measuring performance"_acc_1 :ulb,b
5.2 "Algorithms and code options to boost performace"_acc_2 :b
5.3 "Accelerator packages with optimized styles"_acc_3 :b
5.3.1 "GPU package"_accelerate_gpu.html :ulb,b
5.3.2 "USER-INTEL package"_accelerate_intel.html :b
5.3.3 "KOKKOS package"_accelerate_kokkos.html :b
5.3.4 "USER-OMP package"_accelerate_omp.html :b
5.3.5 "OPT package"_accelerate_opt.html :ule,b
5.4 "Comparison of various accelerator packages"_acc_4 :ule,b
"How-to discussions"_Section_howto.html :l
6.1 "Restarting a simulation"_howto_1 :ulb,b
6.2 "2d simulations"_howto_2 :b
6.3 "CHARMM and AMBER force fields"_howto_3 :b
6.4 "Running multiple simulations from one input script"_howto_4 :b
6.5 "Multi-replica simulations"_howto_5 :b
6.6 "Granular models"_howto_6 :b
6.7 "TIP3P water model"_howto_7 :b
6.8 "TIP4P water model"_howto_8 :b
6.9 "SPC water model"_howto_9 :b
6.10 "Coupling LAMMPS to other codes"_howto_10 :b
6.11 "Visualizing LAMMPS snapshots"_howto_11 :b
6.12 "Triclinic (non-orthogonal) simulation boxes"_howto_12 :b
6.13 "NEMD simulations"_howto_13 :b
6.14 "Finite-size spherical and aspherical particles"_howto_14 :b
6.15 "Output from LAMMPS (thermo, dumps, computes, fixes, variables)"_howto_15 :b
6.16 "Thermostatting, barostatting, and compute temperature"_howto_16 :b
6.17 "Walls"_howto_17 :b
6.18 "Elastic constants"_howto_18 :b
6.19 "Library interface to LAMMPS"_howto_19 :b
6.20 "Calculating thermal conductivity"_howto_20 :b
6.21 "Calculating viscosity"_howto_21 :b
6.22 "Calculating a diffusion coefficient"_howto_22 :b
6.23 "Using chunks to calculate system properties"_howto_23 :b
6.24 "Setting parameters for pppm/disp"_howto_24 :b
6.25 "Polarizable models"_howto_25 :b
6.26 "Adiabatic core/shell model"_howto_26 :b
6.27 "Drude induced dipoles"_howto_27 :ule,b
"Example problems"_Section_example.html :l
"Performance & scalability"_Section_perf.html :l
"Additional tools"_Section_tools.html :l
"Modifying & extending LAMMPS"_Section_modify.html :l
10.1 "Atom styles"_mod_1 :ulb,b
10.2 "Bond, angle, dihedral, improper potentials"_mod_2 :b
10.3 "Compute styles"_mod_3 :b
10.4 "Dump styles"_mod_4 :b
10.5 "Dump custom output options"_mod_5 :b
10.6 "Fix styles"_mod_6 :b
10.7 "Input script commands"_mod_7 :b
10.8 "Kspace computations"_mod_8 :b
10.9 "Minimization styles"_mod_9 :b
10.10 "Pairwise potentials"_mod_10 :b
10.11 "Region styles"_mod_11 :b
10.12 "Body styles"_mod_12 :b
10.13 "Thermodynamic output options"_mod_13 :b
10.14 "Variable options"_mod_14 :b
10.15 "Submitting new features for inclusion in LAMMPS"_mod_15 :ule,b
"Python interface"_Section_python.html :l
11.1 "Overview of running LAMMPS from Python"_py_1 :ulb,b
11.2 "Overview of using Python from a LAMMPS script"_py_2 :b
11.3 "Building LAMMPS as a shared library"_py_3 :b
11.4 "Installing the Python wrapper into Python"_py_4 :b
11.5 "Extending Python with MPI to run in parallel"_py_5 :b
11.6 "Testing the Python-LAMMPS interface"_py_6 :b
11.7 "Using LAMMPS from Python"_py_7 :b
11.8 "Example Python scripts that use LAMMPS"_py_8 :ule,b
"Errors"_Section_errors.html :l
12.1 "Common problems"_err_1 :ulb,b
12.2 "Reporting bugs"_err_2 :b
12.3 "Error & warning messages"_err_3 :ule,b
"Future and history"_Section_history.html :l
13.1 "Coming attractions"_hist_1 :ulb,b
13.2 "Past versions"_hist_2 :ule,b
:ole
:link(intro_1,Section_intro.html#intro_1)
:link(intro_2,Section_intro.html#intro_2)
:link(intro_3,Section_intro.html#intro_3)
:link(intro_4,Section_intro.html#intro_4)
:link(intro_5,Section_intro.html#intro_5)
:link(start_1,Section_start.html#start_1)
:link(start_2,Section_start.html#start_2)
:link(start_3,Section_start.html#start_3)
:link(start_4,Section_start.html#start_4)
:link(start_5,Section_start.html#start_5)
:link(start_6,Section_start.html#start_6)
:link(start_7,Section_start.html#start_7)
:link(start_8,Section_start.html#start_8)
:link(start_9,Section_start.html#start_9)
:link(cmd_1,Section_commands.html#cmd_1)
:link(cmd_2,Section_commands.html#cmd_2)
:link(cmd_3,Section_commands.html#cmd_3)
:link(cmd_4,Section_commands.html#cmd_4)
:link(cmd_5,Section_commands.html#cmd_5)
:link(pkg_1,Section_packages.html#pkg_1)
:link(pkg_2,Section_packages.html#pkg_2)
:link(acc_1,Section_accelerate.html#acc_1)
:link(acc_2,Section_accelerate.html#acc_2)
:link(acc_3,Section_accelerate.html#acc_3)
:link(acc_4,Section_accelerate.html#acc_4)
:link(howto_1,Section_howto.html#howto_1)
:link(howto_2,Section_howto.html#howto_2)
:link(howto_3,Section_howto.html#howto_3)
:link(howto_4,Section_howto.html#howto_4)
:link(howto_5,Section_howto.html#howto_5)
:link(howto_6,Section_howto.html#howto_6)
:link(howto_7,Section_howto.html#howto_7)
:link(howto_8,Section_howto.html#howto_8)
:link(howto_9,Section_howto.html#howto_9)
:link(howto_10,Section_howto.html#howto_10)
:link(howto_11,Section_howto.html#howto_11)
:link(howto_12,Section_howto.html#howto_12)
:link(howto_13,Section_howto.html#howto_13)
:link(howto_14,Section_howto.html#howto_14)
:link(howto_15,Section_howto.html#howto_15)
:link(howto_16,Section_howto.html#howto_16)
:link(howto_17,Section_howto.html#howto_17)
:link(howto_18,Section_howto.html#howto_18)
:link(howto_19,Section_howto.html#howto_19)
:link(howto_20,Section_howto.html#howto_20)
:link(howto_21,Section_howto.html#howto_21)
:link(howto_22,Section_howto.html#howto_22)
:link(howto_23,Section_howto.html#howto_23)
:link(howto_24,Section_howto.html#howto_24)
:link(howto_25,Section_howto.html#howto_25)
:link(howto_26,Section_howto.html#howto_26)
:link(howto_27,Section_howto.html#howto_27)
:link(mod_1,Section_modify.html#mod_1)
:link(mod_2,Section_modify.html#mod_2)
:link(mod_3,Section_modify.html#mod_3)
:link(mod_4,Section_modify.html#mod_4)
:link(mod_5,Section_modify.html#mod_5)
:link(mod_6,Section_modify.html#mod_6)
:link(mod_7,Section_modify.html#mod_7)
:link(mod_8,Section_modify.html#mod_8)
:link(mod_9,Section_modify.html#mod_9)
:link(mod_10,Section_modify.html#mod_10)
:link(mod_11,Section_modify.html#mod_11)
:link(mod_12,Section_modify.html#mod_12)
:link(mod_13,Section_modify.html#mod_13)
:link(mod_14,Section_modify.html#mod_14)
:link(mod_15,Section_modify.html#mod_15)
:link(py_1,Section_python.html#py_1)
:link(py_2,Section_python.html#py_2)
:link(py_3,Section_python.html#py_3)
:link(py_4,Section_python.html#py_4)
:link(py_5,Section_python.html#py_5)
:link(py_6,Section_python.html#py_6)
:link(err_1,Section_errors.html#err_1)
:link(err_2,Section_errors.html#err_2)
:link(err_3,Section_errors.html#err_3)
:link(hist_1,Section_history.html#hist_1)
:link(hist_2,Section_history.html#hist_2)
<!-- END_HTML_ONLY -->
</BODY>
diff --git a/doc/src/Section_errors.txt b/doc/src/Section_errors.txt
index d3ae9d94b..36c122bd1 100644
--- a/doc/src/Section_errors.txt
+++ b/doc/src/Section_errors.txt
@@ -1,11909 +1,11910 @@
"Previous Section"_Section_python.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_history.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
12. Errors :h3
This section describes the errors you may encounter when using LAMMPS,
both conceptual problems and the error messages printed out by the program.
12.1 "Common problems"_#err_1
12.2 "Reporting bugs"_#err_2
12.3 "Error & warning messages"_#err_3 :all(b)
:line
:line
12.1 Common problems :link(err_1),h4
If two LAMMPS runs do not produce the same answer on different
machines or different numbers of processors, this is typically not a
bug. In theory you should get identical answers on any number of
processors and on any machine. In practice, numerical round-off can
cause slight differences and eventual divergence of molecular dynamics
phase space trajectories within a few hundred or a few thousand timesteps.
However, the statistical properties of the two runs (e.g. average
energy or temperature) should still be the same.
If the "velocity"_velocity.html command is used to set initial atom
velocities, a particular atom can be assigned a different velocity
when the problem is run on a different number of processors or on
different machines. If this happens, the phase space trajectories of
the two simulations will rapidly diverge. See the discussion of the
{loop} option in the "velocity"_velocity.html command for details and
options that avoid this issue.
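As a minimal sketch (the group name, temperature, and seed here are
arbitrary), the {loop geom} option assigns each velocity based on the
atom's coordinates rather than on atom ordering, so the result does
not depend on the processor count:

velocity all create 300.0 4928459 loop geom :pre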
Similarly, the "create_atoms"_create_atoms.html command generates a
lattice of atoms. For the same physical system, the ordering and
numbering of atoms by atom ID may be different depending on the number
of processors.
Some commands use random number generators which may be set up to
produce different random number streams on each processor and hence
will produce different effects when run on different numbers of
processors. A commonly-used example is the "fix
langevin"_fix_langevin.html command for thermostatting.
A LAMMPS simulation typically has two stages, setup and run. Most
LAMMPS errors are detected at setup time; others like a bond
stretching too far may not occur until the middle of a run.
LAMMPS tries to flag errors and print informative error messages so
-you can fix the problem. Of course, LAMMPS cannot figure out your
-physics or numerical mistakes, like choosing too big a timestep,
-specifying erroneous force field coefficients, or putting 2 atoms on
-top of each other! If you run into errors that LAMMPS doesn't catch
-that you think it should flag, please send an email to the
-"developers"_http://lammps.sandia.gov/authors.html.
+you can fix the problem. For most errors it will also print the last
+input script command that it was processing. Of course, LAMMPS cannot
+figure out your physics or numerical mistakes, like choosing too big a
+timestep, specifying erroneous force field coefficients, or putting 2
+atoms on top of each other! If you run into errors that LAMMPS
+doesn't catch that you think it should flag, please send an email to
+the "developers"_http://lammps.sandia.gov/authors.html.
If you get an error message about an invalid command in your input
script, you can determine what command is causing the problem by
looking in the log.lammps file or using the "echo command"_echo.html
to see it on the screen. If you get an error like "Invalid ...
style", with ... being fix, compute, pair, etc, it means that you
mistyped the style name or that the command is part of an optional
package which was not compiled into your executable. The list of
available styles in your executable can be listed by using "the -h
command-line argument"_Section_start.html#start_7. The installation
and compilation of optional packages is explained in the "installation
instructions"_Section_start.html#start_3.
For a given command, LAMMPS expects certain arguments in a specified
order. If you mess this up, LAMMPS will often flag the error, but it
may also simply read a bogus argument and assign a value that is
valid, but not what you wanted, e.g. reading the string "abc" as an
integer yields a value of 0. Careful reading of the associated doc page
for the command should allow you to fix these problems. Note that
some commands allow for variables to be specified in place of numeric
constants so that the value can be evaluated and change over the
course of a run. This is typically done with the syntax {v_name} for
a parameter, where name is the name of the variable. This is only
allowed if the command documentation says it is.
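As a minimal sketch (the fix ID, group, variable name, and force
values here are arbitrary), commands such as "fix
addforce"_fix_addforce.html accept equal-style variables in place of
numeric constants, so a force component that changes over the course
of a run can be written as:

variable fpush equal ramp(0.0,1.0)
fix 2 all addforce v_fpush 0.0 0.0 :pre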
Generally, LAMMPS will print a message to the screen and logfile and
exit gracefully when it encounters a fatal error. Sometimes it will
print a WARNING to the screen and logfile and continue on; you can
decide if the WARNING is important or not. A WARNING message that is
generated in the middle of a run is only printed to the screen, not to
the logfile, to avoid cluttering up thermodynamic output. If LAMMPS
crashes or hangs without spitting out an error message first then it
could be a bug (see "this section"_#err_2) or one of the following
cases:
LAMMPS runs within whatever memory each processor allows to be
allocated. Most reasonable MD runs are compute limited, not memory
limited, so this shouldn't be a bottleneck on most platforms. Almost
all large memory allocations in the code are done via C-style malloc()
calls, which will generate an error message if you run out of memory.
Smaller chunks of memory are allocated via C++ "new" statements. If
you are unlucky you could run out of memory just when one of these
small requests is made, in which case the code will crash or hang (in
parallel), since LAMMPS doesn't trap on those errors.
Illegal arithmetic can cause LAMMPS to run slowly or crash. This is
typically due to invalid physics and numerics that your simulation is
computing. If you see wild thermodynamic values or NaN values in your
LAMMPS output, something is wrong with your simulation. If you
suspect this is happening, it is a good idea to print out
thermodynamic info frequently (e.g. every timestep) via the
"thermo"_thermo.html command so you can monitor what is happening.
Visualizing the atom movement is also a good idea to ensure your model
is behaving as you expect.
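For example, a minimal sketch of settings that print basic
thermodynamic info every timestep while debugging a problematic run
(the chosen keywords are just an example; see the "thermo"_thermo.html
and "thermo_style"_thermo_style.html commands for details):

thermo 1
thermo_style custom step temp press pe etotal :pre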
In parallel, one way LAMMPS can hang is due to how different MPI
implementations handle buffering of messages. If the code hangs
without an error message, it may be that you need to specify an MPI
setting or two (usually via an environment variable) to enable
buffering or boost the sizes of messages that can be buffered.
:line
12.2 Reporting bugs :link(err_2),h4
If you are confident that you have found a bug in LAMMPS, follow these
steps.
Check the "New features and bug
fixes"_http://lammps.sandia.gov/bug.html section of the "LAMMPS WWW
site"_lws to see if the bug has already been reported or fixed or the
"Unfixed bug"_http://lammps.sandia.gov/unbug.html to see if a fix is
pending.
Check the "mailing list"_http://lammps.sandia.gov/mail.html
to see if it has been discussed before.
If not, send an email to the mailing list describing the problem with
any ideas you have as to what is causing it or where in the code the
problem might be. The developers will ask for more info if needed,
such as an input script or data files.
The most useful thing you can do to help us fix the bug is to isolate
the problem. Run it on the smallest number of atoms and fewest
processors, and with the simplest input script that reproduces the
bug and try to identify what command or combination of commands is
causing the problem.
As a last resort, you can send an email directly to the
"developers"_http://lammps.sandia.gov/authors.html.
:line
12.3 Error & warning messages :h4,link(err_3)
These are two alphabetic lists of the "ERROR"_#error and
"WARNING"_#warn messages LAMMPS prints out and the reason why. If the
explanation here is not sufficient, the documentation for the
offending command may help.
Error and warning messages also list the source file and line number
where the error was generated. For example, this message
ERROR: Illegal velocity command (velocity.cpp:78)
means that line #78 in the file src/velocity.cpp generated the error.
Looking in the source code may help you figure out what went wrong.
Note that error messages from "user-contributed
packages"_Section_start.html#start_3 are not listed here. If such an
error occurs and is not self-explanatory, you'll need to look in the
source code or contact the author of the package.
Errors: :h4,link(error)
:dlb
{1-3 bond count is inconsistent} :dt
An inconsistency was detected when computing the number of 1-3
neighbors for each atom. This likely means something is wrong with
the bond topologies you have defined. :dd
{1-4 bond count is inconsistent} :dt
An inconsistency was detected when computing the number of 1-4
neighbors for each atom. This likely means something is wrong with
the bond topologies you have defined. :dd
{Accelerator sharing is not currently supported on system} :dt
Multiple MPI processes cannot share the accelerator on your
system. For NVIDIA GPUs, see the nvidia-smi command to change this
setting. :dd
{All angle coeffs are not set} :dt
All angle coefficients must be set in the data file or by the
angle_coeff command before running a simulation. :dd
{All atom IDs = 0 but atom_modify id = yes} :dt
Self-explanatory. :dd
{All atoms of a swapped type must have same charge.} :dt
Self-explanatory. :dd
{All atoms of a swapped type must have the same charge.} :dt
Self-explanatory. :dd
{All bond coeffs are not set} :dt
All bond coefficients must be set in the data file or by the
bond_coeff command before running a simulation. :dd
{All dihedral coeffs are not set} :dt
All dihedral coefficients must be set in the data file or by the
dihedral_coeff command before running a simulation. :dd
{All improper coeffs are not set} :dt
All improper coefficients must be set in the data file or by the
improper_coeff command before running a simulation. :dd
{All masses are not set} :dt
For atom styles that define masses for each atom type, all masses must
be set in the data file or by the mass command before running a
simulation. They must also be set before using the velocity
command. :dd
{All mol IDs should be set for fix gcmc group atoms} :dt
The molecule flag is on, yet not all molecule ids in the fix group
have been set to non-zero positive values by the user. This is an
error since all atoms in the fix gcmc group are eligible for deletion,
rotation, and translation and therefore must have valid molecule ids. :dd
{All pair coeffs are not set} :dt
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation. :dd
{All read_dump x,y,z fields must be specified for scaled, triclinic coords} :dt
For triclinic boxes and scaled coordinates you must specify all 3 of
the x,y,z fields, else LAMMPS cannot reconstruct the unscaled
coordinates. :dd
{All universe/uloop variables must have same # of values} :dt
Self-explanatory. :dd
{All variables in next command must be same style} :dt
Self-explanatory. :dd
{Angle atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
angle on a particular processor. The pairwise cutoff is too short or
the atoms are too far apart to make a valid angle. :dd
{Angle atom missing in set command} :dt
The set command cannot find one or more atoms in a particular angle on
a particular processor. The pairwise cutoff is too short or the atoms
are too far apart to make a valid angle. :dd
{Angle atoms %d %d %d missing on proc %d at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle atoms missing on proc %d at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle coeff for hybrid has invalid style} :dt
Angle style hybrid uses another angle style as one of its
coefficients. The angle style used in the angle_coeff command or read
from a restart file is not recognized. :dd
{Angle coeffs are not set} :dt
No angle coefficients have been assigned in the data file or via the
angle_coeff command. :dd
{Angle extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the angle atoms are so far apart that it is
ambiguous how the angle should be defined. :dd
{Angle potential must be defined for SHAKE} :dt
When shaking angles, an angle_style potential must be used. :dd
{Angle style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Angle style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Angle style hybrid cannot use same angle style twice} :dt
Self-explanatory. :dd
{Angle table must range from 0 to 180 degrees} :dt
Self-explanatory. :dd
{Angle table parameters did not set N} :dt
List of angle table parameters must include N setting. :dd
{Angle_coeff command before angle_style is defined} :dt
Coefficients cannot be set in the data file or via the angle_coeff
command until an angle_style has been assigned. :dd
{Angle_coeff command before simulation box is defined} :dt
The angle_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Angle_coeff command when no angles allowed} :dt
The chosen atom style does not allow for angles to be defined. :dd
{Angle_style command when no angles allowed} :dt
The chosen atom style does not allow for angles to be defined. :dd
{Angles assigned incorrectly} :dt
Angles read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Angles defined but no angle types} :dt
The data file header lists angles but no angle types. :dd
{Append boundary must be shrink/minimum} :dt
The boundary style of the face where atoms are added
must be of type m (shrink/minimum). :dd
{Arccos of invalid value in variable formula} :dt
Argument of arccos() must be between -1 and 1. :dd
{Arcsin of invalid value in variable formula} :dt
Argument of arcsin() must be between -1 and 1. :dd
{Assigning body parameters to non-body atom} :dt
Self-explanatory. :dd
{Assigning ellipsoid parameters to non-ellipsoid atom} :dt
Self-explanatory. :dd
{Assigning line parameters to non-line atom} :dt
Self-explanatory. :dd
{Assigning quat to non-body atom} :dt
Self-explanatory. :dd
{Assigning tri parameters to non-tri atom} :dt
Self-explanatory. :dd
{At least one atom of each swapped type must be present to define charges.} :dt
Self-explanatory. :dd
{Atom IDs must be consecutive for velocity create loop all} :dt
Self-explanatory. :dd
{Atom IDs must be used for molecular systems} :dt
Atom IDs are used to identify and find partner atoms in bonds. :dd
{Atom count changed in fix neb} :dt
This is not allowed in a NEB calculation. :dd
{Atom count is inconsistent, cannot write data file} :dt
The sum of atoms across processors does not equal the global number
of atoms. Probably some atoms have been lost. :dd
{Atom count is inconsistent, cannot write restart file} :dt
Sum of atoms across processors does not equal initial total count.
This is probably because you have lost some atoms. :dd
{Atom in too many rigid bodies - boost MAXBODY} :dt
Fix poems has a parameter MAXBODY (in fix_poems.cpp) which determines
the maximum number of rigid bodies a single atom can belong to (i.e. a
multibody joint). The bodies you have defined exceed this limit. :dd
{Atom sort did not operate correctly} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Atom sorting has bin size = 0.0} :dt
The neighbor cutoff is being used as the bin size, but it is zero.
Thus you must explicitly list a bin size in the atom_modify sort
command or turn off sorting. :dd
{Atom style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Atom style hybrid cannot use same atom style twice} :dt
Self-explanatory. :dd
{Atom style template molecule must have atom types} :dt
The defined molecule(s) does not specify atom types. :dd
{Atom style was redefined after using fix property/atom} :dt
This is not allowed. :dd
{Atom type must be zero in fix gcmc mol command} :dt
Self-explanatory. :dd
{Atom vector in equal-style variable formula} :dt
Atom vectors generate one value per atom which is not allowed
in an equal-style variable. :dd
{Atom-style variable in equal-style variable formula} :dt
Atom-style variables generate one value per atom which is not allowed
in an equal-style variable. :dd
{Atom_modify id command after simulation box is defined} :dt
The atom_modify id command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_modify map command after simulation box is defined} :dt
The atom_modify map command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_modify sort and first options cannot be used together} :dt
Self-explanatory. :dd
{Atom_style command after simulation box is defined} :dt
The atom_style command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_style line can only be used in 2d simulations} :dt
Self-explanatory. :dd
{Atom_style tri can only be used in 3d simulations} :dt
Self-explanatory. :dd
{Atomfile variable could not read values} :dt
Check the file assigned to the variable. :dd
{Atomfile variable in equal-style variable formula} :dt
Self-explanatory. :dd
{Atomfile-style variable in equal-style variable formula} :dt
Self-explanatory. :dd
{Attempt to pop empty stack in fix box/relax} :dt
Internal LAMMPS error. Please report it to the developers. :dd
{Attempt to push beyond stack limit in fix box/relax} :dt
Internal LAMMPS error. Please report it to the developers. :dd
{Attempting to rescale a 0.0 temperature} :dt
Cannot rescale a temperature that is already 0.0. :dd
{Bad FENE bond} :dt
Two atoms in a FENE bond have become so far apart that the bond cannot
be computed. :dd
{Bad TIP4P angle type for PPPM/TIP4P} :dt
Specified angle type is not valid. :dd
{Bad TIP4P angle type for PPPMDisp/TIP4P} :dt
Specified angle type is not valid. :dd
{Bad TIP4P bond type for PPPM/TIP4P} :dt
Specified bond type is not valid. :dd
{Bad TIP4P bond type for PPPMDisp/TIP4P} :dt
Specified bond type is not valid. :dd
{Bad fix ID in fix append/atoms command} :dt
The value of the fix_id for keyword spatial must start with 'f_'. :dd
{Bad grid of processors} :dt
The 3d grid of processors defined by the processors command does not
match the number of processors LAMMPS is being run on. :dd
{Bad kspace_modify kmax/ewald parameter} :dt
Kspace_modify values for the kmax/ewald keyword must be integers > 0 :dd
{Bad kspace_modify slab parameter} :dt
Kspace_modify value for the slab/volume keyword must be >= 2.0. :dd
{Bad matrix inversion in mldivide3} :dt
This error should not occur unless the matrix is badly formed. :dd
{Bad principal moments} :dt
Fix rigid did not compute the principal moments of inertia of a rigid
group of atoms correctly. :dd
{Bad quadratic solve for particle/line collision} :dt
This is an internal error. It should normally not occur. :dd
{Bad quadratic solve for particle/tri collision} :dt
This is an internal error. It should normally not occur. :dd
{Bad real space Coulomb cutoff in fix tune/kspace} :dt
Fix tune/kspace tried to find the optimal real space Coulomb cutoff using
the Newton-Raphson method, but found a non-positive or NaN cutoff. :dd
{Balance command before simulation box is defined} :dt
The balance command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Balance produced bad splits} :dt
This should not occur. It means two or more cutting plane locations
are on top of each other or out of order. Report the problem to the
developers. :dd
{Balance rcb cannot be used with comm_style brick} :dt
Comm_style tiled must be used instead. :dd
{Balance shift string is invalid} :dt
The string can only contain the characters "x", "y", or "z". :dd
{Bias compute does not calculate a velocity bias} :dt
The specified compute must compute a bias for temperature. :dd
{Bias compute does not calculate temperature} :dt
The specified compute must compute temperature. :dd
{Bias compute group does not match compute group} :dt
The specified compute must operate on the same group as the parent
compute. :dd
{Big particle in fix srd cannot be point particle} :dt
Big particles must be extended spheroids or ellipsoids. :dd
{Bigint setting in lmptype.h is invalid} :dt
Size of bigint is less than size of tagint. :dd
{Bigint setting in lmptype.h is not compatible} :dt
Format of bigint stored in restart file is not consistent with LAMMPS
version you are running. See the settings in src/lmptype.h :dd
{Bitmapped lookup tables require int/float be same size} :dt
Cannot use pair tables on this machine, because of word sizes. Use
the pair_modify command with table 0 instead. :dd
{Bitmapped table in file does not match requested table} :dt
Setting for bitmapped table in pair_coeff command must match table
in file exactly. :dd
{Bitmapped table is incorrect length in table file} :dt
Number of table entries is not a correct power of 2. :dd
{Bond and angle potentials must be defined for TIP4P} :dt
Cannot use TIP4P pair potential unless bond and angle potentials
are defined. :dd
{Bond atom missing in box size check} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
bond on a particular processor. The pairwise cutoff is too short or
the atoms are too far apart to make a valid bond. :dd
{Bond atom missing in image check} :dt
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away. :dd
{Bond atom missing in set command} :dt
The set command cannot find one or more atoms in a particular bond on
a particular processor. The pairwise cutoff is too short or the atoms
are too far apart to make a valid bond. :dd
{Bond atoms %d %d missing on proc %d at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atoms missing on proc %d at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond coeff for hybrid has invalid style} :dt
Bond style hybrid uses another bond style as one of its coefficients.
The bond style used in the bond_coeff command or read from a restart
file is not recognized. :dd
{Bond coeffs are not set} :dt
No bond coefficients have been assigned in the data file or via the
bond_coeff command. :dd
{Bond extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the bond atoms are so far apart that it is
ambiguous how the bond should be defined. :dd
{Bond potential must be defined for SHAKE} :dt
Cannot use fix shake unless bond potential is defined. :dd
{Bond style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Bond style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Bond style hybrid cannot use same bond style twice} :dt
Self-explanatory. :dd
{Bond style quartic cannot be used with 3,4-body interactions} :dt
No angle, dihedral, or improper styles can be defined when using
bond style quartic. :dd
{Bond style quartic cannot be used with atom style template} :dt
This bond style can change the bond topology which is not
allowed with this atom style. :dd
{Bond style quartic requires special_bonds = 1,1,1} :dt
This is a restriction of the current bond quartic implementation. :dd
{Bond table parameters did not set N} :dt
List of bond table parameters must include N setting. :dd
{Bond table values are not increasing} :dt
The values in the tabulated file must be monotonically increasing. :dd
{BondAngle coeff for hybrid angle has invalid format} :dt
No "ba" field should appear in data file entry. :dd
{BondBond coeff for hybrid angle has invalid format} :dt
No "bb" field should appear in data file entry. :dd
{Bond_coeff command before bond_style is defined} :dt
Coefficients cannot be set in the data file or via the bond_coeff
command until a bond_style has been assigned. :dd
{Bond_coeff command before simulation box is defined} :dt
The bond_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Bond_coeff command when no bonds allowed} :dt
The chosen atom style does not allow for bonds to be defined. :dd
{Bond_style command when no bonds allowed} :dt
The chosen atom style does not allow for bonds to be defined. :dd
{Bonds assigned incorrectly} :dt
Bonds read in from the data file were not assigned correctly to atoms.
This means there is something invalid about the topology definitions. :dd
{Bonds defined but no bond types} :dt
The data file header lists bonds but no bond types. :dd
{Both restart files must use % or neither} :dt
Self-explanatory. :dd
{Both restart files must use MPI-IO or neither} :dt
Self-explanatory. :dd
{Both sides of boundary must be periodic} :dt
Cannot specify a boundary as periodic only on the lo or hi side. Must
be periodic on both sides. :dd
{Boundary command after simulation box is defined} :dt
The boundary command cannot be used after a read_data, read_restart,
or create_box command. :dd
{Box bounds are invalid} :dt
The box boundaries specified in the read_data file are invalid. The
lo value must be less than the hi value for all 3 dimensions. :dd
{Box command after simulation box is defined} :dt
The box command cannot be used after a read_data, read_restart, or
create_box command. :dd
{CPU neighbor lists must be used for ellipsoid/sphere mix.} :dt
When using Gay-Berne or RE-squared pair styles with both ellipsoidal and
spherical particles, the neighbor list must be built on the CPU :dd
{Can not specify Pxy/Pxz/Pyz in fix box/relax with non-triclinic box} :dt
Only triclinic boxes can be used with off-diagonal pressure components.
See the region prism command for details. :dd
{Can not specify Pxy/Pxz/Pyz in fix nvt/npt/nph with non-triclinic box} :dt
Only triclinic boxes can be used with off-diagonal pressure components.
See the region prism command for details. :dd
{Can only use -plog with multiple partitions} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Can only use -pscreen with multiple partitions} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Can only use Kokkos supported regions with Kokkos package} :dt
Self-explanatory. :dd
{Can only use NEB with 1-processor replicas} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Can only use TAD with 1-processor replicas for NEB} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Cannot (yet) do analytic differentiation with pppm/gpu} :dt
This is a current restriction of this command. :dd
{Cannot (yet) request ghost atoms with Kokkos half neighbor list} :dt
This feature is not yet supported. :dd
{Cannot (yet) use 'electron' units with dipoles} :dt
This feature is not yet supported. :dd
{Cannot (yet) use Ewald with triclinic box and slab correction} :dt
This feature is not yet supported. :dd
{Cannot (yet) use K-space slab correction with compute group/group for triclinic systems} :dt
This option is not yet supported. :dd
{Cannot (yet) use MSM with 2d simulation} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and TIP4P} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and kspace_modify diff ad} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and slab correction} :dt
This feature is not yet supported. :dd
{Cannot (yet) use kspace slab correction with long-range dipoles and non-neutral systems or per-atom energy} :dt
This feature is not yet supported. :dd
{Cannot (yet) use kspace_modify diff ad with compute group/group} :dt
This option is not yet supported. :dd
{Cannot (yet) use kspace_style pppm/stagger with triclinic systems} :dt
This feature is not yet supported. :dd
{Cannot (yet) use molecular templates with Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use respa with Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use rigid bodies with fix deform and Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use rigid bodies with fix nh and Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use single precision with MSM (remove -DFFT_SINGLE from Makefile and recompile)} :dt
Single precision cannot be used with MSM. :dd
{Cannot add atoms to fix move variable} :dt
Atoms can not be added afterwards to this fix option. :dd
{Cannot append atoms to a triclinic box} :dt
The simulation box must be defined with edges aligned with the
Cartesian axes. :dd
{Cannot balance in z dimension for 2d simulation} :dt
Self-explanatory. :dd
{Cannot change box ortho/triclinic with certain fixes defined} :dt
This is because those fixes store the shape of the box. You need to
use unfix to discard the fix, change the box, then redefine a new
fix. :dd
{Cannot change box ortho/triclinic with dumps defined} :dt
This is because some dumps store the shape of the box. You need to
use undump to discard the dump, change the box, then redefine a new
dump. :dd
{Cannot change box tilt factors for orthogonal box} :dt
Cannot use tilt factors unless the simulation box is non-orthogonal. :dd
{Cannot change box to orthogonal when tilt is non-zero} :dt
Self-explanatory. :dd
{Cannot change box z boundary to nonperiodic for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot change dump_modify every for dump dcd} :dt
The frequency of writing dump dcd snapshots cannot be changed. :dd
{Cannot change dump_modify every for dump xtc} :dt
The frequency of writing dump xtc snapshots cannot be changed. :dd
{Cannot change timestep once fix srd is setup} :dt
This is because various SRD properties depend on the timestep
size. :dd
{Cannot change timestep with fix pour} :dt
This is because fix pour pre-computes the time delay for particles to
fall out of the insertion volume due to gravity. :dd
{Cannot change to comm_style brick from tiled layout} :dt
Self-explanatory. :dd
{Cannot change_box after reading restart file with per-atom info} :dt
This is because the restart file info cannot be migrated with the
atoms. You can get around this by performing a 0-timestep run which
will assign the restart file info to actual atoms. :dd
{Cannot change_box in xz or yz for 2d simulation} :dt
Self-explanatory. :dd
{Cannot change_box in z dimension for 2d simulation} :dt
Self-explanatory. :dd
{Cannot clear group all} :dt
This operation is not allowed. :dd
{Cannot close restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot compute initial g_ewald_disp} :dt
LAMMPS failed to compute an initial guess for the PPPM_disp g_ewald_6
factor that partitions the computation between real space and k-space
for dispersion interactions. :dd
{Cannot create an atom map unless atoms have IDs} :dt
The simulation requires a mapping from global atom IDs to local atoms,
but the atoms that have been defined have no IDs. :dd
{Cannot create atoms with undefined lattice} :dt
Must use the lattice command before using the create_atoms
command. :dd
{Cannot create/grow a vector/array of pointers for %s} :dt
LAMMPS code is making an illegal call to the templated memory
allocators, to create a vector or array of pointers. :dd
{Cannot create_atoms after reading restart file with per-atom info} :dt
The per-atom info was stored to be used by a fix that you may
re-define. If you add atoms before re-defining the fix, then there
will not be a correct amount of per-atom info. :dd
{Cannot create_box after simulation box is defined} :dt
A simulation box can only be defined once. :dd
{Cannot currently use pair reax with pair hybrid} :dt
This is not yet supported. :dd
{Cannot currently use pppm/gpu with fix balance.} :dt
Self-explanatory. :dd
{Cannot delete group all} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a compute} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a dump} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a fix} :dt
Self-explanatory. :dd
{Cannot delete group currently used by atom_modify first} :dt
Self-explanatory. :dd
{Cannot delete_atoms bond yes for non-molecular systems} :dt
Self-explanatory. :dd
{Cannot displace_atoms after reading restart file with per-atom info} :dt
This is because the restart file info cannot be migrated with the
atoms. You can get around this by performing a 0-timestep run which
will assign the restart file info to actual atoms. :dd
{Cannot do GCMC on atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command. :dd
{Cannot do atom/swap on atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command. :dd
{Cannot dump sort on atom IDs with no atom IDs defined} :dt
Self-explanatory. :dd
{Cannot dump sort when multiple dump files are written} :dt
In this mode, each processor dumps its atoms to a file, so
no sorting is allowed. :dd
{Cannot embed Python when also extending Python with LAMMPS} :dt
When running LAMMPS via Python through the LAMMPS library interface
you cannot also use the input script python command. :dd
{Cannot evaporate atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in
a list to enable the atom_modify first command. :dd
{Cannot find create_bonds group ID} :dt
Self-explanatory. :dd
{Cannot find delete_bonds group ID} :dt
Group ID used in the delete_bonds command does not exist. :dd
{Cannot find specified group ID for core particles} :dt
Self-explanatory. :dd
{Cannot find specified group ID for shell particles} :dt
Self-explanatory. :dd
{Cannot have both pair_modify shift and tail set to yes} :dt
These 2 options are contradictory. :dd
{Cannot intersect groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot mix molecular and molecule template atom styles} :dt
Self-explanatory. :dd
{Cannot open -reorder file} :dt
Self-explanatory. :dd
{Cannot open ADP potential file %s} :dt
The specified ADP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open AIREBO potential file %s} :dt
The specified AIREBO potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open BOP potential file %s} :dt
The specified BOP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open COMB potential file %s} :dt
The specified COMB potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open COMB3 lib.comb3 file} :dt
The COMB3 library file cannot be opened. Check that the path and name
are correct. :dd
{Cannot open COMB3 potential file %s} :dt
The specified COMB3 potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open EAM potential file %s} :dt
The specified EAM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open EIM potential file %s} :dt
The specified EIM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open LCBOP potential file %s} :dt
The specified LCBOP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open MEAM potential file %s} :dt
The specified MEAM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open SNAP coefficient file %s} :dt
The specified SNAP coefficient file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open SNAP parameter file %s} :dt
The specified SNAP parameter file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open Stillinger-Weber potential file %s} :dt
The specified SW potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open Tersoff potential file %s} :dt
The specified potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open Vashishta potential file %s} :dt
The specified Vashishta potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open balance output file} :dt
Self-explanatory. :dd
{Cannot open coul/streitz potential file %s} :dt
The specified coul/streitz potential file cannot be opened. Check
that the path and name are correct. :dd
{Cannot open custom file} :dt
Self-explanatory. :dd
{Cannot open data file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open dir to search for restart file} :dt
Using a "*" in the name of the restart file will open the current
directory to search for matching file names. :dd
{Cannot open dump file} :dt
Self-explanatory. :dd
{Cannot open dump file %s} :dt
The output file for the dump command cannot be opened. Check that the
path and name are correct. :dd
{Cannot open file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. If the file is a compressed file, also check that the gzip
executable can be found and run. :dd
{Cannot open file variable file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/chunk file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/correlate file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/histo file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/spatial file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/time file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix balance output file} :dt
Self-explanatory. :dd
{Cannot open fix poems file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix print file %s} :dt
The output file generated by the fix print command cannot be opened :dd
{Cannot open fix qeq parameter file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix qeq/comb file %s} :dt
The output file for the fix qeq/comb command cannot be opened.
Check that the path and name are correct. :dd
{Cannot open fix reax/bonds file %s} :dt
The output file for the fix reax/bonds command cannot be opened.
Check that the path and name are correct. :dd
{Cannot open fix rigid infile %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix rigid restart file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix rigid/small infile %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix tmd file %s} :dt
The output file for the fix tmd command cannot be opened. Check that
the path and name are correct. :dd
{Cannot open fix ttm file %s} :dt
The output file for the fix ttm command cannot be opened. Check that
the path and name are correct. :dd
{Cannot open gzipped file} :dt
LAMMPS was compiled without -DLAMMPS_GZIP, which enables support for
reading and writing gzipped files through a pipeline to the gzip
program. :dd
{Cannot open input script %s} :dt
Self-explanatory. :dd
{Cannot open log.cite file} :dt
This file is created when you use some LAMMPS features, to indicate
what paper you should cite on behalf of those who implemented
the feature. Check that you have write privileges in the directory
you are running in. :dd
{Cannot open log.lammps for writing} :dt
The default LAMMPS log file cannot be opened. Check that the
directory you are running in allows for files to be created. :dd
{Cannot open logfile} :dt
The LAMMPS log file named in a command-line argument cannot be opened.
Check that the path and name are correct. :dd
{Cannot open logfile %s} :dt
The LAMMPS log file specified in the input script cannot be opened.
Check that the path and name are correct. :dd
{Cannot open molecule file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open nb3b/harmonic potential file %s} :dt
The specified potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open pair_write file} :dt
The specified output file for pair energies and forces cannot be
opened. Check that the path and name are correct. :dd
{Cannot open polymorphic potential file %s} :dt
The specified polymorphic potential file cannot be opened. Check that
the path and name are correct. :dd
{Cannot open print file %s} :dt
Self-explanatory. :dd
{Cannot open processors output file} :dt
Self-explanatory. :dd
{Cannot open restart file %s} :dt
Self-explanatory. :dd
{Cannot open restart file for reading - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot open restart file for writing - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot open screen file} :dt
The screen file specified as a command-line argument cannot be
opened. Check that the directory you are running in allows for files
to be created. :dd
{Cannot open temporary file for world counter.} :dt
Self-explanatory. :dd
{Cannot open universe log file} :dt
For a multi-partition run, the master log file cannot be opened.
Check that the directory you are running in allows for files to be
created. :dd
{Cannot open universe screen file} :dt
For a multi-partition run, the master screen file cannot be opened.
Check that the directory you are running in allows for files to be
created. :dd
{Cannot read from restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot read_data without add keyword after simulation box is defined} :dt
Self-explanatory. :dd
{Cannot read_restart after simulation box is defined} :dt
The read_restart command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Cannot redefine variable as a different style} :dt
An equal-style variable can be re-defined but only if it was
originally an equal-style variable. :dd
{Cannot replicate 2d simulation in z dimension} :dt
The replicate command cannot replicate a 2d simulation in the z
dimension. :dd
{Cannot replicate with fixes that store atom quantities} :dt
Either fixes are defined that create and store atom-based vectors or a
restart file was read which included atom-based vectors for fixes.
The replicate command cannot duplicate that information for new atoms.
You should use the replicate command before fixes are applied to the
system. :dd
{Cannot reset timestep with a dynamic region defined} :dt
Dynamic regions (see the region command) have a time dependence.
Thus you cannot change the timestep when one or more of these
are defined. :dd
{Cannot reset timestep with a time-dependent fix defined} :dt
You cannot reset the timestep when a fix that keeps track of elapsed
time is in place. :dd
{Cannot run 2d simulation with nonperiodic Z dimension} :dt
Use the boundary command to make the z dimension periodic in order to
run a 2d simulation. :dd
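For example, a minimal sketch of a 2d setup (the boundary letters are
illustrative; only the periodic z dimension is required by this error) is:

dimension 2
boundary p p p :pre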
{Cannot set bond topology types for atom style template} :dt
The bond, angle, etc. types cannot be changed for this atom style since
they are static settings in the molecule template files. :dd
{Cannot set both respa pair and inner/middle/outer} :dt
In the rRESPA integrator, you must compute pairwise potentials either
all together (pair), or in pieces (inner/middle/outer). You can't do
both. :dd
{Cannot set cutoff/multi before simulation box is defined} :dt
Self-explanatory. :dd
{Cannot set dpd/theta for this atom style} :dt
Self-explanatory. :dd
{Cannot set dump_modify flush for dump xtc} :dt
Self-explanatory. :dd
{Cannot set mass for this atom style} :dt
This atom style does not support mass settings for each atom type.
Instead they are defined on a per-atom basis in the data file. :dd
{Cannot set meso/cv for this atom style} :dt
Self-explanatory. :dd
{Cannot set meso/e for this atom style} :dt
Self-explanatory. :dd
{Cannot set meso/rho for this atom style} :dt
Self-explanatory. :dd
{Cannot set non-zero image flag for non-periodic dimension} :dt
Self-explanatory. :dd
{Cannot set non-zero z velocity for 2d simulation} :dt
Self-explanatory. :dd
{Cannot set quaternion for atom that has none} :dt
Self-explanatory. :dd
{Cannot set quaternion with xy components for 2d system} :dt
Self-explanatory. :dd
{Cannot set respa hybrid and any of pair/inner/middle/outer} :dt
In the rRESPA integrator, you must compute pairwise potentials either
all together (pair), with different cutoff regions (inner/middle/outer),
or per hybrid sub-style (hybrid). You cannot mix those. :dd
{Cannot set respa middle without inner/outer} :dt
In the rRESPA integrator, you must define both an inner and an outer
setting in order to use a middle setting. :dd
{Cannot set restart file size - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot set smd/contact/radius for this atom style} :dt
Self-explanatory. :dd
{Cannot set smd/mass/density for this atom style} :dt
Self-explanatory. :dd
{Cannot set temperature for fix rigid/nph} :dt
The temp keyword cannot be specified. :dd
{Cannot set theta for atom that is not a line} :dt
Self-explanatory. :dd
{Cannot set this attribute for this atom style} :dt
The attribute being set does not exist for the defined atom style. :dd
{Cannot set variable z velocity for 2d simulation} :dt
Self-explanatory. :dd
{Cannot skew triclinic box in z for 2d simulation} :dt
Self-explanatory. :dd
{Cannot subtract groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot union groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot use -cuda on and -kokkos on together} :dt
This is not allowed since both packages can use GPUs. :dd
{Cannot use -cuda on without USER-CUDA installed} :dt
The USER-CUDA package must be installed via "make yes-user-cuda"
before LAMMPS is built. :dd
{Cannot use -kokkos on without KOKKOS installed} :dt
Self-explanatory. :dd
{Cannot use -reorder after -partition} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Cannot use Ewald with 2d simulation} :dt
The kspace style ewald cannot be used in 2d simulations. You can use
2d Ewald in a 3d simulation; see the kspace_modify command. :dd
{Cannot use Ewald/disp solver on system with no charge, dipole, or LJ particles} :dt
No atoms in the system have a non-zero charge or dipole, or are LJ
particles. Change charges/dipoles or change options of the kspace
solver/pair style. :dd
{Cannot use EwaldDisp with 2d simulation} :dt
This is a current restriction of this command. :dd
{Cannot use GPU package with USER-CUDA package enabled} :dt
You cannot use both the GPU and USER-CUDA packages
together. Use one or the other. :dd
{Cannot use Kokkos pair style with rRESPA inner/middle} :dt
Self-explanatory. :dd
{Cannot use NEB unless atom map exists} :dt
Use the atom_modify command to create an atom map. :dd
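For example, a sketch of such a setting, placed before the system is
defined (the map style shown is one possible choice), is:

atom_modify map array :pre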
{Cannot use NEB with a single replica} :dt
Self-explanatory. :dd
{Cannot use NEB with atom_modify sort enabled} :dt
This is a current restriction of the NEB implementation in LAMMPS. :dd
{Cannot use PPPM with 2d simulation} :dt
The kspace style pppm cannot be used in 2d simulations. You can use
2d PPPM in a 3d simulation; see the kspace_modify command. :dd
{Cannot use PPPMDisp with 2d simulation} :dt
The kspace style pppm/disp cannot be used in 2d simulations. You can
use 2d pppm/disp in a 3d simulation; see the kspace_modify command. :dd
{Cannot use PRD with a changing box} :dt
The current box dimensions are not copied between replicas. :dd
{Cannot use PRD with a time-dependent fix defined} :dt
PRD alters the timestep in ways that will mess up these fixes. :dd
{Cannot use PRD with a time-dependent region defined} :dt
PRD alters the timestep in ways that will mess up these regions. :dd
{Cannot use PRD with atom_modify sort enabled} :dt
This is a current restriction of PRD. You must turn off sorting,
which is enabled by default, via the atom_modify command. :dd
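For example, a sketch of turning sorting off (the 0 0.0 arguments disable
it) is:

atom_modify sort 0 0.0 :pre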
{Cannot use PRD with multi-processor replicas unless atom map exists} :dt
Use the atom_modify command to create an atom map. :dd
{Cannot use TAD unless atom map exists for NEB} :dt
See atom_modify map command to set this. :dd
{Cannot use TAD with a single replica for NEB} :dt
NEB requires multiple replicas. :dd
{Cannot use TAD with atom_modify sort enabled for NEB} :dt
This is a current restriction of NEB. :dd
{Cannot use a damped dynamics min style with fix box/relax} :dt
This is a current restriction in LAMMPS. Use another minimizer
style. :dd
{Cannot use a damped dynamics min style with per-atom DOF} :dt
This is a current restriction in LAMMPS. Use another minimizer
style. :dd
{Cannot use append/atoms in periodic dimension} :dt
The boundary style of the face where atoms are added cannot be of
type p (periodic). :dd
{Cannot use atomfile-style variable unless atom map exists} :dt
Self-explanatory. See the atom_modify command to create a map. :dd
{Cannot use both com and bias with compute temp/chunk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/coul/cut/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/debye/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with coul/dsf/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/wolf/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/charmm/implicit/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/charmm/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/coul/cut/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/debye/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/long/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/expand/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/gromacs/coul/gromacs/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/gromacs/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/sdk/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with pair eam/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with pair eam/kk/alloy} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with pair eam/kk/fs} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with pair sw/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with tersoff/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with tersoff/zbl/kk} :dt
Self-explanatory. :dd
{Cannot use compute chunk/atom bin z for 2d model} :dt
Self-explanatory. :dd
{Cannot use compute cluster/atom unless atoms have IDs} :dt
Atom IDs are used to identify clusters. :dd
{Cannot use create_atoms rotate unless single style} :dt
Self-explanatory. :dd
{Cannot use create_bonds unless atoms have IDs} :dt
This command requires a mapping from global atom IDs to local atoms,
but the atoms that have been defined have no IDs. :dd
{Cannot use create_bonds with non-molecular system} :dt
Self-explanatory. :dd
{Cannot use cwiggle in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use delete_atoms bond yes with atom_style template} :dt
This is because the bonds for that atom style are hardwired in the
molecule template. :dd
{Cannot use delete_atoms unless atoms have IDs} :dt
Your atoms do not have IDs, so the delete_atoms command cannot be
used. :dd
{Cannot use delete_bonds with non-molecular system} :dt
Your choice of atom style does not have bonds. :dd
{Cannot use dump_modify fileper without % in dump file name} :dt
Self-explanatory. :dd
{Cannot use dump_modify nfile without % in dump file name} :dt
Self-explanatory. :dd
{Cannot use dynamic group with fix adapt atom} :dt
This is not yet supported. :dd
{Cannot use fix TMD unless atom map exists} :dt
Using this fix requires the ability to look up an atom index, which is
provided by an atom map. An atom map does not exist (by default) for
non-molecular problems. Using the atom_modify map command will force
an atom map to be created. :dd
{Cannot use fix ave/spatial z for 2 dimensional model} :dt
Self-explanatory. :dd
{Cannot use fix bond/break with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix bond/create with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix bond/swap with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix box/relax on a 2nd non-periodic dimension} :dt
When specifying an off-diagonal pressure component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix box/relax on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix box/relax with both relaxation and scaling on a tilt factor} :dt
When specifying scaling on a tilt factor component, that component cannot
also be controlled by the barostat. E.g. specifying both scalexy yes and
the tri or xy keyword is not allowed. :dd
{Cannot use fix box/relax with tilt factor scaling on a 2nd non-periodic dimension} :dt
When specifying scaling on a tilt factor component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix deform on a shrink-wrapped boundary} :dt
The x, y, z options cannot be applied to shrink-wrapped
dimensions. :dd
{Cannot use fix deform tilt on a shrink-wrapped 2nd dim} :dt
This is because the shrink-wrapping will change the value
of the strain implied by the tilt factor. :dd
{Cannot use fix deform trate on a box with zero tilt} :dt
The trate style alters the current strain. :dd
{Cannot use fix deposit rigid and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix deposit rigid and shake} :dt
These two attributes are conflicting. :dd
{Cannot use fix deposit shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix enforce2d with 3d simulation} :dt
Self-explanatory. :dd
{Cannot use fix gcmc in a 2d simulation} :dt
Fix gcmc is set up to run in 3d only. No 2d simulations with fix gcmc
are allowed. :dd
{Cannot use fix gcmc shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix msst without per-type mass defined} :dt
Self-explanatory. :dd
{Cannot use fix npt and fix deform on same component of stress tensor} :dt
This would be changing the same box dimension twice. :dd
{Cannot use fix nvt/npt/nph on a 2nd non-periodic dimension} :dt
When specifying an off-diagonal pressure component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix nvt/npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix nvt/npt/nph with both xy dynamics and xy scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with both xz dynamics and xz scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with both yz dynamics and yz scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with xy scaling when y is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix nvt/npt/nph with xz scaling when z is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix nvt/npt/nph with yz scaling when z is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix pour rigid and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix pour rigid and shake} :dt
These two attributes are conflicting. :dd
{Cannot use fix pour shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix pour with triclinic box} :dt
This option is not yet supported. :dd
{Cannot use fix press/berendsen and fix deform on same component of stress tensor} :dt
These commands both change the box size/shape, so you cannot use both
together. :dd
{Cannot use fix press/berendsen on a non-periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix press/berendsen with triclinic box} :dt
Self-explanatory. :dd
{Cannot use fix reax/bonds without pair_style reax} :dt
Self-explanatory. :dd
{Cannot use fix rigid npt/nph and fix deform on same component of stress tensor} :dt
This would be changing the same box dimension twice. :dd
{Cannot use fix rigid npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix rigid/small npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix shake with non-molecular system} :dt
Your choice of atom style does not have bonds. :dd
{Cannot use fix ttm with 2d simulation} :dt
This is a current restriction of this fix due to the grid it creates. :dd
{Cannot use fix ttm with triclinic box} :dt
This is a current restriction of this fix due to the grid it creates. :dd
{Cannot use fix tune/kspace without a kspace style} :dt
Self-explanatory. :dd
{Cannot use fix tune/kspace without a pair style} :dt
This fix (tune/kspace) can only be used when a pair style has been specified. :dd
{Cannot use fix wall in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix wall/reflect in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall/reflect zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd more than once} :dt
Nor is there a need to, since multiple walls can be specified
in one command. :dd
{Cannot use fix wall/srd without fix srd} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix_deposit unless atoms have IDs} :dt
Self-explanatory. :dd
{Cannot use fix_pour unless atoms have IDs} :dt
Self-explanatory. :dd
{Cannot use include command within an if command} :dt
Self-explanatory. :dd
{Cannot use lines with fix srd unless overlap is set} :dt
This is because line segments are connected to each other. :dd
{Cannot use multiple fix wall commands with pair brownian} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricate} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricate/poly} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricateU} :dt
Self-explanatory. :dd
{Cannot use neigh_modify exclude with GPU neighbor builds} :dt
This is a current limitation of the GPU implementation
in LAMMPS. :dd
{Cannot use neighbor bins - box size << cutoff} :dt
Too many neighbor bins will be created. This typically happens when
the simulation box is very small in some dimension, compared to the
neighbor cutoff. Use the "nsq" style instead of "bin" style. :dd
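For example, a sketch of switching the neighbor style (the skin distance
shown is an arbitrary placeholder that depends on your units) is:

neighbor 2.0 nsq :pre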
{Cannot use newton pair with beck/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/coul/wolf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with colloid/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/debye/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/dsf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dipole/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dipole/sf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dpd/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dpd/tstat/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/alloy/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/fs/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with gauss/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with gayberne/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/charmm/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/class2/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/class2/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cubic/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/debye/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/dsf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/msm/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/expand/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/gromacs/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/sdk/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/sdk/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj96/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with mie/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with morse/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with resquared/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with soft/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with table/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with yukawa/colloid/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with yukawa/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with zbl/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use non-zero forces in an energy minimization} :dt
Fix setforce cannot be used in this manner. Use fix addforce
instead. :dd
{Cannot use nonperiodic boundares with fix ttm} :dt
This fix requires a fully periodic simulation box. :dd
{Cannot use nonperiodic boundaries with Ewald} :dt
For kspace style ewald, all 3 dimensions must have periodic boundaries
unless you use the kspace_modify command to define a 2d slab with a
non-periodic z dimension. :dd
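For example, a sketch of such a slab setup (the volume factor shown is an
illustrative value) is:

boundary p p f
kspace_modify slab 3.0 :pre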
{Cannot use nonperiodic boundaries with EwaldDisp} :dt
For kspace style ewald/disp, all 3 dimensions must have periodic
boundaries unless you use the kspace_modify command to define a 2d
slab with a non-periodic z dimension. :dd
{Cannot use nonperiodic boundaries with PPPM} :dt
For kspace style pppm, all 3 dimensions must have periodic boundaries
unless you use the kspace_modify command to define a 2d slab with a
non-periodic z dimension. :dd
{Cannot use nonperiodic boundaries with PPPMDisp} :dt
For kspace style pppm/disp, all 3 dimensions must have periodic
boundaries unless you use the kspace_modify command to define a 2d
slab with a non-periodic z dimension. :dd
{Cannot use order greater than 8 with pppm/gpu.} :dt
Self-explanatory. :dd
{Cannot use package gpu neigh yes with triclinic box} :dt
This is a current restriction in LAMMPS. :dd
{Cannot use pair hybrid with GPU neighbor list builds} :dt
Neighbor list builds must be done on the CPU for this pair style. :dd
{Cannot use pair tail corrections with 2d simulations} :dt
The correction factors are only currently defined for 3d systems. :dd
{Cannot use processors part command without using partitions} :dt
See the command-line -partition switch. :dd
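For example, if LAMMPS is launched with the -partition switch so that at
least two partitions exist, a sketch of a processors part setting (the
partition numbers and style are illustrative) is:

processors * * * part 1 2 multiple :pre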
{Cannot use ramp in variable formula between runs} :dt
This is because the ramp() function is time dependent. :dd
{Cannot use read_data add before simulation box is defined} :dt
Self-explanatory. :dd
{Cannot use read_data extra with add flag} :dt
Self-explanatory. :dd
{Cannot use read_data offset without add flag} :dt
Self-explanatory. :dd
{Cannot use read_data shift without add flag} :dt
Self-explanatory. :dd
{Cannot use region INF or EDGE when box does not exist} :dt
Regions that extend to the box boundaries can only be used after the
create_box command has been used. :dd
{Cannot use set atom with no atom IDs defined} :dt
Atom IDs are not defined, so they cannot be used to identify an atom. :dd
{Cannot use set mol with no molecule IDs defined} :dt
Self-explanatory. :dd
{Cannot use swiggle in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use tris with fix srd unless overlap is set} :dt
This is because triangles are connected to each other. :dd
{Cannot use variable energy with constant efield in fix efield} :dt
LAMMPS computes the energy itself when the E-field is constant. :dd
{Cannot use variable energy with constant force in fix addforce} :dt
This is because for constant force, LAMMPS can compute the change
in energy directly. :dd
{Cannot use variable every setting for dump dcd} :dt
The format of DCD dump files requires snapshots be output
at a constant frequency. :dd
{Cannot use variable every setting for dump xtc} :dt
The format of this file requires snapshots at regular intervals. :dd
{Cannot use vdisplace in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use velocity bias command without temp keyword} :dt
Self-explanatory. :dd
{Cannot use velocity create loop all unless atoms have IDs} :dt
Atoms in the simulation do not have IDs, so this style
of velocity creation cannot be performed. :dd
{Cannot use wall in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use write_restart fileper without % in restart file name} :dt
Self-explanatory. :dd
{Cannot use write_restart nfile without % in restart file name} :dt
Self-explanatory. :dd
{Cannot wiggle and shear fix wall/gran} :dt
Cannot specify both options at the same time. :dd
{Cannot write to restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot yet use KSpace solver with grid with comm style tiled} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use comm_style tiled with multi-mode comm} :dt
Self-explanatory. :dd
{Cannot yet use comm_style tiled with triclinic box} :dt
Self-explanatory. :dd
{Cannot yet use compute tally with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot yet use fix bond/break with this improper style} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use fix bond/create with this improper style} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use minimize with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot yet use pair hybrid with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot zero Langevin force of 0 atoms} :dt
The group has zero atoms, so you cannot request its force
be zeroed. :dd
{Cannot zero gld force for zero atoms} :dt
There are no atoms currently in the group. :dd
{Cannot zero momentum of no atoms} :dt
Self-explanatory. :dd
{Change_box command before simulation box is defined} :dt
Self-explanatory. :dd
{Change_box volume used incorrectly} :dt
The "dim volume" option must be used immediately following one or two
settings for "dim1 ..." (and optionally "dim2 ...") and must be for a
different dimension, i.e. dim != dim1 and dim != dim2. :dd
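For example, a sketch of a valid ordering (the scale factor is an
arbitrary placeholder) changes one dimension first and then lets the
others compensate:

change_box all x scale 1.1 y volume z volume :pre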
{Chunk/atom compute does not exist for compute angmom/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute com/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute gyration/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute inertia/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute msd/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute omega/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute property/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute temp/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute torque/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute vcm/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for fix ave/chunk} :dt
Self-explanatory. :dd
{Comm tiled invalid index in box drop brick} :dt
Internal error check in comm_style tiled which should not occur.
Contact the developers. :dd
{Comm tiled mis-match in box drop brick} :dt
Internal error check in comm_style tiled which should not occur.
Contact the developers. :dd
{Comm_modify group != atom_modify first group} :dt
Self-explanatory. :dd
{Communication cutoff for comm_style tiled cannot exceed periodic box length} :dt
Self-explanatory. :dd
{Communication cutoff too small for SNAP micro load balancing} :dt
This can happen if you change the neighbor skin after your pair_style
command or if your box dimensions grow during a run. You can set the
cutoff explicitly via the comm_modify cutoff command. :dd
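For example, a sketch of setting the cutoff explicitly (the value is an
arbitrary placeholder in distance units) is:

comm_modify cutoff 12.0 :pre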
{Compute %s does not allow use of dynamic group} :dt
Dynamic groups have not yet been enabled for this compute. :dd
{Compute ID for compute chunk /atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute reduce does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute slice does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix store/state does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix vector does not exist} :dt
Self-explanatory. :dd
{Compute ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Compute angle/local used when angles are not allowed} :dt
The atom style does not support angles. :dd
{Compute angmom/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute body/local requires atom style body} :dt
Self-explanatory. :dd
{Compute bond/local used when bonds are not allowed} :dt
The atom style does not support bonds. :dd
{Compute centro/atom requires a pair style be defined} :dt
This is because the computation of the centro-symmetry values
uses a pairwise neighbor list. :dd
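For example, a minimal sketch (with placeholder lj/cut parameters and an
assumed fcc reference lattice) defines the pair style before the compute:

pair_style lj/cut 2.5
pair_coeff * * 1.0 1.0
compute 1 all centro/atom fcc :pre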
{Compute chunk/atom bin/cylinder radius is too large for periodic box} :dt
Radius cannot be bigger than 1/2 of a non-axis periodic dimension. :dd
{Compute chunk/atom bin/sphere radius is too large for periodic box} :dt
Radius cannot be bigger than 1/2 of any periodic dimension. :dd
{Compute chunk/atom compute array is accessed out-of-range} :dt
The index for the array is out of bounds. :dd
{Compute chunk/atom compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute chunk/atom compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute chunk/atom compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Compute chunk/atom cylinder axis must be z for 2d} :dt
Self-explanatory. :dd
{Compute chunk/atom fix array is accessed out-of-range} :dt
The index for the array is out of bounds. :dd
{Compute chunk/atom fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute chunk/atom fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute chunk/atom fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Compute chunk/atom for triclinic boxes requires units reduced} :dt
Self-explanatory. :dd
{Compute chunk/atom ids once but nchunk is not once} :dt
You cannot assign chunk IDs to atoms permanently if the number of
chunks may change. :dd
{Compute chunk/atom molecule for non-molecular system} :dt
Self-explanatory. :dd
{Compute chunk/atom sphere z origin must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Compute chunk/atom stores no IDs for compute property/chunk} :dt
It will only store IDs if its compress option is enabled. :dd
{Compute chunk/atom stores no coord1 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom stores no coord2 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom stores no coord3 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom variable is not atom-style variable} :dt
Self-explanatory. :dd
{Compute chunk/atom without bins cannot use discard mixed} :dt
That discard option only applies to the binning styles. :dd
{Compute cluster/atom cutoff is longer than pairwise cutoff} :dt
Cannot identify clusters beyond cutoff. :dd
{Compute cluster/atom requires a pair style be defined} :dt
This is so that the pair style defines a cutoff distance which
is used to find clusters. :dd
{Compute cna/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute cna/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute com/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute contact/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute contact/atom requires atom style sphere} :dt
Self-explanatory. :dd
{Compute coord/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute coordination at distances longer than the pair cutoff,
since those atoms are not in the neighbor list. :dd
{Compute coord/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute damage/atom requires peridynamic potential} :dt
Damage is a Peridynamic-specific metric. It requires you
to be running a Peridynamics simulation. :dd
{Compute dihedral/local used when dihedrals are not allowed} :dt
The atom style does not support dihedrals. :dd
{Compute dilatation/atom cannot be used with this pair style} :dt
Self-explanatory. :dd
{Compute dilatation/atom requires Peridynamic pair style} :dt
Self-explanatory. :dd
{Compute does not allow an extra compute or fix to be reset} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Compute erotate/asphere requires atom style ellipsoid or line or tri} :dt
Self-explanatory. :dd
{Compute erotate/asphere requires extended particles} :dt
This compute cannot be used with point particles. :dd
{Compute erotate/rigid with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Compute erotate/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Compute erotate/sphere/atom requires atom style sphere} :dt
Self-explanatory. :dd
{Compute event/displace has invalid fix event assigned} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Compute group/group group ID does not exist} :dt
Self-explanatory. :dd
{Compute gyration/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute heat/flux compute ID does not compute ke/atom} :dt
Self-explanatory. :dd
{Compute heat/flux compute ID does not compute pe/atom} :dt
Self-explanatory. :dd
{Compute heat/flux compute ID does not compute stress/atom} :dt
Self-explanatory. :dd
{Compute hexorder/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute order parameter beyond cutoff. :dd
{Compute hexorder/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute improper/local used when impropers are not allowed} :dt
The atom style does not support impropers. :dd
{Compute inertia/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute ke/rigid with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Compute msd/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute msd/chunk nchunk is not static} :dt
This is required because the MSD cannot be computed consistently if
the number of chunks is changing. Compute chunk/atom allows setting
nchunk to be static. :dd
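For example, a sketch with hypothetical compute IDs that locks both the
chunk count and the chunk IDs is:

compute cc1 all chunk/atom molecule nchunk once ids once
compute msd1 all msd/chunk cc1 :pre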
{Compute nve/asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute nvt/nph/npt asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute nvt/nph/npt body requires atom style body} :dt
Self-explanatory. :dd
{Compute omega/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute orientorder/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute order parameter beyond cutoff. :dd
{Compute orientorder/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute pair must use group all} :dt
Pair styles accumulate energy on all atoms. :dd
{Compute pe must use group all} :dt
Energies computed by potentials (pair, bond, etc) are computed on all
atoms. :dd
{Compute plasticity/atom cannot be used with this pair style} :dt
Self-explanatory. :dd
{Compute plasticity/atom requires Peridynamic pair style} :dt
Self-explanatory. :dd
{Compute pressure must use group all} :dt
Virial contributions computed by potentials (pair, bond, etc) are
computed on all atoms. :dd
{Compute pressure requires temperature ID to include kinetic energy} :dt
The keflag cannot be used unless a temperature compute is provided. :dd
{Compute pressure temperature ID does not compute temperature} :dt
The compute ID assigned to a pressure computation must compute
temperature. :dd
{Compute property/atom floating point vector does not exist} :dt
The command is accessing a vector added by the fix property/atom
command that does not exist. :dd
{Compute property/atom for atom property that isn't allocated} :dt
Self-explanatory. :dd
{Compute property/atom integer vector does not exist} :dt
The command is accessing a vector added by the fix property/atom
command that does not exist. :dd
{Compute property/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute property/local cannot use these inputs together} :dt
Only inputs that generate the same number of datums can be used
together. E.g. bond and angle quantities cannot be mixed. :dd
{Compute property/local does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{Compute property/local for property that isn't allocated} :dt
Self-explanatory. :dd
{Compute rdf requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute reduce compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute reduce compute calculates global values} :dt
A compute that calculates peratom or local values is required. :dd
{Compute reduce compute does not calculate a local array} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a local vector} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute reduce fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute reduce fix calculates global values} :dt
A fix that calculates peratom or local values is required. :dd
{Compute reduce fix does not calculate a local array} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a local vector} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute reduce replace requires min or max mode} :dt
Self-explanatory. :dd
{Compute reduce variable is not atom-style variable} :dt
Self-explanatory. :dd
{Compute slice compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute slice compute does not calculate a global array} :dt
Self-explanatory. :dd
{Compute slice compute does not calculate a global vector} :dt
Self-explanatory. :dd
{Compute slice compute does not calculate global vector or array} :dt
Self-explanatory. :dd
{Compute slice compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Compute slice fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute slice fix does not calculate a global array} :dt
Self-explanatory. :dd
{Compute slice fix does not calculate a global vector} :dt
Self-explanatory. :dd
{Compute slice fix does not calculate global vector or array} :dt
Self-explanatory. :dd
{Compute slice fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Compute sna/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute sna/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute snad/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute snad/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute snav/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute snav/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute stress/atom temperature ID does not compute temperature} :dt
The specified compute must compute temperature. :dd
{Compute temp/asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute temp/asphere requires extended particles} :dt
This compute cannot be used with point particles. :dd
{Compute temp/body requires atom style body} :dt
Self-explanatory. :dd
{Compute temp/body requires bodies} :dt
This compute can only be applied to body particles. :dd
{Compute temp/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute temp/cs requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
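For example:

comm_modify vel yes :pre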
{Compute temp/cs used when bonds are not allowed} :dt
This compute only works on pairs of bonded particles. :dd
{Compute temp/partial cannot use vz for 2d systemx} :dt
Self-explanatory. :dd
{Compute temp/profile cannot bin z for 2d systems} :dt
Self-explanatory. :dd
{Compute temp/profile cannot use vz for 2d systemx} :dt
Self-explanatory. :dd
{Compute temp/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Compute ti kspace style does not exist} :dt
Self-explanatory. :dd
{Compute ti pair style does not exist} :dt
Self-explanatory. :dd
{Compute ti tail when pair style does not compute tail corrections} :dt
Self-explanatory. :dd
{Compute torque/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute used in dump between runs is not current} :dt
The compute was not invoked on the current timestep, therefore it
cannot be used in a dump between runs. :dd
{Compute used in variable between runs is not current} :dt
Computes cannot be invoked by a variable in between runs. Thus they
must have been evaluated on the last timestep of the previous run in
order for their value(s) to be accessed. See the doc page for the
variable command for more info. :dd
{Compute used in variable thermo keyword between runs is not current} :dt
Some thermo keywords rely on a compute to calculate their value(s).
Computes cannot be invoked by a variable in between runs. Thus they
must have been evaluated on the last timestep of the previous run in
order for their value(s) to be accessed. See the doc page for the
variable command for more info. :dd
{Compute vcm/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Computed temperature for fix temp/berendsen cannot be 0.0} :dt
Self-explanatory. :dd
{Computed temperature for fix temp/rescale cannot be 0.0} :dt
Cannot rescale the temperature to a new value if the current
temperature is 0.0. :dd
{Core/shell partner atom not found} :dt
Could not find one of the atoms in the bond pair. :dd
{Core/shell partners were not all found} :dt
Could not find one or more atoms in the bond pairs. :dd
{Could not adjust g_ewald_6} :dt
The Newton-Raphson solver failed to converge to a good value for
g_ewald. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute g_ewald} :dt
The Newton-Raphson solver failed to converge to a good value for
g_ewald. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size for Coulomb interaction} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size for Dispersion} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not create 3d FFT plan} :dt
The FFT setup for the PPPM solver failed, typically due
to lack of memory. This is an unusual error. Check the
size of the FFT grid you are requesting. :dd
{Could not create 3d grid of processors} :dt
The specified constraints did not allow a Px by Py by Pz grid to be
created where Px * Py * Pz = P = total number of processors. :dd
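For instance, a sketch for a run on 8 MPI tasks (the grid values are
illustrative and must multiply to the task count) is:

processors 2 2 2 :pre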
{Could not create 3d remap plan} :dt
The FFT setup in pppm failed. :dd
{Could not create Python function arguments} :dt
This is an internal Python error, possibly because the number
of inputs to the function is too large. :dd
{Could not create numa grid of processors} :dt
The specified constraints did not allow this style of grid to be
created. Usually this is because the total processor count is not a
multiple of the cores/node or the user specified processor count is >
1 in one of the dimensions. :dd
{Could not create twolevel 3d grid of processors} :dt
The specified constraints did not allow this style of grid to be
created. :dd
{Could not evaluate Python function input variable} :dt
Self-explanatory. :dd
{Could not find Python function} :dt
The provided Python code was run successfully, but it did not
define a callable function with the required name. :dd
{Could not find atom_modify first group ID} :dt
Self-explanatory. :dd
{Could not find change_box group ID} :dt
Group ID used in the change_box command does not exist. :dd
{Could not find compute ID for PRD} :dt
Self-explanatory. :dd
{Could not find compute ID for TAD} :dt
Self-explanatory. :dd
{Could not find compute ID for temperature bias} :dt
Self-explanatory. :dd
{Could not find compute ID to delete} :dt
Self-explanatory. :dd
{Could not find compute displace/atom fix ID} :dt
Self-explanatory. :dd
{Could not find compute event/displace fix ID} :dt
Self-explanatory. :dd
{Could not find compute group ID} :dt
Self-explanatory. :dd
{Could not find compute heat/flux compute ID} :dt
Self-explanatory. :dd
{Could not find compute msd fix ID} :dt
Self-explanatory. :dd
{Could not find compute msd/chunk fix ID} :dt
The compute creates an internal fix, which has been deleted. :dd
{Could not find compute pressure temperature ID} :dt
The compute ID for calculating temperature does not exist. :dd
{Could not find compute stress/atom temperature ID} :dt
Self-explanatory. :dd
{Could not find compute vacf fix ID} :dt
Self-explanatory. :dd
{Could not find compute/voronoi surface group ID} :dt
Self-explanatory. :dd
{Could not find compute_modify ID} :dt
Self-explanatory. :dd
{Could not find custom per-atom property ID} :dt
Self-explanatory. :dd
{Could not find delete_atoms group ID} :dt
Group ID used in the delete_atoms command does not exist. :dd
{Could not find delete_atoms region ID} :dt
Region ID used in the delete_atoms command does not exist. :dd
{Could not find displace_atoms group ID} :dt
Group ID used in the displace_atoms command does not exist. :dd
{Could not find dump custom compute ID} :dt
Self-explanatory. :dd
{Could not find dump custom fix ID} :dt
Self-explanatory. :dd
{Could not find dump custom variable name} :dt
Self-explanatory. :dd
{Could not find dump group ID} :dt
A group ID used in the dump command does not exist. :dd
{Could not find dump local compute ID} :dt
Self-explanatory. :dd
{Could not find dump local fix ID} :dt
Self-explanatory. :dd
{Could not find dump modify compute ID} :dt
Self-explanatory. :dd
{Could not find dump modify custom atom floating point property ID} :dt
Self-explanatory. :dd
{Could not find dump modify custom atom integer property ID} :dt
Self-explanatory. :dd
{Could not find dump modify fix ID} :dt
Self-explanatory. :dd
{Could not find dump modify variable name} :dt
Self-explanatory. :dd
{Could not find fix ID to delete} :dt
Self-explanatory. :dd
{Could not find fix adapt storage fix ID} :dt
This should not happen unless you explicitly deleted
a secondary fix that fix adapt created internally. :dd
{Could not find fix gcmc exclusion group ID} :dt
Self-explanatory. :dd
{Could not find fix gcmc rotation group ID} :dt
Self-explanatory. :dd
{Could not find fix group ID} :dt
A group ID used in the fix command does not exist. :dd
{Could not find fix msst compute ID} :dt
Self-explanatory. :dd
{Could not find fix poems group ID} :dt
A group ID used in the fix poems command does not exist. :dd
{Could not find fix recenter group ID} :dt
A group ID used in the fix recenter command does not exist. :dd
{Could not find fix rigid group ID} :dt
A group ID used in the fix rigid command does not exist. :dd
{Could not find fix srd group ID} :dt
Self-explanatory. :dd
{Could not find fix_modify ID} :dt
A fix ID used in the fix_modify command does not exist. :dd
{Could not find fix_modify pressure ID} :dt
The compute ID for computing pressure does not exist. :dd
{Could not find fix_modify temperature ID} :dt
The compute ID for computing temperature does not exist. :dd
{Could not find group clear group ID} :dt
Self-explanatory. :dd
{Could not find group delete group ID} :dt
Self-explanatory. :dd
{Could not find pair fix ID} :dt
A fix is created internally by the pair style to store shear
history information. You cannot delete it. :dd
{Could not find set group ID} :dt
Group ID specified in set command does not exist. :dd
{Could not find specified fix gcmc group ID} :dt
Self-explanatory. :dd
{Could not find thermo compute ID} :dt
Compute ID specified in thermo_style command does not exist. :dd
{Could not find thermo custom compute ID} :dt
The compute ID needed by thermo style custom to compute a requested
quantity does not exist. :dd
{Could not find thermo custom fix ID} :dt
The fix ID needed by thermo style custom to compute a requested
quantity does not exist. :dd
{Could not find thermo custom variable name} :dt
Self-explanatory. :dd
{Could not find thermo fix ID} :dt
Fix ID specified in thermo_style command does not exist. :dd
{Could not find thermo variable name} :dt
Self-explanatory. :dd
{Could not find thermo_modify pressure ID} :dt
The compute ID needed by thermo style custom to compute pressure does
not exist. :dd
{Could not find thermo_modify temperature ID} :dt
The compute ID needed by thermo style custom to compute temperature does
not exist. :dd
{Could not find undump ID} :dt
A dump ID used in the undump command does not exist. :dd
{Could not find velocity group ID} :dt
A group ID used in the velocity command does not exist. :dd
{Could not find velocity temperature ID} :dt
The compute ID needed by the velocity command to compute temperature
does not exist. :dd
{Could not find/initialize a specified accelerator device} :dt
Could not initialize at least one of the devices specified for the gpu
package. :dd
{Could not grab element entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not grab global entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not grab pair entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not initialize embedded Python} :dt
The main module in Python was not accessible. :dd
{Could not open Python file} :dt
The specified file of Python code cannot be opened. Check that the
path and name are correct. :dd
{Could not process Python file} :dt
The Python code in the specified file was not run successfully by
Python, probably due to errors in the Python code. :dd
{Could not process Python string} :dt
The Python code in the here string was not run successfully by Python,
probably due to errors in the Python code. :dd
{Coulomb PPPMDisp order has been reduced below minorder} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{Coulomb cut not supported in pair_style buck/long/coul/coul} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cut not supported in pair_style lj/long/coul/long} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cut not supported in pair_style lj/long/tip4p/long} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cutoffs of pair hybrid sub-styles do not match} :dt
If using a Kspace solver, all Coulomb cutoffs of long pair styles must
be the same. :dd
{Coulombic cut not supported in pair_style lj/long/dipole/long} :dt
Must use long-range Coulombic interactions. :dd
{Cound not find dump_modify ID} :dt
Self-explanatory. :dd
{Create_atoms command before simulation box is defined} :dt
The create_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Create_atoms molecule has atom IDs, but system does not} :dt
The atom_modify id command can be used to force atom IDs to be stored. :dd
{Create_atoms molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Create_atoms molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Create_atoms region ID does not exist} :dt
A region ID used in the create_atoms command does not exist. :dd
{Create_bonds command before simulation box is defined} :dt
Self-explanatory. :dd
{Create_bonds command requires no kspace_style be defined} :dt
This is so that atom pairs that are already bonded do not appear
in the neighbor list. :dd
{Create_bonds command requires special_bonds 1-2 weights be 0.0} :dt
This is so that atom pairs that are already bonded do not appear in
the neighbor list. :dd
{Create_bonds max distance > neighbor cutoff} :dt
Can only create bonds for atom pairs that will be in neighbor list. :dd
{Create_bonds requires a pair style be defined} :dt
Self-explanatory. :dd
{Create_box region ID does not exist} :dt
Self-explanatory. :dd
{Create_box region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the create_box command. :dd
{Custom floating point vector for fix store/state does not exist} :dt
The command is accessing a vector added by the fix property/atom
command that does not exist. :dd
{Custom integer vector for fix store/state does not exist} :dt
The command is accessing a vector added by the fix property/atom
command that does not exist. :dd
{Custom per-atom property ID is not floating point} :dt
Self-explanatory. :dd
{Custom per-atom property ID is not integer} :dt
Self-explanatory. :dd
{Cut-offs missing in pair_style lj/long/dipole/long} :dt
Self-explanatory. :dd
{Cutoffs missing in pair_style buck/long/coul/long} :dt
Self-explanatory. :dd
{Cutoffs missing in pair_style lj/long/coul/long} :dt
Self-explanatory. :dd
{Cyclic loop in joint connections} :dt
Fix poems cannot (yet) work with coupled bodies whose joints connect
the bodies in a ring (or cycle). :dd
{Degenerate lattice primitive vectors} :dt
Invalid set of 3 lattice vectors for lattice command. :dd
{Delete region ID does not exist} :dt
Self-explanatory. :dd
{Delete_atoms command before simulation box is defined} :dt
The delete_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Delete_atoms cutoff > max neighbor cutoff} :dt
Can only delete atoms in atom pairs that will be in neighbor list. :dd
{Delete_atoms mol yes requires atom attribute molecule} :dt
Cannot use this option with a non-molecular system. :dd
{Delete_atoms requires a pair style be defined} :dt
This is because atom deletion within a cutoff uses a pairwise
neighbor list. :dd
{Delete_bonds command before simulation box is defined} :dt
The delete_bonds command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Delete_bonds command with no atoms existing} :dt
No atoms are yet defined so the delete_bonds command cannot be used. :dd
{Deposition region extends outside simulation box} :dt
Self-explanatory. :dd
{Did not assign all atoms correctly} :dt
Atoms read in from a data file were not assigned correctly to
processors. This is likely due to some atom coordinates being
outside a non-periodic simulation box. :dd
{Did not assign all restart atoms correctly} :dt
Atoms read in from the restart file were not assigned correctly to
processors. This is likely due to some atom coordinates being outside
a non-periodic simulation box. Normally this should not happen. You
may wish to use the "remap" option on the read_restart command to see
if this helps. :dd
{Did not find all elements in MEAM library file} :dt
The requested elements were not found in the MEAM file. :dd
{Did not find fix shake partner info} :dt
Could not find bond partners implied by fix shake command. This error
can be triggered if the delete_bonds command was used before fix
shake, and it removed bonds without resetting the 1-2, 1-3, 1-4
weighting list via the special keyword. :dd
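As noted above, adding the special keyword to delete_bonds recomputes
the 1-2, 1-3, 1-4 weighting list.  An illustrative sketch (the group
name and bond type are placeholders):

delete_bonds solvent bond 1 remove special :pre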
{Did not find keyword in table file} :dt
Keyword used in pair_coeff command was not found in table file. :dd
{Did not set pressure for fix rigid/nph} :dt
The press keyword must be specified. :dd
{Did not set temp for fix rigid/nvt/small} :dt
Self-explanatory. :dd
{Did not set temp or press for fix rigid/npt/small} :dt
Self-explanatory. :dd
{Did not set temperature for fix rigid/nvt} :dt
The temp keyword must be specified. :dd
{Did not set temperature or pressure for fix rigid/npt} :dt
The temp and press keywords must be specified. :dd
{Dihedral atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
dihedral on a particular processor. The pairwise cutoff is too short
or the atoms are too far apart to make a valid dihedral. :dd
{Dihedral atom missing in set command} :dt
The set command cannot find one or more atoms in a particular dihedral
on a particular processor. The pairwise cutoff is too short or the
atoms are too far apart to make a valid dihedral. :dd
{Dihedral atoms %d %d %d %d missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral atoms missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral charmm is incompatible with Pair style} :dt
Dihedral style charmm must be used with a pair style charmm
in order for the 1-4 epsilon/sigma parameters to be defined. :dd
{Dihedral coeff for hybrid has invalid style} :dt
Dihedral style hybrid uses another dihedral style as one of its
coefficients. The dihedral style used in the dihedral_coeff command
or read from a restart file is not recognized. :dd
{Dihedral coeffs are not set} :dt
No dihedral coefficients have been assigned in the data file or via
the dihedral_coeff command. :dd
{Dihedral style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Dihedral style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Dihedral style hybrid cannot use same dihedral style twice} :dt
Self-explanatory. :dd
{Dihedral/improper extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the dihedral atoms are so far apart that it is
ambiguous how the dihedral should be defined. :dd
{Dihedral_coeff command before dihedral_style is defined} :dt
Coefficients cannot be set in the data file or via the dihedral_coeff
command until a dihedral_style has been assigned. :dd
{Dihedral_coeff command before simulation box is defined} :dt
The dihedral_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Dihedral_coeff command when no dihedrals allowed} :dt
The chosen atom style does not allow for dihedrals to be defined. :dd
{Dihedral_style command when no dihedrals allowed} :dt
The chosen atom style does not allow for dihedrals to be defined. :dd
{Dihedrals assigned incorrectly} :dt
Dihedrals read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Dihedrals defined but no dihedral types} :dt
The data file header lists dihedrals but no dihedral types. :dd
{Dimension command after simulation box is defined} :dt
The dimension command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Dispersion PPPMDisp order has been reduced below minorder} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
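For reference, the minimum order is set with a command of this form
(the value 3 is only an illustrative choice):

kspace_modify minorder 3 :pre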
{Displace_atoms command before simulation box is defined} :dt
The displace_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Distance must be > 0 for compute event/displace} :dt
Self-explanatory. :dd
{Divide by 0 in influence function} :dt
This should not normally occur. It is likely a problem with your
model. :dd
{Divide by 0 in influence function of pair peri/lps} :dt
This should not normally occur. It is likely a problem with your
model. :dd
{Divide by 0 in variable formula} :dt
Self-explanatory. :dd
{Domain too large for neighbor bins} :dt
The domain has become extremely large so that neighbor bins cannot be
used. Most likely, one or more atoms have been blown out of the
simulation box to a great distance. :dd
{Double precision is not supported on this accelerator} :dt
Self-explanatory. :dd
{Dump atom/gz only writes compressed files} :dt
The dump atom/gz output file name must have a .gz suffix. :dd
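For example, a dump command of this form (the ID, group, interval,
and file name are placeholders) satisfies the suffix requirement:

dump 1 all atom/gz 1000 dump.atom.gz :pre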
{Dump cfg arguments can not mix xs|ys|zs with xsu|ysu|zsu} :dt
Self-explanatory. :dd
{Dump cfg arguments must start with 'mass type xs ys zs' or 'mass type xsu ysu zsu'} :dt
This is a requirement of the CFG output format. See the dump cfg doc
page for more details. :dd
{Dump cfg requires one snapshot per file} :dt
Use the wildcard "*" character in the filename. :dd
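A sketch of a conforming dump cfg command, with the wildcard producing
one file per snapshot (the ID, interval, and file name are
placeholders):

dump 1 all cfg 100 dump.config.*.cfg mass type xs ys zs :pre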
{Dump cfg/gz only writes compressed files} :dt
The dump cfg/gz output file name must have a .gz suffix. :dd
{Dump custom and fix not computed at compatible times} :dt
The fix must produce per-atom quantities on the timesteps when dump
custom needs them. :dd
{Dump custom compute does not calculate per-atom array} :dt
Self-explanatory. :dd
{Dump custom compute does not calculate per-atom vector} :dt
Self-explanatory. :dd
{Dump custom compute does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump custom compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump custom fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump custom variable is not atom-style variable} :dt
Only atom-style variables generate per-atom quantities, needed for
dump output. :dd
{Dump custom/gz only writes compressed files} :dt
The dump custom/gz output file name must have a .gz suffix. :dd
{Dump dcd of non-matching # of atoms} :dt
Every snapshot written by dump dcd must contain the same # of atoms. :dd
{Dump dcd requires sorting by atom ID} :dt
Use the dump_modify sort command to enable this. :dd
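For example (the dump ID and file name are placeholders):

dump 1 all dcd 100 traj.dcd
dump_modify 1 sort id :pre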
{Dump every variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Dump file MPI-IO output not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Dump file does not contain requested snapshot} :dt
Self-explanatory. :dd
{Dump file is incorrectly formatted} :dt
Self-explanatory. :dd
{Dump image body yes requires atom style body} :dt
Self-explanatory. :dd
{Dump image bond not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump image cannot perform sorting} :dt
Self-explanatory. :dd
{Dump image line requires atom style line} :dt
Self-explanatory. :dd
{Dump image persp option is not yet supported} :dt
Self-explanatory. :dd
{Dump image requires one snapshot per file} :dt
Use a "*" in the filename. :dd
{Dump image tri requires atom style tri} :dt
Self-explanatory. :dd
{Dump local and fix not computed at compatible times} :dt
The fix must produce per-atom quantities on the timesteps when dump
local needs them. :dd
{Dump local attributes contain no compute or fix} :dt
Self-explanatory. :dd
{Dump local cannot sort by atom ID} :dt
This is because dump local does not really dump per-atom info. :dd
{Dump local compute does not calculate local array} :dt
Self-explanatory. :dd
{Dump local compute does not calculate local vector} :dt
Self-explanatory. :dd
{Dump local compute does not compute local info} :dt
Self-explanatory. :dd
{Dump local compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump local count is not consistent across input fields} :dt
Every column of output must be the same length. :dd
{Dump local fix does not compute local array} :dt
Self-explanatory. :dd
{Dump local fix does not compute local info} :dt
Self-explanatory. :dd
{Dump local fix does not compute local vector} :dt
Self-explanatory. :dd
{Dump local fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump modify bcolor not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump modify bdiam not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump modify compute ID vector is not large enough} :dt
Self-explanatory. :dd
{Dump modify element names do not match atom types} :dt
Number of element names must equal number of atom types. :dd
{Dump modify fix ID does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump modify fix ID does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump modify fix ID does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump modify fix ID vector is not large enough} :dt
Self-explanatory. :dd
{Dump modify variable is not atom-style variable} :dt
Self-explanatory. :dd
{Dump sort column is invalid} :dt
Self-explanatory. :dd
{Dump xtc requires sorting by atom ID} :dt
Use the dump_modify sort command to enable this. :dd
{Dump xyz/gz only writes compressed files} :dt
The dump xyz/gz output file name must have a .gz suffix. :dd
{Dump_modify buffer yes not allowed for this style} :dt
Self-explanatory. :dd
{Dump_modify format string is too short} :dt
There are more fields to be dumped in a line of output than your
format string specifies. :dd
{Dump_modify region ID does not exist} :dt
Self-explanatory. :dd
{Dumping an atom property that isn't allocated} :dt
The chosen atom style does not define the per-atom quantity being
dumped. :dd
{Duplicate atom IDs exist} :dt
Self-explanatory. :dd
{Duplicate fields in read_dump command} :dt
Self-explanatory. :dd
{Duplicate particle in PeriDynamic bond - simulation box is too small} :dt
This is likely because your box length is shorter than 2 times
the bond length. :dd
{Electronic temperature dropped below zero} :dt
Something has gone wrong with the fix ttm electron temperature model. :dd
{Element not defined in potential file} :dt
The specified element is not in the potential file. :dd
{Empty brackets in variable} :dt
There is no variable syntax that uses empty brackets. Check
the variable doc page. :dd
{Energy was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied energy, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Epsilon or sigma reference not set by pair style in PPPMDisp} :dt
Self-explanatory. :dd
{Epsilon or sigma reference not set by pair style in ewald/n} :dt
The pair style is not providing the needed epsilon or sigma values. :dd
{Error in vdw spline: inner radius > outer radius} :dt
A pre-tabulated spline is invalid. Likely a problem with the
potential parameters. :dd
{Error writing averaged chunk data} :dt
Something in the output to the file triggered an error. :dd
{Error writing file header} :dt
Something in the output to the file triggered an error. :dd
{Error writing out correlation data} :dt
Something in the output to the file triggered an error. :dd
{Error writing out histogram data} :dt
Something in the output to the file triggered an error. :dd
{Error writing out time averaged data} :dt
Something in the output to the file triggered an error. :dd
{Failed to allocate %ld bytes for array %s} :dt
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors. :dd
{Failed to open FFmpeg pipeline to file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct and writable and that the FFmpeg executable can be found and run. :dd
{Failed to reallocate %ld bytes for array %s} :dt
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors. :dd
{Fewer SRD bins than processors in some dimension} :dt
This is not allowed. Make your SRD bin size smaller. :dd
{File variable could not read value} :dt
Check the file assigned to the variable. :dd
{Final box dimension due to fix deform is < 0.0} :dt
Self-explanatory. :dd
{Fix %s does not allow use of dynamic group} :dt
Dynamic groups have not yet been enabled for this fix. :dd
{Fix ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute erotate/rigid does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute ke/rigid does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute reduce does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute slice does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix store/state does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix vector does not exist} :dt
Self-explanatory. :dd
{Fix ID for read_data does not exist} :dt
Self-explanatory. :dd
{Fix ID for velocity does not exist} :dt
Self-explanatory. :dd
{Fix ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Fix SRD: bad bin assignment for SRD advection} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: bad search bin assignment} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: bad stencil bin for big particle} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: too many big particles in bin} :dt
Reset the ATOMPERBIN parameter at the top of fix_srd.cpp
to a larger value, and re-compile the code. :dd
{Fix SRD: too many walls in bin} :dt
This should not happen unless your system has been setup incorrectly. :dd
{Fix adapt interface to this pair style not supported} :dt
New coding for the pair style would need to be done. :dd
{Fix adapt kspace style does not exist} :dt
Self-explanatory. :dd
{Fix adapt pair style does not exist} :dt
Self-explanatory. :dd
{Fix adapt pair style param not supported} :dt
The pair style does not know about the parameter you specified. :dd
{Fix adapt requires atom attribute charge} :dt
The atom style being used does not specify an atom charge. :dd
{Fix adapt requires atom attribute diameter} :dt
The atom style being used does not specify an atom diameter. :dd
{Fix adapt type pair range is not valid for pair hybrid sub-style} :dt
Self-explanatory. :dd
{Fix append/atoms requires a lattice be defined} :dt
Use the lattice command for this purpose. :dd
{Fix ave/atom compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/atom compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/atom compute does not calculate a per-atom vector} :dt
A compute used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom compute does not calculate per-atom values} :dt
A compute used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/atom fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/atom fix does not calculate a per-atom vector} :dt
A fix used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom fix does not calculate per-atom values} :dt
A fix used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom variable is not atom-style variable} :dt
A variable used by fix ave/atom must generate per-atom values. :dd
{Fix ave/chunk compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/chunk compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/chunk compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/chunk compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/chunk does not use chunk/atom compute} :dt
The specified compute is not for a compute chunk/atom command. :dd
{Fix ave/chunk fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/chunk fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/chunk fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/chunk fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/chunk variable is not atom-style variable} :dt
Self-explanatory. :dd
{Fix ave/correlate compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/correlate compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/correlate compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/correlate fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/correlate fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/correlate fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/correlate variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix ave/histo cannot input local values in scalar mode} :dt
Self-explanatory. :dd
{Fix ave/histo cannot input per-atom values in scalar mode} :dt
Self-explanatory. :dd
{Fix ave/histo compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global scalar} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a local array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a local vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate local values} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/histo compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global scalar} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a local array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a local vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate local values} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/histo fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid compute} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid fix} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid variable} :dt
Self-explanatory. :dd
{Fix ave/histo inputs are not all global, peratom, or local} :dt
All inputs in a single fix ave/histo command must be of the
same style. :dd
{Fix ave/histo/weight value and weight vector lengths do not match} :dt
Self-explanatory. :dd
{Fix ave/spatial compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/spatial compute does not calculate a per-atom vector} :dt
A compute used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial compute does not calculate per-atom values} :dt
A compute used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/spatial fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/spatial fix does not calculate a per-atom vector} :dt
A fix used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial fix does not calculate per-atom values} :dt
A fix used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/spatial for triclinic boxes requires units reduced} :dt
Self-explanatory. :dd
{Fix ave/spatial settings invalid with changing box size} :dt
If the box size changes, only the units reduced option can be
used. :dd
{Fix ave/spatial variable is not atom-style variable} :dt
A variable used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/time cannot set output array intensive/extensive from these inputs} :dt
One or more of the vector inputs has individual elements which are
flagged as intensive or extensive. Such an input cannot be flagged as
all intensive/extensive when turned into an array by fix ave/time. :dd
{Fix ave/time cannot use variable with vector mode} :dt
Variables produce scalar values. :dd
{Fix ave/time columns are inconsistent lengths} :dt
Self-explanatory. :dd
{Fix ave/time compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Fix ave/time compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/time compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/time compute does not calculate an array} :dt
Self-explanatory. :dd
{Fix ave/time compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/time fix array cannot be variable length} :dt
Self-explanatory. :dd
{Fix ave/time fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Fix ave/time fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/time fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/time fix does not calculate an array} :dt
Self-explanatory. :dd
{Fix ave/time fix vector cannot be variable length} :dt
Self-explanatory. :dd
{Fix ave/time fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/time variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix balance rcb cannot be used with comm_style brick} :dt
Comm_style tiled must be used instead. :dd
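A minimal sketch of the required pairing (the fix ID, interval, and
threshold are illustrative only):

comm_style tiled
fix 2 all balance 1000 1.1 rcb :pre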
{Fix balance shift string is invalid} :dt
The string can only contain the characters "x", "y", or "z". :dd
{Fix bond/break needs ghost atoms from further away} :dt
This is because the fix needs to walk bonds to a certain distance to
acquire the needed info.  The comm_modify cutoff command can be used
to extend the communication range. :dd
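An illustrative way to extend the ghost-atom communication range (the
cutoff value is a placeholder and must be chosen large enough for your
bond topology):

comm_modify cutoff 12.0 :pre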
{Fix bond/create angle type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create cutoff is longer than pairwise cutoff} :dt
This is not allowed because bond creation is done using the
pairwise neighbor list. :dd
{Fix bond/create dihedral type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create improper type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create induced too many angles/dihedrals/impropers per atom} :dt
See the read_data command for info on setting the "extra angle per
atom", etc. header values to allow for additional angles, etc. to be
formed. :dd
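As a hedged sketch, the data file header can reserve extra topology
storage with lines such as these (the counts are arbitrary
placeholders):

6 extra angle per atom
10 extra dihedral per atom
4 extra improper per atom :pre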
{Fix bond/create needs ghost atoms from further away} :dt
This is because the fix needs to walk bonds to a certain distance to
acquire the needed info.  The comm_modify cutoff command can be used
to extend the communication range. :dd
{Fix bond/swap cannot use dihedral or improper styles} :dt
These styles cannot be defined when using this fix. :dd
{Fix bond/swap requires pair and bond styles} :dt
Self-explanatory. :dd
{Fix bond/swap requires special_bonds = 0,1,1} :dt
Self-explanatory. :dd
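One way to set these weights in the input script, as a sketch:

special_bonds lj/coul 0.0 1.0 1.0 :pre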
{Fix box/relax generated negative box length} :dt
The pressure being applied is likely too large. Try applying
it incrementally, to build to the high pressure. :dd
{Fix command before simulation box is defined} :dt
The fix command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Fix deform cannot use yz variable with xy} :dt
The yz setting cannot be a variable if xy deformation is also
specified. This is because LAMMPS cannot determine if the yz setting
will induce a box flip which would be invalid if xy is also changing. :dd
{Fix deform is changing yz too much with xy} :dt
When both yz and xy are changing, it induces changes in xz if the
box must flip from one tilt extreme to another. Thus it is not
allowed for yz to grow so much that a flip is induced. :dd
{Fix deform tilt factors require triclinic box} :dt
Cannot deform the tilt factors of a simulation box unless it
is a triclinic (non-orthogonal) box. :dd
{Fix deform volume setting is invalid} :dt
Cannot use volume style unless other dimensions are being controlled. :dd
{Fix deposit and fix rigid/small not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix deposit and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix deposit molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix deposit molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix deposit molecule template ID must be same as atom_style template ID} :dt
When using atom_style template, you cannot deposit molecules that are
not in that template. :dd
{Fix deposit region cannot be dynamic} :dt
Only static regions can be used with fix deposit. :dd
{Fix deposit region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix deposit command. :dd
{Fix deposit shake fix does not exist} :dt
Self-explanatory. :dd
{Fix efield requires atom attribute q or mu} :dt
The atom style defined does not have this attribute. :dd
{Fix efield with dipoles cannot use atom-style variables} :dt
This option is not supported. :dd
{Fix evaporate molecule requires atom attribute molecule} :dt
The atom style being used does not define a molecule ID. :dd
{Fix external callback function not set} :dt
This must be done by an external program in order to use this fix. :dd
{Fix for fix ave/atom not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/atom is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/chunk not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/chunk is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/correlate not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/correlate
is requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/histo not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/histo is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/spatial not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/spatial is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/time not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/time
is requesting a value on a non-allowed timestep. :dd
{Fix for fix store/state not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix store/state
is requesting a value on a non-allowed timestep. :dd
{Fix for fix vector not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix vector is
requesting a value on a non-allowed timestep. :dd
{Fix freeze requires atom attribute torque} :dt
The atom style defined does not have this attribute. :dd
{Fix gcmc and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix gcmc atom has charge, but atom style does not} :dt
Self-explanatory. :dd
{Fix gcmc cannot exchange individual atoms belonging to a molecule} :dt
This is an error since you should not delete only one atom of a
molecule. The user has specified atomic (non-molecular) gas
exchanges, but an atom belonging to a molecule could be deleted. :dd
{Fix gcmc does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{Fix gcmc molecule command requires that atoms have molecule attributes} :dt
You should not choose the gcmc molecule feature if no molecules are
being simulated.  The general molecule flag is off, but gcmc's
molecule flag is on. :dd
{Fix gcmc molecule has charges, but atom style does not} :dt
Self-explanatory. :dd
{Fix gcmc molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix gcmc molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix gcmc molecule template ID must be same as atom_style template ID} :dt
When using atom_style template, you cannot insert molecules that are
not in that template. :dd
{Fix gcmc put atom outside box} :dt
This should not normally happen. Contact the developers. :dd
{Fix gcmc ran out of available atom IDs} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Fix gcmc ran out of available molecule IDs} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Fix gcmc region cannot be dynamic} :dt
Only static regions can be used with fix gcmc. :dd
{Fix gcmc region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix gcmc command. :dd
{Fix gcmc region extends outside simulation box} :dt
Self-explanatory. :dd
{Fix gcmc shake fix does not exist} :dt
Self-explanatory. :dd
{Fix gld c coefficients must be >= 0} :dt
Self-explanatory. :dd
{Fix gld needs more prony series coefficients} :dt
Self-explanatory. :dd
{Fix gld prony terms must be > 0} :dt
Self-explanatory. :dd
{Fix gld series type must be pprony for now} :dt
Self-explanatory. :dd
{Fix gld start temperature must be >= 0} :dt
Self-explanatory. :dd
{Fix gld stop temperature must be >= 0} :dt
Self-explanatory. :dd
{Fix gld tau coefficients must be > 0} :dt
Self-explanatory. :dd
{Fix heat group has no atoms} :dt
Self-explanatory. :dd
{Fix heat kinetic energy of an atom went negative} :dt
This will cause the velocity rescaling about to be performed by fix
heat to be invalid. :dd
{Fix heat kinetic energy went negative} :dt
This will cause the velocity rescaling about to be performed by fix
heat to be invalid. :dd
{Fix in variable not computed at compatible time} :dt
Fixes generate their values on specific timesteps. The variable is
requesting the values on a non-allowed timestep. :dd
{Fix langevin angmom is not yet implemented with kokkos} :dt
This option is not yet available. :dd
{Fix langevin angmom requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Fix langevin angmom requires extended particles} :dt
This fix option cannot be used with point particles. :dd
{Fix langevin omega is not yet implemented with kokkos} :dt
This option is not yet available. :dd
{Fix langevin omega requires atom style sphere} :dt
Self-explanatory. :dd
{Fix langevin omega requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix langevin period must be > 0.0} :dt
The time window for temperature relaxation must be > 0. :dd
{Fix langevin variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix momentum group has no atoms} :dt
Self-explanatory. :dd
{Fix move cannot define z or vz variable for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot rotate aroung non z-axis for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot set linear z motion for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot set wiggle z motion for 2d problem} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute potential energy} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute pressure} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute temperature} :dt
Self-explanatory. :dd
{Fix msst requires a periodic box} :dt
Self-explanatory. :dd
{Fix msst tscale must satisfy 0 <= tscale < 1} :dt
Self-explanatory. :dd
{Fix npt/nph has tilted box too far in one step - periodic cell is too far from equilibrium state} :dt
Self-explanatory. The change in the box tilt is too extreme
on a short timescale. :dd
{Fix nve/asphere requires extended particles} :dt
This fix can only be used for particles with a shape setting. :dd
{Fix nve/asphere/noforce requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Fix nve/asphere/noforce requires extended particles} :dt
One of the particles is not an ellipsoid. :dd
{Fix nve/body requires atom style body} :dt
Self-explanatory. :dd
{Fix nve/body requires bodies} :dt
This fix can only be used for particles that are bodies. :dd
{Fix nve/line can only be used for 2d simulations} :dt
Self-explanatory. :dd
{Fix nve/line requires atom style line} :dt
Self-explanatory. :dd
{Fix nve/line requires line particles} :dt
Self-explanatory. :dd
{Fix nve/sphere dipole requires atom attribute mu} :dt
An atom style with this attribute is needed. :dd
{Fix nve/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Fix nve/sphere requires extended particles} :dt
This fix can only be used for particles of a finite size. :dd
{Fix nve/tri can only be used for 3d simulations} :dt
Self-explanatory. :dd
{Fix nve/tri requires atom style tri} :dt
Self-explanatory. :dd
{Fix nve/tri requires tri particles} :dt
Self-explanatory. :dd
{Fix nvt/nph/npt asphere requires extended particles} :dt
The shape setting for a particle in the fix group has shape = 0.0,
which means it is a point particle. :dd
{Fix nvt/nph/npt body requires bodies} :dt
Self-explanatory. :dd
{Fix nvt/nph/npt sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Fix nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix nvt/npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix nvt/sphere requires extended particles} :dt
This fix can only be used for particles of a finite size. :dd
{Fix orient/fcc file open failed} :dt
The fix orient/fcc command could not open a specified file. :dd
{Fix orient/fcc file read failed} :dt
The fix orient/fcc command could not read the needed parameters from a
specified file. :dd
{Fix orient/fcc found self twice} :dt
The neighbor lists used by fix orient/fcc are messed up. If this
error occurs, it is likely a bug, so send an email to the
"developers"_http://lammps.sandia.gov/authors.html. :dd
{Fix peri neigh does not exist} :dt
Somehow a fix that the pair style defines has been deleted. :dd
{Fix pour and fix rigid/small not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix pour and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix pour insertion count per timestep is 0} :dt
Self-explanatory. :dd
{Fix pour molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix pour molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix pour molecule template ID must be same as atom style template ID} :dt
When using atom_style template, you cannot pour molecules that are
not in that template. :dd
{Fix pour polydisperse fractions do not sum to 1.0} :dt
Self-explanatory. :dd
{Fix pour region ID does not exist} :dt
Self-explanatory. :dd
{Fix pour region cannot be dynamic} :dt
Only static regions can be used with fix pour. :dd
{Fix pour region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix pour command. :dd
{Fix pour requires atom attributes radius, rmass} :dt
The atom style defined does not have these attributes. :dd
{Fix pour rigid fix does not exist} :dt
Self-explanatory. :dd
{Fix pour shake fix does not exist} :dt
Self-explanatory. :dd
{Fix press/berendsen damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix property/atom cannot specify mol twice} :dt
Self-explanatory. :dd
{Fix property/atom cannot specify q twice} :dt
Self-explanatory. :dd
{Fix property/atom mol when atom_style already has molecule attribute} :dt
Self-explanatory. :dd
{Fix property/atom q when atom_style already has charge attribute} :dt
Self-explanatory. :dd
{Fix property/atom vector name already exists} :dt
The name for an integer or floating-point vector must be unique. :dd
{Fix qeq has negative upper Taper radius cutoff} :dt
Self-explanatory. :dd
{Fix qeq/comb group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/comb requires atom attribute q} :dt
An atom style with charge must be used to perform charge equilibration. :dd
{Fix qeq/dynamic group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/dynamic requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/fire group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/fire requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/point group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/point has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increased too
much during a run.  Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/point requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/shielded group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/shielded has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increased too
much during a run.  Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/shielded requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/slater could not extract params from pair coul/streitz} :dt
This should not happen unless pair coul/streitz has been altered. :dd
{Fix qeq/slater group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/slater has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increased too
much during a run.  Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/slater requires atom attribute q} :dt
Self-explanatory. :dd
{Fix reax/bonds numbonds > nsbmax_most} :dt
The limit of the number of bonds expected by the ReaxFF force field
was exceeded. :dd
{Fix recenter group has no atoms} :dt
Self-explanatory. :dd
{Fix restrain requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Fix rigid atom has non-zero image flag in a non-periodic dimension} :dt
Image flags for non-periodic dimensions should not be set. :dd
{Fix rigid file has no lines} :dt
Self-explanatory. :dd
{Fix rigid langevin period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid molecule requires atom attribute molecule} :dt
Self-explanatory. :dd
{Fix rigid npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix rigid npt/nph does not yet allow triclinic box} :dt
This is a current restriction in LAMMPS. :dd
{Fix rigid npt/nph period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid npt/small t_chain should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid npt/small t_order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_chain should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_iter should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid xy torque cannot be on for 2d simulation} :dt
Self-explanatory. :dd
{Fix rigid z force cannot be on for 2d simulation} :dt
Self-explanatory. :dd
{Fix rigid/npt period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/npt temperature order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid/npt/small period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/nvt period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/nvt temperature order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid/nvt/small period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small atom has non-zero image flag in a non-periodic dimension} :dt
Image flags for non-periodic dimensions should not be set. :dd
{Fix rigid/small langevin period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix rigid/small molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix rigid/small npt/nph period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small nvt/npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix rigid/small requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Fix rigid/small requires atom attribute molecule} :dt
Self-explanatory. :dd
{Fix rigid: Bad principal moments} :dt
The principal moments of inertia computed for a rigid body
are not within the required tolerances. :dd
{Fix shake cannot be used with minimization} :dt
Cannot use fix shake while doing an energy minimization since
it turns off bonds that should contribute to the energy. :dd
{Fix shake molecule template must have shake info} :dt
The defined molecule does not specify SHAKE information. :dd
{Fix spring couple group ID does not exist} :dt
Self-explanatory. :dd
{Fix srd can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{Fix srd lamda must be >= 0.6 of SRD grid size} :dt
This is a requirement for accuracy reasons. :dd
{Fix srd no-slip requires atom attribute torque} :dt
This is because the SRD collisions will impart torque to the solute
particles. :dd
{Fix srd requires SRD particles all have same mass} :dt
Self-explanatory. :dd
{Fix srd requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
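In input-script form:

comm_modify vel yes :pre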
{Fix srd requires newton pair on} :dt
Self-explanatory. :dd
{Fix store/state compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix store/state compute does not calculate a per-atom array} :dt
The compute calculates a per-atom vector. :dd
{Fix store/state compute does not calculate a per-atom vector} :dt
The compute calculates a per-atom array. :dd
{Fix store/state compute does not calculate per-atom values} :dt
Computes that calculate global or local quantities cannot be used
with fix store/state. :dd
{Fix store/state fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix store/state fix does not calculate a per-atom array} :dt
The fix calculates a per-atom vector. :dd
{Fix store/state fix does not calculate a per-atom vector} :dt
The fix calculates a per-atom array. :dd
{Fix store/state fix does not calculate per-atom values} :dt
Fixes that calculate global or local quantities cannot be used with
fix store/state. :dd
{Fix store/state for atom property that isn't allocated} :dt
Self-explanatory. :dd
{Fix store/state variable is not atom-style variable} :dt
Only atom-style variables calculate per-atom quantities. :dd
{Fix temp/berendsen period must be > 0.0} :dt
Self-explanatory. :dd
{Fix temp/berendsen variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/csld is not compatible with fix rattle or fix shake} :dt
These two commands cannot currently be used together with fix temp/csld. :dd
{Fix temp/csld variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/csvr variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/rescale variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix tfmc displacement length must be > 0} :dt
Self-explanatory. :dd
{Fix tfmc is not compatible with fix shake} :dt
These two commands cannot currently be used together. :dd
{Fix tfmc temperature must be > 0} :dt
Self-explanatory. :dd
{Fix thermal/conductivity swap value must be positive} :dt
Self-explanatory. :dd
{Fix tmd must come after integration fixes} :dt
Any fix tmd command must appear in the input script after all time
integration fixes (nve, nvt, npt). See the fix tmd documentation for
details. :dd
{Fix ttm electron temperatures must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_density must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_specific_heat must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_thermal_conductivity must be >= 0.0} :dt
Self-explanatory. :dd
{Fix ttm gamma_p must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm gamma_s must be >= 0.0} :dt
Self-explanatory. :dd
{Fix ttm number of nodes must be > 0} :dt
Self-explanatory. :dd
{Fix ttm v_0 must be >= 0.0} :dt
Self-explanatory. :dd
{Fix used in compute chunk/atom not computed at compatible time} :dt
The chunk/atom compute cannot query the output of the fix on a
timestep when it is needed. :dd
{Fix used in compute reduce not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Compute reduce is
requesting a value on a non-allowed timestep. :dd
{Fix used in compute slice not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Compute slice is
requesting a value on a non-allowed timestep. :dd
{Fix vector cannot set output array intensive/extensive from these inputs} :dt
The inputs to the command have conflicting intensive/extensive attributes.
You need to use more than one fix vector command. :dd
{Fix vector compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix vector compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix vector compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix vector fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix vector fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix vector fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix vector variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix viscosity swap value must be positive} :dt
Self-explanatory. :dd
{Fix viscosity vtarget value must be positive} :dt
Self-explanatory. :dd
{Fix wall cutoff <= 0.0} :dt
Self-explanatory. :dd
{Fix wall/colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/colloid requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix wall/gran is incompatible with Pair style} :dt
Must use a granular pair style to define the parameters needed for
this fix. :dd
{Fix wall/gran requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/piston command only available at zlo} :dt
The face keyword must be zlo. :dd
{Fix wall/region colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/region colloid requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix wall/region cutoff <= 0.0} :dt
Self-explanatory. :dd
{Fix_modify pressure ID does not compute pressure} :dt
The compute ID assigned to the fix must compute pressure. :dd
{Fix_modify temperature ID does not compute temperature} :dt
The compute ID assigned to the fix must compute temperature. :dd
{For triclinic deformation, specified target stress must be hydrostatic} :dt
Triclinic pressure control is allowed using the tri keyword, but
non-hydrostatic pressure control can not be used in this case. :dd
{Found no restart file matching pattern} :dt
When using a "*" in the restart file name, no matching file was found. :dd
{GPU library not compiled for this accelerator} :dt
Self-explanatory. :dd
{GPU package does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{GPU particle split must be set to 1 for this pair style.} :dt
For this pair style, you cannot run part of the force calculation on
the host. See the package command. :dd
{GPU split param must be positive for hybrid pair styles} :dt
See the package gpu command. :dd
{GPUs are requested but Kokkos has not been compiled for CUDA} :dt
Recompile Kokkos with CUDA support to use GPUs. :dd
{Ghost velocity forward comm not yet implemented with Kokkos} :dt
This is a current restriction. :dd
{Gmask function in equal-style variable formula} :dt
Gmask is a per-atom operation. :dd
{Gravity changed since fix pour was created} :dt
The gravity vector defined by fix gravity must be static. :dd
{Gravity must point in -y to use with fix pour in 2d} :dt
Self-explanatory. :dd
{Gravity must point in -z to use with fix pour in 3d} :dt
Self-explanatory. :dd
{Grmask function in equal-style variable formula} :dt
Grmask is a per-atom operation. :dd
{Group ID does not exist} :dt
A group ID used in the group command does not exist. :dd
{Group ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Group all cannot be made dynamic} :dt
This operation is not allowed. :dd
{Group command before simulation box is defined} :dt
The group command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Group dynamic cannot reference itself} :dt
Self-explanatory. :dd
{Group dynamic parent group cannot be dynamic} :dt
Self-explanatory. :dd
{Group dynamic parent group does not exist} :dt
Self-explanatory. :dd
{Group region ID does not exist} :dt
A region ID used in the group command does not exist. :dd
{If read_dump purges it cannot replace or trim} :dt
These operations are not compatible. See the read_dump doc
page for details. :dd
{Illegal ... command} :dt
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line. :dd
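For example, assuming a hypothetical executable name and input script,
the offending line can be echoed with:

lmp_serial -echo screen -in in.problem :pre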
{Illegal COMB parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal COMB3 parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Stillinger-Weber parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Tersoff parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Vashishta parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal compute voronoi/atom command (occupation and (surface or edges))} :dt
Self-explanatory. :dd
{Illegal coul/streitz parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal dump_modify sfactor value (must be > 0.0)} :dt
Self-explanatory. :dd
{Illegal dump_modify tfactor value (must be > 0.0)} :dt
Self-explanatory. :dd
{Illegal fix gcmc gas mass <= 0} :dt
The computed mass of the designated gas molecule or atom type was less
than or equal to zero. :dd
{Illegal fix tfmc random seed} :dt
Seeds can only be nonzero positive integers. :dd
{Illegal fix wall/piston velocity} :dt
The piston velocity must be positive. :dd
{Illegal integrate style} :dt
Self-explanatory. :dd
{Illegal nb3b/harmonic parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal number of angle table entries} :dt
There must be at least 2 table entries. :dd
{Illegal number of bond table entries} :dt
There must be at least 2 table entries. :dd
{Illegal number of pair table entries} :dt
There must be at least 2 table entries. :dd
{Illegal or unset periodicity in restart} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal range increment value} :dt
The increment must be >= 1. :dd
{Illegal simulation box} :dt
The lower bound of the simulation box is greater than the upper bound. :dd
{Illegal size double vector read requested} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal size integer vector read requested} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal size string or corrupt restart} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Imageint setting in lmptype.h is invalid} :dt
Imageint must be at least as large as smallint. :dd
{Imageint setting in lmptype.h is not compatible} :dt
Format of imageint stored in the restart file is not consistent with
the LAMMPS version you are running.  See the settings in src/lmptype.h. :dd
{Improper atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
improper on a particular processor. The pairwise cutoff is too short
or the atoms are too far apart to make a valid improper. :dd
{Improper atom missing in set command} :dt
The set command cannot find one or more atoms in a particular improper
on a particular processor. The pairwise cutoff is too short or the
atoms are too far apart to make a valid improper. :dd
{Improper atoms %d %d %d %d missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper atoms missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper coeff for hybrid has invalid style} :dt
Improper style hybrid uses another improper style as one of its
coefficients. The improper style used in the improper_coeff command
or read from a restart file is not recognized. :dd
{Improper coeffs are not set} :dt
No improper coefficients have been assigned in the data file or via
the improper_coeff command. :dd
{Improper style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Improper style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Improper style hybrid cannot use same improper style twice} :dt
Self-explanatory. :dd
{Improper_coeff command before improper_style is defined} :dt
Coefficients cannot be set in the data file or via the improper_coeff
command until an improper_style has been assigned. :dd
{Improper_coeff command before simulation box is defined} :dt
The improper_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Improper_coeff command when no impropers allowed} :dt
The chosen atom style does not allow for impropers to be defined. :dd
{Improper_style command when no impropers allowed} :dt
The chosen atom style does not allow for impropers to be defined. :dd
{Impropers assigned incorrectly} :dt
Impropers read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Impropers defined but no improper types} :dt
The data file header lists impropers but no improper types. :dd
{Incomplete use of variables in create_atoms command} :dt
The var and set options must be used together. :dd
{Inconsistent iparam/jparam values in fix bond/create command} :dt
If itype and jtype are the same, then their maxbond and newtype
settings must also be the same. :dd
{Inconsistent line segment in data file} :dt
The end points of the line segment are not equidistant from the
center point, which is the atom coordinate. :dd
{Inconsistent triangle in data file} :dt
The centroid of the triangle as defined by the corner points is not
the atom coordinate. :dd
{Inconsistent use of finite-size particles by molecule template molecules} :dt
Not all of the molecules define a radius for their constituent
particles. :dd
{Incorrect # of floating-point values in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect # of integer values in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect %s format in data file} :dt
A section of the data file being read by fix property/atom does
not have the correct number of values per line. :dd
{Incorrect SNAP parameter file} :dt
The file cannot be parsed correctly, check its internal syntax. :dd
{Incorrect args for angle coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for bond coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for improper coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for pair coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args in pair_style command} :dt
Self-explanatory. :dd
{Incorrect atom format in data file} :dt
Number of values per atom line in the data file is not consistent with
the atom style. :dd
{Incorrect atom format in neb file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect bonus data format in data file} :dt
See the read_data doc page for a description of how various kinds of
bonus data must be formatted for certain atom styles. :dd
{Incorrect boundaries with slab Ewald} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with Ewald. :dd
{Incorrect boundaries with slab EwaldDisp} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with ewald/disp. :dd
{Incorrect boundaries with slab PPPM} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with PPPM. :dd
{Incorrect boundaries with slab PPPMDisp} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with pppm/disp. :dd
{Incorrect element names in ADP potential file} :dt
The element names in the ADP file do not match those requested. :dd
{Incorrect element names in EAM potential file} :dt
The element names in the EAM file do not match those requested. :dd
{Incorrect format in COMB potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in COMB3 potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in MEAM potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in SNAP coefficient file} :dt
Incorrect number of words per line in the coefficient file. :dd
{Incorrect format in SNAP parameter file} :dt
Incorrect number of words per line in the parameter file. :dd
{Incorrect format in Stillinger-Weber potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in TMD target file} :dt
Format of file read by fix tmd command is incorrect. :dd
{Incorrect format in Tersoff potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in Vashishta potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in coul/streitz potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in nb3b/harmonic potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect integer value in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect multiplicity arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect number of elements in potential file} :dt
Self-explanatory. :dd
{Incorrect rigid body format in fix rigid file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect rigid body format in fix rigid/small file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect sign arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect table format check for element types} :dt
Self-explanatory. :dd
{Incorrect velocity format in data file} :dt
Each atom style defines a format for the Velocity section
of the data file. The read-in lines do not match. :dd
{Incorrect weight arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Index between variable brackets must be positive} :dt
Self-explanatory. :dd
{Indexed per-atom vector in variable formula without atom map} :dt
Accessing a value from an atom vector requires the ability to look up
an atom index, which is provided by an atom map. An atom map does not
exist (by default) for non-molecular problems. Using the atom_modify
map command will force an atom map to be created. :dd
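For example, a minimal illustrative line near the top of an input script (the {array} map style is just one valid choice; see the atom_modify doc page): :dd
# illustrative: request an array-style atom map before the system is defined
atom_modify map array :pre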
{Initial temperatures not all set in fix ttm} :dt
Self-explanatory. :dd
{Input line quote not followed by whitespace} :dt
An end quote must be followed by whitespace. :dd
{Insertion region extends outside simulation box} :dt
Self-explanatory. :dd
{Insufficient Jacobi rotations for POEMS body} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for body nparticle} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for rigid body} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for rigid molecule} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for triangle} :dt
The calculation of the inertia tensor of the triangle failed. This
should not happen if it is a reasonably shaped triangle. :dd
{Insufficient memory on accelerator} :dt
There is insufficient memory on one of the devices specified for the gpu
package. :dd
{Internal error in atom_style body} :dt
This error should not occur. Contact the developers. :dd
{Invalid -reorder N value} :dt
Self-explanatory. :dd
{Invalid Angles section in molecule file} :dt
Self-explanatory. :dd
{Invalid Bonds section in molecule file} :dt
Self-explanatory. :dd
{Invalid Boolean syntax in if command} :dt
Self-explanatory. :dd
{Invalid Charges section in molecule file} :dt
Self-explanatory. :dd
{Invalid Coords section in molecule file} :dt
Self-explanatory. :dd
{Invalid Diameters section in molecule file} :dt
Self-explanatory. :dd
{Invalid Dihedrals section in molecule file} :dt
Self-explanatory. :dd
{Invalid Impropers section in molecule file} :dt
Self-explanatory. :dd
{Invalid Kokkos command-line args} :dt
Self-explanatory. See Section 2.7 of the manual for details. :dd
{Invalid LAMMPS restart file} :dt
The file does not appear to be a LAMMPS restart file since
it doesn't contain the correct magic string at the beginning. :dd
{Invalid Masses section in molecule file} :dt
Self-explanatory. :dd
{Invalid REAX atom type} :dt
There is a mismatch between LAMMPS atom types and the elements
listed in the ReaxFF force field file. :dd
{Invalid Special Bond Counts section in molecule file} :dt
Self-explanatory. :dd
{Invalid Types section in molecule file} :dt
Self-explanatory. :dd
{Invalid angle count in molecule file} :dt
Self-explanatory. :dd
{Invalid angle table length} :dt
Length must be 2 or greater. :dd
{Invalid angle type in Angles section of data file} :dt
Angle type must be positive integer and within range of specified angle
types. :dd
{Invalid angle type in Angles section of molecule file} :dt
Self-explanatory. :dd
{Invalid angle type index for fix shake} :dt
Self-explanatory. :dd
{Invalid args for non-hybrid pair coefficients} :dt
"NULL" is only supported in pair_coeff calls when using pair hybrid :dd
{Invalid argument to factorial %d} :dt
N must be >= 0 and <= 167, otherwise the factorial result is too
large. :dd
{Invalid atom ID in %s section of data file} :dt
An atom in a section of the data file being read by fix property/atom
has an invalid atom ID that is <= 0 or > the maximum existing atom ID. :dd
{Invalid atom ID in Angles section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Angles section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in Atoms section of data file} :dt
Atom IDs must be positive integers. :dd
{Invalid atom ID in Bodies section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Bonds section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Bonds section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in Bonus section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Dihedrals section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Impropers section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Velocities section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in dihedrals section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in impropers section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in variable file} :dt
Self-explanatory. :dd
{Invalid atom IDs in neb file} :dt
An ID in the file was not found in the system. :dd
{Invalid atom diameter in molecule file} :dt
Diameters must be >= 0.0. :dd
{Invalid atom mass for fix shake} :dt
Mass specified in fix shake command must be > 0.0. :dd
{Invalid atom mass in molecule file} :dt
Masses must be > 0.0. :dd
{Invalid atom type in Atoms section of data file} :dt
Atom types must range from 1 to the specified # of types. :dd
{Invalid atom type in create_atoms command} :dt
The create_box command specified the range of valid atom types.
An invalid type is being requested. :dd
{Invalid atom type in create_atoms mol command} :dt
The atom types in the defined molecule are added to the value
specified in the create_atoms command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in fix atom/swap command} :dt
The atom type specified in the atom/swap command does not exist. :dd
{Invalid atom type in fix bond/create command} :dt
Self-explanatory. :dd
{Invalid atom type in fix deposit command} :dt
Self-explanatory. :dd
{Invalid atom type in fix deposit mol command} :dt
The atom types in the defined molecule are added to the value
specified in the fix deposit command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in fix gcmc command} :dt
The atom type specified in the gcmc command does not exist. :dd
{Invalid atom type in fix pour command} :dt
Self-explanatory. :dd
{Invalid atom type in fix pour mol command} :dt
The atom types in the defined molecule are added to the value
specified in the fix pour command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in molecule file} :dt
Atom types must range from 1 to the specified # of types. :dd
{Invalid atom type in neighbor exclusion list} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom type index for fix shake} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom types in pair_write command} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom vector in variable formula} :dt
The atom vector is not recognized. :dd
{Invalid atom_style body command} :dt
No body style argument was provided. :dd
{Invalid atom_style command} :dt
Self-explanatory. :dd
{Invalid attribute in dump custom command} :dt
Self-explanatory. :dd
{Invalid attribute in dump local command} :dt
Self-explanatory. :dd
{Invalid attribute in dump modify command} :dt
Self-explanatory. :dd
{Invalid basis setting in create_atoms command} :dt
The basis index must be between 1 and N, where N is the number of basis
atoms in the lattice. The type index must be between 1 and N, where N
is the number of atom types. :dd
{Invalid basis setting in fix append/atoms command} :dt
The basis index must be between 1 and N, where N is the number of basis
atoms in the lattice. The type index must be between 1 and N, where N
is the number of atom types. :dd
{Invalid bin bounds in compute chunk/atom} :dt
The lo/hi values are inconsistent. :dd
{Invalid bin bounds in fix ave/spatial} :dt
The lo/hi values are inconsistent. :dd
{Invalid body nparticle command} :dt
Arguments in atom-style command are not correct. :dd
{Invalid bond count in molecule file} :dt
Self-explanatory. :dd
{Invalid bond table length} :dt
Length must be 2 or greater. :dd
{Invalid bond type in Bonds section of data file} :dt
Bond type must be positive integer and within range of specified bond
types. :dd
{Invalid bond type in Bonds section of molecule file} :dt
Self-explanatory. :dd
{Invalid bond type in create_bonds command} :dt
Self-explanatory. :dd
{Invalid bond type in fix bond/break command} :dt
Self-explanatory. :dd
{Invalid bond type in fix bond/create command} :dt
Self-explanatory. :dd
{Invalid bond type index for fix shake} :dt
Self-explanatory. Check the fix shake command in the input script. :dd
{Invalid coeffs for this dihedral style} :dt
Cannot set class 2 coeffs in data file for this dihedral style. :dd
{Invalid color in dump_modify command} :dt
The specified color name was not in the list of recognized colors.
See the dump_modify doc page. :dd
{Invalid color map min/max values} :dt
The min/max values are not consistent with either each other or
with values in the color map. :dd
{Invalid command-line argument} :dt
One or more command-line arguments is invalid. Check the syntax of
the command you are using to launch LAMMPS. :dd
{Invalid compute ID in variable formula} :dt
The compute is not recognized. :dd
{Invalid create_atoms rotation vector for 2d model} :dt
The rotation vector can only have a z component. :dd
{Invalid custom OpenCL parameter string.} :dt
There are either too few or too many parameters in the custom string
for the GPU package. :dd
{Invalid cutoff in comm_modify command} :dt
Specified cutoff must be >= 0.0. :dd
{Invalid cutoffs in pair_write command} :dt
Inner cutoff must be larger than 0.0 and less than outer cutoff. :dd
{Invalid d1 or d2 value for pair colloid coeff} :dt
Neither d1 nor d2 can be < 0. :dd
{Invalid data file section: Angle Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: AngleAngle Coeffs} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: AngleAngleTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: AngleTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Angles} :dt
Atom style does not allow angles. :dd
{Invalid data file section: Bodies} :dt
Atom style does not allow bodies. :dd
{Invalid data file section: Bond Coeffs} :dt
Atom style does not allow bonds. :dd
{Invalid data file section: BondAngle Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: BondBond Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: BondBond13 Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Bonds} :dt
Atom style does not allow bonds. :dd
{Invalid data file section: Dihedral Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Dihedrals} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Ellipsoids} :dt
Atom style does not allow ellipsoids. :dd
{Invalid data file section: EndBondTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Improper Coeffs} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: Impropers} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: Lines} :dt
Atom style does not allow lines. :dd
{Invalid data file section: MiddleBondTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Triangles} :dt
Atom style does not allow triangles. :dd
{Invalid delta_conf in tad command} :dt
The value must be between 0 and 1 inclusive. :dd
{Invalid density in Atoms section of data file} :dt
Density value cannot be <= 0.0. :dd
{Invalid density in set command} :dt
Density must be > 0.0. :dd
{Invalid diameter in set command} :dt
Self-explanatory. :dd
{Invalid dihedral count in molecule file} :dt
Self-explanatory. :dd
{Invalid dihedral type in Dihedrals section of data file} :dt
Dihedral type must be positive integer and within range of specified
dihedral types. :dd
{Invalid dihedral type in dihedrals section of molecule file} :dt
Self-explanatory. :dd
{Invalid dipole length in set command} :dt
Self-explanatory. :dd
{Invalid displace_atoms rotate axis for 2d} :dt
Axis must be in z direction. :dd
{Invalid dump dcd filename} :dt
Filenames used with the dump dcd style cannot be binary or compressed
or cause multiple files to be written. :dd
{Invalid dump frequency} :dt
Dump frequency must be 1 or greater. :dd
{Invalid dump image element name} :dt
The specified element name was not in the standard list of elements.
See the dump_modify doc page. :dd
{Invalid dump image filename} :dt
The file produced by dump image cannot be binary and must
be for a single processor. :dd
{Invalid dump image persp value} :dt
Persp value must be >= 0.0. :dd
{Invalid dump image theta value} :dt
Theta must be between 0.0 and 180.0 inclusive. :dd
{Invalid dump image zoom value} :dt
Zoom value must be > 0.0. :dd
{Invalid dump movie filename} :dt
The file produced by dump movie cannot be binary or compressed
and must be a single file for a single processor. :dd
{Invalid dump xtc filename} :dt
Filenames used with the dump xtc style cannot be binary or compressed
or cause multiple files to be written. :dd
{Invalid dump xyz filename} :dt
Filenames used with the dump xyz style cannot be binary or cause files
to be written by each processor. :dd
{Invalid dump_modify threshhold operator} :dt
Operator keyword used for threshold specification is not recognized. :dd
{Invalid entry in -reorder file} :dt
Self-explanatory. :dd
{Invalid fix ID in variable formula} :dt
The fix is not recognized. :dd
{Invalid fix ave/time off column} :dt
Self-explanatory. :dd
{Invalid fix box/relax command for a 2d simulation} :dt
Fix box/relax styles involving the z dimension cannot be used in
a 2d simulation. :dd
{Invalid fix box/relax command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be specified. :dd
{Invalid fix box/relax pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix nvt/npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix nvt/npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid fix nvt/npt/nph pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix press/berendsen for a 2d simulation} :dt
The z component of pressure cannot be controlled for a 2d model. :dd
{Invalid fix press/berendsen pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix qeq parameter file} :dt
Element index > number of atom types. :dd
{Invalid fix rigid npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix rigid npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid fix rigid/small npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix rigid/small npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid flag in force field section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid flag in header section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid flag in peratom section of restart file} :dt
The format of this section of the file is not correct. :dd
{Invalid flag in type arrays section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid frequency in temper command} :dt
Nevery must be > 0. :dd
{Invalid group ID in neigh_modify command} :dt
A group ID used in the neigh_modify command does not exist. :dd
{Invalid group function in variable formula} :dt
Group function is not recognized. :dd
{Invalid group in comm_modify command} :dt
Self-explanatory. :dd
{Invalid image up vector} :dt
Up vector cannot be (0,0,0). :dd
{Invalid immediate variable} :dt
Syntax of immediate value is incorrect. :dd
{Invalid improper count in molecule file} :dt
Self-explanatory. :dd
{Invalid improper type in Impropers section of data file} :dt
Improper type must be positive integer and within range of specified
improper types. :dd
{Invalid improper type in impropers section of molecule file} :dt
Self-explanatory. :dd
{Invalid index for non-body particles in compute body/local command} :dt
Only indices 1,2,3 can be used for non-body particles. :dd
{Invalid index in compute body/local command} :dt
Self-explanatory. :dd
{Invalid is_active() function in variable formula} :dt
Self-explanatory. :dd
{Invalid is_available() function in variable formula} :dt
Self-explanatory. :dd
{Invalid is_defined() function in variable formula} :dt
Self-explanatory. :dd
{Invalid keyword in angle table parameters} :dt
Self-explanatory. :dd
{Invalid keyword in bond table parameters} :dt
Self-explanatory. :dd
{Invalid keyword in compute angle/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute bond/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute dihedral/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute improper/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute pair/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/atom command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/chunk command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/local command} :dt
Self-explanatory. :dd
{Invalid keyword in dump cfg command} :dt
Self-explanatory. :dd
{Invalid keyword in pair table parameters} :dt
Keyword used in list of table parameters is not recognized. :dd
{Invalid length in set command} :dt
Self-explanatory. :dd
{Invalid mass in set command} :dt
Self-explanatory. :dd
{Invalid mass line in data file} :dt
Self-explanatory. :dd
{Invalid mass value} :dt
Self-explanatory. :dd
{Invalid math function in variable formula} :dt
Self-explanatory. :dd
{Invalid math/group/special function in variable formula} :dt
Self-explanatory. :dd
{Invalid option in lattice command for non-custom style} :dt
Certain lattice keywords are not supported unless the
lattice style is "custom". :dd
{Invalid order of forces within respa levels} :dt
For respa, ordering of force computations within respa levels must
obey certain rules. E.g. bonds cannot be computed less frequently than
angles, pairwise forces cannot be computed less frequently than
kspace, etc. :dd
{Invalid pair table cutoff} :dt
Cutoffs in pair_coeff command are not valid with read-in pair table. :dd
{Invalid pair table length} :dt
Length of the read-in pair table is invalid. :dd
{Invalid param file for fix qeq/shielded} :dt
Invalid value of gamma. :dd
{Invalid param file for fix qeq/slater} :dt
Zeta value is 0.0. :dd
{Invalid partitions in processors part command} :dt
Valid partitions are numbered 1 to N and the sender and receiver
cannot be the same partition. :dd
{Invalid python command} :dt
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line. :dd
{Invalid radius in Atoms section of data file} :dt
Radius must be >= 0.0. :dd
{Invalid random number seed in fix ttm command} :dt
Random number seed must be > 0. :dd
{Invalid random number seed in set command} :dt
Random number seed must be > 0. :dd
{Invalid replace values in compute reduce} :dt
Self-explanatory. :dd
{Invalid rigid body ID in fix rigid file} :dt
The ID does not match an existing ID of the rigid bodies
that are defined by the fix rigid command. :dd
{Invalid rigid body ID in fix rigid/small file} :dt
The ID does not match an existing ID of the rigid bodies
that are defined by the fix rigid/small command. :dd
{Invalid run command N value} :dt
The number of timesteps must fit in a 32-bit integer. If you want to
run for more steps than this, perform multiple shorter runs. :dd
{Invalid run command start/stop value} :dt
Self-explanatory. :dd
{Invalid run command upto value} :dt
Self-explanatory. :dd
{Invalid seed for Marsaglia random # generator} :dt
The initial seed for this random number generator must be a positive
integer less than or equal to 900 million. :dd
{Invalid seed for Park random # generator} :dt
The initial seed for this random number generator must be a positive
integer. :dd
{Invalid shake angle type in molecule file} :dt
Self-explanatory. :dd
{Invalid shake atom in molecule file} :dt
Self-explanatory. :dd
{Invalid shake bond type in molecule file} :dt
Self-explanatory. :dd
{Invalid shake flag in molecule file} :dt
Self-explanatory. :dd
{Invalid shape in Ellipsoids section of data file} :dt
Self-explanatory. :dd
{Invalid shape in Triangles section of data file} :dt
Two or more of the triangle corners are duplicate points. :dd
{Invalid shape in set command} :dt
Self-explanatory. :dd
{Invalid shear direction for fix wall/gran} :dt
Self-explanatory. :dd
{Invalid special atom index in molecule file} :dt
Self-explanatory. :dd
{Invalid special function in variable formula} :dt
Self-explanatory. :dd
{Invalid style in pair_write command} :dt
Self-explanatory. Check the input script. :dd
{Invalid syntax in variable formula} :dt
Self-explanatory. :dd
{Invalid t_event in prd command} :dt
Self-explanatory. :dd
{Invalid t_event in tad command} :dt
The value must be greater than 0. :dd
{Invalid template atom in Atoms section of data file} :dt
The atom indices must be between 1 and N, where N is the number of
atoms in the template molecule the atom belongs to. :dd
{Invalid template index in Atoms section of data file} :dt
The template indices must be between 1 and N, where N is the number of
molecules in the template. :dd
{Invalid thermo keyword in variable formula} :dt
The keyword is not recognized. :dd
{Invalid threads_per_atom specified.} :dt
For 3-body potentials on the GPU, the threads_per_atom setting cannot be
greater than 4 for NVIDIA GPUs. :dd
{Invalid timestep reset for fix ave/atom} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/chunk} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/correlate} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/histo} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/spatial} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/time} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid tmax in tad command} :dt
The value must be greater than 0.0. :dd
{Invalid type for mass set} :dt
Mass command must set a type from 1-N where N is the number of atom
types. :dd
{Invalid use of library file() function} :dt
This function is called through the library interface. This
error should not occur. Contact the developers if it does. :dd
{Invalid value in set command} :dt
The value specified for the setting is invalid, likely because it is
too small or too large. :dd
{Invalid variable evaluation in variable formula} :dt
A variable used in a formula could not be evaluated. :dd
{Invalid variable in next command} :dt
Self-explanatory. :dd
{Invalid variable name} :dt
Variable name used in an input script line is invalid. :dd
{Invalid variable name in variable formula} :dt
Variable name is not recognized. :dd
{Invalid variable style in special function next} :dt
Only file-style or atomfile-style variables can be used with next(). :dd
{Invalid variable style with next command} :dt
Variable styles {equal} and {world} cannot be used in a next
command. :dd
{Invalid volume in set command} :dt
Volume must be > 0.0. :dd
{Invalid wiggle direction for fix wall/gran} :dt
Self-explanatory. :dd
{Invoked angle equil angle on angle style none} :dt
Self-explanatory. :dd
{Invoked angle single on angle style none} :dt
Self-explanatory. :dd
{Invoked bond equil distance on bond style none} :dt
Self-explanatory. :dd
{Invoked bond single on bond style none} :dt
Self-explanatory. :dd
{Invoked pair single on pair style none} :dt
A command (e.g. a dump) attempted to invoke the single() function on a
pair style none, which is illegal. You are probably attempting to
compute per-atom quantities with an undefined pair style. :dd
{Invoking coulombic in pair style lj/coul requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Invoking coulombic in pair style lj/long/dipole/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{KIM neighbor iterator exceeded range} :dt
This should not happen. It likely indicates a bug
in the KIM implementation of the interatomic potential
where it is requesting neighbors incorrectly. :dd
{KOKKOS package does not yet support comm_style tiled} :dt
Self-explanatory. :dd
{KOKKOS package requires a kokkos enabled atom_style} :dt
Self-explanatory. :dd
{KSpace accuracy must be > 0} :dt
The kspace accuracy designated in the input must be greater than zero. :dd
{KSpace accuracy too large to estimate G vector} :dt
Reduce the accuracy request or specify gewald explicitly
via the kspace_modify command. :dd
{KSpace accuracy too low} :dt
Requested accuracy must be less than 1.0. :dd
{KSpace solver requires a pair style} :dt
No pair style is defined. :dd
{KSpace style does not yet support triclinic geometries} :dt
The specified kspace style does not allow for non-orthogonal
simulation boxes. :dd
{KSpace style has not yet been set} :dt
Cannot use kspace_modify command until a kspace style is set. :dd
{KSpace style is incompatible with Pair style} :dt
Setting a kspace style requires that a pair style with matching
long-range Coulombic or dispersion components be used. :dd
{Keyword %s in MEAM parameter file not recognized} :dt
Self-explanatory. :dd
{Kokkos has been compiled for CUDA but no GPUs are requested} :dt
One or more GPUs must be used when Kokkos is compiled for CUDA. :dd
{Kspace style does not support compute group/group} :dt
Self-explanatory. :dd
{Kspace style pppm/disp/tip4p requires newton on} :dt
Self-explanatory. :dd
{Kspace style pppm/tip4p requires newton on} :dt
Self-explanatory. :dd
{Kspace style requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Kspace_modify eigtol must be smaller than one} :dt
Self-explanatory. :dd
{LAMMPS is not built with Python embedded} :dt
This is done by including the PYTHON package before LAMMPS is built.
This is required to use python-style variables. :dd
{LAMMPS unit_style lj not supported by KIM models} :dt
Self-explanatory. Check the input script or data file. :dd
{LJ6 off not supported in pair_style buck/long/coul/long} :dt
Self-explanatory. :dd
{Label wasn't found in input script} :dt
Self-explanatory. :dd
{Lattice orient vectors are not orthogonal} :dt
The three specified lattice orientation vectors must be mutually
orthogonal. :dd
{Lattice orient vectors are not right-handed} :dt
The three specified lattice orientation vectors must create a
right-handed coordinate system such that a1 cross a2 = a3. :dd
{Lattice primitive vectors are collinear} :dt
The specified lattice primitive vectors do not form a unit cell with
non-zero volume. :dd
{Lattice settings are not compatible with 2d simulation} :dt
One or more of the specified lattice vectors has a non-zero z
component. :dd
{Lattice spacings are invalid} :dt
Each x,y,z spacing must be > 0. :dd
{Lattice style incompatible with simulation dimension} :dt
2d simulation can use sq, sq2, or hex lattice. 3d simulation can use
sc, bcc, or fcc lattice. :dd
{Log of zero/negative value in variable formula} :dt
Self-explanatory. :dd
{Lost atoms via balance: original %ld current %ld} :dt
This should not occur. Report the problem to the developers. :dd
{Lost atoms: original %ld current %ld} :dt
Lost atoms are checked for each time thermo output is done. See the
thermo_modify lost command for options. Lost atoms usually indicate
bad dynamics, e.g. atoms have been blown far out of the simulation
box, or moved further than one processor's sub-domain away before
reneighboring. :dd
{MEAM library error %d} :dt
A call to the MEAM Fortran library returned an error. :dd
{MPI_LMP_BIGINT and bigint in lmptype.h are not compatible} :dt
The size of the MPI datatype does not match the size of a bigint. :dd
{MPI_LMP_TAGINT and tagint in lmptype.h are not compatible} :dt
The size of the MPI datatype does not match the size of a tagint. :dd
{MSM can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{MSM grid is too large} :dt
The global MSM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 16384. You likely need to decrease the
requested accuracy. :dd
{MSM order must be 4, 6, 8, or 10} :dt
This is a limitation of the MSM implementation in LAMMPS:
the MSM order can only be 4, 6, 8, or 10. :dd
{Mass command before simulation box is defined} :dt
The mass command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Matrix factorization to split dispersion coefficients failed} :dt
This should not normally happen. Contact the developers. :dd
{Min_style command before simulation box is defined} :dt
The min_style command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Minimization could not find thermo_pe compute} :dt
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command. :dd
{Minimize command before simulation box is defined} :dt
The minimize command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Mismatched brackets in variable} :dt
Self-explanatory. :dd
{Mismatched compute in variable formula} :dt
A compute is referenced incorrectly or a compute that produces per-atom
values is used in an equal-style variable formula. :dd
{Mismatched fix in variable formula} :dt
A fix is referenced incorrectly or a fix that produces per-atom
values is used in an equal-style variable formula. :dd
{Mismatched variable in variable formula} :dt
A variable is referenced incorrectly or an atom-style variable that
produces per-atom values is used in an equal-style variable
formula. :dd
{Modulo 0 in variable formula} :dt
Self-explanatory. :dd
{Molecule IDs too large for compute chunk/atom} :dt
The IDs must not be larger than can be stored in a 32-bit integer
since chunk IDs are 32-bit integers. :dd
{Molecule auto special bond generation overflow} :dt
Counts exceed maxspecial setting for other atoms in system. :dd
{Molecule file has angles but no nangles setting} :dt
Self-explanatory. :dd
{Molecule file has body params but no setting for them} :dt
Self-explanatory. :dd
{Molecule file has bonds but no nbonds setting} :dt
Self-explanatory. :dd
{Molecule file has dihedrals but no ndihedrals setting} :dt
Self-explanatory. :dd
{Molecule file has impropers but no nimpropers setting} :dt
Self-explanatory. :dd
{Molecule file has no Body Doubles section} :dt
Self-explanatory. :dd
{Molecule file has no Body Integers section} :dt
Self-explanatory. :dd
{Molecule file has special flags but no bonds} :dt
Self-explanatory. :dd
{Molecule file needs both Special Bond sections} :dt
Self-explanatory. :dd
{Molecule file requires atom style body} :dt
Self-explanatory. :dd
{Molecule file shake flags not before shake atoms} :dt
The order of the two sections is important. :dd
{Molecule file shake flags not before shake bonds} :dt
The order of the two sections is important. :dd
{Molecule file shake info is incomplete} :dt
All 3 SHAKE sections are needed. :dd
{Molecule file special list does not match special count} :dt
The number of values in an atom's special list does not match count. :dd
{Molecule file z center-of-mass must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Molecule file z coord must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Molecule natoms must be 1 for body particle} :dt
Self-explanatory. :dd
{Molecule sizescale must be 1.0 for body particle} :dt
Self-explanatory. :dd
{Molecule template ID for atom_style template does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for create_atoms does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix deposit does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix gcmc does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix pour does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix rigid/small does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix shake does not exist} :dt
Self-explanatory. :dd
{Molecule template ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Molecule toplogy/atom exceeds system topology/atom} :dt
The number of bonds, angles, etc per-atom in the molecule exceeds the
system setting. See the create_box command for how to specify these
values. :dd
{Molecule topology type exceeds system topology type} :dt
The number of bond, angle, etc types in the molecule exceeds the
system setting. See the create_box command for how to specify these
values. :dd
{More than one fix deform} :dt
Only one fix deform can be defined at a time. :dd
{More than one fix freeze} :dt
Only one of these fixes can be defined, since the granular pair
potentials access it. :dd
{More than one fix shake} :dt
Only one fix shake can be defined. :dd
{Mu not allowed when not using semi-grand in fix atom/swap command} :dt
Self-explanatory. :dd
{Must define angle_style before Angle Coeffs} :dt
Must use an angle_style command before reading a data file that
defines Angle Coeffs. :dd
{Must define angle_style before BondAngle Coeffs} :dt
Must use an angle_style command before reading a data file that
defines BondAngle Coeffs. :dd
{Must define angle_style before BondBond Coeffs} :dt
Must use an angle_style command before reading a data file that
defines BondBond Coeffs. :dd
{Must define bond_style before Bond Coeffs} :dt
Must use a bond_style command before reading a data file that
defines Bond Coeffs. :dd
{Must define dihedral_style before AngleAngleTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines AngleAngleTorsion Coeffs. :dd
{Must define dihedral_style before AngleTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines AngleTorsion Coeffs. :dd
{Must define dihedral_style before BondBond13 Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines BondBond13 Coeffs. :dd
{Must define dihedral_style before Dihedral Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines Dihedral Coeffs. :dd
{Must define dihedral_style before EndBondTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines EndBondTorsion Coeffs. :dd
{Must define dihedral_style before MiddleBondTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines MiddleBondTorsion Coeffs. :dd
{Must define improper_style before AngleAngle Coeffs} :dt
Must use an improper_style command before reading a data file that
defines AngleAngle Coeffs. :dd
{Must define improper_style before Improper Coeffs} :dt
Must use an improper_style command before reading a data file that
defines Improper Coeffs. :dd
{Must define pair_style before Pair Coeffs} :dt
Must use a pair_style command before reading a data file that defines
Pair Coeffs. :dd
{Must define pair_style before PairIJ Coeffs} :dt
Must use a pair_style command before reading a data file that defines
PairIJ Coeffs. :dd
{Must have more than one processor partition to temper} :dt
Cannot use the temper command with only one processor partition. Use
the -partition command-line option. :dd
{Must read Atoms before Angles} :dt
The Atoms section of a data file must come before an Angles section. :dd
{Must read Atoms before Bodies} :dt
The Atoms section of a data file must come before a Bodies section. :dd
{Must read Atoms before Bonds} :dt
The Atoms section of a data file must come before a Bonds section. :dd
{Must read Atoms before Dihedrals} :dt
The Atoms section of a data file must come before a Dihedrals section. :dd
{Must read Atoms before Ellipsoids} :dt
The Atoms section of a data file must come before an Ellipsoids
section. :dd
{Must read Atoms before Impropers} :dt
The Atoms section of a data file must come before an Impropers
section. :dd
{Must read Atoms before Lines} :dt
The Atoms section of a data file must come before a Lines section. :dd
{Must read Atoms before Triangles} :dt
The Atoms section of a data file must come before a Triangles section. :dd
{Must read Atoms before Velocities} :dt
The Atoms section of a data file must come before a Velocities
section. :dd
{Must set both respa inner and outer} :dt
Cannot use just the inner or outer option with respa without using the
other. :dd
{Must set number of threads via package omp command} :dt
Because you are using the USER-OMP package, set the number of threads
via its settings, not by the pair_style snap nthreads setting. :dd
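For example, an illustrative package command (the thread count of 4 is an arbitrary example): :dd
# illustrative: request 4 OpenMP threads from the USER-OMP package
package omp 4 :pre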
{Must shrink-wrap piston boundary} :dt
The boundary style of the face where the piston is applied must be of
type s (shrink-wrapped). :dd
{Must specify a region in fix deposit} :dt
The region keyword must be specified with this fix. :dd
{Must specify a region in fix pour} :dt
Self-explanatory. :dd
{Must specify at least 2 types in fix atom/swap command} :dt
Self-explanatory. :dd
{Must use 'kspace_modify pressure/scalar no' for rRESPA with kspace_style MSM} :dt
The kspace scalar pressure option cannot (yet) be used with rRESPA. :dd
{Must use 'kspace_modify pressure/scalar no' for tensor components with kspace_style msm} :dt
Otherwise MSM will compute only a scalar pressure. See the kspace_modify
command for details on this setting. :dd
{Must use 'kspace_modify pressure/scalar no' to obtain per-atom virial with kspace_style MSM} :dt
The kspace scalar pressure option cannot be used to obtain per-atom virial. :dd
{Must use 'kspace_modify pressure/scalar no' with GPU MSM Pair styles} :dt
The kspace scalar pressure option is not (yet) compatible with GPU MSM Pair styles. :dd
{Must use 'kspace_modify pressure/scalar no' with kspace_style msm/cg} :dt
The kspace scalar pressure option is not compatible with kspace_style msm/cg. :dd
{Must use -in switch with multiple partitions} :dt
A multi-partition simulation cannot read the input script from stdin.
The -in command-line option must be used to specify a file. :dd
{Must use Kokkos half/thread or full neighbor list with threads or GPUs} :dt
Using Kokkos half-neighbor lists with threading is not allowed. :dd
{Must use a block or cylinder region with fix pour} :dt
Self-explanatory. :dd
{Must use a block region with fix pour for 2d simulations} :dt
Self-explanatory. :dd
{Must use a bond style with TIP4P potential} :dt
TIP4P potentials assume bond lengths in water are constrained
by a fix shake command. :dd
{Must use a molecular atom style with fix poems molecule} :dt
Self-explanatory. :dd
{Must use a z-axis cylinder region with fix pour} :dt
Self-explanatory. :dd
{Must use an angle style with TIP4P potential} :dt
TIP4P potentials assume angles in water are constrained by a fix shake
command. :dd
{Must use atom map style array with Kokkos} :dt
See the atom_modify map command. :dd
{Must use atom style with molecule IDs with fix bond/swap} :dt
Self-explanatory. :dd
{Must use pair_style comb or comb3 with fix qeq/comb} :dt
Self-explanatory. :dd
{Must use variable energy with fix addforce} :dt
Must define an energy variable when applying a dynamic
force during minimization. :dd
{Must use variable energy with fix efield} :dt
You must define an energy when performing a minimization with a
variable E-field. :dd
{NEB command before simulation box is defined} :dt
Self-explanatory. :dd
{NEB requires damped dynamics minimizer} :dt
Use a different minimization style. :dd
{NEB requires use of fix neb} :dt
Self-explanatory. :dd
{NL ramp in wall/piston only implemented in zlo for now} :dt
The ramp keyword can only be used for piston applied to face zlo. :dd
{Need nswaptypes mu values in fix atom/swap command} :dt
Self-explanatory. :dd
{Needed bonus data not in data file} :dt
Some atom styles require bonus data. See the read_data doc page for
details. :dd
{Needed molecular topology not in data file} :dt
The header of the data file indicated bonds, angles, etc would be
included, but they are not present. :dd
{Neigh_modify exclude molecule requires atom attribute molecule} :dt
Self-explanatory. :dd
{Neigh_modify include group != atom_modify first group} :dt
Self-explanatory. :dd
{Neighbor delay must be 0 or multiple of every setting} :dt
The delay and every parameters set via the neigh_modify command are
inconsistent. If the delay setting is non-zero, then it must be a
multiple of the every setting. :dd
{Neighbor include group not allowed with ghost neighbors} :dt
This is a current restriction within LAMMPS. :dd
{Neighbor list overflow, boost neigh_modify one} :dt
There are too many neighbors of a single atom. Use the neigh_modify
command to increase the max number of neighbors allowed for one atom.
You may also want to boost the page size. :dd
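An illustrative neigh_modify setting (the values shown are arbitrary examples, not recommendations; note that the page size must be at least 10x the one setting): :dd
# illustrative: raise the per-atom neighbor limit and the page size
neigh_modify one 5000 page 100000 :pre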
{Neighbor multi not yet enabled for ghost neighbors} :dt
This is a current restriction within LAMMPS. :dd
{Neighbor multi not yet enabled for granular} :dt
Self-explanatory. :dd
{Neighbor multi not yet enabled for rRESPA} :dt
Self-explanatory. :dd
{Neighbor page size must be >= 10x the one atom setting} :dt
This is required to prevent wasting too much memory. :dd
{New atom IDs exceed maximum allowed ID} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{New bond exceeded bonds per atom in create_bonds} :dt
See the read_data command for info on setting the "extra bond per
atom" header value to allow for additional bonds to be formed. :dd
{New bond exceeded bonds per atom in fix bond/create} :dt
See the read_data command for info on setting the "extra bond per
atom" header value to allow for additional bonds to be formed. :dd
{New bond exceeded special list size in fix bond/create} :dt
See the special_bonds extra command for info on how to leave space in
the special bonds list to allow for additional bonds to be formed. :dd
{Newton bond change after simulation box is defined} :dt
The newton command cannot be used to change the newton bond value
after a read_data, read_restart, or create_box command. :dd
{Next command must list all universe and uloop variables} :dt
This is to ensure they stay in sync. :dd
{No Kspace style defined for compute group/group} :dt
Self-explanatory. :dd
{No OpenMP support compiled in} :dt
An OpenMP flag is set, but LAMMPS was not built with
OpenMP support. :dd
{No angle style is defined for compute angle/local} :dt
Self-explanatory. :dd
{No angles allowed with this atom style} :dt
Self-explanatory. :dd
{No atoms in data file} :dt
The header of the data file indicated that atoms would be included,
but they are not present. :dd
{No basis atoms in lattice} :dt
Basis atoms must be defined for lattice style custom. :dd
{No bodies allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No bond style is defined for compute bond/local} :dt
Self-explanatory. :dd
{No bonds allowed with this atom style} :dt
Self-explanatory. :dd
{No box information in dump. You have to use 'box no'} :dt
Self-explanatory. :dd
{No count or invalid atom count in molecule file} :dt
The number of atoms must be specified. :dd
{No dihedral style is defined for compute dihedral/local} :dt
Self-explanatory. :dd
{No dihedrals allowed with this atom style} :dt
Self-explanatory. :dd
{No dump custom arguments specified} :dt
The dump custom command requires that atom quantities be specified to
output to dump file. :dd
{No dump local arguments specified} :dt
Self-explanatory. :dd
{No ellipsoids allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No fix gravity defined for fix pour} :dt
Gravity is required to use fix pour. :dd
{No improper style is defined for compute improper/local} :dt
Self-explanatory. :dd
{No impropers allowed with this atom style} :dt
Self-explanatory. :dd
{No input values for fix ave/spatial} :dt
Self-explanatory. :dd
{No lines allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No matching element in ADP potential file} :dt
The ADP potential file does not contain elements that match the
requested elements. :dd
{No matching element in EAM potential file} :dt
The EAM potential file does not contain elements that match the
requested elements. :dd
{No molecule topology allowed with atom style template} :dt
The data file cannot specify the number of bonds, angles, etc,
because this info is inferred from the molecule templates. :dd
{No overlap of box and region for create_atoms} :dt
Self-explanatory. :dd
{No pair coul/streitz for fix qeq/slater} :dt
These commands must be used together. :dd
{No pair hbond/dreiding coefficients set} :dt
Self-explanatory. :dd
{No pair style defined for compute group/group} :dt
Cannot calculate group interactions without a pair style defined. :dd
{No pair style is defined for compute pair/local} :dt
Self-explanatory. :dd
{No pair style is defined for compute property/local} :dt
Self-explanatory. :dd
{No rigid bodies defined} :dt
The fix specification did not end up defining any rigid bodies. :dd
{No triangles allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No values in fix ave/chunk command} :dt
Self-explanatory. :dd
{No values in fix ave/time command} :dt
Self-explanatory. :dd
{Non digit character between brackets in variable} :dt
Self-explanatory. :dd
{Non integer # of swaps in temper command} :dt
Swap frequency in temper command must evenly divide the total # of
timesteps. :dd
{Non-numeric box dimensions - simulation unstable} :dt
The box size has apparently blown up. :dd
{Non-zero atom IDs with atom_modify id = no} :dt
Self-explanatory. :dd
{Non-zero read_data shift z value for 2d simulation} :dt
Self-explanatory. :dd
{Nprocs not a multiple of N for -reorder} :dt
Self-explanatory. :dd
{Number of core atoms != number of shell atoms} :dt
There must be a one-to-one pairing of core and shell atoms. :dd
{Numeric index is out of bounds} :dt
A command with an argument that specifies an integer or range of
integers is using a value that is less than 1 or greater than the
maximum allowed limit. :dd
{One or more Atom IDs is negative} :dt
Atom IDs must be positive integers. :dd
{One or more atom IDs is too big} :dt
The limit on atom IDs is set by the SMALLBIG, BIGBIG, SMALLSMALL
setting in your Makefile. See Section_start 2.2 of the manual for
more details. :dd
{One or more atom IDs is zero} :dt
Either all atom IDs must be zero or none of them. :dd
{One or more atoms belong to multiple rigid bodies} :dt
Two or more rigid bodies defined by the fix rigid command cannot
contain the same atom. :dd
{One or more rigid bodies are a single particle} :dt
Self-explanatory. :dd
{One or zero atoms in rigid body} :dt
Any rigid body defined by the fix rigid command must contain 2 or more
atoms. :dd
{Only 2 types allowed when not using semi-grand in fix atom/swap command} :dt
Self-explanatory. :dd
{Only one cut-off allowed when requesting all long} :dt
Self-explanatory. :dd
{Only one cutoff allowed when requesting all long} :dt
Self-explanatory. :dd
{Only zhi currently implemented for fix append/atoms} :dt
Self-explanatory. :dd
{Out of range atoms - cannot compute MSM} :dt
One or more atoms are attempting to map their charge to an MSM grid point
that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Out of range atoms - cannot compute PPPM} :dt
One or more atoms are attempting to map their charge to a PPPM grid
point that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Out of range atoms - cannot compute PPPMDisp} :dt
One or more atoms are attempting to map their charge to a PPPM grid
point that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
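The safest re-neighboring settings quoted above correspond to the input-script line: :dd
# safest re-neighboring criteria, as recommended above
neigh_modify delay 0 every 1 check yes :pre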
{Overflow of allocated fix vector storage} :dt
This should not normally happen if the fix correctly calculated
how long the vector will grow to. Contact the developers. :dd
{Overlapping large/large in pair colloid} :dt
This potential is infinite when there is an overlap. :dd
{Overlapping small/large in pair colloid} :dt
This potential is infinite when there is an overlap. :dd
{POEMS fix must come before NPT/NPH fix} :dt
NPT/NPH fix must be defined in input script after all poems fixes,
else the fix contribution to the pressure virial is incorrect. :dd
{PPPM can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{PPPM grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPM grid stencil extends beyond nearest neighbor processor} :dt
This is not allowed if the kspace_modify overlap setting is no. :dd
{PPPM order < minimum allowed order} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{PPPM order cannot be < 2 or > than %d} :dt
This is a limitation of the PPPM implementation in LAMMPS. :dd
{PPPMDisp Coulomb grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPMDisp Dispersion grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPMDisp can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{PPPMDisp coulomb order cannot be greater than %d} :dt
This is a limitation of the PPPM implementation in LAMMPS. :dd
{PPPMDisp used but no parameters set, for further information please see the pppm/disp documentation} :dt
Efficient and accurate use of pppm/disp requires settings via the kspace_modify command. Please see the pppm/disp documentation for further instructions. :dd
{PRD command before simulation box is defined} :dt
The prd command cannot be used before a read_data,
read_restart, or create_box command. :dd
{PRD nsteps must be multiple of t_event} :dt
Self-explanatory. :dd
{PRD t_corr must be multiple of t_event} :dt
Self-explanatory. :dd
{Package command after simulation box is defined} :dt
The package command cannot be used after a read_data, read_restart, or
create_box command. :dd
{Package cuda command without USER-CUDA package enabled} :dt
The USER-CUDA package must be installed via "make yes-user-cuda"
before LAMMPS is built, and the "-c on" command-line switch must be
used to enable the package. :dd
{Package gpu command without GPU package installed} :dt
The GPU package must be installed via "make yes-gpu" before LAMMPS is
built. :dd
{Package intel command without USER-INTEL package installed} :dt
The USER-INTEL package must be installed via "make yes-user-intel"
before LAMMPS is built. :dd
{Package kokkos command without KOKKOS package enabled} :dt
The KOKKOS package must be installed via "make yes-kokkos" before
LAMMPS is built, and the "-k on" command-line switch must be used to
enable the package. :dd
{Package omp command without USER-OMP package installed} :dt
The USER-OMP package must be installed via "make yes-user-omp" before
LAMMPS is built. :dd
{Pair body requires atom style body} :dt
Self-explanatory. :dd
{Pair body requires body style nparticle} :dt
This pair style is specific to the nparticle body style. :dd
{Pair brownian requires atom style sphere} :dt
Self-explanatory. :dd
{Pair brownian requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair brownian requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair brownian/poly requires atom style sphere} :dt
Self-explanatory. :dd
{Pair brownian/poly requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair brownian/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair coeff for hybrid has invalid style} :dt
Style in pair coeff must have been listed in pair_style command. :dd
{Pair coul/wolf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair cutoff < Respa interior cutoff} :dt
One or more pairwise cutoffs are too short to use with the specified
rRESPA cutoffs. :dd
{Pair dipole/cut requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/cut/gpu requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/long requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/sf/gpu requires atom attributes q, mu, torque} :dt
The atom style defined does not have one or more of these attributes. :dd
{Pair distance < table inner cutoff} :dt
Two atoms are closer together than the pairwise table allows. :dd
{Pair distance > table outer cutoff} :dt
Two atoms are further apart than the pairwise table allows. :dd
{Pair dpd requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair gayberne epsilon a,b,c coeffs are not all set} :dt
Each atom type involved in pair_style gayberne must
have these 3 coefficients set at least once. :dd
{Pair gayberne requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair gayberne requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair gayberne/gpu requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair gayberne/gpu requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair granular requires atom attributes radius, rmass} :dt
The atom style defined does not have these attributes. :dd
{Pair granular requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair granular with shear history requires newton pair off} :dt
This is a current restriction of the implementation of pair
granular styles with history. :dd
{Pair hybrid single calls do not support per sub-style special bond values} :dt
Self-explanatory. :dd
{Pair hybrid sub-style does not support single call} :dt
You are attempting to invoke a single() call on a pair style
that doesn't support it. :dd
{Pair hybrid sub-style is not used} :dt
No pair_coeff command used a sub-style specified in the pair_style
command. :dd
{Pair inner cutoff < Respa interior cutoff} :dt
One or more pairwise cutoffs are too short to use with the specified
rRESPA cutoffs. :dd
{Pair inner cutoff >= Pair outer cutoff} :dt
The specified cutoffs for the pair style are inconsistent. :dd
{Pair line/lj requires atom style line} :dt
Self-explanatory. :dd
{Pair lj/long/dipole/long requires atom attributes mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair lubricate requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricate requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricate requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair lubricate/poly requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricate/poly requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair lubricate/poly requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricate/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair lubricateU requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricateU requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricateU requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair lubricateU/poly requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricateU/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair peri lattice is not identical in x, y, and z} :dt
The lattice defined by the lattice command must be cubic. :dd
{Pair peri requires a lattice be defined} :dt
Use the lattice command for this purpose. :dd
{Pair peri requires an atom map, see atom_modify} :dt
Even for atomic systems, an atom map is required to find Peridynamic
bonds. Use the atom_modify command to define one. :dd
{Pair resquared epsilon a,b,c coeffs are not all set} :dt
Self-explanatory. :dd
{Pair resquared epsilon and sigma coeffs are not all set} :dt
Self-explanatory. :dd
{Pair resquared requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair resquared requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair resquared/gpu requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair resquared/gpu requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair style AIREBO requires atom IDs} :dt
This is a requirement to use the AIREBO potential. :dd
{Pair style AIREBO requires newton pair on} :dt
See the newton command. This is a restriction to use the AIREBO
potential. :dd
{Pair style BOP requires atom IDs} :dt
This is a requirement to use the BOP potential. :dd
{Pair style BOP requires newton pair on} :dt
See the newton command. This is a restriction to use the BOP
potential. :dd
{Pair style COMB requires atom IDs} :dt
This is a requirement to use the COMB potential. :dd
{Pair style COMB requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style COMB requires newton pair on} :dt
See the newton command. This is a restriction to use the COMB
potential. :dd
{Pair style COMB3 requires atom IDs} :dt
This is a requirement to use the COMB3 potential. :dd
{Pair style COMB3 requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style COMB3 requires newton pair on} :dt
See the newton command. This is a restriction to use the COMB3
potential. :dd
{Pair style LCBOP requires atom IDs} :dt
This is a requirement to use the LCBOP potential. :dd
{Pair style LCBOP requires newton pair on} :dt
See the newton command. This is a restriction to use the LCBOP
potential. :dd
{Pair style MEAM requires newton pair on} :dt
See the newton command. This is a restriction to use the MEAM
potential. :dd
{Pair style SNAP requires newton pair on} :dt
See the newton command. This is a restriction to use the SNAP
potential. :dd
{Pair style Stillinger-Weber requires atom IDs} :dt
This is a requirement to use the SW potential. :dd
{Pair style Stillinger-Weber requires newton pair on} :dt
See the newton command. This is a restriction to use the SW
potential. :dd
{Pair style Tersoff requires atom IDs} :dt
This is a requirement to use the Tersoff potential. :dd
{Pair style Tersoff requires newton pair on} :dt
See the newton command. This is a restriction to use the Tersoff
potential. :dd
{Pair style Vashishta requires atom IDs} :dt
This is a requirement to use the Vashishta potential. :dd
{Pair style Vashishta requires newton pair on} :dt
See the newton command. This is a restriction to use the Vashishta
potential. :dd
{Pair style bop requires comm ghost cutoff at least 3x larger than %g} :dt
Use the communicate ghost command to set this. See the pair bop
doc page for more details. :dd
{Pair style born/coul/long requires atom attribute q} :dt
An atom style that defines this attribute must be used. :dd
{Pair style born/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style born/coul/wolf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/long/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/cut/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/debye/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/dsf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/dsf/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/streitz requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style does not have extra field requested by compute pair/local} :dt
The pair style does not support the pN value requested by the compute
pair/local command. :dd
{Pair style does not support bond_style quartic} :dt
The pair style does not have a single() function, so it can
not be invoked by bond_style quartic. :dd
{Pair style does not support compute group/group} :dt
The pair_style does not have a single() function, so it cannot be
invoked by the compute group/group command. :dd
{Pair style does not support compute pair/local} :dt
The pair style does not have a single() function, so it can
not be invoked by compute pair/local. :dd
{Pair style does not support compute property/local} :dt
The pair style does not have a single() function, so it can
not be invoked by compute property/local. :dd
{Pair style does not support fix bond/swap} :dt
The pair style does not have a single() function, so it can
not be invoked by fix bond/swap. :dd
{Pair style does not support pair_write} :dt
The pair style does not have a single() function, so it can
not be invoked by pair_write. :dd
{Pair style does not support rRESPA inner/middle/outer} :dt
You are attempting to use rRESPA options with a pair style that
does not support them. :dd
{Pair style granular with history requires atoms have IDs} :dt
Atoms in the simulation do not have IDs, so history effects
cannot be tracked by the granular pair potential. :dd
{Pair style hbond/dreiding requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires atom IDs} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires molecular system} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires newton pair on} :dt
See the newton command for details. :dd
{Pair style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Pair style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Pair style is incompatible with KSpace style} :dt
If a pair style with a long-range Coulombic component is selected,
then a kspace style must also be used. :dd
{Pair style is incompatible with TIP4P KSpace style} :dt
The pair style does not have the required TIP4P settings. :dd
{Pair style lj/charmm/coul/charmm requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/charmm/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/charmm/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/cut/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/debye/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/dsf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/dsf/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/cut requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style lj/cut/tip4p/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/cut requires newton pair on} :dt
See the newton command. This is a restriction to use this
potential. :dd
{Pair style lj/cut/tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find O,H atoms within a water molecule. :dd
{Pair style lj/cut/tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair style lj/gromacs/coul/gromacs requires atom attribute q} :dt
An atom_style with this attribute is needed. :dd
{Pair style lj/long/dipole/long does not currently support respa} :dt
This feature is not yet supported. :dd
{Pair style lj/long/tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find O,H atoms within a water molecule. :dd
{Pair style lj/long/tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/long/tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair style lj/sdk/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style nb3b/harmonic requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style nb3b/harmonic requires newton pair on} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style nm/cut/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style nm/cut/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style peri requires atom style peri} :dt
Self-explanatory. :dd
{Pair style polymorphic requires atom IDs} :dt
This is a requirement to use the polymorphic potential. :dd
{Pair style polymorphic requires newton pair on} :dt
See the newton command. This is a restriction to use the polymorphic
potential. :dd
{Pair style reax requires atom IDs} :dt
This is a requirement to use the ReaxFF potential. :dd
{Pair style reax requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style reax requires newton pair on} :dt
This is a requirement to use the ReaxFF potential. :dd
{Pair style requires a KSpace style} :dt
No kspace style is defined. :dd
{Pair style requires use of kspace_style ewald/disp} :dt
Self-explanatory. :dd
{Pair style sw/gpu requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style sw/gpu requires newton pair off} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style tersoff/gpu requires atom IDs} :dt
This is a requirement to use the tersoff/gpu potential. :dd
{Pair style tersoff/gpu requires newton pair off} :dt
See the newton command. This is a restriction to use this pair style. :dd
{Pair style tip4p/cut requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style tip4p/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style tip4p/cut requires newton pair on} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find O,H atoms within a water molecule. :dd
{Pair style tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair table cutoffs must all be equal to use with KSpace} :dt
When using pair style table with a long-range KSpace solver, the
cutoffs for all atom type pairs must all be the same, since the
long-range solver starts at that cutoff. :dd
{Pair table parameters did not set N} :dt
List of pair table parameters must include N setting. :dd
{Pair tersoff/zbl requires metal or real units} :dt
This is a current restriction of this pair potential. :dd
{Pair tersoff/zbl/kk requires metal or real units} :dt
This is a current restriction of this pair potential. :dd
{Pair tri/lj requires atom style tri} :dt
Self-explanatory. :dd
{Pair yukawa/colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Pair yukawa/colloid requires atoms with same type have same radius} :dt
Self-explanatory. :dd
{Pair yukawa/colloid/gpu requires atom style sphere} :dt
Self-explanatory. :dd
{PairKIM only works with 3D problems} :dt
This is a current limitation. :dd
{Pair_coeff command before pair_style is defined} :dt
Self-explanatory. :dd
{Pair_coeff command before simulation box is defined} :dt
The pair_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Pair_modify command before pair_style is defined} :dt
Self-explanatory. :dd
{Pair_modify special setting for pair hybrid incompatible with global special_bonds setting} :dt
Cannot override a setting of 0.0 or 1.0 or change a setting between
0.0 and 1.0. :dd
{Pair_write command before pair_style is defined} :dt
Self-explanatory. :dd
{Particle on or inside fix wall surface} :dt
Particles must be "exterior" to the wall in order for energy/force to
be calculated. :dd
{Particle outside surface of region used in fix wall/region} :dt
Particles must be inside the region for energy/force to be calculated.
A particle outside the region generates an error. :dd
{Per-atom compute in equal-style variable formula} :dt
Equal-style variables cannot use per-atom quantities. :dd
{Per-atom energy was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied energy, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Per-atom fix in equal-style variable formula} :dt
Equal-style variables cannot use per-atom quantities. :dd
{Per-atom virial was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to have
tallied the virial, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Per-processor system is too big} :dt
The number of owned atoms plus ghost atoms on a single
processor must fit in a 32-bit integer. :dd
{Potential energy ID for fix neb does not exist} :dt
Self-explanatory. :dd
{Potential energy ID for fix nvt/nph/npt does not exist} :dt
A compute for potential energy must be defined. :dd
{Potential file has duplicate entry} :dt
The potential file has more than one entry for the same element. :dd
{Potential file is missing an entry} :dt
The potential file does not have a needed entry. :dd
{Power by 0 in variable formula} :dt
Self-explanatory. :dd
{Pressure ID for fix box/relax does not exist} :dt
The compute ID needed to compute pressure for the fix does not
exist. :dd
{Pressure ID for fix modify does not exist} :dt
Self-explanatory. :dd
{Pressure ID for fix npt/nph does not exist} :dt
Self-explanatory. :dd
{Pressure ID for fix press/berendsen does not exist} :dt
The compute ID needed to compute pressure for the fix does not
exist. :dd
{Pressure ID for fix rigid npt/nph does not exist} :dt
Self-explanatory. :dd
{Pressure ID for thermo does not exist} :dt
The compute ID needed to compute pressure for thermodynamics does not
exist. :dd
{Pressure control can not be used with fix nvt} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/asphere} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/body} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/sllod} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/sphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/asphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/body} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/small} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/sphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nphug} :dt
A pressure control keyword (iso, aniso, tri, x, y, or z) must be
provided. :dd
{Pressure control must be used with fix npt} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/asphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/body} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/sphere} :dt
Self-explanatory. :dd
{Processor count in z must be 1 for 2d simulation} :dt
Self-explanatory. :dd
{Processor partitions do not match number of allocated processors} :dt
The total number of processors in all partitions must match the number
of processors LAMMPS is running on. :dd
{Processors command after simulation box is defined} :dt
The processors command cannot be used after a read_data, read_restart,
or create_box command. :dd
{Processors custom grid file is inconsistent} :dt
The values in the custom file are not consistent with the number of
processors you are running on or the Px,Py,Pz settings of the
processors command. Or there was not a setting for every processor. :dd
{Processors grid numa and map style are incompatible} :dt
Using numa for gstyle in the processors command requires using
cart for the map option. :dd
{Processors part option and grid style are incompatible} :dt
Cannot use gstyle numa or custom with the part option. :dd
{Processors twogrid requires proc count be a multiple of core count} :dt
Self-explanatory. :dd
{Pstart and Pstop must have the same value} :dt
Self-explanatory. :dd
{Python function evaluation failed} :dt
The Python function did not run successfully and/or did not return a
value (if it is supposed to return a value). This is probably due to
some error condition in the function. :dd
{Python function is not callable} :dt
The provided Python code was run successfully, but it did not
define a callable function with the required name. :dd
{Python invoke of undefined function} :dt
Cannot invoke a function that has not been previously defined. :dd
{Python variable does not match Python function} :dt
This matching is defined by the python-style variable and the python
command. :dd
{Python variable has no function} :dt
No python command was used to define the function associated with the
python-style variable. :dd
{QEQ with 'newton pair off' not supported} :dt
See the newton command. This is a restriction to use the QEQ fixes. :dd
{R0 < 0 for fix spring command} :dt
Equilibrium spring length is invalid. :dd
{RATTLE coordinate constraints are not satisfied up to desired tolerance} :dt
Self-explanatory. :dd
{RATTLE determinant = 0.0} :dt
The determinant of the matrix being solved for a single cluster
specified by the fix rattle command is numerically invalid. :dd
{RATTLE failed} :dt
Certain constraints were not satisfied. :dd
{RATTLE velocity constraints are not satisfied up to desired tolerance} :dt
Self-explanatory. :dd
{Read data add offset is too big} :dt
It cannot be larger than the size of atom IDs, e.g. the maximum 32-bit
integer. :dd
{Read dump of atom property that isn't allocated} :dt
Self-explanatory. :dd
{Read rerun dump file timestep > specified stop} :dt
Self-explanatory. :dd
{Read restart MPI-IO input not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Read_data shrink wrap did not assign all atoms correctly} :dt
This is typically because the box-size specified in the data file is
large compared to the actual extent of atoms in a shrink-wrapped
dimension. When LAMMPS shrink-wraps the box, atoms will be lost if the
processor they are re-assigned to is too far away. Choose a box
size closer to the actual extent of the atoms. :dd
{Read_dump command before simulation box is defined} :dt
The read_dump command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Read_dump field not found in dump file} :dt
Self-explanatory. :dd
{Read_dump triclinic status does not match simulation} :dt
Both the dump snapshot and the current LAMMPS simulation must
be using either an orthogonal or triclinic box. :dd
{Read_dump xyz fields do not have consistent scaling/wrapping} :dt
Self-explanatory. :dd
{Reading from MPI-IO filename when MPIIO package is not installed} :dt
Self-explanatory. :dd
{Reax_defs.h setting for NATDEF is too small} :dt
Edit the setting in the ReaxFF library and re-compile the
library and re-build LAMMPS. :dd
{Reax_defs.h setting for NNEIGHMAXDEF is too small} :dt
Edit the setting in the ReaxFF library and re-compile the
library and re-build LAMMPS. :dd
{Receiving partition in processors part command is already a receiver} :dt
Cannot specify a partition to be a receiver twice. :dd
{Region ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Region ID for compute reduce/region does not exist} :dt
Self-explanatory. :dd
{Region ID for compute temp/region does not exist} :dt
Self-explanatory. :dd
{Region ID for dump custom does not exist} :dt
Self-explanatory. :dd
{Region ID for fix addforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix atom/swap does not exist} :dt
Self-explanatory. :dd
{Region ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Region ID for fix aveforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix deposit does not exist} :dt
Self-explanatory. :dd
{Region ID for fix efield does not exist} :dt
Self-explanatory. :dd
{Region ID for fix evaporate does not exist} :dt
Self-explanatory. :dd
{Region ID for fix gcmc does not exist} :dt
Self-explanatory. :dd
{Region ID for fix heat does not exist} :dt
Self-explanatory. :dd
{Region ID for fix setforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix wall/region does not exist} :dt
Self-explanatory. :dd
{Region ID for group dynamic does not exist} :dt
Self-explanatory. :dd
{Region ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Region cannot have 0 length rotation vector} :dt
Self-explanatory. :dd
{Region for fix oneway does not exist} :dt
Self-explanatory. :dd
{Region intersect region ID does not exist} :dt
Self-explanatory. :dd
{Region union or intersect cannot be dynamic} :dt
The sub-regions can be dynamic, but not the combined region. :dd
{Region union region ID does not exist} :dt
One or more of the region IDs specified by the region union command
does not exist. :dd
{Replacing a fix, but new style != old style} :dt
A fix ID can be used a 2nd time, but only if the style matches the
previous fix. In this case it is assumed you wish to reset a fix's
parameters. This error may mean you are mistakenly re-using a fix ID
when you do not intend to. :dd
{Replicate command before simulation box is defined} :dt
The replicate command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Replicate did not assign all atoms correctly} :dt
Atoms replicated by the replicate command were not assigned correctly
to processors. This is likely due to some atom coordinates being
outside a non-periodic simulation box. :dd
{Replicated system atom IDs are too big} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Replicated system is too big} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{Required border comm not yet implemented with Kokkos} :dt
There are various limitations in the communication options supported
by Kokkos. :dd
{Rerun command before simulation box is defined} :dt
The rerun command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Rerun dump file does not contain requested snapshot} :dt
Self-explanatory. :dd
{Resetting timestep size is not allowed with fix move} :dt
This is because fix move is moving atoms based on elapsed time. :dd
{Respa inner cutoffs are invalid} :dt
The first cutoff must be <= the second cutoff. :dd
{Respa levels must be >= 1} :dt
Self-explanatory. :dd
{Respa middle cutoffs are invalid} :dt
The first cutoff must be <= the second cutoff. :dd
{Restart file MPI-IO output not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Restart file byte ordering is not recognized} :dt
The file does not appear to be a LAMMPS restart file since it doesn't
contain a recognized byte-ordering flag at the beginning. :dd
{Restart file byte ordering is swapped} :dt
The file was written on a machine with different byte-ordering than
the machine you are reading it on. Convert it to a text data file
instead, on the machine you wrote it on. :dd
{Restart file incompatible with current version} :dt
This is probably because you are trying to read a file created with a
version of LAMMPS that is too old compared to the current version.
Use your older version of LAMMPS and convert the restart file
to a data file. :dd
{Restart file is a MPI-IO file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is a multi-proc file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is not a MPI-IO file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is not a multi-proc file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Restrain atoms %d %d %d %d missing on proc %d at step %ld} :dt
The 4 atoms in a restrain dihedral specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Restrain atoms %d %d %d missing on proc %d at step %ld} :dt
The 3 atoms in a restrain angle specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Restrain atoms %d %d missing on proc %d at step %ld} :dt
The 2 atoms in a restrain bond specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Reuse of compute ID} :dt
A compute ID cannot be used twice. :dd
{Reuse of dump ID} :dt
A dump ID cannot be used twice. :dd
{Reuse of molecule template ID} :dt
The template IDs must be unique. :dd
{Reuse of region ID} :dt
A region ID cannot be used twice. :dd
{Rigid body atoms %d %d missing on proc %d at step %ld} :dt
This means that an atom cannot find the atom that owns the rigid body
it is part of, or vice versa. The solution is to use the communicate
cutoff command to ensure ghost atoms are acquired from far enough away
to encompass the max distance printed when the fix rigid/small command
was invoked. :dd
{Rigid body has degenerate moment of inertia} :dt
Fix poems will only work with bodies (collections of atoms) that have
non-zero principal moments of inertia. This means each body must
consist of 3 or more non-collinear atoms, even with joint atoms
removed. :dd
{Rigid fix must come before NPT/NPH fix} :dt
NPT/NPH fix must be defined in input script after all rigid fixes,
else the rigid fix contribution to the pressure virial is
incorrect. :dd
{Rmask function in equal-style variable formula} :dt
Rmask is a per-atom operation. :dd
{Run command before simulation box is defined} :dt
The run command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Run command start value is after start of run} :dt
Self-explanatory. :dd
{Run command stop value is before end of run} :dt
Self-explanatory. :dd
{Run_style command before simulation box is defined} :dt
The run_style command cannot be used before a read_data,
read_restart, or create_box command. :dd
{SRD bin size for fix srd differs from user request} :dt
Fix SRD had to adjust the bin size to fit the simulation box. See the
cubic keyword if you want this message to be an error vs warning. :dd
{SRD bins for fix srd are not cubic enough} :dt
The bin shape is not within tolerance of cubic. See the cubic
keyword if you want this message to be an error vs warning. :dd
{SRD particle %d started inside big particle %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{SRD particle %d started inside wall %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{Same dimension twice in fix ave/spatial} :dt
Self-explanatory. :dd
{Sending partition in processors part command is already a sender} :dt
Cannot specify a partition to be a sender twice. :dd
{Set command before simulation box is defined} :dt
The set command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Set command floating point vector does not exist} :dt
Self-explanatory. :dd
{Set command integer vector does not exist} :dt
Self-explanatory. :dd
{Set command with no atoms existing} :dt
No atoms are yet defined so the set command cannot be used. :dd
{Set region ID does not exist} :dt
Region ID specified in set command does not exist. :dd
{Shake angles have different bond types} :dt
All 3-atom angle-constrained SHAKE clusters specified by the fix shake
command that are the same angle type, must also have the same bond
types for the 2 bonds in the angle. :dd
{Shake atoms %d %d %d %d missing on proc %d at step %ld} :dt
The 4 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake atoms %d %d %d missing on proc %d at step %ld} :dt
The 3 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake atoms %d %d missing on proc %d at step %ld} :dt
The 2 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake cluster of more than 4 atoms} :dt
A single cluster specified by the fix shake command can have no more
than 4 atoms. :dd
{Shake clusters are connected} :dt
A single cluster specified by the fix shake command must have a single
central atom with up to 3 other atoms bonded to it. :dd
{Shake determinant = 0.0} :dt
The determinant of the matrix being solved for a single cluster
specified by the fix shake command is numerically invalid. :dd
{Shake fix must come before NPT/NPH fix} :dt
NPT fix must be defined in input script after SHAKE fix, else the
SHAKE fix contribution to the pressure virial is incorrect. :dd
{Shear history overflow, boost neigh_modify one} :dt
There are too many neighbors of a single atom. Use the neigh_modify
command to increase the max number of neighbors allowed for one atom.
You may also want to boost the page size. :dd
{Small to big integers are not sized correctly} :dt
This error occurs when the sizes of smallint, imageint, tagint, bigint,
as defined in src/lmptype.h are not what is expected. Contact
the developers if this occurs. :dd
{Smallint setting in lmptype.h is invalid} :dt
It has to be the size of an integer. :dd
{Smallint setting in lmptype.h is not compatible} :dt
Smallint stored in restart file is not consistent with LAMMPS version
you are running. :dd
{Special list size exceeded in fix bond/create} :dt
See the read_data command for info on setting the "extra special per
atom" header value to allow for additional special values to be
stored. :dd
{Specified processors != physical processors} :dt
The 3d grid of processors defined by the processors command does not
match the number of processors LAMMPS is being run on. :dd
{Specified target stress must be uniaxial or hydrostatic} :dt
Self-explanatory. :dd
{Sqrt of negative value in variable formula} :dt
Self-explanatory. :dd
{Subsequent read data induced too many angles per atom} :dt
See the create_box extra/angle/per/atom or read_data "extra angle per
atom" header value to set this limit larger. :dd
{Subsequent read data induced too many bonds per atom} :dt
See the create_box extra/bond/per/atom or read_data "extra bond per
atom" header value to set this limit larger. :dd
{Subsequent read data induced too many dihedrals per atom} :dt
See the create_box extra/dihedral/per/atom or read_data "extra
dihedral per atom" header value to set this limit larger. :dd
{Subsequent read data induced too many impropers per atom} :dt
See the create_box extra/improper/per/atom or read_data "extra
improper per atom" header value to set this limit larger. :dd
{Substitution for illegal variable} :dt
Input script line contained a variable that could not be substituted
for. :dd
{Support for writing images in JPEG format not included} :dt
LAMMPS was not built with the -DLAMMPS_JPEG switch in the Makefile. :dd
{Support for writing images in PNG format not included} :dt
LAMMPS was not built with the -DLAMMPS_PNG switch in the Makefile. :dd
{Support for writing movies not included} :dt
LAMMPS was not built with the -DLAMMPS_FFMPEG switch in the Makefile. :dd
{System in data file is too big} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{System is not charge neutral, net charge = %g} :dt
The total charge on all atoms in the system is not 0.0.
For some KSpace solvers this is an error. :dd
{TAD nsteps must be multiple of t_event} :dt
Self-explanatory. :dd
{TIP4P hydrogen has incorrect atom type} :dt
The TIP4P pairwise computation found an H atom whose type does not
agree with the specified H type. :dd
{TIP4P hydrogen is missing} :dt
The TIP4P pairwise computation failed to find the correct H atom
within a water molecule. :dd
{TMD target file did not list all group atoms} :dt
The target file for the fix tmd command did not list all atoms in the
fix group. :dd
{Tad command before simulation box is defined} :dt
Self-explanatory. :dd
{Tagint setting in lmptype.h is invalid} :dt
Tagint must be as large or larger than smallint. :dd
{Tagint setting in lmptype.h is not compatible} :dt
Format of tagint stored in restart file is not consistent with LAMMPS
version you are running. See the settings in src/lmptype.h :dd
{Target pressure for fix rigid/nph cannot be < 0.0} :dt
Self-explanatory. :dd
{Target pressure for fix rigid/npt/small cannot be < 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix nvt/npt/nph cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/npt cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/npt/small cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/nvt cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/nvt/small cannot be 0.0} :dt
Self-explanatory. :dd
{Temper command before simulation box is defined} :dt
The temper command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Temperature ID for fix bond/swap does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix box/relax does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix nvt/npt does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix press/berendsen does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix rigid nvt/npt/nph does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/berendsen does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/csld does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/csvr does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/rescale does not exist} :dt
Self-explanatory. :dd
{Temperature compute degrees of freedom < 0} :dt
This should not happen if you are calculating the temperature
on a valid set of atoms. :dd
{Temperature control can not be used with fix nph} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/asphere} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/body} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/sphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nphug} :dt
The temp keyword must be provided. :dd
{Temperature control must be used with fix npt} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/asphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/body} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/sphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/asphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/body} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/sllod} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/sphere} :dt
Self-explanatory. :dd
{Temperature control must not be used with fix nph/small} :dt
Self-explanatory. :dd
{Temperature for fix nvt/sllod does not have a bias} :dt
The specified compute must compute temperature with a bias. :dd
{Tempering could not find thermo_pe compute} :dt
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command. :dd
{Tempering fix ID is not defined} :dt
The fix ID specified by the temper command does not exist. :dd
{Tempering temperature fix is not valid} :dt
The fix specified by the temper command is not one that controls
temperature (nvt or langevin). :dd
{Test_descriptor_string already allocated} :dt
This is an internal error. Contact the developers. :dd
{The package gpu command is required for gpu styles} :dt
Self-explanatory. :dd
{Thermo and fix not computed at compatible times} :dt
Fixes generate values on specific timesteps. The thermo output
does not match these timesteps. :dd
{Thermo compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo compute does not compute array} :dt
Self-explanatory. :dd
{Thermo compute does not compute scalar} :dt
Self-explanatory. :dd
{Thermo compute does not compute vector} :dt
Self-explanatory. :dd
{Thermo compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo custom variable cannot be indexed} :dt
Self-explanatory. :dd
{Thermo custom variable is not equal-style variable} :dt
Only equal-style variables can be output with thermodynamics, not
atom-style variables. :dd
{Thermo every variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Thermo fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo fix does not compute array} :dt
Self-explanatory. :dd
{Thermo fix does not compute scalar} :dt
Self-explanatory. :dd
{Thermo fix does not compute vector} :dt
Self-explanatory. :dd
{Thermo fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo keyword in variable requires thermo to use/init pe} :dt
You are using a thermo keyword in a variable that requires
potential energy to be calculated, but your thermo output
does not use it. Add it to your thermo output. :dd
{Thermo keyword in variable requires thermo to use/init press} :dt
You are using a thermo keyword in a variable that requires pressure to
be calculated, but your thermo output does not use it. Add it to your
thermo output. :dd
{Thermo keyword in variable requires thermo to use/init temp} :dt
You are using a thermo keyword in a variable that requires temperature
to be calculated, but your thermo output does not use it. Add it to
your thermo output. :dd
{Thermo style does not use press} :dt
Cannot use thermo_modify to set this parameter since the thermo_style
is not computing this quantity. :dd
{Thermo style does not use temp} :dt
Cannot use thermo_modify to set this parameter since the thermo_style
is not computing this quantity. :dd
{Thermo_modify every variable returned a bad timestep} :dt
The returned timestep is less than or equal to the current timestep. :dd
{Thermo_modify int format does not contain d character} :dt
Self-explanatory. :dd
{Thermo_modify pressure ID does not compute pressure} :dt
The specified compute ID does not compute pressure. :dd
{Thermo_modify temperature ID does not compute temperature} :dt
The specified compute ID does not compute temperature. :dd
{Thermo_style command before simulation box is defined} :dt
The thermo_style command cannot be used before a read_data,
read_restart, or create_box command. :dd
{This variable thermo keyword cannot be used between runs} :dt
Keywords that refer to time (such as cpu, elapsed) do not
make sense in between runs. :dd
{Threshhold for an atom property that isn't allocated} :dt
A dump threshold has been requested on a quantity that is
not defined by the atom style used in this simulation. :dd
{Timestep must be >= 0} :dt
Specified timestep is invalid. :dd
{Too big a problem to use velocity create loop all} :dt
The system size must fit in a 32-bit integer to use this option. :dd
{Too big a timestep for dump dcd} :dt
The timestep must fit in a 32-bit integer to use this dump style. :dd
{Too big a timestep for dump xtc} :dt
The timestep must fit in a 32-bit integer to use this dump style. :dd
{Too few bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too few lines in %s section of data file} :dt
Self-explanatory. :dd
{Too few values in body lines in data file} :dt
Self-explanatory. :dd
{Too few values in body section of molecule file} :dt
Self-explanatory. :dd
{Too many -pk arguments in command line} :dt
The string formed by concatenating the arguments is too long. Use a
package command in the input script instead. :dd
{Too many MSM grid levels} :dt
The max number of MSM grid levels is hardwired to 10. :dd
{Too many args in variable function} :dt
More args are used than any variable function allows. :dd
{Too many atom pairs for pair bop} :dt
The number of atomic pairs exceeds the expected number. Check your
atomic structure to ensure that it is realistic. :dd
{Too many atom sorting bins} :dt
This is likely due to an immense simulation box that has blown up
to a large size. :dd
{Too many atom triplets for pair bop} :dt
The number of three atom groups for angle determinations exceeds the
expected number. Check your atomic structure to ensure that it is
realistic. :dd
{Too many atoms for dump dcd} :dt
The system size must fit in a 32-bit integer to use this dump
style. :dd
{Too many atoms for dump xtc} :dt
The system size must fit in a 32-bit integer to use this dump
style. :dd
{Too many atoms to dump sort} :dt
Cannot sort when running with more than 2^31 atoms. :dd
{Too many exponent bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too many groups} :dt
The maximum number of atom groups (including the "all" group) is
given by MAX_GROUP in group.cpp and is 32. :dd
{Too many iterations} :dt
You must use a number of iterations that fits in a 32-bit integer
for minimization. :dd
{Too many lines in one body in data file - boost MAXBODY} :dt
MAXBODY is a setting at the top of the src/read_data.cpp file.
Set it larger and re-compile the code. :dd
{Too many local+ghost atoms for neighbor list} :dt
The number of nlocal + nghost atoms on a processor
is limited by the size of a 32-bit integer with 2 bits
removed for masking 1-2, 1-3, 1-4 neighbors. :dd
{Too many mantissa bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too many masses for fix shake} :dt
The fix shake command cannot list more masses than there are atom
types. :dd
{Too many molecules for fix poems} :dt
The limit is 2^31 = ~2 billion molecules. :dd
{Too many molecules for fix rigid} :dt
The limit is 2^31 = ~2 billion molecules. :dd
{Too many neighbor bins} :dt
This is likely due to an immense simulation box that has blown up
to a large size. :dd
{Too many timesteps} :dt
The cumulative timesteps must fit in a 64-bit integer. :dd
{Too many timesteps for NEB} :dt
You must use a number of timesteps that fits in a 32-bit integer
for NEB. :dd
{Too many total atoms} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{Too many total bits for bitmapped lookup table} :dt
Table size specified via pair_modify command is too large. Note that
a value of N generates a 2^N size table. :dd
{Too many values in body lines in data file} :dt
Self-explanatory. :dd
{Too many values in body section of molecule file} :dt
Self-explanatory. :dd
{Too much buffered per-proc info for dump} :dt
The size of the buffered string must fit in a 32-bit integer for a
dump. :dd
{Too much per-proc info for dump} :dt
Number of local atoms times number of columns must fit in a 32-bit
integer for dump. :dd
{Tree structure in joint connections} :dt
Fix poems cannot (yet) work with coupled bodies whose joints connect
the bodies in a tree structure. :dd
{Triclinic box skew is too large} :dt
The displacement in a skewed direction must be less than half the box
length in that dimension. E.g. the xy tilt must be between -half and
+half of the x box length. This constraint can be relaxed by using
the box tilt command. :dd
{Tried to convert a double to int, but input_double > INT_MAX} :dt
Self-explanatory. :dd
{Trying to build an occasional neighbor list before initialization completed} :dt
This is not allowed. Source code caller needs to be modified. :dd
{Two fix ave commands using same compute chunk/atom command in incompatible ways} :dt
They are both attempting to "lock" the chunk/atom command so that the
chunk assignments persist for some number of timesteps, but are doing
it in different ways. :dd
{Two groups cannot be the same in fix spring couple} :dt
Self-explanatory. :dd
{USER-CUDA mode requires CUDA variant of min style} :dt
CUDA mode is enabled, so the min style must include a cuda suffix. :dd
{USER-CUDA mode requires CUDA variant of run style} :dt
CUDA mode is enabled, so the run style must include a cuda suffix. :dd
{USER-CUDA package does not yet support comm_style tiled} :dt
Self-explanatory. :dd
{USER-CUDA package requires a cuda enabled atom_style} :dt
Self-explanatory. :dd
{Unable to initialize accelerator for use} :dt
There was a problem initializing an accelerator for the gpu package. :dd
{Unbalanced quotes in input line} :dt
No matching end double quote was found following a leading double
quote. :dd
{Unexpected end of -reorder file} :dt
Self-explanatory. :dd
{Unexpected end of AngleCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of BondCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of DihedralCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of ImproperCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of PairCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of custom file} :dt
Self-explanatory. :dd
{Unexpected end of data file} :dt
LAMMPS hit the end of the data file while attempting to read a
section. Something is wrong with the format of the data file. :dd
{Unexpected end of dump file} :dt
A read operation from the file failed. :dd
{Unexpected end of fix rigid file} :dt
A read operation from the file failed. :dd
{Unexpected end of fix rigid/small file} :dt
A read operation from the file failed. :dd
{Unexpected end of molecule file} :dt
Self-explanatory. :dd
{Unexpected end of neb file} :dt
A read operation from the file failed. :dd
{Units command after simulation box is defined} :dt
The units command cannot be used after a read_data, read_restart, or
create_box command. :dd
{Universe/uloop variable count < # of partitions} :dt
A universe or uloop style variable must specify a number of values >= the
number of processor partitions. :dd
{Unknown angle style} :dt
The choice of angle style is unknown. :dd
{Unknown atom style} :dt
The choice of atom style is unknown. :dd
{Unknown body style} :dt
The choice of body style is unknown. :dd
{Unknown bond style} :dt
The choice of bond style is unknown. :dd
{Unknown category for info is_active()} :dt
Self-explanatory. :dd
{Unknown category for info is_available()} :dt
Self-explanatory. :dd
{Unknown category for info is_defined()} :dt
Self-explanatory. :dd
{Unknown command: %s} :dt
The command is not known to LAMMPS. Check the input script. :dd
{Unknown compute style} :dt
The choice of compute style is unknown. :dd
{Unknown dihedral style} :dt
The choice of dihedral style is unknown. :dd
{Unknown dump reader style} :dt
The choice of dump reader style via the format keyword is unknown. :dd
{Unknown dump style} :dt
The choice of dump style is unknown. :dd
{Unknown error in GPU library} :dt
Self-explanatory. :dd
{Unknown fix style} :dt
The choice of fix style is unknown. :dd
{Unknown identifier in data file: %s} :dt
A section of the data file cannot be read by LAMMPS. :dd
{Unknown improper style} :dt
The choice of improper style is unknown. :dd
{Unknown keyword in thermo_style custom command} :dt
One or more specified keywords are not recognized. :dd
{Unknown kspace style} :dt
The choice of kspace style is unknown. :dd
{Unknown name for info newton category} :dt
Self-explanatory. :dd
{Unknown name for info package category} :dt
Self-explanatory. :dd
{Unknown name for info pair category} :dt
Self-explanatory. :dd
{Unknown pair style} :dt
The choice of pair style is unknown. :dd
{Unknown pair_modify hybrid sub-style} :dt
The choice of sub-style is unknown. :dd
{Unknown region style} :dt
The choice of region style is unknown. :dd
{Unknown section in molecule file} :dt
Self-explanatory. :dd
{Unknown table style in angle style table} :dt
Self-explanatory. :dd
{Unknown table style in bond style table} :dt
Self-explanatory. :dd
{Unknown table style in pair_style command} :dt
Style of table is invalid for use with pair_style table command. :dd
{Unknown unit_style} :dt
Self-explanatory. Check the input script or data file. :dd
{Unrecognized lattice type in MEAM file 1} :dt
The lattice type in an entry of the MEAM library file is not
valid. :dd
{Unrecognized lattice type in MEAM file 2} :dt
The lattice type in an entry of the MEAM parameter file is not
valid. :dd
{Unrecognized pair style in compute pair command} :dt
Self-explanatory. :dd
{Unrecognized virial argument in pair_style command} :dt
Only two options are supported: LAMMPSvirial and KIMvirial. :dd
{Unsupported mixing rule in kspace_style ewald/disp} :dt
Only geometric mixing is supported. :dd
{Unsupported order in kspace_style ewald/disp} :dt
Only 1/r^6 dispersion or dipole terms are supported. :dd
{Unsupported order in kspace_style pppm/disp, pair_style %s} :dt
Only pair styles with 1/r and 1/r^6 dependence are currently supported. :dd
{Use cutoff keyword to set cutoff in single mode} :dt
Mode is single so cutoff/multi keyword cannot be used. :dd
{Use cutoff/multi keyword to set cutoff in multi mode} :dt
Mode is multi so cutoff keyword cannot be used. :dd
{Using fix nvt/sllod with inconsistent fix deform remap option} :dt
Fix nvt/sllod requires that deforming atoms have a velocity profile
provided by "remap v" as a fix deform option. See the example input
after this list. :dd
{Using fix nvt/sllod with no fix deform defined} :dt
Self-explanatory. :dd
{Using fix srd with inconsistent fix deform remap option} :dt
When shearing the box in an SRD simulation, the remap v option for fix
deform needs to be used. :dd
{Using pair lubricate with inconsistent fix deform remap option} :dt
Must use remap v option with fix deform with this pair style. :dd
{Using pair lubricate/poly with inconsistent fix deform remap option} :dt
If fix deform is used, the remap v option is required. :dd
{Using suffix cuda without USER-CUDA package enabled} :dt
Self-explanatory. :dd
{Using suffix gpu without GPU package installed} :dt
Self-explanatory. :dd
{Using suffix intel without USER-INTEL package installed} :dt
Self-explanatory. :dd
{Using suffix kk without KOKKOS package enabled} :dt
Self-explanatory. :dd
{Using suffix omp without USER-OMP package installed} :dt
Self-explanatory. :dd
{Using update dipole flag requires atom attribute mu} :dt
Self-explanatory. :dd
{Using update dipole flag requires atom style sphere} :dt
Self-explanatory. :dd
{Variable ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Variable atom ID is too large} :dt
Specified ID is larger than the maximum allowed atom ID. :dd
{Variable evaluation before simulation box is defined} :dt
Cannot evaluate a compute or fix or atom-based value in a variable
before the simulation has been setup. :dd
{Variable evaluation in fix wall gave bad value} :dt
The returned value for epsilon or sigma < 0.0. :dd
{Variable evaluation in region gave bad value} :dt
Variable returned a radius < 0.0. :dd
{Variable for compute ti is invalid style} :dt
Self-explanatory. :dd
{Variable for create_atoms is invalid style} :dt
The variables must be equal-style variables. :dd
{Variable for displace_atoms is invalid style} :dt
It must be an equal-style or atom-style variable. :dd
{Variable for dump every is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for dump image center is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image persp is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image phi is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image theta is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image zoom is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for fix adapt is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix addforce is invalid style} :dt
Self-explanatory. :dd
{Variable for fix aveforce is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix deform is invalid style} :dt
The variable must be an equal-style variable. :dd
{Variable for fix efield is invalid style} :dt
The variable must be an equal- or atom-style variable. :dd
{Variable for fix gravity is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix heat is invalid style} :dt
Only equal-style or atom-style variables can be used. :dd
{Variable for fix indent is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix indent is not equal style} :dt
Only equal-style variables can be used. :dd
{Variable for fix langevin is invalid style} :dt
It must be an equal-style variable. :dd
{Variable for fix move is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix setforce is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/berendsen is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/csld is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/csvr is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/rescale is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall/reflect is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall/srd is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for group dynamic is invalid style} :dt
The variable must be an atom-style variable. :dd
{Variable for group is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for region cylinder is invalid style} :dt
Only equal-style variables are allowed. :dd
{Variable for region is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for region is not equal style} :dt
Self-explanatory. :dd
{Variable for region sphere is invalid style} :dt
Only equal-style variables are allowed. :dd
{Variable for restart is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for set command is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for thermo every is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for velocity set is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for voronoi radius is not atom style} :dt
Self-explanatory. :dd
{Variable formula compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable has circular dependency} :dt
A circular dependency occurs when variable "a" is used by variable "b"
and variable "b" is also used by variable "a". Circular dependencies
with longer chains of dependence are also not allowed. :dd
{Variable name between brackets must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Variable name for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Variable name for compute reduce does not exist} :dt
Self-explanatory. :dd
{Variable name for compute ti does not exist} :dt
Self-explanatory. :dd
{Variable name for create_atoms does not exist} :dt
Self-explanatory. :dd
{Variable name for displace_atoms does not exist} :dt
Self-explanatory. :dd
{Variable name for dump every does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image center does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image persp does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image phi does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image theta does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image zoom does not exist} :dt
Self-explanatory. :dd
{Variable name for fix adapt does not exist} :dt
Self-explanatory. :dd
{Variable name for fix addforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Variable name for fix aveforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix deform does not exist} :dt
Self-explanatory. :dd
{Variable name for fix efield does not exist} :dt
Self-explanatory. :dd
{Variable name for fix gravity does not exist} :dt
Self-explanatory. :dd
{Variable name for fix heat does not exist} :dt
Self-explanatory. :dd
{Variable name for fix indent does not exist} :dt
Self-explanatory. :dd
{Variable name for fix langevin does not exist} :dt
Self-explanatory. :dd
{Variable name for fix move does not exist} :dt
Self-explanatory. :dd
{Variable name for fix setforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix store/state does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/berendsen does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/csld does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/csvr does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/rescale does not exist} :dt
Self-explanatory. :dd
{Variable name for fix vector does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall/reflect does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall/srd does not exist} :dt
Self-explanatory. :dd
{Variable name for group does not exist} :dt
Self-explanatory. :dd
{Variable name for group dynamic does not exist} :dt
Self-explanatory. :dd
{Variable name for region cylinder does not exist} :dt
Self-explanatory. :dd
{Variable name for region does not exist} :dt
Self-explanatory. :dd
{Variable name for region sphere does not exist} :dt
Self-explanatory. :dd
{Variable name for restart does not exist} :dt
Self-explanatory. :dd
{Variable name for set command does not exist} :dt
Self-explanatory. :dd
{Variable name for thermo every does not exist} :dt
Self-explanatory. :dd
{Variable name for velocity set does not exist} :dt
Self-explanatory. :dd
{Variable name for voronoi radius does not exist} :dt
Self-explanatory. :dd
{Variable name must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Variable uses atom property that isn't allocated} :dt
Self-explanatory. :dd
{Velocity command before simulation box is defined} :dt
The velocity command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Velocity command with no atoms existing} :dt
A velocity command has been used, but no atoms yet exist. :dd
{Velocity ramp in z for a 2d problem} :dt
Self-explanatory. :dd
{Velocity rigid used with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Velocity temperature ID does calculate a velocity bias} :dt
The specified compute must compute a bias for temperature. :dd
{Velocity temperature ID does not compute temperature} :dt
The compute ID given to the velocity command must compute
temperature. :dd
{Verlet/split can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{Verlet/split does not yet support TIP4P} :dt
This is a current limitation. :dd
{Verlet/split requires 2 partitions} :dt
See the -partition command-line switch. :dd
{Verlet/split requires Rspace partition layout be multiple of Kspace partition layout in each dim} :dt
This is controlled by the processors command. :dd
{Verlet/split requires Rspace partition size be multiple of Kspace partition size} :dt
This is so there is an equal number of Rspace processors for every
Kspace processor. :dd
{Virial was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied the virial, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Voro++ error: narea and neigh have a different size} :dt
This error is returned by the Voro++ library. :dd
{Wall defined twice in fix wall command} :dt
Self-explanatory. :dd
{Wall defined twice in fix wall/reflect command} :dt
Self-explanatory. :dd
{Wall defined twice in fix wall/srd command} :dt
Self-explanatory. :dd
{Water H epsilon must be 0.0 for pair style lj/cut/tip4p/cut} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{Water H epsilon must be 0.0 for pair style lj/cut/tip4p/long} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{Water H epsilon must be 0.0 for pair style lj/long/tip4p/long} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{World variable count doesn't match # of partitions} :dt
A world-style variable must specify a number of values equal to the
number of processor partitions. :dd
{Write_data command before simulation box is defined} :dt
Self-explanatory. :dd
{Write_restart command before simulation box is defined} :dt
The write_restart command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Writing to MPI-IO filename when MPIIO package is not installed} :dt
Self-explanatory. :dd
{Zero length rotation vector with displace_atoms} :dt
Self-explanatory. :dd
{Zero length rotation vector with fix move} :dt
Self-explanatory. :dd
{Zero-length lattice orient vector} :dt
Self-explanatory. :dd
:dle
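As a minimal sketch of the fix deform "remap v" requirement mentioned in
the fix nvt/sllod entry above, the two fixes would be paired along these
lines; the fix IDs, shear rate, and thermostat values are placeholders,
not recommended settings:
fix 1 all deform 1 xy erate 0.01 remap v
fix 2 all nvt/sllod temp 1.0 1.0 0.1 :pre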
Warnings: :h4,link(warn)
:dlb
{Adjusting Coulombic cutoff for MSM, new cutoff = %g} :dt
The adjust/cutoff command is turned on and the Coulombic cutoff has been
adjusted to match the user-specified accuracy. :dd
{Angle atoms missing at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle style in data file differs from currently defined angle style} :dt
Self-explanatory. :dd
{Atom style in data file differs from currently defined atom style} :dt
Self-explanatory. :dd
{Bond atom missing in box size check} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atom missing in image check} :dt
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away. :dd
{Bond atoms missing at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond style in data file differs from currently defined bond style} :dt
Self-explanatory. :dd
{Bond/angle/dihedral extent > half of periodic box length} :dt
This is a restriction because LAMMPS can be confused about which image
of an atom in the bonded interaction is the correct one to use.
"Extent" in this context means the maximum end-to-end length of the
bond/angle/dihedral. LAMMPS computes this by taking the maximum bond
length, multiplying by the number of bonds in the interaction (e.g. 3
for a dihedral) and adding a small amount of stretch. :dd
{Both groups in compute group/group have a net charge; the Kspace boundary correction to energy will be non-zero} :dt
Self-explanatory. :dd
{Calling write_dump before a full system init.} :dt
The write_dump command is used before the system has been fully
initialized as part of a 'run' or 'minimize' command. Not all dump
styles and features are fully supported at this point and thus the
command may fail or produce incomplete or incorrect output. Insert
a "run 0" command, if a full system init is required. :dd
{Cannot count rigid body degrees-of-freedom before bodies are fully initialized} :dt
This means the temperature associated with the rigid bodies may be
incorrect on this timestep. :dd
{Cannot count rigid body degrees-of-freedom before bodies are initialized} :dt
This means the temperature associated with the rigid bodies may be
incorrect on this timestep. :dd
{Cannot include log terms without 1/r terms; setting flagHI to 1} :dt
Self-explanatory. :dd
{Cannot include log terms without 1/r terms; setting flagHI to 1.} :dt
Self-explanatory. :dd
{Charges are set, but coulombic solver is not used} :dt
Self-explanatory. :dd
{Charges did not converge at step %ld: %lg} :dt
Self-explanatory. :dd
{Communication cutoff is too small for SNAP micro load balancing, increased to %lf} :dt
Self-explanatory. :dd
{Compute cna/atom cutoff may be too large to find ghost atom neighbors} :dt
The neighbor cutoff used may not encompass enough ghost atoms
to perform this operation correctly. :dd
{Computing temperature of portions of rigid bodies} :dt
The group defined by the temperature compute does not encompass all
the atoms in one or more rigid bodies, so the change in
degrees-of-freedom for the atoms in those partial rigid bodies will
not be accounted for. :dd
{Create_bonds max distance > minimum neighbor cutoff} :dt
This means atom pairs for some atom types may not be in the neighbor
list and thus no bond can be created between them. :dd
{Delete_atoms cutoff > minimum neighbor cutoff} :dt
This means atom pairs for some atom types may not be in the neighbor
list and thus an atom in that pair cannot be deleted. :dd
{Dihedral atoms missing at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral problem} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Dihedral problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Dihedral style in data file differs from currently defined dihedral style} :dt
Self-explanatory. :dd
{Dump dcd/xtc timestamp may be wrong with fix dt/reset} :dt
If the fix changes the timestep, the dump dcd file will not
reflect the change. :dd
{Energy tally does not account for 'zero yes'} :dt
The energy removed by using the 'zero yes' flag is not accounted
for in the energy tally and thus energy conservation cannot be
monitored in this case. :dd
{Estimated error in splitting of dispersion coeffs is %g} :dt
Error is greater than 0.0001 percent. :dd
{Ewald/disp Newton solver failed, using old method to estimate g_ewald} :dt
Self-explanatory. Choosing a different cutoff value may help. :dd
{FENE bond too long} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{FENE bond too long: %ld %d %d %g} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{FENE bond too long: %ld %g} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{Fix SRD walls overlap but fix srd overlap not set} :dt
You likely want to set this in your input script. :dd
{Fix bond/swap will ignore defined angles} :dt
See the doc page for fix bond/swap for more info on this
restriction. :dd
{Fix deposit near setting < possible overlap separation %g} :dt
This test is performed for finite size particles with a diameter, not
for point particles. The near setting is smaller than the particle
diameter which can lead to overlaps. :dd
{Fix evaporate may delete atom with non-zero molecule ID} :dt
This is probably an error, since you should not delete only one atom
of a molecule. :dd
{Fix gcmc using full_energy option} :dt
Fix gcmc has automatically turned on the full_energy option since it
is required for systems like the one specified by the user. User input
included one or more of the following: kspace, triclinic, a hybrid
pair style, an eam pair style, or no "single" function for the pair
style. :dd
{Fix property/atom mol or charge w/out ghost communication} :dt
A model typically needs these properties defined for ghost atoms. :dd
{Fix qeq CG convergence failed (%g) after %d iterations at %ld step} :dt
Self-explanatory. :dd
{Fix qeq has non-zero lower Taper radius cutoff} :dt
Absolute value must be <= 0.01. :dd
{Fix qeq has very low Taper radius cutoff} :dt
Value should typically be >= 5.0. :dd
{Fix qeq/dynamic tolerance may be too small for damped dynamics} :dt
Self-explanatory. :dd
{Fix qeq/fire tolerance may be too small for damped fires} :dt
Self-explanatory. :dd
{Fix rattle should come after all other integration fixes} :dt
This fix is designed to work after all other integration fixes change
atom positions. Thus it should be the last integration fix specified.
If not, it will not satisfy the desired constraints as well as it
otherwise would. :dd
{Fix recenter should come after all other integration fixes} :dt
Other fixes may change the position of the center-of-mass, so
fix recenter should come last. :dd
{Fix srd SRD moves may trigger frequent reneighboring} :dt
This is because the SRD particles may move long distances. :dd
{Fix srd grid size > 1/4 of big particle diameter} :dt
This may cause accuracy problems. :dd
{Fix srd particle moved outside valid domain} :dt
This may indicate a problem with your simulation parameters. :dd
{Fix srd particles may move > big particle diameter} :dt
This may cause accuracy problems. :dd
{Fix srd viscosity < 0.0 due to low SRD density} :dt
This may cause accuracy problems. :dd
{Fix thermal/conductivity comes before fix ave/spatial} :dt
The order of these 2 fixes in your input script is such that fix
thermal/conductivity comes first. If you are using fix ave/spatial to
measure the temperature profile induced by fix viscosity, then this
may cause a glitch in the profile since you are averaging immediately
after swaps have occurred. Flipping the order of the 2 fixes
typically helps. :dd
{Fix viscosity comes before fix ave/spatial} :dt
The order of these 2 fixes in your input script is such that
fix viscosity comes first. If you are using fix ave/spatial
to measure the velocity profile induced by fix viscosity, then
this may cause a glitch in the profile since you are averaging
immediately after swaps have occurred. Flipping the order
of the 2 fixes typically helps. :dd
{Fixes cannot send data in Kokkos communication, switching to classic communication} :dt
This is a current restriction with Kokkos. :dd
{For better accuracy use 'pair_modify table 0'} :dt
The user-specified force accuracy cannot be achieved unless the table
feature is disabled by using 'pair_modify table 0'. :dd
{Geometric mixing assumed for 1/r^6 coefficients} :dt
Self-explanatory. :dd
{Group for fix_modify temp != fix group} :dt
The fix_modify command is specifying a temperature computation that
computes a temperature on a different group of atoms than the fix
itself operates on. This is probably not what you want to do. :dd
{H matrix size has been exceeded: m_fill=%d H.m=%d\n} :dt
This is the size of the matrix. :dd
{Ignoring unknown or incorrect info command flag} :dt
Self-explanatory. An unknown argument was given to the info command.
Compare your input with the documentation. :dd
{Improper atoms missing at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed improper atoms is extreme; you may want
to check your simulation geometry. :dd
{Improper style in data file differs from currently defined improper style} :dt
Self-explanatory. :dd
{Inconsistent image flags} :dt
The image flags for a pair on bonded atoms appear to be inconsistent.
Inconsistent means that when the coordinates of the two atoms are
unwrapped using the image flags, the two atoms are far apart.
Specifically they are further apart than half a periodic box length.
Or they are more than a box length apart in a non-periodic dimension.
This is usually due to the initial data file not having correct image
flags for the 2 atoms in a bond that straddles a periodic boundary.
They should be different by 1 in that case. This is a warning because
inconsistent image flags will not cause problems for dynamics or most
LAMMPS simulations. However they can cause problems when such atoms
are used with the fix rigid or replicate commands. Note that if you
have an infinite periodic crystal with bonds then it is impossible to
have fully consistent image flags, since some bonds will cross
periodic boundaries and connect two atoms with the same image
flag. :dd
{KIM Model does not provide 'energy'; Potential energy will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'forces'; Forces will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'particleEnergy'; energy per atom will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'particleVirial'; virial per atom will be zero} :dt
Self-explanatory. :dd
{Kspace_modify slab param < 2.0 may cause unphysical behavior} :dt
The kspace_modify slab parameter should be larger to insure periodic
grids padded with empty space do not overlap. :dd
{Less insertions than requested} :dt
The fix pour command was unsuccessful at finding open space
for as many particles as it tried to insert. :dd
{Library error in lammps_gather_atoms} :dt
This library function cannot be used if atom IDs are not defined
or are not consecutively numbered. :dd
{Library error in lammps_scatter_atoms} :dt
This library function cannot be used if atom IDs are not defined or
are not consecutively numbered, or if no atom map is defined. See the
atom_modify command for details about atom maps. :dd
{Lost atoms via change_box: original %ld current %ld} :dt
The command options you have used caused atoms to be lost. :dd
{Lost atoms via displace_atoms: original %ld current %ld} :dt
The command options you have used caused atoms to be lost. :dd
{Lost atoms: original %ld current %ld} :dt
Lost atoms are checked for each time thermo output is done. See the
thermo_modify lost command for options. Lost atoms usually indicate
bad dynamics, e.g. atoms have been blown far out of the simulation
box, or moved further than one processor's sub-domain away before
reneighboring. :dd
{MSM mesh too small, increasing to 2 points in each direction} :dt
Self-explanatory. :dd
{Mismatch between velocity and compute groups} :dt
The temperature computation used by the velocity command will not be
on the same group of atoms that velocities are being set for. :dd
{Mixing forced for lj coefficients} :dt
Self-explanatory. :dd
{Molecule attributes do not match system attributes} :dt
An attribute is specified (e.g. diameter, charge) that is
not defined for the specified atom style. :dd
{Molecule has bond topology but no special bond settings} :dt
This means the bonded atoms will not be excluded in pair-wise
interactions. :dd
{Molecule template for create_atoms has multiple molecules} :dt
The create_atoms command will only create molecules of a single type,
i.e. the first molecule in the template. :dd
{Molecule template for fix gcmc has multiple molecules} :dt
The fix gcmc command will only create molecules of a single type,
i.e. the first molecule in the template. :dd
{Molecule template for fix shake has multiple molecules} :dt
The fix shake command will only recognize molecules of a single
type, i.e. the first molecule in the template. :dd
{More than one compute centro/atom} :dt
It is not efficient to use compute centro/atom more than once. :dd
{More than one compute cluster/atom} :dt
It is not efficient to use compute cluster/atom more than once. :dd
{More than one compute cna/atom defined} :dt
It is not efficient to use compute cna/atom more than once. :dd
{More than one compute contact/atom} :dt
It is not efficient to use compute contact/atom more than once. :dd
{More than one compute coord/atom} :dt
It is not efficient to use compute coord/atom more than once. :dd
{More than one compute damage/atom} :dt
It is not efficient to use compute damage/atom more than once. :dd
{More than one compute dilatation/atom} :dt
Self-explanatory. :dd
{More than one compute erotate/sphere/atom} :dt
It is not efficient to use compute erotate/sphere/atom more than once. :dd
{More than one compute hexorder/atom} :dt
It is not efficient to use compute hexorder/atom more than once. :dd
{More than one compute ke/atom} :dt
It is not efficient to use compute ke/atom more than once. :dd
{More than one compute orientorder/atom} :dt
It is not efficient to use compute orientorder/atom more than once. :dd
{More than one compute plasticity/atom} :dt
Self-explanatory. :dd
{More than one compute sna/atom} :dt
Self-explanatory. :dd
{More than one compute snad/atom} :dt
Self-explanatory. :dd
{More than one compute snav/atom} :dt
Self-explanatory. :dd
{More than one fix poems} :dt
It is not efficient to use fix poems more than once. :dd
{More than one fix rigid} :dt
It is not efficient to use fix rigid more than once. :dd
{Neighbor exclusions used with KSpace solver may give inconsistent Coulombic energies} :dt
This is because excluding specific pair interactions also excludes
them from long-range interactions which may not be the desired effect.
The special_bonds command handles this consistently by insuring
excluded (or weighted) 1-2, 1-3, 1-4 interactions are treated
consistently by both the short-range pair style and the long-range
solver. This is not done for exclusions of charged atom pairs via the
neigh_modify exclude command. :dd
{New thermo_style command, previous thermo_modify settings will be lost} :dt
If a thermo_style command is used after a thermo_modify command, the
settings changed by the thermo_modify command will be reset to their
default values. This is because the thermo_modify command acts on
the currently defined thermo style, and a thermo_style command creates
a new style. :dd
{No Kspace calculation with verlet/split} :dt
The 2nd partition performs a kspace calculation so the kspace_style
command must be used. :dd
{No automatic unit conversion to XTC file format conventions possible for units lj} :dt
This means no scaling will be performed. :dd
{No fixes defined, atoms won't move} :dt
If you are not using a fix like nve, nvt, npt then atom velocities and
coordinates will not be updated during timestepping. :dd
{No joints between rigid bodies, use fix rigid instead} :dt
The bodies defined by fix poems are not connected by joints. POEMS
will integrate the body motion, but it would be more efficient to use
fix rigid. :dd
{Not using real units with pair reax} :dt
This is most likely an error, unless you have created your own ReaxFF
parameter file in a different set of units. :dd
{Number of MSM mesh points changed to be a multiple of 2} :dt
MSM requires that the number of grid points in each direction be a multiple
of two and the number of grid points in one or more directions have been
adjusted to meet this requirement. :dd
{OMP_NUM_THREADS environment is not set.} :dt
This environment variable must be set appropriately to use the
USER-OMP package. :dd
{One or more atoms are time integrated more than once} :dt
This is probably an error since you typically do not want to
advance the positions or velocities of an atom more than once
per timestep. :dd
{One or more chunks do not contain all atoms in molecule} :dt
This may not be what you intended. :dd
{One or more dynamic groups may not be updated at correct point in timestep} :dt
If there are other fixes that act immediately after the initial stage
of time integration within a timestep (i.e. after atoms move), then
the command that sets up the dynamic group should appear after those
fixes. This will insure that dynamic group assignments are made
after all atoms have moved. :dd
{One or more respa levels compute no forces} :dt
This is computationally inefficient. :dd
{Pair COMB charge %.10f with force %.10f hit max barrier} :dt
Something is possibly wrong with your model. :dd
{Pair COMB charge %.10f with force %.10f hit min barrier} :dt
Something is possibly wrong with your model. :dd
{Pair brownian needs newton pair on for momentum conservation} :dt
Self-explanatory. :dd
{Pair dpd needs newton pair on for momentum conservation} :dt
Self-explanatory. :dd
{Pair dsmc: num_of_collisions > number_of_A} :dt
Collision model in DSMC is breaking down. :dd
{Pair dsmc: num_of_collisions > number_of_B} :dt
Collision model in DSMC is breaking down. :dd
{Pair style in data file differs from currently defined pair style} :dt
Self-explanatory. :dd
{Particle deposition was unsuccessful} :dt
The fix deposit command was not able to insert as many atoms as
needed. The requested volume fraction may be too high, or other atoms
may be in the insertion region. :dd
{Proc sub-domain size < neighbor skin, could lead to lost atoms} :dt
The decomposition of the physical domain (likely due to load
balancing) has led to a processor's sub-domain being smaller than the
neighbor skin in one or more dimensions. Since reneighboring is
triggered by atoms moving the skin distance, this may lead to lost
atoms, if an atom moves all the way across a neighboring processor's
sub-domain before reneighboring is triggered. :dd
{Reducing PPPM order b/c stencil extends beyond nearest neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Reducing PPPMDisp Coulomb order b/c stencil extends beyond neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Reducing PPPMDisp dispersion order b/c stencil extends beyond neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Replacing a fix, but new group != old group} :dt
The ID and style of a fix match for a fix you are changing with a fix
command, but the new group you are specifying does not match the old
group. :dd
{Replicating in a non-periodic dimension} :dt
The parameters for a replicate command will cause a non-periodic
dimension to be replicated; this may cause unwanted behavior. :dd
{Resetting reneighboring criteria during PRD} :dt
A PRD simulation requires that neigh_modify settings be delay = 0,
every = 1, check = yes. Since these settings were not in place,
LAMMPS changed them and will restore them to their original values
after the PRD simulation. :dd
{Resetting reneighboring criteria during TAD} :dt
A TAD simulation requires that neigh_modify settings be delay = 0,
every = 1, check = yes. Since these settings were not in place,
LAMMPS changed them and will restore them to their original values
after the TAD simulation. :dd
{Resetting reneighboring criteria during minimization} :dt
Minimization requires that neigh_modify settings be delay = 0, every =
1, check = yes. Since these settings were not in place, LAMMPS
changed them and will restore them to their original values after the
minimization. :dd
{Restart file used different # of processors} :dt
The restart file was written out by a LAMMPS simulation running on a
different number of processors. Due to round-off, the trajectories of
your restarted simulation may diverge a little more quickly than if
you ran on the same # of processors. :dd
{Restart file used different 3d processor grid} :dt
The restart file was written out by a LAMMPS simulation running on a
different 3d grid of processors. Due to round-off, the trajectories
of your restarted simulation may diverge a little more quickly than if
you ran on the same # of processors. :dd
{Restart file used different boundary settings, using restart file values} :dt
Your input script cannot change these restart file settings. :dd
{Restart file used different newton bond setting, using restart file value} :dt
The restart file value will override the setting in the input script. :dd
{Restart file used different newton pair setting, using input script value} :dt
The input script value will override the setting in the restart file. :dd
{Restrain problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Running PRD with only one replica} :dt
This is allowed, but you will get no parallel speed-up. :dd
{SRD bin shifting turned on due to small lamda} :dt
This is done to try to preserve accuracy. :dd
{SRD bin size for fix srd differs from user request} :dt
Fix SRD had to adjust the bin size to fit the simulation box. See the
cubic keyword if you want this message to be an error vs warning. :dd
{SRD bins for fix srd are not cubic enough} :dt
The bin shape is not within tolerance of cubic. See the cubic
keyword if you want this message to be an error vs warning. :dd
{SRD particle %d started inside big particle %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{SRD particle %d started inside wall %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{Shake determinant < 0.0} :dt
The determinant of the quadratic equation being solved for a single
cluster specified by the fix shake command is numerically suspect. LAMMPS
will set it to 0.0 and continue. :dd
{Shell command '%s' failed with error '%s'} :dt
Self-explanatory. :dd
{Shell command returned with non-zero status} :dt
This may indicate the shell command did not operate as expected. :dd
{Should not allow rigid bodies to bounce off relecting walls} :dt
LAMMPS allows this, but their dynamics are not computed correctly. :dd
{Should not use fix nve/limit with fix shake or fix rattle} :dt
This will lead to invalid constraint forces in the SHAKE/RATTLE
computation. :dd
{Simulations might be very slow because of large number of structure factors} :dt
Self-explanatory. :dd
{Slab correction not needed for MSM} :dt
Slab correction is intended to be used with Ewald or PPPM and is not needed by MSM. :dd
{System is not charge neutral, net charge = %g} :dt
The total charge on all atoms in the system is not 0.0.
For some KSpace solvers this is only a warning. :dd
{Table inner cutoff >= outer cutoff} :dt
You specified an inner cutoff for a Coulombic table that is longer
than the global cutoff. Probably not what you wanted. :dd
{Temperature for MSST is not for group all} :dt
User-assigned temperature to MSST fix does not compute temperature for
all atoms. Since MSST computes a global pressure, the kinetic energy
contribution from the temperature is assumed to also be for all atoms.
Thus the pressure used by MSST could be inaccurate. :dd
{Temperature for NPT is not for group all} :dt
User-assigned temperature to NPT fix does not compute temperature for
all atoms. Since NPT computes a global pressure, the kinetic energy
contribution from the temperature is assumed to also be for all atoms.
Thus the pressure used by NPT could be inaccurate. :dd
{Temperature for fix modify is not for group all} :dt
The temperature compute is being used with a pressure calculation
which does operate on group all, so this may be inconsistent. :dd
{Temperature for thermo pressure is not for group all} :dt
User-assigned temperature to thermo via the thermo_modify command does
not compute temperature for all atoms. Since thermo computes a global
pressure, the kinetic energy contribution from the temperature is
assumed to also be for all atoms. Thus the pressure printed by thermo
could be inaccurate. :dd
{The fix ave/spatial command has been replaced by the more flexible fix ave/chunk and compute chunk/atom commands -- fix ave/spatial will be removed in the summer of 2015} :dt
Self-explanatory. :dd
{The minimizer does not re-orient dipoles when using fix efield} :dt
This means that only the atom coordinates will be minimized,
not the orientation of the dipoles. :dd
{Too many common neighbors in CNA %d times} :dt
More than the maximum # of neighbors was found multiple times. This
was unexpected. :dd
{Too many inner timesteps in fix ttm} :dt
Self-explanatory. :dd
{Too many neighbors in CNA for %d atoms} :dt
More than the maximum # of neighbors was found multiple times. This
was unexpected. :dd
{Triclinic box skew is large} :dt
The displacement in a skewed direction is normally required to be less
than half the box length in that dimension. E.g. the xy tilt must be
between -half and +half of the x box length. You have relaxed the
constraint using the box tilt command, but the warning means that a
LAMMPS simulation may be inefficient as a result. :dd
{Use special bonds = 0,1,1 with bond style fene} :dt
Most FENE models need this setting for the special_bonds command; see
the example at the end of this list. :dd
{Use special bonds = 0,1,1 with bond style fene/expand} :dt
Most FENE models need this setting for the special_bonds command. :dd
{Using a manybody potential with bonds/angles/dihedrals and special_bond exclusions} :dt
This is likely not what you want to do. The exclusion settings will
eliminate neighbors in the neighbor list, which the manybody potential
needs to calculate its terms correctly. :dd
{Using compute temp/deform with inconsistent fix deform remap option} :dt
Fix nvt/sllod assumes deforming atoms have a velocity profile provided
by "remap v" or "remap none" as a fix deform option. :dd
{Using compute temp/deform with no fix deform defined} :dt
This is probably an error, since it makes little sense to use
compute temp/deform in this case. :dd
{Using fix srd with box deformation but no SRD thermostat} :dt
The deformation will heat the SRD particles so this can
be dangerous. :dd
{Using kspace solver on system with no charge} :dt
Self-explanatory. :dd
{Using largest cut-off for lj/long/dipole/long long long} :dt
Self-explanatory. :dd
{Using largest cutoff for buck/long/coul/long} :dt
Self-explanatory. :dd
{Using largest cutoff for lj/long/coul/long} :dt
Self-explanatory. :dd
{Using largest cutoff for pair_style lj/long/tip4p/long} :dt
Self-explanatory. :dd
{Using package gpu without any pair style defined} :dt
Self-explanatory. :dd
{Using pair potential shift with pair_modify compute no} :dt
The shift effects will thus not be computed. :dd
{Using pair tail corrections with nonperiodic system} :dt
This is probably a bogus thing to do, since tail corrections are
computed by integrating the density of a periodic system out to
infinity. :dd
{Using pair tail corrections with pair_modify compute no} :dt
The tail corrections will thus not be computed. :dd
{pair style reax is now deprecated and will soon be retired. Users should switch to pair_style reax/c} :dt
Self-explanatory. :dd
:dle
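As a minimal sketch of the special_bonds = 0,1,1 setting referred to by
the FENE warnings above, the weighting is applied alongside the bond
style definition; the bond coefficients shown are placeholders:
special_bonds lj/coul 0.0 1.0 1.0
bond_style fene
bond_coeff * 30.0 1.5 1.0 1.0 :pre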
diff --git a/doc/src/Section_packages.txt b/doc/src/Section_packages.txt
index 341483d7a..0e4b6037c 100644
--- a/doc/src/Section_packages.txt
+++ b/doc/src/Section_packages.txt
@@ -1,1873 +1,1874 @@
"Previous Section"_Section_commands.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_accelerate.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
4. Packages :h3
This section gives an overview of the add-on optional packages that
extend LAMMPS functionality. Packages are groups of files that enable
a specific set of features. For example, force fields for molecular
systems or granular systems are in packages. You can see the list of
all packages by typing "make package" from within the src directory of
the LAMMPS distribution.
Here are links for two tables below, which list standard and user
packages.
4.1 "Standard packages"_#pkg_1
4.2 "User packages"_#pkg_2 :all(b)
"Section 2.3"_Section_start.html#start_3 of the manual describes
the difference between standard packages and user packages. It also
has general details on how to include/exclude specific packages as
part of the LAMMPS build process, and on how to build auxiliary
libraries or modify a machine Makefile if a package requires it.
Following the two tables below, is a sub-section for each package. It
has a summary of what the package contains. It has specific
instructions on how to install it, build or obtain any auxiliary
library it requires, and any Makefile.machine changes it requires. It
also lists pointers to examples of its use or documentation provided
in the LAMMPS distribution. If you want to know the complete list of
commands that a package adds to LAMMPS, simply list the files in its
directory, e.g. "ls src/GRANULAR". Source files with names that start
with compute, fix, pair, bond, etc correspond to command styles with
the same names.
NOTE: The USER package sub-sections below are still being filled in,
as of March 2016.
Unless otherwise noted below, every package is independent of all the
others. I.e. any package can be included or excluded in a LAMMPS
build, independent of all other packages. However, note that some
packages include commands derived from commands in other packages. If
the other package is not installed, the derived command from the new
package will also not be installed when you include the new one.
E.g. the pair lj/cut/coul/long/omp command from the USER-OMP package
will not be installed as part of the USER-OMP package if the KSPACE
package is not also installed, since it contains the pair
lj/cut/coul/long command. If you later install the KSPACE package and
the USER-OMP package is already installed, both the pair
lj/cut/coul/long and lj/cut/coul/long/omp commands will be installed.
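For example, to be sure a derived style such as pair lj/cut/coul/long/omp
is built, install both the package that provides the parent style and the
package that provides the derived one, then re-compile. This is just the
generic install pattern described for each package below, with "machine"
standing in for your Makefile.machine suffix:
make yes-kspace
make yes-user-omp
make machine :pre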
:line
4.1 Standard packages :h4,link(pkg_1)
The current list of standard packages is as follows. Each package
name links to a sub-section below with more details.
Package, Description, Author(s), Doc page, Example, Library
"ASPHERE"_#ASPHERE, aspherical particles, -, "Section 6.6.14"_Section_howto.html#howto_14, ellipse, -
"BODY"_#BODY, body-style particles, -, "body"_body.html, body, -
"CLASS2"_#CLASS2, class 2 force fields, -, "pair_style lj/class2"_pair_class2.html, -, -
"COLLOID"_#COLLOID, colloidal particles, Kumar (1), "atom_style colloid"_atom_style.html, colloid, -
"COMPRESS"_#COMPRESS, I/O compression, Axel Kohlmeyer (Temple U), "dump */gz"_dump.html, -, -
"CORESHELL"_#CORESHELL, adiabatic core/shell model, Hendrik Heenen (Technical U of Munich), "Section 6.6.25"_Section_howto.html#howto_25, coreshell, -
"DIPOLE"_#DIPOLE, point dipole particles, -, "pair_style dipole/cut"_pair_dipole.html, dipole, -
"GPU"_#GPU, GPU-enabled styles, Mike Brown (ORNL), "Section 5.3.1"_accelerate_gpu.html, gpu, lib/gpu
"GRANULAR"_#GRANULAR, granular systems, -, "Section 6.6.6"_Section_howto.html#howto_6, pour, -
"KIM"_#KIM, openKIM potentials, Smirichinski & Elliot & Tadmor (3), "pair_style kim"_pair_kim.html, kim, KIM
"KOKKOS"_#KOKKOS, Kokkos-enabled styles, Trott & Moore (4), "Section 5.3.3"_accelerate_kokkos.html, kokkos, lib/kokkos
"KSPACE"_#KSPACE, long-range Coulombic solvers, -, "kspace_style"_kspace_style.html, peptide, -
"MANYBODY"_#MANYBODY, many-body potentials, -, "pair_style tersoff"_pair_tersoff.html, shear, -
"MEAM"_#MEAM, modified EAM potential, Greg Wagner (Sandia), "pair_style meam"_pair_meam.html, meam, lib/meam
"MC"_#MC, Monte Carlo options, -, "fix gcmc"_fix_gcmc.html, -, -
"MOLECULE"_#MOLECULE, molecular system force fields, -, "Section 6.6.3"_Section_howto.html#howto_3, peptide, -
"OPT"_#OPT, optimized pair styles, Fischer & Richie & Natoli (2), "Section 5.3.5"_accelerate_opt.html, -, -
"PERI"_#PERI, Peridynamics models, Mike Parks (Sandia), "pair_style peri"_pair_peri.html, peri, -
"POEMS"_#POEMS, coupled rigid body motion, Rudra Mukherjee (JPL), "fix poems"_fix_poems.html, rigid, lib/poems
"PYTHON"_#PYTHON, embed Python code in an input script, -, "python"_python.html, python, lib/python
"REAX"_#REAX, ReaxFF potential, Aidan Thompson (Sandia), "pair_style reax"_pair_reax.html, reax, lib/reax
"REPLICA"_#REPLICA, multi-replica methods, -, "Section 6.6.5"_Section_howto.html#howto_5, tad, -
"RIGID"_#RIGID, rigid bodies, -, "fix rigid"_fix_rigid.html, rigid, -
"SHOCK"_#SHOCK, shock loading methods, -, "fix msst"_fix_msst.html, -, -
"SNAP"_#SNAP, quantum-fit potential, Aidan Thompson (Sandia), "pair snap"_pair_snap.html, snap, -
"SRD"_#SRD, stochastic rotation dynamics, -, "fix srd"_fix_srd.html, srd, -
"VORONOI"_#VORONOI, Voronoi tesselations, Daniel Schwen (LANL), "compute voronoi/atom"_compute_voronoi_atom.html, -, Voro++
:tb(ea=c)
The "Authors" column lists a name(s) if a specific person is
responible for creating and maintaining the package.
(1) The COLLOID package includes Fast Lubrication Dynamics pair styles
which were created by Amit Kumar and Michael Bybee from Jonathan
Higdon's group at UIUC.
(2) The OPT package was created by James Fischer (High Performance
Technologies), David Richie, and Vincent Natoli (Stone Ridge
Technology).
(3) The KIM package was created by Valeriu Smirichinski, Ryan Elliott,
and Ellad Tadmor (U Minn).
(4) The KOKKOS package was created primarily by Christian Trott and
Stan Moore (Sandia). It uses the Kokkos library which was developed
by Carter Edwards, Christian Trott, and others at Sandia.
The "Doc page" column links to either a sub-section of the
"Section 6"_Section_howto.html of the manual, or an input script
command implemented as part of the package, or to additional
documentation provided within the package.
The "Example" column is a sub-directory in the examples directory of
the distribution which has an input script that uses the package.
E.g. "peptide" refers to the examples/peptide directory.
The "Library" column lists an external library which must be built
first and which LAMMPS links to when it is built. If it is listed as
lib/package, then the code for the library is under the lib directory
of the LAMMPS distribution. See the lib/package/README file for info
on how to build the library. If it is not listed as lib/package, then
it is a third-party library not included in the LAMMPS distribution.
See details on all of this below for individual packages.
:line
ASPHERE package :link(ASPHERE),h5
Contents: Several computes, time-integration fixes, and pair styles
for aspherical particle models: ellipsoids, 2d lines, 3d triangles.
To install via make or Make.py:
make yes-asphere
make machine :pre
Make.py -p asphere -a machine :pre
To un-install via make or Make.py:
make no-asphere
make machine :pre
Make.py -p ^asphere -a machine :pre
Supporting info: "Section 6.14"_Section_howto.html#howto_14,
"pair_style gayberne"_pair_gayberne.html, "pair_style
resquared"_pair_resquared.html,
"doc/PDF/pair_gayberne_extra.pdf"_PDF/pair_gayberne_extra.pdf,
"doc/PDF/pair_resquared_extra.pdf"_PDF/pair_resquared_extra.pdf,
examples/ASPHERE, examples/ellipse
:line
BODY package :link(BODY),h5
Contents: Support for body-style particles. Computes,
time-integration fixes, pair styles, as well as the body styles
themselves. See the "body"_body.html doc page for an overview.
To install via make or Make.py:
make yes-body
make machine :pre
Make.py -p body -a machine :pre
To un-install via make or Make.py:
make no-body
make machine :pre
Make.py -p ^body -a machine :pre
Supporting info: "atom_style body"_atom_style.html, "body"_body.html,
"pair_style body"_pair_body.html, examples/body
:line
CLASS2 package :link(CLASS2),h5
Contents: Bond, angle, dihedral, improper, and pair styles for the
COMPASS CLASS2 molecular force field.
To install via make or Make.py:
make yes-class2
make machine :pre
Make.py -p class2 -a machine :pre
To un-install via make or Make.py:
make no-class2
make machine :pre
Make.py -p ^class2 -a machine :pre
Supporting info: "bond_style class2"_bond_class2.html, "angle_style
class2"_angle_class2.html, "dihedral_style
class2"_dihedral_class2.html, "improper_style
class2"_improper_class2.html, "pair_style lj/class2"_pair_class2.html
:line
COLLOID package :link(COLLOID),h5
Contents: Support for coarse-grained colloidal particles. Wall fix
and pair styles that implement colloidal interaction models for
finite-size particles. This includes the Fast Lubrication Dynamics
method for hydrodynamic interactions, which is a simplified
approximation to Stokesian dynamics.
To install via make or Make.py:
make yes-colloid
make machine :pre
Make.py -p colloid -a machine :pre
To un-install via make or Make.py:
make no-colloid
make machine :pre
Make.py -p ^colloid -a machine :pre
Supporting info: "fix wall/colloid"_fix_wall.html, "pair_style
colloid"_pair_colloid.html, "pair_style
yukawa/colloid"_pair_yukawa_colloid.html, "pair_style
brownian"_pair_brownian.html, "pair_style
lubricate"_pair_lubricate.html, "pair_style
lubricateU"_pair_lubricateU.html, examples/colloid, examples/srd
:line
COMPRESS package :link(COMPRESS),h5
Contents: Support for compressed output of dump files via the zlib
compression library, using dump styles with a "gz" in their style
name.
Building with the COMPRESS package assumes you have the zlib
compression library available on your system. The build uses the
lib/compress/Makefile.lammps file in the compile/link process. You
should only need to edit this file if the LAMMPS build cannot find the
zlib info it specifies.
To install via make or Make.py:
make yes-compress
make machine :pre
Make.py -p compress -a machine :pre
To un-install via make or Make.py:
make no-compress
make machine :pre
Make.py -p ^compress -a machine :pre
Supporting info: src/COMPRESS/README, lib/compress/README, "dump
atom/gz"_dump.html, "dump cfg/gz"_dump.html, "dump
custom/gz"_dump.html, "dump xyz/gz"_dump.html
:line
CORESHELL package :link(CORESHELL),h5
Contents: Compute and pair styles that implement the adiabatic
core/shell model for polarizability. The compute temp/cs command
measures the temperature of a system with core/shell particles. The
pair styles augment Born, Buckingham, and Lennard-Jones styles with
core/shell capabilities. See "Section 6.26"_Section_howto.html#howto_26
for an overview of how to use the package.
To install via make or Make.py:
make yes-coreshell
make machine :pre
Make.py -p coreshell -a machine :pre
To un-install via make or Make.py:
make no-coreshell
make machine :pre
Make.py -p ^coreshell -a machine :pre
Supporting info: "Section 6.26"_Section_howto.html#howto_26,
"compute temp/cs"_compute_temp_cs.html,
"pair_style born/coul/long/cs"_pair_cs.html, "pair_style
buck/coul/long/cs"_pair_cs.html, pair_style
lj/cut/coul/long/cs"_pair_lj.html, examples/coreshell
:line
DIPOLE package :link(DIPOLE),h5
Contents: An atom style and several pair styles to support point
dipole models with short-range or long-range interactions.
To install via make or Make.py:
make yes-dipole
make machine :pre
Make.py -p dipole -a machine :pre
To un-install via make or Make.py:
make no-dipole
make machine :pre
Make.py -p ^dipole -a machine :pre
Supporting info: "atom_style dipole"_atom_style.html, "pair_style
lj/cut/dipole/cut"_pair_dipole.html, "pair_style
lj/cut/dipole/long"_pair_dipole.html, "pair_style
lj/long/dipole/long"_pair_dipole.html, examples/dipole
:line
GPU package :link(GPU),h5
Contents: Dozens of pair styles and a version of the PPPM long-range
Coulombic solver for NVIDIA GPUs. All of them have a "gpu" in their
style name. "Section 5.3.1"_accelerate_gpu.html gives
details of what hardware and Cuda software is required on your system,
and how to build and use this package. See the KOKKOS package, which
also has GPU-enabled styles.
Building LAMMPS with the GPU package requires first building the GPU
library itself, which is a set of C and Cuda files in lib/gpu.
Details of how to do this are in lib/gpu/README. As illustrated
below, perform a "make" using one of the Makefile.machine files in
lib/gpu which should create a lib/gpu/libgpu.a file.
Makefile.linux.* and Makefile.xk7 are examples for different
platforms. There are 3 important settings in the Makefile.machine you
use:
CUDA_HOME = where NVIDIA Cuda software is installed on your system
CUDA_ARCH = appropriate to your GPU hardware
CUDA_PREC = precision (double, mixed, single) you desire :ul
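As an illustration only, these settings might look as follows in a
Makefile.machine; the path, architecture flag, and precision flag below
are placeholders, so consult the shipped Makefiles in lib/gpu for the
exact values your GPU hardware and LAMMPS version expect:
CUDA_HOME = /usr/local/cuda
CUDA_ARCH = -arch=sm_35
CUDA_PREC = -D_SINGLE_DOUBLE :pre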
See example Makefile.machine files in lib/gpu for the syntax of these
settings. See lib/gpu/Makefile.linux.double for ARCH settings for
various NVIDIA GPUs. The "make" also creates a
lib/gpu/Makefile.lammps file. This file has settings that enable
LAMMPS to link with Cuda libraries. If the settings in
Makefile.lammps for your machine are not correct, the LAMMPS link will
fail. Note that the Make.py script has a "-gpu" option to allow the
GPU library (with several of its options) and LAMMPS to be built in
one step. Type "python src/Make.py -h -gpu" to see the details.
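As a rough, hypothetical illustration only (the values below are
assumptions; copy the exact variable spellings and syntax from the example
Makefile.machine files in lib/gpu), these three settings might look as
follows for a Linux box with a Kepler-class GPU and mixed precision:
CUDA_HOME = /usr/local/cuda
CUDA_ARCH = -arch=sm_35
CUDA_PREC = -D_SINGLE_DOUBLE :pre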
To install via make or Make.py:
cd ~/lammps/lib/gpu
make -f Makefile.linux.mixed # for example
cd ~/lammps/src
make yes-gpu
make machine :pre
Make.py -p gpu -gpu mode=mixed arch=35 -a machine :pre
To un-install via make or Make.py:
make no-gpu
make machine :pre
Make.py -p ^gpu -a machine :pre
Supporting info: src/GPU/README, lib/gpu/README,
"Section 5.3"_Section_accelerate.html#acc_3,
"Section 5.3.1"_accelerate_gpu.html,
Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
for any pair style listed with a (g),
"kspace_style"_kspace_style.html, "package gpu"_package.html,
examples/accelerate, bench/FERMI, bench/KEPLER
:line
GRANULAR package :link(GRANULAR),h5
Contents: Fixes and pair styles that support models of finite-size
granular particles, which interact with each other and boundaries via
frictional and dissipative potentials.
To install via make or Make.py:
make yes-granular
make machine :pre
Make.py -p granular -a machine :pre
To un-install via make or Make.py:
make no-granular
make machine :pre
Make.py -p ^granular -a machine :pre
Supporting info: "Section 6.6"_Section_howto.html#howto_6, "fix
pour"_fix_pour.html, "fix wall/gran"_fix_wall_gran.html, "pair_style
gran/hooke"_pair_gran.html, "pair_style
gran/hertz/history"_pair_gran.html, examples/pour, bench/in.chute
:line
KIM package :link(KIM),h5
Contents: A pair style that interfaces to the Knowledge Base for
Interatomic Models (KIM) repository of interatomic potentials, so that
KIM potentials can be used in a LAMMPS simulation.
To build LAMMPS with the KIM package you must have previously
installed the KIM API (library) on your system. The lib/kim/README
file explains how to download and install KIM. Building with the KIM
package also uses the lib/kim/Makefile.lammps file in the compile/link
process. You should not need to edit this file.
To install via make or Make.py:
make yes-kim
make machine :pre
Make.py -p kim -a machine :pre
To un-install via make or Make.py:
make no-kim
make machine :pre
Make.py -p ^kim -a machine :pre
Supporting info: src/KIM/README, lib/kim/README, "pair_style
kim"_pair_kim.html, examples/kim
:line
KOKKOS package :link(KOKKOS),h5
Contents: Dozens of atom, pair, bond, angle, dihedral, improper styles
which run with the Kokkos library to provide optimization for
multicore CPUs (via OpenMP), NVIDIA GPUs, or the Intel Xeon Phi (in
native mode). All of them have a "kk" in their style name. "Section
5.3.3"_accelerate_kokkos.html gives details of what
hardware and software is required on your system, and how to build and
use this package. See the GPU, OPT, USER-INTEL, USER-OMP packages,
which also provide optimizations for the same range of hardware.
Building with the KOKKOS package requires choosing which of 3 hardware
options you are optimizing for: CPU acceleration via OpenMP, GPU
acceleration, or Intel Xeon Phi. (You can build multiple times to
create LAMMPS executables for different hardware.) It also requires a
C++11 compatible compiler. For GPUs, the NVIDIA "nvcc" compiler is
used, and an appropriate KOKKOS_ARCH setting should be made in your
Makefile.machine for your GPU hardware and NVIDIA software.
The simplest way to do this is to use Makefile.kokkos_cuda or
Makefile.kokkos_omp or Makefile.kokkos_phi in src/MAKE/OPTIONS, via
"make kokkos_cuda" or "make kokkos_omp" or "make kokkos_phi". (Check
the KOKKOS_ARCH setting in Makefile.kokkos_cuda.) Or, as illustrated
below, you can use the Make.py script with its "-kokkos" option to
choose which hardware to build for. Type "python src/Make.py -h
-kokkos" to see the details. If these methods do not work on your
system, you will need to read the "Section 5.3.3"_accelerate_kokkos.html
doc page for details of what Makefile.machine settings are needed.
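As a hedged sketch, the GPU-related lines you would check or edit in
Makefile.kokkos_cuda typically look like the following (the architecture
keyword is only an example for a Kepler GK110 card; the exact variable
names may differ in your copy of the Makefile):
KOKKOS_DEVICES = Cuda
KOKKOS_ARCH = Kepler35 :pre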
To install via make or Make.py for each of 3 hardware options:
make yes-kokkos
make kokkos_omp # for CPUs with OpenMP
make kokkos_cuda # for GPUs, check the KOKKOS_ARCH setting in Makefile.kokkos_cuda
make kokkos_phi # for Xeon Phis :pre
Make.py -p kokkos -kokkos omp -a machine # for CPUs with OpenMP
Make.py -p kokkos -kokkos cuda arch=35 -a machine # for GPUs of style arch
Make.py -p kokkos -kokkos phi -a machine # for Xeon Phis
To un-install via make or Make.py:
make no-kokkos
make machine :pre
Make.py -p ^kokkos -a machine :pre
Supporting info: src/KOKKOS/README, lib/kokkos/README,
"Section 5.3"_Section_accelerate.html#acc_3,
"Section 5.3.3"_accelerate_kokkos.html,
Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
for any pair style listed with a (k), "package kokkos"_package.html,
examples/accelerate, bench/FERMI, bench/KEPLER
:line
KSPACE package :link(KSPACE),h5
Contents: A variety of long-range Coulombic solvers, and pair styles
which compute the corresponding short-range portion of the pairwise
Coulombic interactions. These include Ewald, particle-particle
particle-mesh (PPPM), and multilevel summation method (MSM) solvers.
Building with the KSPACE package requires a 1d FFT library be present
on your system for use by the PPPM solvers. This can be the KISS FFT
library provided with LAMMPS, or 3rd party libraries like FFTW or a
vendor-supplied FFT library. See step 6 of "Section
2.2.2"_Section_start.html#start_2_2 of the manual for details of how
to select different FFT options in your machine Makefile. The Make.py
tool has an "-fft" option which can insert these settings into your
machine Makefile automatically. Type "python src/Make.py -h -fft" to
see the details.
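For example, selecting the FFTW3 library in a machine Makefile usually
amounts to settings along these lines (a sketch only; include and library
paths vary by system and may need to be added to FFT_INC and FFT_PATH):
FFT_INC = -DFFT_FFTW3
FFT_PATH =
FFT_LIB = -lfftw3 :pre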
To install via make or Make.py:
make yes-kspace
make machine :pre
Make.py -p kspace -a machine :pre
To un-install via make or Make.py:
make no-kspace
make machine :pre
Make.py -p ^kspace -a machine :pre
Supporting info: "kspace_style"_kspace_style.html,
"doc/PDF/kspace.pdf"_PDF/kspace.pdf,
"Section 6.7"_Section_howto.html#howto_7,
"Section 6.8"_Section_howto.html#howto_8,
"Section 6.9"_Section_howto.html#howto_9,
"pair_style coul"_pair_coul.html, other pair style command doc pages
which have "long" or "msm" in their style name,
examples/peptide, bench/in.rhodo
:line
MANYBODY package :link(MANYBODY),h5
Contents: A variety of many-body and bond-order potentials. These
include (AI)REBO, EAM, EIM, BOP, Stillinger-Weber, and Tersoff
potentials. Do a directory listing, "ls src/MANYBODY", to see
the full list.
To install via make or Make.py:
make yes-manybody
make machine :pre
Make.py -p manybody -a machine :pre
To un-install via make or Make.py:
make no-manybody
make machine :pre
Make.py -p ^manybody -a machine :pre
Supporting info:
Examples: Pair Styles section of "Section
3.5"_Section_commands.html#cmd_5, examples/comb, examples/eim,
examples/nb3d, examples/vashishta
:line
MC package :link(MC),h5
Contents: Several fixes and a pair style that have Monte Carlo (MC) or
MC-like attributes. These include fixes for creating, breaking, and
swapping bonds, and for performing atomic swaps and grand-canonical MC
in conjunction with dynamics.
To install via make or Make.py:
make yes-mc
make machine :pre
Make.py -p mc -a machine :pre
To un-install via make or Make.py:
make no-mc
make machine :pre
Make.py -p ^mc -a machine :pre
Supporting info: "fix atom/swap"_fix_atom_swap.html, "fix
bond/break"_fix_bond_break.html, "fix
bond/create"_fix_bond_create.html, "fix bond/swap"_fix_bond_swap.html,
"fix gcmc"_fix_gcmc.html, "pair_style dsmc"_pair_dsmc.html
:line
MEAM package :link(MEAM),h5
Contents: A pair style for the modified embedded atom (MEAM)
potential.
Building LAMMPS with the MEAM package requires first building the MEAM
library itself, which is a set of Fortran 95 files in lib/meam.
Details of how to do this are in lib/meam/README. As illustrated
below, perform a "make" using one of the Makefile.machine files in
lib/meam which should create a lib/meam/libmeam.a file.
Makefile.gfortran and Makefile.ifort are examples for the GNU Fortran
and Intel Fortran compilers. The "make" also copies a
lib/meam/Makefile.lammps.machine file to lib/meam/Makefile.lammps.
This file has settings that enable the C++ compiler used to build
LAMMPS to link with a Fortran library (typically the 2 compilers need to
be consistent, e.g. both Intel compilers or both GNU compilers). If the
settings in Makefile.lammps for your compilers and machine are not
correct, the LAMMPS link will fail. Note that the Make.py script has
a "-meam" option to allow the MEAM library and LAMMPS to be built in
one step. Type "python src/Make.py -h -meam" to see the details.
NOTE: The MEAM potential can run dramatically faster if built with the
Intel Fortran compiler, rather than the GNU Fortran compiler.
To install via make or Make.py:
cd ~/lammps/lib/meam
make -f Makefile.gfortran # for example
cd ~/lammps/src
make yes-meam
make machine :pre
Make.py -p meam -meam make=gfortran -a machine :pre
To un-install via make or Make.py:
make no-meam
make machine :pre
Make.py -p ^meam -a machine :pre
Supporting info: lib/meam/README, "pair_style meam"_pair_meam.html,
examples/meam
:line
MISC package :link(MISC),h5
Contents: A variety of computes, fixes, and pair styles that are not
commonly used, but don't align with other packages. Do a directory
listing, "ls src/MISC", to see the list of commands.
To install via make or Make.py:
make yes-misc
make machine :pre
Make.py -p misc -a machine :pre
To un-install via make or Make.py:
make no-misc
make machine :pre
Make.py -p ^misc -a machine :pre
Supporting info: "compute ti"_compute_ti.html, "fix
evaporate"_fix_evaporate.html, "fix tmm"_fix_ttm.html, "fix
viscosity"_fix_viscosity.html, examples/misc
:line
MOLECULE package :link(MOLECULE),h5
Contents: A large number of atom, pair, bond, angle, dihedral,
improper styles that are used to model molecular systems with fixed
covalent bonds. The pair styles include terms for the Dreiding
(hydrogen-bonding) and CHARMM force fields, and TIP4P water model.
To install via make or Make.py:
make yes-molecule
make machine :pre
Make.py -p molecule -a machine :pre
To un-install via make or Make.py:
make no-molecule
make machine :pre
Make.py -p ^molecule -a machine :pre
Supporting info:"atom_style"_atom_style.html,
"bond_style"_bond_style.html, "angle_style"_angle_style.html,
"dihedral_style"_dihedral_style.html,
"improper_style"_improper_style.html, "pair_style
hbond/dreiding/lj"_pair_hbond_dreiding.html, "pair_style
lj/charmm/coul/charmm"_pair_charmm.html,
"Section 6.3"_Section_howto.html#howto_3,
examples/micelle, examples/peptide, bench/in.chain, bench/in.rhodo
:line
MPIIO package :link(MPIIO),h5
Contents: Support for parallel output/input of dump and restart files
via the MPIIO library, which is part of the standard message-passing
interface (MPI) library. It adds "dump styles"_dump.html with a
"mpiio" in their style name. Restart files with an ".mpiio" suffix
are also written and read in parallel.
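As a minimal, hypothetical input-script sketch of these styles (the file
names are arbitrary):
dump 1 all atom/mpiio 100 dump.atom.mpiio
write_restart restart.lammps.mpiio :pre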
To install via make or Make.py:
make yes-mpiio
make machine :pre
Make.py -p mpiio -a machine :pre
To un-install via make or Make.py:
make no-mpiio
make machine :pre
Make.py -p ^mpiio -a machine :pre
Supporting info: "dump"_dump.html, "restart"_restart.html,
"write_restart"_write_restart.html, "read_restart"_read_restart.html
:line
OPT package :link(OPT),h5
Contents: A handful of pair styles with an "opt" in their style name
which are optimized for improved CPU performance on single or multiple
cores. These include EAM, LJ, CHARMM, and Morse potentials. "Section
5.3.5"_accelerate_opt.html gives details of how to build and
use this package. See the KOKKOS, USER-INTEL, and USER-OMP packages,
which also have styles optimized for CPU performance.
Some C++ compilers, like the Intel compiler, require the compile flag
"-restrict" to build LAMMPS with the OPT package. It should be added
to the CCFLAGS line of your Makefile.machine. Or use Makefile.opt in
src/MAKE/OPTIONS, via "make opt". For compilers that use the flag,
the Make.py command adds it automatically to the Makefile.auto file it
creates and uses.
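As a hedged sketch, the edited line in a Makefile.machine for the Intel
compiler might then read (your other compiler flags will differ):
CCFLAGS = -g -O3 -restrict :pre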
To install via make or Make.py:
make yes-opt
make machine :pre
Make.py -p opt -a machine :pre
To un-install via make or Make.py:
make no-opt
make machine :pre
Make.py -p ^opt -a machine :pre
Supporting info: "Section 5.3"_Section_accelerate.html#acc_3,
"Section 5.3.5"_accelerate_opt.html, Pair Styles section of
"Section 3.5"_Section_commands.html#cmd_5 for any pair style
listed with a (t), examples/accelerate, bench/KEPLER
:line
PERI package :link(PERI),h5
Contents: Support for the Peridynamics method, a particle-based
meshless continuum model. The package includes an atom style, several
computes which calculate diagnostics, and several Peridynamic pair
styles which implement different materials models.
To install via make or Make.py:
make yes-peri
make machine :pre
Make.py -p peri -a machine :pre
To un-install via make or Make.py:
make no-peri
make machine :pre
Make.py -p ^peri -a machine :pre
Supporting info:
"doc/PDF/PDLammps_overview.pdf"_PDF/PDLammps_overview.pdf,
"doc/PDF/PDLammps_EPS.pdf"_PDF/PDLammps_EPS.pdf,
"doc/PDF/PDLammps_VES.pdf"_PDF/PDLammps_VES.pdf, "atom_style
peri"_atom_style.html, "compute damage/atom"_compute_damage_atom.html,
"pair_style peri/pmb"_pair_peri.html, examples/peri
:line
POEMS package :link(POEMS),h5
Contents: A fix that wraps the Parallelizable Open source Efficient
Multibody Software (POEMS) library, which is able to simulate the
dynamics of articulated body systems. These are systems with multiple
rigid bodies (collections of atoms or particles) whose motion is
coupled by connections at hinge points.
Building LAMMPS with the POEMS package requires first building the
POEMS library itself, which is a set of C++ files in lib/poems.
Details of how to do this are in lib/poems/README. As illustrated
below, perform a "make" using one of the Makefile.machine files in
lib/poems which should create a lib/poems/libpoems.a file.
Makefile.g++ and Makefile.icc are examples for the GNU and Intel C++
compilers. The "make" also creates a lib/poems/Makefile.lammps file
which you should not need to change. Note that the Make.py script has a
"-poems" option to allow the POEMS library and LAMMPS to be built in
one step. Type "python src/Make.py -h -poems" to see the details.
To install via make or Make.py:
cd ~/lammps/lib/poems
make -f Makefile.g++ # for example
cd ~/lammps/src
make yes-poems
make machine :pre
Make.py -p poems -poems make=g++ -a machine :pre
To un-install via make or Make.py:
make no-poems
make machine :pre
Make.py -p ^poems -a machine :pre
Supporting info: src/POEMS/README, lib/poems/README,
"fix poems"_fix_poems.html, examples/rigid
:line
PYTHON package :link(PYTHON),h5
Contents: A "python"_python.html command which allow you to execute
Python code from a LAMMPS input script. The code can be in a separate
file or embedded in the input script itself. See "Section
11.2"_Section_python.html#py_2 for an overview of using Python from
LAMMPS and for other ways to use LAMMPS and Python together.
Building with the PYTHON package assumes you have a Python shared
library available on your system, which needs to be a Python 2
version, 2.6 or later. Python 3 is not yet supported. The build uses
the contents of the lib/python/Makefile.lammps file to find all the Python
files required in the build/link process. See the lib/python/README
file if the settings in that file do not work on your system. Note
that the Make.py script has a "-python" option to allow an alternate
lib/python/Makefile.lammps file to be specified and LAMMPS to be built
in one step. Type "python src/Make.py -h -python" to see the details.
To install via make or Make.py:
make yes-python
make machine :pre
Make.py -p python -a machine :pre
To un-install via make or Make.py:
make no-python
make machine :pre
Make.py -p ^python -a machine :pre
Supporting info: examples/python
:line
QEQ package :link(QEQ),h5
Contents: Several fixes for performing charge equilibration (QEq) via
several different algorithms. These can be used with pair styles
that use QEq as part of their formulation.
To install via make or Make.py:
make yes-qeq
make machine :pre
Make.py -p qeq -a machine :pre
To un-install via make or Make.py:
make no-qeq
make machine :pre
Make.py -p ^qeq -a machine :pre
Supporting info: "fix qeq/*"_fix_qeq.html, examples/qeq
:line
REAX package :link(REAX),h5
Contents: A pair style for the ReaxFF potential, a universal reactive
force field, as well as a "fix reax/bonds"_fix_reax_bonds.html command
for monitoring molecules as bonds are created and destroyed.
Building LAMMPS with the REAX package requires first building the REAX
library itself, which is a set of Fortran 95 files in lib/reax.
Details of how to do this are in lib/reax/README. As illustrated
below, perform a "make" using one of the Makefile.machine files in
lib/reax which should create a lib/reax/libreax.a file.
Makefile.gfortran and Makefile.ifort are examples for the GNU Fortran
and Intel Fortran compilers. The "make" also copies a
lib/reax/Makefile.lammps.machine file to lib/reax/Makefile.lammps.
This file has settings that enable the C++ compiler used to build
LAMMPS to link with a Fortran library (typically the 2 compilers need to
be consistent, e.g. both Intel compilers or both GNU compilers). If the
settings in Makefile.lammps for your compilers and machine are not
correct, the LAMMPS link will fail. Note that the Make.py script has
a "-reax" option to allow the REAX library and LAMMPS to be built in
one step. Type "python src/Make.py -h -reax" to see the details.
To install via make or Make.py:
cd ~/lammps/lib/reax
make -f Makefile.gfortran # for example
cd ~/lammps/src
make yes-reax
make machine :pre
Make.py -p reax -reax make=gfortran -a machine :pre
To un-install via make or Make.py:
make no-reax
make machine :pre
Make.py -p ^reax -a machine :pre
Supporting info: lib/reax/README, "pair_style reax"_pair_reax.html,
"fix reax/bonds"_fix_reax_bonds.html, examples/reax
:line
REPLICA package :link(REPLICA),h5
Contents: A collection of multi-replica methods that are used by
invoking multiple instances (replicas) of LAMMPS
simulations. Communication between individual replicas is performed in
different ways by the different methods. See "Section
6.5"_Section_howto.html#howto_5 for an overview of how to run
multi-replica simulations in LAMMPS. Multi-replica methods included
in the package are nudged elastic band (NEB), parallel replica
dynamics (PRD), temperature accelerated dynamics (TAD), parallel
tempering, and a verlet/split algorithm for performing long-range
Coulombics on one set of processors, and the remainder of the force
field calculation on another set.
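Multi-replica runs are launched by splitting the MPI processes into
partitions with the "-partition" command-line switch. A hedged example
(the input script name is hypothetical) that runs 4 replicas, each on a
single processor:
mpirun -np 4 lmp_mpi -partition 4x1 -in in.neb :pre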
To install via make or Make.py:
make yes-replica
make machine :pre
Make.py -p replica -a machine :pre
To un-install via make or Make.py:
make no-replica
make machine :pre
Make.py -p ^replica -a machine :pre
Supporting info: "Section 6.5"_Section_howto.html#howto_5,
"neb"_neb.html, "prd"_prd.html, "tad"_tad.html, "temper"_temper.html,
"run_style verlet/split"_run_style.html, examples/neb, examples/prd,
examples/tad
:line
RIGID package :link(RIGID),h5
Contents: A collection of computes and fixes which enforce rigid
constraints on collections of atoms or particles. This includes SHAKE
and RATTLE, as well as variants of rigid-body time integrators for a
few large bodies or many small bodies.
To install via make or Make.py:
make yes-rigid
make machine :pre
Make.py -p rigid -a machine :pre
To un-install via make or Make.py:
make no-rigid
make machine :pre
Make.py -p ^rigid -a machine :pre
Supporting info: "compute erotate/rigid"_compute_erotate_rigid.html,
"fix shake"_fix_shake.html, "fix rattle"_fix_shake.html, "fix
rigid/*"_fix_rigid.html, examples/ASPHERE, examples/rigid
:line
SHOCK package :link(SHOCK),h5
Contents: A small number of fixes useful for running impact
simulations where a shock-wave passes through a material.
To install via make or Make.py:
make yes-shock
make machine :pre
Make.py -p shock -a machine :pre
To un-install via make or Make.py:
make no-shock
make machine :pre
Make.py -p ^shock -a machine :pre
Supporting info: "fix append/atoms"_fix_append_atoms.html, "fix
msst"_fix_msst.html, "fix nphug"_fix_nphug.html, "fix
wall/piston"_fix_wall_piston.html, examples/hugoniostat, examples/msst
:line
SNAP package :link(SNAP),h5
Contents: A pair style for the spectral neighbor analysis potential
(SNAP), which is an empirical potential which can be quantum accurate
when fit to an archive of DFT data. Computes useful for analyzing
properties of the potential are also included.
To install via make or Make.py:
make yes-snap
make machine :pre
Make.py -p snap -a machine :pre
To un-install via make or Make.py:
make no-snap
make machine :pre
Make.py -p ^snap -a machine :pre
Supporting info: "pair snap"_pair_snap.html, "compute
sna/atom"_compute_sna_atom.html, "compute snad/atom"_compute_sna_atom.html,
"compute snav/atom"_compute_sna_atom.html, examples/snap
:line
SRD package :link(SRD),h5
Contents: Two fixes which implement the Stochastic Rotation Dynamics
(SRD) method for coarse-graining of a solvent, typically around large
colloidal-scale particles.
To install via make or Make.py:
make yes-srd
make machine :pre
Make.py -p srd -a machine :pre
To un-install via make or Make.py:
make no-srd
make machine :pre
Make.py -p ^srd -a machine :pre
Supporting info: "fix srd"_fix_srd.html, "fix
wall/srd"_fix_wall_srd.html, examples/srd, examples/ASPHERE
:line
VORONOI package :link(VORONOI),h5
Contents: A "compute voronoi/atom"_compute_voronoi_atom.html command
which computes the Voronoi tessellation of a collection of atoms or
particles by wrapping the Voro++ library.
To build LAMMPS with the VORONOI package you must have previously
installed the Voro++ library on your system. The lib/voronoi/README
file explains how to download and install Voro++. There is a
lib/voronoi/install.py script which automates the process. Type
"python install.py" to see instructions. The final step is to create
soft links in the lib/voronoi directory for "includelink" and
"liblink" which point to installed Voro++ directories. Building with
the VORONOI package uses the contents of the
lib/voronoi/Makefile.lammps file in the compile/link process. You
should not need to edit this file. Note that the Make.py script has a
"-voronoi" option to allow the Voro++ library to be downloaded and/or
installed and LAMMPS to be built in one step. Type "python
src/Make.py -h -voronoi" to see the details.
To install via make or Make.py:
cd ~/lammps/lib/voronoi
python install.py -g -b -l # download Voro++, build in lib/voronoi, create links
cd ~/lammps/src
make yes-voronoi
make machine :pre
Make.py -p voronoi -voronoi install="-g -b -l" -a machine :pre
To un-install via make or Make.py:
make no-voronoi
make machine :pre
Make.py -p ^voronoi -a machine :pre
Supporting info: src/VORONOI/README, lib/voronoi/README, "compute
voronoi/atom"_compute_voronoi_atom.html, examples/voronoi
:line
4.2 User packages :h4,link(pkg_2)
The current list of user-contributed packages is as follows:
Package, Description, Author(s), Doc page, Example, Pic/movie, Library
"USER-ATC"_#USER-ATC, atom-to-continuum coupling, Jones & Templeton & Zimmerman (1), "fix atc"_fix_atc.html, USER/atc, "atc"_atc, lib/atc
"USER-AWPMD"_#USER-AWPMD, wave-packet MD, Ilya Valuev (JIHT), "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, -, lib/awpmd
"USER-CG-CMM"_#USER-CG-CMM, coarse-graining model, Axel Kohlmeyer (Temple U), "pair_style lj/sdk"_pair_sdk.html, USER/cg-cmm, "cg"_cg, -
"USER-COLVARS"_#USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (2), "fix colvars"_fix_colvars.html, USER/colvars, "colvars"_colvars, lib/colvars
"USER-DIFFRACTION"_#USER-DIFFRACTION, virutal x-ray and electron diffraction, Shawn Coleman (ARL),"compute xrd"_compute_xrd.html, USER/diffraction, -, -
"USER-DPD"_#USER-DPD, reactive dissipative particle dynamics (DPD), Larentzos & Mattox & Brennan (5), src/USER-DPD/README, USER/dpd, -, -
"USER-DRUDE"_#USER-DRUDE, Drude oscillators, Dequidt & Devemy & Padua (3), "tutorial"_tutorial_drude.html, USER/drude, -, -
"USER-EFF"_#USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, -
"USER-FEP"_#USER-FEP, free energy perturbation, Agilio Padua (U Blaise Pascal Clermont-Ferrand), "compute fep"_compute_fep.html, USER/fep, -, -
"USER-H5MD"_#USER-H5MD, dump output via HDF5, Pierre de Buyl (KU Leuven), "dump h5md"_dump_h5md.html, -, -, lib/h5md
"USER-INTEL"_#USER-INTEL, Vectorized CPU and Intel(R) coprocessor styles, W. Michael Brown (Intel), "Section 5.3.2"_accelerate_intel.html, examples/intel, -, -
"USER-LB"_#USER-LB, Lattice Boltzmann fluid, Colin Denniston (U Western Ontario), "fix lb/fluid"_fix_lb_fluid.html, USER/lb, -, -
"USER-MGPT"_#USER-MGPT, fast MGPT multi-ion potentials, Tomas Oppelstrup & John Moriarty (LLNL), "pair_style mgpt"_pair_mgpt.html, USER/mgpt, -, -
"USER-MISC"_#USER-MISC, single-file contributions, USER-MISC/README, USER-MISC/README, -, -, -
"USER-MANIFOLD"_#USER-MANIFOLD, motion on 2d surface, Stefan Paquay (Eindhoven U of Technology), "fix manifoldforce"_fix_manifoldforce.html, USER/manifold, "manifold"_manifold, -
"USER-MOLFILE"_#USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, VMD-MOLFILE
-"USER-NC-DUMP"_#USER-NC-DUMP, dump output via NetCDF, Lars Pastewka (Karlsruhe Institute of Technology, KIT), "dump nc, dump nc/mpiio"_dump_nc.html, -, -, lib/netcdf
+"USER-NC-DUMP"_#USER-NC-DUMP, dump output via NetCDF, Lars Pastewka (Karlsruhe Institute of Technology, KIT), "dump nc / dump nc/mpiio"_dump_nc.html, -, -, lib/netcdf
"USER-OMP"_#USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section 5.3.4"_accelerate_omp.html, -, -, -
"USER-PHONON"_#USER-PHONON, phonon dynamical matrix, Ling-Ti Kong (Shanghai Jiao Tong U), "fix phonon"_fix_phonon.html, USER/phonon, -, -
"USER-QMMM"_#USER-QMMM, QM/MM coupling, Axel Kohlmeyer (Temple U), "fix qmmm"_fix_qmmm.html, USER/qmmm, -, lib/qmmm
"USER-QTB"_#USER-QTB, quantum nuclear effects, Yuan Shen (Stanford), "fix qtb"_fix_qtb.html "fix qbmsst"_fix_qbmsst.html, qtb, -, -
"USER-QUIP"_#USER-QUIP, QUIP/libatoms interface, Albert Bartok-Partay (U Cambridge), "pair_style quip"_pair_quip.html, USER/quip, -, lib/quip
"USER-REAXC"_#USER-REAXC, C version of ReaxFF, Metin Aktulga (LBNL), "pair_style reaxc"_pair_reax_c.html, reax, -, -
"USER-SMD"_#USER-SMD, smoothed Mach dynamics, Georg Ganzenmuller (EMI), "SMD User Guide"_PDF/SMD_LAMMPS_userguide.pdf, USER/smd, -, -
"USER-SMTBQ"_#USER-SMTBQ, Second Moment Tight Binding - QEq potential, Salles & Maras & Politano & Tetot (4), "pair_style smtbq"_pair_smtbq.html, USER/smtbq, -, -
"USER-SPH"_#USER-SPH, smoothed particle hydrodynamics, Georg Ganzenmuller (EMI), "SPH User Guide"_PDF/SPH_LAMMPS_userguide.pdf, USER/sph, "sph"_sph, -
"USER-TALLY"_#USER-TALLY, Pairwise tallied computes, Axel Kohlmeyer (Temple U), "compute XXX/tally"_compute_tally.html, USER/tally, -, -
"USER-VTK"_#USER-VTK, VTK-style dumps, Berger and Queteschiner (6), "compute custom/vtk"_dump_custom_vtk.html, -, -, lib/vtk
:tb(ea=c)
:link(atc,http://lammps.sandia.gov/pictures.html#atc)
:link(cg,http://lammps.sandia.gov/pictures.html#cg)
:link(eff,http://lammps.sandia.gov/movies.html#eff)
:link(manifold,http://lammps.sandia.gov/movies.html#manifold)
:link(sph,http://lammps.sandia.gov/movies.html#sph)
:link(VMD,http://www.ks.uiuc.edu/Research/vmd)
The "Authors" column lists a name(s) if a specific person is
responible for creating and maintaining the package.
(1) The ATC package was created by Reese Jones, Jeremy Templeton, and
Jon Zimmerman (Sandia).
(2) The COLVARS package was created by Axel Kohlmeyer (Temple U) using
the colvars module library written by Giacomo Fiorin (Temple U) and
Jerome Henin (LISM, Marseille, France).
(3) The DRUDE package was created by Alain Dequidt (U Blaise Pascal
Clermont-Ferrand) and co-authors Julien Devemy (CNRS) and Agilio Padua
(U Blaise Pascal).
(4) The SMTBQ package was created by Nicolas Salles, Emile Maras,
Olivier Politano, and Robert Tetot (LAAS-CNRS, France).
(5) The USER-DPD package was created by James Larentzos (ARL), Timothy
Mattox (Engility), and John Brennan (ARL).
(6) The USER-VTK package was created by Richard Berger (JKU) and
Daniel Queteschiner (DCS Computing).
The "Doc page" column links to either a sub-section of the
"Section 6"_Section_howto.html of the manual, or an input script
command implemented as part of the package, or to additional
documentation provided within the package.
The "Example" column is a sub-directory in the examples directory of
the distribution which has an input script that uses the package.
E.g. "peptide" refers to the examples/peptide directory.
The "Library" column lists an external library which must be built
first and which LAMMPS links to when it is built. If it is listed as
lib/package, then the code for the library is under the lib directory
of the LAMMPS distribution. See the lib/package/README file for info
on how to build the library. If it is not listed as lib/package, then
it is a third-party library not included in the LAMMPS distribution.
See details on all of this below for individual packages.
:line
USER-ATC package :link(USER-ATC),h5
Contents: ATC stands for atoms-to-continuum. This package implements
a "fix atc"_fix_atc.html command to either couple MD with continuum
finite element equations or perform on-the-fly post-processing of
atomic information to continuum fields. See src/USER-ATC/README for
more details.
To build LAMMPS with this package ...
To install via make or Make.py:
make yes-user-atc
make machine :pre
Make.py -p atc -a machine :pre
To un-install via make or Make.py:
make no-user-atc
make machine :pre
Make.py -p ^atc -a machine :pre
Supporting info: src/USER-ATC/README, "fix atc"_fix_atc.html,
examples/USER/atc
Authors: Reese Jones (rjones at sandia.gov), Jeremy Templeton (jatempl
at sandia.gov) and Jon Zimmerman (jzimmer at sandia.gov) at Sandia.
Contact them directly if you have questions.
:line
USER-AWPMD package :link(USER-AWPMD),h5
Contents: AWPMD stands for Antisymmetrized Wave Packet Molecular
Dynamics. This package implements an atom, pair, and fix style which
allows electrons to be treated as explicit particles in an MD
calculation. See src/USER-AWPMD/README for more details.
To build LAMMPS with this package ...
Supporting info: src/USER-AWPMD/README, "fix
awpmd/cut"_pair_awpmd.html, examples/USER/awpmd
Author: Ilya Valuev at the JIHT in Russia (valuev at
physik.hu-berlin.de). Contact him directly if you have questions.
:line
USER-CG-CMM package :link(USER-CG-CMM),h5
Contents: CG-CMM stands for coarse-grained ??. This package
implements several pair styles and an angle style using the coarse
grained parametrization of Shinoda, DeVane, Klein, Mol Sim, 33, 27
(2007) (SDK), with extensions to simulate ionic liquids, electrolytes,
lipids and charged amino acids. See src/USER-CG-CMM/README for more
details.
Supporting info: src/USER-CG-CMM/README, "pair lj/sdk"_pair_sdk.html,
"pair lj/sdk/coul/long"_pair_sdk.html, "angle sdk"_angle_sdk.html,
examples/USER/cg-cmm
Author: Axel Kohlmeyer at Temple U (akohlmey at gmail.com). Contact
him directly if you have questions.
:line
USER-COLVARS package :link(USER-COLVARS),h5
Contents: COLVARS stands for collective variables which can be used to
implement Adaptive Biasing Force, Metadynamics, Steered MD, Umbrella
Sampling and Restraints. This package implements a "fix
colvars"_fix_colvars.html command which wraps a COLVARS library which
can perform those kinds of simulations. See src/USER-COLVARS/README
for more details.
Supporting info:
"doc/PDF/colvars-refman-lammps.pdf"_PDF/colvars-refman-lammps.pdf,
src/USER-COLVARS/README, lib/colvars/README, "fix
colvars"_fix_colvars.html, examples/USER/colvars
Authors: Axel Kohlmeyer at Temple U (akohlmey at gmail.com) wrote the
fix. The COLVARS library itself is written and maintained by Giacomo
Fiorin (ICMS, Temple University, Philadelphia, PA, USA) and Jerome
Henin (LISM, CNRS, Marseille, France). Contact them directly if you
have questions.
:line
USER-DIFFRACTION package :link(USER-DIFFRACTION),h5
Contents: This package implements two computes and a fix for
calculating x-ray and electron diffraction intensities based on
kinematic diffraction theory. See src/USER-DIFFRACTION/README for
more details.
Supporting info: "compute saed"_compute_saed.html, "compute
xrd"_compute_xrd.html, "fix saed/vtk"_fix_saed_vtk.html,
examples/USER/diffraction
Author: Shawn P. Coleman (shawn.p.coleman8.ctr at mail.mil) while at
the University of Arkansas. Contact him directly if you have
questions.
:line
USER-DPD package :link(USER-DPD),h5
Contents: DPD stands for dissipative particle dynamics. This package
implements DPD for isothermal, isoenergetic, isobaric and isenthalpic
conditions. It also has extensions for performing reactive DPD, where
each particle has internal state for multiple species and a coupled
set of chemical reaction ODEs are integrated each timestep. The DPD
equations of motion are integrated efficiently through the Shardlow
splitting algorithm. See src/USER-DPD/README for more details.
Supporting info: src/USER-DPD/README, "compute dpd"_compute_dpd.html,
"compute dpd/atom"_compute_dpd_atom.html,
"fix eos/cv"_fix_eos_table.html, "fix eos/table"_fix_eos_table.html,
"fix eos/table/rx"_fix_eos_table_rx.html, "fix shardlow"_fix_shardlow.html,
"fix rx"_fix_rx.html, "pair table/rx"_pair_table_rx.html,
"pair dpd/fdt"_pair_dpd_fdt.html, "pair dpd/fdt/energy"_pair_dpd_fdt.html,
"pair exp6/rx"_pair_exp6_rx.html, "pair multi/lucy"_pair_multi_lucy.html,
"pair multi/lucy/rx"_pair_multi_lucy_rx.html, examples/USER/dpd
Authors: James Larentzos (ARL) (james.p.larentzos.civ at mail.mil),
Timothy Mattox (Engility Corp) (Timothy.Mattox at engilitycorp.com)
and John Brennan (ARL) (john.k.brennan.civ at mail.mil). Contact them
directly if you have questions.
:line
USER-DRUDE package :link(USER-DRUDE),h5
Contents: This package contains methods for simulating polarizable
systems using thermalized Drude oscillators. It has computes, fixes,
and pair styles for this purpose. See "Section
6.27"_Section_howto.html#howto_27 for an overview of how to use the
package. See src/USER-DRUDE/README for additional details. There are
auxiliary tools for using this package in tools/drude.
Supporting info: "Section 6.27"_Section_howto.html#howto_27,
src/USER-DRUDE/README, "fix drude"_fix_drude.html, "fix
drude/transform/*"_fix_drude_transform.html, "compute
temp/drude"_compute_temp_drude.html, "pair thole"_pair_thole.html,
"pair lj/cut/thole/long"_pair_thole.html, examples/USER/drude,
tools/drude
Authors: Alain Dequidt at Universite Blaise Pascal Clermont-Ferrand
(alain.dequidt at univ-bpclermont.fr); co-authors: Julien Devemy,
Agilio Padua. Contact them directly if you have questions.
:line
USER-EFF package :link(USER-EFF),h5
Contents: EFF stands for electron force field. This package contains
atom, pair, fix and compute styles which implement the eFF as
described in A. Jaramillo-Botero, J. Su, Q. An, and W.A. Goddard III,
JCC, 2010. The eFF potential was first introduced by Su and Goddard,
in 2007. See src/USER-EFF/README for more details. There are
auxiliary tools for using this package in tools/eff; see its README
file.
Supporting info:
Author: Andres Jaramillo-Botero at CalTech (ajaramil at
wag.caltech.edu). Contact him directly if you have questions.
:line
USER-FEP package :link(USER-FEP),h5
Contents: FEP stands for free energy perturbation. This package
provides methods for performing FEP simulations by using a "fix
adapt/fep"_fix_adapt_fep.html command with soft-core pair potentials,
which have a "soft" in their style name. See src/USER-FEP/README for
more details. There are auxiliary tools for using this package in
tools/fep; see its README file.
Supporting info: src/USER-FEP/README, "fix
adapt/fep"_fix_adapt_fep.html, "compute fep"_compute_fep.html,
"pair_style */soft"_pair_lj_soft.html, examples/USER/fep
Author: Agilio Padua at Universite Blaise Pascal Clermont-Ferrand
(agilio.padua at univ-bpclermont.fr). Contact him directly if you have
questions.
:line
USER-H5MD package :link(USER-H5MD),h5
Contents: H5MD stands for HDF5 for MD. "HDF5"_HDF5 is a binary,
portable, self-describing file format, used by many scientific
simulations. H5MD is a format for molecular simulations, built on top
of HDF5. This package implements a "dump h5md"_dump_h5md.html command
to output LAMMPS snapshots in this format. See src/USER-H5MD/README
for more details.
:link(HDF5,http://www.hdfgroup.org/HDF5/)
Supporting info: src/USER-H5MD/README, lib/h5md/README, "dump
h5md"_dump_h5md.html
Author: Pierre de Buyl at KU Leuven (see http://pdebuyl.be) created
this package as well as the H5MD format and library. Contact him
directly if you have questions.
:line
USER-INTEL package :link(USER-INTEL),h5
Contents: Dozens of pair, bond, angle, dihedral, and improper styles
that are optimized for Intel CPUs and the Intel Xeon Phi (in offload
mode). All of them have an "intel" in their style name. "Section
5.3.2"_accelerate_intel.html gives details of what hardware
and compilers are required on your system, and how to build and use
this package. Also see src/USER-INTEL/README for more details. See
the KOKKOS, OPT, and USER-OMP packages, which also have CPU and
Phi-enabled styles.
Supporting info: examples/accelerate, src/USER-INTEL/TEST
"Section 5.3"_Section_accelerate.html#acc_3
Author: Mike Brown at Intel (michael.w.brown at intel.com). Contact
him directly if you have questions.
For the USER-INTEL package, you have 2 choices when building. You can
build with CPU or Phi support. The latter uses Xeon Phi chips in
"offload" mode. Each of these modes requires additional settings in
your Makefile.machine for CCFLAGS and LINKFLAGS.
For CPU mode (if using an Intel compiler):
CCFLAGS: add -fopenmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost, -fno-alias, -ansi-alias, -override-limits
LINKFLAGS: add -fopenmp :ul
For Phi mode add the following in addition to the CPU mode flags:
CCFLAGS: add -DLMP_INTEL_OFFLOAD
LINKFLAGS: add -offload :ul
And also add this to CCFLAGS:
-offload-option,mic,compiler,"-fp-model fast=2 -mGLOB_default_function_attrs=\"gather_scatter_loop_unroll=4\"" :pre
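Put together, a hedged sketch of the CPU-mode lines in a Makefile.machine
for the Intel compiler (the base optimization flags shown here are only
placeholders):
CCFLAGS = -O3 -fopenmp -DLAMMPS_MEMALIGN=64 -restrict -xHost -fno-alias -ansi-alias -override-limits
LINKFLAGS = -O3 -fopenmp :pre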
Examples:
:line
USER-LB package :link(USER-LB),h5
Supporting info:
This package contains a LAMMPS implementation of a background
Lattice-Boltzmann fluid, which can be used to model MD particles
influenced by hydrodynamic forces.
See this doc page and its related commands to get started:
"fix lb/fluid"_fix_lb_fluid.html
The people who created this package are Frances Mackay (fmackay at
uwo.ca) and Colin Denniston (cdennist at uwo.ca), both at the University
of Western Ontario. Contact them directly if you have questions.
Examples: examples/USER/lb
:line
USER-MGPT package :link(USER-MGPT),h5
Supporting info:
This package contains a fast implementation for LAMMPS of
quantum-based MGPT multi-ion potentials. The MGPT or model GPT method
derives from first-principles DFT-based generalized pseudopotential
theory (GPT) through a series of systematic approximations valid for
mid-period transition metals with nearly half-filled d bands. The
MGPT method was originally developed by John Moriarty at Lawrence
Livermore National Lab (LLNL).
In the general matrix representation of MGPT, which can also be
applied to f-band actinide metals, the multi-ion potentials are
evaluated on the fly during a simulation through d- or f-state matrix
multiplication, and the forces that move the ions are determined
analytically. The {mgpt} pair style in this package calculates forces
and energies using an optimized matrix-MGPT algorithm due to Tomas
Oppelstrup at LLNL.
See this doc page to get started:
"pair_style mgpt"_pair_mgpt.html
The persons who created the USER-MGPT package are Tomas Oppelstrup
(oppelstrup2@llnl.gov) and John Moriarty (moriarty2@llnl.gov)
Contact them directly if you have any questions.
Examples: examples/USER/mgpt
:line
USER-MISC package :link(USER-MISC),h5
Supporting info:
The files in this package are a potpourri of (mostly) unrelated
features contributed to LAMMPS by users. Each feature is a single
pair of files (*.cpp and *.h).
More information about each feature can be found by reading its doc
page in the LAMMPS doc directory. The doc page which lists all LAMMPS
input script commands is as follows:
"Section 3.5"_Section_commands.html#cmd_5
User-contributed features are listed at the bottom of the fix,
compute, pair, etc sections.
The list of features and author of each is given in the
src/USER-MISC/README file.
You should contact the author directly if you have specific questions
about the feature or its coding.
Examples: examples/USER/misc
:line
USER-MANIFOLD package :link(USER-MANIFOLD),h5
Supporting info:
This package allows LAMMPS to perform MD simulations of particles
constrained on a manifold (i.e., a 2D subspace of the 3D simulation
box). It achieves this using the RATTLE constraint algorithm applied
to single-particle constraint functions g(xi,yi,zi) = 0 and their
derivative (i.e. the normal of the manifold) n = grad(g).
See this doc page to get started:
"fix manifoldforce"_fix_manifoldforce.html
The person who created this package is Stefan Paquay, at the Eindhoven
University of Technology (TU/e), The Netherlands (s.paquay at tue.nl).
Contact him directly if you have questions.
:line
USER-MOLFILE package :link(USER-MOLFILE),h5
Supporting info:
This package contains a dump molfile command which uses molfile
plugins that are bundled with the
"VMD"_http://www.ks.uiuc.edu/Research/vmd molecular visualization and
analysis program, to enable LAMMPS to dump its information in formats
compatible with various molecular simulation tools.
The package only provides the interface code, not the plugins. These
can be obtained from a VMD installation which has to match the
platform that you are using to compile LAMMPS for. By adding plugins
to VMD, support for new file formats can be added to LAMMPS (or VMD or
other programs that use them) without having to recompile the
application itself.
See this doc page to get started:
"dump molfile"_dump_molfile.html
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
:line
USER-NC-DUMP package :link(USER-NC-DUMP),h5
Contents: Dump styles for writing NetCDF format files. NetCDF is a binary,
portable, self-describing file format on top of HDF5. The file format
contents follow the AMBER NetCDF trajectory conventions
(http://ambermd.org/netcdf/nctraj.xhtml), but include extensions to this
convention. This package implements a "dump nc"_dump_nc.html command
and a "dump nc/mpiio"_dump_nc.html command to output LAMMPS snapshots
in this format. See src/USER-NC-DUMP/README for more details.
NetCDF files can be directly visualized with the following tools:
+
Ovito (http://www.ovito.org/). Ovito supports the AMBER convention
- and all of the above extensions. :ulb,l
+and all of the above extensions. :ulb,l
VMD (http://www.ks.uiuc.edu/Research/vmd/) :l
AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye contains
- a NetCDF reader that is not present in the standard distribution of AtomEye :l,ule
+a NetCDF reader that is not present in the standard distribution of AtomEye :l,ule
The person who created these files is Lars Pastewka at
Karlsruhe Institute of Technology (lars.pastewka at kit.edu).
Contact him directly if you have questions.
:line
USER-OMP package :link(USER-OMP),h5
Supporting info:
This package provides OpenMP multi-threading support and
other optimizations of various LAMMPS pair styles, dihedral
styles, and fix styles.
See this section of the manual to get started:
"Section 5.3"_Section_accelerate.html#acc_3
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
For the USER-OMP package, your Makefile.machine needs additional
settings for CCFLAGS and LINKFLAGS.
CCFLAGS: add -fopenmp and -restrict
LINKFLAGS: add -fopenmp :ul
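For instance, a hedged sketch of the edited lines (the base flags are
placeholders, and -restrict applies to the Intel compiler):
CCFLAGS = -O2 -fopenmp -restrict
LINKFLAGS = -O2 -fopenmp :pre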
Examples: examples/accelerate, bench/KEPLER
:line
USER-PHONON package :link(USER-PHONON),h5
This package contains a fix phonon command that calculates dynamical
matrices, which can then be used to compute phonon dispersion
relations, directly from molecular dynamics simulations.
See this doc page to get started:
"fix phonon"_fix_phonon.html
The person who created this package is Ling-Ti Kong (konglt at
sjtu.edu.cn) at Shanghai Jiao Tong University. Contact him directly
if you have questions.
Examples: examples/USER/phonon
:line
USER-QMMM package :link(USER-QMMM),h5
Supporting info:
This package provides a fix qmmm command which allows LAMMPS to be
used in a QM/MM simulation, currently only in combination with pw.x
code from the "Quantum ESPRESSO"_espresso package.
:link(espresso,http://www.quantum-espresso.org)
The current implementation only supports an ONIOM style mechanical
coupling to the Quantum ESPRESSO plane wave DFT package.
Electrostatic coupling is in preparation and the interface has been
written in a manner that coupling to other QM codes should be possible
without changes to LAMMPS itself.
See this doc page to get started:
"fix qmmm"_fix_qmmm.html
as well as the lib/qmmm/README file.
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
:line
USER-QTB package :link(USER-QTB),h5
Supporting info:
This package provides a self-consistent quantum treatment of the
vibrational modes in a classical molecular dynamics simulation. By
coupling the MD simulation to a colored thermostat, it introduces zero
point energy into the system, altering the energy power spectrum and the
heat capacity towards their quantum nature. This package could be of
interest if one wants to model systems at temperatures lower than
their classical limits or when temperatures ramp up across the
classical limits in the simulation.
See these two doc pages to get started:
"fix qtb"_fix_qtb.html provides quantum nulcear correction through a
colored thermostat and can be used with other time integration schemes
like "fix nve"_fix_nve.html or "fix nph"_fix_nh.html.
"fix qbmsst"_fix_qbmsst.html enables quantum nuclear correction of a
multi-scale shock technique simulation by coupling the quantum thermal
bath with the shocked system.
The person who created this package is Yuan Shen (sy0302 at
stanford.edu) at Stanford University. Contact him directly if you
have questions.
Examples: examples/USER/qtb
:line
USER-QUIP package :link(USER-QUIP),h5
Supporting info:
Examples: examples/USER/quip
:line
USER-REAXC package :link(USER-REAXC),h5
Supporting info:
This package contains an implementation for LAMMPS of the ReaxFF force
field. ReaxFF uses distance-dependent bond-order functions to
represent the contributions of chemical bonding to the potential
energy. It was originally developed by Adri van Duin and the Goddard
group at CalTech.
The USER-REAXC version of ReaxFF (pair_style reax/c), implemented in
C, should give identical or very similar results to pair_style reax,
which is a ReaxFF implementation on top of a Fortran library, a
version of which was originally authored by Adri van Duin.
The reax/c version should be somewhat faster and more scalable,
particularly with respect to the charge equilibration calculation. It
should also be easier to build and use since there are no complicating
issues with Fortran memory allocation or linking to a Fortran library.
For technical details about this implementation of ReaxFF, see
this paper:
Parallel and Scalable Reactive Molecular Dynamics: Numerical Methods
and Algorithmic Techniques, H. M. Aktulga, J. C. Fogarty,
S. A. Pandit, A. Y. Grama, Parallel Computing, in press (2011).
See the doc page for the pair_style reax/c command for details
of how to use it in LAMMPS.
The person who created this package is Hasan Metin Aktulga (hmaktulga
at lbl.gov), while at Purdue University. Contact him directly, or
Aidan Thompson at Sandia (athomps at sandia.gov), if you have
questions.
Examples: examples/reax
:line
USER-SMD package :link(USER-SMD),h5
Supporting info:
This package implements smoothed Mach dynamics (SMD) in
LAMMPS. Currently, the package has the following features:
* Does liquids via traditional Smooth Particle Hydrodynamics (SPH)
* Also solves solid mechanics problems via a state-of-the-art
stabilized meshless method with hourglass control.
* Can specify hydrostatic interactions independently from material
strength models, i.e. pressure and deviatoric stresses are separated.
* Many material models available (Johnson-Cook, plasticity with
hardening, Mie-Grueneisen, Polynomial EOS). Easy to add new
material models.
* Rigid boundary conditions (walls) can be loaded as surface geometries
from *.STL files.
See the file doc/PDF/SMD_LAMMPS_userguide.pdf to get started.
There are example scripts for using this package in examples/USER/smd.
The person who created this package is Georg Ganzenmuller at the
Fraunhofer-Institute for High-Speed Dynamics, Ernst Mach Institute in
Germany (georg.ganzenmueller at emi.fhg.de). Contact him directly if
you have questions.
Examples: examples/USER/smd
:line
USER-SMTBQ package :link(USER-SMTBQ),h5
Supporting info:
This package implements the Second Moment Tight Binding - QEq (SMTB-Q)
potential for the description of ionocovalent bonds in oxides.
There are example scripts for using this package in
examples/USER/smtbq.
See this doc page to get started:
"pair_style smtbq"_pair_smtbq.html
The persons who created the USER-SMTBQ package are Nicolas Salles,
Emile Maras, Olivier Politano, Robert Tetot, who can be contacted at
these email addresses: lammps@u-bourgogne.fr, nsalles@laas.fr. Contact
them directly if you have any questions.
Examples: examples/USER/smtbq
:line
USER-SPH package :link(USER-SPH),h5
Supporting info:
This package implements smoothed particle hydrodynamics (SPH) in
LAMMPS. Currently, the package has the following features:
* Tait, ideal gas, Lennard-Jones equations of state, full support for
complete (i.e. internal-energy dependent) equations of state
* Plain or Monaghan's XSPH integration of the equations of motion
* Density continuity or density summation to propagate the density field
* Commands to set internal energy and density of particles from the
input script
* Output commands to access internal energy and density for dumping and
thermo output
See the file doc/PDF/SPH_LAMMPS_userguide.pdf to get started.
There are example scripts for using this package in examples/USER/sph.
The person who created this package is Georg Ganzenmuller at the
Fraunhofer-Institute for High-Speed Dynamics, Ernst Mach Institute in
Germany (georg.ganzenmueller at emi.fhg.de). Contact him directly if
you have questions.
Examples: examples/USER/sph
:line
USER-TALLY package :link(USER-TALLY),h5
Supporting info:
Examples: examples/USER/tally
:line
USER-VTK package :link(USER-VTK),h5
diff --git a/doc/src/Section_start.txt b/doc/src/Section_start.txt
index ee122e0a7..da693a1b9 100644
--- a/doc/src/Section_start.txt
+++ b/doc/src/Section_start.txt
@@ -1,1907 +1,1907 @@
"Previous Section"_Section_intro.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_commands.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
2. Getting Started :h3
This section describes how to build and run LAMMPS, for both new and
experienced users.
2.1 "What's in the LAMMPS distribution"_#start_1
2.2 "Making LAMMPS"_#start_2
2.3 "Making LAMMPS with optional packages"_#start_3
2.4 "Building LAMMPS via the Make.py script"_#start_4
2.5 "Building LAMMPS as a library"_#start_5
2.6 "Running LAMMPS"_#start_6
2.7 "Command-line options"_#start_7
2.8 "Screen output"_#start_8
2.9 "Tips for users of previous versions"_#start_9 :all(b)
:line
2.1 What's in the LAMMPS distribution :h4,link(start_1)
When you download a LAMMPS tarball you will need to unzip and untar
the downloaded file with the following command, after placing the
tarball in an appropriate directory.
tar -xzvf lammps*.tar.gz :pre
This will create a LAMMPS directory containing two files and several
sub-directories:
README: text file
LICENSE: the GNU General Public License (GPL)
bench: benchmark problems
doc: documentation
examples: simple test problems
potentials: embedded atom method (EAM) potential files
src: source files
tools: pre- and post-processing tools :tb(s=:)
Note that the "download page"_download also has links to download
pre-built Windows installers, as well as pre-built packages for
several widely used Linux distributions. It also has instructions
for how to download/install LAMMPS for Macs (via Homebrew), and to
download and update LAMMPS from SVN and Git repositories, which gives
you access to the up-to-date sources that are used by the LAMMPS
core developers.
:link(download,http://lammps.sandia.gov/download.html)
The Windows and Linux packages for serial or parallel include
only selected packages and bug-fixes/upgrades listed on "this
page"_http://lammps.sandia.gov/bug.html up to a certain date, as
stated on the download page. If you want an executable with
non-included packages or that is more current, then you'll need to
build LAMMPS yourself, as discussed in the next section.
Skip to the "Running LAMMPS"_#start_6 sections for info on how to
launch a LAMMPS Windows executable on a Windows box.
:line
2.2 Making LAMMPS :h4,link(start_2)
This section has the following sub-sections:
2.2.1 "Read this first"_#start_2_1
2.2.1 "Steps to build a LAMMPS executable"_#start_2_2
2.2.3 "Common errors that can occur when making LAMMPS"_#start_2_3
2.2.4 "Additional build tips"_#start_2_4
2.2.5 "Building for a Mac"_#start_2_5
2.2.6 "Building for Windows"_#start_2_6 :all(b)
:line
Read this first :h5,link(start_2_1)
If you want to avoid building LAMMPS yourself, read the preceding
section about options available for downloading and installing
executables. Details are discussed on the "download"_download page.
Building LAMMPS can be simple or not-so-simple. If all you need are
the default packages installed in LAMMPS, and MPI is already installed
on your machine, or you just want to run LAMMPS in serial, then you
can typically use the Makefile.mpi or Makefile.serial files in
src/MAKE by typing one of these lines (from the src dir):
make mpi
make serial :pre
Note that on a facility supercomputer, there are often "modules"
loaded in your environment that provide the compilers and MPI you
should use. In this case, the "mpicxx" compile/link command in
Makefile.mpi should just work by accessing those modules.
It may be the case that one of the other Makefile.machine files in the
src/MAKE sub-directories is a better match to your system (type "make"
to see a list), you can use it as-is by typing (for example):
make stampede :pre
If any of these builds (with an existing Makefile.machine) works on
your system, then you're done!
If you want to do one of the following:
use optional LAMMPS features that require additional libraries
use optional packages that require additional libraries
use optional accelerator packages that require special compiler/linker settings
run on a specialized platform that has its own compilers, settings, or other libs to use :ul
then building LAMMPS is more complicated. You may need to find where
auxiliary libraries exist on your machine or install them if they
don't. You may need to build additional libraries that are part of
the LAMMPS package, before building LAMMPS. You may need to edit a
Makefile.machine file to make it compatible with your system.
Note that there is a Make.py tool in the src directory that automates
several of these steps, but you still have to know what you are doing.
"Section 2.4"_#start_4 below describes the tool. It is a convenient
way to work with installing/un-installing various packages, the
Makefile.machine changes required by some packages, and the auxiliary
libraries some of them use.
Please read the following sections carefully. If you are not
comfortable with makefiles, or building codes on a Unix platform, or
running an MPI job on your machine, please find a local expert to help
you. Many compilation, linking, and run problems that users have are
often not really LAMMPS issues - they are peculiar to the user's
system, compilers, libraries, etc. Such questions are better answered
by a local expert.
If you have a build problem that you are convinced is a LAMMPS issue
(e.g. the compiler complains about a line of LAMMPS source code), then
please post the issue to the "LAMMPS mail
list"_http://lammps.sandia.gov/mail.html.
If you succeed in building LAMMPS on a new kind of machine, for which
there isn't a similar machine Makefile included in the
src/MAKE/MACHINES directory, then send it to the developers and we can
include it in the LAMMPS distribution.
:line
Steps to build a LAMMPS executable :h5,link(start_2_2)
Step 0 :h6
The src directory contains the C++ source and header files for LAMMPS.
It also contains a top-level Makefile and a MAKE sub-directory with
low-level Makefile.* files for many systems and machines. See the
src/MAKE/README file for a quick overview of what files are available
and what sub-directories they are in.
The src/MAKE dir has a few files that should work as-is on many
platforms. The src/MAKE/OPTIONS dir has more that invoke additional
compiler, MPI, and other setting options commonly used by LAMMPS, to
illustrate their syntax. The src/MAKE/MACHINES dir has many more that
have been tweaked or optimized for specific machines. These files are
all good starting points if you find you need to change them for your
machine. Put any file you edit into the src/MAKE/MINE directory and
it will never be touched by any LAMMPS updates.
From within the src directory, type "make" or "gmake". You should see
a list of available choices from src/MAKE and all of its
sub-directories. If one of those has the options you want or is the
machine you want, you can type a command like:
make mpi :pre
or
make serial :pre
or
gmake mac :pre
Note that the corresponding Makefile.machine can exist in src/MAKE or
any of its sub-directories. If a file with the same name appears in
multiple places (not a good idea), the order they are used is as
follows: src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES.
This gives preference to a file you have created/edited and put in
src/MAKE/MINE.
Note that on a multi-processor or multi-core platform you can launch a
parallel make, by using the "-j" switch with the make command, which
will build LAMMPS more quickly.
If you get no errors and an executable like [lmp_mpi] or [lmp_serial]
or [lmp_mac] is produced, then you're done; it's your lucky day.
Note that by default only a few of LAMMPS optional packages are
installed. To build LAMMPS with optional packages, see "this
section"_#start_3 below.
Step 1 :h6
If Step 0 did not work, you will need to create a low-level Makefile
for your machine, like Makefile.foo. You should make a copy of an
existing Makefile.* in src/MAKE or one of its sub-directories as a
starting point. The only portions of the file you need to edit are
the first line, the "compiler/linker settings" section, and the
"LAMMPS-specific settings" section. When it works, put the edited
file in src/MAKE/MINE and it will not be altered by any future LAMMPS
updates.
Step 2 :h6
Change the first line of Makefile.foo to list the word "foo" after the
"#", and whatever other options it will set. This is the line you
will see if you just type "make".
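For example (the description text is up to you and is only shown when
you type "make"), the first line of Makefile.foo might read:
# foo = my Linux workstation, g++, MPICH, FFTW3 :pre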
Step 3 :h6
The "compiler/linker settings" section lists compiler and linker
settings for your C++ compiler, including optimization flags. You can
use g++, the open-source GNU compiler, which is available on all Unix
systems. You can also use mpicxx which will typically be available if
MPI is installed on your system, though you should check which actual
compiler it wraps. Vendor compilers often produce faster code. On
boxes with Intel CPUs, we suggest using the Intel icc compiler, which
can be downloaded from "Intel's compiler site"_intel.
:link(intel,http://www.intel.com/software/products/noncom)
If building a C++ code on your machine requires additional libraries,
then you should list them as part of the LIB variable. You should
not need to do this if you use mpicxx.
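As an illustration only (the compiler choice and the extra library are
placeholders, not recommendations), the compiler/linker section of a
Makefile.foo might contain lines like these:
# compiler and LIB value below are placeholders; adjust for your system
CC =        mpicxx
CCFLAGS =   -g -O3
LINK =      mpicxx
LINKFLAGS = -g -O3
LIB =       -lstdc++ :pre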
The DEPFLAGS setting is what triggers the C++ compiler to create a
dependency list for a source file. This speeds re-compilation when
source (*.cpp) or header (*.h) files are edited. Some compilers do
not support dependency file creation, or may use a different switch
than -D. GNU g++ and Intel icc work with -D. If your compiler can't
create dependency files, then you'll need to create a Makefile.foo
patterned after Makefile.storm, which uses different rules that do not
involve dependency files. Note that when you build LAMMPS for the
first time on a new platform, a long list of *.d files will be printed
out rapidly. This is not an error; it is the Makefile doing its
normal creation of dependencies.
Step 4 :h6
The "system-specific settings" section has several parts. Note that
if you change any -D setting in this section, you should do a full
re-compile, after typing "make clean", which will list the different
clean options.
The LMP_INC variable is used to include options that turn on ifdefs
within the LAMMPS code; an example setting is shown after the list.
The options that are currently recognized are:
-DLAMMPS_GZIP
-DLAMMPS_JPEG
-DLAMMPS_PNG
-DLAMMPS_FFMPEG
-DLAMMPS_MEMALIGN
-DLAMMPS_XDR
-DLAMMPS_SMALLBIG
-DLAMMPS_BIGBIG
-DLAMMPS_SMALLSMALL
-DLAMMPS_LONGLONG_TO_LONG
-DLAMMPS_EXCEPTIONS
-DPACK_ARRAY
-DPACK_POINTER
-DPACK_MEMCPY :ul
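For example, a hypothetical LMP_INC line that enables gzipped file
I/O, JPEG output, and 64-byte memory alignment (a combination chosen
purely for illustration) would look like this:
LMP_INC = -DLAMMPS_GZIP -DLAMMPS_JPEG -DLAMMPS_MEMALIGN=64 :pre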
The read_data and dump commands will read/write gzipped files if you
compile with -DLAMMPS_GZIP. It requires that your machine supports
the "popen()" function in the standard runtime library and that a gzip
executable can be found by LAMMPS during a run.
NOTE: on some clusters with high-speed networks, using the fork()
library calls (required by popen()) can interfere with the fast
communication library and cause simulations that use compressed output
or input to hang or crash. For selected operations, compressed file
I/O is also available via a compression library instead, which is
provided in the COMPRESS package. For more details about compiling
LAMMPS with packages, please see below.
If you use -DLAMMPS_JPEG, the "dump image"_dump_image.html command
will be able to write out JPEG image files. For JPEG files, you must
also link LAMMPS with a JPEG library, as described below. If you use
-DLAMMPS_PNG, the "dump image"_dump_image.html command will be able to write
out PNG image files. For PNG files, you must also link LAMMPS with a
PNG library, as described below. If neither of those two defines are
used, LAMMPS will only be able to write out uncompressed PPM image
files.
If you use -DLAMMPS_FFMPEG, the "dump movie"_dump_image.html command
will be available to support on-the-fly generation of rendered movies
without the need to store intermediate image files. It requires that
your machine supports the "popen" function in the standard runtime
library and that an FFmpeg executable can be found by LAMMPS during
the run.
NOTE: Similar to the note above, this option can conflict with
high-speed networks, because it uses popen().
Using -DLAMMPS_MEMALIGN=<bytes> enables the use of the
posix_memalign() call instead of malloc() when large chunks of memory
are allocated by LAMMPS. This can help make more efficient use of the
vector instructions of modern CPUs, since dynamically allocated memory
has to be aligned on larger-than-default byte boundaries (e.g. 16
bytes instead of 8 bytes on x86-type platforms) for optimal
performance.
If you use -DLAMMPS_XDR, the build will include XDR compatibility
files for doing particle dumps in XTC format. This is only necessary
if your platform does not have its own XDR files available. See the
Restrictions section of the "dump"_dump.html command for details.
Use at most one of the -DLAMMPS_SMALLBIG, -DLAMMPS_BIGBIG,
-DLAMMPS_SMALLSMALL settings. The default is -DLAMMPS_SMALLBIG. These
settings refer to use of 4-byte (small) vs 8-byte (big) integers
within LAMMPS, as specified in src/lmptype.h. The only reason to use
the BIGBIG setting is to enable simulation of huge molecular systems
(which store bond topology info) with more than 2 billion atoms, or to
track the image flags of moving atoms that wrap around a periodic box
more than 512 times. Normally, the only reason to use SMALLSMALL is
if your machine does not support 64-bit integers, though you can use
the SMALLSMALL setting if you are running in serial or on a desktop
machine or small cluster where you will never run large systems or for
long times (more than 2 billion atoms or more than 2 billion timesteps).
See the "Additional build tips"_#start_2_4 section below for more
details on these settings.
Note that the USER-ATC package is not currently compatible with
-DLAMMPS_BIGBIG. Also the GPU package requires the lib/gpu library to
be compiled with the same setting, or the link will fail.
The -DLAMMPS_LONGLONG_TO_LONG setting may be needed if your system or
MPI version does not recognize "long long" data types. In this case a
"long" data type is likely already 64-bits, and this setting will
convert "long long" to that data type.
The -DLAMMPS_EXCEPTIONS setting can be used to activate alternative
versions of error handling inside of LAMMPS. This is useful when
external codes drive LAMMPS as a library. Using this option, LAMMPS
errors do not kill the caller. Instead, the call stack is unwound and
control returns to the caller. The library interface provides the
lammps_has_error() and lammps_get_last_error_message() functions to
detect and find out more about a LAMMPS error.
Using one of the -DPACK_ARRAY, -DPACK_POINTER, and -DPACK_MEMCPY
options can make for faster parallel FFTs (in the PPPM solver) on some
platforms. The -DPACK_ARRAY setting is the default. See the
"kspace_style"_kspace_style.html command for info about PPPM. See
Step 6 below for info about building LAMMPS with an FFT library.
Step 5 :h6
The 3 MPI variables are used to specify an MPI library to build LAMMPS
with. Note that you do not need to set these if you use the MPI
compiler mpicxx for your CC and LINK setting in the section above.
The MPI wrapper knows where to find the needed files.
If you want LAMMPS to run in parallel, you must have an MPI library
installed on your platform. If MPI is installed on your system in the
usual place (under /usr/local), you also may not need to specify these
3 variables, assuming /usr/local is in your path. On some large
parallel machines which use "modules" for their compile/link
environments, you may simply need to include the correct module in
your build environment, before building LAMMPS. Or the parallel
machine may have a vendor-provided MPI which the compiler has no
trouble finding.
Failing this, these 3 variables can be used to specify where the mpi.h
file (MPI_INC) and the MPI library file (MPI_PATH) are found and the
name of the library file (MPI_LIB).
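As a sketch only, for an MPICH installation under the hypothetical
prefix /opt/mpich, the 3 variables might be set as follows; adjust the
paths and library name to match your system:
# hypothetical install location; adjust for your system
MPI_INC =  -I/opt/mpich/include
MPI_PATH = -L/opt/mpich/lib
MPI_LIB =  -lmpich :pre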
If you are installing MPI yourself, we recommend Argonne's MPICH2
or OpenMPI. MPICH can be downloaded from the "Argonne MPI
site"_http://www.mcs.anl.gov/research/projects/mpich2/. OpenMPI can
be downloaded from the "OpenMPI site"_http://www.open-mpi.org.
Other MPI packages should also work. If you are running on a big
parallel platform, your system people or the vendor should have
already installed a version of MPI, which is likely to be faster
than a self-installed MPICH or OpenMPI, so find out how to build
and link with it. If you use MPICH or OpenMPI, you will have to
configure and build it for your platform. The MPI configure script
should have compiler options to enable you to use the same compiler
you are using for the LAMMPS build, which can avoid problems that can
arise when linking LAMMPS to the MPI library.
If you just want to run LAMMPS on a single processor, you can use the
dummy MPI library provided in src/STUBS, since you don't need a true
MPI library installed on your system. See src/MAKE/Makefile.serial
for how to specify the 3 MPI variables in this case. You will also
need to build the STUBS library for your platform before making LAMMPS
itself. Note that if you are building with src/MAKE/Makefile.serial,
e.g. by typing "make serial", then the STUBS library is built for you.
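In that case the 3 MPI variables simply point at the STUBS files,
along these lines (check src/MAKE/Makefile.serial for the exact
settings used there):
MPI_INC =  -I../STUBS
MPI_PATH = -L../STUBS
MPI_LIB =  -lmpi_stubs :pre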
To build the STUBS library from the src directory, type "make
mpi-stubs", or from the src/STUBS dir, type "make". This should
create a libmpi_stubs.a file suitable for linking to LAMMPS. If the
build fails, you will need to edit the STUBS/Makefile for your
platform.
The file STUBS/mpi.c provides a CPU timer function called MPI_Wtime()
that calls gettimeofday(). If your system doesn't support
gettimeofday(), you'll need to insert code to call another timer.
Note that the ANSI-standard function clock() rolls over after an hour
or so, and is therefore insufficient for timing long LAMMPS
simulations.
Step 6 :h6
The 3 FFT variables allow you to specify an FFT library which LAMMPS
uses (for performing 1d FFTs) when running the particle-particle
particle-mesh (PPPM) option for long-range Coulombics via the
"kspace_style"_kspace_style.html command.
LAMMPS supports various open-source or vendor-supplied FFT libraries
for this purpose. If you leave these 3 variables blank, LAMMPS will
use the open-source "KISS FFT library"_http://kissfft.sf.net, which is
included in the LAMMPS distribution. This library is portable to all
platforms and for typical LAMMPS simulations is almost as fast as FFTW
or vendor optimized libraries. If you are not including the KSPACE
package in your build, you can also leave the 3 variables blank.
Otherwise, select which kinds of FFTs to use as part of the FFT_INC
setting by a switch of the form -DFFT_XXX. Recommended values for XXX
are: MKL, SCSL, FFTW2, and FFTW3. Legacy options are: INTEL, SGI,
ACML, and T3E. For backward compatibility, using -DFFT_FFTW will use
the FFTW2 library. Using -DFFT_NONE will use the KISS library
described above.
You may also need to set the FFT_INC, FFT_PATH, and FFT_LIB variables,
so the compiler and linker can find the needed FFT header and library
files. Note that on some large parallel machines which use "modules"
for their compile/link environments, you may simply need to include
the correct module in your build environment. Or the parallel machine
may have a vendor-provided FFT library which the compiler has no
trouble finding.
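For example, assuming FFTW3 was installed under the hypothetical
prefix /usr/local, the 3 FFT variables might look like this:
# hypothetical install location; adjust for your system
FFT_INC =  -DFFT_FFTW3 -I/usr/local/include
FFT_PATH = -L/usr/local/lib
FFT_LIB =  -lfftw3 :pre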
FFTW is a fast, portable library that should also work on any
platform. You can download it from
"www.fftw.org"_http://www.fftw.org. Both the legacy version 2.1.X and
the newer 3.X versions are supported as -DFFT_FFTW2 or -DFFT_FFTW3.
Building FFTW for your box should be as simple as ./configure; make.
Note that on some platforms FFTW2 has been pre-installed, and uses
renamed files indicating the precision it was compiled with,
e.g. sfftw.h, or dfftw.h instead of fftw.h. In this case, you can
specify an additional define variable for FFT_INC called -DFFTW_SIZE,
which will select the correct include file. In this case, for FFT_LIB
you must also manually specify the correct library, namely -lsfftw or
-ldfftw.
The FFT_INC variable also allows for a -DFFT_SINGLE setting that will
use single-precision FFTs with PPPM, which can speed up long-range
calculations, particularly in parallel or on GPUs. Fourier transform
and related PPPM operations are somewhat insensitive to floating point
truncation errors and thus do not always need to be performed in
double precision. Using the -DFFT_SINGLE setting trades off a little
accuracy for reduced memory use and parallel communication costs for
transposing 3d FFT data. Note that single precision FFTs have only
been tested with the FFTW3, FFTW2, MKL, and KISS FFT options.
Step 7 :h6
The 3 JPG variables allow you to specify a JPEG and/or PNG library
which LAMMPS uses when writing out JPEG or PNG files via the "dump
image"_dump_image.html command. These can be left blank if you do not
use the -DLAMMPS_JPEG or -DLAMMPS_PNG switches discussed above in Step
4, since in that case JPEG/PNG output will be disabled.
A standard JPEG library usually goes by the name libjpeg.a or
libjpeg.so and has an associated header file jpeglib.h. Whichever
JPEG library you have on your platform, you'll need to set the
appropriate JPG_INC, JPG_PATH, and JPG_LIB variables, so that the
compiler and linker can find it.
A standard PNG library usually goes by the name libpng.a or libpng.so
and has an associated header file png.h. Whichever PNG library you
have on your platform, you'll need to set the appropriate JPG_INC,
JPG_PATH, and JPG_LIB variables, so that the compiler and linker can
find it.
As before, if these header and library files are in the usual place on
your machine, you may not need to set these variables.
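As a sketch, assuming both libjpeg and libpng are installed under the
hypothetical prefix /usr, the 3 JPG variables might be:
# hypothetical install location; adjust for your system
JPG_INC =  -I/usr/include
JPG_PATH = -L/usr/lib
JPG_LIB =  -ljpeg -lpng :pre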
Step 8 :h6
Note that by default only a few of LAMMPS optional packages are
installed. To build LAMMPS with optional packages, see "this
section"_#start_3 below, before proceeding to Step 9.
Step 9 :h6
That's it. Once you have a correct Makefile.foo, and you have
pre-built any other needed libraries (e.g. MPI, FFT, etc) all you need
to do from the src directory is type something like this:
make foo
make -j N foo
gmake foo
gmake -j N foo :pre
The -j or -j N switches perform a parallel build which can be much
faster, depending on how many cores your compilation machine has. N
is the number of cores the build runs on.
You should get the executable lmp_foo when the build is complete.
:line
Errors that can occur when making LAMMPS :h5,link(start_2_3)
NOTE: If an error occurs when building LAMMPS, the compiler or linker
will state very explicitly what the problem is. The error message
should give you a hint as to which of the steps above has failed, and
what you need to do in order to fix it. Building a code with a
Makefile is a very logical process. The compiler and linker need to
find the appropriate files and those files need to be compatible with
LAMMPS source files. When a make fails, there is usually a very
simple reason, which you or a local expert will need to fix.
Here are two non-obvious errors that can occur:
(1) If the make command breaks immediately with errors that indicate
it can't find files with a "*" in their names, this can be because
your machine's native make doesn't support wildcard expansion in a
makefile. Try gmake instead of make. If that doesn't work, try using
a -f switch with your make command to use a pre-generated
Makefile.list which explicitly lists all the needed files, e.g.
make makelist
make -f Makefile.list linux
gmake -f Makefile.list mac :pre
The first "make" command will create a current Makefile.list with all
the file names in your src dir. The 2nd "make" command (make or
gmake) will use it to build LAMMPS. Note that you should
include/exclude any desired optional packages before using the "make
makelist" command.
(2) If you get an error that says something like 'identifier "atoll"
is undefined', then your machine does not support "long long"
integers. Try using the -DLAMMPS_LONGLONG_TO_LONG setting described
above in Step 4.
:line
Additional build tips :h5,link(start_2_4)
Building LAMMPS for multiple platforms. :h6
You can make LAMMPS for multiple platforms from the same src
directory. Each target creates its own object sub-directory called
Obj_target where it stores the system-specific *.o files.
Cleaning up. :h6
Typing "make clean-all" or "make clean-machine" will delete *.o object
files created when LAMMPS is built, for either all builds or for a
particular machine.
Changing the LAMMPS size limits via -DLAMMPS_SMALLBIG or -DLAMMPS_BIGBIG or -DLAMMPS_SMALLSMALL :h6
As explained above, any of these 3 settings can be specified on the
LMP_INC line in your low-level src/MAKE/Makefile.foo.
The default is -DLAMMPS_SMALLBIG which allows for systems with up to
2^63 atoms and 2^63 timesteps (about 9e18). The atom limit is for
atomic systems which do not store bond topology info and thus do not
require atom IDs. If you use atom IDs for atomic systems (which is
the default) or if you use a molecular model, which stores bond
topology info and thus requires atom IDs, the limit is 2^31 atoms
(about 2 billion). This is because the IDs are stored in 32-bit
integers.
Likewise, with this setting, the 3 image flags for each atom (see the
"dump"_dump.html doc page for a discussion) are stored in a 32-bit
integer, which means the atoms can only wrap around a periodic box (in
each dimension) at most 512 times. If atoms move through the periodic
box more than this many times, the image flags will "roll over",
e.g. from 511 to -512, which can cause diagnostics like the
mean-squared displacement, as calculated by the "compute
msd"_compute_msd.html command, to be faulty.
To allow for larger atomic systems with atom IDs or larger molecular
systems or larger image flags, compile with -DLAMMPS_BIGBIG. This
stores atom IDs and image flags in 64-bit integers. This enables
atomic or molecular systems with atom IDs of up to 2^63 atoms (about
9e18). And image flags will not "roll over" until they reach 2^20 =
1048576.
If your system does not support 8-byte integers, you will need to
compile with the -DLAMMPS_SMALLSMALL setting. This will restrict the
total number of atoms (for atomic or molecular systems) and timesteps
to 2^31 (about 2 billion). Image flags will roll over at 2^9 = 512.
Note that in src/lmptype.h there are definitions of all these data
types as well as the MPI data types associated with them. The MPI
types need to be consistent with the associated C data types, or else
LAMMPS will generate a run-time error. As far as we know, the
settings defined in src/lmptype.h are portable and work on every
current system.
In all cases, the size of problem that can be run on a per-processor
basis is limited by 4-byte integer storage to 2^31 atoms per processor
(about 2 billion). This should not normally be a limitation since such
a problem would have a huge per-processor memory footprint due to
neighbor lists and would run very slowly in terms of CPU secs/timestep.
:line
Building for a Mac :h5,link(start_2_5)
OS X is a derivative of BSD Unix, so it should just work. See the
src/MAKE/MACHINES/Makefile.mac and Makefile.mac_mpi files.
:line
Building for Windows :h5,link(start_2_6)
If you want to build a Windows version of LAMMPS, you can build it
yourself, but it may require some effort. LAMMPS expects a Unix-like
build environment for the default build procedure. This can be set up
using either Cygwin or MinGW; the latter also exists as a ready-to-use
Linux-to-Windows cross-compiler in several Linux distributions. In
either case, you will first need to install several unix-style tools
such as make, grep, sed, and bash, along with some shell utilities.
For Cygwin and the MinGW cross-compilers, suitable makefiles are
provided in src/MAKE/MACHINES. When using other compilers, like
Visual C++ or Intel compilers for Windows, you may have to implement
your own build system. Since none of the current LAMMPS core developers
has significant experience building executables on Windows, we are
happy to distribute contributed instructions and modifications, but
we cannot provide support for those.
With the so-called "Anniversary Update" to Windows 10, there is an
Ubuntu Linux subsystem available for Windows, which can be installed
and then used to compile/install LAMMPS as if you were running on an
Ubuntu Linux system instead of Windows.
As an alternative, you can download "daily builds" (and some older
versions) of the installer packages from
"rpm.lammps.org/windows.html"_http://rpm.lammps.org/windows.html.
These executables are built with most optional packages and the
download includes documentation, potential files, some tools and
many examples, but no source code.
:line
2.3 Making LAMMPS with optional packages :h4,link(start_3)
This section has the following sub-sections:
2.3.1 "Package basics"_#start_3_1
2.3.2 "Including/excluding packages"_#start_3_2
2.3.3 "Packages that require extra libraries"_#start_3_3
2.3.4 "Packages that require Makefile.machine settings"_#start_3_4 :all(b)
Note that the following "Section 2.4"_#start_4 describes the Make.py
tool which can be used to install/un-install packages and build the
auxiliary libraries which some of them use. It can also auto-edit a
Makefile.machine to add settings needed by some packages.
:line
Package basics: :h5,link(start_3_1)
The source code for LAMMPS is structured as a set of core files which
are always included, plus optional packages. Packages are groups of
files that enable a specific set of features. For example, force
fields for molecular systems or granular systems are in packages.
"Section 4"_Section_packages.html in the manual has details
about all the packages, including specific instructions for building
LAMMPS with each package, which are covered in a more general manner
below.
You can see the list of all packages by typing "make package" from
within the src directory of the LAMMPS distribution. This also lists
various make commands that can be used to manipulate packages.
If you use a command in a LAMMPS input script that is part of a
package, you must have built LAMMPS with that package, else you will
get an error that the style is invalid or the command is unknown.
Every command's doc page specifies if it is part of a package. You can
also type
lmp_machine -h :pre
to run your executable with the optional "-h command-line
switch"_#start_7 for "help", which will simply list the styles and
commands known to your executable, and immediately exit.
There are two kinds of packages in LAMMPS, standard and user packages.
More information about the contents of standard and user packages is
given in "Section 4"_Section_packages.html of the manual. The
difference between standard and user packages is as follows:
Standard packages, such as molecule or kspace, are supported by the
LAMMPS developers and are written in a syntax and style consistent
with the rest of LAMMPS. This means we will answer questions about
them, debug and fix them if necessary, and keep them compatible with
future changes to LAMMPS.
User packages, such as user-atc or user-omp, have been contributed by
users, and always begin with the user prefix. If they are a single
command (single file), they are typically in the user-misc package.
Otherwise, they are a set of files grouped together which add a
specific functionality to the code.
User packages don't necessarily meet the requirements of the standard
packages. If you have problems using a feature provided in a user
package, you may need to contact the contributor directly to get help.
Information on how to submit additions you make to LAMMPS as single
files or as a standard or user-contributed package is given in
"this section"_Section_modify.html#mod_15 of the documentation.
:line
Including/excluding packages :h5,link(start_3_2)
To use (or not use) a package you must include it (or exclude it)
before building LAMMPS. From the src directory, this is typically as
simple as:
make yes-colloid
make mpi :pre
or
make no-manybody
make mpi :pre
NOTE: You should NOT include/exclude packages and build LAMMPS in a
single make command using multiple targets, e.g. make yes-colloid mpi.
This is because the make procedure creates a list of source files that
will be out-of-date for the build if the package configuration changes
within the same command.
Some packages have individual files that depend on other packages
being included. LAMMPS checks for this and does the right thing.
I.e. individual files are only included if their dependencies are
already included. Likewise, if a package is excluded, other files
dependent on that package are also excluded.
If you will never run simulations that use the features in a
particular package, there is no reason to include it in your build.
For some packages, this will keep you from having to build auxiliary
libraries (see below), and will also produce a smaller executable
which may run a bit faster.
When you download a LAMMPS tarball, these packages are pre-installed
in the src directory: KSPACE, MANYBODY, MOLECULE, because they are so
commonly used. When you download LAMMPS source files from the SVN or
Git repositories, no packages are pre-installed.
Packages are included or excluded by typing "make yes-name" or "make
no-name", where "name" is the name of the package in lower-case, e.g.
name = kspace for the KSPACE package or name = user-atc for the
USER-ATC package. You can also type "make yes-standard", "make
no-standard", "make yes-std", "make no-std", "make yes-user", "make
no-user", "make yes-lib", "make no-lib", "make yes-all", or "make
no-all" to include/exclude various sets of packages. Type "make
package" to see all of the package-related make options.
NOTE: Inclusion/exclusion of a package works by simply moving files
back and forth between the main src directory and sub-directories with
the package name (e.g. src/KSPACE, src/USER-ATC), so that the files
are seen or not seen when LAMMPS is built. After you have included or
excluded a package, you must re-build LAMMPS.
Additional package-related make options exist to help manage LAMMPS
files that exist in both the src directory and in package
sub-directories. You do not normally need to use these commands
unless you are editing LAMMPS files or have downloaded a patch from
the LAMMPS WWW site.
Typing "make package-update" or "make pu" will overwrite src files
with files from the package sub-directories if the package has been
included. It should be used after a patch is installed, since patches
only update the files in the package sub-directory, but not the src
files. Typing "make package-overwrite" will overwrite files in the
package sub-directories with src files.
Typing "make package-status" or "make ps" will show which packages are
currently included. For those that are included, it will list any
files that are different in the src directory and package
sub-directory. Typing "make package-diff" lists all differences
between these files. Again, type "make package" to see all of the
package-related make options.
:line
Packages that require extra libraries :h5,link(start_3_3)
A few of the standard and user packages require additional auxiliary
libraries. Many of them are provided with LAMMPS, in which case they
must be compiled first, before LAMMPS is built, if you wish to include
that package. If you get a LAMMPS build error about a missing
library, this is likely the reason. See the
"Section 4"_Section_packages.html doc page for a list of
packages that have these kinds of auxiliary libraries.
The lib directory in the distribution has sub-directories with package
names that correspond to the needed auxiliary libs, e.g. lib/gpu.
Each sub-directory has a README file that gives more details. Code
for most of the auxiliary libraries is included in that directory.
Examples are the USER-ATC and MEAM packages.
A few of the lib sub-directories do not include code, but do include
instructions (and sometimes scripts) that automate the process of
downloading the auxiliary library and installing it so LAMMPS can link
to it. Examples are the KIM, VORONOI, USER-MOLFILE, and USER-SMD
packages.
The lib/python directory (for the PYTHON package) contains only a
choice of Makefile.lammps.* files. This is because no auxiliary code
or libraries are needed, only the Python library and other system libs
that should already be available on your system. However, the
Makefile.lammps file is needed to tell LAMMPS which libs to use and
where to find them.
For libraries with provided code, the sub-directory README file
(e.g. lib/atc/README) has instructions on how to build that library.
This information is also summarized in "Section
4"_Section_packages.html. Typically this is done by typing
something like:
make -f Makefile.g++ :pre
If one of the provided Makefiles is not appropriate for your system
you will need to edit or add one. Note that all the Makefiles have a
setting for EXTRAMAKE at the top that specifies a Makefile.lammps.*
file.
If the library build is successful, it will produce 2 files in the lib
directory:
libpackage.a
Makefile.lammps :pre
The Makefile.lammps file will typically be a copy of one of the
Makefile.lammps.* files in the library directory.
Note that you must insure that the settings in Makefile.lammps are
appropriate for your system. If they are not, the LAMMPS build may
fail. To fix this, you can edit or create a new Makefile.lammps.*
file for your system, and copy it to Makefile.lammps.
As explained in the lib/package/README files, the settings in
Makefile.lammps are used to specify additional system libraries and
their locations so that LAMMPS can build with the auxiliary library.
For example, if the MEAM package is used, the auxiliary library
consists of F90 code, built with a Fortran compiler. To link that
library with LAMMPS (a C++ code) via whatever C++ compiler LAMMPS is
built with, typically requires additional Fortran-to-C libraries be
included in the link. Other examples are the BLAS and LAPACK
libraries needed to use the USER-ATC or USER-AWPMD packages.
For libraries without provided code, the sub-directory README file has
information on where to download the library and how to build it,
e.g. lib/voronoi/README and lib/smd/README. The README files also
describe how you must either (a) create soft links, via the "ln"
command, in those directories to point to where you built or installed
the packages, or (b) check or edit the Makefile.lammps file in the
same directory to provide that information.
Some of the sub-directories, e.g. lib/voronoi, also have an install.py
script which can be used to automate the process of
downloading/building/installing the auxiliary library, and setting the
needed soft links. Type "python install.py" for further instructions.
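As an illustration only, creating the soft links by hand for the
VORONOI package might look like the lines below; the link names and
the path to the downloaded Voro++ sources are hypothetical, so use the
names listed in the corresponding README:
cd lib/voronoi
# link names and source path are hypothetical; follow the README
ln -s /home/user/voro++-0.4.6/src includelink
ln -s /home/user/voro++-0.4.6/src liblink :pre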
As with the sub-directories containing library code, if the soft links
or settings in the lib/package/Makefile.lammps files are not correct,
the LAMMPS build will typically fail.
:line
Packages that require Makefile.machine settings :h5,link(start_3_4)
A few packages require specific settings in Makefile.machine, to
either build or use the package effectively. These are the
USER-INTEL, KOKKOS, USER-OMP, and OPT packages, used for accelerating
code performance on CPUs or other hardware, as discussed in "Section
5.3"_Section_accelerate.html#acc_3.
A summary of what Makefile.machine changes are needed for each of
these packages is given in "Section 4"_Section_packages.html.
The details are given on the doc pages that describe each of these
accelerator packages in detail:
5.3.1 "USER-INTEL package"_accelerate_intel.html
5.3.3 "KOKKOS package"_accelerate_kokkos.html
5.3.4 "USER-OMP package"_accelerate_omp.html
5.3.5 "OPT package"_accelerate_opt.html :all(b)
You can also look at the following machine Makefiles in
src/MAKE/OPTIONS, which include the changes. Note that the USER-INTEL
and KOKKOS packages allow for settings that build LAMMPS for different
hardware. The USER-INTEL package builds for the CPU and the Xeon Phi;
the KOKKOS package builds for OpenMP, GPUs (Cuda), and the Xeon Phi.
Makefile.intel_cpu
Makefile.intel_phi
Makefile.kokkos_omp
Makefile.kokkos_cuda
Makefile.kokkos_phi
Makefile.omp
Makefile.opt :ul
Also note that the Make.py tool, described in the next "Section
2.4"_#start_4, can automatically add the needed info to an existing
machine Makefile, using simple command-line arguments.
:line
2.4 Building LAMMPS via the Make.py tool :h4,link(start_4)
The src directory includes a Make.py script, written in Python, which
can be used to automate various steps of the build process. It is
particularly useful for working with the accelerator packages, as well
as other packages which require auxiliary libraries to be built.
The goal of the Make.py tool is to allow any complex multi-step LAMMPS
build to be performed as a single Make.py command. And you can
archive the commands, so they can be re-invoked later via the -r
(redo) switch. If you find some LAMMPS build procedure that can't be
done in a single Make.py command, let the developers know, and we'll
see if we can augment the tool.
You can run Make.py from the src directory by typing either:
Make.py -h
python Make.py -h :pre
which will give you help info about the tool. For the former to work,
you may need to edit the first line of Make.py to point to your local
Python. And you may need to insure the script is executable:
chmod +x Make.py :pre
Here are examples of build tasks you can perform with Make.py:
Install/uninstall packages: Make.py -p no-lib kokkos omp intel
Build specific auxiliary libs: Make.py -a lib-atc lib-meam
Build libs for all installed packages: Make.py -p cuda gpu -gpu mode=double arch=31 -a lib-all
Create a Makefile from scratch with compiler and MPI settings: Make.py -m none -cc g++ -mpi mpich -a file
Augment Makefile.serial with settings for installed packages: Make.py -p intel -intel cpu -m serial -a file
Add JPG and FFTW support to Makefile.mpi: Make.py -m mpi -jpg -fft fftw -a file
Build LAMMPS with a parallel make using Makefile.mpi: Make.py -j 16 -m mpi -a exe
Build LAMMPS and libs it needs using Makefile.serial with accelerator settings: Make.py -p gpu intel -intel cpu -a lib-all file serial :tb(s=:)
The bench and examples directories give Make.py commands that can be
used to build LAMMPS with the various packages and options needed to
run all the benchmark and example input scripts. See these files for
more details:
bench/README
bench/FERMI/README
bench/KEPLER/README
bench/PHI/README
examples/README
examples/accelerate/README
examples/accelerate/make.list :ul
All of the Make.py options and syntax help can be accessed by using
the "-h" switch.
E.g. typing "Make.py -h" gives
Syntax: Make.py switch args ...
switches can be listed in any order
help switch:
-h prints help and syntax for all other specified switches
switch for actions:
-a lib-all, lib-dir, clean, file, exe or machine
list one or more actions, in any order
machine is a Makefile.machine suffix, must be last if used
one-letter switches:
-d (dir), -j (jmake), -m (makefile), -o (output),
-p (packages), -r (redo), -s (settings), -v (verbose)
switches for libs:
-atc, -awpmd, -colvars, -cuda
-gpu, -meam, -poems, -qmmm, -reax
switches for build and makefile options:
-intel, -kokkos, -cc, -mpi, -fft, -jpg, -png :pre
Using the "-h" switch with other switches and actions gives additional
info on all the other specified switches or actions. The "-h" can be
anywhere in the command-line and the other switches do not need their
arguments. E.g. type "Make.py -h -d -atc -intel" will print:
-d dir
dir = LAMMPS home dir
if -d not specified, working dir must be lammps/src :pre
-atc make=suffix lammps=suffix2
all args are optional and can be in any order
make = use Makefile.suffix (def = g++)
lammps = use Makefile.lammps.suffix2 (def = EXTRAMAKE in makefile) :pre
-intel mode
mode = cpu or phi (def = cpu)
build Intel package for CPU or Xeon Phi :pre
Note that Make.py never overwrites an existing Makefile.machine.
Instead, it creates src/MAKE/MINE/Makefile.auto, which you can save or
rename if desired. Likewise it creates an executable named
src/lmp_auto, which you can rename using the -o switch if desired.
The most recently executed Make.py command is saved in
src/Make.py.last. You can use the "-r" switch (for redo) to re-invoke
the last command, or you can save a sequence of one or more Make.py
commands to a file and invoke the file of commands using "-r". You
can also label the commands in the file and invoke one or more of them
by name.
A typical use of Make.py is to start with a valid Makefile.machine for
your system, that works for a vanilla LAMMPS build, i.e. when optional
packages are not installed. You can then use Make.py to add various
settings (FFT, JPG, PNG) to the Makefile.machine as well as change its
compiler and MPI options. You can also add additional packages to the
build, as well as build the needed supporting libraries.
You can also use Make.py to create a new Makefile.machine from
scratch, using the "-m none" switch, if you also specify what compiler
and MPI options to use, via the "-cc" and "-mpi" switches.
:line
2.5 Building LAMMPS as a library :h4,link(start_5)
LAMMPS can be built as either a static or shared library, which can
then be called from another application or a scripting language. See
"this section"_Section_howto.html#howto_10 for more info on coupling
LAMMPS to other codes. See "this section"_Section_python.html for
more info on wrapping and running LAMMPS from Python.
Static library :h5
To build LAMMPS as a static library (*.a file on Linux), type
make foo mode=lib :pre
where foo is the machine name. This kind of library is typically used
to statically link a driver application to LAMMPS, so that you can
insure all dependencies are satisfied at compile time. This will use
the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The build
will create the file liblammps_foo.a which another application can
link to. It will also create a soft link liblammps.a, which will
point to the most recently built static library.
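As a sketch (the driver file name, include/library paths, and the
extra FFT library are hypothetical), a C++ driver could then be
compiled and statically linked against LAMMPS like this:
# paths and extra libs are hypothetical; adjust for your build
mpicxx -c driver.cpp -I/home/user/lammps/src
mpicxx -o driver driver.o -L/home/user/lammps/src -llammps_foo -lfftw3 :pre
Any auxiliary libraries your LAMMPS build used (MPI, FFT, JPEG,
package libs, etc) must also appear on the link line.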
Shared library :h5
To build LAMMPS as a shared library (*.so file on Linux), which can be
dynamically loaded, e.g. from Python, type
make foo mode=shlib :pre
where foo is the machine name. This kind of library is required when
wrapping LAMMPS with Python; see "Section 11"_Section_python.html
for details. This will use the SHFLAGS and SHLIBFLAGS settings in
src/MAKE/Makefile.foo and perform the build in the directory
Obj_shared_foo. This is so that each file can be compiled with the
-fPIC flag which is required for inclusion in a shared library. The
build will create the file liblammps_foo.so which another application
can link to dynamically. It will also create a soft link liblammps.so,
which will point to the most recently built shared library. This is
the file the Python wrapper loads by default.
Note that for a shared library to be usable by a calling program, all
the auxiliary libraries it depends on must also exist as shared
libraries. This will be the case for libraries included with LAMMPS,
such as the dummy MPI library in src/STUBS or any package libraries in
lib/packages, since they are always built as shared libraries using
the -fPIC switch. However, if a library like MPI or FFTW does not
exist as a shared library, the shared library build will generate an
error. This means you will need to install a shared library version
of the auxiliary library. The build instructions for the library
should tell you how to do this.
Here is an example of such errors, generated when the system FFTW or
the provided lib/colvars library has not been built as a shared
library:
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation
R_X86_64_32 against '.rodata' can not be used when making a shared
object; recompile with -fPIC
/usr/local/lib/libfftw3.a: could not read symbols: Bad value :pre
/usr/bin/ld: ../../lib/colvars/libcolvars.a(colvarmodule.o):
relocation R_X86_64_32 against '__pthread_key_create' can not be used
when making a shared object; recompile with -fPIC
../../lib/colvars/libcolvars.a: error adding symbols: Bad value :pre
As an example, here is how to build and install the "MPICH
library"_mpich, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
:link(mpich,http://www-unix.mcs.anl.gov/mpi)
./configure --enable-shared
make
make install :pre
You may need to use "sudo make install" in place of the last line if
you do not have write privileges for /usr/local/lib. The end result
should be the file /usr/local/lib/libmpich.so.
[Additional requirement for using a shared library:] :h5
The operating system finds shared libraries to load at run-time using
the environment variable LD_LIBRARY_PATH. So you may wish to copy the
file src/liblammps.so or src/liblammps_g++.so (for example) to a place
the system can find it by default, such as /usr/local/lib, or you may
wish to add the LAMMPS src directory to LD_LIBRARY_PATH, so that the
current version of the shared library is always available to programs
that use it.
For the csh or tcsh shells, you would add something like this to your
~/.cshrc file:
setenv LD_LIBRARY_PATH $\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre
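For bash-like shells, the equivalent line in your ~/.bashrc file would
be:
export LD_LIBRARY_PATH=$\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre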
Calling the LAMMPS library :h5
Either flavor of library (static or shared) allows one or more LAMMPS
objects to be instantiated from the calling program.
When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS
namespace; you can safely use any of its classes and methods from
within the calling code, as needed.
When used from a C or Fortran program or a scripting language like
Python, the library has a simple function-style interface, provided in
src/library.cpp and src/library.h.
See the sample codes in examples/COUPLE/simple for examples of C++ and
C and Fortran codes that invoke LAMMPS thru its library interface.
There are other examples as well in the COUPLE directory which are
discussed in "Section 6.10"_Section_howto.html#howto_10 of the
manual. See "Section 11"_Section_python.html of the manual for a
description of the Python wrapper provided with LAMMPS that operates
through the LAMMPS library interface.
The files src/library.cpp and library.h define the C-style API for
using LAMMPS as a library. See "Section
6.19"_Section_howto.html#howto_19 of the manual for a description of the
interface and how to extend it for your needs.
:line
2.6 Running LAMMPS :h4,link(start_6)
By default, LAMMPS runs by reading commands from standard input. Thus
if you run the LAMMPS executable by itself, e.g.
lmp_linux :pre
it will simply wait, expecting commands from the keyboard. Typically
you should put commands in an input script and use I/O redirection,
e.g.
lmp_linux < in.file :pre
For parallel environments this should also work. If it does not, use
the '-in' command-line switch, e.g.
lmp_linux -in in.file :pre
"This section"_Section_commands.html describes how input scripts are
structured and what commands they contain.
You can test LAMMPS on any of the sample inputs provided in the
examples or bench directory. Input scripts are named in.* and sample
outputs are named log.*.name.P where name is a machine and P is the
number of processors it was run on.
Here is how you might run a standard Lennard-Jones benchmark on a
Linux box, using mpirun to launch a parallel job:
cd src
make linux
cp lmp_linux ../bench
cd ../bench
mpirun -np 4 lmp_linux -in in.lj :pre
See "this page"_bench for timings for this and the other benchmarks on
various platforms. Note that some of the example scripts require
LAMMPS to be built with one or more of its optional packages.
:link(bench,http://lammps.sandia.gov/bench.html)
:line
On a Windows box, you can skip making LAMMPS and simply download an
installer package from "here"_http://rpm.lammps.org/windows.html
For running the non-MPI executable, follow these steps:
Get a command prompt by going to Start->Run... ,
then typing "cmd". :ulb,l
Move to the directory where you have your input, e.g. a copy of
the [in.lj] input from the bench folder. (e.g. by typing: cd "Documents"). :l
At the command prompt, type "lmp_serial -in in.lj", replacing [in.lj]
with the name of your LAMMPS input script. :l
:ule
For the MPI version, which allows you to run LAMMPS under Windows on
multiple processors, follow these steps:
Download and install
"MPICH2"_http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
for Windows. :ulb,l
The LAMMPS Windows installer packages will automatically adjust your
path for the default location of this MPI package. After the installation
of the MPICH software, it needs to be integrated into the system.
For this you need to start a Command Prompt in {Administrator Mode}
(right click on the icon and select it). Change into the MPICH2
installation directory, then into the subdirectory [bin] and execute
[smpd.exe -install]. Exit the command window.
Get a new, regular command prompt by going to Start->Run... ,
then typing "cmd". :l
Move to the directory where you have your input file
(e.g. by typing: cd "Documents"). :l
Then type something like this:
mpiexec -localonly 4 lmp_mpi -in in.lj :pre
or
mpiexec -np 4 lmp_mpi -in in.lj :pre
replacing in.lj with the name of your LAMMPS input script. For the latter
case, you may be prompted to enter your password. :l
In this mode, output may not immediately show up on the screen, so if
your input script takes a long time to execute, you may need to be
patient before the output shows up. :l
The parallel executable can also run on a single processor by typing
something like:
lmp_mpi -in in.lj :pre
:ule
:line
The screen output from LAMMPS is described in a section below. As it
runs, LAMMPS also writes a log.lammps file with the same information.
Note that this sequence of commands copies the LAMMPS executable
(lmp_linux) to the directory with the input files. This may not be
necessary, but some versions of MPI reset the working directory to
where the executable is, rather than leave it as the directory where
you launch mpirun from (if you launch lmp_linux on its own and not
under mpirun). If that happens, LAMMPS will look for additional input
files and write its output files to the executable directory, rather
than your working directory, which is probably not what you want.
If LAMMPS encounters errors in the input script or while running a
simulation it will print an ERROR message and stop or a WARNING
message and continue. See "Section 12"_Section_errors.html for a
discussion of the various kinds of errors LAMMPS can or can't detect,
a list of all ERROR and WARNING messages, and what to do about them.
LAMMPS can run a problem on any number of processors, including a
single processor. In theory you should get identical answers on any
number of processors and on any machine. In practice, numerical
round-off can cause slight differences and eventual divergence of
molecular dynamics phase space trajectories.
LAMMPS can run as large a problem as will fit in the physical memory
of one or more processors. If you run out of memory, you must run on
more processors or setup a smaller problem.
:line
2.7 Command-line options :h4,link(start_7)
At run time, LAMMPS recognizes several optional command-line switches
which may be used in any order. Either the full word or a one-or-two
letter abbreviation can be used:
-e or -echo
-h or -help
-i or -in
-k or -kokkos
-l or -log
-nc or -nocite
-pk or -package
-p or -partition
-pl or -plog
-ps or -pscreen
-r or -restart
-ro or -reorder
-sc or -screen
-sf or -suffix
-v or -var :ul
For example, lmp_ibm might be launched as follows:
mpirun -np 16 lmp_ibm -v f tmp.out -l my.log -sc none -in in.alloy
mpirun -np 16 lmp_ibm -var f tmp.out -log my.log -screen none -in in.alloy :pre
Here are the details on the options:
-echo style :pre
Set the style of command echoing. The style can be {none} or {screen}
or {log} or {both}. Depending on the style, each command read from
the input script will be echoed to the screen and/or logfile. This
can be useful to figure out which line of your script is causing an
input error. The default value is {log}. The echo style can also be
set by using the "echo"_echo.html command in the input script itself.
-help :pre
Print a brief help summary and a list of options compiled into this
executable for each LAMMPS style (atom_style, fix, compute,
pair_style, bond_style, etc). This can tell you if the command you
want to use was included via the appropriate package at compile time.
LAMMPS will print the info and immediately exit if this switch is
used.
-in file :pre
Specify a file to use as an input script. This is an optional switch
when running LAMMPS in one-partition mode. If it is not specified,
LAMMPS reads its script from standard input, typically from a script
via I/O redirection; e.g. lmp_linux < in.run. I/O redirection should
also work in parallel, but if it does not (in the unlikely case that
an MPI implementation does not support it), then use the -in flag.
Note that this is a required switch when running LAMMPS in
multi-partition mode, since multiple processors cannot all read from
stdin.
-kokkos on/off keyword/value ... :pre
Explicitly enable or disable KOKKOS support, as provided by the KOKKOS
package. Even if LAMMPS is built with this package, as described
above in "Section 2.3"_#start_3, this switch must be set to enable
running with the KOKKOS-enabled styles the package provides. If the
switch is not set (the default), LAMMPS will operate as if the KOKKOS
package were not installed; i.e. you can run standard LAMMPS or with
the GPU or USER-OMP packages, for testing or benchmarking purposes.
Additional optional keyword/value pairs can be specified which
determine how Kokkos will use the underlying hardware on your
platform. These settings apply to each MPI task you launch via the
"mpirun" or "mpiexec" command. You may choose to run one or more MPI
tasks per physical node. Note that if you are running on a desktop
machine, you typically have one physical node. On a cluster or
supercomputer there may be dozens or 1000s of physical nodes.
Either the full word or an abbreviation can be used for the keywords.
Note that the keywords do not use a leading minus sign. I.e. the
keyword is "t", not "-t". Also note that each of the keywords has a
default setting. Examples of when to use these options and what
settings to use on different platforms are given in "Section
5.3"_Section_accelerate.html#acc_3.
d or device
g or gpus
t or threads
n or numa :ul
device Nd :pre
This option is only relevant if you built LAMMPS with CUDA=yes, you
have more than one GPU per node, and if you are running with only one
MPI task per node. The Nd setting is the ID of the GPU on the node to
run on. By default Nd = 0. If you have multiple GPUs per node, they
have consecutive IDs numbered as 0,1,2,etc. This setting allows you
to launch multiple independent jobs on the node, each with a single
MPI task per node, and assign each job to run on a different GPU.
gpus Ng Ns :pre
This option is only relevant if you built LAMMPS with CUDA=yes, you
have more than one GPU per node, and you are running with multiple MPI
tasks per node (up to one per GPU). The Ng setting is how many GPUs
you will use. The Ns setting is optional. If set, it is the ID of a
GPU to skip when assigning MPI tasks to GPUs. This may be useful if
your desktop system reserves one GPU to drive the screen and the rest
are intended for computational work like running LAMMPS. By default
Ng = 1 and Ns is not set.
Depending on which flavor of MPI you are running, LAMMPS will look for
one of these 3 environment variables
SLURM_LOCALID (various MPI variants compiled with SLURM support)
MV2_COMM_WORLD_LOCAL_RANK (Mvapich)
OMPI_COMM_WORLD_LOCAL_RANK (OpenMPI) :pre
which are initialized by the "srun", "mpirun" or "mpiexec" commands.
The environment variable setting for each MPI rank is used to assign a
unique GPU ID to the MPI task.
threads Nt :pre
This option assigns Nt threads to each MPI task for
performing work when Kokkos is executing in OpenMP or pthreads mode.
The default is Nt = 1, which essentially runs in MPI-only mode. If
there are Np MPI tasks per physical node, you generally want Np*Nt =
the number of physical cores per node, to use your available hardware
optimally. This also sets the number of threads used by the host when
LAMMPS is compiled with CUDA=yes.
numa Nm :pre
This option is only relevant when using pthreads with hwloc support.
In this case Nm defines the number of NUMA regions (typically sockets)
on a node which will be utilized by a single MPI rank. By default Nm
= 1. If this option is used, the total number of worker-threads per
MPI rank is threads*numa. Currently it is almost always better to
assign at least one MPI rank per NUMA region, and leave numa set to
its default value of 1. This is because letting a single process span
multiple NUMA regions induces a significant amount of cross-NUMA data
traffic, which is slow.
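Putting these keywords together, a hypothetical launch on a node with
16 physical cores, using 4 MPI tasks with 4 OpenMP threads each, might
look like this (the executable name is illustrative; the kk suffix is
discussed in "Section 5.3"_Section_accelerate.html#acc_3):
# executable name and settings are illustrative only
mpirun -np 4 lmp_kokkos_omp -k on t 4 -sf kk -in in.lj :pre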
-log file :pre
Specify a log file for LAMMPS to write status information to. In
one-partition mode, if the switch is not used, LAMMPS writes to the
file log.lammps. If this switch is used, LAMMPS writes to the
specified file. In multi-partition mode, if the switch is not used, a
log.lammps file is created with hi-level status information. Each
partition also writes to a log.lammps.N file where N is the partition
ID. If the switch is specified in multi-partition mode, the hi-level
logfile is named "file" and each partition also logs information to a
file.N. For both one-partition and multi-partition mode, if the
specified file is "none", then no log files are created. Using a
"log"_log.html command in the input script will override this setting.
Option -plog will override the name of the partition log files file.N.
-nocite :pre
Disable writing the log.cite file which is normally written to list
references for specific cite-able features used during a LAMMPS run.
See the "citation page"_http://lammps.sandia.gov/cite.html for more
details.
-package style args .... :pre
Invoke the "package"_package.html command with style and args. The
syntax is the same as if the command appeared at the top of the input
script. For example "-package gpu 2" or "-pk gpu 2" is the same as
"package gpu 2"_package.html in the input script. The possible styles
and args are documented on the "package"_package.html doc page. This
switch can be used multiple times, e.g. to set options for the
USER-INTEL and USER-OMP packages which can be used together.
Along with the "-suffix" command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
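For example, this command line (a sketch; the executable and script
names are placeholders) runs with the USER-OMP package using 4
threads per MPI task, without touching the input script:
mpirun -np 4 lmp_mpi -sf omp -pk omp 4 -in in.script :pre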
-partition 8x2 4 5 ... :pre
Invoke LAMMPS in multi-partition mode. When LAMMPS is run on P
processors and this switch is not used, LAMMPS runs in one partition,
i.e. all P processors run a single simulation. If this switch is
used, the P processors are split into separate partitions and each
partition runs its own simulation. The arguments to the switch
specify the number of processors in each partition. Arguments of the
form MxN mean M partitions, each with N processors. Arguments of the
form N mean a single partition with N processors. The sum of
processors in all partitions must equal P. Thus the command
"-partition 8x2 4 5" has 10 partitions and runs on a total of 25
processors.
Running with multiple partitions can be useful for running
"multi-replica simulations"_Section_howto.html#howto_5, where each
replica runs on one or a few processors. Note that with MPI
installed on a machine (e.g. your desktop), you can run on more
(virtual) processors than you have physical processors.
To run multiple independent simulations from one input script, using
multiple partitions, see "Section 6.4"_Section_howto.html#howto_4
of the manual. World- and universe-style "variables"_variable.html
are useful in this context.
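A minimal sketch of the 10-partition example above (the input script
name is a placeholder):
mpirun -np 25 lmp_mpi -partition 8x2 4 5 -in in.script :pre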
-plog file :pre
Specify the base name for the partition log files, so partition N
writes log information to file.N. If file is none, then no partition
log files are created. This overrides the filename specified in the
-log command-line option. This option is useful when working with
large numbers of partitions, allowing the partition log files to be
suppressed (-plog none) or placed in a sub-directory (-plog
replica_files/log.lammps). If this option is not used, the log file for
partition N is log.lammps.N or whatever is specified by the -log
command-line option.
-pscreen file :pre
Specify the base name for the partition screen file, so partition N
writes screen information to file.N. If file is none, then no
partition screen files are created. This overrides the filename
specified in the -screen command-line option. This option is useful
when working with large numbers of partitions, allowing the partition
screen files to be suppressed (-pscreen none) or placed in a
sub-directory (-pscreen replica_files/screen). If this option is not
used the screen file for partition N is screen.N or whatever is
specified by the -screen command-line option.
-restart restartfile {remap} datafile keyword value ... :pre
Convert the restart file into a data file and immediately exit. This
is the same operation as if the following 2-line input script were
run:
read_restart restartfile {remap}
write_data datafile keyword value ... :pre
Note that the specified restartfile and datafile can have wild-card
characters ("*",%") as described by the
"read_restart"_read_restart.html and "write_data"_write_data.html
commands. But a filename such as file.* will need to be enclosed in
quotes to avoid shell expansion of the "*" character.
Note that following restartfile, the optional flag {remap} can be
used. This has the same effect as adding it to the
"read_restart"_read_restart.html command, as explained on its doc
page. This is only useful if the reading of the restart file triggers
an error that atoms have been lost. In that case, use of the remap
flag should allow the data file to still be produced.
Also note that following datafile, the same optional keyword/value
pairs can be listed as used by the "write_data"_write_data.html
command.
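As a sketch (the file names are placeholders), a restart file can be
converted from the command line with or without the {remap} flag:
lmp_mpi -restart restart.equil data.equil
lmp_mpi -restart restart.equil remap data.equil :pre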
-reorder nth N
-reorder custom filename :pre
Reorder the processors in the MPI communicator used to instantiate
LAMMPS, in one of several ways. The original MPI communicator ranks
all P processors from 0 to P-1. The mapping of these ranks to
physical processors is done by MPI before LAMMPS begins. It may be
useful in some cases to alter the rank order. E.g. to insure that
cores within each node are ranked in a desired order. Or when using
the "run_style verlet/split"_run_style.html command with 2 partitions
to insure that a specific Kspace processor (in the 2nd partition) is
matched up with a specific set of processors in the 1st partition.
See the "Section 5"_Section_accelerate.html doc pages for
more details.
If the keyword {nth} is used with a setting {N}, then it means every
Nth processor will be moved to the end of the ranking. This is useful
when using the "run_style verlet/split"_run_style.html command with 2
partitions via the -partition command-line switch. The first set of
processors will be in the first partition, the 2nd set in the 2nd
partition. The -reorder command-line switch can alter this so that
the 1st N procs in the 1st partition and one proc in the 2nd partition
will be ordered consecutively, e.g. as the cores on one physical node.
This can boost performance. For example, if you use "-reorder nth 4"
and "-partition 9 3" and you are running on 12 processors, the
processors will be reordered from
0 1 2 3 4 5 6 7 8 9 10 11 :pre
to
0 1 2 4 5 6 8 9 10 3 7 11 :pre
so that the processors in each partition will be
0 1 2 4 5 6 8 9 10
3 7 11 :pre
See the "processors" command for how to insure processors from each
partition could then be grouped optimally for quad-core nodes.
If the keyword is {custom}, then a file that specifies a permutation
of the processor ranks is also specified. The format of the reorder
file is as follows. Any number of initial blank or comment lines
(starting with a "#" character) can be present. These should be
followed by P lines of the form:
I J :pre
where P is the number of processors LAMMPS was launched with. Note
that if running in multi-partition mode (see the -partition switch
above) P is the total number of processors in all partitions. The I
and J values describe a permutation of the P processors. Every I and
J should be values from 0 to P-1 inclusive. In the set of P I values,
every proc ID should appear exactly once. Ditto for the set of P J
values. A single I,J pairing means that the physical processor with
rank I in the original MPI communicator will have rank J in the
reordered communicator.
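A minimal sketch of such a file for P = 4, which swaps the ranks of
processors 1 and 2:
# example reorder file for P = 4 processors
0 0
1 2
2 1
3 3 :pre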
Note that rank ordering can also be specified by many MPI
implementations, either by environment variables that specify how to
order physical processors, or by config files that specify what
physical processors to assign to each MPI rank. The -reorder switch
simply gives you a portable way to do this without relying on MPI
itself. See the "processors out"_processors.html command for how
to output info on the final assignment of physical processors to
the LAMMPS simulation domain.
-screen file :pre
Specify a file for LAMMPS to write its screen information to. In
one-partition mode, if the switch is not used, LAMMPS writes to the
screen. If this switch is used, LAMMPS writes to the specified file
instead and you will see no screen output. In multi-partition mode,
if the switch is not used, hi-level status information is written to
the screen. Each partition also writes to a screen.N file where N is
the partition ID. If the switch is specified in multi-partition mode,
the hi-level screen dump is named "file" and each partition also
writes screen information to a file.N. For both one-partition and
multi-partition mode, if the specified file is "none", then no screen
output is performed. Option -pscreen will override the name of the
partition screen files file.N.
-suffix style args :pre
Use variants of various styles if they exist. The specified style can
be {cuda}, {gpu}, {intel}, {kk}, {omp}, {opt}, or {hybrid}. These
refer to optional packages that LAMMPS can be built with, as described
above in "Section 2.3"_#start_3. The "gpu" style corresponds to the
GPU package, the "intel" style to the USER-INTEL package, the "kk"
style to the KOKKOS package, the "opt" style to the OPT package, and
the "omp" style to the USER-OMP package. The hybrid style is the only
style that accepts arguments. It allows for two packages to be
specified. The first package specified is the default and will be used
if it is available. If no style is available for the first package,
the style for the second package will be used if available. For
example, "-suffix hybrid intel omp" will use styles from the
USER-INTEL package if they are installed and available, but styles for
the USER-OMP package otherwise.
Along with the "-package" command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
As an example, all of the packages provide a "pair_style
lj/cut"_pair_lj.html variant, with style names lj/cut/gpu,
lj/cut/intel, lj/cut/kk, lj/cut/omp, and lj/cut/opt. A variant style
can be specified explicitly in your input script, e.g. pair_style
lj/cut/gpu. If the -suffix switch is used the specified suffix
(gpu,intel,kk,omp,opt) is automatically appended whenever your input
script command creates a new "atom"_atom_style.html,
"pair"_pair_style.html, "fix"_fix.html, "compute"_compute.html, or
"run"_run_style.html style. If the variant version does not exist,
the standard version is created.
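To make this concrete (a sketch only), with "-sf gpu" on the command
line an input script line such as
pair_style lj/cut 2.5 :pre
is executed as if it read
pair_style lj/cut/gpu 2.5 :pre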
For the GPU package, using this command-line switch also invokes the
default GPU settings, as if the command "package gpu 1" were used at
the top of your input script. These settings can be changed by using
the "-package gpu" command-line switch or the "package
gpu"_package.html command in your script.
For the USER-INTEL package, using this command-line switch also
invokes the default USER-INTEL settings, as if the command "package
intel 1" were used at the top of your input script. These settings
can be changed by using the "-package intel" command-line switch or
the "package intel"_package.html command in your script. If the
USER-OMP package is also installed, the hybrid style with "intel omp"
arguments can be used to make the omp suffix a second choice, if a
requested style is not available in the USER-INTEL package. It will
also invoke the default USER-OMP settings, as if the command "package
omp 0" were used at the top of your input script. These settings can
be changed by using the "-package omp" command-line switch or the
"package omp"_package.html command in your script.
For the KOKKOS package, using this command-line switch also invokes
the default KOKKOS settings, as if the command "package kokkos" were
used at the top of your input script. These settings can be changed
by using the "-package kokkos" command-line switch or the "package
kokkos"_package.html command in your script.
For the OMP package, using this command-line switch also invokes the
default OMP settings, as if the command "package omp 0" were used at
the top of your input script. These settings can be changed by using
the "-package omp" command-line switch or the "package
omp"_package.html command in your script.
The "suffix"_suffix.html command can also be used within an input
script to set a suffix, or to turn off or back on any suffix setting
made via the command line.
-var name value1 value2 ... :pre
Specify a variable that will be defined for substitution purposes when
the input script is read. This switch can be used multiple times to
define multiple variables. "Name" is the variable name which can be a
single character (referenced as $x in the input script) or a full
string (referenced as $\{abc\}). An "index-style
variable"_variable.html will be created and populated with the
subsequent values, e.g. a set of filenames. Using this command-line
option is equivalent to putting the line "variable name index value1
value2 ..." at the beginning of the input script. Defining an index
variable as a command-line argument overrides any setting for the same
index variable in the input script, since index variables cannot be
re-defined. See the "variable"_variable.html command for more info on
defining index and other kinds of variables and "this
section"_Section_commands.html#cmd_2 for more info on using variables
in input scripts.
NOTE: Currently, the command-line parser looks for arguments that
start with "-" to indicate new switches. Thus you cannot specify
multiple variable values if any of them start with a "-", e.g. a
negative numeric value. It is OK if the first value1 starts with a
"-", since it is automatically skipped.
:line
2.8 LAMMPS screen output :h4,link(start_8)
As LAMMPS reads an input script, it prints information to both the
screen and a log file about significant actions it takes to setup a
simulation. When the simulation is ready to begin, LAMMPS performs
various initializations and prints the amount of memory (in MBytes per
processor) that the simulation requires. It also prints details of
the initial thermodynamic state of the system. During the run itself,
thermodynamic information is printed periodically, every few
timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:
-Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
+Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 1.9808 | 2.0134 | 2.0318 | 1.4 | 71.60
Bond | 0.0021894 | 0.0060319 | 0.010058 | 4.7 | 0.21
Kspace | 0.3207 | 0.3366 | 0.36616 | 3.1 | 11.97
Neigh | 0.28411 | 0.28464 | 0.28516 | 0.1 | 10.12
Comm | 0.075732 | 0.077018 | 0.07883 | 0.4 | 2.74
Output | 0.00030518 | 0.00042665 | 0.00078821 | 1.0 | 0.02
Modify | 0.086606 | 0.086631 | 0.086668 | 0.0 | 3.08
Other | | 0.007178 | | | 0.26 :pre
Nlocal: 501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost: 6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs: 177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre
Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre
-The first section provides a global loop timing summary. The loop time
+The first section provides a global loop timing summary. The {loop time}
is the total wall time for the section. The {Performance} line is
provided for convenience to help predicting the number of loop
-continuations required and for comparing performance with other
-similar MD codes. The CPU use line provides the CPU utilzation per
+continuations required and for comparing performance with other,
+similar MD codes. The {CPU use} line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
-threads (or 1). Lower numbers correspond to delays due to file I/O or
-insufficient thread utilization.
+threads (or 1 if no OpenMP). Lower numbers correspond to delays due
+to file I/O or insufficient thread utilization.
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:
{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
{Kspace} stands for reciprocal space interactions: Ewald, PPPM, MSM
{Neigh} stands for neighbor list construction
{Comm} stands for communicating atoms and their properties
{Output} stands for writing dumps and thermo output
{Modify} stands for fixes and computes called by them
{Other} is the remaining time :ul
For each category, there is a breakdown of the least, average and most
amount of wall time any processor spent on this section, as well as
the variation from the average time. Together these numbers allow you
to gauge the amount of load imbalance in this segment of the
calculation. Ideally the difference between minimum, maximum and
average is small, and thus the variation from the average is close to
zero. The final column shows the percentage of the total loop time
spent in this section.
When using the "timer full"_timer.html setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when both {timer full} and the "package omp"_package.html
command are active, a similar timing summary of time spent in threaded
regions is provided, to monitor thread utilization and load balance. A
-new entry is the {Reduce} section, which lists the time spend in
+new entry is the {Reduce} section, which lists the time spent in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taken from the first MPI rank
only, and thus, since the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:
Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5127 | 0.5147 | 0.5167 | 0.3 | 75.18
Bond | 0.0043139 | 0.0046779 | 0.0050418 | 0.5 | 0.68
Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55 :pre
The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
histogram showing the distribution. The total number of histogram counts
is equal to the number of processors.
The last section gives aggregate statistics for pair-wise neighbors
and special neighbors that LAMMPS keeps track of (see the
"special_bonds"_special_bonds.html command). The number of times
neighbor lists were rebuilt during the run is given as well as the
number of potentially "dangerous" rebuilds. If atom movement
triggered neighbor list rebuilding (see the
"neigh_modify"_neigh_modify.html command), then dangerous
reneighborings are those that were triggered on the first timestep
atom movement was checked for. If this count is non-zero you may wish
to reduce the delay factor to insure no force interactions are missed
by atoms moving beyond the neighbor skin distance before a rebuild
takes place.
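A minimal sketch of an input script line that avoids dangerous builds
by rebuilding more aggressively (the keyword values shown are
illustrative):
neigh_modify delay 0 every 1 check yes :pre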
If an energy minimization was performed via the
"minimize"_minimize.html command, additional information is printed,
e.g.
Minimization stats:
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-6372.3765206 -8328.46998942 -8328.46998942
Force two-norm initial, final = 1059.36 5.36874
Force max component initial, final = 58.6026 1.46872
Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
Iterations, force evaluations = 701 1516 :pre
The first line prints the criterion that caused the minimization to
complete. The third line lists the initial and final energy, as well
as the energy on the next-to-last iteration. The next 2 lines give a
measure of the gradient of the energy (force on all atoms). The
2-norm is the "length" of this force vector; the inf-norm is its
largest component. The last 2 lines give information about the line
search and statistics on how many iterations and force evaluations
the minimizer required. Multiple force evaluations are typically done
at each iteration to perform a 1d line minimization in the search
direction.
If a "kspace_style"_kspace_style.html long-range Coulombics solve was
performed during the run (PPPM, Ewald), then additional information is
printed, e.g.
FFT time (% of Kspce) = 0.200313 (8.34477)
FFT Gflps 3d 1d-only = 2.31074 9.19989 :pre
The first line gives the time spent doing 3d FFTs (4 per timestep) and
the fraction it represents of the total KSpace time (listed above).
Each 3d FFT requires computation (3 sets of 1d FFTs) and communication
(transposes). The total flops performed is 5Nlog_2(N), where N is the
number of points in the 3d grid. The FFTs are timed with and without
the communication and a Gflop rate is computed. The 3d rate is with
communication; the 1d rate is without (just the 1d FFTs). Thus you
can estimate what fraction of your FFT time was spent in
communication; since the same flops are timed in both cases, this
fraction is 1 - 2.31/9.20, or roughly 75% in the example above.
:line
2.9 Tips for users of previous LAMMPS versions :h4,link(start_9)
The current C++ version of LAMMPS began with a complete rewrite of LAMMPS 2001, which
was written in F90. Features of earlier versions of LAMMPS are listed
in "Section 13"_Section_history.html. The F90 and F77 versions
(2001 and 99) are also freely distributed as open-source codes; check
the "LAMMPS WWW Site"_lws for distribution information if you prefer
those versions. The 99 and 2001 versions are no longer under active
development; they do not have all the features of C++ LAMMPS.
If you are a previous user of LAMMPS 2001, these are the most
significant changes you will notice in C++ LAMMPS:
(1) The names and arguments of many input script commands have
changed. All commands are now a single word (e.g. read_data instead
of read data).
(2) All the functionality of LAMMPS 2001 is included in C++ LAMMPS,
but you may need to specify the relevant commands in different ways.
(3) The format of the data file can be streamlined for some problems.
See the "read_data"_read_data.html command for details. The data file
section "Nonbond Coeff" has been renamed to "Pair Coeff" in C++ LAMMPS.
(4) Binary restart files written by LAMMPS 2001 cannot be read by C++
LAMMPS with a "read_restart"_read_restart.html command. This is
because they were output by F90 which writes in a different binary
format than C or C++ writes or reads. Use the {restart2data} tool
provided with LAMMPS 2001 to convert the 2001 restart file to a text
data file. Then edit the data file as necessary before using the C++
LAMMPS "read_data"_read_data.html command to read it in.
(5) There are numerous small numerical changes in C++ LAMMPS that mean
you will not get identical answers when comparing to a 2001 run.
However, your initial thermodynamic energy and MD trajectory should be
close if you have set up the problem the same way for both codes.
diff --git a/doc/src/accelerate_kokkos.txt b/doc/src/accelerate_kokkos.txt
index 1a45c04a1..3d31344c2 100644
--- a/doc/src/accelerate_kokkos.txt
+++ b/doc/src/accelerate_kokkos.txt
@@ -1,496 +1,496 @@
"Previous Section"_Section_packages.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
"Return to Section accelerate overview"_Section_accelerate.html
5.3.3 KOKKOS package :h5
The KOKKOS package was developed primarily by Christian Trott (Sandia)
with contributions of various styles by others, including Sikandar
Mashayak (UIUC), Stan Moore (Sandia), and Ray Shan (Sandia). The
underlying Kokkos library was written primarily by Carter Edwards,
Christian Trott, and Dan Sunderland (all Sandia).
The KOKKOS package contains versions of pair, fix, and atom styles
that use data structures and macros provided by the Kokkos library,
which is included with LAMMPS in lib/kokkos.
The Kokkos library is part of
"Trilinos"_http://trilinos.sandia.gov/packages/kokkos and can also be
downloaded from "Github"_https://github.com/kokkos/kokkos. Kokkos is a
templated C++ library that provides two key abstractions for an
application like LAMMPS. First, it allows a single implementation of
an application kernel (e.g. a pair style) to run efficiently on
different kinds of hardware, such as a GPU, Intel Phi, or many-core
CPU.
The Kokkos library also provides data abstractions to adjust (at
compile time) the memory layout of basic data structures like 2d and
3d arrays and allow the transparent utilization of special hardware
load and store operations. Such data structures are used in LAMMPS to
store atom coordinates or forces or neighbor lists. The layout is
chosen to optimize performance on different platforms. Again this
functionality is hidden from the developer, and does not affect how
the kernel is coded.
These abstractions are set at build time, when LAMMPS is compiled with
the KOKKOS package installed. All Kokkos operations occur within the
context of an individual MPI task running on a single node of the
machine. The total number of MPI tasks used by LAMMPS (one or
multiple per compute node) is set in the usual manner via the mpirun
or mpiexec commands, and is independent of Kokkos.
Kokkos currently provides support for 3 modes of execution (per MPI
task). These are OpenMP (for many-core CPUs), Cuda (for NVIDIA GPUs),
and OpenMP (for Intel Phi). Note that the KOKKOS package supports
running on the Phi in native mode, not offload mode like the
USER-INTEL package supports. You choose the mode at build time to
produce an executable compatible with specific hardware.
Here is a quick overview of how to use the KOKKOS package
for CPU acceleration, assuming one or more 16-core nodes.
More details follow.
use a C++11 compatible compiler
make yes-kokkos
make mpi KOKKOS_DEVICES=OpenMP # build with the KOKKOS package
make kokkos_omp # or Makefile.kokkos_omp already has variable set
Make.py -v -p kokkos -kokkos omp -o mpi -a file mpi # or one-line build via Make.py :pre
mpirun -np 16 lmp_mpi -k on -sf kk -in in.lj # 1 node, 16 MPI tasks/node, no threads
mpirun -np 2 -ppn 1 lmp_mpi -k on t 16 -sf kk -in in.lj # 2 nodes, 1 MPI task/node, 16 threads/task
mpirun -np 2 lmp_mpi -k on t 8 -sf kk -in in.lj # 1 node, 2 MPI tasks/node, 8 threads/task
mpirun -np 32 -ppn 4 lmp_mpi -k on t 4 -sf kk -in in.lj # 8 nodes, 4 MPI tasks/node, 4 threads/task :pre
specify variables and settings in your Makefile.machine that enable OpenMP, GPU, or Phi support
include the KOKKOS package and build LAMMPS
enable the KOKKOS package and its hardware options via the "-k on" command-line switch
use KOKKOS styles in your input script :ul
Here is a quick overview of how to use the KOKKOS package for GPUs,
assuming one or more nodes, each with 16 cores and a GPU. More
details follow.
discuss use of NVCC, which Makefiles to examine
use a C++11 compatible compiler
KOKKOS_DEVICES = Cuda, OpenMP
KOKKOS_ARCH = Kepler35
make yes-kokkos
make machine
Make.py -p kokkos -kokkos cuda arch=31 -o kokkos_cuda -a file kokkos_cuda :pre
mpirun -np 1 lmp_cuda -k on t 6 -sf kk -in in.lj # one MPI task, 6 threads on CPU
mpirun -np 4 -ppn 1 lmp_cuda -k on t 6 -sf kk -in in.lj # ditto on 4 nodes :pre
mpirun -np 2 lmp_cuda -k on t 8 g 2 -sf kk -in in.lj # two MPI tasks, 8 threads per CPU
mpirun -np 32 -ppn 2 lmp_cuda -k on t 8 g 2 -sf kk -in in.lj # ditto on 16 nodes :pre
Here is a quick overview of how to use the KOKKOS package
for the Intel Phi:
use a C++11 compatible compiler
KOKKOS_DEVICES = OpenMP
KOKKOS_ARCH = KNC
make yes-kokkos
make machine
Make.py -p kokkos -kokkos phi -o kokkos_phi -a file mpi :pre
host=MIC, Intel Phi with 61 cores (240 threads/phi via 4x hardware threading):
mpirun -np 1 lmp_g++ -k on t 240 -sf kk -in in.lj # 1 MPI task on 1 Phi, 1*240 = 240
mpirun -np 30 lmp_g++ -k on t 8 -sf kk -in in.lj # 30 MPI tasks on 1 Phi, 30*8 = 240
mpirun -np 12 lmp_g++ -k on t 20 -sf kk -in in.lj # 12 MPI tasks on 1 Phi, 12*20 = 240
mpirun -np 96 -ppn 12 lmp_g++ -k on t 20 -sf kk -in in.lj # ditto on 8 Phis :pre
[Required hardware/software:]
Kokkos support within LAMMPS must be built with a C++11 compatible
-compiler. If using gcc, version 4.8.1 or later is required.
+compiler. If using gcc, version 4.7.2 or later is required.
To build with Kokkos support for CPUs, your compiler must support the
OpenMP interface. You should have one or more multi-core CPUs so that
multiple threads can be launched by each MPI task running on a CPU.
To build with Kokkos support for NVIDIA GPUs, NVIDIA Cuda software
-version 6.5 or later must be installed on your system. See the
+version 7.5 or later must be installed on your system. See the
discussion for the "GPU"_accelerate_gpu.html package for details of
how to check and do this.
NOTE: For good performance of the KOKKOS package on GPUs, you must
have Kepler generation GPUs (or later). The Kokkos library exploits
texture cache options not supported by Tesla generation GPUs (or
older).
To build with Kokkos support for Intel Xeon Phi coprocessors, your
system must be configured to use them in "native" mode, not "offload"
mode like the USER-INTEL package supports.
[Building LAMMPS with the KOKKOS package:]
You must choose at build time whether to build for CPUs (OpenMP),
GPUs, or Phi.
You can do any of these in one line, using the src/Make.py script,
described in "Section 2.4"_Section_start.html#start_4 of the manual.
Type "Make.py -h" for help. If run from the src directory, these
commands will create src/lmp_kokkos_omp, lmp_kokkos_cuda, and
lmp_kokkos_phi. Note that the OMP and PHI options use
src/MAKE/Makefile.mpi as the starting Makefile.machine. The CUDA
option uses src/MAKE/OPTIONS/Makefile.kokkos_cuda.
The latter two steps can be done using the "-k on", "-pk kokkos" and
"-sf kk" "command-line switches"_Section_start.html#start_7
respectively. Or the effect of the "-pk" or "-sf" switches can be
duplicated by adding the "package kokkos"_package.html or "suffix
kk"_suffix.html commands respectively to your input script.
Or you can follow these steps:
CPU-only (run all-MPI or with OpenMP threading):
cd lammps/src
make yes-kokkos
make kokkos_omp :pre
CPU-only (only MPI, no threading):
cd lammps/src
make yes-kokkos
make kokkos_mpi :pre
Intel Xeon Phi (Intel Compiler, Intel MPI):
cd lammps/src
make yes-kokkos
make kokkos_phi :pre
CPUs and GPUs (with MPICH):
cd lammps/src
make yes-kokkos
make kokkos_cuda_mpich :pre
These examples set the KOKKOS-specific OMP, MIC, CUDA variables on the
make command line which requires a GNU-compatible make command. Try
"gmake" if your system's standard make complains.
NOTE: If you build using make line variables and re-build LAMMPS twice
with different KOKKOS options and the *same* target, e.g. g++ in the
first two examples above, then you *must* perform a "make clean-all"
or "make clean-machine" before each build. This is to force all the
KOKKOS-dependent files to be re-compiled with the new options.
NOTE: Currently, there are no precision options with the KOKKOS
package. All compilation and computation is performed in double
precision.
There are other allowed options when building with the KOKKOS package.
As above, they can be set either as variables on the make command line
or in Makefile.machine. This is the full list of options, including
those discussed above. Each takes a value shown below. The
default value is listed, which is set in the
lib/kokkos/Makefile.kokkos file.
#Default settings specific options
#Options: force_uvm,use_ldg,rdc
KOKKOS_DEVICES, values = {OpenMP}, {Serial}, {Pthreads}, {Cuda}, default = {OpenMP}
KOKKOS_ARCH, values = {KNC}, {SNB}, {HSW}, {Kepler}, {Kepler30}, {Kepler32}, {Kepler35}, {Kepler37}, {Maxwell}, {Maxwell50}, {Maxwell52}, {Maxwell53}, {ARMv8}, {BGQ}, {Power7}, {Power8}, default = {none}
KOKKOS_DEBUG, values = {yes}, {no}, default = {no}
KOKKOS_USE_TPLS, values = {hwloc}, {librt}, default = {none}
KOKKOS_CUDA_OPTIONS, values = {force_uvm}, {use_ldg}, {rdc} :ul
KOKKOS_DEVICES sets the parallelization method used for Kokkos code
(within LAMMPS). KOKKOS_DEVICES=OpenMP means that OpenMP will be
used. KOKKOS_DEVICES=Pthreads means that pthreads will be used.
KOKKOS_DEVICES=Cuda means an NVIDIA GPU running CUDA will be used.
If KOKKOS_DEVICES=Cuda, then the lo-level Makefile in the src/MAKE
directory must use "nvcc" as its compiler, via its CC setting. For
best performance its CCFLAGS setting should use -O3 and have a
KOKKOS_ARCH setting that matches the compute capability of your NVIDIA
hardware and software installation, e.g. KOKKOS_ARCH=Kepler30. Note
that the minimum required compute capability is 2.0, but this will give
significantly reduced performance compared to Kepler generation GPUs
with compute capability 3.x. For the LINK setting, "nvcc" should not
be used; instead use g++ or another compiler suitable for linking C++
applications. Often you will want to use your MPI compiler wrapper
for this setting (i.e. mpicxx). Finally, the lo-level Makefile must
also have a "Compilation rule" for creating *.o files from *.cu files.
See src/Makefile.cuda for an example of a lo-level Makefile with all
of these settings.
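A minimal sketch of the relevant Makefile.machine lines for such a
CUDA build (the architecture value and the MPI wrapper name are
placeholders that must match your installation):
CC =		nvcc
CCFLAGS =	-O3
LINK =		mpicxx
KOKKOS_DEVICES = Cuda, OpenMP
KOKKOS_ARCH = Kepler35 :pre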
KOKKOS_USE_TPLS=hwloc binds threads to hardware cores, so they do not
migrate during a simulation. KOKKOS_USE_TPLS=hwloc should always be
used if running with KOKKOS_DEVICES=Pthreads for pthreads. It is not
necessary for KOKKOS_DEVICES=OpenMP for OpenMP, because OpenMP
provides alternative methods via environment variables for binding
threads to hardware cores. More info on binding threads to cores is
given in "Section 5.3"_Section_accelerate.html#acc_3.
KOKKOS_ARCH=KNC enables compiler switches needed when compiling for an
Intel Phi processor.
KOKKOS_USE_TPLS=librt enables use of a more accurate timer mechanism
on most Unix platforms. This library is not available on all
platforms.
KOKKOS_DEBUG is only useful when developing a Kokkos-enabled style
within LAMMPS. KOKKOS_DEBUG=yes enables printing of run-time
debugging information that can be useful. It also enables runtime
bounds checking on Kokkos data structures.
KOKKOS_CUDA_OPTIONS are additional options for CUDA.
For more information on Kokkos see the Kokkos programmers' guide here:
/lib/kokkos/doc/Kokkos_PG.pdf.
[Run with the KOKKOS package from the command line:]
The mpirun or mpiexec command sets the total number of MPI tasks used
by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the mpirun command in MPICH does this via
its -np and -ppn switches. Ditto for OpenMPI via -np and -npernode.
When using KOKKOS built with host=OMP, you need to choose how many
OpenMP threads per MPI task will be used (via the "-k" command-line
switch discussed below). Note that the product of MPI tasks * OpenMP
threads/task should not exceed the physical number of cores (on a
node), otherwise performance will suffer.
When using the KOKKOS package built with device=CUDA, you must use
exactly one MPI task per physical GPU.
When using the KOKKOS package built with host=MIC for Intel Xeon Phi
coprocessor support you need to insure there are one or more MPI tasks
per coprocessor, and choose the number of coprocessor threads to use
per MPI task (via the "-k" command-line switch discussed below). The
product of MPI tasks * coprocessor threads/task should not exceed the
maximum number of threads the coprocessor is designed to run,
otherwise performance will suffer. This value is 240 for current
generation Xeon Phi(TM) chips, which is 60 physical cores * 4
threads/core. Note that with the KOKKOS package you do not need to
specify how many Phi coprocessors there are per node; each
coprocessor is simply treated as running some number of MPI tasks.
You must use the "-k on" "command-line
switch"_Section_start.html#start_7 to enable the KOKKOS package. It
takes additional arguments for hardware settings appropriate to your
system. Those arguments are "documented
here"_Section_start.html#start_7. The two most commonly used
options are:
-k on t Nt g Ng :pre
The "t Nt" option applies to host=OMP (even if device=CUDA) and
host=MIC. For host=OMP, it specifies how many OpenMP threads per MPI
task to use with a node. For host=MIC, it specifies how many Xeon Phi
threads per MPI task to use within a node. The default is Nt = 1.
Note that for host=OMP this is effectively MPI-only mode which may be
fine. But for host=MIC you will typically end up using far less than
all the 240 available threads, which could give very poor performance.
The "g Ng" option applies to device=CUDA. It specifies how many GPUs
per compute node to use. The default is 1, so this only needs to be
specified if you have 2 or more GPUs per compute node.
The "-k on" switch also issues a "package kokkos" command (with no
additional arguments) which sets various KOKKOS options to default
values, as discussed on the "package"_package.html command doc page.
Use the "-sf kk" "command-line switch"_Section_start.html#start_7,
which will automatically append "kk" to styles that support it. Use
the "-pk kokkos" "command-line switch"_Section_start.html#start_7 if
you wish to change any of the default "package kokkos"_package.html
options set by the "-k on" "command-line
switch"_Section_start.html#start_7.
Note that the default for the "package kokkos"_package.html command is
to use "full" neighbor lists and set the Newton flag to "off" for both
pairwise and bonded interactions. This typically gives fastest
performance. If the "newton"_newton.html command is used in the input
script, it can override the Newton flag defaults.
However, when running in MPI-only mode with 1 thread per MPI task, it
will typically be faster to use "half" neighbor lists and set the
Newton flag to "on", just as is the case for non-accelerated pair
styles. You can do this with the "-pk" "command-line
switch"_Section_start.html#start_7.
[Or run with the KOKKOS package by editing an input script:]
The discussion above for the mpirun/mpiexec command and setting
appropriate thread and GPU values for host=OMP or host=MIC or
device=CUDA are the same.
You must still use the "-k on" "command-line
switch"_Section_start.html#start_7 to enable the KOKKOS package, and
specify its additional arguments for hardware options appropriate to
your system, as documented above.
Use the "suffix kk"_suffix.html command, or you can explicitly add a
"kk" suffix to individual styles in your input script, e.g.
pair_style lj/cut/kk 2.5 :pre
You only need to use the "package kokkos"_package.html command if you
wish to change any of its option defaults, as set by the "-k on"
"command-line switch"_Section_start.html#start_7.
[Speed-ups to expect:]
The performance of KOKKOS running in different modes is a function of
your hardware, which KOKKOS-enabled styles are used, and the problem
size.
Generally speaking, the following rules of thumb apply:
When running on CPUs only, with a single thread per MPI task,
performance of a KOKKOS style is somewhere between the standard
(un-accelerated) styles (MPI-only mode), and those provided by the
USER-OMP package. However the difference between all 3 is small (less
than 20%). :ulb,l
When running on CPUs only, with multiple threads per MPI task,
performance of a KOKKOS style is a bit slower than the USER-OMP
package. :l
When running a large number of atoms per GPU, KOKKOS is typically faster
than the GPU package. :l
When running on Intel Xeon Phi, KOKKOS is not as fast as
the USER-INTEL package, which is optimized for that hardware. :l
:ule
See the "Benchmark page"_http://lammps.sandia.gov/bench.html of the
LAMMPS web site for performance of the KOKKOS package on different
hardware.
[Guidelines for best performance:]
Here are guidelines for using the KOKKOS package on the different
hardware configurations listed above.
Many of the guidelines use the "package kokkos"_package.html command.
See its doc page for details and default settings. Experimenting with
its options can provide a speed-up for specific calculations.
[Running on a multi-core CPU:]
If N is the number of physical cores/node, then the number of MPI
tasks/node * number of threads/task should not exceed N, and should
typically equal N. Note that the default threads/task is 1, as set by
the "t" keyword of the "-k" "command-line
switch"_Section_start.html#start_7. If you do not change this, no
additional parallelism (beyond MPI) will be invoked on the host
CPU(s).
You can compare the performance running in different modes:
run with 1 MPI task/node and N threads/task
run with N MPI tasks/node and 1 thread/task
run with settings in between these extremes :ul
Examples of mpirun commands in these modes are shown above.
When using KOKKOS to perform multi-threading, it is important for
performance to bind both MPI tasks to physical cores, and threads to
physical cores, so they do not migrate during a simulation.
If you are not certain MPI tasks are being bound (check the defaults
for your MPI installation), binding can be forced with these flags:
OpenMPI 1.8: mpirun -np 2 -bind-to socket -map-by socket ./lmp_openmpi ...
Mvapich2 2.0: mpiexec -np 2 -bind-to socket -map-by socket ./lmp_mvapich ... :pre
For binding threads with the KOKKOS OMP option, use thread affinity
environment variables to force binding. With OpenMP 3.1 (gcc 4.7 or
later, intel 12 or later) setting the environment variable
OMP_PROC_BIND=true should be sufficient. For binding threads with the
KOKKOS pthreads option, compile LAMMPS with the KOKKOS HWLOC=yes option, as
discussed in "Section 2.3.4"_Section_start.html#start_3_4 of the
manual.
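A minimal sketch that combines both kinds of binding (assuming a
bourne-type shell; the executable name is a placeholder):
export OMP_PROC_BIND=true
mpirun -np 2 -bind-to socket -map-by socket lmp_kokkos_omp -k on t 8 -sf kk -in in.lj :pre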
[Running on GPUs:]
Insure the -arch setting in the machine makefile you are using,
e.g. src/MAKE/Makefile.cuda, is correct for your GPU hardware/software
(see "this section"_Section_start.html#start_3_4 of the manual for
details).
The -np setting of the mpirun command should set the number of MPI
tasks/node to be equal to the # of physical GPUs on the node.
Use the "-k" "command-line switch"_Section_commands.html#start_7 to
specify the number of GPUs per node, and the number of threads per MPI
task. As above for multi-core CPUs (and no GPU), if N is the number
of physical cores/node, then the number of MPI tasks/node * number of
threads/task should not exceed N. With one GPU (and one MPI task) it
may be faster to use fewer than all the available cores, by setting
threads/task to a smaller value. This is because using all the cores
on a dual-socket node will incur extra cost to copy memory from the
2nd socket to the GPU.
Examples of mpirun commands that follow these rules are shown above.
NOTE: When using a GPU, you will achieve the best performance if your
input script does not use any fix or compute styles which are not yet
Kokkos-enabled. This allows data to stay on the GPU for multiple
timesteps, without being copied back to the host CPU. Invoking a
non-Kokkos fix or compute, or performing I/O for
"thermo"_thermo_style.html or "dump"_dump.html output will cause data
to be copied back to the CPU.
You cannot yet assign multiple MPI tasks to the same GPU with the
KOKKOS package. We plan to support this in the future, similar to the
GPU package in LAMMPS.
You cannot yet use both the host (multi-threaded) and device (GPU)
together to compute pairwise interactions with the KOKKOS package. We
hope to support this in the future, similar to the GPU package in
LAMMPS.
[Running on an Intel Phi:]
Kokkos only uses Intel Phi processors in their "native" mode, i.e.
not hosted by a CPU.
As illustrated above, build LAMMPS with OMP=yes (the default) and
MIC=yes. The latter insures code is correctly compiled for the Intel
Phi. The OMP setting means OpenMP will be used for parallelization on
the Phi, which is currently the best option within Kokkos. In the
future, other options may be added.
Current-generation Intel Phi chips have either 61 or 57 cores. One
core should be excluded for running the OS, leaving 60 or 56 cores.
Each core is hyperthreaded, so there are effectively N = 240 (4*60) or
N = 224 (4*56) cores to run on.
The -np setting of the mpirun command sets the number of MPI
tasks/node. The "-k on t Nt" command-line switch sets the number of
threads/task as Nt. The product of these 2 values should be N, i.e.
240 or 224. Also, the number of threads/task should be a multiple of
4 so that logical threads from more than one MPI task do not run on
the same physical core.
Examples of mpirun commands that follow these rules are shown above.
[Restrictions:]
As noted above, if using GPUs, the number of MPI tasks per compute
node should equal the number of GPUs per compute node. In the
future Kokkos will support assigning multiple MPI tasks to a single
GPU.
Currently Kokkos does not support AMD GPUs due to limits in the
available backend programming models. Specifically, Kokkos requires
extensive C++ support from the Kernel language. This is expected to
change in the future.
diff --git a/doc/src/commands.txt b/doc/src/commands.txt
index c5f22c666..2fdb69ea4 100644
--- a/doc/src/commands.txt
+++ b/doc/src/commands.txt
@@ -1,110 +1,111 @@
Commands :h1
<!-- RST
.. toctree::
:maxdepth: 1
angle_coeff
angle_style
atom_modify
atom_style
balance
bond_coeff
bond_style
bond_write
boundary
box
change_box
clear
comm_modify
comm_style
compute
compute_modify
create_atoms
create_bonds
create_box
delete_atoms
delete_bonds
dielectric
dihedral_coeff
dihedral_style
dimension
displace_atoms
dump
dump_custom_vtk
dump_h5md
dump_image
dump_modify
dump_molfile
dump_nc
echo
fix
fix_modify
group
group2ndx
if
improper_coeff
improper_style
include
info
jump
kspace_modify
kspace_style
label
lattice
log
mass
min_modify
min_style
minimize
molecule
neb
neigh_modify
neighbor
newton
next
package
pair_coeff
pair_modify
pair_style
pair_write
partition
prd
print
processors
python
quit
read_data
read_dump
read_restart
region
replicate
rerun
reset_timestep
restart
run
run_style
set
shell
special_bonds
suffix
tad
temper
+ temper_grem
thermo
thermo_modify
thermo_style
timer
timestep
uncompute
undump
unfix
units
variable
velocity
write_coeff
write_data
write_dump
write_restart
END_RST -->
diff --git a/doc/src/compute_coord_atom.txt b/doc/src/compute_coord_atom.txt
index 012a87a9a..f1a6bf7ff 100644
--- a/doc/src/compute_coord_atom.txt
+++ b/doc/src/compute_coord_atom.txt
@@ -1,92 +1,121 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
compute coord/atom command :h3
[Syntax:]
-compute ID group-ID coord/atom cutoff type1 type2 ... :pre
+compute ID group-ID coord/atom cstyle args ... :pre
ID, group-ID are documented in "compute"_compute.html command
coord/atom = style name of this compute command
-cutoff = distance within which to count coordination neighbors (distance units)
-typeN = atom type for Nth coordination count (see asterisk form below) :ul
+one cstyle must be appended :ul
+
+cstyle = {cutoff} or {orientorder}
+
+{cutoff} args = cutoff typeN
+ cutoff = distance within which to count coordination neighbors (distance units)
+ typeN = atom type for Nth coordination count (see asterisk form below) :pre
+
+{orientorder} args = orientorderID threshold
+ orientorderID = ID of a previously defined orientorder/atom compute
+ threshold = minimum value of the scalar product between two 'connected' atoms (see text for explanation) :pre
[Examples:]
-compute 1 all coord/atom 2.0
-compute 1 all coord/atom 6.0 1 2
-compute 1 all coord/atom 6.0 2*4 5*8 * :pre
+compute 1 all coord/atom cutoff 2.0
+compute 1 all coord/atom cutoff 6.0 1 2
+compute 1 all coord/atom cutoff 6.0 2*4 5*8 *
+compute 1 all coord/atom orientorder 2 0.5 :pre
[Description:]
-Define a computation that calculates one or more coordination numbers
+This compute performs generic calculations between neighboring atoms. So far,
+there are two cstyles implemented: {cutoff} and {orientorder}.
+The {cutoff} cstyle calculates one or more coordination numbers
for each atom in a group.
A coordination number is defined as the number of neighbor atoms with
specified atom type(s) that are within the specified cutoff distance
from the central atom. Atoms not in the group are included in a
coordination number of atoms in the group.
The {typeN} keywords allow you to specify which atom types contribute
to each coordination number. One coordination number is computed for
each of the {typeN} keywords listed. If no {typeN} keywords are
listed, a single coordination number is calculated, which includes
atoms of all types (same as the "*" format, see below).
The {typeN} keywords can be specified in one of two ways. An explicit
numeric value can be used, as in the 2nd example above. Or a
wild-card asterisk can be used to specify a range of atom types. This
takes the form "*" or "*n" or "n*" or "m*n". If N = the number of
atom types, then an asterisk with no numeric values means all types
from 1 to N. A leading asterisk means all types from 1 to n
(inclusive). A trailing asterisk means all types from n to N
(inclusive). A middle asterisk means all types from m to n
(inclusive).
+The {orientorder} cstyle calculates the number of 'connected' atoms j
+around each atom i. The atom j is connected to i if the scalar product
+({Ybar_lm(i)},{Ybar_lm(j)}) is larger than {threshold}. Thus, this cstyle
+will work only if a "compute orientorder/atom"_compute_orientorder_atom.html
+has been previously defined. This cstyle allows one to apply the
+ten Wolde criterion to identify crystal-like atoms in a system
+(see "ten Wolde et al."_#tenWolde).
+
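A minimal sketch of the combined usage (the compute IDs, degree, and
threshold are illustrative only): the {components} keyword of the
orientorder compute provides the {Ybar_lm} vectors that this cstyle
compares:
compute 2 all orientorder/atom degrees 1 6 components 6 nnn NULL cutoff 3.0
compute 3 all coord/atom orientorder 2 0.5 :pre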
The value of all coordination numbers will be 0.0 for atoms not in the
specified compute group.
The neighbor list needed to compute this quantity is constructed each
time the calculation is performed (i.e. each time a snapshot of atoms
is dumped). Thus it can be inefficient to compute/dump this quantity
too frequently.
NOTE: If you have a bonded system, then the settings of
"special_bonds"_special_bonds.html command can remove pairwise
interactions between atoms in the same bond, angle, or dihedral. This
is the default setting for the "special_bonds"_special_bonds.html
command, and means those pairwise interactions do not appear in the
neighbor list. Because this fix uses the neighbor list, it also means
those pairs will not be included in the coordination count. One way
to get around this, is to write a dump file, and use the
"rerun"_rerun.html command to compute the coordination for snapshots
in the dump file. The rerun script can use a
"special_bonds"_special_bonds.html command that includes all pairs in
the neighbor list.
[Output info:]
If a single {type1} keyword is specified (or if none are specified),
this compute calculates a per-atom vector. If multiple {typeN}
keywords are specified, this compute calculates a per-atom array, with
N columns. These values can be accessed by any command that uses
per-atom values from a compute as input. See "Section
6.15"_Section_howto.html#howto_15 for an overview of LAMMPS output
options.
The per-atom vector or array values will be a number >= 0.0, as
explained above.
-[Restrictions:] none
+[Restrictions:]
+The cstyle {orientorder} can only be used if a
+"compute orientorder/atom"_compute_orientorder_atom.html command
+was previously defined. Otherwise, an error message will be issued.
[Related commands:]
"compute cluster/atom"_compute_cluster_atom.html
+"compute orientorder/atom"_compute_orientorder_atom.html
[Default:] none
+
+:line
+
+:link(tenWolde)
+[(tenWolde)] P. R. ten Wolde, M. J. Ruiz-Montero, D. Frenkel, J. Chem. Phys. 104, 9932 (1996).
diff --git a/doc/src/compute_orientorder_atom.txt b/doc/src/compute_orientorder_atom.txt
index c5ecef3cb..74426dd33 100644
--- a/doc/src/compute_orientorder_atom.txt
+++ b/doc/src/compute_orientorder_atom.txt
@@ -1,121 +1,140 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
compute orientorder/atom command :h3
[Syntax:]
compute ID group-ID orientorder/atom keyword values ... :pre
ID, group-ID are documented in "compute"_compute.html command :ulb,l
orientorder/atom = style name of this compute command :l
one or more keyword/value pairs may be appended :l
-keyword = {cutoff} or {nnn} or {degrees}
+keyword = {cutoff} or {nnn} or {degrees} or {components}
{cutoff} value = distance cutoff
{nnn} value = number of nearest neighbors
- {degrees} values = nlvalues, l1, l2,... :pre
+ {degrees} values = nlvalues, l1, l2,...
+ {components} value = l :pre
:ule
[Examples:]
compute 1 all orientorder/atom
-compute 1 all orientorder/atom degrees 5 4 6 8 10 12 nnn NULL cutoff 1.5 :pre
+compute 1 all orientorder/atom degrees 5 4 6 8 10 12 nnn NULL cutoff 1.5
+compute 1 all orientorder/atom degrees 2 4 6 components 6 nnn NULL cutoff 3.0 :pre
[Description:]
Define a computation that calculates a set of bond-orientational
order parameters {Ql} for each atom in a group. These order parameters
were introduced by "Steinhardt et al."_#Steinhardt as a way to
characterize the local orientational order in atomic structures.
For each atom, {Ql} is a real number defined as follows:
:c,image(Eqs/orientorder.jpg)
The first equation defines the spherical harmonic order parameters.
These are complex number components of the 3D analog of the 2D order
parameter {qn}, which is implemented as LAMMPS compute
"hexorder/atom"_compute_hexorder_atom.html.
The summation is over the {nnn} nearest
neighbors of the central atom.
The angles theta and phi are the standard spherical polar angles
defining the direction of the bond vector {rij}.
The second equation defines {Ql}, which is a
rotationally invariant scalar quantity obtained by summing
over all the components of degree {l}.
The optional keyword {cutoff} defines the distance cutoff
used when searching for neighbors. The default value, also
the maximum allowable value, is the cutoff specified
by the pair style.
The optional keyword {nnn} defines the number of nearest
neighbors used to calculate {Ql}. The default value is 12.
If the value is NULL, then all neighbors up to the
specified distance cutoff are used.
The optional keyword {degrees} defines the list of order parameters to
be computed. The first argument {nlvalues} is the number of order
parameters. This is followed by that number of integers giving the
degree of each order parameter. Because {Q}2 and all odd-degree
order parameters are zero for atoms in cubic crystals
(see "Steinhardt"_#Steinhardt), the default order parameters
are {Q}4, {Q}6, {Q}8, {Q}10, and {Q}12. For the
FCC crystal with {nnn}=12, {Q}4 = sqrt(7/3)/8 = 0.19094....
The numerical values of all order parameters up to {Q}12
for a range of commonly encountered high-symmetry structures are given
in Table I of "Mickel et al."_#Mickel.
+The optional keyword {components} will output the components of
+the normalized complex vector {Ybar_lm} of degree {l}, which must be
+explicitly included in the keyword {degrees}. This option can be used
+in conjunction with "compute coord/atom"_compute_coord_atom.html to
+calculate the ten Wolde criterion to identify crystal-like particles
+(see "ten Wolde et al."_#tenWolde96).
+
The value of {Ql} is set to zero for atoms not in the
specified compute group, as well as for atoms that have less than
{nnn} neighbors within the distance cutoff.
The neighbor list needed to compute this quantity is constructed each
time the calculation is performed (i.e. each time a snapshot of atoms
is dumped). Thus it can be inefficient to compute/dump this quantity
too frequently.
NOTE: If you have a bonded system, then the settings of
"special_bonds"_special_bonds.html command can remove pairwise
interactions between atoms in the same bond, angle, or dihedral. This
is the default setting for the "special_bonds"_special_bonds.html
command, and means those pairwise interactions do not appear in the
neighbor list. Because this fix uses the neighbor list, it also means
those pairs will not be included in the order parameter. This
difficulty can be circumvented by writing a dump file, and using the
"rerun"_rerun.html command to compute the order parameter for
snapshots in the dump file. The rerun script can use a
"special_bonds"_special_bonds.html command that includes all pairs in
the neighbor list.
[Output info:]
This compute calculates a per-atom array with {nlvalues} columns, giving the
{Ql} values for each atom, which are real numbers on the range 0 <= {Ql} <= 1.
+If the keyword {components} is set, then the real and imaginary parts of each
+component of (normalized) {Ybar_lm} will be added to the output array in the
+following order:
+Re({Ybar_-m}) Im({Ybar_-m}) Re({Ybar_-m+1}) Im({Ybar_-m+1}) ... Re({Ybar_m}) Im({Ybar_m}).
+This way, the per-atom array will have a total of {nlvalues}+2*(2{l}+1) columns.
+
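For example, with the default {degrees} setting (5 values) and
{components} 6, the per-atom array would have 5 + 2*(2*6+1) = 31
columns.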
These values can be accessed by any command that uses
per-atom values from a compute as input. See "Section
6.15"_Section_howto.html#howto_15 for an overview of LAMMPS output
options.
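As a simple sketch (compute IDs are arbitrary), the per-atom {Q}6 values can be averaged with "compute reduce"_compute_reduce.html and monitored in the thermodynamic output:
compute q6 all orientorder/atom degrees 1 6
compute q6avg all reduce ave c_q6[1]
thermo_style custom step temp c_q6avg :pre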
[Restrictions:] none
[Related commands:]
"compute coord/atom"_compute_coord_atom.html, "compute centro/atom"_compute_centro_atom.html, "compute hexorder/atom"_compute_hexorder_atom.html
[Default:]
The option defaults are {cutoff} = pair style cutoff, {nnn} = 12, and {degrees} = 5 4 6 8 10 12, i.e. {Q}4, {Q}6, {Q}8, {Q}10, and {Q}12.
:line
:link(Steinhardt)
[(Steinhardt)] P. Steinhardt, D. Nelson, and M. Ronchetti, Phys. Rev. B 28, 784 (1983).
+
:link(Mickel)
[(Mickel)] W. Mickel, S. C. Kapfer, G. E. Schroeder-Turk, and K. Mecke, J. Chem. Phys. 138, 044501 (2013).
+
+:link(tenWolde96)
+[(tenWolde)] P. R. ten Wolde, M. J. Ruiz-Montero, D. Frenkel, J. Chem. Phys. 104, 9932 (1996).
diff --git a/doc/src/computes.txt b/doc/src/computes.txt
index e7e54a6b3..1d0179879 100644
--- a/doc/src/computes.txt
+++ b/doc/src/computes.txt
@@ -1,120 +1,121 @@
Computes :h1
<!-- RST
.. toctree::
:maxdepth: 1
compute_ackland_atom
compute_angle
compute_angle_local
compute_angmom_chunk
compute_basal_atom
compute_body_local
compute_bond
compute_bond_local
compute_centro_atom
compute_chunk_atom
compute_cluster_atom
compute_cna_atom
compute_com
compute_com_chunk
compute_contact_atom
compute_coord_atom
compute_damage_atom
compute_dihedral
compute_dihedral_local
compute_dilatation_atom
compute_dipole_chunk
compute_displace_atom
compute_dpd
compute_dpd_atom
compute_erotate_asphere
compute_erotate_rigid
compute_erotate_sphere
compute_erotate_sphere_atom
compute_event_displace
compute_fep
+ compute_global_atom
compute_group_group
compute_gyration
compute_gyration_chunk
compute_heat_flux
compute_hexorder_atom
compute_improper
compute_improper_local
compute_inertia_chunk
compute_ke
compute_ke_atom
compute_ke_atom_eff
compute_ke_eff
compute_ke_rigid
compute_meso_e_atom
compute_meso_rho_atom
compute_meso_t_atom
compute_msd
compute_msd_chunk
compute_msd_nongauss
compute_omega_chunk
compute_orientorder_atom
compute_pair
compute_pair_local
compute_pe
compute_pe_atom
compute_plasticity_atom
compute_pressure
compute_property_atom
compute_property_chunk
compute_property_local
compute_rdf
compute_reduce
compute_rigid_local
compute_saed
compute_slice
compute_smd_contact_radius
compute_smd_damage
compute_smd_hourglass_error
compute_smd_internal_energy
compute_smd_plastic_strain
compute_smd_plastic_strain_rate
compute_smd_rho
compute_smd_tlsph_defgrad
compute_smd_tlsph_dt
compute_smd_tlsph_num_neighs
compute_smd_tlsph_shape
compute_smd_tlsph_strain
compute_smd_tlsph_strain_rate
compute_smd_tlsph_stress
compute_smd_triangle_mesh_vertices
compute_smd_ulsph_num_neighs
compute_smd_ulsph_strain
compute_smd_ulsph_strain_rate
compute_smd_ulsph_stress
compute_smd_vol
compute_sna_atom
compute_stress_atom
compute_tally
compute_temp
compute_temp_asphere
compute_temp_body
compute_temp_chunk
compute_temp_com
compute_temp_cs
compute_temp_deform
compute_temp_deform_eff
compute_temp_drude
compute_temp_eff
compute_temp_partial
compute_temp_profile
compute_temp_ramp
compute_temp_region
compute_temp_region_eff
compute_temp_rotate
compute_temp_sphere
compute_ti
compute_torque_chunk
compute_vacf
compute_vcm_chunk
compute_voronoi_atom
compute_xrd
END_RST -->
diff --git a/doc/src/fix_flow_gauss.txt b/doc/src/fix_flow_gauss.txt
index e4088cd02..fcdc4e558 100644
--- a/doc/src/fix_flow_gauss.txt
+++ b/doc/src/fix_flow_gauss.txt
@@ -1,160 +1,160 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix flow/gauss command :h3
[Syntax:]
fix ID group-ID flow/gauss xflag yflag zflag keyword value ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
flow/gauss = style name of this fix command :l
xflag,yflag,zflag = 0 or 1 :l
0 = do not conserve current in this dimension
1 = conserve current in this dimension :pre
zero or more keyword/value pairs may be appended :l
keyword = {energy} :l
{energy} value = no or yes
no = do not compute work done by this fix
yes = compute work done by this fix :pre
:ule
[Examples:]
fix GD fluid flow/gauss 1 0 0
fix GD fluid flow/gauss 1 1 1 energy yes :pre
[Description:]
This fix implements the Gaussian dynamics (GD) method to simulate a
system at constant mass flux "(Strong)"_#Strong. GD is a
nonequilibrium molecular dynamics simulation method that can be used
to study fluid flows through pores, pipes, and channels. In its
original implementation GD was used to compute the pressure required
to achieve a fixed mass flux through an opening. The flux can be
conserved in any combination of the directions, x, y, or z, using
xflag,yflag,zflag. This fix does not initialize a net flux through a
system; it only conserves the center-of-mass momentum that is present
when the fix is declared in the input script. Use the
"velocity"_velocity.html command to generate an initial center-of-mass
momentum.
GD applies an external fluctuating gravitational field that acts as a
driving force to keep the system away from equilibrium. To maintain
steady state, a profile-unbiased thermostat must be implemented to
dissipate the heat that is added by the driving force. "Compute
temp/profile"_compute_temp_profile.html can be used to implement a
profile-unbiased thermostat.
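A minimal setup sketch is shown below; the group name {fluid}, the bin counts, and all numerical values are placeholders that must be adapted to the actual system:
# give the fluid an initial center-of-mass momentum along x
velocity fluid set 1.0 0.0 0.0 sum yes units box
# profile-unbiased thermostat: subtract the binned streaming velocity
compute putemp fluid temp/profile 1 1 1 xyz 10 10 10
fix fnvt fluid nvt temp 1.0 1.0 0.5
fix_modify fnvt temp putemp
# conserve the mass flux in x only
fix GD fluid flow/gauss 1 0 0 :pre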
A common use of this fix is to compute a pressure drop across a pipe,
pore, or membrane. The pressure profile can be computed in LAMMPS with
"compute stress/atom"_compute_stress_atom.html and "fix
ave/chunk"_fix_ave_chunk.html, or with the hardy method in "fix
atc"_fix_atc.html. Note that the simple "compute
stress/atom"_compute_stress_atom.html method is only accurate away
from inhomogeneities in the fluid, such as fixed wall atoms. Further,
the computed pressure profile must be corrected for the acceleration
applied by GD before computing a pressure drop or comparing it to
other methods, such as the pump method "(Zhu)"_#Zhu. The pressure
correction is discussed and described in "(Strong)"_#Strong.
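For reference, a pressure-profile sketch using 1d bins along x could look as follows (all IDs, bin widths, and averaging intervals are hypothetical, and the per-atom stress still has to be converted to a pressure and corrected as described in "(Strong)"_#Strong):
compute str fluid stress/atom NULL
compute bins fluid chunk/atom bin/1d x lower 2.0
fix pprof fluid ave/chunk 10 100 1000 bins c_str[1] c_str[2] c_str[3] file stress_profile.txt :pre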
For a complete example including the considerations discussed
above, see the examples/USER/flow_gauss directory.
NOTE: Only the flux of the atoms in group-ID will be conserved. If the
velocities of the group-ID atoms are coupled to the velocities of
other atoms in the simulation, the flux will not be conserved. For
example, in a simulation with fluid atoms and harmonically constrained
wall atoms, if a single thermostat is applied to group {all}, the
fluid atom velocities will be coupled to the wall atom velocities, and
the flux will not be conserved. This issue can be avoided by
thermostatting the fluid and wall groups separately.
Adding an acceleration to atoms does work on the system. This added
energy can be optionally subtracted from the potential energy for the
thermodynamic output (see below) to check that the timestep is small
enough to conserve energy. Since the applied acceleration is
fluctuating in time, the work cannot be computed from a potential. As
a result, computing the work is slightly more computationally
expensive than usual, so it is not performed by default. To invoke the
work calculation, use the {energy} keyword. The
"fix_modify"_fix_modify.html {energy} option also invokes the work
calculation, and overrides an {energy no} setting here. If neither
{energy yes} nor {fix_modify energy yes} is set, the global scalar
computed by the fix will return zero.
NOTE: In order to check energy conservation, any other fixes that do
work on the system must have {fix_modify energy yes} set as well. This
includes thermostat fixes and any constraints that hold the positions
of wall atoms fixed, such as "fix spring/self"_fix_spring_self.html.
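The following sketch (fix IDs and thermostat values are arbitrary) enables the work calculation for both this fix and the thermostat and prints the global scalar reported by this fix with the thermodynamic output:
fix GD fluid flow/gauss 1 0 0 energy yes
fix fnvt fluid nvt temp 1.0 1.0 0.5
fix_modify fnvt energy yes
thermo_style custom step temp pe etotal f_GD :pre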
If this fix is used in a simulation with the "rRESPA"_run_style.html
integrator, the applied acceleration must be computed and applied at the same
rRESPA level as the interactions between the flowing fluid and the obstacle.
The rRESPA level at which the acceleration is applied can be changed using
the "fix_modify"_fix_modify.html {respa} option discussed below. If the
flowing fluid and the obstacle interact through multiple interactions that are
computed at different rRESPA levels, then there must be a separate flow/gauss
fix for each level. For example, if the flowing fluid and obstacle interact
through pairwise and long-range Coulomb interactions, which are computed at
rRESPA levels 3 and 4, respectively, then there must be two separate
flow/gauss fixes, one that specifies {fix_modify respa 3} and one with
{fix_modify respa 4}.
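A sketch of such a setup (the fix IDs and the assumption that pairwise and kspace interactions run at r-RESPA levels 3 and 4 are illustrative) is:
fix GDpair fluid flow/gauss 1 0 0
fix_modify GDpair respa 3
fix GDkspace fluid flow/gauss 1 0 0
fix_modify GDkspace respa 4 :pre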
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
This fix is part of the USER-MISC package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
No information about this fix is written to "binary restart
files"_restart.html.
The "fix_modify"_fix_modify.html {energy} option is supported by this
fix to subtract the work done from the
system's potential energy as part of "thermodynamic
output"_thermo_style.html.
The "fix_modify"_fix_modify.html {respa} option is supported by this
fix. This allows the user to set at which level of the "rRESPA"_run_style.html
integrator the fix computes and adds the external acceleration. Default is the
outermost level.
This fix computes a global scalar and a global 3-vector of forces,
which can be accessed by various "output
commands"_Section_howto.html#howto_15. The scalar is the negative of the
work done on the system, see above discussion. The vector is the total force
that this fix applied to the group of atoms on the current timestep.
The scalar and vector values calculated by this fix are "extensive".
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command.
[Restrictions:] none
[Related commands:]
"fix addforce"_fix_addforce.html, "compute
temp/profile"_compute_temp_profile.html, "velocity"_velocity.html
[Default:]
The option default for the {energy} keyword is energy = no.
:line
:link(Strong)
-[(Strong)] Strong and Eaves, J. Phys. Chem. Lett. 7, 1907 (2016).
+[(Strong)] Strong and Eaves, J. Phys. Chem. B 121, 189 (2017).
:link(Evans)
[(Evans)] Evans and Morriss, Phys. Rev. Lett. 56, 2172 (1986).
:link(Zhu)
[(Zhu)] Zhu, Tajkhorshid, and Schulten, Biophys. J. 83, 154 (2002).
diff --git a/doc/src/fix_grem.txt b/doc/src/fix_grem.txt
index eac4d6f4b..3fc5c1a10 100644
--- a/doc/src/fix_grem.txt
+++ b/doc/src/fix_grem.txt
@@ -1,111 +1,111 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix grem command :h3
[Syntax:]
fix ID group-ID grem lambda eta H0 thermostat-ID :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
grem = style name of this fix command :l
lambda = intercept parameter of linear effective temperature function :l
eta = slope parameter of linear effective temperature function :l
H0 = shift parameter of linear effective temperature function :l
thermostat-ID = ID of Nose-Hoover thermostat or barostat used in simulation :l,ule
[Examples:]
fix fxgREM all grem 400 -0.01 -30000 fxnpt
thermo_modify press fxgREM_press :pre
fix fxgREM all grem 502 -0.15 -80000 fxnvt :pre
[Description:]
This fix implements the molecular dynamics version of the generalized
-replica exchange method (gREM) originally developed by "(Kim)"_#Kim,
+replica exchange method (gREM) originally developed by "(Kim)"_#Kim2010,
which uses non-Boltzmann ensembles to sample over first order phase
transitions. This is done by defining replicas with an enthalpy
dependent effective temperature
:c,image(Eqs/fix_grem.jpg)
with {eta} negative and steep enough to only intersect the
characteristic microcanonical temperature (Ts) of the system once,
ensuring a unimodal enthalpy distribution in that replica. {Lambda} is
the intercept and affects the generalized ensemble similarly to how
temperature affects a Boltzmann ensemble. {H0} is a reference
enthalpy, and is typically set as the lowest desired sampled enthalpy.
Further explanation can be found in our recent paper
"(Malolepsza)"_#Malolepsza.
This fix requires a Nose-Hoover thermostat fix reference passed to the
grem as {thermostat-ID}. Two distinct temperatures exist in this
generalized ensemble, the effective temperature defined above, and a
kinetic temperature that controls the velocity distribution of
particles as usual. Either constant volume or constant pressure
algorithms can be used.
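For example, a constant-pressure sketch (the thermostat/barostat parameters are placeholders) defines the Nose-Hoover fix first and then references it by its ID:
fix fxnpt all npt temp 300.0 300.0 100.0 iso 1.0 1.0 1000.0
fix fxgREM all grem 400 -0.01 -30000 fxnpt
thermo_modify press fxgREM_press :pre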
The fix enforces a generalized ensemble in a single replica
only. Typically, this approach is combined with replica exchange with
replicas differing by {lambda} only for simplicity, but this is not
required. A multi-replica simulation can be run within the LAMMPS
environment using the "temper/grem"_temper_grem.html command. This
utilizes LAMMPS partition mode and requires the number of available
processors be on the order of the number of desired replicas. A
100-replica simulation would require at least 100 processors (1 per
world at minimum). If many replicas are needed on a small number of
processors, multi-replica runs can be run outside of LAMMPS. An
example of this can be found in examples/USER/misc/grem and has no
limit on the number of replicas per processor. However, this is very
inefficient and error prone and should be avoided if possible.
In general, defining the generalized ensembles is unique for every
system. When starting a many-replica simulation without any knowledge
of the underlying microcanonical temperature, there are several tricks
we have utilized to optimize the process. Choosing a less-steep {eta}
yields broader distributions, requiring fewer replicas to map the
microcanonical temperature. While this likely suffers from the same
sampling problems gREM was built to avoid, it provides quick insight
to Ts. Initially using an evenly-spaced {lambda} distribution
identifies regions where small changes in enthalpy lead to large
temperature changes. Replicas are easily added where needed.
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html.
The "thermo_modify"_thermo_modify.html {press} option is supported
by this fix to add the rescaled kinetic pressure as part of
"thermodynamic output"_thermo_style.html.
[Restrictions:]
This fix is part of the USER-MISC package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"temper/grem"_temper_grem.html, "fix nvt"_fix_nh.html, "fix
npt"_fix_nh.html, "thermo_modify"_thermo_modify.html
[Default:] none
:line
-:link(Kim)
+:link(Kim2010)
[(Kim)] Kim, Keyes, Straub, J. Chem. Phys. 132, 224107 (2010).
:link(Malolepsza)
[(Malolepsza)] Malolepsza, Secor, Keyes, J. Phys. Chem. B 119 (42),
13379-13384 (2015).
diff --git a/doc/src/fix_spring.txt b/doc/src/fix_spring.txt
index 1d0bd4714..5f94f4cda 100644
--- a/doc/src/fix_spring.txt
+++ b/doc/src/fix_spring.txt
@@ -1,146 +1,142 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix spring command :h3
[Syntax:]
fix ID group-ID spring keyword values :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
spring = style name of this fix command :l
keyword = {tether} or {couple} :l
{tether} values = K x y z R0
K = spring constant (force/distance units)
x,y,z = point to which spring is tethered
R0 = equilibrium distance from tether point (distance units)
{couple} values = group-ID2 K x y z R0
group-ID2 = 2nd group to couple to fix group with a spring
K = spring constant (force/distance units)
x,y,z = direction of spring
R0 = equilibrium distance of spring (distance units) :pre
:ule
[Examples:]
fix pull ligand spring tether 50.0 0.0 0.0 0.0 0.0
fix pull ligand spring tether 50.0 0.0 0.0 0.0 5.0
fix pull ligand spring tether 50.0 NULL NULL 2.0 3.0
fix 5 bilayer1 spring couple bilayer2 100.0 NULL NULL 10.0 0.0
fix longitudinal pore spring couple ion 100.0 NULL NULL -20.0 0.0
fix radial pore spring couple ion 100.0 0.0 0.0 NULL 5.0 :pre
[Description:]
Apply a spring force to a group of atoms or between two groups of
atoms. This is useful for applying an umbrella force to a small
molecule or lightly tethering a large group of atoms (e.g. all the
solvent or a large molecule) to the center of the simulation box so
that it doesn't wander away over the course of a long simulation. It
can also be used to hold the centers of mass of two groups of atoms at
a given distance or orientation with respect to each other.
The {tether} style attaches a spring between a fixed point {x,y,z} and
the center of mass of the fix group of atoms. The equilibrium
position of the spring is R0. At each timestep the distance R from
the center of mass of the group of atoms to the tethering point is
computed, taking account of wrap-around in a periodic simulation box.
A restoring force of magnitude K (R - R0) Mi / M is applied to each
atom in the group where {K} is the spring constant, Mi is the mass of
the atom, and M is the total mass of all atoms in the group. Note
that {K} thus represents the spring constant for the total force on
the group of atoms, not for a spring applied to each atom.
The {couple} style links two groups of atoms together. The first
group is the fix group; the second is specified by group-ID2. The
groups are coupled together by a spring that is at equilibrium when
the two groups are displaced by a vector {x,y,z} with respect to each
other and at a distance R0 from that displacement. Note that {x,y,z}
is the equilibrium displacement of group-ID2 relative to the fix
group. Thus (1,1,0) is a different spring than (-1,-1,0). When the
relative positions and distance between the two groups are not in
equilibrium, the same spring force described above is applied to atoms
in each of the two groups.
For both the {tether} and {couple} styles, any of the x,y,z values can
be specified as NULL which means do not include that dimension in the
distance calculation or force application.
The first example above pulls the ligand towards the point (0,0,0).
The second example holds the ligand near the surface of a sphere of
radius 5 around the point (0,0,0). The third example holds the ligand
a distance 3 away from the z=2 plane (on either side).
The fourth example holds 2 bilayers a distance 10 apart in z. For the
last two examples, imagine a pore (a slab of atoms with a cylindrical
hole cut out) oriented with the pore axis along z, and an ion moving
within the pore. The fifth example holds the ion a distance of -20
below the z = 0 center plane of the pore (umbrella sampling). The
last example holds the ion a distance 5 away from the pore axis
(assuming the center-of-mass of the pore in x,y is the pore axis).
NOTE: The center of mass of a group of atoms is calculated in
"unwrapped" coordinates using atom image flags, which means that the
group can straddle a periodic boundary. See the "dump"_dump.html doc
page for a discussion of unwrapped coordinates. It also means that a
spring connecting two groups or a group and the tether point can cross
-a periodic boundary and its length be calculated correctly. One
-exception is for rigid bodies, which should not be used with the fix
-spring command, if the rigid body will cross a periodic boundary.
-This is because image flags for rigid bodies are used in a different
-way, as explained on the "fix rigid"_fix_rigid.html doc page.
+a periodic boundary and its length be calculated correctly.
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html.
The "fix_modify"_fix_modify.html {energy} option is supported by this
fix to add the energy stored in the spring to the system's potential
energy as part of "thermodynamic output"_thermo_style.html.
The "fix_modify"_fix_modify.html {respa} option is supported by this
fix. This allows the user to set at which level of the "r-RESPA"_run_style.html
integrator the fix adds its forces. Default is the outermost level.
This fix computes a global scalar which can be accessed by various
"output commands"_Section_howto.html#howto_15. The scalar is the
spring energy = 0.5 * K * r^2.
This fix also computes a global 4-vector which can be accessed by
various "output commands"_Section_howto.html#howto_15. The first 3
quantities in the vector are xyz components of the total force added
to the group of atoms by the spring. In the case of the {couple}
style, it is the force on the fix group (group-ID) or the negative of
the force on the 2nd group (group-ID2). The 4th quantity in the
vector is the magnitude of the force added by the spring, as a
positive value if (r-R0) > 0 and a negative value if (r-R0) < 0. This
sign convention can be useful when using the spring force to compute a
potential of mean force (PMF).
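As an illustration (fix IDs, averaging intervals, and the file name are hypothetical), the signed spring force from the fifth example above could be time-averaged for later use in a PMF calculation:
fix longitudinal pore spring couple ion 100.0 NULL NULL -20.0 0.0
fix favg all ave/time 10 100 1000 f_longitudinal[4] file spring_force.txt :pre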
The scalar and vector values calculated by this fix are "extensive".
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command.
The forces due to this fix are imposed during an energy minimization,
invoked by the "minimize"_minimize.html command.
NOTE: If you want the spring energy to be included in the total
potential energy of the system (the quantity being minimized), you
MUST enable the "fix_modify"_fix_modify.html {energy} option for this
fix.
[Restrictions:] none
[Related commands:]
"fix drag"_fix_drag.html, "fix spring/self"_fix_spring_self.html,
"fix spring/rg"_fix_spring_rg.html, "fix smd"_fix_smd.html
[Default:] none
diff --git a/doc/src/lammps.book b/doc/src/lammps.book
index 307b182fe..c4834dbbd 100644
--- a/doc/src/lammps.book
+++ b/doc/src/lammps.book
@@ -1,633 +1,635 @@
#HTMLDOC 1.8.27
-t pdf14 -f "../Manual.pdf" --book --toclevels 4 --no-numbered --toctitle "Table of Contents" --title --textcolor #000000 --linkcolor #0000ff --linkstyle plain --bodycolor #ffffff --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --header1 ... --footer ..1 --nup 1 --tocheader .t. --tocfooter ..i --portrait --color --no-pscommands --no-xrxcomments --compression=1 --jpeg=0 --fontsize 11.0 --fontspacing 1.2 --headingfont helvetica --bodyfont times --headfootsize 11.0 --headfootfont helvetica --charset iso-8859-1 --links --embedfonts --pagemode document --pagelayout single --firstpage c1 --pageeffect none --pageduration 10 --effectduration 1.0 --no-encryption --permissions all --owner-password "" --user-password "" --browserwidth 680 --no-strict --no-overflow
Manual.html
Section_intro.html
Section_start.html
Section_commands.html
Section_packages.html
Section_accelerate.html
accelerate_gpu.html
accelerate_intel.html
accelerate_kokkos.html
accelerate_omp.html
accelerate_opt.html
Section_howto.html
Section_example.html
Section_perf.html
Section_tools.html
Section_modify.html
Section_python.html
Section_errors.html
Section_history.html
tutorial_drude.html
tutorial_github.html
+tutorial_pylammps.html
body.html
manifolds.html
angle_coeff.html
angle_style.html
atom_modify.html
atom_style.html
balance.html
bond_coeff.html
bond_style.html
bond_write.html
boundary.html
box.html
change_box.html
clear.html
comm_modify.html
comm_style.html
compute.html
compute_modify.html
create_atoms.html
create_bonds.html
create_box.html
delete_atoms.html
delete_bonds.html
dielectric.html
dihedral_coeff.html
dihedral_style.html
dimension.html
displace_atoms.html
dump.html
dump_custom_vtk.html
dump_h5md.html
dump_image.html
dump_modify.html
dump_molfile.html
dump_nc.html
echo.html
fix.html
fix_modify.html
group.html
group2ndx.html
if.html
improper_coeff.html
improper_style.html
include.html
info.html
jump.html
kspace_modify.html
kspace_style.html
label.html
lattice.html
log.html
mass.html
min_modify.html
min_style.html
minimize.html
molecule.html
neb.html
neigh_modify.html
neighbor.html
newton.html
next.html
package.html
pair_coeff.html
pair_modify.html
pair_style.html
pair_write.html
partition.html
prd.html
print.html
processors.html
python.html
quit.html
read_data.html
read_dump.html
read_restart.html
region.html
replicate.html
rerun.html
reset_timestep.html
restart.html
run.html
run_style.html
set.html
shell.html
special_bonds.html
suffix.html
tad.html
temper.html
+temper_grem.html
thermo.html
thermo_modify.html
thermo_style.html
timer.html
timestep.html
uncompute.html
undump.html
unfix.html
units.html
variable.html
velocity.html
write_coeff.html
write_data.html
write_dump.html
write_restart.html
fix_adapt.html
fix_adapt_fep.html
fix_addforce.html
fix_addtorque.html
fix_append_atoms.html
fix_atc.html
fix_atom_swap.html
fix_ave_atom.html
fix_ave_chunk.html
fix_ave_correlate.html
fix_ave_correlate_long.html
fix_ave_histo.html
fix_ave_time.html
fix_aveforce.html
fix_balance.html
fix_bond_break.html
fix_bond_create.html
fix_bond_swap.html
fix_box_relax.html
fix_cmap.html
fix_colvars.html
fix_controller.html
fix_deform.html
fix_deposit.html
fix_dpd_energy.html
fix_drag.html
fix_drude.html
fix_drude_transform.html
fix_dt_reset.html
fix_efield.html
fix_ehex.html
fix_enforce2d.html
fix_eos_cv.html
fix_eos_table.html
fix_eos_table_rx.html
fix_evaporate.html
fix_external.html
fix_flow_gauss.html
fix_freeze.html
fix_gcmc.html
fix_gld.html
fix_gle.html
fix_gravity.html
fix_grem.html
fix_halt.html
fix_heat.html
fix_imd.html
fix_indent.html
fix_ipi.html
fix_langevin.html
fix_langevin_drude.html
fix_langevin_eff.html
fix_lb_fluid.html
fix_lb_momentum.html
fix_lb_pc.html
fix_lb_rigid_pc_sphere.html
fix_lb_viscous.html
fix_lineforce.html
fix_manifoldforce.html
fix_meso.html
fix_meso_stationary.html
fix_momentum.html
fix_move.html
fix_mscg.html
fix_msst.html
fix_neb.html
fix_nh.html
fix_nh_eff.html
fix_nph_asphere.html
fix_nph_body.html
fix_nph_sphere.html
fix_nphug.html
fix_npt_asphere.html
fix_npt_body.html
fix_npt_sphere.html
fix_nve.html
fix_nve_asphere.html
fix_nve_asphere_noforce.html
fix_nve_body.html
fix_nve_eff.html
fix_nve_limit.html
fix_nve_line.html
fix_nve_manifold_rattle.html
fix_nve_noforce.html
fix_nve_sphere.html
fix_nve_tri.html
fix_nvk.html
fix_nvt_asphere.html
fix_nvt_body.html
fix_nvt_manifold_rattle.html
fix_nvt_sllod.html
fix_nvt_sllod_eff.html
fix_nvt_sphere.html
fix_oneway.html
fix_orient.html
fix_phonon.html
fix_pimd.html
fix_planeforce.html
fix_poems.html
fix_pour.html
fix_press_berendsen.html
fix_print.html
fix_property_atom.html
fix_qbmsst.html
fix_qeq.html
fix_qeq_comb.html
fix_qeq_reax.html
fix_qmmm.html
fix_qtb.html
fix_reax_bonds.html
fix_reaxc_species.html
fix_recenter.html
fix_restrain.html
fix_rigid.html
fix_rx.html
fix_saed_vtk.html
fix_setforce.html
fix_shake.html
fix_shardlow.html
fix_smd.html
fix_smd_adjust_dt.html
fix_smd_integrate_tlsph.html
fix_smd_integrate_ulsph.html
fix_smd_move_triangulated_surface.html
fix_smd_setvel.html
fix_smd_wall_surface.html
fix_spring.html
fix_spring_chunk.html
fix_spring_rg.html
fix_spring_self.html
fix_srd.html
fix_store_force.html
fix_store_state.html
fix_temp_berendsen.html
fix_temp_csvr.html
fix_temp_rescale.html
fix_temp_rescale_eff.html
fix_tfmc.html
fix_thermal_conductivity.html
fix_ti_spring.html
fix_tmd.html
fix_ttm.html
fix_tune_kspace.html
fix_vector.html
fix_viscosity.html
fix_viscous.html
fix_wall.html
fix_wall_gran.html
fix_wall_gran_region.html
fix_wall_piston.html
fix_wall_reflect.html
fix_wall_region.html
fix_wall_srd.html
compute_ackland_atom.html
compute_angle.html
compute_angle_local.html
compute_angmom_chunk.html
compute_basal_atom.html
compute_body_local.html
compute_bond.html
compute_bond_local.html
compute_centro_atom.html
compute_chunk_atom.html
compute_cluster_atom.html
compute_cna_atom.html
compute_com.html
compute_com_chunk.html
compute_contact_atom.html
compute_coord_atom.html
compute_damage_atom.html
compute_dihedral.html
compute_dihedral_local.html
compute_dilatation_atom.html
compute_dipole_chunk.html
compute_displace_atom.html
compute_dpd.html
compute_dpd_atom.html
compute_erotate_asphere.html
compute_erotate_rigid.html
compute_erotate_sphere.html
compute_erotate_sphere_atom.html
compute_event_displace.html
compute_fep.html
compute_global_atom.html
compute_group_group.html
compute_gyration.html
compute_gyration_chunk.html
compute_heat_flux.html
compute_hexorder_atom.html
compute_improper.html
compute_improper_local.html
compute_inertia_chunk.html
compute_ke.html
compute_ke_atom.html
compute_ke_atom_eff.html
compute_ke_eff.html
compute_ke_rigid.html
compute_meso_e_atom.html
compute_meso_rho_atom.html
compute_meso_t_atom.html
compute_msd.html
compute_msd_chunk.html
compute_msd_nongauss.html
compute_omega_chunk.html
compute_orientorder_atom.html
compute_pair.html
compute_pair_local.html
compute_pe.html
compute_pe_atom.html
compute_plasticity_atom.html
compute_pressure.html
compute_property_atom.html
compute_property_chunk.html
compute_property_local.html
compute_rdf.html
compute_reduce.html
compute_rigid_local.html
compute_saed.html
compute_slice.html
compute_smd_contact_radius.html
compute_smd_damage.html
compute_smd_hourglass_error.html
compute_smd_internal_energy.html
compute_smd_plastic_strain.html
compute_smd_plastic_strain_rate.html
compute_smd_rho.html
compute_smd_tlsph_defgrad.html
compute_smd_tlsph_dt.html
compute_smd_tlsph_num_neighs.html
compute_smd_tlsph_shape.html
compute_smd_tlsph_strain.html
compute_smd_tlsph_strain_rate.html
compute_smd_tlsph_stress.html
compute_smd_triangle_mesh_vertices.html
compute_smd_ulsph_num_neighs.html
compute_smd_ulsph_strain.html
compute_smd_ulsph_strain_rate.html
compute_smd_ulsph_stress.html
compute_smd_vol.html
compute_sna_atom.html
compute_stress_atom.html
compute_tally.html
compute_temp.html
compute_temp_asphere.html
compute_temp_body.html
compute_temp_chunk.html
compute_temp_com.html
compute_temp_cs.html
compute_temp_deform.html
compute_temp_deform_eff.html
compute_temp_drude.html
compute_temp_eff.html
compute_temp_partial.html
compute_temp_profile.html
compute_temp_ramp.html
compute_temp_region.html
compute_temp_region_eff.html
compute_temp_rotate.html
compute_temp_sphere.html
compute_ti.html
compute_torque_chunk.html
compute_vacf.html
compute_vcm_chunk.html
compute_voronoi_atom.html
compute_xrd.html
pair_adp.html
pair_agni.html
pair_airebo.html
pair_awpmd.html
pair_beck.html
pair_body.html
pair_bop.html
pair_born.html
pair_brownian.html
pair_buck.html
pair_buck_long.html
pair_charmm.html
pair_class2.html
pair_colloid.html
pair_comb.html
pair_coul.html
pair_coul_diel.html
pair_cs.html
pair_dipole.html
pair_dpd.html
pair_dpd_fdt.html
pair_dsmc.html
pair_eam.html
pair_edip.html
pair_eff.html
pair_eim.html
pair_exp6_rx.html
pair_gauss.html
pair_gayberne.html
pair_gran.html
pair_gromacs.html
pair_hbond_dreiding.html
pair_hybrid.html
pair_kim.html
pair_lcbop.html
pair_line_lj.html
pair_list.html
pair_lj.html
pair_lj96.html
pair_lj_cubic.html
pair_lj_expand.html
pair_lj_long.html
pair_lj_sf.html
pair_lj_smooth.html
pair_lj_smooth_linear.html
pair_lj_soft.html
pair_lubricate.html
pair_lubricateU.html
pair_mdf.html
pair_meam.html
pair_meam_spline.html
pair_meam_sw_spline.html
pair_mgpt.html
pair_mie.html
pair_morse.html
pair_multi_lucy.html
pair_multi_lucy_rx.html
pair_nb3b_harmonic.html
pair_nm.html
pair_none.html
pair_peri.html
pair_polymorphic.html
pair_quip.html
pair_reax.html
pair_reax_c.html
pair_resquared.html
pair_sdk.html
pair_smd_hertz.html
pair_smd_tlsph.html
pair_smd_triangulated_surface.html
pair_smd_ulsph.html
pair_smtbq.html
pair_snap.html
pair_soft.html
pair_sph_heatconduction.html
pair_sph_idealgas.html
pair_sph_lj.html
pair_sph_rhosum.html
pair_sph_taitwater.html
pair_sph_taitwater_morris.html
pair_srp.html
pair_sw.html
pair_table.html
pair_table_rx.html
pair_tersoff.html
pair_tersoff_mod.html
pair_tersoff_zbl.html
pair_thole.html
pair_tri_lj.html
pair_vashishta.html
pair_yukawa.html
pair_yukawa_colloid.html
pair_zbl.html
pair_zero.html
bond_class2.html
bond_fene.html
bond_fene_expand.html
bond_harmonic.html
bond_harmonic_shift.html
bond_harmonic_shift_cut.html
bond_hybrid.html
bond_morse.html
bond_none.html
bond_nonlinear.html
bond_quartic.html
bond_table.html
bond_zero.html
angle_charmm.html
angle_class2.html
angle_cosine.html
angle_cosine_delta.html
angle_cosine_periodic.html
angle_cosine_shift.html
angle_cosine_shift_exp.html
angle_cosine_squared.html
angle_dipole.html
angle_fourier.html
angle_fourier_simple.html
angle_harmonic.html
angle_hybrid.html
angle_none.html
angle_quartic.html
angle_sdk.html
angle_table.html
angle_zero.html
dihedral_charmm.html
dihedral_class2.html
dihedral_cosine_shift_exp.html
dihedral_fourier.html
dihedral_harmonic.html
dihedral_helix.html
dihedral_hybrid.html
dihedral_multi_harmonic.html
dihedral_nharmonic.html
dihedral_none.html
dihedral_opls.html
dihedral_quadratic.html
dihedral_spherical.html
dihedral_table.html
dihedral_zero.html
improper_class2.html
improper_cossq.html
improper_cvff.html
improper_distance.html
improper_fourier.html
improper_harmonic.html
improper_hybrid.html
improper_none.html
improper_ring.html
improper_umbrella.html
improper_zero.html
USER/atc/man_add_molecule.html
USER/atc/man_add_species.html
USER/atc/man_atom_element_map.html
USER/atc/man_atom_weight.html
USER/atc/man_atomic_charge.html
USER/atc/man_boundary.html
USER/atc/man_boundary_dynamics.html
USER/atc/man_boundary_faceset.html
USER/atc/man_boundary_integral.html
USER/atc/man_consistent_fe_initialization.html
USER/atc/man_contour_integral.html
USER/atc/man_control.html
USER/atc/man_control_momentum.html
USER/atc/man_control_thermal.html
USER/atc/man_control_thermal_correction_max_iterations.html
USER/atc/man_decomposition.html
USER/atc/man_electron_integration.html
USER/atc/man_equilibrium_start.html
USER/atc/man_extrinsic_exchange.html
USER/atc/man_fe_md_boundary.html
USER/atc/man_fem_mesh.html
USER/atc/man_filter_scale.html
USER/atc/man_filter_type.html
USER/atc/man_fix_atc.html
USER/atc/man_fix_flux.html
USER/atc/man_fix_nodes.html
USER/atc/man_hardy_computes.html
USER/atc/man_hardy_fields.html
USER/atc/man_hardy_gradients.html
USER/atc/man_hardy_kernel.html
USER/atc/man_hardy_on_the_fly.html
USER/atc/man_hardy_rates.html
USER/atc/man_initial.html
USER/atc/man_internal_atom_integrate.html
USER/atc/man_internal_element_set.html
USER/atc/man_internal_quadrature.html
USER/atc/man_kernel_function.html
USER/atc/man_localized_lambda.html
USER/atc/man_lumped_lambda_solve.html
USER/atc/man_mask_direction.html
USER/atc/man_mass_matrix.html
USER/atc/man_material.html
USER/atc/man_mesh_add_to_nodeset.html
USER/atc/man_mesh_create.html
USER/atc/man_mesh_create_elementset.html
USER/atc/man_mesh_create_faceset_box.html
USER/atc/man_mesh_create_faceset_plane.html
USER/atc/man_mesh_create_nodeset.html
USER/atc/man_mesh_delete_elements.html
USER/atc/man_mesh_nodeset_to_elementset.html
USER/atc/man_mesh_output.html
USER/atc/man_mesh_quadrature.html
USER/atc/man_mesh_read.html
USER/atc/man_mesh_write.html
USER/atc/man_momentum_time_integration.html
USER/atc/man_output.html
USER/atc/man_output_elementset.html
USER/atc/man_output_nodeset.html
USER/atc/man_pair_interactions.html
USER/atc/man_poisson_solver.html
USER/atc/man_read_restart.html
USER/atc/man_remove_molecule.html
USER/atc/man_remove_source.html
USER/atc/man_remove_species.html
USER/atc/man_reset_atomic_reference_positions.html
USER/atc/man_reset_time.html
USER/atc/man_sample_frequency.html
USER/atc/man_set.html
USER/atc/man_source.html
USER/atc/man_source_integration.html
USER/atc/man_temperature_definition.html
USER/atc/man_thermal_time_integration.html
USER/atc/man_time_filter.html
USER/atc/man_track_displacement.html
USER/atc/man_unfix_flux.html
USER/atc/man_unfix_nodes.html
USER/atc/man_write_atom_weights.html
USER/atc/man_write_restart.html
diff --git a/doc/src/temper_grem.txt b/doc/src/temper_grem.txt
index b41bbdf02..6145c8704 100644
--- a/doc/src/temper_grem.txt
+++ b/doc/src/temper_grem.txt
@@ -1,109 +1,109 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
temper/grem command :h3
[Syntax:]
temper/grem N M lambda fix-ID thermostat-ID seed1 seed2 index :pre
N = total # of timesteps to run
M = attempt a tempering swap every this many steps
lambda = initial lambda for this ensemble
fix-ID = ID of fix_grem
thermostat-ID = ID of the thermostat that controls kinetic temperature
seed1 = random # seed used to decide on adjacent temperature to partner with
seed2 = random # seed for Boltzmann factor in Metropolis swap
index = which temperature (0 to N-1) I am simulating (optional) :ul
[Examples:]
temper/grem 100000 1000 ${lambda} fxgREM fxnvt 0 58728
temper/grem 40000 100 ${lambda} fxgREM fxnpt 0 32285 ${walkers} :pre
[Description:]
Run a parallel tempering or replica exchange simulation in LAMMPS
partition mode using multiple generalized replicas (ensembles) of a
system defined by "fix grem"_fix_grem.html, which stands for the
generalized replica exchange method (gREM) originally developed by
-"(Kim)"_#Kim. It uses non-Boltzmann ensembles to sample over first
+"(Kim)"_#KimStraub. It uses non-Boltzmann ensembles to sample over first
order phase transitions. This is done by defining replicas with an
enthalpy dependent effective temperature (see "fix grem"_fix_grem.html).
Two or more replicas must be used. See the "temper"_temper.html
command for an explanation of how to run replicas on multiple
partitions of one or more processors.
This command is a modification of the "temper"_temper.html command and
has the same dependencies, restrictions, and input variables, which are
discussed there in greater detail.
Instead of temperature, this command performs replica exchanges in
lambda as per the generalized ensemble enforced by "fix
grem"_fix_grem.html. The desired lambda is specified by {lambda},
which is typically a variable previously set in the input script, so
that each partition is assigned a different lambda. See the
"variable"_variable.html command for more details. For example:
variable lambda world 400 420 440 460
fix fxnvt all nvt temp 300.0 300.0 100.0
fix fxgREM all grem ${lambda} -0.05 -50000 fxnvt
temper/grem 100000 100 ${lambda} fxgREM fxnvt 3847 58382 :pre
would define 4 lambdas with constant kinetic temperature but unique
generalized temperature, and assign one of them to "fix
grem"_fix_grem.html used by each replica, and to the grem command.
As the gREM simulation runs for {N} timesteps, a swap between adjacent
ensembles will be attempted every {M} timesteps. If {seed1} is 0,
then the swap attempts will alternate between odd and even pairings.
If {seed1} is non-zero then it is used as a seed in a random number
generator to randomly choose an odd or even pairing each time. Each
attempted swap is either accepted or rejected based on
a Metropolis criterion, derived for gREM by "(Kim)"_#KimStraub, which uses
{seed2} in the random number generator.
File management works identically to the "temper"_temper.html command.
Dump files created by this fix contain continuous trajectories and
require post-processing to obtain per-replica information.
The last argument {index} in the temper/grem command is optional and is used
when restarting a run from a set of restart files (one for each
replica) which had previously swapped to new lambda. This is done
using a variable. For example if the log file listed the following for
a simulation with 5 replicas:
500000 2 4 0 1 3 :pre
then a setting of
variable walkers world 2 4 0 1 3 :pre
would be used to restart the run with a temper/grem command like the example
above with ${walkers} as the last argument. This functionality is
identical to "temper"_temper.html.
:line
[Restrictions:]
This command can only be used if LAMMPS was built with the USER-MISC
package. See the "Making LAMMPS"_Section_start.html#start_3 section
for more info on packages.
This command must be used with "fix grem"_fix_grem.html.
[Related commands:]
"fix grem"_fix_grem.html, "temper"_temper.html, "variable"_variable.html
[Default:] none
-:link(Kim)
+:link(KimStraub)
[(Kim)] Kim, Keyes, Straub, J Chem Phys, 132, 224107 (2010).
diff --git a/doc/src/timer.txt b/doc/src/timer.txt
index c37798cff..358ec75a5 100644
--- a/doc/src/timer.txt
+++ b/doc/src/timer.txt
@@ -1,119 +1,121 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
timer command :h3
[Syntax:]
timer args :pre
{args} = one or more of {off} or {loop} or {normal} or {full} or {sync} or {nosync} or {timeout} or {every} :l
{off} = do not collect or print any timing information
{loop} = collect only the total time for the simulation loop
{normal} = collect timer information broken down by sections (default)
{full} = like {normal} but also include CPU and thread utilization
{sync} = explicitly synchronize MPI tasks between sections
{nosync} = do not synchronize MPI tasks between sections (default)
{timeout} elapse = set walltime limit to {elapse}
{every} Ncheck = perform timeout check every {Ncheck} steps :pre
[Examples:]
timer full sync
timer timeout 2:00:00 every 100
timer loop :pre
[Description:]
Select the level of detail at which LAMMPS performs its CPU timings.
Multiple keywords can be specified with the {timer} command. For
keywords that are mutually exclusive, the last one specified takes
-effect.
+precedence.
During a simulation run LAMMPS collects information about how much
time is spent in different sections of the code and thus can provide
information for determining performance and load imbalance problems.
This can be done at different levels of detail and accuracy. For more
information about the timing output, see this "discussion of screen
-output"_Section_start.html#start_8.
+output in Section 2.8"_Section_start.html#start_8.
The {off} setting will turn all time measurements off. The {loop}
setting will only measure the total time for a run and not collect any
detailed per section information. With the {normal} setting, timing
information for portions of the timestep (pairwise calculations,
neighbor list construction, output, etc) are collected as well as
information about load imbalances for those sections across
processors. The {full} setting adds information about CPU
utilization and thread utilization, when multi-threading is enabled.
With the {sync} setting, all MPI tasks are synchronized at each timer
-call which meaures load imbalance more accuractly, though it can also
-slow down the simulation. Using the {nosync} setting (which is the
-default) turns off this synchronization.
+call, which measures load imbalance for each section more accurately,
+though it can also slow down the simulation by preventing overlapping
+independent computations on different MPI ranks. Using the {nosync}
+setting (which is the default) turns this synchronization off.
-With the {timeout} keyword a walltime limit can be imposed that
+With the {timeout} keyword a walltime limit can be imposed that
affects the "run"_run.html and "minimize"_minimize.html commands.
-This can be convenient when runs have to confirm to time limits,
-e.g. when running under a batch system and you want to maximize
-the utilization of the batch time slot, especially when the time
-per timestep varies and is thus difficult to predict how many
-steps a simulation can perform, or for difficult to converge
-minimizations. The timeout {elapse} value should be somewhat smaller
-than the time requested from the batch system, as there is usually
-some overhead to launch jobs, and it may be advisable to write
+This can be convenient when calculations have to comply with execution
+time limits, e.g. when running under a batch system and you want to
+maximize the utilization of the batch time slot, especially for runs
+where the time per timestep varies a lot and it is thus difficult
+to predict how many steps a simulation can perform for a given walltime
+limit. This also applies to difficult-to-converge minimizations.
+The timeout {elapse} value should be somewhat smaller than the maximum
+wall time requested from the batch system, as there is usually
+some overhead to launch jobs, and it is advisable to write
out a restart after terminating a run due to a timeout.
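For example, a job in a batch slot of roughly 12 hours might use a sketch like the following (time values and file name are placeholders):
timer timeout 11:30:00 every 100
run 100000000
write_restart restart.timeout :pre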
The timeout timer starts when the command is issued. When the time
limit is reached, the run or energy minimization will exit on the
next step or iteration that is a multiple of the {Ncheck} value
which can be set with the {every} keyword. Default is checking
every 10 steps. After the timer timeout has expired all subsequent
run or minimize commands in the input script will be skipped.
The remaining time or timer status can be accessed with the
"thermo"_thermo_style.html variable {timeremain}: it will be
zero if the timeout is inactive (the default setting),
negative if the timeout has expired, and positive if there
is time remaining, in which case the value of the variable is
the number of seconds remaining.
When the {timeout} keyword is used a second time, the timer is
restarted with a new time limit. The timeout {elapse} value can
be specified as {off} or {unlimited} to impose a no timeout condition
(which is the default). The {elapse} setting can be specified as
a single number for seconds, two numbers separated by a colon (MM:SS)
for minutes and seconds, or as three numbers separated by colons for
hours, minutes, and seconds (H:MM:SS).
The {every} keyword sets how frequently during a run or energy
minimization the wall clock will be checked. This check count applies
to the outer iterations or time steps during minimizations or "r-RESPA
runs"_run_style.html, respectively. Checking for timeout too often,
can slow a calculation down. Checking too infrequently can make the
timeout measurement less accurate, with the run being stopped later
than desired.
NOTE: Using the {full} and {sync} options provides the most detailed
and accurate timing information, but can also have a negative
performance impact due to the overhead of the many required system
calls. It is thus recommended to use these settings only when running
tests to identify performance bottlenecks. For calculations with few
atoms or a very large number of processors, even the {normal} setting
can have a measurable negative performance impact. In those cases you
can just use the {loop} or {off} setting.
[Restrictions:] none
[Related commands:]
"run post no"_run.html, "kspace_modify fftbench"_kspace_modify.html
[Default:]
timer normal nosync
timer timeout off
timer every 10 :pre
diff --git a/doc/src/tutorial_github.txt b/doc/src/tutorial_github.txt
index aed47a573..d6ec22589 100644
--- a/doc/src/tutorial_github.txt
+++ b/doc/src/tutorial_github.txt
@@ -1,380 +1,383 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
LAMMPS GitHub tutorial :h3
[written by Stefan Paquay]
:line
This document describes the process of how to use GitHub to integrate
changes or additions you have made to LAMMPS into the official LAMMPS
distribution. It uses the process of updating this very tutorial as
an example to describe the individual steps and options. You need to
be familiar with git and you may want to have a look at the
"Git book"_http://git-scm.com/book/ to reacquaint yourself with some
of the more advanced git features used below.
As of fall 2016, submitting contributions to LAMMPS via pull requests
on GitHub is the preferred option for integrating contributed features
or improvements to LAMMPS, as it significantly reduces the amount of
work required by the LAMMPS developers. Consequently, creating a pull
request will increase your chances to have your contribution included
and will reduce the time until the integration is complete. For more
information on the requirements to have your code included into LAMMPS
please see "Section 10.15"_Section_modify.html#mod_15
:line
[Making an account]
First of all, you need a GitHub account. This is fairly simple, just
go to "GitHub"_https://github.com and create an account by clicking
the "Sign up for GitHub" button. Once your account is created, you
can sign in by clicking the button in the top left and filling in your
username or e-mail address and password.
:line
[Forking the repository]
To get changes into LAMMPS, you need to first fork the `lammps/lammps`
repository on GitHub. At the time of writing, {master} is the preferred
target branch. Thus go to "LAMMPS on GitHub"_https://github.com/lammps/lammps
and make sure branch is set to "master", as shown in the figure below.
:c,image(JPG/tutorial_branch.png)
If it is not, use the button to change it to {master}. Once it is, use the
fork button to create a fork.
:c,image(JPG/tutorial_fork.png)
This will create a fork (which is essentially a copy, but uses less
resources) of the LAMMPS repository under your own GitHub account. You
can make changes in this fork and later file {pull requests} to allow
the upstream repository to merge changes from your own fork into the one
we just forked from (or others that were forked from the same repository).
At the same time, you can set things up, so you can include changes from
upstream into your repository and thus keep it in sync with the ongoing
LAMMPS development.
:line
[Adding changes to your own fork]
Additions to the upstream version of LAMMPS are handled using {feature
branches}. For every new feature, a so-called feature branch is
created, which contains only those modifications relevant to one specific
feature. For example, adding a single fix would consist of creating a
branch with only the fix header and source file and nothing else. It is
explained in more detail here: "feature branch
workflow"_https://www.atlassian.com/git/tutorials/comparing-workflows/feature-branch-workflow.
[Feature branches]
First of all, create a clone of your version on github on your local
machine via HTTPS:
$ git clone https://github.com/<your user name>/lammps.git <some name> :pre
or, if you have set up your GitHub account for using SSH keys, via SSH:
$ git clone git@github.com:<your user name>/lammps.git :pre
You can find the proper URL by clicking the "Clone or download"-button:
:c,image(JPG/tutorial_https_block.png)
The above command copies ("clones") the git repository to your local
machine into a directory with the name you chose. If none is given, it will
default to "lammps". Typical names are "mylammps" or something similar.
You can use this local clone to make changes and
test them without interfering with the repository on Github.
To pull changes from upstream into this copy, you can go to the directory
and use git pull:
$ cd mylammps
$ git checkout master
$ git pull https://github.com/lammps/lammps :pre
You can also add this URL as a remote:
$ git remote add lammps_upstream https://www.github.com/lammps/lammps :pre
At this point, you typically make a feature branch from the updated master
branch for the feature you want to work on. This tutorial contains the
workflow that updated this tutorial, and hence we will call the branch
"github-tutorial-update":
$ git checkout -b github-tutorial-update master :pre
Now that we have changed branches, we can make our changes to our local
repository. Just remember that if you want to start working on another,
unrelated feature, you should switch branches!
[After changes are made]
After everything is done, add the files to the branch and commit them:
$ git add doc/src/tutorial_github.txt
$ git add doc/src/JPG/tutorial*.png :pre
IMPORTANT NOTE: Do not use {git commit -a} (or {git add -A}). The -a
flag (or -A flag) will automatically include _all_ modified or new files
and that is rarely the behavior you want. It can easily lead to
accidentally adding unrelated and unwanted changes into the repository.
Instead it is preferable to explicitly use {git add}, {git rm}, {git mv}
for adding, removing, renaming individual files, respectively, and then
{git commit} to finalize the commit. Carefully check all pending
changes with {git status} before committing them. If you find doing
this on the command line too tedious, consider using a GUI, for example
the one included in git distributions written in Tk, i.e. use {git gui}
(on some Linux distributions it may be required to install an additional
package to use it).
After adding all files, the change set can be committed with some
useful message that explains the change.
$ git commit -m 'Finally updated the github tutorial' :pre
After the commit, the changes can be pushed to the same branch on GitHub:
$ git push :pre
Git will ask you for your user name and password on GitHub if you have
not configured anything. If your local branch is not present on Github yet,
it will ask you to add it by running
$ git push --set-upstream origin github-tutorial-update :pre
If you correctly type your user name and
password, the feature branch should be added to your fork on GitHub.
If you want to make really sure you push to the right repository
(which is good practice), you can provide it explicitly:
$ git push origin :pre
or using an explicit URL:
$ git push git@github.com:Pakketeretet2/lammps.git :pre
:line
[Filing a pull request]
Up to this point in the tutorial, all changes were to {your} clones of
LAMMPS. Eventually, however, you want this feature to be included into
the official LAMMPS version. To do this, you will want to file a pull
request by clicking on the "New pull request" button:
:c,image(JPG/tutorial_new_pull_request.png)
Make sure that the current branch is set to the correct one, which, in
this case, is "github-tutorial-update". If done correctly, the only
changes you will see are those that were made on this branch.
This will open up a new window that lists changes made to the
repository. If you are just adding new files, there is not much to do,
but I suppose merge conflicts are to be resolved here if there are
changes in existing files. If all changes can automatically be merged,
green text at the top will say so and you can click the "Create pull
request" button, see image.
:c,image(JPG/tutorial_create_new_pull_request1.png)
Before creating the pull request, make sure the short title is accurate
and add a comment with details about your pull request. Here you write
what your modifications do and why they should be incorporated upstream.
Note the checkbox that says "Allow edits from maintainers".
This checkbox is checked by default (although in my version of Firefox, only the checkmark is visible):
:c,image(JPG/tutorial_edits_maintainers.png)
If it is checked, maintainers can immediately add their own edits to the
pull request. This helps the inclusion of your branch significantly, as
simple/trivial changes can be added directly to your pull request branch
by the LAMMPS maintainers. The alternative would be that they make
changes on their own version of the branch and file a reverse pull
request to you. Just leave this box checked unless you have a very good
reason not to.
Now just write some nice comments and click on "Create pull request".
:c,image(JPG/tutorial_create_new_pull_request2.png)
:line
[After filing a pull request]
NOTE: When you submit a pull request (or ask for a pull request) for the
first time, you will receive an invitation to become a LAMMPS project
collaborator. Please accept this invite as being a collaborator will
simplify certain administrative tasks and will probably speed up the
merging of your feature, too.
You will notice that after filing the pull request, some checks are
performed automatically:
:c,image(JPG/tutorial_automated_checks.png)
If all is fine, you will see this:
:c,image(JPG/tutorial_automated_checks_passed.png)
If any of the checks are failing, your pull request will not be
processed, as your changes may break compilation for certain
configurations or may not merge cleanly. It is your responsibility
to remove the reason(s) for the failed test(s). If you need help
with this, please contact the LAMMPS developers by adding a comment
explaining your problems with resolving the failed tests.
A few further interesting things (can) happen to pull requests before
they are included.
[Additional changes]
First of all, any additional changes you push into your branch in your
repository will automatically become part of the pull request:
:c,image(JPG/tutorial_additional_changes.png)
This means you can add changes that should be part of the feature after
filing the pull request, which is useful in case you have forgotten
them, or if a developer has requested that something needs to be changed
before the feature can be accepted into the official LAMMPS version.
After each push, the automated checks are run again.
[Assignees]
There is an assignee label for pull requests. If the request has not
been reviewed by any developer yet, it is not assigned to anyone. After
a review, a developer can choose to assign it to either a) you, b) a
LAMMPS developer (including him/herself), or c) Steve Plimpton (sjplimp).
Case a) happens if changes are required on your part :ulb,l
Case b) means that at the moment, it is being tested and reviewed by a
LAMMPS developer with the expectation that some changes would be required.
After the review, the developer can choose to implement changes directly
or suggest them to you. :l
Case c) means that the pull request has been assigned to the lead
developer Steve Plimpton and is considered ready for merging. :ule,l
In this case, Axel assigned the tutorial to Steve:
:c,image(JPG/tutorial_steve_assignee.png)
[Edits from LAMMPS maintainers]
If you allowed edits from maintainers (the default), any LAMMPS
maintainer can add changes to your pull request. In this case, both
Axel and Richard made changes to the tutorial:
:c,image(JPG/tutorial_changes_others.png)
[Reverse pull requests]
Sometimes, however, you might not feel comfortable having other people
push changes into your own branch, or maybe the maintainers are not sure
their idea was the right one. In such a case, they can make changes,
reassign you as the assignee, and file a "reverse pull request", i.e.
file a pull request in your GitHub repository to include changes in the
branch that you have submitted as a pull request yourself. In that
case, you can choose to merge their changes back into your branch,
possibly make additional changes or corrections, and proceed from there.
It looks something like this:
:c,image(JPG/tutorial_reverse_pull_request.png)
For some reason, the highlighted button didn't work in my case, but I
can go to my own repository and merge the pull request from there:
:c,image(JPG/tutorial_reverse_pull_request2.png)
Be sure to check the changes to see if you agree with them by clicking
on the tab button:
:c,image(JPG/tutorial_reverse_pull_request3.png)
In this case, most of the changes are to the markup, plus a short rewrite
of Axel's explanation of the "git gui" and "git add" commands.
:c,image(JPG/tutorial_reverse_pull_request4.png)
Because the changes are OK with us, we are going to merge them by clicking
on "Merge pull request". After the merge it looks like this:
:c,image(JPG/tutorial_reverse_pull_request5.png)
Now, since in the meantime our local text for the tutorial has also
changed, we need to pull Axel's changes back into our branch and merge them:
$ git add tutorial_github.txt
$ git add JPG/tutorial_reverse_pull_request*.png
$ git commit -m "Updated text and images on reverse pull requests"
$ git pull :pre
In this case, the merge was painless because git could auto-merge:
:c,image(JPG/tutorial_reverse_pull_request6.png)
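Had git not been able to auto-merge, the pull would instead have stopped
with a conflict. In that case you edit the affected file(s) by hand,
remove the conflict markers, and conclude the merge yourself, roughly
like this (the file name is just an example):
$ git status
$ git add tutorial_github.txt
$ git commit :pre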
With Axel's changes merged in and some final text updates, our feature
branch is now perfect as far as we are concerned, so we are going to
commit and push again:
$ git add tutorial_github.txt
$ git add JPG/tutorial_reverse_pull_request6.png
$ git commit -m "Merged Axel's suggestions and updated text"
$ git push git@github.com:Pakketeretet2/lammps :pre
+This merge also shows up on the LAMMPS GitHub page:
+
+:c,image(JPG/tutorial_reverse_pull_request7.png)
:line
[After a merge]
-When everything is fine, the feature branch is merged into the master branch.
+When everything is fine, the feature branch is merged into the master branch:
:c,image(JPG/tutorial_merged.png)
Now one question remains: what to do with the feature branch that got
merged into upstream?
It is in principle safe to delete it from your own fork, which helps
keep the fork a bit more tidy. Note that you first have to switch to
another branch!
$ git checkout master
$ git pull
$ git branch -d github-tutorial-update :pre
If you do not pull first, it is not really a problem, but git will warn
you at the next command that you are deleting a local branch that was
not yet fully merged into HEAD. This is because git does not yet know
that your branch just got merged into LAMMPS upstream. If you first
delete and then pull, everything should still be fine.
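If you are certain the branch content made it into upstream and do not
want to pull first, you can also force the deletion; use this with care,
since git will then no longer warn you about discarding unmerged work:
$ git branch -D github-tutorial-update :pre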
Finally, if you delete the branch locally, you might want to push this
to your remote(s) as well:
$ git push origin :github-tutorial-update :pre
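Newer versions of git also accept the somewhat more readable equivalent:
$ git push origin --delete github-tutorial-update :pre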
[Recent changes in the workflow]
Some changes to the workflow are not captured in this tutorial. For
example, in addition to the master branch, to which all new features
should be submitted, there is now also an "unstable" and a "stable"
branch; these have the same content as "master", but are only updated
after a patch release or stable release was made.
Furthermore, the naming of the patches now follows the pattern
"patch_<Day><Month><Year>" to simplify comparisons between releases.
Finally, all patches and submissions are subject to automatic testing
and code checks to make sure they at the very least compile.
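For example, if you only want to follow the tested releases rather than
every patch, you can track the "stable" branch and list the release tags
that follow the new naming pattern, assuming your clone tracks the
official LAMMPS repository (the exact tag names depend on the release
dates):
$ git checkout stable
$ git pull
$ git tag --list 'patch_*' :pre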
diff --git a/examples/USER/dpd/dpdh-shardlow/in.dpdh-shardlow b/examples/USER/dpd/dpdh-shardlow/in.dpdh-shardlow
index 432c666c7..e403175e7 100644
--- a/examples/USER/dpd/dpdh-shardlow/in.dpdh-shardlow
+++ b/examples/USER/dpd/dpdh-shardlow/in.dpdh-shardlow
@@ -1,31 +1,31 @@
# Input File for DPD fluid under isoenthalpic conditions using the VV-SSA integration scheme
log log.dpdh-shardlow
boundary p p p
units metal # ev, ps
atom_style dpd
read_data data.dpdh
comm_modify mode single vel yes
mass 1 100.0
pair_style dpd/fdt/energy 10.0 234324
pair_coeff 1 1 0.075 0.022 3.2E-5 10.0
neighbor 2.0 bin
neigh_modify every 1 delay 0 check no once no
timestep 0.001
compute dpdU all dpd
-variable totEnergy equal pe+ke+c_dpdU[1]+c_dpdU[1]+press*vol
+variable totEnergy equal pe+ke+c_dpdU[1]+c_dpdU[2]+press*vol
thermo 1
thermo_style custom step temp press vol pe ke v_totEnergy cella cellb cellc
thermo_modify format float %15.10f
fix 1 all shardlow
fix 0 all nph iso 0.0 0.0 1000.0
fix 2 all eos/cv 0.0005
run 100
diff --git a/examples/USER/dpd/dpdh-shardlow/log.dpdh-shardlow.reference b/examples/USER/dpd/dpdh-shardlow/log.dpdh-shardlow.reference
index e36c31774..7a5478aaa 100644
--- a/examples/USER/dpd/dpdh-shardlow/log.dpdh-shardlow.reference
+++ b/examples/USER/dpd/dpdh-shardlow/log.dpdh-shardlow.reference
@@ -1,175 +1,183 @@
boundary p p p
units metal # ev, ps
atom_style dpd
read_data data.dpdh
orthogonal box = (-64.5 -64.5 -64.5) to (64.5 64.5 64.5)
1 by 1 by 1 MPI processor grid
reading atoms ...
10125 atoms
reading velocities ...
10125 velocities
comm_modify mode single vel yes
mass 1 100.0
pair_style dpd/fdt/energy 10.0 234324
pair_coeff 1 1 0.075 0.022 3.2E-5 10.0
neighbor 2.0 bin
neigh_modify every 1 delay 0 check no once no
timestep 0.001
compute dpdU all dpd
-variable totEnergy equal pe+ke+c_dpdU[1]+c_dpdU[1]+press*vol
+variable totEnergy equal pe+ke+c_dpdU[1]+c_dpdU[2]+press*vol
thermo 1
thermo_style custom step temp press vol pe ke v_totEnergy cella cellb cellc
thermo_modify format float %15.10f
fix 1 all shardlow
fix 0 all nph iso 0.0 0.0 1000.0
fix 2 all eos/cv 0.0005
run 100
Neighbor list info ...
- 1 neighbor list requests
update every 1 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 12
ghost atom cutoff = 12
- binsize = 6 -> bins = 22 22 22
-Memory usage per processor = 6.48143 Mbytes
+ binsize = 6, bins = 22 22 22
+ 2 neighbor lists, perpetual/occasional/extra = 2 0 0
+ (1) pair dpd/fdt/energy, perpetual
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+ (2) fix shardlow, perpetual, ssa
+ pair build: half/bin/newton/ssa
+ stencil: half/bin/3d/newton/ssa
+ bin: ssa
+Memory usage per processor = 8.55503 Mbytes
Step Temp Press Volume PotEng KinEng v_totEnergy Cella Cellb Cellc
- 0 239.4274282976 2817.4421750949 2146689.0000000000 2639.8225470740 313.3218455755 6048176597.3066043854 129.0000000000 129.0000000000 129.0000000000
- 1 239.4771405316 2817.4798146419 2146689.0000581890 2639.8304543632 313.3869004818 6048257397.9450111389 129.0000000012 129.0000000012 129.0000000012
- 2 239.5643955010 2817.5423194969 2146689.0002327557 2639.8379071907 313.5010849268 6048391577.0431985855 129.0000000047 129.0000000047 129.0000000047
- 3 239.6633839196 2817.6123662396 2146689.0005237064 2639.8445238058 313.6306241122 6048541946.5712032318 129.0000000105 129.0000000105 129.0000000105
- 4 239.5371222027 2817.5355424336 2146689.0009310376 2639.8505035043 313.4653942786 6048377030.7404460907 129.0000000186 129.0000000186 129.0000000186
- 5 239.6512678169 2817.6153097076 2146689.0014547524 2639.8561498340 313.6147686202 6048548267.9007377625 129.0000000291 129.0000000291 129.0000000291
- 6 239.5617886781 2817.5624195435 2146689.0020948485 2639.8617493725 313.4976735610 6048434730.8592004776 129.0000000420 129.0000000420 129.0000000420
- 7 239.5228587856 2817.5420009502 2146689.0028513218 2639.8666590407 313.4467287471 6048390900.5748577118 129.0000000571 129.0000000571 129.0000000571
- 8 239.6066877934 2817.6008649264 2146689.0037241788 2639.8710757645 313.5564298772 6048517265.7987136841 129.0000000746 129.0000000746 129.0000000746
- 9 239.5719861485 2817.5823530300 2146689.0047134170 2639.8752557893 313.5110182737 6048477529.2603597641 129.0000000944 129.0000000944 129.0000000944
- 10 239.5800176776 2817.5915671176 2146689.0058190385 2639.8793778438 313.5215285712 6048497312.1706552505 129.0000001166 129.0000001166 129.0000001166
- 11 239.6299830954 2817.6281223139 2146689.0070410441 2639.8829762049 313.5869148014 6048575788.3208351135 129.0000001410 129.0000001410 129.0000001410
- 12 239.6011995911 2817.6132377273 2146689.0083794324 2639.8860704236 313.5492478526 6048543839.4788360596 129.0000001678 129.0000001678 129.0000001678
- 13 239.6407681166 2817.6427924824 2146689.0098342048 2639.8889816934 313.6010284005 6048607288.5005025864 129.0000001970 129.0000001970 129.0000001970
- 14 239.6981172055 2817.6844100046 2146689.0114053637 2639.8913405110 313.6760771219 6048696632.8825626373 129.0000002285 129.0000002285 129.0000002285
- 15 239.8563971968 2817.7922519039 2146689.0130929090 2639.8934358481 313.8832070208 6048928140.8671455383 129.0000002623 129.0000002623 129.0000002623
- 16 239.8561894618 2817.7971208197 2146689.0148968464 2639.8950496967 313.8829351726 6048938597.9994916916 129.0000002984 129.0000002984 129.0000002984
- 17 239.8816520361 2817.8185621543 2146689.0168171758 2639.8961257823 313.9162562538 6048984631.3226108551 129.0000003369 129.0000003369 129.0000003369
- 18 239.9099966096 2817.8417368960 2146689.0188538977 2639.8965743204 313.9533488047 6049034386.0627622604 129.0000003777 129.0000003777 129.0000003777
- 19 240.0514024347 2817.9389205774 2146689.0210070144 2639.8966103811 314.1383966683 6049243015.4568052292 129.0000004208 129.0000004208 129.0000004208
- 20 239.8802541140 2817.8327386176 2146689.0232765260 2639.8962085210 313.9144268914 6049015081.9802341461 129.0000004662 129.0000004662 129.0000004662
- 21 239.8462621903 2817.8160306167 2146689.0256624296 2639.8953174755 313.8699440502 6048979221.7758703232 129.0000005140 129.0000005140 129.0000005140
- 22 240.0487944678 2817.9533849157 2146689.0281647225 2639.8938590354 314.1349838054 6049274086.0571212769 129.0000005642 129.0000005642 129.0000005642
- 23 240.0966314441 2817.9897873787 2146689.0307834130 2639.8918104774 314.1975846937 6049352238.2649183273 129.0000006166 129.0000006166 129.0000006166
- 24 240.1765312516 2818.0463843765 2146689.0335185044 2639.8891292321 314.3021439554 6049473742.2287187576 129.0000006714 129.0000006714 129.0000006714
- 25 240.1500705973 2818.0336048048 2146689.0363699966 2639.8858785483 314.2675167572 6049446316.4600162506 129.0000007285 129.0000007285 129.0000007285
- 26 240.2681423500 2818.1151708195 2146689.0393378921 2639.8825176506 314.4220289603 6049621421.8445177078 129.0000007880 129.0000007880 129.0000007880
- 27 240.4728815247 2818.2527327079 2146689.0424221945 2639.8784158747 314.6899567267 6049916733.3989181519 129.0000008498 129.0000008498 129.0000008498
- 28 240.4793027032 2818.2613348477 2146689.0456229053 2639.8736089473 314.6983596717 6049935208.5421981812 129.0000009139 129.0000009139 129.0000009139
- 29 240.5020619198 2818.2805472685 2146689.0489400285 2639.8681043704 314.7281430587 6049976461.0082206726 129.0000009803 129.0000009803 129.0000009803
- 30 240.5513721776 2818.3167157263 2146689.0523735629 2639.8623484053 314.7926719270 6050054113.1760177612 129.0000010491 129.0000010491 129.0000010491
- 31 240.7340393104 2818.4391703712 2146689.0559235099 2639.8563442170 315.0317155636 6050316995.4599781036 129.0000011202 129.0000011202 129.0000011202
- 32 240.8254719483 2818.5014640740 2146689.0595898777 2639.8498122053 315.1513670299 6050450731.1168394089 129.0000011936 129.0000011936 129.0000011936
- 33 240.9681573541 2818.5965480750 2146689.0633726656 2639.8425779528 315.3380893908 6050654857.7432861328 129.0000012694 129.0000012694 129.0000012694
- 34 241.0039494187 2818.6217008564 2146689.0672718794 2639.8347174393 315.3849279499 6050708863.9733209610 129.0000013475 129.0000013475 129.0000013475
- 35 241.0314566197 2818.6411150538 2146689.0712875174 2639.8262983643 315.4209246902 6050750551.5649127960 129.0000014279 129.0000014279 129.0000014279
- 36 241.0829173424 2818.6763455617 2146689.0754195810 2639.8174397481 315.4882677207 6050826192.2165899277 129.0000015107 129.0000015107 129.0000015107
- 37 241.2845682012 2818.8087982181 2146689.0796680767 2639.8080129872 315.7521540252 6051110539.1171846390 129.0000015958 129.0000015958 129.0000015958
- 38 241.3214712920 2818.8336260248 2146689.0840330068 2639.7981963574 315.8004465062 6051163849.0412235260 129.0000016833 129.0000016833 129.0000016833
- 39 241.3392127125 2818.8456991528 2146689.0885143690 2639.7879618658 315.8236634561 6051189778.9386901855 129.0000017730 129.0000017730 129.0000017730
- 40 241.5383770555 2818.9753950055 2146689.0931121684 2639.7769824244 316.0842958321 6051468208.8210506439 129.0000018651 129.0000018651 129.0000018651
- 41 241.5059730674 2818.9543817992 2146689.0978264087 2639.7656512498 316.0418910106 6051423113.2358427048 129.0000019595 129.0000019595 129.0000019595
- 42 241.3907605672 2818.8793800508 2146689.1026570834 2639.7541331920 315.8911205101 6051262121.2551422119 129.0000020563 129.0000020563 129.0000020563
- 43 241.5095917610 2818.9559595711 2146689.1076041958 2639.7424355740 316.0466265406 6051426527.7663059235 129.0000021554 129.0000021554 129.0000021554
- 44 241.6271631762 2819.0312325531 2146689.1126677482 2639.7297705654 316.2004839873 6051588129.8722610474 129.0000022568 129.0000022568 129.0000022568
- 45 241.5702411838 2818.9923790176 2146689.1178477411 2639.7163554760 316.1259941770 6051504737.9250564575 129.0000023606 129.0000023606 129.0000023606
- 46 241.7029985068 2819.0771124986 2146689.1231441777 2639.7024246704 316.2997243538 6051686649.4576120377 129.0000024667 129.0000024667 129.0000024667
- 47 241.7966144965 2819.1357830868 2146689.1285570571 2639.6882106593 316.4222330191 6051812612.3391046524 129.0000025751 129.0000025751 129.0000025751
- 48 241.8573480255 2819.1726205120 2146689.1340863821 2639.6735287925 316.5017107195 6051891706.4921989441 129.0000026859 129.0000026859 129.0000026859
- 49 241.9611147338 2819.2374095379 2146689.1397321564 2639.6583357477 316.6375029166 6052030804.4275226593 129.0000027990 129.0000027990 129.0000027990
- 50 242.1023518806 2819.3259059811 2146689.1454943856 2639.6424863169 316.8223300428 6052220795.1955394745 129.0000029144 129.0000029144 129.0000029144
- 51 242.1174105473 2819.3319633044 2146689.1513730693 2639.6264141131 316.8420362613 6052233814.9634265900 129.0000030321 129.0000030321 129.0000030321
- 52 242.2534914901 2819.4164594322 2146689.1573682069 2639.6098392670 317.0201158259 6052415218.9485445023 129.0000031522 129.0000031522 129.0000031522
- 53 242.3504633236 2819.4754119996 2146689.1634798055 2639.5930076506 317.1470160479 6052541789.1274013519 129.0000032746 129.0000032746 129.0000032746
- 54 242.2982323323 2819.4368568264 2146689.1697078613 2639.5756353782 317.0786650211 6052459040.6286897659 129.0000033994 129.0000033994 129.0000033994
- 55 242.3452896272 2819.4623310219 2146689.1760523771 2639.5575918586 317.1402455951 6052513743.7400159836 129.0000035265 129.0000035265 129.0000035265
- 56 242.4181903333 2819.5048897011 2146689.1825133534 2639.5390347547 317.2356456249 6052605122.2894439697 129.0000036559 129.0000036559 129.0000036559
- 57 242.5317091656 2819.5739975787 2146689.1890907930 2639.5199828249 317.3841997413 6052753494.0979280472 129.0000037876 129.0000037876 129.0000037876
- 58 242.5478978740 2819.5796954935 2146689.1957846982 2639.5006137388 317.4053847660 6052765744.6257629395 129.0000039217 129.0000039217 129.0000039217
- 59 242.6655316466 2819.6519225743 2146689.2025950695 2639.4808234811 317.5593238156 6052920813.0568208694 129.0000040582 129.0000040582 129.0000040582
- 60 242.8126131177 2819.7431588157 2146689.2095219092 2639.4607996998 317.7517989980 6053116688.6155729294 129.0000041969 129.0000041969 129.0000041969
- 61 242.7957124913 2819.7275989047 2146689.2165652174 2639.4406312730 317.7296823362 6053083306.1403274536 129.0000043380 129.0000043380 129.0000043380
- 62 242.9276177041 2819.8088790098 2146689.2237249981 2639.4201279058 317.9022974164 6053257809.6067762375 129.0000044814 129.0000044814 129.0000044814
- 63 243.0465445938 2819.8814758895 2146689.2310012528 2639.3991657500 318.0579286774 6053413673.1989650726 129.0000046272 129.0000046272 129.0000046272
- 64 242.9890585501 2819.8387587817 2146689.2383939880 2639.3781767844 317.9827007328 6053321993.5937871933 129.0000047752 129.0000047752 129.0000047752
- 65 242.9653746583 2819.8180104181 2146689.2459031967 2639.3568184374 317.9517072884 6053277474.4272727966 129.0000049256 129.0000049256 129.0000049256
- 66 243.0259297024 2819.8514334947 2146689.2535288804 2639.3352568621 318.0309514181 6053349244.9473772049 129.0000050784 129.0000050784 129.0000050784
- 67 242.9638979697 2819.8046112742 2146689.2612710390 2639.3134547096 317.9497748498 6053248753.9180717468 129.0000052335 129.0000052335 129.0000052335
- 68 243.0283540775 2819.8395632725 2146689.2691296688 2639.2912303374 318.0341240273 6053323807.2197017670 129.0000053909 129.0000053909 129.0000053909
- 69 243.2256418664 2819.9609646019 2146689.2771047787 2639.2684509205 318.2923006889 6053584440.8757400513 129.0000055506 129.0000055506 129.0000055506
- 70 243.2507495334 2819.9706145524 2146689.2851963686 2639.2450126010 318.3251573278 6053605179.1483964920 129.0000057127 129.0000057127 129.0000057127
- 71 243.4287155518 2820.0794853386 2146689.2934044413 2639.2213699915 318.5580489464 6053838914.2552747726 129.0000058771 129.0000058771 129.0000058771
- 72 243.5097518574 2820.1249498194 2146689.3017290002 2639.1971212009 318.6640954635 6053936535.9274711609 129.0000060439 129.0000060439 129.0000060439
- 73 243.5356790969 2820.1337977544 2146689.3101700447 2639.1723394661 318.6980246193 6053955553.5090074539 129.0000062130 129.0000062130 129.0000062130
- 74 243.5479180498 2820.1331964183 2146689.3187275808 2639.1473868749 318.7140408766 6053954286.7515821457 129.0000063844 129.0000063844 129.0000063844
- 75 243.7115573025 2820.2314361523 2146689.3274016059 2639.1220411207 318.9281840641 6054165201.5909118652 129.0000065581 129.0000065581 129.0000065581
- 76 243.7457279618 2820.2454531429 2146689.3361921217 2639.0963868224 318.9729008040 6054195316.5254154205 129.0000067342 129.0000067342 129.0000067342
- 77 243.8345031069 2820.2948644965 2146689.3450991292 2639.0700900389 319.0890745962 6054301412.5615310669 129.0000069126 129.0000069126 129.0000069126
- 78 244.0193931195 2820.4067881628 2146689.3541226317 2639.0435094409 319.3310271594 6054541703.5689058304 129.0000070934 129.0000070934 129.0000070934
- 79 243.9919100078 2820.3799166166 2146689.3632626338 2639.0164249037 319.2950619430 6054484044.4218587875 129.0000072765 129.0000072765 129.0000072765
- 80 244.0965612207 2820.4387335935 2146689.3725191355 2638.9888176882 319.4320116291 6054610332.4174261093 129.0000074619 129.0000074619 129.0000074619
- 81 244.1334315951 2820.4535208568 2146689.3818921377 2638.9608330195 319.4802612965 6054642102.5347270966 129.0000076496 129.0000076496 129.0000076496
- 82 244.3029520408 2820.5543485196 2146689.3913816395 2638.9318525796 319.7021007878 6054858575.1664342880 129.0000078397 129.0000078397 129.0000078397
- 83 244.3445761189 2820.5713690935 2146689.4009876498 2638.9021684795 319.7565712929 6054895140.1710596085 129.0000080321 129.0000080321 129.0000080321
- 84 244.2696671559 2820.5125763350 2146689.4107101629 2638.8720941742 319.6585431986 6054768957.6739044189 129.0000082269 129.0000082269 129.0000082269
- 85 244.5161919319 2820.6629431352 2146689.4205491822 2638.8415194387 319.9811528443 6055091776.5361995697 129.0000084240 129.0000084240 129.0000084240
- 86 244.5641090282 2820.6838080201 2146689.4305047127 2638.8103612394 320.0438585800 6055136595.0767974854 129.0000086234 129.0000086234 129.0000086234
- 87 244.5348240638 2820.6541129118 2146689.4405767513 2638.7789728309 320.0055354056 6055072877.2416200638 129.0000088251 129.0000088251 129.0000088251
- 88 244.6939431427 2820.7468233396 2146689.4507653015 2638.7470269267 320.2137633592 6055271926.6536149979 129.0000090292 129.0000090292 129.0000090292
- 89 244.8800201091 2820.8567117003 2146689.4610703662 2638.7147520097 320.4572692055 6055507852.1186332703 129.0000092356 129.0000092356 129.0000092356
- 90 244.8804280382 2820.8451141876 2146689.4714919478 2638.6820441173 320.4578030336 6055482985.2258749008 129.0000094444 129.0000094444 129.0000094444
- 91 244.9558851986 2820.8815975090 2146689.4820300462 2638.6491836104 320.5565485155 6055561333.3803453445 129.0000096555 129.0000096555 129.0000096555
- 92 244.9965893140 2820.8949614294 2146689.4926846647 2638.6159817170 320.6098151301 6055590051.6433181763 129.0000098689 129.0000098689 129.0000098689
- 93 245.1381056687 2820.9732811388 2146689.5034558061 2638.5824451870 320.7950076360 6055758210.2774200439 129.0000100846 129.0000100846 129.0000100846
- 94 245.2954807041 2821.0619342131 2146689.5143434699 2638.5485198222 321.0009532826 6055948551.7882709503 129.0000103027 129.0000103027 129.0000103027
- 95 245.3535822199 2821.0860553731 2146689.5253476589 2638.5144817512 321.0769866522 6056000363.5151576996 129.0000105232 129.0000105232 129.0000105232
- 96 245.5013476026 2821.1682908185 2146689.5364683764 2638.4801107361 321.2703568219 6056176929.0169925690 129.0000107459 129.0000107459 129.0000107459
- 97 245.4166531417 2821.0989038023 2146689.5477056229 2638.4453663061 321.1595231342 6056028008.1910057068 129.0000109710 129.0000109710 129.0000109710
- 98 245.4121937790 2821.0817490953 2146689.5590593945 2638.4097762390 321.1536874797 6055991214.3494396210 129.0000111984 129.0000111984 129.0000111984
- 99 245.4532592994 2821.0946353191 2146689.5705296928 2638.3738037546 321.2074270397 6056018909.4480972290 129.0000114282 129.0000114282 129.0000114282
- 100 245.7500657390 2821.2735939427 2146689.5821165247 2638.3375549051 321.5958367642 6056403111.1006488800 129.0000116603 129.0000116603 129.0000116603
-Loop time of 4.05006 on 1 procs for 100 steps with 10125 atoms
+ 0 239.4274282976 2817.4421750949 2146689.0000000000 2639.8225470740 313.3218455755 6048176597.3066034317 129.0000000000 129.0000000000 129.0000000000
+ 1 239.4771405316 2817.4798146419 2146689.0000581890 2639.8304543632 313.3869004818 6048257397.8720483780 129.0000000012 129.0000000012 129.0000000012
+ 2 239.5643955010 2817.5423194969 2146689.0002327557 2639.8379071907 313.5010849268 6048391576.8485937119 129.0000000047 129.0000000047 129.0000000047
+ 3 239.6633839196 2817.6123662396 2146689.0005237064 2639.8445238058 313.6306241122 6048541946.2404479980 129.0000000105 129.0000000105 129.0000000105
+ 4 239.5371222027 2817.5355424336 2146689.0009310376 2639.8505035043 313.4653942786 6048377030.5689325333 129.0000000186 129.0000000186 129.0000000186
+ 5 239.6512678169 2817.6153097076 2146689.0014547524 2639.8561498340 313.6147686202 6048548267.5742130280 129.0000000291 129.0000000291 129.0000000291
+ 6 239.5617886781 2817.5624195435 2146689.0020948485 2639.8617493725 313.4976735610 6048434730.6441593170 129.0000000420 129.0000000420 129.0000000420
+ 7 239.5228587856 2817.5420009502 2146689.0028513218 2639.8666590407 313.4467287471 6048390900.4058599472 129.0000000571 129.0000000571 129.0000000571
+ 8 239.6066877934 2817.6008649264 2146689.0037241788 2639.8710757645 313.5564298772 6048517265.5155982971 129.0000000746 129.0000000746 129.0000000746
+ 9 239.5719861485 2817.5823530300 2146689.0047134170 2639.8752557893 313.5110182737 6048477529.0184717178 129.0000000944 129.0000000944 129.0000000944
+ 10 239.5800176776 2817.5915671176 2146689.0058190385 2639.8793778438 313.5215285712 6048497311.9141387939 129.0000001166 129.0000001166 129.0000001166
+ 11 239.6299830954 2817.6281223139 2146689.0070410441 2639.8829762049 313.5869148014 6048575787.9953098297 129.0000001410 129.0000001410 129.0000001410
+ 12 239.6011995911 2817.6132377273 2146689.0083794324 2639.8860704236 313.5492478526 6048543839.1878814697 129.0000001678 129.0000001678 129.0000001678
+ 13 239.6407681166 2817.6427924824 2146689.0098342048 2639.8889816934 313.6010284005 6048607288.1548709869 129.0000001970 129.0000001970 129.0000001970
+ 14 239.6981172055 2817.6844100046 2146689.0114053637 2639.8913405110 313.6760771219 6048696632.4595127106 129.0000002285 129.0000002285 129.0000002285
+ 15 239.8563971968 2817.7922519039 2146689.0130929090 2639.8934358481 313.8832070208 6048928140.2348766327 129.0000002623 129.0000002623 129.0000002623
+ 16 239.8561894618 2817.7971208196 2146689.0148968464 2639.8950496967 313.8829351726 6048938597.3658657074 129.0000002984 129.0000002984 129.0000002984
+ 17 239.8816520361 2817.8185621543 2146689.0168171758 2639.8961257823 313.9162562538 6048984630.6545839310 129.0000003369 129.0000003369 129.0000003369
+ 18 239.9099966096 2817.8417368960 2146689.0188538977 2639.8965743204 313.9533488047 6049034385.3571958542 129.0000003777 129.0000003777 129.0000003777
+ 19 240.0514024347 2817.9389205774 2146689.0210070144 2639.8966103811 314.1383966683 6049243014.5661621094 129.0000004208 129.0000004208 129.0000004208
+ 20 239.8802541140 2817.8327386176 2146689.0232765260 2639.8962085210 313.9144268914 6049015081.3139505386 129.0000004662 129.0000004662 129.0000004662
+ 21 239.8462621903 2817.8160306167 2146689.0256624296 2639.8953174755 313.8699440502 6048979221.1549577713 129.0000005140 129.0000005140 129.0000005140
+ 22 240.0487944678 2817.9533849157 2146689.0281647225 2639.8938590354 314.1349838054 6049274085.1726217270 129.0000005642 129.0000005642 129.0000005642
+ 23 240.0966314441 2817.9897873787 2146689.0307834130 2639.8918104774 314.1975846937 6049352237.3198652267 129.0000006166 129.0000006166 129.0000006166
+ 24 240.1765312516 2818.0463843765 2146689.0335185044 2639.8891292321 314.3021439554 6049473741.1817827225 129.0000006714 129.0000006714 129.0000006714
+ 25 240.1500705973 2818.0336048048 2146689.0363699966 2639.8858785483 314.2675167572 6049446315.4509468079 129.0000007285 129.0000007285 129.0000007285
+ 26 240.2681423500 2818.1151708195 2146689.0393378921 2639.8825176506 314.4220289603 6049621420.6842966080 129.0000007880 129.0000007880 129.0000007880
+ 27 240.4728815247 2818.2527327079 2146689.0424221945 2639.8784158747 314.6899567267 6049916731.9748563766 129.0000008498 129.0000008498 129.0000008498
+ 28 240.4793027032 2818.2613348477 2146689.0456229053 2639.8736089473 314.6983596717 6049935207.1145420074 129.0000009139 129.0000009139 129.0000009139
+ 29 240.5020619198 2818.2805472685 2146689.0489400285 2639.8681043704 314.7281430587 6049976459.5562763214 129.0000009803 129.0000009803 129.0000009803
+ 30 240.5513721776 2818.3167157263 2146689.0523735629 2639.8623484053 314.7926719270 6050054111.6652946472 129.0000010491 129.0000010491 129.0000010491
+ 31 240.7340393104 2818.4391703712 2146689.0559235099 2639.8563442170 315.0317155636 6050316993.7162160873 129.0000011202 129.0000011202 129.0000011202
+ 32 240.8254719483 2818.5014640740 2146689.0595898777 2639.8498122053 315.1513670299 6050450729.2599506378 129.0000011936 129.0000011936 129.0000011936
+ 33 240.9681573541 2818.5965480750 2146689.0633726656 2639.8425779528 315.3380893908 6050654855.7068986893 129.0000012694 129.0000012694 129.0000012694
+ 34 241.0039494187 2818.6217008564 2146689.0672718794 2639.8347174393 315.3849279499 6050708861.8979463577 129.0000013475 129.0000013475 129.0000013475
+ 35 241.0314566197 2818.6411150538 2146689.0712875174 2639.8262983643 315.4209246902 6050750549.4619541168 129.0000014279 129.0000014279 129.0000014279
+ 36 241.0829173424 2818.6763455617 2146689.0754195810 2639.8174397481 315.4882677207 6050826190.0551443100 129.0000015107 129.0000015107 129.0000015107
+ 37 241.2845682012 2818.8087982181 2146689.0796680767 2639.8080129872 315.7521540252 6051110536.7012710571 129.0000015958 129.0000015958 129.0000015958
+ 38 241.3214712920 2818.8336260248 2146689.0840330068 2639.7981963574 315.8004465062 6051163846.5868301392 129.0000016833 129.0000016833 129.0000016833
+ 39 241.3392127125 2818.8456991528 2146689.0885143690 2639.7879618658 315.8236634561 6051189776.4712991714 129.0000017730 129.0000017730 129.0000017730
+ 40 241.5383770555 2818.9753950055 2146689.0931121684 2639.7769824244 316.0842958321 6051468206.1039972305 129.0000018651 129.0000018651 129.0000018651
+ 41 241.5059730674 2818.9543817992 2146689.0978264087 2639.7656512498 316.0418910106 6051423110.5725250244 129.0000019595 129.0000019595 129.0000019595
+ 42 241.3907605672 2818.8793800508 2146689.1026570834 2639.7541331920 315.8911205101 6051262118.7541017532 129.0000020563 129.0000020563 129.0000020563
+ 43 241.5095917610 2818.9559595711 2146689.1076041958 2639.7424355740 316.0466265406 6051426525.1214485168 129.0000021554 129.0000021554 129.0000021554
+ 44 241.6271631762 2819.0312325531 2146689.1126677482 2639.7297705654 316.2004839873 6051588127.0861988068 129.0000022568 129.0000022568 129.0000022568
+ 45 241.5702411838 2818.9923790176 2146689.1178477411 2639.7163554760 316.1259941770 6051504735.2269029617 129.0000023606 129.0000023606 129.0000023606
+ 46 241.7029985068 2819.0771124986 2146689.1231441777 2639.7024246704 316.2997243538 6051686646.5996389389 129.0000024667 129.0000024667 129.0000024667
+ 47 241.7966144965 2819.1357830868 2146689.1285570571 2639.6882106593 316.4222330191 6051812609.3728218079 129.0000025751 129.0000025751 129.0000025751
+ 48 241.8573480255 2819.1726205120 2146689.1340863821 2639.6735287925 316.5017107195 6051891703.4611186981 129.0000026859 129.0000026859 129.0000026859
+ 49 241.9611147338 2819.2374095379 2146689.1397321564 2639.6583357477 316.6375029166 6052030801.2758235931 129.0000027990 129.0000027990 129.0000027990
+ 50 242.1023518806 2819.3259059811 2146689.1454943856 2639.6424863169 316.8223300428 6052220791.8748512268 129.0000029144 129.0000029144 129.0000029144
+ 51 242.1174105473 2819.3319633044 2146689.1513730693 2639.6264141131 316.8420362613 6052233811.6391019821 129.0000030321 129.0000030321 129.0000030321
+ 52 242.2534914901 2819.4164594322 2146689.1573682069 2639.6098392671 317.0201158259 6052415215.4627037048 129.0000031522 129.0000031522 129.0000031522
+ 53 242.3504633236 2819.4754119996 2146689.1634798055 2639.5930076506 317.1470160479 6052541785.5314817429 129.0000032746 129.0000032746 129.0000032746
+ 54 242.2982323323 2819.4368568264 2146689.1697078613 2639.5756353782 317.0786650211 6052459037.1184797287 129.0000033994 129.0000033994 129.0000033994
+ 55 242.3452896272 2819.4623310219 2146689.1760523771 2639.5575918586 317.1402455951 6052513740.1862611771 129.0000035265 129.0000035265 129.0000035265
+ 56 242.4181903333 2819.5048897011 2146689.1825133534 2639.5390347547 317.2356456249 6052605118.6588287354 129.0000036559 129.0000036559 129.0000036559
+ 57 242.5317091656 2819.5739975787 2146689.1890907930 2639.5199828249 317.3841997413 6052753490.3378009796 129.0000037876 129.0000037876 129.0000037876
+ 58 242.5478978740 2819.5796954935 2146689.1957846982 2639.5006137388 317.4053847660 6052765740.8638200760 129.0000039217 129.0000039217 129.0000039217
+ 59 242.6655316466 2819.6519225743 2146689.2025950695 2639.4808234811 317.5593238156 6052920809.1607065201 129.0000040582 129.0000040582 129.0000040582
+ 60 242.8126131177 2819.7431588157 2146689.2095219092 2639.4607996998 317.7517989980 6053116684.5470046997 129.0000041969 129.0000041969 129.0000041969
+ 61 242.7957124913 2819.7275989047 2146689.2165652174 2639.4406312730 317.7296823362 6053083302.1140241623 129.0000043380 129.0000043380 129.0000043380
+ 62 242.9276177041 2819.8088790098 2146689.2237249981 2639.4201279058 317.9022974164 6053257805.4283437729 129.0000044814 129.0000044814 129.0000044814
+ 63 243.0465445938 2819.8814758895 2146689.2310012528 2639.3991657500 318.0579286774 6053413668.8858547211 129.0000046272 129.0000046272 129.0000046272
+ 64 242.9890585501 2819.8387587817 2146689.2383939880 2639.3781767844 317.9827007328 6053321989.3768787384 129.0000047752 129.0000047752 129.0000047752
+ 65 242.9653746583 2819.8180104181 2146689.2459031967 2639.3568184374 317.9517072884 6053277470.2627182007 129.0000049256 129.0000049256 129.0000049256
+ 66 243.0259297024 2819.8514334947 2146689.2535288804 2639.3352568621 318.0309514181 6053349240.7251205444 129.0000050784 129.0000050784 129.0000050784
+ 67 242.9638979697 2819.8046112742 2146689.2612710390 2639.3134547096 317.9497748498 6053248749.7987766266 129.0000052335 129.0000052335 129.0000052335
+ 68 243.0283540775 2819.8395632725 2146689.2691296688 2639.2912303374 318.0341240273 6053323803.0382738113 129.0000053909 129.0000053909 129.0000053909
+ 69 243.2256418664 2819.9609646019 2146689.2771047787 2639.2684509205 318.2923006889 6053584436.4588871002 129.0000055506 129.0000055506 129.0000055506
+ 70 243.2507495334 2819.9706145524 2146689.2851963686 2639.2450126010 318.3251573278 6053605174.7221174240 129.0000057127 129.0000057127 129.0000057127
+ 71 243.4287155518 2820.0794853386 2146689.2934044413 2639.2213699915 318.5580489464 6053838909.6197280884 129.0000058771 129.0000058771 129.0000058771
+ 72 243.5097518574 2820.1249498194 2146689.3017290002 2639.1971212009 318.6640954635 6053936531.2101163864 129.0000060439 129.0000060439 129.0000060439
+ 73 243.5356790969 2820.1337977544 2146689.3101700447 2639.1723394661 318.6980246193 6053955548.7824945450 129.0000062130 129.0000062130 129.0000062130
+ 74 243.5479180498 2820.1331964183 2146689.3187275808 2639.1473868749 318.7140408766 6053954282.0339813232 129.0000063844 129.0000063844 129.0000063844
+ 75 243.7115573025 2820.2314361523 2146689.3274016059 2639.1220411207 318.9281840641 6054165196.6845111847 129.0000065581 129.0000065581 129.0000065581
+ 76 243.7457279618 2820.2454531429 2146689.3361921217 2639.0963868224 318.9729008040 6054195311.5999307632 129.0000067342 129.0000067342 129.0000067342
+ 77 243.8345031069 2820.2948644965 2146689.3450991292 2639.0700900389 319.0890745962 6054301407.5461502075 129.0000069126 129.0000069126 129.0000069126
+ 78 244.0193931195 2820.4067881628 2146689.3541226317 2639.0435094409 319.3310271594 6054541698.3381366730 129.0000070934 129.0000070934 129.0000070934
+ 79 243.9919100078 2820.3799166166 2146689.3632626338 2639.0164249037 319.2950619430 6054484039.2541246414 129.0000072765 129.0000072765 129.0000072765
+ 80 244.0965612207 2820.4387335935 2146689.3725191355 2638.9888176882 319.4320116291 6054610327.1403293610 129.0000074619 129.0000074619 129.0000074619
+ 81 244.1334315951 2820.4535208568 2146689.3818921377 2638.9608330195 319.4802612965 6054642097.2373485565 129.0000076496 129.0000076496 129.0000076496
+ 82 244.3029520408 2820.5543485196 2146689.3913816395 2638.9318525796 319.7021007878 6054858569.6761827469 129.0000078397 129.0000078397 129.0000078397
+ 83 244.3445761189 2820.5713690935 2146689.4009876498 2638.9021684795 319.7565712929 6054895134.6560049057 129.0000080321 129.0000080321 129.0000080321
+ 84 244.2696671559 2820.5125763350 2146689.4107101629 2638.8720941742 319.6585431986 6054768952.2869329453 129.0000082269 129.0000082269 129.0000082269
+ 85 244.5161919319 2820.6629431352 2146689.4205491822 2638.8415194387 319.9811528443 6055091770.8571672440 129.0000084240 129.0000084240 129.0000084240
+ 86 244.5641090282 2820.6838080201 2146689.4305047127 2638.8103612394 320.0438585800 6055136589.3662166595 129.0000086234 129.0000086234 129.0000086234
+ 87 244.5348240638 2820.6541129118 2146689.4405767513 2638.7789728309 320.0055354056 6055072871.6007261276 129.0000088251 129.0000088251 129.0000088251
+ 88 244.6939431427 2820.7468233396 2146689.4507653015 2638.7470269267 320.2137633592 6055271920.8364210129 129.0000090292 129.0000090292 129.0000090292
+ 89 244.8800201091 2820.8567117003 2146689.4610703662 2638.7147520097 320.4572692055 6055507846.0901927948 129.0000092356 129.0000092356 129.0000092356
+ 90 244.8804280382 2820.8451141876 2146689.4714919478 2638.6820441173 320.4578030336 6055482979.2295818329 129.0000094444 129.0000094444 129.0000094444
+ 91 244.9558851986 2820.8815975090 2146689.4820300462 2638.6491836104 320.5565485155 6055561327.3181543350 129.0000096555 129.0000096555 129.0000096555
+ 92 244.9965893140 2820.8949614294 2146689.4926846647 2638.6159817170 320.6098151301 6055590045.5610351562 129.0000098689 129.0000098689 129.0000098689
+ 93 245.1381056687 2820.9732811388 2146689.5034558061 2638.5824451870 320.7950076360 6055758204.0434722900 129.0000100846 129.0000100846 129.0000100846
+ 94 245.2954807041 2821.0619342131 2146689.5143434699 2638.5485198222 321.0009532826 6055948545.3822879791 129.0000103027 129.0000103027 129.0000103027
+ 95 245.3535822199 2821.0860553731 2146689.5253476589 2638.5144817512 321.0769866522 6056000357.0671482086 129.0000105232 129.0000105232 129.0000105232
+ 96 245.5013476026 2821.1682908185 2146689.5364683764 2638.4801107361 321.2703568219 6056176922.4099712372 129.0000107459 129.0000107459 129.0000107459
+ 97 245.4166531417 2821.0989038023 2146689.5477056229 2638.4453663061 321.1595231342 6056028001.7295455933 129.0000109710 129.0000109710 129.0000109710
+ 98 245.4121937790 2821.0817490953 2146689.5590593945 2638.4097762390 321.1536874797 6055991207.9293851852 129.0000111984 129.0000111984 129.0000111984
+ 99 245.4532592994 2821.0946353191 2146689.5705296928 2638.3738037546 321.2074270397 6056018903.0102539062 129.0000114282 129.0000114282 129.0000114282
+ 100 245.7500657390 2821.2735939427 2146689.5821165247 2638.3375549051 321.5958367642 6056403104.3106222153 129.0000116603 129.0000116603 129.0000116603
+Loop time of 5.22601 on 1 procs for 100 steps with 10125 atoms
-Performance: 2.133 ns/day, 11.250 hours/ns, 24.691 timesteps/s
-99.8% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1.653 ns/day, 14.517 hours/ns, 19.135 timesteps/s
+99.7% CPU use with 1 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
-Pair | 0.46587 | 0.46587 | 0.46587 | 0.0 | 11.50
-Neigh | 1.4713 | 1.4713 | 1.4713 | 0.0 | 36.33
-Comm | 0.05567 | 0.05567 | 0.05567 | 0.0 | 1.37
-Output | 0.011364 | 0.011364 | 0.011364 | 0.0 | 0.28
-Modify | 2.0158 | 2.0158 | 2.0158 | 0.0 | 49.77
-Other | | 0.03004 | | | 0.74
+Pair | 0.44045 | 0.44045 | 0.44045 | 0.0 | 8.43
+Neigh | 2.669 | 2.669 | 2.669 | 0.0 | 51.07
+Comm | 0.056143 | 0.056143 | 0.056143 | 0.0 | 1.07
+Output | 0.012469 | 0.012469 | 0.012469 | 0.0 | 0.24
+Modify | 2.0163 | 2.0163 | 2.0163 | 0.0 | 38.58
+Other | | 0.03168 | | | 0.61
Nlocal: 10125 ave 10125 max 10125 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 6703 ave 6703 max 6703 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 166831 ave 166831 max 166831 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 166831
Ave neighs/atom = 16.4771
Neighbor list builds = 100
Dangerous builds not checked
Please see the log.cite file for references relevant to this simulation
-Total wall time: 0:00:04
+Total wall time: 0:00:05
diff --git a/examples/USER/misc/basal/in.basal b/examples/USER/misc/basal/in.basal
index fa30c38b7..7fe4c5d2a 100644
--- a/examples/USER/misc/basal/in.basal
+++ b/examples/USER/misc/basal/in.basal
@@ -1,163 +1,163 @@
-############################################################################
-# Input file for investigating twinning nucleation under uniaxial loading with basal plane vector analysis
-# Christopher Barrett, March 2013
-# This script requires a Mg pair potential file to be in the same directory.
-
-# fname is the file name. It is necessary for loops to work correctly. (See jump command)
-variable fname index in.basal
-
-######################################
-# POTENTIAL VARIABLES
-# lattice parameters and the minimum energy per atom which should be obtained with the current pair potential and homogeneous lattice
-variable lx equal 3.181269601
-variable b equal sqrt(3)
-variable c equal sqrt(8/3)
-variable ly equal ${b}*${lx}
-variable lz equal ${c}*${lx}
-variable pairlocation index almg.liu
-variable pairstyle index eam/alloy/opt
-
-######################################
-# EQUILIBRATION/DEFORMATION VARIABLES
-# eqpress = 10 bar = 1 MPa
-# tstep (the timestep) is set to a default value of 0.001 (1 fs)
-# seed randomizes the velocity
-# srate is the rate of strain in 1/s
-# Ndump is the number of timesteps in between each dump of the atom coordinates
-variable tstep equal 0.001
-variable seed equal 95812384
-variable srate equal 1e9
-
-######################################
-# INITIALIZATION
-units metal
-dimension 3
-boundary s s s
-atom_style atomic
-
-######################################
-# ATOM BUILD
-atom_modify map array
-
-# lattice custom scale a1 "coordinates of a1" a2 "coordinates of a2" a3 "coordinates of a3" basis "atom1 coordinates" basis "atom2 coordinates" basis "atom3 coordinates" basis "atom4 coordinates" orient x "crystallagraphic orientation of x axis" orient y "crystallagraphic orientation of y axis" z "crystallagraphic orientation of z axis"
-lattice custom 3.181269601 a1 1 0 0 a2 0 1.732050808 0 a3 0 0 1.632993162 basis 0.0 0.0 0.0 basis 0.5 0.5 0 basis 0 0.3333333 0.5 basis 0.5 0.833333 0.5 orient x 0 1 1 orient y 1 0 0 orient z 0 1 -1
-variable multiple equal 20
-variable mx equal "v_lx*v_multiple"
-variable my equal "v_ly*v_multiple"
-variable mz equal "v_lz*v_multiple"
-
-# the simulation region should be from 0 to a multiple of the periodic boundary in x, y and z.
-region whole block 0 ${mz} 0 ${mx} 0 ${my} units box
-create_box 2 whole
-create_atoms 1 box basis 1 1 basis 2 1 basis 3 1 basis 4 1
-
-region fixed1 block INF INF INF INF INF 10 units box
-region fixed2 block INF INF INF INF 100 INF units box
-group lower region fixed1
-group upper region fixed2
-group boundary union upper lower
-group mobile subtract all boundary
-
-variable natoms equal "count(all)"
-print "# of atoms are: ${natoms}"
-
-######################################
-# INTERATOMIC POTENTIAL
-pair_style ${pairstyle}
-pair_coeff * * ${pairlocation} Mg Mg
-
-######################################
-# COMPUTES REQUIRED
-compute csym all centro/atom 12
-compute eng all pe/atom
-compute eatoms all reduce sum c_eng
-compute basal all basal/atom
-
-######################################
-# MINIMIZATION
-# Primarily adjusts the c/a ratio to value predicted by EAM potential
-reset_timestep 0
-thermo 1
-thermo_style custom step pe c_eatoms
-min_style cg
-minimize 1e-15 1e-15 1000 2000
-variable eminimum equal "c_eatoms / count(all)"
-print "%%e(it,1)=${eminimum}"
-
-######################################
-# EQUILIBRATION
-reset_timestep 0
-timestep ${tstep}
-# atoms are given a random velocity based on a temperature of 100K.
-velocity all create 100 ${seed} mom yes rot no
-
-# temperature and pressure are set to 100 and 0
-fix 1 all nve
-
-# Set thermo output
-thermo 100
-thermo_style custom step lx ly lz press pxx pyy pzz pe temp
-
-# Run for at least 2 picosecond (assuming 1 fs timestep)
-run 2000
-
-# Loop to run until pressure is below the variable eqpress (defined at beginning of file)
-label loopeq
-variable eq loop 100
-run 250
-variable converge equal press
-if "${converge} <= 0" then "variable converge equal -press" else "variable converge equal press"
-if "${converge} <= 50" then "jump ${fname} breakeq"
-next eq
-jump ${fname} loopeq
-label breakeq
-
-# Store length for strain rate calculations
-variable tmp equal "lx"
-variable L0 equal ${tmp}
-print "Initial Length, L0: ${L0}"
-unfix 1
-
-######################################
-# DEFORMATION
-reset_timestep 0
-timestep ${tstep}
-
-# Impose constant strain rate
-variable srate1 equal "v_srate / 1.0e10"
-velocity upper set 0.0 NULL 0.0 units box
-velocity lower set 0.0 NULL 0.0 units box
-
-fix 2 upper setforce 0.0 NULL 0.0
-fix 3 lower setforce 0.0 NULL 0.0
-fix 1 all nve
-
-# Output strain and stress info to file
-# for units metal, pressure is in [bars] = 100 [kPa] = 1/10000 [GPa]
-# p2 is in GPa
-variable strain equal "(lx - v_L0)/v_L0"
-variable p1 equal "v_strain"
-variable p2 equal "-pxz/10000"
-variable p3 equal "lx"
-variable p4 equal "temp"
-variable p5 equal "pe"
-variable p6 equal "ke"
-fix def1 all print 100 "${p1} ${p2} ${p3} ${p4} ${p5} ${p6}" file output.def1.txt screen no
-# Dump coordinates to file (for void size calculations)
-dump 1 all custom 1000 output.dump.* id x y z c_basal[1] c_basal[2] c_basal[3]
-
-# Display thermo
-thermo_style custom step v_strain pxz lx temp pe ke
-restart 50000 output.restart
-
-# run deformation for 100000 timesteps (10% strain assuming 1 fs timestep and 1e9/s strainrate)
-variable runtime equal 0
-label loop
-displace_atoms all ramp x 0.0 ${srate1} z 10 100 units box
-run 100
-variable runtime equal ${runtime}+100
-if "${runtime} < 100000" then "jump ${fname} loop"
-
-######################################
-# SIMULATION DONE
-print "All done"
+############################################################################
+# Input file for investigating twinning nucleation under uniaxial loading with basal plane vector analysis
+# Christopher Barrett, March 2013
+# This script requires a Mg pair potential file to be in the same directory.
+
+# fname is the file name. It is necessary for loops to work correctly. (See jump command)
+variable fname index in.basal
+
+######################################
+# POTENTIAL VARIABLES
+# lattice parameters and the minimum energy per atom which should be obtained with the current pair potential and homogeneous lattice
+variable lx equal 3.181269601
+variable b equal sqrt(3)
+variable c equal sqrt(8/3)
+variable ly equal ${b}*${lx}
+variable lz equal ${c}*${lx}
+variable pairlocation index almg.liu
+variable pairstyle index eam/alloy/opt
+
+######################################
+# EQUILIBRATION/DEFORMATION VARIABLES
+# eqpress = 10 bar = 1 MPa
+# tstep (the timestep) is set to a default value of 0.001 (1 fs)
+# seed randomizes the velocity
+# srate is the rate of strain in 1/s
+# Ndump is the number of timesteps in between each dump of the atom coordinates
+variable tstep equal 0.001
+variable seed equal 95812384
+variable srate equal 1e9
+
+######################################
+# INITIALIZATION
+units metal
+dimension 3
+boundary s s s
+atom_style atomic
+
+######################################
+# ATOM BUILD
+atom_modify map array
+
+# lattice custom scale a1 "coordinates of a1" a2 "coordinates of a2" a3 "coordinates of a3" basis "atom1 coordinates" basis "atom2 coordinates" basis "atom3 coordinates" basis "atom4 coordinates" orient x "crystallagraphic orientation of x axis" orient y "crystallagraphic orientation of y axis" z "crystallagraphic orientation of z axis"
+lattice custom 3.181269601 a1 1 0 0 a2 0 1.732050808 0 a3 0 0 1.632993162 basis 0.0 0.0 0.0 basis 0.5 0.5 0 basis 0 0.3333333 0.5 basis 0.5 0.833333 0.5 orient x 0 1 1 orient y 1 0 0 orient z 0 1 -1
+variable multiple equal 20
+variable mx equal "v_lx*v_multiple"
+variable my equal "v_ly*v_multiple"
+variable mz equal "v_lz*v_multiple"
+
+# the simulation region should be from 0 to a multiple of the periodic boundary in x, y and z.
+region whole block 0 ${mz} 0 ${mx} 0 ${my} units box
+create_box 2 whole
+create_atoms 1 box basis 1 1 basis 2 1 basis 3 1 basis 4 1
+
+region fixed1 block INF INF INF INF INF 10 units box
+region fixed2 block INF INF INF INF 100 INF units box
+group lower region fixed1
+group upper region fixed2
+group boundary union upper lower
+group mobile subtract all boundary
+
+variable natoms equal "count(all)"
+print "# of atoms are: ${natoms}"
+
+######################################
+# INTERATOMIC POTENTIAL
+pair_style ${pairstyle}
+pair_coeff * * ${pairlocation} Mg Mg
+
+######################################
+# COMPUTES REQUIRED
+compute csym all centro/atom 12
+compute eng all pe/atom
+compute eatoms all reduce sum c_eng
+compute basal all basal/atom
+
+######################################
+# MINIMIZATION
+# Primarily adjusts the c/a ratio to value predicted by EAM potential
+reset_timestep 0
+thermo 1
+thermo_style custom step pe c_eatoms
+min_style cg
+minimize 1e-15 1e-15 1000 2000
+variable eminimum equal "c_eatoms / count(all)"
+print "%%e(it,1)=${eminimum}"
+
+######################################
+# EQUILIBRATION
+reset_timestep 0
+timestep ${tstep}
+# atoms are given a random velocity based on a temperature of 100K.
+velocity all create 100 ${seed} mom yes rot no
+
+# temperature and pressure are set to 100 and 0
+fix 1 all nve
+
+# Set thermo output
+thermo 100
+thermo_style custom step lx ly lz press pxx pyy pzz pe temp
+
+# Run for at least 2 picosecond (assuming 1 fs timestep)
+run 2000
+
+# Loop to run until pressure is below the variable eqpress (defined at beginning of file)
+label loopeq
+variable eq loop 100
+run 250
+variable converge equal press
+if "${converge} <= 0" then "variable converge equal -press" else "variable converge equal press"
+if "${converge} <= 50" then "jump ${fname} breakeq"
+next eq
+jump ${fname} loopeq
+label breakeq
+
+# Store length for strain rate calculations
+variable tmp equal "lx"
+variable L0 equal ${tmp}
+print "Initial Length, L0: ${L0}"
+unfix 1
+
+######################################
+# DEFORMATION
+reset_timestep 0
+timestep ${tstep}
+
+# Impose constant strain rate
+variable srate1 equal "v_srate / 1.0e10"
+velocity upper set 0.0 NULL 0.0 units box
+velocity lower set 0.0 NULL 0.0 units box
+
+fix 2 upper setforce 0.0 NULL 0.0
+fix 3 lower setforce 0.0 NULL 0.0
+fix 1 all nve
+
+# Output strain and stress info to file
+# for units metal, pressure is in [bars] = 100 [kPa] = 1/10000 [GPa]
+# p2 is in GPa
+variable strain equal "(lx - v_L0)/v_L0"
+variable p1 equal "v_strain"
+variable p2 equal "-pxz/10000"
+variable p3 equal "lx"
+variable p4 equal "temp"
+variable p5 equal "pe"
+variable p6 equal "ke"
+fix def1 all print 100 "${p1} ${p2} ${p3} ${p4} ${p5} ${p6}" file output.def1.txt screen no
+# Dump coordinates to file (for void size calculations)
+dump 1 all custom 1000 output.dump.* id x y z c_basal[1] c_basal[2] c_basal[3]
+
+# Display thermo
+thermo_style custom step v_strain pxz lx temp pe ke
+restart 50000 output.restart
+
+# run deformation for 100000 timesteps (10% strain assuming 1 fs timestep and 1e9/s strainrate)
+variable runtime equal 0
+label loop
+displace_atoms all ramp x 0.0 ${srate1} z 10 100 units box
+run 100
+variable runtime equal ${runtime}+100
+if "${runtime} < 100000" then "jump ${fname} loop"
+
+######################################
+# SIMULATION DONE
+print "All done"
diff --git a/examples/USER/misc/srp/in.srp b/examples/USER/misc/srp/in.srp
index 691343436..234026e9c 100644
--- a/examples/USER/misc/srp/in.srp
+++ b/examples/USER/misc/srp/in.srp
@@ -1,41 +1,42 @@
units lj
atom_style full
boundary p p p
special_bonds lj/coul 1 1 1
newton on on
# save an extra atom type for bond particles
read_data data.chain
neighbor 2.0 bin
neigh_modify every 10 check yes
bond_style harmonic
bond_coeff * 225.0 0.85
comm_modify vel yes
+comm_modify cutoff 3.6
# must use pair hybrid, since srp bond particles
# do not interact with other atoms types
pair_style hybrid dpd 1.0 1.0 373692 srp 0.8 1 mid
pair_coeff 1 1 dpd 60.0 4.5 1.0
pair_coeff 1 2 none
pair_coeff 2 2 srp 100.0
# auto normalization of thermo quantites is turned off by pair srp
# just divide by natoms
variable natoms equal count(all)
variable nPotEng equal c_thermo_pe/v_natoms
thermo 50
thermo_style custom step temp pe v_nPotEng press atoms v_natoms lx ly lz
fix 1 all nve
timestep 0.01
restart 500 mid-run-*.restart
run 1000
write_restart end-run.restart
diff --git a/examples/prd/in.prd b/examples/prd/in.prd
index ea5220ab4..be3454bcb 100644
--- a/examples/prd/in.prd
+++ b/examples/prd/in.prd
@@ -1,97 +1,97 @@
# Parallel replica dynamics model for a single vacancy in bulk Si
# events occur when a neighboring atom diffuses to the vacant site
# run this on multiple partitions as
# mpirun -np 4 lmp_g++ -partition 4x1 -in in.prd
#log none
units metal
atom_style atomic
atom_modify map array
boundary p p p
atom_modify sort 0 0.0
# temperature
variable t equal 1800.0
# coordination number cutoff
variable r equal 2.835
# minimization parameters
variable etol equal 1.0e-5
variable ftol equal 1.0e-5
variable maxiter equal 100
variable maxeval equal 100
variable dmax equal 1.0e-1
# diamond unit cell
variable a equal 5.431
lattice custom $a &
a1 1.0 0.0 0.0 &
a2 0.0 1.0 0.0 &
a3 0.0 0.0 1.0 &
basis 0.0 0.0 0.0 &
basis 0.0 0.5 0.5 &
basis 0.5 0.0 0.5 &
basis 0.5 0.5 0.0 &
basis 0.25 0.25 0.25 &
basis 0.25 0.75 0.75 &
basis 0.75 0.25 0.75 &
basis 0.75 0.75 0.25
region myreg block 0 4 &
0 4 &
0 4
create_box 1 myreg
create_atoms 1 region myreg
mass 1 28.06
group Si type 1
velocity all create $t 5287287 mom yes rot yes dist gaussian
# make a vacancy
group del id 300
delete_atoms group del
pair_style sw
pair_coeff * * Si.sw Si
thermo 10
fix 1 all nvt temp $t $t 0.1
timestep 1.0e-3
neighbor 1.0 bin
neigh_modify every 1 delay 10 check yes
# equilibrate
run 100
# only output atoms near vacancy
-compute coord all coord/atom $r
+compute coord all coord/atom cutoff $r
#dump events all custom 1 dump.prd id type x y z
#dump_modify events thresh c_coord != 4
compute patom all pe/atom
compute pe all reduce sum c_patom
compute satom all stress/atom NULL
compute str all reduce sum c_satom[1] c_satom[2] c_satom[3]
variable press equal (c_str[1]+c_str[2]+c_str[3])/(3*vol)
thermo_style custom step temp pe c_pe press v_press
compute 1 all event/displace 0.5
prd 2000 100 10 10 100 1 54985 temp $t &
min ${etol} ${ftol} ${maxiter} ${maxeval} vel all uniform
diff --git a/examples/tad/in.tad b/examples/tad/in.tad
index da3d2175a..687e1dde0 100644
--- a/examples/tad/in.tad
+++ b/examples/tad/in.tad
@@ -1,110 +1,110 @@
# temperature accelerated dynamics model for a single vacancy in bulk Si
# events occur when a neighboring atom diffuses to the vacant site
# run this on multiple partitions as
# mpirun -np 3 lmp_g++ -partition 3x1 -in in.tad
units metal
atom_style atomic
atom_modify map array
boundary p p p
atom_modify sort 0 0.0
# temperatures
variable tlo equal 1800.0
variable thi equal 2400.0
# coordination number cutoff
variable r equal 2.835
# minimization parameters
variable etol equal 1.0e-5
variable ftol equal 1.0e-5
variable maxiter equal 100
variable maxeval equal 100
variable dmax equal 1.0e-1
# diamond unit cell
variable a equal 5.431
lattice custom $a &
a1 1.0 0.0 0.0 &
a2 0.0 1.0 0.0 &
a3 0.0 0.0 1.0 &
basis 0.0 0.0 0.0 &
basis 0.0 0.5 0.5 &
basis 0.5 0.0 0.5 &
basis 0.5 0.5 0.0 &
basis 0.25 0.25 0.25 &
basis 0.25 0.75 0.75 &
basis 0.75 0.25 0.75 &
basis 0.75 0.75 0.25
region myreg block 0 4 &
0 4 &
0 4
create_box 1 myreg
create_atoms 1 region myreg
mass 1 28.06
group Si type 1
velocity all create ${thi} 5287286 mom yes rot yes dist gaussian
# make a vacancy
group del id 300
delete_atoms group del
pair_style sw
pair_coeff * * Si.sw Si
thermo 10
fix 1 all nve
fix 2 all langevin ${thi} ${thi} 0.1 48278
timestep 1.0e-3
neighbor 1.0 bin
neigh_modify every 1 delay 10 check yes
# equilibrate
run 1000
# Eliminate COM motion
velocity all zero linear
# only output atoms near vacancy
-compute coord all coord/atom $r
+compute coord all coord/atom cutoff $r
#dump events all custom 1 dump.prd id type x y z
#dump_modify events thresh c_coord != 4
compute patom all pe/atom
compute pe all reduce sum c_patom
compute satom all stress/atom NULL
compute str all reduce sum c_satom[1] c_satom[2] c_satom[3]
variable press equal (c_str[1]+c_str[2]+c_str[3])/(3*vol)
thermo_style custom step temp pe c_pe press v_press
compute event all event/displace 1.0
unfix 1
unfix 2
fix 1 all nvt temp ${thi} ${thi} 0.1
# tad nsteps nevent tlo thi delta_conf tmax compute
# [min etol ftol niter neval]
# [neb etol_neb ftol_neb n1steps n2steps nevery]
# [neb_style min_style]
# [neb_log logfile]
tad 2000 50 ${tlo} ${thi} 0.05 1.0 event &
min ${etol} ${ftol} ${maxiter} ${maxeval} &
neb 0.0 0.01 200 200 20 neb_style fire neb_log log.neb
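Both accelerated-dynamics examples above are replica runs and have to be launched on multiple partitions, one replica per partition. A minimal launch sketch, reusing the 3x1 layout from the comment at the top of in.tad (the executable name lmp_g++ comes from that comment; the 4-replica layout for in.prd is illustrative and depends on your build):

  # TAD example: 3 partitions of 1 MPI rank each, as noted in the script header
  mpirun -np 3 lmp_g++ -partition 3x1 -in in.tad
  # PRD example: launched the same way, e.g. with 4 replicas
  mpirun -np 4 lmp_g++ -partition 4x1 -in in.prd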
diff --git a/lib/kokkos/.gitignore b/lib/kokkos/.gitignore
deleted file mode 100644
index f9d16be15..000000000
--- a/lib/kokkos/.gitignore
+++ /dev/null
@@ -1,8 +0,0 @@
-# Standard ignores
-*~
-*.pyc
-\#*#
-.#*
-.*.swp
-.cproject
-.project
diff --git a/lib/kokkos/CHANGELOG.md b/lib/kokkos/CHANGELOG.md
new file mode 100644
index 000000000..a444f08ee
--- /dev/null
+++ b/lib/kokkos/CHANGELOG.md
@@ -0,0 +1,284 @@
+# Change Log
+
+## [2.02.07](https://github.com/kokkos/kokkos/tree/2.02.07) (2016-12-16)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.01...2.02.07)
+
+**Implemented enhancements:**
+
+- Add CMake option to enable Cuda Lambda support [\#589](https://github.com/kokkos/kokkos/issues/589)
+- Add CMake option to enable Cuda RDC support [\#588](https://github.com/kokkos/kokkos/issues/588)
+- Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System [\#584](https://github.com/kokkos/kokkos/issues/584)
+- Building Tutorial Examples [\#582](https://github.com/kokkos/kokkos/issues/582)
+- Internal way for using ThreadVectorRange without TeamHandle [\#574](https://github.com/kokkos/kokkos/issues/574)
+- Testing: Add testing for uvm and rdc [\#571](https://github.com/kokkos/kokkos/issues/571)
+- Profiling: Add Memory Tracing and Region Markers [\#557](https://github.com/kokkos/kokkos/issues/557)
+- nvcc\_wrapper not installed with Kokkos built with CUDA through CMake [\#543](https://github.com/kokkos/kokkos/issues/543)
+- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
+- Benchmarks: Add Gather benchmark [\#536](https://github.com/kokkos/kokkos/issues/536)
+- Testing: add spot\_check option to test\_all\_sandia [\#535](https://github.com/kokkos/kokkos/issues/535)
+- Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace [\#527](https://github.com/kokkos/kokkos/issues/527)
+- Add AtomicAdd support for 64bit float for Pascal [\#522](https://github.com/kokkos/kokkos/issues/522)
+- Add Restrict and Aligned memory trait [\#517](https://github.com/kokkos/kokkos/issues/517)
+- Kokkos Tests are Not Run using Compiler Optimization [\#501](https://github.com/kokkos/kokkos/issues/501)
+- Add support for clang 3.7 w/ openmp backend [\#393](https://github.com/kokkos/kokkos/issues/393)
+- Provide an error throw class [\#79](https://github.com/kokkos/kokkos/issues/79)
+
+**Fixed bugs:**
+
+- Cuda UVM Allocation test broken with UVM as default space [\#586](https://github.com/kokkos/kokkos/issues/586)
+- Bug \(develop branch only\): multiple tests are now failing when forcing uvm usage. [\#570](https://github.com/kokkos/kokkos/issues/570)
+- Error in generate\_makefile.sh for Kokkos when Compiler is Empty String/Fails [\#568](https://github.com/kokkos/kokkos/issues/568)
+- XL 13.1.4 incorrect C++11 flag [\#553](https://github.com/kokkos/kokkos/issues/553)
+- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
+- Installing Library on MAC broken due to cp -u [\#539](https://github.com/kokkos/kokkos/issues/539)
+- Intel Nightly Testing with Debug enabled fails [\#534](https://github.com/kokkos/kokkos/issues/534)
+
+## [2.02.01](https://github.com/kokkos/kokkos/tree/2.02.01) (2016-11-01)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.00...2.02.01)
+
+**Implemented enhancements:**
+
+- Add Changelog generation to our process. [\#506](https://github.com/kokkos/kokkos/issues/506)
+
+**Fixed bugs:**
+
+- Test scratch\_request fails in Serial with Debug enabled [\#520](https://github.com/kokkos/kokkos/issues/520)
+- Bug In BoundsCheck for DynRankView [\#516](https://github.com/kokkos/kokkos/issues/516)
+
+## [2.02.00](https://github.com/kokkos/kokkos/tree/2.02.00) (2016-10-30)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.10...2.02.00)
+
+**Implemented enhancements:**
+
+- Add PowerPC assembly for grabbing clock register in memory pool [\#511](https://github.com/kokkos/kokkos/issues/511)
+- Add GCC 6.x support [\#508](https://github.com/kokkos/kokkos/issues/508)
+- Test install and build against installed library [\#498](https://github.com/kokkos/kokkos/issues/498)
+- Makefile.kokkos adds expt-extended-lambda to cuda build with clang [\#490](https://github.com/kokkos/kokkos/issues/490)
+- Add top-level makefile option to just test kokkos-core unit-test [\#485](https://github.com/kokkos/kokkos/issues/485)
+- Split and harmonize Object Files of Core UnitTests to increase build parallelism [\#484](https://github.com/kokkos/kokkos/issues/484)
+- LayoutLeft to LayoutLeft subview for 3D and 4D views [\#473](https://github.com/kokkos/kokkos/issues/473)
+- Add official Cuda 8.0 support [\#468](https://github.com/kokkos/kokkos/issues/468)
+- Allow C++1Z Flag for Class Lambda capture [\#465](https://github.com/kokkos/kokkos/issues/465)
+- Add Clang 4.0+ compilation of Cuda code [\#455](https://github.com/kokkos/kokkos/issues/455)
+- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
+- Add name of view to "View bounds error" [\#432](https://github.com/kokkos/kokkos/issues/432)
+- Move Sort Binning Operators into Kokkos namespace [\#421](https://github.com/kokkos/kokkos/issues/421)
+- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
+- Import WithoutInitializing and AllowPadding into Kokkos namespace [\#325](https://github.com/kokkos/kokkos/issues/325)
+- TeamThreadRange requires begin, end to be the same type [\#305](https://github.com/kokkos/kokkos/issues/305)
+- CudaUVMSpace should track \# allocations, due to CUDA limit on \# UVM allocations [\#300](https://github.com/kokkos/kokkos/issues/300)
+- Remove old View and its infrastructure [\#259](https://github.com/kokkos/kokkos/issues/259)
+
+**Fixed bugs:**
+
+- Bug in TestCuda\_Other.cpp: most likely assembly inserted into Device code [\#515](https://github.com/kokkos/kokkos/issues/515)
+- Cuda Compute Capability check of GPU is outdated [\#509](https://github.com/kokkos/kokkos/issues/509)
+- multi\_scratch test with hwloc and pthreads seg-faults. [\#504](https://github.com/kokkos/kokkos/issues/504)
+- generate\_makefile.bash: "make install" is broken [\#503](https://github.com/kokkos/kokkos/issues/503)
+- make clean in Out of Source Build/Tests Does Not Work Correctly [\#502](https://github.com/kokkos/kokkos/issues/502)
+- Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified [\#497](https://github.com/kokkos/kokkos/issues/497)
+- Dispatch lambda test directly inside GTEST macro doesn't work with nvcc [\#491](https://github.com/kokkos/kokkos/issues/491)
+- UnitTests with HWLOC enabled fail if run with mpirun bound to a single core [\#489](https://github.com/kokkos/kokkos/issues/489)
+- Failing Reducer Test on Mac with Pthreads [\#479](https://github.com/kokkos/kokkos/issues/479)
+- make test Dumps Error with Clang Not Found [\#471](https://github.com/kokkos/kokkos/issues/471)
+- OpenMP TeamPolicy member broadcast not using correct volatile shared variable [\#424](https://github.com/kokkos/kokkos/issues/424)
+- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
+- New task policy implementation is pulling in old experimental code. [\#372](https://github.com/kokkos/kokkos/issues/372)
+- MemoryPool unit test hangs on Power8 with GCC 6.1.0 [\#298](https://github.com/kokkos/kokkos/issues/298)
+
+## [2.01.10](https://github.com/kokkos/kokkos/tree/2.01.10) (2016-09-27)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.06...2.01.10)
+
+**Implemented enhancements:**
+
+- Enable Profiling by default in Tribits build [\#438](https://github.com/kokkos/kokkos/issues/438)
+- parallel\_reduce\(0\), parallel\_scan\(0\) unit tests [\#436](https://github.com/kokkos/kokkos/issues/436)
+- data\(\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
+- Fix tutorials to track new Kokkos::View [\#323](https://github.com/kokkos/kokkos/issues/323)
+- Rename team policy set\_scratch\_size. [\#195](https://github.com/kokkos/kokkos/issues/195)
+
+**Fixed bugs:**
+
+- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
+- Makefile spits syntax error [\#435](https://github.com/kokkos/kokkos/issues/435)
+- Kokkos::sort fails for view with all the same values [\#422](https://github.com/kokkos/kokkos/issues/422)
+- Generic Reducers: can't accept inline constructed reducer [\#404](https://github.com/kokkos/kokkos/issues/404)
+- data\\(\\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
+- const subview of const view with compile time dimensions on Cuda backend [\#310](https://github.com/kokkos/kokkos/issues/310)
+- Kokkos \(in Trilinos\) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 [\#307](https://github.com/kokkos/kokkos/issues/307)
+- Core Oversubscription Detection Broken? [\#159](https://github.com/kokkos/kokkos/issues/159)
+
+
+## [2.01.06](https://github.com/kokkos/kokkos/tree/2.01.06) (2016-09-02)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.00...2.01.06)
+
+**Implemented enhancements:**
+
+- Add "standard" reducers for lambda-supportable customized reduce [\#411](https://github.com/kokkos/kokkos/issues/411)
+- TaskPolicy - single thread back-end execution [\#390](https://github.com/kokkos/kokkos/issues/390)
+- Kokkos master clone tag [\#387](https://github.com/kokkos/kokkos/issues/387)
+- Query memory requirements from task policy [\#378](https://github.com/kokkos/kokkos/issues/378)
+- Output order of test\_atomic.cpp is confusing [\#373](https://github.com/kokkos/kokkos/issues/373)
+- Missing testing for atomics [\#341](https://github.com/kokkos/kokkos/issues/341)
+- Feature request for Kokkos to provide Kokkos::atomic\_fetch\_max and atomic\_fetch\_min [\#336](https://github.com/kokkos/kokkos/issues/336)
+- TaskPolicy\<Cuda\> performance requires teams mapped to warps [\#218](https://github.com/kokkos/kokkos/issues/218)
+
+**Fixed bugs:**
+
+- Reduce with Teams broken for custom initialize [\#407](https://github.com/kokkos/kokkos/issues/407)
+- Failing Kokkos build on Debian [\#402](https://github.com/kokkos/kokkos/issues/402)
+- Failing Tests on NVIDIA Pascal GPUs [\#398](https://github.com/kokkos/kokkos/issues/398)
+- Algorithms: fill\_random assumes dimensions fit in unsigned int [\#389](https://github.com/kokkos/kokkos/issues/389)
+- Kokkos::subview with RandomAccess Memory Trait [\#385](https://github.com/kokkos/kokkos/issues/385)
+- Build warning \(signed / unsigned comparison\) in Cuda implementation [\#365](https://github.com/kokkos/kokkos/issues/365)
+- wrong results for a parallel\_reduce with CUDA8 / Maxwell50 [\#352](https://github.com/kokkos/kokkos/issues/352)
+- Hierarchical parallelism - 3 level unit test [\#344](https://github.com/kokkos/kokkos/issues/344)
+- Can I allocate a View w/ both WithoutInitializing & AllowPadding? [\#324](https://github.com/kokkos/kokkos/issues/324)
+- subview View layout determination [\#309](https://github.com/kokkos/kokkos/issues/309)
+- Unit tests with Cuda - Maxwell [\#196](https://github.com/kokkos/kokkos/issues/196)
+
+## [2.01.00](https://github.com/kokkos/kokkos/tree/2.01.00) (2016-07-21)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/End_C++98...2.01.00)
+
+**Implemented enhancements:**
+
+- Edit ViewMapping so assigning Views with the same custom layout compiles when const casting [\#327](https://github.com/kokkos/kokkos/issues/327)
+- DynRankView: Performance improvement for operator\(\) [\#321](https://github.com/kokkos/kokkos/issues/321)
+- Interoperability between static and dynamic rank views [\#295](https://github.com/kokkos/kokkos/issues/295)
+- subview member function ? [\#280](https://github.com/kokkos/kokkos/issues/280)
+- Inter-operatibility between View and DynRankView. [\#245](https://github.com/kokkos/kokkos/issues/245)
+- \(Trilinos\) build warning in atomic\_assign, with Kokkos::complex [\#177](https://github.com/kokkos/kokkos/issues/177)
+- View\<\>::shmem\_size should runtime check for number of arguments equal to rank [\#176](https://github.com/kokkos/kokkos/issues/176)
+- Custom reduction join via lambda argument [\#99](https://github.com/kokkos/kokkos/issues/99)
+- DynRankView with 0 dimensions passed in at construction [\#293](https://github.com/kokkos/kokkos/issues/293)
+- Inject view\_alloc and friends into Kokkos namespace [\#292](https://github.com/kokkos/kokkos/issues/292)
+- Less restrictive TeamPolicy reduction on Cuda [\#286](https://github.com/kokkos/kokkos/issues/286)
+- deep\_copy using remap with source execution space [\#267](https://github.com/kokkos/kokkos/issues/267)
+- Suggestion: Enable opt-in L1 caching via nvcc-wrapper [\#261](https://github.com/kokkos/kokkos/issues/261)
+- More flexible create\_mirror functions [\#260](https://github.com/kokkos/kokkos/issues/260)
+- Rename View::memory\_span to View::required\_allocation\_size [\#256](https://github.com/kokkos/kokkos/issues/256)
+- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
+- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
+- Kokkos::Timer [\#234](https://github.com/kokkos/kokkos/issues/234)
+- Fence CudaUVMSpace allocations [\#230](https://github.com/kokkos/kokkos/issues/230)
+- View::operator\(\) accept std::is\_integral and std::is\_enum [\#227](https://github.com/kokkos/kokkos/issues/227)
+- Allocating zero size View [\#216](https://github.com/kokkos/kokkos/issues/216)
+- Thread scalable memory pool [\#212](https://github.com/kokkos/kokkos/issues/212)
+- Add a way to disable memory leak output [\#194](https://github.com/kokkos/kokkos/issues/194)
+- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
+- Runtime rank wrapper for View [\#189](https://github.com/kokkos/kokkos/issues/189)
+- Profiling Interface [\#158](https://github.com/kokkos/kokkos/issues/158)
+- Fix View assignment \(of managed to unmanaged\) [\#153](https://github.com/kokkos/kokkos/issues/153)
+- Add unit test for assignment of managed View to unmanaged View [\#152](https://github.com/kokkos/kokkos/issues/152)
+- Check for oversubscription of threads with MPI in Kokkos::initialize [\#149](https://github.com/kokkos/kokkos/issues/149)
+- Dynamic resizeable 1dimensional view [\#143](https://github.com/kokkos/kokkos/issues/143)
+- Develop TaskPolicy for CUDA [\#142](https://github.com/kokkos/kokkos/issues/142)
+- New View : Test Compilation Downstream [\#138](https://github.com/kokkos/kokkos/issues/138)
+- New View Implementation [\#135](https://github.com/kokkos/kokkos/issues/135)
+- Add variant of subview that lets users add traits [\#134](https://github.com/kokkos/kokkos/issues/134)
+- NVCC-WRAPPER: Add --host-only flag [\#121](https://github.com/kokkos/kokkos/issues/121)
+- Address gtest issue with TriBITS Kokkos build outside of Trilinos [\#117](https://github.com/kokkos/kokkos/issues/117)
+- Make tests pass with -expt-extended-lambda on CUDA [\#108](https://github.com/kokkos/kokkos/issues/108)
+- Dynamic scheduling for parallel\_for and parallel\_reduce [\#106](https://github.com/kokkos/kokkos/issues/106)
+- Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments [\#105](https://github.com/kokkos/kokkos/issues/105)
+- Error out when the number of threads is modified after kokkos is initialized [\#104](https://github.com/kokkos/kokkos/issues/104)
+- Porting to POWER and remove assumption of X86 default [\#103](https://github.com/kokkos/kokkos/issues/103)
+- Dynamic scheduling option for RangePolicy [\#100](https://github.com/kokkos/kokkos/issues/100)
+- SharedMemory Support for Lambdas [\#81](https://github.com/kokkos/kokkos/issues/81)
+- Recommended TeamSize for Lambdas [\#80](https://github.com/kokkos/kokkos/issues/80)
+- Add Aggressive Vectorization Compilation mode [\#72](https://github.com/kokkos/kokkos/issues/72)
+- Dynamic scheduling team execution policy [\#53](https://github.com/kokkos/kokkos/issues/53)
+- UVM allocations in multi-GPU systems [\#50](https://github.com/kokkos/kokkos/issues/50)
+- Synchronic in Kokkos::Impl [\#44](https://github.com/kokkos/kokkos/issues/44)
+- index and dimension types in for loops [\#28](https://github.com/kokkos/kokkos/issues/28)
+- Subview assign of 1D Strided with stride 1 to LayoutLeft/Right [\#1](https://github.com/kokkos/kokkos/issues/1)
+
+**Fixed bugs:**
+
+- misspelled variable name in Kokkos\_Atomic\_Fetch + missing unit tests [\#340](https://github.com/kokkos/kokkos/issues/340)
+- seg fault Kokkos::Impl::CudaInternal::print\_configuration [\#338](https://github.com/kokkos/kokkos/issues/338)
+- Clang compiler error with named parallel\_reduce, tags, and TeamPolicy. [\#335](https://github.com/kokkos/kokkos/issues/335)
+- Shared Memory Allocation Error at parallel\_reduce [\#311](https://github.com/kokkos/kokkos/issues/311)
+- DynRankView: Fix resize and realloc [\#303](https://github.com/kokkos/kokkos/issues/303)
+- Scratch memory and dynamic scheduling [\#279](https://github.com/kokkos/kokkos/issues/279)
+- MemoryPool infinite loop when out of memory [\#312](https://github.com/kokkos/kokkos/issues/312)
+- Kokkos DynRankView changes break Sacado and Panzer [\#299](https://github.com/kokkos/kokkos/issues/299)
+- MemoryPool fails to compile on non-cuda non-x86 [\#297](https://github.com/kokkos/kokkos/issues/297)
+- Random Number Generator Fix [\#296](https://github.com/kokkos/kokkos/issues/296)
+- View template parameter ordering Bug [\#282](https://github.com/kokkos/kokkos/issues/282)
+- Serial task policy broken. [\#281](https://github.com/kokkos/kokkos/issues/281)
+- deep\_copy with LayoutStride should not memcpy [\#262](https://github.com/kokkos/kokkos/issues/262)
+- DualView::need\_sync should be a const method [\#248](https://github.com/kokkos/kokkos/issues/248)
+- Arbitrary-sized atomics on GPUs broken; loop forever [\#238](https://github.com/kokkos/kokkos/issues/238)
+- boolean reduction value\_type changes answer [\#225](https://github.com/kokkos/kokkos/issues/225)
+- Custom init\(\) function for parallel\_reduce with array value\_type [\#210](https://github.com/kokkos/kokkos/issues/210)
+- unit\_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. [\#202](https://github.com/kokkos/kokkos/issues/202)
+- nvcc\_wrapper Does Not Support -Xcompiler \<compiler option\> [\#198](https://github.com/kokkos/kokkos/issues/198)
+- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
+- Kokkos Threads Backend impl\_shared\_alloc Broken on Intel 16.1 \(Shepard Haswell\) [\#186](https://github.com/kokkos/kokkos/issues/186)
+- pthread back end hangs if used uninitialized [\#182](https://github.com/kokkos/kokkos/issues/182)
+- parallel\_reduce of size 0, not calling init/join [\#175](https://github.com/kokkos/kokkos/issues/175)
+- Bug in Threads with OpenMP enabled [\#173](https://github.com/kokkos/kokkos/issues/173)
+- KokkosExp\_SharedAlloc, m\_team\_work\_index inaccessible [\#166](https://github.com/kokkos/kokkos/issues/166)
+- 128-bit CAS without Assembly Broken? [\#161](https://github.com/kokkos/kokkos/issues/161)
+- fatal error: Cuda/Kokkos\_Cuda\_abort.hpp: No such file or directory [\#157](https://github.com/kokkos/kokkos/issues/157)
+- Power8: Fix OpenMP backend [\#139](https://github.com/kokkos/kokkos/issues/139)
+- Data race in Kokkos OpenMP initialization [\#131](https://github.com/kokkos/kokkos/issues/131)
+- parallel\_launch\_local\_memory and cuda 7.5 [\#125](https://github.com/kokkos/kokkos/issues/125)
+- Resize can fail with Cuda due to asynchronous dispatch [\#119](https://github.com/kokkos/kokkos/issues/119)
+- Qthread taskpolicy initialization bug. [\#92](https://github.com/kokkos/kokkos/issues/92)
+- Windows: sys/mman.h [\#89](https://github.com/kokkos/kokkos/issues/89)
+- Windows: atomic\_fetch\_sub\(\) [\#88](https://github.com/kokkos/kokkos/issues/88)
+- Windows: snprintf [\#87](https://github.com/kokkos/kokkos/issues/87)
+- Parallel\_Reduce with TeamPolicy and league size of 0 returns garbage [\#85](https://github.com/kokkos/kokkos/issues/85)
+- Throw with Cuda when using \(2D\) team\_policy parallel\_reduce with less than a warp size [\#76](https://github.com/kokkos/kokkos/issues/76)
+- Scalar views don't work with Kokkos::Atomic memory trait [\#69](https://github.com/kokkos/kokkos/issues/69)
+- Reduce the number of threads per team for Cuda [\#63](https://github.com/kokkos/kokkos/issues/63)
+- Named Kernels fail for reductions with CUDA [\#60](https://github.com/kokkos/kokkos/issues/60)
+- Kokkos View dimension\_\(\) for long returning unsigned int [\#20](https://github.com/kokkos/kokkos/issues/20)
+- atomic test hangs with LLVM [\#6](https://github.com/kokkos/kokkos/issues/6)
+- OpenMP Test should set omp\_set\_num\_threads to 1 [\#4](https://github.com/kokkos/kokkos/issues/4)
+
+**Closed issues:**
+
+- develop branch broken with CUDA 8 and --expt-extended-lambda [\#354](https://github.com/kokkos/kokkos/issues/354)
+- --arch=KNL with Intel 2016 build failure [\#349](https://github.com/kokkos/kokkos/issues/349)
+- Error building with Cuda when passing -DKOKKOS\_CUDA\_USE\_LAMBDA to generate\_makefile.bash [\#343](https://github.com/kokkos/kokkos/issues/343)
+- Can I safely use int indices in a 2-D View with capacity \> 2B? [\#318](https://github.com/kokkos/kokkos/issues/318)
+- Kokkos::ViewAllocateWithoutInitializing is not working [\#317](https://github.com/kokkos/kokkos/issues/317)
+- Intel build on Mac OS X [\#277](https://github.com/kokkos/kokkos/issues/277)
+- deleted [\#271](https://github.com/kokkos/kokkos/issues/271)
+- Broken Mira build [\#268](https://github.com/kokkos/kokkos/issues/268)
+- 32-bit build [\#246](https://github.com/kokkos/kokkos/issues/246)
+- parallel\_reduce with RDC crashes linker [\#232](https://github.com/kokkos/kokkos/issues/232)
+- build of Kokkos\_Sparse\_MV\_impl\_spmv\_Serial.cpp.o fails if you use nvcc and have cuda disabled [\#209](https://github.com/kokkos/kokkos/issues/209)
+- Kokkos Serial execution space is not tested with TeamPolicy. [\#207](https://github.com/kokkos/kokkos/issues/207)
+- Unit test failure on Hansen KokkosCore\_UnitTest\_Cuda\_MPI\_1 [\#200](https://github.com/kokkos/kokkos/issues/200)
+- nvcc compiler warning: calling a \_\_host\_\_ function from a \_\_host\_\_ \_\_device\_\_ function is not allowed [\#180](https://github.com/kokkos/kokkos/issues/180)
+- Intel 15 build error with defaulted "move" operators [\#171](https://github.com/kokkos/kokkos/issues/171)
+- missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos\*.a libs are there [\#165](https://github.com/kokkos/kokkos/issues/165)
+- Tie atomic updates to execution space or even to thread team? \(speculation\) [\#144](https://github.com/kokkos/kokkos/issues/144)
+- New View: Compiletime/size Test [\#137](https://github.com/kokkos/kokkos/issues/137)
+- New View : Performance Test [\#136](https://github.com/kokkos/kokkos/issues/136)
+- Signed/unsigned comparison warning in CUDA parallel [\#130](https://github.com/kokkos/kokkos/issues/130)
+- Kokkos::complex: Need op\* w/ std::complex & real [\#126](https://github.com/kokkos/kokkos/issues/126)
+- Use uintptr\_t for casting pointers [\#110](https://github.com/kokkos/kokkos/issues/110)
+- Default thread mapping behavior between P and Q threads. [\#91](https://github.com/kokkos/kokkos/issues/91)
+- Windows: Atomic\_Fetch\_Exchange\(\) return type [\#90](https://github.com/kokkos/kokkos/issues/90)
+- Synchronic unit test is way too long [\#84](https://github.com/kokkos/kokkos/issues/84)
+- nvcc\_wrapper -\> $\(NVCC\_WRAPPER\) [\#42](https://github.com/kokkos/kokkos/issues/42)
+- Check compiler version and print helpful message [\#39](https://github.com/kokkos/kokkos/issues/39)
+- Kokkos shared memory on Cuda uses a lot of registers [\#31](https://github.com/kokkos/kokkos/issues/31)
+- Can not pass unit test `cuda.space` without a GT 720 [\#25](https://github.com/kokkos/kokkos/issues/25)
+- Makefile.kokkos lacks bounds checking option that CMake has [\#24](https://github.com/kokkos/kokkos/issues/24)
+- Kokkos can not complete unit tests with CUDA UVM enabled [\#23](https://github.com/kokkos/kokkos/issues/23)
+- Simplify teams + shared memory histogram example to remove vectorization [\#21](https://github.com/kokkos/kokkos/issues/21)
+- Kokkos needs to rever to ${PROJECT\_NAME}\_ENABLE\_CXX11 not Trilinos\_ENABLE\_CXX11 [\#17](https://github.com/kokkos/kokkos/issues/17)
+- Kokkos Base Makefile adds AVX to KNC Build [\#16](https://github.com/kokkos/kokkos/issues/16)
+- MS Visual Studio 2013 Build Errors [\#9](https://github.com/kokkos/kokkos/issues/9)
+- subview\(X, ALL\(\), j\) for 2-D LayoutRight View X: should it view a column? [\#5](https://github.com/kokkos/kokkos/issues/5)
+
+## [End_C++98](https://github.com/kokkos/kokkos/tree/End_C++98) (2015-04-15)
+
+
+\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
diff --git a/lib/kokkos/CMakeLists.txt b/lib/kokkos/CMakeLists.txt
index 1219352f7..2b2b9be6a 100644
--- a/lib/kokkos/CMakeLists.txt
+++ b/lib/kokkos/CMakeLists.txt
@@ -1,184 +1,216 @@
IF(COMMAND TRIBITS_PACKAGE_DECL)
SET(KOKKOS_HAS_TRILINOS ON CACHE BOOL "")
ELSE()
SET(KOKKOS_HAS_TRILINOS OFF CACHE BOOL "")
ENDIF()
IF(NOT KOKKOS_HAS_TRILINOS)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8.11 FATAL_ERROR)
INCLUDE(cmake/tribits.cmake)
ENDIF()
#
# A) Forward declare the package so that certain options are also defined for
# subpackages
#
TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
#------------------------------------------------------------------------------
#
# B) Define the common options for Kokkos first so they can be used by
# subpackages as well.
#
# mfh 01 Aug 2016: See Issue #61:
#
# https://github.com/kokkos/kokkos/issues/61
#
# Don't use TRIBITS_ADD_DEBUG_OPTION() here, because that defines
# HAVE_KOKKOS_DEBUG. We define KOKKOS_HAVE_DEBUG here instead,
# for compatibility with Kokkos' Makefile build system.
TRIBITS_ADD_OPTION_AND_DEFINE(
- ${PACKAGE_NAME}_ENABLE_DEBUG
- ${PACKAGE_NAME_UC}_HAVE_DEBUG
+ Kokkos_ENABLE_DEBUG
+ KOKKOS_HAVE_DEBUG
"Enable run-time debug checks. These checks may be expensive, so they are disabled by default in a release build."
${${PROJECT_NAME}_ENABLE_DEBUG}
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_SIERRA_BUILD
KOKKOS_FOR_SIERRA
"Configure Kokkos for building within the Sierra build system."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda
KOKKOS_HAVE_CUDA
"Enable CUDA support in Kokkos."
"${TPL_ENABLE_CUDA}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_UVM
KOKKOS_USE_CUDA_UVM
- "Enable CUDA Unified Virtual Memory support in Kokkos."
+ "Enable CUDA Unified Virtual Memory as the default in Kokkos."
+ OFF
+ )
+
+TRIBITS_ADD_OPTION_AND_DEFINE(
+ Kokkos_ENABLE_Cuda_RDC
+ KOKKOS_HAVE_CUDA_RDC
+ "Enable CUDA Relocatable Device Code support in Kokkos."
+ OFF
+ )
+
+TRIBITS_ADD_OPTION_AND_DEFINE(
+ Kokkos_ENABLE_Cuda_Lambda
+ KOKKOS_HAVE_CUDA_LAMBDA
+ "Enable CUDA LAMBDA support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Pthread
KOKKOS_HAVE_PTHREAD
"Enable Pthread support in Kokkos."
OFF
)
ASSERT_DEFINED(TPL_ENABLE_Pthread)
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
ENDIF ()
+IF (NOT TPL_ENABLE_Pthread)
+ ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
+ENDIF()
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_OpenMP
KOKKOS_HAVE_OPENMP
"Enable OpenMP support in Kokkos."
"${${PROJECT_NAME}_ENABLE_OpenMP}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_QTHREAD
KOKKOS_HAVE_QTHREAD
"Enable QTHREAD support in Kokkos."
"${TPL_ENABLE_QTHREAD}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11
KOKKOS_HAVE_CXX11
"Enable C++11 support in Kokkos."
"${${PROJECT_NAME}_ENABLE_CXX11}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_HWLOC
KOKKOS_HAVE_HWLOC
"Enable HWLOC support in Kokkos."
"${TPL_ENABLE_HWLOC}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_MPI
KOKKOS_HAVE_MPI
"Enable MPI support in Kokkos."
"${TPL_ENABLE_MPI}"
)
# Set default value of Kokkos_ENABLE_Debug_Bounds_Check option
#
# CMake is case sensitive. The Kokkos_ENABLE_Debug_Bounds_Check
# option (defined below) is annoyingly not all caps, but we need to
# keep it that way for backwards compatibility. If users forget and
# try using an all-caps variable, then make it count by using the
# all-caps version as the default value of the original, not-all-caps
# option. Otherwise, the default value of this option comes from
# Kokkos_ENABLE_DEBUG (see Issue #367).
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_DEBUG)
IF(DEFINED Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
IF(Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT ON)
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ASSERT_DEFINED(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Debug_Bounds_Check
KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
"Enable Kokkos::View run-time bounds checking."
"${Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling
KOKKOS_ENABLE_PROFILING_INTERNAL
"Enable KokkosP profiling support for kernel data collections."
"${TPL_ENABLE_DLlib}"
)
# placeholder for future device...
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Winthread
KOKKOS_HAVE_WINTHREAD
"Enable Winthread support in Kokkos."
"${TPL_ENABLE_Winthread}"
)
# use new/old View
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_USING_DEPRECATED_VIEW
KOKKOS_USING_DEPRECATED_VIEW
"Choose whether to use the old, deprecated Kokkos::View"
OFF
)
#------------------------------------------------------------------------------
#
-# C) Process the subpackages for Kokkos
+# C) Install Kokkos' executable scripts
+#
+
+
+# nvcc_wrapper is Kokkos' wrapper for NVIDIA's NVCC CUDA compiler.
+# Kokkos needs nvcc_wrapper in order to build. Other libraries and
+# executables also need nvcc_wrapper. Thus, we need to install it.
+# If the argument of DESTINATION is a relative path, CMake computes it
+# as relative to ${CMAKE_INSTALL_PREFIX}.
+
+INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION bin)
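A minimal sketch of how a downstream CUDA build might consume the installed wrapper together with the options defined in this file (the install prefix /opt/kokkos and the source path are placeholders, not part of this CMakeLists):

  # point the C++ compiler at the installed wrapper, then configure with CUDA enabled
  export CXX=/opt/kokkos/bin/nvcc_wrapper
  cmake -DKokkos_ENABLE_Cuda=ON \
        -DKokkos_ENABLE_Cuda_Lambda=ON \
        -DKokkos_ENABLE_OpenMP=ON \
        /path/to/source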
+
+
+#------------------------------------------------------------------------------
+#
+# D) Process the subpackages for Kokkos
#
TRIBITS_PROCESS_SUBPACKAGES()
#
-# D) If Kokkos itself is enabled, process the Kokkos package
+# E) If Kokkos itself is enabled, process the Kokkos package
#
TRIBITS_PACKAGE_DEF()
TRIBITS_EXCLUDE_AUTOTOOLS_FILES()
TRIBITS_EXCLUDE_FILES(
classic/doc
classic/LinAlg/doc/CrsRefactorNotesMay2012
)
TRIBITS_PACKAGE_POSTPROCESS()
diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos
index 94d045242..038c252cf 100644
--- a/lib/kokkos/Makefile.kokkos
+++ b/lib/kokkos/Makefile.kokkos
@@ -1,480 +1,664 @@
# Default settings common options
#LAMMPS specific settings:
KOKKOS_PATH=../../lib/kokkos
CXXFLAGS=$(CCFLAGS)
#Options: OpenMP,Serial,Pthreads,Cuda
KOKKOS_DEVICES ?= "OpenMP"
#KOKKOS_DEVICES ?= "Pthreads"
-#Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal61,ARMv8,BGQ,Power7,Power8,KNL,BDW
+#Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal61,ARMv80,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8,KNL,BDW,SKX
KOKKOS_ARCH ?= ""
#Options: yes,no
KOKKOS_DEBUG ?= "no"
#Options: hwloc,librt,experimental_memkind
KOKKOS_USE_TPLS ?= ""
-#Options: c++11
+#Options: c++11,c++1z
KOKKOS_CXX_STANDARD ?= "c++11"
#Options: aggressive_vectorization,disable_profiling
KOKKOS_OPTIONS ?= ""
#Default settings specific options
#Options: force_uvm,use_ldg,rdc,enable_lambda
KOKKOS_CUDA_OPTIONS ?= "enable_lambda"
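The KOKKOS_* variables above are the user-facing configuration knobs of Makefile.kokkos; since they are assigned with ?=, a value supplied on the make command line (or set in the including makefile before the include) takes precedence. A minimal sketch of overriding them from the command line of a build that includes this file (the architecture and device values shown are illustrative only):

  # CPU-only OpenMP build for Haswell
  make KOKKOS_DEVICES=OpenMP KOKKOS_ARCH=HSW
  # CUDA + OpenMP build for a Kepler35 GPU host, with device lambdas enabled
  make KOKKOS_DEVICES="Cuda,OpenMP" KOKKOS_ARCH="Kepler35,HSW" KOKKOS_CUDA_OPTIONS=enable_lambda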
# Check for general settings
KOKKOS_INTERNAL_ENABLE_DEBUG := $(strip $(shell echo $(KOKKOS_DEBUG) | grep "yes" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX11 := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++11" | wc -l))
+KOKKOS_INTERNAL_ENABLE_CXX1Z := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++1z" | wc -l))
# Check for external libraries
KOKKOS_INTERNAL_USE_HWLOC := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "hwloc" | wc -l))
KOKKOS_INTERNAL_USE_LIBRT := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "librt" | wc -l))
KOKKOS_INTERNAL_USE_MEMKIND := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "experimental_memkind" | wc -l))
# Check for advanced settings
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "aggressive_vectorization" | wc -l))
KOKKOS_INTERNAL_DISABLE_PROFILING := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_profiling" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LDG := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "use_ldg" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_UVM := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "force_uvm" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "rdc" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "enable_lambda" | wc -l))
# Check for Kokkos Host Execution Spaces one of which must be on
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMP | wc -l))
KOKKOS_INTERNAL_USE_PTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Pthread | wc -l))
KOKKOS_INTERNAL_USE_SERIAL := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Serial | wc -l))
KOKKOS_INTERNAL_USE_QTHREAD := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthread | wc -l))
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
KOKKOS_INTERNAL_USE_SERIAL := 1
endif
endif
+# Check for other Execution Spaces
+
+KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
+
+ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+ KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
+ CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)
+ KOKKOS_INTERNAL_COMPILER_NVCC_VERSION := $(shell nvcc --version 2>&1 | grep release | cut -d' ' -f5 | cut -d',' -f1 | tr -d .)
+endif
+
+# Check OS
+
+KOKKOS_OS := $(shell uname -s)
+KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname -s | grep CYGWIN | wc -l)
+KOKKOS_INTERNAL_OS_LINUX := $(shell uname -s | grep Linux | wc -l)
+KOKKOS_INTERNAL_OS_DARWIN := $(shell uname -s | grep Darwin | wc -l)
+
+# Check compiler
+
KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
-KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname | grep CYGWIN | wc -l)
+KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
+ifneq ($(OMPI_CXX),)
+ KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l)
+endif
+ifneq ($(MPICH_CXX),)
+ KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l)
+endif
+KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
+
+ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
+ KOKKOS_INTERNAL_COMPILER_CLANG = 1
+endif
+ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 2)
+ KOKKOS_INTERNAL_COMPILER_XL = 1
+endif
+
+ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
+ KOKKOS_INTERNAL_COMPILER_CLANG_VERSION := $(shell clang --version | grep version | cut -d ' ' -f3 | tr -d '.')
+ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+ ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_CLANG_VERSION) -lt 400; echo $$?),0)
+ $(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
+ endif
+ KOKKOS_INTERNAL_CUDA_USE_LAMBDA := 1
+ endif
+endif
+
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -mp
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
+ KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- # OpenMP is turned on by default in Cray compiler environment
- KOKKOS_INTERNAL_OPENMP_FLAG :=
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
+ KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
else
- KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ # OpenMP is turned on by default in Cray compiler environment
+ KOKKOS_INTERNAL_OPENMP_FLAG :=
+ else
+ KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
+ endif
endif
endif
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_INTERNAL_CXX11_FLAG := --c++11
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_CXX11_FLAG := -std=c++11
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_INTERNAL_CXX11_FLAG := -hstd=c++11
else
KOKKOS_INTERNAL_CXX11_FLAG := --std=c++11
+ KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
endif
endif
endif
-# Check for other Execution Spaces
-KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
-
# Check for Kokkos Architecture settings
#Intel based
KOKKOS_INTERNAL_USE_ARCH_KNC := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNC | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SNB := $(strip $(shell echo $(KOKKOS_ARCH) | grep SNB | wc -l))
KOKKOS_INTERNAL_USE_ARCH_HSW := $(strip $(shell echo $(KOKKOS_ARCH) | grep HSW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_BDW := $(strip $(shell echo $(KOKKOS_ARCH) | grep BDW | wc -l))
+KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
#NVIDIA based
NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler30 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler32 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler35 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER37 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler37 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell50 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell52 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell53 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal61 | wc -l))
+KOKKOS_INTERNAL_USE_ARCH_PASCAL60 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal60 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler | wc -l))
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
endif
#ARM based
-KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8 | wc -l))
+KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
+KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
+KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
#IBM based
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER7 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power7 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER8 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power8 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BGQ)+$(KOKKOS_INTERNAL_USE_ARCH_POWER7)+$(KOKKOS_INTERNAL_USE_ARCH_POWER8) | bc))
#AMD based
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(strip $(shell echo $(KOKKOS_ARCH) | grep AMDAVX | wc -l))
#Any AVX?
-KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
-KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
-KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
+KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
+KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
+KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
+KOKKOS_INTERNAL_USE_ARCH_AVX512XEON := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
# Decide what ISA level we are able to support
-KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
+KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8) | bc ))
#Incompatible flags?
-KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)>1" | bc ))
+KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)>1" | bc ))
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
$(error Defined Multiple Host architectures: KOKKOS_ARCH=$(KOKKOS_ARCH) )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIGPU), 1)
$(error Defined Multiple GPU architectures: KOKKOS_ARCH=$(KOKKOS_ARCH) )
endif
#Generating the list of Flags
KOKKOS_CPPFLAGS = -I./ -I$(KOKKOS_PATH)/core/src -I$(KOKKOS_PATH)/containers/src -I$(KOKKOS_PATH)/algorithms/src
# No warnings:
KOKKOS_CXXFLAGS =
# INTEL and CLANG warnings:
#KOKKOS_CXXFLAGS = -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized
# GCC warnings:
#KOKKOS_CXXFLAGS = -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized -Wignored-qualifiers -Wempty-body -Wclobbered
KOKKOS_LIBS = -lkokkos -ldl
KOKKOS_LDFLAGS = -L$(shell pwd)
KOKKOS_SRC =
KOKKOS_HEADERS =
#Generating the KokkosCore_config.h file
tmp := $(shell echo "/* ---------------------------------------------" > KokkosCore_config.tmp)
tmp := $(shell echo "Makefile constructed configuration:" >> KokkosCore_config.tmp)
tmp := $(shell date >> KokkosCore_config.tmp)
tmp := $(shell echo "----------------------------------------------*/" >> KokkosCore_config.tmp)
tmp := $(shell echo "/* Execution Spaces */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_PTHREAD 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_KNC), 1)
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCLE), 1)
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
KOKKOS_CPPFLAGS += -I$(QTHREAD_PATH)/include
KOKKOS_LDFLAGS += -L$(QTHREAD_PATH)/lib
tmp := $(shell echo "\#define KOKKOS_HAVE_QTHREAD 1" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* General Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX11), 1)
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
endif
+ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
+endif
+
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- KOKKOS_CXXFLAGS += -G
+ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
+ KOKKOS_CXXFLAGS += -lineinfo
endif
KOKKOS_CXXFLAGS += -g
KOKKOS_LDFLAGS += -g -ldl
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
KOKKOS_CPPFLAGS += -I$(HWLOC_PATH)/include
KOKKOS_LDFLAGS += -L$(HWLOC_PATH)/lib
KOKKOS_LIBS += -lhwloc
tmp := $(shell echo "\#define KOKKOS_HAVE_HWLOC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_LIBRT), 1)
tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOSP_ENABLE_RTLIB 1" >> KokkosCore_config.tmp )
KOKKOS_LIBS += -lrt
endif
ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
KOKKOS_CPPFLAGS += -I$(MEMKIND_PATH)/include
KOKKOS_LDFLAGS += -L$(MEMKIND_PATH)/lib
KOKKOS_LIBS += -lmemkind
tmp := $(shell echo "\#define KOKKOS_HAVE_HBWSPACE 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING 0" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Optimization Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION), 1)
tmp := $(shell echo "\#define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION 1" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
+ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += --relocatable-device-code=true
KOKKOS_LDFLAGS += --relocatable-device-code=true
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -expt-extended-lambda
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
+ ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -expt-extended-lambda
+ else
+ $(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
+ endif
+ endif
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
+ endif
+endif
endif
#Add Architecture flags
-ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ KOKKOS_CXXFLAGS += -march=armv8-a
+ KOKKOS_LDFLAGS += -march=armv8-a
+ endif
+ endif
+endif
+
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ KOKKOS_CXXFLAGS += -march=armv8.1-a
+ KOKKOS_LDFLAGS += -march=armv8.1-a
+ endif
+ endif
+endif
+
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_CXXFLAGS +=
KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -mavx
- KOKKOS_LDFLAGS += -mavx
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ KOKKOS_CXXFLAGS += -march=armv8-a -mtune=thunderx
+ KOKKOS_LDFLAGS += -march=armv8-a -mtune=thunderx
+ endif
endif
endif
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -mavx
+ KOKKOS_LDFLAGS += -mavx
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS += -tp=sandybridge
+ KOKKOS_LDFLAGS += -tp=sandybridge
+ else
+ # Assume that this is really a GNU compiler
+ KOKKOS_CXXFLAGS += -mavx
+ KOKKOS_LDFLAGS += -mavx
+ endif
+ endif
+ endif
+endif
+
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
- KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+
+ else
+ # Assume that this is really a GNU compiler, or it could be XL on P8
+ KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
+ KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xCORE-AVX2
KOKKOS_LDFLAGS += -xCORE-AVX2
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
-
+ KOKKOS_CXXFLAGS += -tp=haswell
+ KOKKOS_LDFLAGS += -tp=haswell
else
# Assume that this is really a GNU compiler
KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2
endif
endif
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xMIC-AVX512
KOKKOS_LDFLAGS += -xMIC-AVX512
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
else
# Assume that this is really a GNU compiler
KOKKOS_CXXFLAGS += -march=knl
KOKKOS_LDFLAGS += -march=knl
endif
endif
endif
endif
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xCORE-AVX512
+ KOKKOS_LDFLAGS += -xCORE-AVX512
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+
+ else
+ # Nothing here yet
+ KOKKOS_CXXFLAGS += -march=skylake-avx512
+ KOKKOS_LDFLAGS += -march=skylake-avx512
+ endif
+ endif
+ endif
+endif
+
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KNC), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += -mmic
KOKKOS_LDFLAGS += -mmic
endif
+#Figure out the architecture flag for Cuda
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
+ KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-arch
+endif
+ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
+ KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-x cuda --cuda-gpu-arch
+endif
+
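The two branches above only choose how the SM value is spelled on the compiler command line; the per-architecture blocks below append the actual value. For Kepler35, for example, the assembled flag becomes (taken directly from the sm_35 case below):

  # nvcc :  -arch=sm_35
  # clang:  -x cuda --cuda-gpu-arch=sm_35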
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_30
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_32
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_35
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_37
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_50
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_52
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_53
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -arch=sm_61
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
+endif
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
endif
endif
KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h)
ifeq ($(KOKKOS_INTERNAL_LS_CONFIG), KokkosCore_config.h)
KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep define | wc -l))
else
KOKKOS_INTERNAL_NEW_CONFIG := 1
endif
ifneq ($(KOKKOS_INTERNAL_NEW_CONFIG), 0)
tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
endif
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.cpp)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.cpp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
+ KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
KOKKOS_LIBS += -lcudart -lcuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
KOKKOS_LIBS += -lpthread
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.cpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
KOKKOS_LIBS += -lqthread
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.cpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.hpp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
- ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
else
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
endif
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
endif
+#Explicitly set the GCC Toolchain for Clang
+ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
+ KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
+ KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
+ KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
+ KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
+endif
+
#With Cygwin, functions such as fdopen and fileno are not defined
#when strict ANSI is enabled, and strict ANSI gets enabled with --std=c++11.
#So we hard-undefine it here. Not sure if that has any bad side effects.
#This is needed for gtest, actually, not for Kokkos itself!
ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)
KOKKOS_CXXFLAGS += -U__STRICT_ANSI__
endif
# Setting up dependencies
KokkosCore_config.h:
KOKKOS_CPP_DEPENDS := KokkosCore_config.h $(KOKKOS_HEADERS)
KOKKOS_OBJ = $(KOKKOS_SRC:.cpp=.o)
KOKKOS_OBJ_LINK = $(notdir $(KOKKOS_OBJ))
include $(KOKKOS_PATH)/Makefile.targets
kokkos-clean:
- -rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
+ rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
ranlib libkokkos.a
KOKKOS_LINK_DEPENDS=libkokkos.a
diff --git a/lib/kokkos/Makefile.targets b/lib/kokkos/Makefile.targets
index 86929ea0f..a48a5f6eb 100644
--- a/lib/kokkos/Makefile.targets
+++ b/lib/kokkos/Makefile.targets
@@ -1,72 +1,62 @@
Kokkos_UnorderedMap_impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
Kokkos_Core.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Core.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Core.cpp
Kokkos_CPUDiscovery.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_CPUDiscovery.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_CPUDiscovery.cpp
Kokkos_Error.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Error.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Error.cpp
Kokkos_ExecPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_ExecPolicy.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_ExecPolicy.cpp
Kokkos_HostSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace.cpp
Kokkos_hwloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
Kokkos_Serial.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
-Kokkos_Serial_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_TaskPolicy.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_TaskPolicy.cpp
-Kokkos_TaskQueue.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
Kokkos_Serial_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
-Kokkos_Shape.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Shape.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Shape.cpp
+Kokkos_TaskQueue.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
Kokkos_spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
Kokkos_Profiling_Interface.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
-KokkosExp_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/KokkosExp_SharedAlloc.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/KokkosExp_SharedAlloc.cpp
+Kokkos_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
Kokkos_MemoryPool.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
Kokkos_Cuda_Impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Impl.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Impl.cpp
Kokkos_CudaSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
Kokkos_Cuda_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
-Kokkos_Cuda_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
Kokkos_ThreadsExec_base.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
Kokkos_ThreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
-Kokkos_Threads_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
Kokkos_QthreadExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthread/Kokkos_QthreadExec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthread/Kokkos_QthreadExec.cpp
Kokkos_Qthread_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
Kokkos_OpenMPexec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
Kokkos_OpenMP_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
endif
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
-Kokkos_HBWAllocators.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWAllocators.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWAllocators.cpp
diff --git a/lib/kokkos/README b/lib/kokkos/README
index b094578af..ffc1fe53b 100644
--- a/lib/kokkos/README
+++ b/lib/kokkos/README
@@ -1,152 +1,154 @@
Kokkos implements a programming model in C++ for writing performance portable
applications targeting all major HPC platforms. For that purpose it provides
abstractions for both parallel execution of code and data management.
Kokkos is designed to target complex node architectures with N-level memory
hierarchies and multiple types of execution resources. It currently can use
OpenMP, Pthreads and CUDA as backend programming models.
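As a quick illustration (an editor's sketch, not part of the tutorials
shipped with Kokkos), a minimal program pairs a View for data management
with parallel_for/parallel_reduce for parallel execution. It assumes a
host backend such as Serial or OpenMP is enabled:

  #include <Kokkos_Core.hpp>
  #include <cstdio>

  int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
      const int N = 1000;
      Kokkos::View<double*> x("x", N);   // data management abstraction
      Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) {
        x(i) = 2.0 * i;                  // runs on the default execution space
      });
      double sum = 0.0;
      Kokkos::parallel_reduce(N, KOKKOS_LAMBDA(const int i, double& lsum) {
        lsum += x(i);
      }, sum);
      printf("sum = %f\n", sum);         // prints 999000.000000
    }
    Kokkos::finalize();
  }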
The core developers of Kokkos are Carter Edwards and Christian Trott
at the Computer Science Research Institute of the Sandia National
Laboratories.
The KokkosP interface and associated tools are developed by the Application
Performance Team and Kokkos core developers at Sandia National Laboratories.
To learn more about Kokkos consider watching one of our presentations:
GTC 2015:
http://on-demand.gputechconf.com/gtc/2015/video/S5166.html
http://on-demand.gputechconf.com/gtc/2015/presentation/S5166-H-Carter-Edwards.pdf
A programming guide can be found under doc/Kokkos_PG.pdf. This is an initial version
and feedback is greatly appreciated.
A separate repository with extensive tutorial material can be found under
https://github.com/kokkos/kokkos-tutorials.
If you have a patch to contribute please feel free to issue a pull request against
the develop branch. For major contributions it is better to contact us first
for guidance.
For questions please send an email to
kokkos-users@software.sandia.gov
For non-public questions send an email to
hcedwar(at)sandia.gov and crtrott(at)sandia.gov
============================================================================
====Requirements============================================================
============================================================================
Primary tested compilers on X86 are:
GCC 4.7.2
GCC 4.8.4
GCC 4.9.2
GCC 5.1.0
Intel 14.0.4
Intel 15.0.2
Intel 16.0.1
+ Intel 17.0.098
Clang 3.5.2
Clang 3.6.1
+ Clang 3.9.0
Primary tested compilers on Power 8 are:
- IBM XL 13.1.3 (OpenMP,Serial)
- GCC 4.9.2 (OpenMP,Serial)
- GCC 5.3.0 (OpenMP,Serial)
+ GCC 5.4.0 (OpenMP,Serial)
+ IBM XL 13.1.3 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)
+
+Primary tested compilers on Intel KNL are:
+ Intel 16.2.181 (with gcc 4.7.2)
+ Intel 17.0.098 (with gcc 4.7.2)
Secondary tested compilers are:
- CUDA 6.5 (with gcc 4.7.2)
CUDA 7.0 (with gcc 4.7.2)
- CUDA 7.5 (with gcc 4.8.4)
+ CUDA 7.5 (with gcc 4.7.2)
+ CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
+ CUDA/Clang 8.0 using Clang/Trunk compiler
Other compilers working:
X86:
- Intel 17.0.042 (the FENL example causes internal compiler error)
PGI 15.4
Cygwin 2.1.0 64bit with gcc 4.9.3
- KNL:
- Intel 16.2.181 (the FENL example causes internal compiler error)
- Intel 17.0.042 (the FENL example causes internal compiler error)
Known non-working combinations:
Power8:
- GCC 6.1.0
Pthreads backend
Primary tested compilers are passing in release mode
with warnings as errors. They also are tested with a comprehensive set of
backend combinations (i.e. OpenMP, Pthreads, Serial, OpenMP+Serial, ...).
We are using the following set of flags:
GCC: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits
-Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized
Intel: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Clang: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Secondary compilers are passing without -Werror.
Other compilers are tested occasionally, in particular when pushing from develop to
master branch, without -Werror and only for a select set of backends.
============================================================================
====Getting started=========================================================
============================================================================
In the 'example/tutorial' directory you will find step by step tutorial
examples which explain many of the features of Kokkos. They work with
-simple Makefiles. To build with g++ and OpenMP simply type 'make openmp'
+simple Makefiles. To build with g++ and OpenMP simply type 'make'
in the 'example/tutorial' directory. This will build all examples in the
-subfolders.
+subfolders. To change the build options refer to the Programming Guide
+in the compilation section.
============================================================================
====Running Unit Tests======================================================
============================================================================
To run the unit tests create a build directory and run the following commands
KOKKOS_PATH/generate_makefile.bash
make build-test
make test
Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.
============================================================================
====Install the library=====================================================
============================================================================
To install Kokkos as a library create a build directory and run the following
KOKKOS_PATH/generate_makefile.bash --prefix=INSTALL_PATH
make lib
make install
Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.
============================================================================
====CMakeFiles==============================================================
============================================================================
The CMake files contained in this repository require Tribits and are used
for integration with Trilinos. They do not currently support a standalone
CMake build.
===========================================================================
====Kokkos and CUDA UVM====================================================
===========================================================================
Kokkos does support UVM as a specific memory space called CudaUVMSpace.
Allocations made with that space are accessible from host and device.
You can tell Kokkos to use that as the default space for Cuda allocations.
In either case UVM comes with a number of restrictions:
(i) You can't access allocations on the host while a kernel is potentially
running. This will lead to segfaults. To avoid that you either need to
call Kokkos::Cuda::fence() (or just Kokkos::fence()) after kernels, or
you can set the environment variable CUDA_LAUNCH_BLOCKING=1.
(ii) Furthermore, in multi-socket, multi-GPU machines UVM defaults to
using zero-copy allocations, for technical reasons related to using
multiple GPUs from the same process. If an executable doesn't use
multiple GPUs from the same process (e.g. each MPI rank of an
application uses a single GPU [which can be the same GPU for multiple
MPI ranks]) you can set CUDA_MANAGED_FORCE_DEVICE_ALLOC=1.
This will enforce proper UVM allocations, but can lead to errors if
more than a single GPU is used by a single process.
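As an illustrative sketch of restriction (i) (an editor's example, not
taken from the Kokkos sources; it assumes the CUDA backend and CUDA
lambda support are enabled):

  #include <Kokkos_Core.hpp>

  void uvm_example() {
    const int N = 1 << 20;
    // The allocation below is reachable from both host and device.
    Kokkos::View<double*, Kokkos::CudaUVMSpace> a("a", N);
    Kokkos::parallel_for(Kokkos::RangePolicy<Kokkos::Cuda>(0, N),
                         KOKKOS_LAMBDA(const int i) { a(i) = i; });
    // Without this fence the host read below may race with the
    // still-running kernel and segfault (unless CUDA_LAUNCH_BLOCKING=1).
    Kokkos::fence();
    const double first = a(0);   // safe host access after the fence
    (void)first;
  }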
diff --git a/lib/kokkos/algorithms/src/Kokkos_Random.hpp b/lib/kokkos/algorithms/src/Kokkos_Random.hpp
index afe6b54e9..78cddeeae 100644
--- a/lib/kokkos/algorithms/src/Kokkos_Random.hpp
+++ b/lib/kokkos/algorithms/src/Kokkos_Random.hpp
@@ -1,1751 +1,1751 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_RANDOM_HPP
#define KOKKOS_RANDOM_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Complex.hpp>
#include <cstdio>
#include <cstdlib>
#include <cmath>
/// \file Kokkos_Random.hpp
/// \brief Pseudorandom number generators
///
/// These generators are based on Vigna, Sebastiano (2014). "An
/// experimental exploration of Marsaglia's xorshift generators,
/// scrambled." See: http://arxiv.org/abs/1402.6246
namespace Kokkos {
/*Template functions to get equidistributed random numbers from a generator for a specific Scalar type
template<class Generator,Scalar>
struct rand{
//Max value returned by draw(Generator& gen)
KOKKOS_INLINE_FUNCTION
static Scalar max();
//Returns a value between zero and max()
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen);
//Returns a value between zero and range()
//Note: for floating point values range can be larger than max()
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen, const Scalar& range){}
//Return value between start and end
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen, const Scalar& start, const Scalar& end);
};
The random number generators themselves have two components: a state-pool and the actual generator.
A state-pool manages a number of generators, so that each active thread is able to grab its own.
This allows the generation of random numbers which are independent between threads. Note that
in contrast to CuRand none of the functions of the pool (or the generator) are collectives,
i.e. all functions can be called inside conditionals.
template<class Device>
class Pool {
public:
//The Kokkos device type
typedef Device device_type;
//The actual generator type
typedef Generator<Device> generator_type;
//Default constructor: does not initialize a pool
Pool();
//Initializing constructor: calls init(seed,Device_Specific_Number);
Pool(unsigned int seed);
//Initialize the Pool with seed as a starting seed and a pool_size of num_states.
//The Random_XorShift64 generator is used in serial to initialize all states,
//thus the initialization process is platform independent and deterministic.
void init(unsigned int seed, int num_states);
//Get a generator. This will lock one of the states, guaranteeing that each thread
//will have its private generator. Note: on Cuda getting a state involves atomics,
//and is thus not deterministic!
generator_type get_state();
//Give a state back to the pool. This unlocks the state, and writes the modified
//state of the generator back to the pool.
void free_state(generator_type gen);
}
template<class Device>
class Generator {
public:
//The Kokkos device type
typedef DeviceType device_type;
//Max return values of respective [X]rand[S]() functions
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
//Init with a state and the idx with respect to pool. Note: in serial the
//Generator can be used by just giving it the necessary state arguments
KOKKOS_INLINE_FUNCTION
Generator (STATE_ARGUMENTS, int state_idx = 0);
//Draw an equidistributed uint32_t in the range (0,MAX_URAND]
KOKKOS_INLINE_FUNCTION
uint32_t urand();
//Draw an equidistributed uint64_t in the range (0,MAX_URAND64]
KOKKOS_INLINE_FUNCTION
uint64_t urand64();
//Draw an equidistributed uint32_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range);
//Draw an equidistributed uint32_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end );
//Draw an equidistributed uint64_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range);
//Draw an equidistributed uint64_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end );
//Draw an equidistributed int in the range (0,MAX_RAND]
KOKKOS_INLINE_FUNCTION
int rand();
//Draw an equidistributed int in the range (0,range]
KOKKOS_INLINE_FUNCTION
int rand(const int& range);
//Draw an equidistributed int in the range (start,end]
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end );
//Draw an equidistributed int64_t in the range (0,MAX_RAND64]
KOKKOS_INLINE_FUNCTION
int64_t rand64();
//Draw an equidistributed int64_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range);
//Draw an equidistributed int64_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end );
//Draw an equidistributed float in the range (0,1.0]
KOKKOS_INLINE_FUNCTION
float frand();
//Draw an equidistributed float in the range (0,range]
KOKKOS_INLINE_FUNCTION
float frand(const float& range);
//Draw an equidistributed float in the range (start,end]
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end );
//Draw an equidistributed double in the range (0,1.0]
KOKKOS_INLINE_FUNCTION
double drand();
//Draw an equidistributed double in the range (0,range]
KOKKOS_INLINE_FUNCTION
double drand(const double& range);
//Draw an equidistributed double in the range (start,end]
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end );
//Draw a standard normal distributed double
KOKKOS_INLINE_FUNCTION
double normal() ;
//Draw a normal distributed double with given mean and standard deviation
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0);
}
//Additional Functions:
//Fills view with random numbers in the range (0,range]
template<class ViewType, class PoolType>
void fill_random(ViewType view, PoolType pool, ViewType::value_type range);
//Fills view with random numbers in the range (start,end]
template<class ViewType, class PoolType>
void fill_random(ViewType view, PoolType pool,
ViewType::value_type start, ViewType::value_type end);
*/
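/* Usage sketch (editorial addition, not part of the original header; it
   assumes the default execution space and that Kokkos::initialize() has
   already been called):

   Kokkos::View<double*> vals("vals", 1<<16);
   // Pool of per-thread generator states, seeded deterministically.
   Kokkos::Random_XorShift64_Pool<> pool(12345);
   Kokkos::parallel_for(vals.dimension_0(), KOKKOS_LAMBDA(const int i) {
     // Lock a private generator, draw, then hand the state back.
     auto gen = pool.get_state();
     vals(i) = gen.drand(0.0, 1.0);   // double in (0.0,1.0]
     pool.free_state(gen);
   });
   // Convenience: fill a whole View with values in (0,range].
   Kokkos::fill_random(vals, pool, 1.0);
*/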
template<class Generator, class Scalar>
struct rand;
template<class Generator>
struct rand<Generator,char> {
KOKKOS_INLINE_FUNCTION
static short max(){return 127;}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen)
{return short((gen.rand()&0xff+256)%256);}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const char& range)
{return char(gen.rand(range));}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const char& start, const char& end)
{return char(gen.rand(start,end));}
};
template<class Generator>
struct rand<Generator,short> {
KOKKOS_INLINE_FUNCTION
static short max(){return 32767;}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen)
{return short((gen.rand()&0xffff+65536)%32768);}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const short& range)
{return short(gen.rand(range));}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const short& start, const short& end)
{return short(gen.rand(start,end));}
};
template<class Generator>
struct rand<Generator,int> {
KOKKOS_INLINE_FUNCTION
static int max(){return Generator::MAX_RAND;}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen)
{return gen.rand();}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen, const int& range)
{return gen.rand(range);}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen, const int& start, const int& end)
{return gen.rand(start,end);}
};
template<class Generator>
struct rand<Generator,unsigned int> {
KOKKOS_INLINE_FUNCTION
static unsigned int max () {
return Generator::MAX_URAND;
}
KOKKOS_INLINE_FUNCTION
static unsigned int draw (Generator& gen) {
return gen.urand ();
}
KOKKOS_INLINE_FUNCTION
static unsigned int draw(Generator& gen, const unsigned int& range) {
return gen.urand (range);
}
KOKKOS_INLINE_FUNCTION
static unsigned int
draw (Generator& gen, const unsigned int& start, const unsigned int& end) {
return gen.urand (start, end);
}
};
template<class Generator>
struct rand<Generator,long> {
KOKKOS_INLINE_FUNCTION
static long max () {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (Generator::MAX_RAND) :
static_cast<long> (Generator::MAX_RAND64);
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand ()) :
static_cast<long> (gen.rand64 ());
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen, const long& range) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand (static_cast<int> (range))) :
static_cast<long> (gen.rand64 (range));
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen, const long& start, const long& end) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand (static_cast<int> (start),
static_cast<int> (end))) :
static_cast<long> (gen.rand64 (start, end));
}
};
template<class Generator>
struct rand<Generator,unsigned long> {
KOKKOS_INLINE_FUNCTION
static unsigned long max () {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (Generator::MAX_URAND) :
static_cast<unsigned long> (Generator::MAX_URAND64);
}
KOKKOS_INLINE_FUNCTION
static unsigned long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand ()) :
static_cast<unsigned long> (gen.urand64 ());
}
KOKKOS_INLINE_FUNCTION
static unsigned long draw(Generator& gen, const unsigned long& range) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand (static_cast<unsigned int> (range))) :
static_cast<unsigned long> (gen.urand64 (range));
}
KOKKOS_INLINE_FUNCTION
static unsigned long
draw (Generator& gen, const unsigned long& start, const unsigned long& end) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand (static_cast<unsigned int> (start),
static_cast<unsigned int> (end))) :
static_cast<unsigned long> (gen.urand64 (start, end));
}
};
// NOTE (mfh 26 oct 2014) This is a partial specialization for long
// long, a C99 / C++11 signed type which is guaranteed to be at
// least 64 bits. Do NOT write a partial specialization for
// int64_t!!! This is just a typedef! It could be either long or
// long long. We don't know which a priori, and I've seen both.
// The types long and long long are guaranteed to differ, so it's
// always safe to specialize for both.
template<class Generator>
struct rand<Generator, long long> {
KOKKOS_INLINE_FUNCTION
static long long max () {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return Generator::MAX_RAND64;
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 ();
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen, const long long& range) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 (range);
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen, const long long& start, const long long& end) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 (start, end);
}
};
// NOTE (mfh 26 oct 2014) This is a partial specialization for
// unsigned long long, a C99 / C++11 unsigned type which is
// guaranteed to be at least 64 bits. Do NOT write a partial
// specialization for uint64_t!!! This is just a typedef! It could
// be either unsigned long or unsigned long long. We don't know
// which a priori, and I've seen both. The types unsigned long and
// unsigned long long are guaranteed to differ, so it's always safe
// to specialize for both.
template<class Generator>
struct rand<Generator,unsigned long long> {
KOKKOS_INLINE_FUNCTION
static unsigned long long max () {
// FIXME (mfh 26 Oct 2014) It's legal for unsigned long long to be > 64 bits.
return Generator::MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
static unsigned long long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It's legal for unsigned long long to be > 64 bits.
return gen.urand64 ();
}
KOKKOS_INLINE_FUNCTION
static unsigned long long draw (Generator& gen, const unsigned long long& range) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.urand64 (range);
}
KOKKOS_INLINE_FUNCTION
static unsigned long long
draw (Generator& gen, const unsigned long long& start, const unsigned long long& end) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.urand64 (start, end);
}
};
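// (Editorial illustration, not in the original source: on any given
// platform std::int64_t is a typedef of exactly one of 'long' or
// 'long long'; a hypothetical check of that assumption would be
//   static_assert(std::is_same<std::int64_t,long>::value ||
//                 std::is_same<std::int64_t,long long>::value, "");
// which is why both types get their own specialization above instead
// of a single specialization for int64_t.)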
template<class Generator>
struct rand<Generator,float> {
KOKKOS_INLINE_FUNCTION
static float max(){return 1.0f;}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen)
{return gen.frand();}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen, const float& range)
{return gen.frand(range);}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen, const float& start, const float& end)
{return gen.frand(start,end);}
};
template<class Generator>
struct rand<Generator,double> {
KOKKOS_INLINE_FUNCTION
static double max(){return 1.0;}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen)
{return gen.drand();}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen, const double& range)
{return gen.drand(range);}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen, const double& start, const double& end)
{return gen.drand(start,end);}
};
template<class Generator>
- struct rand<Generator, ::Kokkos::complex<float> > {
+ struct rand<Generator, Kokkos::complex<float> > {
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<float> max () {
- return ::Kokkos::complex<float> (1.0, 1.0);
+ static Kokkos::complex<float> max () {
+ return Kokkos::complex<float> (1.0, 1.0);
}
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<float> draw (Generator& gen) {
+ static Kokkos::complex<float> draw (Generator& gen) {
const float re = gen.frand ();
const float im = gen.frand ();
- return ::Kokkos::complex<float> (re, im);
+ return Kokkos::complex<float> (re, im);
}
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<float> draw (Generator& gen, const ::Kokkos::complex<float>& range) {
+ static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& range) {
const float re = gen.frand (real (range));
const float im = gen.frand (imag (range));
- return ::Kokkos::complex<float> (re, im);
+ return Kokkos::complex<float> (re, im);
}
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<float> draw (Generator& gen, const ::Kokkos::complex<float>& start, const ::Kokkos::complex<float>& end) {
+ static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& start, const Kokkos::complex<float>& end) {
const float re = gen.frand (real (start), real (end));
const float im = gen.frand (imag (start), imag (end));
- return ::Kokkos::complex<float> (re, im);
+ return Kokkos::complex<float> (re, im);
}
};
template<class Generator>
- struct rand<Generator, ::Kokkos::complex<double> > {
+ struct rand<Generator, Kokkos::complex<double> > {
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<double> max () {
- return ::Kokkos::complex<double> (1.0, 1.0);
+ static Kokkos::complex<double> max () {
+ return Kokkos::complex<double> (1.0, 1.0);
}
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<double> draw (Generator& gen) {
+ static Kokkos::complex<double> draw (Generator& gen) {
const double re = gen.drand ();
const double im = gen.drand ();
- return ::Kokkos::complex<double> (re, im);
+ return Kokkos::complex<double> (re, im);
}
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<double> draw (Generator& gen, const ::Kokkos::complex<double>& range) {
+ static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& range) {
const double re = gen.drand (real (range));
const double im = gen.drand (imag (range));
- return ::Kokkos::complex<double> (re, im);
+ return Kokkos::complex<double> (re, im);
}
KOKKOS_INLINE_FUNCTION
- static ::Kokkos::complex<double> draw (Generator& gen, const ::Kokkos::complex<double>& start, const ::Kokkos::complex<double>& end) {
+ static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& start, const Kokkos::complex<double>& end) {
const double re = gen.drand (real (start), real (end));
const double im = gen.drand (imag (start), imag (end));
- return ::Kokkos::complex<double> (re, im);
+ return Kokkos::complex<double> (re, im);
}
};
template<class DeviceType>
class Random_XorShift64_Pool;
template<class DeviceType>
class Random_XorShift64 {
private:
uint64_t state_;
const int state_idx_;
friend class Random_XorShift64_Pool<DeviceType>;
public:
typedef DeviceType device_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffff/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffLL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift64 (uint64_t state, int state_idx = 0)
: state_(state),state_idx_(state_idx){}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
state_ ^= state_ >> 12;
state_ ^= state_ << 25;
state_ ^= state_ >> 27;
uint64_t tmp = state_ * 2685821657736338717ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
state_ ^= state_ >> 12;
state_ ^= state_ << 25;
state_ ^= state_ >> 27;
return (state_ * 2685821657736338717ULL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normal distributed random number
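//(Editorial note: the polar method draws (U,V) uniformly from [-1,1]^2
// until S = U*U + V*V < 1, i.e. (U,V) lies inside the unit disk, and then
// returns U*sqrt(-2*log(S)/S), which is standard normal by the
// Marsaglia/Box-Muller transform; on average about 4/pi ~ 1.27 attempts
// are needed per accepted pair.)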
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<class DeviceType = Kokkos::DefaultExecutionSpace>
class Random_XorShift64_Pool {
private:
typedef View<int*,DeviceType> lock_type;
typedef View<uint64_t*,DeviceType> state_data_type;
lock_type locks_;
state_data_type state_;
int num_states_;
public:
typedef Random_XorShift64<DeviceType> generator_type;
typedef DeviceType device_type;
Random_XorShift64_Pool() {
num_states_ = 0;
}
Random_XorShift64_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,DeviceType::max_hardware_threads());
}
Random_XorShift64_Pool(const Random_XorShift64_Pool& src):
locks_(src.locks_),
state_(src.state_),
num_states_(src.num_states_)
{}
Random_XorShift64_Pool operator = (const Random_XorShift64_Pool& src) {
locks_ = src.locks_;
state_ = src.state_;
num_states_ = src.num_states_;
return *this;
}
void init(uint64_t seed, int num_states) {
num_states_ = num_states;
locks_ = lock_type("Kokkos::Random_XorShift64::locks",num_states_);
state_ = state_data_type("Kokkos::Random_XorShift64::state",num_states_);
typename state_data_type::HostMirror h_state = create_mirror_view(state_);
typename lock_type::HostMirror h_lock = create_mirror_view(locks_);
// Execute on the HostMirror's default execution space.
Random_XorShift64<typename state_data_type::HostMirror::execution_space> gen(seed,0);
for(int i = 0; i < 17; i++)
gen.rand();
for(int i = 0; i < num_states_; i++) {
int n1 = gen.rand();
int n2 = gen.rand();
int n3 = gen.rand();
int n4 = gen.rand();
h_state(i) = (((static_cast<uint64_t>(n1)) & 0xffff)<<00) |
(((static_cast<uint64_t>(n2)) & 0xffff)<<16) |
(((static_cast<uint64_t>(n3)) & 0xffff)<<32) |
(((static_cast<uint64_t>(n4)) & 0xffff)<<48);
h_lock(i) = 0;
}
deep_copy(state_,h_state);
deep_copy(locks_,h_lock);
}
KOKKOS_INLINE_FUNCTION
Random_XorShift64<DeviceType> get_state() const {
const int i = DeviceType::hardware_thread_id();
return Random_XorShift64<DeviceType>(state_(i),i);
}
KOKKOS_INLINE_FUNCTION
void free_state(const Random_XorShift64<DeviceType>& state) const {
state_(state.state_idx_) = state.state_;
}
};
template<class DeviceType>
class Random_XorShift1024_Pool;
template<class DeviceType>
class Random_XorShift1024 {
private:
int p_;
const int state_idx_;
uint64_t state_[16];
friend class Random_XorShift1024_Pool<DeviceType>;
public:
typedef Random_XorShift1024_Pool<DeviceType> pool_type;
typedef DeviceType device_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift1024 (const typename pool_type::state_data_type& state, int p, int state_idx = 0):
p_(p),state_idx_(state_idx){
for(int i=0 ; i<16; i++)
state_[i] = state(state_idx,i);
}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
uint64_t state_0 = state_[ p_ ];
uint64_t state_1 = state_[ p_ = ( p_ + 1 ) & 15 ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
uint64_t tmp = ( state_[ p_ ] = state_0 ^ state_1 ) * 1181783497276652981ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
uint64_t state_0 = state_[ p_ ];
uint64_t state_1 = state_[ p_ = ( p_ + 1 ) & 15 ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
return (( state_[ p_ ] = state_0 ^ state_1 ) * 1181783497276652981LL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normal distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<class DeviceType = Kokkos::DefaultExecutionSpace>
class Random_XorShift1024_Pool {
private:
typedef View<int*,DeviceType> int_view_type;
typedef View<uint64_t*[16],DeviceType> state_data_type;
int_view_type locks_;
state_data_type state_;
int_view_type p_;
int num_states_;
friend class Random_XorShift1024<DeviceType>;
public:
typedef Random_XorShift1024<DeviceType> generator_type;
typedef DeviceType device_type;
Random_XorShift1024_Pool() {
num_states_ = 0;
}
inline
Random_XorShift1024_Pool(uint64_t seed){
num_states_ = 0;
init(seed,DeviceType::max_hardware_threads());
}
Random_XorShift1024_Pool(const Random_XorShift1024_Pool& src):
locks_(src.locks_),
state_(src.state_),
p_(src.p_),
num_states_(src.num_states_)
{}
Random_XorShift1024_Pool operator = (const Random_XorShift1024_Pool& src) {
locks_ = src.locks_;
state_ = src.state_;
p_ = src.p_;
num_states_ = src.num_states_;
return *this;
}
inline
void init(uint64_t seed, int num_states) {
num_states_ = num_states;
locks_ = int_view_type("Kokkos::Random_XorShift1024::locks",num_states_);
state_ = state_data_type("Kokkos::Random_XorShift1024::state",num_states_);
p_ = int_view_type("Kokkos::Random_XorShift1024::p",num_states_);
typename state_data_type::HostMirror h_state = create_mirror_view(state_);
typename int_view_type::HostMirror h_lock = create_mirror_view(locks_);
typename int_view_type::HostMirror h_p = create_mirror_view(p_);
// Execute on the HostMirror's default execution space.
Random_XorShift64<typename state_data_type::HostMirror::execution_space> gen(seed,0);
for(int i = 0; i < 17; i++)
gen.rand();
for(int i = 0; i < num_states_; i++) {
for(int j = 0; j < 16 ; j++) {
int n1 = gen.rand();
int n2 = gen.rand();
int n3 = gen.rand();
int n4 = gen.rand();
h_state(i,j) = (((static_cast<uint64_t>(n1)) & 0xffff)<<00) |
(((static_cast<uint64_t>(n2)) & 0xffff)<<16) |
(((static_cast<uint64_t>(n3)) & 0xffff)<<32) |
(((static_cast<uint64_t>(n4)) & 0xffff)<<48);
}
h_p(i) = 0;
h_lock(i) = 0;
}
deep_copy(state_,h_state);
deep_copy(locks_,h_lock);
}
KOKKOS_INLINE_FUNCTION
Random_XorShift1024<DeviceType> get_state() const {
const int i = DeviceType::hardware_thread_id();
return Random_XorShift1024<DeviceType>(state_,p_(i),i);
};
KOKKOS_INLINE_FUNCTION
void free_state(const Random_XorShift1024<DeviceType>& state) const {
for(int i = 0; i<16; i++)
state_(state.state_idx_,i) = state.state_[i];
p_(state.state_idx_) = state.p_;
}
};
#if defined(KOKKOS_HAVE_CUDA) && defined(__CUDACC__)
template<>
class Random_XorShift1024<Kokkos::Cuda> {
private:
int p_;
const int state_idx_;
uint64_t* state_;
const int stride_;
friend class Random_XorShift1024_Pool<Kokkos::Cuda>;
public:
typedef Kokkos::Cuda device_type;
typedef Random_XorShift1024_Pool<device_type> pool_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift1024 (const typename pool_type::state_data_type& state, int p, int state_idx = 0):
p_(p),state_idx_(state_idx),state_(&state(state_idx,0)),stride_(state.stride_1()){
}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
uint64_t state_0 = state_[ p_ * stride_ ];
uint64_t state_1 = state_[ (p_ = ( p_ + 1 ) & 15) * stride_ ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
uint64_t tmp = ( state_[ p_ * stride_ ] = state_0 ^ state_1 ) * 1181783497276652981ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
uint64_t state_0 = state_[ p_ * stride_ ];
uint64_t state_1 = state_[ (p_ = ( p_ + 1 ) & 15) * stride_ ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
return (( state_[ p_ * stride_ ] = state_0 ^ state_1 ) * 1181783497276652981LL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normal distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<>
inline
Random_XorShift64_Pool<Kokkos::Cuda>::Random_XorShift64_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,4*32768);
}
template<>
KOKKOS_INLINE_FUNCTION
Random_XorShift64<Kokkos::Cuda> Random_XorShift64_Pool<Kokkos::Cuda>::get_state() const {
#ifdef __CUDA_ARCH__
const int i_offset = (threadIdx.x*blockDim.y + threadIdx.y)*blockDim.z+threadIdx.z;
int i = (((blockIdx.x*gridDim.y+blockIdx.y)*gridDim.z + blockIdx.z) *
blockDim.x*blockDim.y*blockDim.z + i_offset)%num_states_;
while(Kokkos::atomic_compare_exchange(&locks_(i),0,1)) {
i+=blockDim.x*blockDim.y*blockDim.z;
if(i>=num_states_) {i = i_offset;}
}
return Random_XorShift64<Kokkos::Cuda>(state_(i),i);
#else
return Random_XorShift64<Kokkos::Cuda>(state_(0),0);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void Random_XorShift64_Pool<Kokkos::Cuda>::free_state(const Random_XorShift64<Kokkos::Cuda> &state) const {
#ifdef __CUDA_ARCH__
state_(state.state_idx_) = state.state_;
locks_(state.state_idx_) = 0;
return;
#endif
}
template<>
inline
Random_XorShift1024_Pool<Kokkos::Cuda>::Random_XorShift1024_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,4*32768);
}
template<>
KOKKOS_INLINE_FUNCTION
Random_XorShift1024<Kokkos::Cuda> Random_XorShift1024_Pool<Kokkos::Cuda>::get_state() const {
#ifdef __CUDA_ARCH__
const int i_offset = (threadIdx.x*blockDim.y + threadIdx.y)*blockDim.z+threadIdx.z;
int i = (((blockIdx.x*gridDim.y+blockIdx.y)*gridDim.z + blockIdx.z) *
blockDim.x*blockDim.y*blockDim.z + i_offset)%num_states_;
while(Kokkos::atomic_compare_exchange(&locks_(i),0,1)) {
i+=blockDim.x*blockDim.y*blockDim.z;
if(i>=num_states_) {i = i_offset;}
}
return Random_XorShift1024<Kokkos::Cuda>(state_, p_(i), i);
#else
return Random_XorShift1024<Kokkos::Cuda>(state_, p_(0), 0);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void Random_XorShift1024_Pool<Kokkos::Cuda>::free_state(const Random_XorShift1024<Kokkos::Cuda> &state) const {
#ifdef __CUDA_ARCH__
for(int i=0; i<16; i++)
state_(state.state_idx_,i) = state.state_[i];
locks_(state.state_idx_) = 0;
return;
#endif
}
#endif
namespace Impl {
template<class ViewType, class RandomPool, int loops, int rank, class IndexType>
struct fill_random_functor_range;
template<class ViewType, class RandomPool, int loops, int rank, class IndexType>
struct fill_random_functor_begin_end;
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,1,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const IndexType& i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0()))
a(idx) = Rand::draw(gen,range);
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,2,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
a(idx,k) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,3,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
a(idx,k,l) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,4, IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
a(idx,k,l,m) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,5,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
a(idx,k,l,m,n) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,6,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
a(idx,k,l,m,n,o) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,7,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
a(idx,k,l,m,n,o,p) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,8,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
for(IndexType q=0;q<static_cast<IndexType>(a.dimension_7());q++)
a(idx,k,l,m,n,o,p,q) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,1,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0()))
a(idx) = Rand::draw(gen,begin,end);
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,2,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
a(idx,k) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,3,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
a(idx,k,l) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,4,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
a(idx,k,l,m) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,5,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())){
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_1());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_2());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_3());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_4());o++)
a(idx,l,m,n,o) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,6,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
a(idx,k,l,m,n,o) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,7,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
a(idx,k,l,m,n,o,p) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,8,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
for(IndexType q=0;q<static_cast<IndexType>(a.dimension_7());q++)
a(idx,k,l,m,n,o,p,q) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
}
template<class ViewType, class RandomPool, class IndexType = int64_t>
void fill_random(ViewType a, RandomPool g, typename ViewType::const_value_type range) {
int64_t LDA = a.dimension_0();
if(LDA>0)
parallel_for((LDA+127)/128,Impl::fill_random_functor_range<ViewType,RandomPool,128,ViewType::Rank,IndexType>(a,g,range));
}
template<class ViewType, class RandomPool, class IndexType = int64_t>
void fill_random(ViewType a, RandomPool g, typename ViewType::const_value_type begin,typename ViewType::const_value_type end ) {
int64_t LDA = a.dimension_0();
if(LDA>0)
parallel_for((LDA+127)/128,Impl::fill_random_functor_begin_end<ViewType,RandomPool,128,ViewType::Rank,IndexType>(a,g,begin,end));
}
}
#endif
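// Usage sketch for the fill_random() overloads above (illustrative; extents and seed
// are placeholders). Each work item of the underlying parallel_for fills up to 128
// leading-index entries, so any of the pools declared earlier in this header can be
// passed directly:
//
//   Kokkos::View<double**> a("a",1000,3);
//   Kokkos::Random_XorShift64_Pool<> pool(5374857);
//   Kokkos::fill_random(a, pool, 100.0);       // uniform in [0,100)
//   Kokkos::fill_random(a, pool, -1.0, 1.0);   // uniform in [begin,end) = [-1,1)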
diff --git a/lib/kokkos/algorithms/src/Kokkos_Sort.hpp b/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
index 6123ce978..5b8c65fee 100644
--- a/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
+++ b/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
@@ -1,496 +1,407 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_SORT_HPP_
#define KOKKOS_SORT_HPP_
#include <Kokkos_Core.hpp>
#include <algorithm>
namespace Kokkos {
- namespace SortImpl {
+ namespace Impl {
template<class ValuesViewType, int Rank=ValuesViewType::Rank>
struct CopyOp;
template<class ValuesViewType>
struct CopyOp<ValuesViewType,1> {
template<class DstType, class SrcType>
KOKKOS_INLINE_FUNCTION
static void copy(DstType& dst, size_t i_dst,
SrcType& src, size_t i_src ) {
dst(i_dst) = src(i_src);
}
};
template<class ValuesViewType>
struct CopyOp<ValuesViewType,2> {
template<class DstType, class SrcType>
KOKKOS_INLINE_FUNCTION
static void copy(DstType& dst, size_t i_dst,
SrcType& src, size_t i_src ) {
for(int j = 0;j< (int) dst.dimension_1(); j++)
dst(i_dst,j) = src(i_src,j);
}
};
template<class ValuesViewType>
struct CopyOp<ValuesViewType,3> {
template<class DstType, class SrcType>
KOKKOS_INLINE_FUNCTION
static void copy(DstType& dst, size_t i_dst,
SrcType& src, size_t i_src ) {
for(int j = 0; j<dst.dimension_1(); j++)
for(int k = 0; k<dst.dimension_2(); k++)
dst(i_dst,j,k) = src(i_src,j,k);
}
};
}
template<class KeyViewType, class BinSortOp, class ExecutionSpace = typename KeyViewType::execution_space,
class SizeType = typename KeyViewType::memory_space::size_type>
class BinSort {
public:
template<class ValuesViewType, class PermuteViewType, class CopyOp>
struct bin_sort_sort_functor {
typedef ExecutionSpace execution_space;
typedef typename ValuesViewType::non_const_type values_view_type;
typedef typename ValuesViewType::const_type const_values_view_type;
Kokkos::View<typename values_view_type::const_data_type,typename values_view_type::array_layout,
typename values_view_type::memory_space,Kokkos::MemoryTraits<Kokkos::RandomAccess> > values;
values_view_type sorted_values;
typename PermuteViewType::const_type sort_order;
bin_sort_sort_functor(const_values_view_type values_, values_view_type sorted_values_, PermuteViewType sort_order_):
values(values_),sorted_values(sorted_values_),sort_order(sort_order_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
//printf("Sort: %i %i\n",i,sort_order(i));
CopyOp::copy(sorted_values,i,values,sort_order(i));
}
};
typedef ExecutionSpace execution_space;
typedef BinSortOp bin_op_type;
struct bin_count_tag {};
struct bin_offset_tag {};
struct bin_binning_tag {};
struct bin_sort_bins_tag {};
public:
typedef SizeType size_type;
typedef size_type value_type;
typedef Kokkos::View<size_type*, execution_space> offset_type;
typedef Kokkos::View<const int*, execution_space> bin_count_type;
typedef Kokkos::View<typename KeyViewType::const_data_type,
typename KeyViewType::array_layout,
typename KeyViewType::memory_space> const_key_view_type;
typedef Kokkos::View<typename KeyViewType::const_data_type,
typename KeyViewType::array_layout,
typename KeyViewType::memory_space,
Kokkos::MemoryTraits<Kokkos::RandomAccess> > const_rnd_key_view_type;
typedef typename KeyViewType::non_const_value_type non_const_key_scalar;
typedef typename KeyViewType::const_value_type const_key_scalar;
private:
const_key_view_type keys;
const_rnd_key_view_type keys_rnd;
public:
BinSortOp bin_op;
offset_type bin_offsets;
Kokkos::View<int*, ExecutionSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > bin_count_atomic;
bin_count_type bin_count_const;
offset_type sort_order;
bool sort_within_bins;
public:
// Constructor: takes the keys, the binning_operator and optionally whether to sort within bins (default false)
BinSort(const_key_view_type keys_, BinSortOp bin_op_,
bool sort_within_bins_ = false)
:keys(keys_),keys_rnd(keys_), bin_op(bin_op_) {
bin_count_atomic = Kokkos::View<int*, ExecutionSpace >("Kokkos::SortImpl::BinSortFunctor::bin_count",bin_op.max_bins());
bin_count_const = bin_count_atomic;
bin_offsets = offset_type("Kokkos::SortImpl::BinSortFunctor::bin_offsets",bin_op.max_bins());
sort_order = offset_type("PermutationVector",keys.dimension_0());
sort_within_bins = sort_within_bins_;
}
// Create the permutation vector, the bin_offset array and the bin_count array. Can be called again if keys changed
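// (Four phases: count keys per bin, exclusive-scan the counts into bin_offsets,
//  scatter indices into sort_order bin by bin, and optionally sort within each bin.)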
void create_permute_vector() {
Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_count_tag> (0,keys.dimension_0()),*this);
Kokkos::parallel_scan(Kokkos::RangePolicy<ExecutionSpace,bin_offset_tag> (0,bin_op.max_bins()) ,*this);
Kokkos::deep_copy(bin_count_atomic,0);
Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_binning_tag> (0,keys.dimension_0()),*this);
if(sort_within_bins)
Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_sort_bins_tag>(0,bin_op.max_bins()) ,*this);
}
// Sort a view with respect to the first dimension using the permutation array
template<class ValuesViewType>
void sort(ValuesViewType values) {
ValuesViewType sorted_values = ValuesViewType("Copy",
values.dimension_0(),
values.dimension_1(),
values.dimension_2(),
values.dimension_3(),
values.dimension_4(),
values.dimension_5(),
values.dimension_6(),
values.dimension_7());
parallel_for(values.dimension_0(),
bin_sort_sort_functor<ValuesViewType, offset_type,
- SortImpl::CopyOp<ValuesViewType> >(values,sorted_values,sort_order));
+ Impl::CopyOp<ValuesViewType> >(values,sorted_values,sort_order));
deep_copy(values,sorted_values);
}
// Get the permutation vector
KOKKOS_INLINE_FUNCTION
offset_type get_permute_vector() const { return sort_order;}
// Get the start offsets for each bin
KOKKOS_INLINE_FUNCTION
offset_type get_bin_offsets() const { return bin_offsets;}
// Get the count for each bin
KOKKOS_INLINE_FUNCTION
bin_count_type get_bin_count() const {return bin_count_const;}
public:
KOKKOS_INLINE_FUNCTION
void operator() (const bin_count_tag& tag, const int& i) const {
bin_count_atomic(bin_op.bin(keys,i))++;
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_offset_tag& tag, const int& i, value_type& offset, const bool& final) const {
if(final) {
bin_offsets(i) = offset;
}
offset+=bin_count_const(i);
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_binning_tag& tag, const int& i) const {
const int bin = bin_op.bin(keys,i);
const int count = bin_count_atomic(bin)++;
sort_order(bin_offsets(bin) + count) = i;
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_sort_bins_tag& tag, const int& i) const {
bool sorted = false;
int upper_bound = bin_offsets(i)+bin_count_const(i);
while(!sorted) {
sorted = true;
int old_idx = sort_order(bin_offsets(i));
int new_idx;
for(int k=bin_offsets(i)+1; k<upper_bound; k++) {
new_idx = sort_order(k);
if(!bin_op(keys_rnd,old_idx,new_idx)) {
sort_order(k-1) = new_idx;
sort_order(k) = old_idx;
sorted = false;
} else {
old_idx = new_idx;
}
}
upper_bound--;
}
}
};
-namespace SortImpl {
-
template<class KeyViewType>
-struct DefaultBinOp1D {
+struct BinOp1D {
const int max_bins_;
const double mul_;
typename KeyViewType::const_value_type range_;
typename KeyViewType::const_value_type min_;
//Construct BinOp with number of bins, minimum value and maximum value
- DefaultBinOp1D(int max_bins__, typename KeyViewType::const_value_type min,
+ BinOp1D(int max_bins__, typename KeyViewType::const_value_type min,
typename KeyViewType::const_value_type max )
:max_bins_(max_bins__+1),mul_(1.0*max_bins__/(max-min)),range_(max-min),min_(min) {}
//Determine bin index from key value
template<class ViewType>
KOKKOS_INLINE_FUNCTION
int bin(ViewType& keys, const int& i) const {
return int(mul_*(keys(i)-min_));
}
//Return maximum bin index + 1
KOKKOS_INLINE_FUNCTION
int max_bins() const {
return max_bins_;
}
//Compare two keys within a bin; if true, new_val will be put before old_val
template<class ViewType, typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
bool operator()(ViewType& keys, iType1& i1, iType2& i2) const {
return keys(i1)<keys(i2);
}
};
template<class KeyViewType>
-struct DefaultBinOp3D {
+struct BinOp3D {
int max_bins_[3];
double mul_[3];
typename KeyViewType::non_const_value_type range_[3];
typename KeyViewType::non_const_value_type min_[3];
- DefaultBinOp3D(int max_bins__[], typename KeyViewType::const_value_type min[],
+ BinOp3D(int max_bins__[], typename KeyViewType::const_value_type min[],
typename KeyViewType::const_value_type max[] )
{
max_bins_[0] = max_bins__[0]+1;
max_bins_[1] = max_bins__[1]+1;
max_bins_[2] = max_bins__[2]+1;
mul_[0] = 1.0*max_bins__[0]/(max[0]-min[0]);
mul_[1] = 1.0*max_bins__[1]/(max[1]-min[1]);
mul_[2] = 1.0*max_bins__[2]/(max[2]-min[2]);
range_[0] = max[0]-min[0];
range_[1] = max[1]-min[1];
range_[2] = max[2]-min[2];
min_[0] = min[0];
min_[1] = min[1];
min_[2] = min[2];
}
template<class ViewType>
KOKKOS_INLINE_FUNCTION
int bin(ViewType& keys, const int& i) const {
return int( (((int(mul_[0]*(keys(i,0)-min_[0]))*max_bins_[1]) +
int(mul_[1]*(keys(i,1)-min_[1])))*max_bins_[2]) +
int(mul_[2]*(keys(i,2)-min_[2])));
}
KOKKOS_INLINE_FUNCTION
int max_bins() const {
return max_bins_[0]*max_bins_[1]*max_bins_[2];
}
template<class ViewType, typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
bool operator()(ViewType& keys, iType1& i1 , iType2& i2) const {
if (keys(i1,0)>keys(i2,0)) return true;
else if (keys(i1,0)==keys(i2,0)) {
if (keys(i1,1)>keys(i2,1)) return true;
else if (keys(i1,1)==keys(i2,1)) {
if (keys(i1,2)>keys(i2,2)) return true;
}
}
return false;
}
};
-template<typename Scalar>
-struct min_max {
- Scalar min;
- Scalar max;
- bool init;
-
- KOKKOS_INLINE_FUNCTION
- min_max() {
- min = 0;
- max = 0;
- init = 0;
- }
-
- KOKKOS_INLINE_FUNCTION
- min_max (const min_max& val) {
- min = val.min;
- max = val.max;
- init = val.init;
- }
-
- KOKKOS_INLINE_FUNCTION
- min_max operator = (const min_max& val) {
- min = val.min;
- max = val.max;
- init = val.init;
- return *this;
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator+= (const Scalar& val) {
- if(init) {
- min = min<val?min:val;
- max = max>val?max:val;
- } else {
- min = val;
- max = val;
- init = 1;
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator+= (const min_max& val) {
- if(init && val.init) {
- min = min<val.min?min:val.min;
- max = max>val.max?max:val.max;
- } else {
- if(val.init) {
- min = val.min;
- max = val.max;
- init = 1;
- }
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator+= (volatile const Scalar& val) volatile {
- if(init) {
- min = min<val?min:val;
- max = max>val?max:val;
- } else {
- min = val;
- max = val;
- init = 1;
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator+= (volatile const min_max& val) volatile {
- if(init && val.init) {
- min = min<val.min?min:val.min;
- max = max>val.max?max:val.max;
- } else {
- if(val.init) {
- min = val.min;
- max = val.max;
- init = 1;
- }
- }
- }
-};
-
-
-template<class ViewType>
-struct min_max_functor {
- typedef typename ViewType::execution_space execution_space;
- ViewType view;
- typedef min_max<typename ViewType::non_const_value_type> value_type;
- min_max_functor (const ViewType view_):view(view_) {
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator()(const size_t& i, value_type& val) const {
- val += view(i);
- }
-};
+namespace Impl {
template<class ViewType>
bool try_std_sort(ViewType view) {
bool possible = true;
-#if ! KOKKOS_USING_EXP_VIEW
- size_t stride[8];
- view.stride(stride);
-#else
size_t stride[8] = { view.stride_0()
, view.stride_1()
, view.stride_2()
, view.stride_3()
, view.stride_4()
, view.stride_5()
, view.stride_6()
, view.stride_7()
};
-#endif
- possible = possible && Impl::is_same<typename ViewType::memory_space, HostSpace>::value;
+ possible = possible && std::is_same<typename ViewType::memory_space, HostSpace>::value;
possible = possible && (ViewType::Rank == 1);
possible = possible && (stride[0] == 1);
if(possible) {
std::sort(view.ptr_on_device(),view.ptr_on_device()+view.dimension_0());
}
return possible;
}
+template<class ViewType>
+struct min_max_functor {
+ typedef Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> minmax_scalar;
+
+ ViewType view;
+ min_max_functor(const ViewType& view_):view(view_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const size_t& i, minmax_scalar& minmax) const {
+ if(view(i) < minmax.min_val) minmax.min_val = view(i);
+ if(view(i) > minmax.max_val) minmax.max_val = view(i);
+ }
+};
+
}
template<class ViewType>
void sort(ViewType view, bool always_use_kokkos_sort = false) {
if(!always_use_kokkos_sort) {
- if(SortImpl::try_std_sort(view)) return;
+ if(Impl::try_std_sort(view)) return;
}
-
- typedef SortImpl::DefaultBinOp1D<ViewType> CompType;
- SortImpl::min_max<typename ViewType::non_const_value_type> val;
- parallel_reduce(view.dimension_0(),SortImpl::min_max_functor<ViewType>(view),val);
- BinSort<ViewType, CompType> bin_sort(view,CompType(view.dimension_0()/2,val.min,val.max),true);
+ typedef BinOp1D<ViewType> CompType;
+
+ Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> result;
+ Kokkos::Experimental::MinMax<typename ViewType::non_const_value_type> reducer(result);
+ parallel_reduce(Kokkos::RangePolicy<typename ViewType::execution_space>(0,view.dimension_0()),
+ Impl::min_max_functor<ViewType>(view),reducer);
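+  // All keys equal: nothing to sort, and BinOp1D would divide by zero computing mul_.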
+ if(result.min_val == result.max_val) return;
+ BinSort<ViewType, CompType> bin_sort(view,CompType(view.dimension_0()/2,result.min_val,result.max_val),true);
bin_sort.create_permute_vector();
bin_sort.sort(view);
}
-/*template<class ViewType, class Comparator>
-void sort(ViewType view, Comparator comp, bool always_use_kokkos_sort = false) {
-
-}*/
-
}
#endif
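// Usage sketch for the sorting interface above (illustrative; names and extents are
// placeholders). Kokkos::sort() falls back to std::sort for contiguous rank-1 host
// views; BinSort can be driven directly when the permutation is needed to reorder a
// second view with the same ordering, assuming the key range [keys_min,keys_max] is
// already known:
//
//   Kokkos::View<float*> keys("keys",n), values("values",n);
//   Kokkos::sort(keys);
//
//   typedef Kokkos::BinOp1D<Kokkos::View<float*> > CompType;
//   Kokkos::BinSort<Kokkos::View<float*>, CompType>
//     bin_sort(keys, CompType(n/2, keys_min, keys_max), true);
//   bin_sort.create_permute_vector();
//   bin_sort.sort(keys);
//   bin_sort.sort(values);   // reuse the same permutation for the payload view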
diff --git a/lib/kokkos/algorithms/unit_tests/CMakeLists.txt b/lib/kokkos/algorithms/unit_tests/CMakeLists.txt
index 654104b44..fde6b967e 100644
--- a/lib/kokkos/algorithms/unit_tests/CMakeLists.txt
+++ b/lib/kokkos/algorithms/unit_tests/CMakeLists.txt
@@ -1,38 +1,38 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
-INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
+INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
SET(SOURCES
UnitTestMain.cpp
TestCuda.cpp
)
SET(LIBRARIES kokkoscore)
IF(Kokkos_ENABLE_OpenMP)
LIST( APPEND SOURCES
TestOpenMP.cpp
)
ENDIF()
IF(Kokkos_ENABLE_Serial)
LIST( APPEND SOURCES
TestSerial.cpp
)
ENDIF()
IF(Kokkos_ENABLE_Pthread)
LIST( APPEND SOURCES
TestThreads.cpp
)
ENDIF()
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest
SOURCES ${SOURCES}
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
diff --git a/lib/kokkos/algorithms/unit_tests/Makefile b/lib/kokkos/algorithms/unit_tests/Makefile
index 5d79364c5..3027c6a94 100644
--- a/lib/kokkos/algorithms/unit_tests/Makefile
+++ b/lib/kokkos/algorithms/unit_tests/Makefile
@@ -1,92 +1,89 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../TPL/gtest
vpath %.cpp ${KOKKOS_PATH}/algorithms/unit_tests
default: build_all
echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?= -lpthread
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
TARGETS += KokkosAlgorithms_UnitTest_Cuda
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
OBJ_THREADS = TestThreads.o UnitTestMain.o gtest-all.o
TARGETS += KokkosAlgorithms_UnitTest_Threads
TEST_TARGETS += test-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP = TestOpenMP.o UnitTestMain.o gtest-all.o
TARGETS += KokkosAlgorithms_UnitTest_OpenMP
TEST_TARGETS += test-openmp
endif
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
OBJ_SERIAL = TestSerial.o UnitTestMain.o gtest-all.o
TARGETS += KokkosAlgorithms_UnitTest_Serial
TEST_TARGETS += test-serial
endif
KokkosAlgorithms_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Cuda
KokkosAlgorithms_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Threads
KokkosAlgorithms_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_OpenMP
KokkosAlgorithms_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Serial
test-cuda: KokkosAlgorithms_UnitTest_Cuda
./KokkosAlgorithms_UnitTest_Cuda
test-threads: KokkosAlgorithms_UnitTest_Threads
./KokkosAlgorithms_UnitTest_Threads
test-openmp: KokkosAlgorithms_UnitTest_OpenMP
./KokkosAlgorithms_UnitTest_OpenMP
test-serial: KokkosAlgorithms_UnitTest_Serial
./KokkosAlgorithms_UnitTest_Serial
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
diff --git a/lib/kokkos/algorithms/unit_tests/TestSort.hpp b/lib/kokkos/algorithms/unit_tests/TestSort.hpp
index ccbcbdd00..03e4fb691 100644
--- a/lib/kokkos/algorithms/unit_tests/TestSort.hpp
+++ b/lib/kokkos/algorithms/unit_tests/TestSort.hpp
@@ -1,206 +1,210 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
#ifndef TESTSORT_HPP_
#define TESTSORT_HPP_
#include <gtest/gtest.h>
#include<Kokkos_Core.hpp>
#include<Kokkos_Random.hpp>
#include<Kokkos_Sort.hpp>
namespace Test {
namespace Impl{
template<class ExecutionSpace, class Scalar>
struct is_sorted_struct {
typedef unsigned int value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*,ExecutionSpace> keys;
is_sorted_struct(Kokkos::View<Scalar*,ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, unsigned int& count) const {
if(keys(i)>keys(i+1)) count++;
}
};
template<class ExecutionSpace, class Scalar>
struct sum {
typedef double value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*,ExecutionSpace> keys;
sum(Kokkos::View<Scalar*,ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, double& count) const {
count+=keys(i);
}
};
template<class ExecutionSpace, class Scalar>
struct bin3d_is_sorted_struct {
typedef unsigned int value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*[3],ExecutionSpace> keys;
int max_bins;
Scalar min;
Scalar max;
bin3d_is_sorted_struct(Kokkos::View<Scalar*[3],ExecutionSpace> keys_,int max_bins_,Scalar min_,Scalar max_):
keys(keys_),max_bins(max_bins_),min(min_),max(max_) {
}
KOKKOS_INLINE_FUNCTION
void operator() (int i, unsigned int& count) const {
int ix1 = int ((keys(i,0)-min)/max * max_bins);
int iy1 = int ((keys(i,1)-min)/max * max_bins);
int iz1 = int ((keys(i,2)-min)/max * max_bins);
int ix2 = int ((keys(i+1,0)-min)/max * max_bins);
int iy2 = int ((keys(i+1,1)-min)/max * max_bins);
int iz2 = int ((keys(i+1,2)-min)/max * max_bins);
if (ix1>ix2) count++;
else if(ix1==ix2) {
if (iy1>iy2) count++;
else if ((iy1==iy2) && (iz1>iz2)) count++;
}
}
};
template<class ExecutionSpace, class Scalar>
struct sum3D {
typedef double value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*[3],ExecutionSpace> keys;
sum3D(Kokkos::View<Scalar*[3],ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, double& count) const {
count+=keys(i,0);
count+=keys(i,1);
count+=keys(i,2);
}
};
template<class ExecutionSpace, typename KeyType>
void test_1D_sort(unsigned int n,bool force_kokkos) {
typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
KeyViewType keys("Keys",n);
+ // Test sorting array with all numbers equal
+ Kokkos::deep_copy(keys,KeyType(1));
+ Kokkos::sort(keys,force_kokkos);
+
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
Kokkos::fill_random(keys,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
double sum_before = 0.0;
double sum_after = 0.0;
unsigned int sort_fails = 0;
Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys),sum_before);
Kokkos::sort(keys,force_kokkos);
Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys),sum_after);
Kokkos::parallel_reduce(n-1,is_sorted_struct<ExecutionSpace, KeyType>(keys),sort_fails);
double ratio = sum_before/sum_after;
double epsilon = 1e-10;
unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
ASSERT_EQ(sort_fails,0);
ASSERT_EQ(equal_sum,1);
}
template<class ExecutionSpace, typename KeyType>
void test_3D_sort(unsigned int n) {
typedef Kokkos::View<KeyType*[3],ExecutionSpace > KeyViewType;
KeyViewType keys("Keys",n*n*n);
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
Kokkos::fill_random(keys,g,100.0);
double sum_before = 0.0;
double sum_after = 0.0;
unsigned int sort_fails = 0;
Kokkos::parallel_reduce(keys.dimension_0(),sum3D<ExecutionSpace, KeyType>(keys),sum_before);
int bin_1d = 1;
while( bin_1d*bin_1d*bin_1d*4< (int) keys.dimension_0() ) bin_1d*=2;
int bin_max[3] = {bin_1d,bin_1d,bin_1d};
typename KeyViewType::value_type min[3] = {0,0,0};
typename KeyViewType::value_type max[3] = {100,100,100};
- typedef Kokkos::SortImpl::DefaultBinOp3D< KeyViewType > BinOp;
+ typedef Kokkos::BinOp3D< KeyViewType > BinOp;
BinOp bin_op(bin_max,min,max);
Kokkos::BinSort< KeyViewType , BinOp >
Sorter(keys,bin_op,false);
Sorter.create_permute_vector();
Sorter.template sort< KeyViewType >(keys);
Kokkos::parallel_reduce(keys.dimension_0(),sum3D<ExecutionSpace, KeyType>(keys),sum_after);
Kokkos::parallel_reduce(keys.dimension_0()-1,bin3d_is_sorted_struct<ExecutionSpace, KeyType>(keys,bin_1d,min[0],max[0]),sort_fails);
double ratio = sum_before/sum_after;
double epsilon = 1e-10;
unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
printf("3D Sort Sum: %f %f Fails: %u\n",sum_before,sum_after,sort_fails);
ASSERT_EQ(sort_fails,0);
ASSERT_EQ(equal_sum,1);
}
template<class ExecutionSpace, typename KeyType>
void test_sort(unsigned int N)
{
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, true);
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, false);
test_3D_sort<ExecutionSpace,KeyType>(N);
}
}
}
#endif /* TESTSORT_HPP_ */
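// Illustrative driver (hypothetical, not part of the patch): the per-backend test
// sources are expected to instantiate the templates above from a gtest case, e.g.
//
//   TEST(algorithms, sort_openmp) {
//     Test::Impl::test_sort<Kokkos::OpenMP, double>(10);   // runs the 1D and 3D sorts
//   }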
diff --git a/lib/kokkos/example/tutorial/01_hello_world/Makefile b/lib/kokkos/benchmarks/bytes_and_flops/Makefile
similarity index 69%
copy from lib/kokkos/example/tutorial/01_hello_world/Makefile
copy to lib/kokkos/benchmarks/bytes_and_flops/Makefile
index 78a9fed0c..6a1917a52 100644
--- a/lib/kokkos/example/tutorial/01_hello_world/Makefile
+++ b/lib/kokkos/benchmarks/bytes_and_flops/Makefile
@@ -1,43 +1,43 @@
-KOKKOS_PATH = ../../..
+KOKKOS_PATH = ${HOME}/kokkos
SRC = $(wildcard *.cpp)
+KOKKOS_DEVICES=Cuda
+KOKKOS_CUDA_OPTIONS=enable_lambda
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
-CXXFLAGS = -O3
-LINK = ${CXX}
-LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
+EXE = bytes_and_flops.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
-CXXFLAGS = -O3
-LINK = ${CXX}
-LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = bytes_and_flops.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
+CXXFLAGS = -O3 -g
+
DEPFLAGS = -M
+LINK = ${CXX}
+LINKFLAGS =
OBJ = $(SRC:.cpp=.o)
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
-%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
+%.o:%.cpp $(KOKKOS_CPP_DEPENDS) bench.hpp bench_unroll_stride.hpp bench_stride.hpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/core/src/Kokkos_Concepts.hpp b/lib/kokkos/benchmarks/bytes_and_flops/bench.hpp
similarity index 55%
copy from lib/kokkos/core/src/Kokkos_Concepts.hpp
copy to lib/kokkos/benchmarks/bytes_and_flops/bench.hpp
index 82a342eec..e3fe42a65 100644
--- a/lib/kokkos/core/src/Kokkos_Concepts.hpp
+++ b/lib/kokkos/benchmarks/bytes_and_flops/bench.hpp
@@ -1,78 +1,99 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_CORE_CONCEPTS_HPP
-#define KOKKOS_CORE_CONCEPTS_HPP
+#include<Kokkos_Core.hpp>
+#include<impl/Kokkos_Timer.hpp>
-#include <type_traits>
-
-namespace Kokkos {
-//Schedules for Execution Policies
-struct Static {};
-struct Dynamic {};
-
-//Schedule Wrapper Type
-template<class T>
-struct Schedule
-{
- static_assert( std::is_same<T,Static>::value
- || std::is_same<T,Dynamic>::value
- , "Kokkos: Invalid Schedule<> type."
- );
- using schedule_type = Schedule<T>;
- using type = T;
+template<class Scalar, int Unroll,int Stride>
+struct Run {
+static void run(int N, int K, int R, int F, int T, int S);
};
-//Specify Iteration Index Type
-template<typename T>
-struct IndexType
-{
- static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
- using index_type = IndexType<T>;
- using type = T;
+template<class Scalar, int Stride>
+struct RunStride {
+static void run_1(int N, int K, int R, int F, int T, int S);
+static void run_2(int N, int K, int R, int F, int T, int S);
+static void run_3(int N, int K, int R, int F, int T, int S);
+static void run_4(int N, int K, int R, int F, int T, int S);
+static void run_5(int N, int K, int R, int F, int T, int S);
+static void run_6(int N, int K, int R, int F, int T, int S);
+static void run_7(int N, int K, int R, int F, int T, int S);
+static void run_8(int N, int K, int R, int F, int T, int S);
+static void run(int N, int K, int R, int U, int F, int T, int S);
};
-} // namespace Kokkos
+#define STRIDE 1
+#include<bench_stride.hpp>
+#undef STRIDE
+#define STRIDE 2
+#include<bench_stride.hpp>
+#undef STRIDE
+#define STRIDE 4
+#include<bench_stride.hpp>
+#undef STRIDE
+#define STRIDE 8
+#include<bench_stride.hpp>
+#undef STRIDE
+#define STRIDE 16
+#include<bench_stride.hpp>
+#undef STRIDE
+#define STRIDE 32
+#include<bench_stride.hpp>
+#undef STRIDE
-#endif // KOKKOS_CORE_CONCEPTS_HPP
+template<class Scalar>
+void run_stride_unroll(int N, int K, int R, int D, int U, int F, int T, int S) {
+ if(D == 1)
+ RunStride<Scalar,1>::run(N,K,R,U,F,T,S);
+ if(D == 2)
+ RunStride<Scalar,2>::run(N,K,R,U,F,T,S);
+ if(D == 4)
+ RunStride<Scalar,4>::run(N,K,R,U,F,T,S);
+ if(D == 8)
+ RunStride<Scalar,8>::run(N,K,R,U,F,T,S);
+ if(D == 16)
+ RunStride<Scalar,16>::run(N,K,R,U,F,T,S);
+ if(D == 32)
+ RunStride<Scalar,32>::run(N,K,R,U,F,T,S);
+}
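+// The repeated STRIDE/UNROLL includes above instantiate Run<Scalar,U,D> for every
+// supported combination at compile time; run_stride_unroll() then picks one at run
+// time. A hypothetical driver call (all argument values are placeholders):
+//
+//   run_stride_unroll<double>(/*N=*/8192, /*K=*/1024, /*R=*/10,
+//                             /*D=*/1, /*U=*/4, /*F=*/8, /*T=*/256, /*S=*/0);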
diff --git a/lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp b/lib/kokkos/benchmarks/bytes_and_flops/bench_stride.hpp
similarity index 52%
copy from lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
copy to lib/kokkos/benchmarks/bytes_and_flops/bench_stride.hpp
index b9d23bd81..b60ec8499 100644
--- a/lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
+++ b/lib/kokkos/benchmarks/bytes_and_flops/bench_stride.hpp
@@ -1,124 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifdef _WIN32
-#define WIN32_LEAN_AND_MEAN
-#include <windows.h>
-#else
-#include <unistd.h>
-#endif
-#include <cstdio>
-#include <cstdlib>
-#include <cstring>
-#include <cerrno>
-namespace Kokkos {
-namespace Impl {
+#define UNROLL 1
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 2
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 3
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 4
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 5
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 6
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 7
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
+#define UNROLL 8
+#include<bench_unroll_stride.hpp>
+#undef UNROLL
-//The following function (processors_per_node) is copied from here:
-// https://lists.gnu.org/archive/html/autoconf/2002-08/msg00126.html
-// Philip Willoughby
-
-int processors_per_node() {
- int nprocs = -1;
- int nprocs_max = -1;
-#ifdef _WIN32
-#ifndef _SC_NPROCESSORS_ONLN
-SYSTEM_INFO info;
-GetSystemInfo(&info);
-#define sysconf(a) info.dwNumberOfProcessors
-#define _SC_NPROCESSORS_ONLN
-#endif
-#endif
-#ifdef _SC_NPROCESSORS_ONLN
- nprocs = sysconf(_SC_NPROCESSORS_ONLN);
- if (nprocs < 1)
- {
- return -1;
- }
- nprocs_max = sysconf(_SC_NPROCESSORS_CONF);
- if (nprocs_max < 1)
- {
- return -1;
- }
- return nprocs;
-#else
- return -1;
-#endif
+template<class Scalar>
+struct RunStride<Scalar,STRIDE> {
+static void run_1(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,1,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_2(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,2,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_3(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,3,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_4(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,4,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_5(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,5,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_6(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,6,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_7(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,7,STRIDE>::run(N,K,R,F,T,S);
+}
+static void run_8(int N, int K, int R, int F, int T, int S) {
+ Run<Scalar,8,STRIDE>::run(N,K,R,F,T,S);
}
-int mpi_ranks_per_node() {
- char *str;
- int ppn = 1;
- if ((str = getenv("SLURM_TASKS_PER_NODE"))) {
- ppn = atoi(str);
- if(ppn<=0) ppn = 1;
+static void run(int N, int K, int R, int U, int F, int T, int S) {
+ if(U==1) {
+ run_1(N,K,R,F,T,S);
}
- if ((str = getenv("MV2_COMM_WORLD_LOCAL_SIZE"))) {
- ppn = atoi(str);
- if(ppn<=0) ppn = 1;
+ if(U==2) {
+ run_2(N,K,R,F,T,S);
}
- if ((str = getenv("OMPI_COMM_WORLD_LOCAL_SIZE"))) {
- ppn = atoi(str);
- if(ppn<=0) ppn = 1;
+ if(U==3) {
+ run_3(N,K,R,F,T,S);
}
- return ppn;
-}
-
-int mpi_local_rank_on_node() {
- char *str;
- int local_rank=0;
- if ((str = getenv("SLURM_LOCALID"))) {
- local_rank = atoi(str);
+ if(U==4) {
+ run_4(N,K,R,F,T,S);
}
- if ((str = getenv("MV2_COMM_WORLD_LOCAL_RANK"))) {
- local_rank = atoi(str);
+ if(U==5) {
+ run_5(N,K,R,F,T,S);
}
- if ((str = getenv("OMPI_COMM_WORLD_LOCAL_RANK"))) {
- local_rank = atoi(str);
+ if(U==6) {
+ run_6(N,K,R,F,T,S);
}
- return local_rank;
+ if(U==7) {
+ run_7(N,K,R,F,T,S);
+ }
+ if(U==8) {
+ run_8(N,K,R,F,T,S);
+ }
}
+};
-}
-}
diff --git a/lib/kokkos/benchmarks/bytes_and_flops/bench_unroll_stride.hpp b/lib/kokkos/benchmarks/bytes_and_flops/bench_unroll_stride.hpp
new file mode 100644
index 000000000..0992c5b54
--- /dev/null
+++ b/lib/kokkos/benchmarks/bytes_and_flops/bench_unroll_stride.hpp
@@ -0,0 +1,148 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+template<class Scalar>
+struct Run<Scalar,UNROLL,STRIDE> {
+static void run(int N, int K, int R, int F, int T, int S) {
+ Kokkos::View<Scalar**[STRIDE],Kokkos::LayoutRight> A("A",N,K);
+ Kokkos::View<Scalar**[STRIDE],Kokkos::LayoutRight> B("B",N,K);
+ Kokkos::View<Scalar**[STRIDE],Kokkos::LayoutRight> C("C",N,K);
+
+ Kokkos::deep_copy(A,Scalar(1.5));
+ Kokkos::deep_copy(B,Scalar(2.5));
+ Kokkos::deep_copy(C,Scalar(3.5));
+
+ Kokkos::Timer timer;
+ Kokkos::parallel_for("BenchmarkKernel",Kokkos::TeamPolicy<>(N,T).set_scratch_size(0,Kokkos::PerTeam(S)),
+ KOKKOS_LAMBDA ( const Kokkos::TeamPolicy<>::member_type& team) {
+ const int n = team.league_rank();
+ for(int r=0; r<R; r++) {
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,K), [&] (const int& i) {
+ Scalar a1 = A(n,i,0);
+ const Scalar b = B(n,i,0);
+#if(UNROLL>1)
+ Scalar a2 = a1*1.3;
+#endif
+#if(UNROLL>2)
+ Scalar a3 = a2*1.1;
+#endif
+#if(UNROLL>3)
+ Scalar a4 = a3*1.1;
+#endif
+#if(UNROLL>4)
+ Scalar a5 = a4*1.3;
+#endif
+#if(UNROLL>5)
+ Scalar a6 = a5*1.1;
+#endif
+#if(UNROLL>6)
+ Scalar a7 = a6*1.1;
+#endif
+#if(UNROLL>7)
+ Scalar a8 = a7*1.1;
+#endif
+
+
+ for(int f = 0; f<F; f++) {
+ a1 += b*a1;
+#if(UNROLL>1)
+ a2 += b*a2;
+#endif
+#if(UNROLL>2)
+ a3 += b*a3;
+#endif
+#if(UNROLL>3)
+ a4 += b*a4;
+#endif
+#if(UNROLL>4)
+ a5 += b*a5;
+#endif
+#if(UNROLL>5)
+ a6 += b*a6;
+#endif
+#if(UNROLL>6)
+ a7 += b*a7;
+#endif
+#if(UNROLL>7)
+ a8 += b*a8;
+#endif
+
+
+ }
+#if(UNROLL==1)
+ C(n,i,0) = a1;
+#endif
+#if(UNROLL==2)
+ C(n,i,0) = a1+a2;
+#endif
+#if(UNROLL==3)
+ C(n,i,0) = a1+a2+a3;
+#endif
+#if(UNROLL==4)
+ C(n,i,0) = a1+a2+a3+a4;
+#endif
+#if(UNROLL==5)
+ C(n,i,0) = a1+a2+a3+a4+a5;
+#endif
+#if(UNROLL==6)
+ C(n,i,0) = a1+a2+a3+a4+a5+a6;
+#endif
+#if(UNROLL==7)
+ C(n,i,0) = a1+a2+a3+a4+a5+a6+a7;
+#endif
+#if(UNROLL==8)
+ C(n,i,0) = a1+a2+a3+a4+a5+a6+a7+a8;
+#endif
+
+ });
+ }
+ });
+ Kokkos::fence();
+ double seconds = timer.seconds();
+
+ double bytes = 1.0*N*K*R*3*sizeof(Scalar);
+ double flops = 1.0*N*K*R*(F*2*UNROLL + 2*(UNROLL-1));
+ printf("NKRUFTS: %i %i %i %i %i %i %i Time: %lfs Bandwidth: %lfGiB/s GFlop/s: %lf\n",N,K,R,UNROLL,F,T,S,seconds,1.0*bytes/seconds/1024/1024/1024,1.e-9*flops/seconds);
+}
+};
+
diff --git a/lib/kokkos/core/unit_test/TestMemorySpaceTracking.hpp b/lib/kokkos/benchmarks/bytes_and_flops/main.cpp
similarity index 53%
copy from lib/kokkos/core/unit_test/TestMemorySpaceTracking.hpp
copy to lib/kokkos/benchmarks/bytes_and_flops/main.cpp
index 575f2f2c2..f54524721 100644
--- a/lib/kokkos/core/unit_test/TestMemorySpaceTracking.hpp
+++ b/lib/kokkos/benchmarks/bytes_and_flops/main.cpp
@@ -1,100 +1,96 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-#include <gtest/gtest.h>
-
-#include <iostream>
-#include <Kokkos_Core.hpp>
-
-/*--------------------------------------------------------------------------*/
-
-namespace {
-
-template<class Arg1>
-class TestMemorySpace {
-public:
-
- typedef typename Arg1::memory_space MemorySpace;
- TestMemorySpace() { run_test(); }
-
- void run_test()
- {
-
-#if ! KOKKOS_USING_EXP_VIEW
-
- Kokkos::View<int* ,Arg1> invalid;
- ASSERT_EQ(0u, invalid.tracker().ref_count() );
-
- {
- Kokkos::View<int* ,Arg1> a("A",10);
-
- ASSERT_EQ(1u, a.tracker().ref_count() );
-
- {
- Kokkos::View<int* ,Arg1> b = a;
- ASSERT_EQ(2u, b.tracker().ref_count() );
-
- Kokkos::View<int* ,Arg1> D("D",10);
- ASSERT_EQ(1u, D.tracker().ref_count() );
-
- {
- Kokkos::View<int* ,Arg1> E("E",10);
- ASSERT_EQ(1u, E.tracker().ref_count() );
- }
-
- ASSERT_EQ(2u, b.tracker().ref_count() );
- }
- ASSERT_EQ(1u, a.tracker().ref_count() );
- }
-
-#endif
-
+#include<Kokkos_Core.hpp>
+#include<impl/Kokkos_Timer.hpp>
+#include<bench.hpp>
+
+int main(int argc, char* argv[]) {
+ Kokkos::initialize();
+
+
+ if(argc<10) {
+ printf("Arguments: N K R D U F T S\n");
+ printf(" P: Precision (1==float, 2==double)\n");
+ printf(" N,K: dimensions of the 2D array to allocate\n");
+ printf(" R: how often to loop through the K dimension with each team\n");
+ printf(" D: distance between loaded elements (stride)\n");
+ printf(" U: how many independent flops to do per load\n");
+ printf(" F: how many times to repeat the U unrolled operations before reading next element\n");
+ printf(" T: team size\n");
+ printf(" S: shared memory per team (used to control occupancy on GPUs)\n");
+ printf("Example Input GPU:\n");
+ printf(" Bandwidth Bound : 2 100000 1024 1 1 1 1 256 6000\n");
+ printf(" Cache Bound : 2 100000 1024 64 1 1 1 512 20000\n");
+ printf(" Compute Bound : 2 100000 1024 1 1 8 64 256 6000\n");
+ printf(" Load Slots Used : 2 20000 256 32 16 1 1 256 6000\n");
+ printf(" Inefficient Load: 2 20000 256 32 2 1 1 256 20000\n");
+ Kokkos::finalize();
+ return 0;
+ }
+
+
+ int P = atoi(argv[1]);
+ int N = atoi(argv[2]);
+ int K = atoi(argv[3]);
+ int R = atoi(argv[4]);
+ int D = atoi(argv[5]);
+ int U = atoi(argv[6]);
+ int F = atoi(argv[7]);
+ int T = atoi(argv[8]);
+ int S = atoi(argv[9]);
+
+ if(U>8) {printf("U must be 1-8\n"); return 0;}
+ if( (D!=1) && (D!=2) && (D!=4) && (D!=8) && (D!=16) && (D!=32)) {printf("D must be one of 1,2,4,8,16,32\n"); return 0;}
+ if( (P!=1) && (P!=2) ) {printf("P must be one of 1,2\n"); return 0;}
+
+ if(P==1) {
+ run_stride_unroll<float>(N,K,R,D,U,F,T,S);
+ }
+ if(P==2) {
+ run_stride_unroll<double>(N,K,R,D,U,F,T,S);
}
-};
+ Kokkos::finalize();
}
-/*--------------------------------------------------------------------------*/
-
-
-
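A minimal usage sketch for the benchmark above, assuming the CUDA build produces an executable named bytes_and_flops.cuda (the benchmark's own Makefile is not shown here, so that name is a placeholder). The argument sets are the "Bandwidth Bound" and "Compute Bound" examples from the help text, in the order P N K R D U F T S.

  ./bytes_and_flops.cuda 2 100000 1024 1 1 1 1 256 6000    # bandwidth-bound case
  ./bytes_and_flops.cuda 2 100000 1024 1 1 8 64 256 6000   # compute-bound case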
diff --git a/lib/kokkos/example/tutorial/01_hello_world/Makefile b/lib/kokkos/benchmarks/gather/Makefile
similarity index 70%
copy from lib/kokkos/example/tutorial/01_hello_world/Makefile
copy to lib/kokkos/benchmarks/gather/Makefile
index 78a9fed0c..fd1feab6f 100644
--- a/lib/kokkos/example/tutorial/01_hello_world/Makefile
+++ b/lib/kokkos/benchmarks/gather/Makefile
@@ -1,43 +1,44 @@
-KOKKOS_PATH = ../../..
+KOKKOS_PATH = ${HOME}/kokkos
SRC = $(wildcard *.cpp)
+KOKKOS_DEVICES=Cuda
+KOKKOS_CUDA_OPTIONS=enable_lambda
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
-CXXFLAGS = -O3
-LINK = ${CXX}
-LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
+EXE = gather.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
-CXXFLAGS = -O3
-LINK = ${CXX}
-LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = gather.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
+CXXFLAGS = -O3 -g
+
DEPFLAGS = -M
+LINK = ${CXX}
+LINKFLAGS =
OBJ = $(SRC:.cpp=.o)
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
+$(warning ${KOKKOS_CPPFLAGS})
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
-%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
+%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
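A minimal build sketch for the Makefile above, with placeholder paths: KOKKOS_PATH and KOKKOS_DEVICES are ordinary make variables, so command-line assignments should override the defaults of ${HOME}/kokkos and Cuda.

  # Build the CUDA variant (gather.cuda) against a Kokkos checkout at a custom path.
  make -j KOKKOS_PATH=/path/to/kokkos
  # Build the host variant (gather.host) instead, assuming Makefile.kokkos honors the override.
  make -j KOKKOS_PATH=/path/to/kokkos KOKKOS_DEVICES=OpenMP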
diff --git a/lib/kokkos/core/src/Kokkos_Concepts.hpp b/lib/kokkos/benchmarks/gather/gather.hpp
similarity index 64%
copy from lib/kokkos/core/src/Kokkos_Concepts.hpp
copy to lib/kokkos/benchmarks/gather/gather.hpp
index 82a342eec..406bd2898 100644
--- a/lib/kokkos/core/src/Kokkos_Concepts.hpp
+++ b/lib/kokkos/benchmarks/gather/gather.hpp
@@ -1,78 +1,92 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_CORE_CONCEPTS_HPP
-#define KOKKOS_CORE_CONCEPTS_HPP
-
-#include <type_traits>
-
-namespace Kokkos {
-//Schedules for Execution Policies
-struct Static {};
-struct Dynamic {};
-
-//Schedule Wrapper Type
-template<class T>
-struct Schedule
-{
- static_assert( std::is_same<T,Static>::value
- || std::is_same<T,Dynamic>::value
- , "Kokkos: Invalid Schedule<> type."
- );
- using schedule_type = Schedule<T>;
- using type = T;
+template<class Scalar, int UNROLL>
+struct RunGather {
+ static void run(int N, int K, int D, int R, int F);
};
-//Specify Iteration Index Type
-template<typename T>
-struct IndexType
-{
- static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
- using index_type = IndexType<T>;
- using type = T;
-};
-
-} // namespace Kokkos
-
-#endif // KOKKOS_CORE_CONCEPTS_HPP
+#define UNROLL 1
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 2
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 3
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 4
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 5
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 6
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 7
+#include<gather_unroll.hpp>
+#undef UNROLL
+#define UNROLL 8
+#include<gather_unroll.hpp>
+#undef UNROLL
+template<class Scalar>
+void run_gather_test(int N, int K, int D, int R, int U, int F) {
+ if(U == 1)
+ RunGather<Scalar,1>::run(N,K,D,R,F);
+ if(U == 2)
+ RunGather<Scalar,2>::run(N,K,D,R,F);
+ if(U == 3)
+ RunGather<Scalar,3>::run(N,K,D,R,F);
+ if(U == 4)
+ RunGather<Scalar,4>::run(N,K,D,R,F);
+ if(U == 5)
+ RunGather<Scalar,5>::run(N,K,D,R,F);
+ if(U == 6)
+ RunGather<Scalar,6>::run(N,K,D,R,F);
+ if(U == 7)
+ RunGather<Scalar,7>::run(N,K,D,R,F);
+ if(U == 8)
+ RunGather<Scalar,8>::run(N,K,D,R,F);
+}
diff --git a/lib/kokkos/benchmarks/gather/gather_unroll.hpp b/lib/kokkos/benchmarks/gather/gather_unroll.hpp
new file mode 100644
index 000000000..1d01b26ca
--- /dev/null
+++ b/lib/kokkos/benchmarks/gather/gather_unroll.hpp
@@ -0,0 +1,169 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include<Kokkos_Core.hpp>
+#include<Kokkos_Random.hpp>
+
+template<class Scalar>
+struct RunGather<Scalar,UNROLL> {
+static void run(int N, int K, int D, int R, int F) {
+ Kokkos::View<int**> connectivity("Connectivity",N,K);
+ Kokkos::View<Scalar*> A_in("Input",N);
+ Kokkos::View<Scalar*> B_in("Input",N);
+ Kokkos::View<Scalar*> C("Output",N);
+
+ Kokkos::Random_XorShift64_Pool<> rand_pool(12313);
+
+ Kokkos::deep_copy(A_in,1.5);
+ Kokkos::deep_copy(B_in,2.0);
+
+ Kokkos::View<const Scalar*, Kokkos::MemoryTraits<Kokkos::RandomAccess> > A(A_in);
+ Kokkos::View<const Scalar*, Kokkos::MemoryTraits<Kokkos::RandomAccess> > B(B_in);
+
+ Kokkos::parallel_for("InitKernel",N,
+ KOKKOS_LAMBDA (const int& i) {
+ auto rand_gen = rand_pool.get_state();
+ for( int jj=0; jj<K; jj++) {
+ connectivity(i,jj) = (rand_gen.rand(D) + i - D/2 + N)%N;
+ }
+ rand_pool.free_state(rand_gen);
+ });
+ Kokkos::fence();
+
+
+ Kokkos::Timer timer;
+ for(int r = 0; r<R; r++) {
+ Kokkos::parallel_for("BenchmarkKernel",N,
+ KOKKOS_LAMBDA (const int& i) {
+ Scalar c = Scalar(0.0);
+ for( int jj=0; jj<K; jj++) {
+ const int j = connectivity(i,jj);
+ Scalar a1 = A(j);
+ const Scalar b = B(j);
+#if(UNROLL>1)
+ Scalar a2 = a1*Scalar(1.3);
+#endif
+#if(UNROLL>2)
+ Scalar a3 = a2*Scalar(1.1);
+#endif
+#if(UNROLL>3)
+ Scalar a4 = a3*Scalar(1.1);
+#endif
+#if(UNROLL>4)
+ Scalar a5 = a4*Scalar(1.3);
+#endif
+#if(UNROLL>5)
+ Scalar a6 = a5*Scalar(1.1);
+#endif
+#if(UNROLL>6)
+ Scalar a7 = a6*Scalar(1.1);
+#endif
+#if(UNROLL>7)
+ Scalar a8 = a7*Scalar(1.1);
+#endif
+
+
+ for(int f = 0; f<F; f++) {
+ a1 += b*a1;
+#if(UNROLL>1)
+ a2 += b*a2;
+#endif
+#if(UNROLL>2)
+ a3 += b*a3;
+#endif
+#if(UNROLL>3)
+ a4 += b*a4;
+#endif
+#if(UNROLL>4)
+ a5 += b*a5;
+#endif
+#if(UNROLL>5)
+ a6 += b*a6;
+#endif
+#if(UNROLL>6)
+ a7 += b*a7;
+#endif
+#if(UNROLL>7)
+ a8 += b*a8;
+#endif
+
+
+ }
+#if(UNROLL==1)
+ c += a1;
+#endif
+#if(UNROLL==2)
+ c += a1+a2;
+#endif
+#if(UNROLL==3)
+ c += a1+a2+a3;
+#endif
+#if(UNROLL==4)
+ c += a1+a2+a3+a4;
+#endif
+#if(UNROLL==5)
+ c += a1+a2+a3+a4+a5;
+#endif
+#if(UNROLL==6)
+ c += a1+a2+a3+a4+a5+a6;
+#endif
+#if(UNROLL==7)
+ c += a1+a2+a3+a4+a5+a6+a7;
+#endif
+#if(UNROLL==8)
+ c += a1+a2+a3+a4+a5+a6+a7+a8;
+#endif
+
+ }
+ C(i) = c ;
+ });
+ Kokkos::fence();
+ }
+ double seconds = timer.seconds();
+
+ double bytes = 1.0*N*K*R*(2*sizeof(Scalar)+sizeof(int)) + 1.0*N*R*sizeof(Scalar);
+ double flops = 1.0*N*K*R*(F*2*UNROLL + 2*(UNROLL-1));
+ double gather_ops = 1.0*N*K*R*2;
+ printf("SNKDRUF: %i %i %i %i %i %i %i Time: %lfs Bandwidth: %lfGiB/s GFlop/s: %lf GGather/s: %lf\n",sizeof(Scalar)/4,N,K,D,R,UNROLL,F,seconds,1.0*bytes/seconds/1024/1024/1024,1.e-9*flops/seconds,1.e-9*gather_ops/seconds);
+}
+};
diff --git a/lib/kokkos/core/src/impl/Kokkos_HBWAllocators.cpp b/lib/kokkos/benchmarks/gather/main.cpp
similarity index 54%
rename from lib/kokkos/core/src/impl/Kokkos_HBWAllocators.cpp
rename to lib/kokkos/benchmarks/gather/main.cpp
index 4eb80d03f..161c6f209 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HBWAllocators.cpp
+++ b/lib/kokkos/benchmarks/gather/main.cpp
@@ -1,108 +1,93 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-#include <Kokkos_HostSpace.hpp>
-
-#include <impl/Kokkos_HBWAllocators.hpp>
-#include <impl/Kokkos_Error.hpp>
-
-
-#include <stdint.h> // uintptr_t
-#include <cstdlib> // for malloc, realloc, and free
-#include <cstring> // for memcpy
+#include<Kokkos_Core.hpp>
+#include<impl/Kokkos_Timer.hpp>
+#include<gather.hpp>
-#if defined(KOKKOS_POSIX_MEMALIGN_AVAILABLE)
-#include <sys/mman.h> // for mmap, munmap, MAP_ANON, etc
-#include <unistd.h> // for sysconf, _SC_PAGE_SIZE, _SC_PHYS_PAGES
-#endif
+int main(int argc, char* argv[]) {
+ Kokkos::initialize(argc,argv);
-#include <sstream>
-#include <iostream>
-#ifdef KOKKOS_HAVE_HBWSPACE
-#include <memkind.h>
+ if(argc<8) {
+ printf("Arguments: S N K D\n");
+ printf(" S: Scalar Type Size (1==float, 2==double, 4=complex<double>)\n");
+ printf(" N: Number of entities\n");
+ printf(" K: Number of things to gather per entity\n");
+ printf(" D: Max distance of gathered things of an entity\n");
+ printf(" R: how often to loop through the K dimension with each team\n");
+ printf(" U: how many independent flops to do per load\n");
+ printf(" F: how many times to repeat the U unrolled operations before reading next element\n");
+ printf("Example Input GPU:\n");
+ printf(" Bandwidth Bound : 2 10000000 1 1 10 1 1\n");
+ printf(" Cache Bound : 2 10000000 64 1 10 1 1\n");
+ printf(" Cache Gather : 2 10000000 64 256 10 1 1\n");
+ printf(" Global Gather : 2 100000000 16 100000000 1 1 1\n");
+ printf(" Typical MD : 2 100000 32 512 1000 8 2\n");
+ Kokkos::finalize();
+ return 0;
+ }
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-#define MEMKIND_TYPE MEMKIND_HBW //hbw_get_kind(HBW_PAGESIZE_4KB)
-/*--------------------------------------------------------------------------*/
-void* HBWMallocAllocator::allocate( size_t size )
-{
- std::cout<< "Allocate HBW: " << 1.0e-6*size << "MB" << std::endl;
- void * ptr = NULL;
- if (size) {
- ptr = memkind_malloc(MEMKIND_TYPE,size);
+ int S = atoi(argv[1]);
+ int N = atoi(argv[2]);
+ int K = atoi(argv[3]);
+ int D = atoi(argv[4]);
+ int R = atoi(argv[5]);
+ int U = atoi(argv[6]);
+ int F = atoi(argv[7]);
- if (!ptr)
- {
- std::ostringstream msg ;
- msg << name() << ": allocate(" << size << ") FAILED";
- Kokkos::Impl::throw_runtime_exception( msg.str() );
- }
+ if( (S!=1) && (S!=2) && (S!=4)) {printf("S must be one of 1,2,4\n"); return 0;}
+ if( N<D ) {printf("N must be larger or equal to D\n"); return 0; }
+ if(S==1) {
+ run_gather_test<float>(N,K,D,R,U,F);
}
- return ptr;
-}
-
-void HBWMallocAllocator::deallocate( void * ptr, size_t /*size*/ )
-{
- if (ptr) {
- memkind_free(MEMKIND_TYPE,ptr);
+ if(S==2) {
+ run_gather_test<double>(N,K,D,R,U,F);
}
-}
-
-void * HBWMallocAllocator::reallocate(void * old_ptr, size_t /*old_size*/, size_t new_size)
-{
- void * ptr = memkind_realloc(MEMKIND_TYPE, old_ptr, new_size);
-
- if (new_size > 0u && ptr == NULL) {
- Kokkos::Impl::throw_runtime_exception("Error: Malloc Allocator could not reallocate memory");
+ if(S==4) {
+ run_gather_test<Kokkos::complex<double> >(N,K,D,R,U,F);
}
- return ptr;
+ Kokkos::finalize();
}
-} // namespace Impl
-} // namespace Experimental
-} // namespace Kokkos
-#endif
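A minimal run sketch for the gather benchmark above, assuming the CUDA build from its Makefile (gather.cuda); the argument sets are the "Typical MD" and "Bandwidth Bound" examples from the help text, in the order S N K D R U F.

  ./gather.cuda 2 100000 32 512 1000 8 2      # "Typical MD" configuration
  ./gather.cuda 2 10000000 1 1 10 1 1         # bandwidth-bound configuration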
diff --git a/lib/kokkos/config/nvcc_wrapper b/lib/kokkos/bin/nvcc_wrapper
similarity index 98%
copy from lib/kokkos/config/nvcc_wrapper
copy to lib/kokkos/bin/nvcc_wrapper
index 6093cb61b..cb206cf88 100755
--- a/lib/kokkos/config/nvcc_wrapper
+++ b/lib/kokkos/bin/nvcc_wrapper
@@ -1,280 +1,284 @@
#!/bin/bash
#
# This shell script (nvcc_wrapper) wraps both the host compiler and
# NVCC, if you are building legacy C or C++ code with CUDA enabled.
# The script remedies some differences between the interface of NVCC
# and that of the host compiler, in particular for linking.
# It also means that a legacy code doesn't need separate .cu files;
# it can just use .cpp files.
#
# Default settings: change those according to your machine. For
# example, you may have two different wrappers with either icpc
# or g++ as their back-end compiler. The defaults can be overwritten
# by using the usual arguments (e.g., -arch=sm_30 -ccbin icpc).
default_arch="sm_35"
#default_arch="sm_50"
#
# The default C++ compiler.
#
host_compiler=${NVCC_WRAPPER_DEFAULT_COMPILER:-"g++"}
#host_compiler="icpc"
#host_compiler="/usr/local/gcc/4.8.3/bin/g++"
#host_compiler="/usr/local/gcc/4.9.1/bin/g++"
#
# Internal variables
#
# C++ files
cpp_files=""
# Host compiler arguments
xcompiler_args=""
# Cuda (NVCC) only arguments
cuda_args=""
# Arguments for both NVCC and Host compiler
shared_args=""
# Linker arguments
xlinker_args=""
# Object files passable to NVCC
object_files=""
# Link objects for the host linker only
object_files_xlinker=""
# Shared libraries with version numbers are not handled correctly by NVCC
shared_versioned_libraries_host=""
shared_versioned_libraries=""
# Does the User set the architecture
arch_set=0
# Does the user overwrite the host compiler
ccbin_set=0
#Error code of compilation
error_code=0
# Do a dry run without actually compiling
dry_run=0
# Skip NVCC compilation and use host compiler directly
host_only=0
# Enable workaround for CUDA 6.5 for pragma ident
replace_pragma_ident=0
# Mark first host compiler argument
first_xcompiler_arg=1
temp_dir=${TMPDIR:-/tmp}
# Check if we have an optimization argument already
optimization_applied=0
#echo "Arguments: $# $@"
while [ $# -gt 0 ]
do
case $1 in
#show the executed command
--show|--nvcc-wrapper-show)
dry_run=1
;;
#run host compilation only
--host-only)
host_only=1
;;
#replace '#pragma ident' with '#ident'; this is needed to compile OpenMPI due to a configure script bug and non-standardized behaviour of pragma with macros
--replace-pragma-ident)
replace_pragma_ident=1
;;
#handle source files to be compiled as cuda files
*.cpp|*.cxx|*.cc|*.C|*.c++|*.cu)
cpp_files="$cpp_files $1"
;;
# Ensure we only have one optimization flag because NVCC doesn't allow multiple
-O*)
if [ $optimization_applied -eq 1 ]; then
echo "nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the first is used because nvcc can only accept a single optimization setting."
else
shared_args="$shared_args $1"
optimization_applied=1
fi
;;
#Handle shared args (valid for both nvcc and the host compiler)
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
shared_args="$shared_args $1"
;;
#Handle shared args that have an argument
-o|-MT)
shared_args="$shared_args $1 $2"
shift
;;
#Handle known nvcc args
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
cuda_args="$cuda_args $1"
;;
+ #Handle more known nvcc args
+ --expt-extended-lambda|--expt-relaxed-constexpr)
+ cuda_args="$cuda_args $1"
+ ;;
#Handle known nvcc args that have an argument
-rdc|-maxrregcount|--default-stream)
cuda_args="$cuda_args $1 $2"
shift
;;
#Handle c++11 setting
--std=c++11|-std=c++11)
shared_args="$shared_args $1"
;;
#strip off -std=c++98 because nvcc warns about it and Tribits will pass both -std=c++11 and -std=c++98
-std=c++98|--std=c++98)
;;
#strip off -pedantic because it produces endless warnings about #LINE added by the preprocessor
-pedantic|-Wpedantic|-ansi)
;;
#strip -Xcompiler because we add it
-Xcompiler)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$2"
fi
shift
;;
#strip of "-x cu" because we add that
-x)
if [[ $2 != "cu" ]]; then
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="-x,$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,-x,$2"
fi
fi
shift
;;
#Handle -ccbin (if it's not set we can set it to a default value)
-ccbin)
cuda_args="$cuda_args $1 $2"
ccbin_set=1
host_compiler=$2
shift
;;
#Handle -arch argument (if it's not set, use a default)
-arch*)
cuda_args="$cuda_args $1"
arch_set=1
;;
#Handle -Xcudafe argument
-Xcudafe)
cuda_args="$cuda_args -Xcudafe $2"
shift
;;
#Handle args that should be sent to the linker
-Wl*)
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
host_linker_args="$host_linker_args ${1:4:${#1}}"
;;
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
*.a|*.so|*.o|*.obj)
object_files="$object_files $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle object files which always need to use "-Xlinker": -x cu applies to all input files, so give them to linker, except if only linking
*.dylib)
object_files="$object_files -Xlinker $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle shared libraries with *.so.* names which nvcc can't do.
*.so.*)
shared_versioned_libraries_host="$shared_versioned_libraries_host $1"
shared_versioned_libraries="$shared_versioned_libraries -Xlinker $1"
;;
#All other args are sent to the host compiler
*)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args=$1
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$1"
fi
;;
esac
shift
done
#Add default host compiler if necessary
if [ $ccbin_set -ne 1 ]; then
cuda_args="$cuda_args -ccbin $host_compiler"
fi
#Add architecture command
if [ $arch_set -ne 1 ]; then
cuda_args="$cuda_args -arch=$default_arch"
fi
#Compose compilation command
nvcc_command="nvcc $cuda_args $shared_args $xlinker_args $shared_versioned_libraries"
if [ $first_xcompiler_arg -eq 0 ]; then
nvcc_command="$nvcc_command -Xcompiler $xcompiler_args"
fi
#Compose host only command
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
if [ $replace_pragma_ident -eq 1 ]; then
cpp_files2=""
for file in $cpp_files
do
var=`grep pragma ${file} | grep ident | grep "#"`
if [ "${#var}" -gt 0 ]
then
sed 's/#[\ \t]*pragma[\ \t]*ident/#ident/g' $file > $temp_dir/nvcc_wrapper_tmp_$file
cpp_files2="$cpp_files2 $temp_dir/nvcc_wrapper_tmp_$file"
else
cpp_files2="$cpp_files2 $file"
fi
done
cpp_files=$cpp_files2
#echo $cpp_files
fi
if [ "$cpp_files" ]; then
nvcc_command="$nvcc_command $object_files_xlinker -x cu $cpp_files"
else
nvcc_command="$nvcc_command $object_files"
fi
if [ "$cpp_files" ]; then
host_command="$host_command $object_files $cpp_files"
else
host_command="$host_command $object_files"
fi
#Print command for dryrun
if [ $dry_run -eq 1 ]; then
if [ $host_only -eq 1 ]; then
echo $host_command
else
echo $nvcc_command
fi
exit 0
fi
#Run compilation command
if [ $host_only -eq 1 ]; then
$host_command
else
$nvcc_command
fi
error_code=$?
#Report error code
exit $error_code
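A short usage sketch for the wrapper above, with a placeholder source file name: the wrapper stands in for the host C++ compiler so that .cpp files containing CUDA code do not have to be renamed to .cu, --show prints the underlying nvcc command instead of running it, and NVCC_WRAPPER_DEFAULT_COMPILER selects the host compiler.

  # Compile a C++/CUDA source file through the wrapper, overriding the default architecture.
  ./nvcc_wrapper -O3 -std=c++11 -arch=sm_50 -c example.cpp -o example.o
  # Show the nvcc command that would be run, without compiling.
  ./nvcc_wrapper --show -O3 -c example.cpp -o example.o
  # Use icpc as the host compiler for this invocation.
  NVCC_WRAPPER_DEFAULT_COMPILER=icpc ./nvcc_wrapper -O3 -c example.cpp -o example.o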
diff --git a/lib/kokkos/cmake/deps/CUSPARSE.cmake b/lib/kokkos/cmake/deps/CUSPARSE.cmake
index 205f5e2a9..6f26d857c 100644
--- a/lib/kokkos/cmake/deps/CUSPARSE.cmake
+++ b/lib/kokkos/cmake/deps/CUSPARSE.cmake
@@ -1,64 +1,64 @@
# @HEADER
# ************************************************************************
#
# Trilinos: An Object-Oriented Solver Framework
# Copyright (2001) Sandia Corporation
#
#
# Copyright (2001) Sandia Corporation. Under the terms of Contract
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
# work by or on behalf of the U.S. Government. Export of this program
# may require a license from the United States Government.
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the Corporation nor the names of the
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# NOTICE: The United States Government is granted for itself and others
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
# license in this data to reproduce, prepare derivative works, and
# perform publicly and display publicly. Beginning five (5) years from
# July 25, 2001, the United States Government is granted for itself and
# others acting on its behalf a paid-up, nonexclusive, irrevocable
# worldwide license in this data to reproduce, prepare derivative works,
# distribute copies to the public, perform publicly and display
# publicly, and to permit others to do so.
#
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
#
# ************************************************************************
# @HEADER
-include(${TRIBITS_DEPS_DIR}/CUDA.cmake)
+#include(${TRIBITS_DEPS_DIR}/CUDA.cmake)
-IF (TPL_ENABLE_CUDA)
- GLOBAL_SET(TPL_CUSPARSE_LIBRARY_DIRS)
- GLOBAL_SET(TPL_CUSPARSE_INCLUDE_DIRS ${TPL_CUDA_INCLUDE_DIRS})
- GLOBAL_SET(TPL_CUSPARSE_LIBRARIES ${CUDA_cusparse_LIBRARY})
- TIBITS_CREATE_IMPORTED_TPL_LIBRARY(CUSPARSE)
-ENDIF()
+#IF (TPL_ENABLE_CUDA)
+# GLOBAL_SET(TPL_CUSPARSE_LIBRARY_DIRS)
+# GLOBAL_SET(TPL_CUSPARSE_INCLUDE_DIRS ${TPL_CUDA_INCLUDE_DIRS})
+# GLOBAL_SET(TPL_CUSPARSE_LIBRARIES ${CUDA_cusparse_LIBRARY})
+# TIBITS_CREATE_IMPORTED_TPL_LIBRARY(CUSPARSE)
+#ENDIF()
diff --git a/lib/kokkos/cmake/tribits.cmake b/lib/kokkos/cmake/tribits.cmake
index 34cd216f8..879d80172 100644
--- a/lib/kokkos/cmake/tribits.cmake
+++ b/lib/kokkos/cmake/tribits.cmake
@@ -1,485 +1,507 @@
INCLUDE(CMakeParseArguments)
INCLUDE(CTest)
+cmake_policy(SET CMP0054 NEW)
+
+IF(NOT DEFINED ${PROJECT_NAME})
+ project(Kokkos)
+ENDIF()
+
+IF(NOT DEFINED ${PROJECT_NAME}_ENABLE_DEBUG)
+ SET(${PROJECT_NAME}_ENABLE_DEBUG OFF)
+ENDIF()
+
FUNCTION(ASSERT_DEFINED VARS)
FOREACH(VAR ${VARS})
IF(NOT DEFINED ${VAR})
MESSAGE(SEND_ERROR "Error, the variable ${VAR} is not defined!")
ENDIF()
ENDFOREACH()
ENDFUNCTION()
MACRO(GLOBAL_SET VARNAME)
SET(${VARNAME} ${ARGN} CACHE INTERNAL "")
ENDMACRO()
MACRO(PREPEND_GLOBAL_SET VARNAME)
ASSERT_DEFINED(${VARNAME})
GLOBAL_SET(${VARNAME} ${ARGN} ${${VARNAME}})
ENDMACRO()
FUNCTION(REMOVE_GLOBAL_DUPLICATES VARNAME)
ASSERT_DEFINED(${VARNAME})
IF (${VARNAME})
SET(TMP ${${VARNAME}})
LIST(REMOVE_DUPLICATES TMP)
GLOBAL_SET(${VARNAME} ${TMP})
ENDIF()
ENDFUNCTION()
MACRO(TRIBITS_ADD_OPTION_AND_DEFINE USER_OPTION_NAME MACRO_DEFINE_NAME DOCSTRING DEFAULT_VALUE)
MESSAGE(STATUS "TRIBITS_ADD_OPTION_AND_DEFINE: '${USER_OPTION_NAME}' '${MACRO_DEFINE_NAME}' '${DEFAULT_VALUE}'")
SET( ${USER_OPTION_NAME} "${DEFAULT_VALUE}" CACHE BOOL "${DOCSTRING}" )
IF(NOT ${MACRO_DEFINE_NAME} STREQUAL "")
IF(${USER_OPTION_NAME})
GLOBAL_SET(${MACRO_DEFINE_NAME} ON)
ELSE()
GLOBAL_SET(${MACRO_DEFINE_NAME} OFF)
ENDIF()
ENDIF()
ENDMACRO()
FUNCTION(TRIBITS_CONFIGURE_FILE PACKAGE_NAME_CONFIG_FILE)
# Configure the file
CONFIGURE_FILE(
${PACKAGE_SOURCE_DIR}/cmake/${PACKAGE_NAME_CONFIG_FILE}.in
${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME_CONFIG_FILE}
)
ENDFUNCTION()
MACRO(TRIBITS_ADD_DEBUG_OPTION)
TRIBITS_ADD_OPTION_AND_DEFINE(
${PROJECT_NAME}_ENABLE_DEBUG
HAVE_${PROJECT_NAME_UC}_DEBUG
"Enable a host of runtime debug checking."
OFF
)
ENDMACRO()
MACRO(TRIBITS_ADD_TEST_DIRECTORIES)
FOREACH(TEST_DIR ${ARGN})
ADD_SUBDIRECTORY(${TEST_DIR})
ENDFOREACH()
ENDMACRO()
MACRO(TRIBITS_ADD_EXAMPLE_DIRECTORIES)
IF(${PACKAGE_NAME}_ENABLE_EXAMPLES OR ${PARENT_PACKAGE_NAME}_ENABLE_EXAMPLES)
FOREACH(EXAMPLE_DIR ${ARGN})
ADD_SUBDIRECTORY(${EXAMPLE_DIR})
ENDFOREACH()
ENDIF()
ENDMACRO()
+
+function(INCLUDE_DIRECTORIES)
+ cmake_parse_arguments(INCLUDE_DIRECTORIES "REQUIRED_DURING_INSTALLATION_TESTING" "" "" ${ARGN})
+ _INCLUDE_DIRECTORIES(${INCLUDE_DIRECTORIES_UNPARSED_ARGUMENTS})
+endfunction()
+
+
MACRO(TARGET_TRANSFER_PROPERTY TARGET_NAME PROP_IN PROP_OUT)
SET(PROP_VALUES)
FOREACH(TARGET_X ${ARGN})
LIST(APPEND PROP_VALUES "$<TARGET_PROPERTY:${TARGET_X},${PROP_IN}>")
ENDFOREACH()
SET_TARGET_PROPERTIES(${TARGET_NAME} PROPERTIES ${PROP_OUT} "${PROP_VALUES}")
ENDMACRO()
MACRO(ADD_INTERFACE_LIBRARY LIB_NAME)
FILE(WRITE ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp "")
ADD_LIBRARY(${LIB_NAME} STATIC ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp)
SET_TARGET_PROPERTIES(${LIB_NAME} PROPERTIES INTERFACE TRUE)
ENDMACRO()
# Older versions of cmake do not make include directories transitive
MACRO(TARGET_LINK_AND_INCLUDE_LIBRARIES TARGET_NAME)
TARGET_LINK_LIBRARIES(${TARGET_NAME} LINK_PUBLIC ${ARGN})
FOREACH(DEP_LIB ${ARGN})
TARGET_INCLUDE_DIRECTORIES(${TARGET_NAME} PUBLIC $<TARGET_PROPERTY:${DEP_LIB},INTERFACE_INCLUDE_DIRECTORIES>)
TARGET_INCLUDE_DIRECTORIES(${TARGET_NAME} PUBLIC $<TARGET_PROPERTY:${DEP_LIB},INCLUDE_DIRECTORIES>)
ENDFOREACH()
ENDMACRO()
FUNCTION(TRIBITS_ADD_LIBRARY LIBRARY_NAME)
SET(options STATIC SHARED TESTONLY NO_INSTALL_LIB_OR_HEADERS CUDALIBRARY)
SET(oneValueArgs)
SET(multiValueArgs HEADERS HEADERS_INSTALL_SUBDIR NOINSTALLHEADERS SOURCES DEPLIBS IMPORTEDLIBS DEFINES ADDED_LIB_TARGET_NAME_OUT)
CMAKE_PARSE_ARGUMENTS(PARSE "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
IF(PARSE_HEADERS)
LIST(REMOVE_DUPLICATES PARSE_HEADERS)
ENDIF()
IF(PARSE_SOURCES)
LIST(REMOVE_DUPLICATES PARSE_SOURCES)
ENDIF()
# Local variable to hold all of the libraries that will be directly linked
# to this library.
SET(LINK_LIBS ${${PACKAGE_NAME}_DEPS})
# Add dependent libraries passed directly in
IF (PARSE_IMPORTEDLIBS)
LIST(APPEND LINK_LIBS ${PARSE_IMPORTEDLIBS})
ENDIF()
IF (PARSE_DEPLIBS)
LIST(APPEND LINK_LIBS ${PARSE_DEPLIBS})
ENDIF()
# Add the library and all the dependencies
IF (PARSE_DEFINES)
ADD_DEFINITIONS(${PARSE_DEFINES})
ENDIF()
IF (PARSE_STATIC)
SET(STATIC_KEYWORD "STATIC")
ELSE()
SET(STATIC_KEYWORD)
ENDIF()
IF (PARSE_SHARED)
SET(SHARED_KEYWORD "SHARED")
ELSE()
SET(SHARED_KEYWORD)
ENDIF()
IF (PARSE_TESTONLY)
SET(EXCLUDE_FROM_ALL_KEYWORD "EXCLUDE_FROM_ALL")
ELSE()
SET(EXCLUDE_FROM_ALL_KEYWORD)
ENDIF()
IF (NOT PARSE_CUDALIBRARY)
ADD_LIBRARY(
${LIBRARY_NAME}
${STATIC_KEYWORD}
${SHARED_KEYWORD}
${EXCLUDE_FROM_ALL_KEYWORD}
${PARSE_HEADERS}
${PARSE_NOINSTALLHEADERS}
${PARSE_SOURCES}
)
ELSE()
CUDA_ADD_LIBRARY(
${LIBRARY_NAME}
${PARSE_HEADERS}
${PARSE_NOINSTALLHEADERS}
${PARSE_SOURCES}
)
ENDIF()
TARGET_LINK_AND_INCLUDE_LIBRARIES(${LIBRARY_NAME} ${LINK_LIBS})
IF (NOT PARSE_TESTONLY OR PARSE_NO_INSTALL_LIB_OR_HEADERS)
INSTALL(
TARGETS ${LIBRARY_NAME}
EXPORT ${PROJECT_NAME}
RUNTIME DESTINATION bin
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib
COMPONENT ${PACKAGE_NAME}
)
INSTALL(
FILES ${PARSE_HEADERS}
EXPORT ${PROJECT_NAME}
DESTINATION include
COMPONENT ${PACKAGE_NAME}
)
INSTALL(
DIRECTORY ${PARSE_HEADERS_INSTALL_SUBDIR}
EXPORT ${PROJECT_NAME}
DESTINATION include
COMPONENT ${PACKAGE_NAME}
)
ENDIF()
IF (NOT PARSE_TESTONLY)
PREPEND_GLOBAL_SET(${PACKAGE_NAME}_LIBS ${LIBRARY_NAME})
REMOVE_GLOBAL_DUPLICATES(${PACKAGE_NAME}_LIBS)
ENDIF()
ENDFUNCTION()
FUNCTION(TRIBITS_ADD_EXECUTABLE EXE_NAME)
SET(options NOEXEPREFIX NOEXESUFFIX ADD_DIR_TO_NAME INSTALLABLE TESTONLY)
SET(oneValueArgs ADDED_EXE_TARGET_NAME_OUT)
SET(multiValueArgs SOURCES CATEGORIES HOST XHOST HOSTTYPE XHOSTTYPE DIRECTORY TESTONLYLIBS IMPORTEDLIBS DEPLIBS COMM LINKER_LANGUAGE TARGET_DEFINES DEFINES)
CMAKE_PARSE_ARGUMENTS(PARSE "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
IF (PARSE_TARGET_DEFINES)
TARGET_COMPILE_DEFINITIONS(${EXE_NAME} PUBLIC ${PARSE_TARGET_DEFINES})
ENDIF()
SET(LINK_LIBS PACKAGE_${PACKAGE_NAME})
IF (PARSE_TESTONLYLIBS)
LIST(APPEND LINK_LIBS ${PARSE_TESTONLYLIBS})
ENDIF()
IF (PARSE_IMPORTEDLIBS)
LIST(APPEND LINK_LIBS ${PARSE_IMPORTEDLIBS})
ENDIF()
SET (EXE_SOURCES)
IF(PARSE_DIRECTORY)
FOREACH( SOURCE_FILE ${PARSE_SOURCES} )
IF(IS_ABSOLUTE ${SOURCE_FILE})
SET (EXE_SOURCES ${EXE_SOURCES} ${SOURCE_FILE})
ELSE()
SET (EXE_SOURCES ${EXE_SOURCES} ${PARSE_DIRECTORY}/${SOURCE_FILE})
ENDIF()
ENDFOREACH( )
ELSE()
FOREACH( SOURCE_FILE ${PARSE_SOURCES} )
SET (EXE_SOURCES ${EXE_SOURCES} ${SOURCE_FILE})
ENDFOREACH( )
ENDIF()
SET(EXE_BINARY_NAME ${EXE_NAME})
IF(DEFINED PACKAGE_NAME AND NOT PARSE_NOEXEPREFIX)
SET(EXE_BINARY_NAME ${PACKAGE_NAME}_${EXE_BINARY_NAME})
ENDIF()
IF (PARSE_TESTONLY)
SET(EXCLUDE_FROM_ALL_KEYWORD "EXCLUDE_FROM_ALL")
ELSE()
SET(EXCLUDE_FROM_ALL_KEYWORD)
ENDIF()
ADD_EXECUTABLE(${EXE_BINARY_NAME} ${EXCLUDE_FROM_ALL_KEYWORD} ${EXE_SOURCES})
TARGET_LINK_AND_INCLUDE_LIBRARIES(${EXE_BINARY_NAME} ${LINK_LIBS})
IF(PARSE_ADDED_EXE_TARGET_NAME_OUT)
SET(${PARSE_ADDED_EXE_TARGET_NAME_OUT} ${EXE_BINARY_NAME} PARENT_SCOPE)
ENDIF()
IF(PARSE_INSTALLABLE)
INSTALL(
TARGETS ${EXE_BINARY_NAME}
EXPORT ${PROJECT_NAME}
DESTINATION bin
)
ENDIF()
ENDFUNCTION()
ADD_CUSTOM_TARGET(check COMMAND ${CMAKE_CTEST_COMMAND} -VV -C ${CMAKE_CFG_INTDIR})
+FUNCTION(TRIBITS_ADD_TEST)
+ENDFUNCTION()
+FUNCTION(TRIBITS_TPL_TENTATIVELY_ENABLE)
+ENDFUNCTION()
+
FUNCTION(TRIBITS_ADD_EXECUTABLE_AND_TEST EXE_NAME)
SET(options STANDARD_PASS_OUTPUT WILL_FAIL)
SET(oneValueArgs PASS_REGULAR_EXPRESSION FAIL_REGULAR_EXPRESSION ENVIRONMENT TIMEOUT CATEGORIES ADDED_TESTS_NAMES_OUT ADDED_EXE_TARGET_NAME_OUT)
SET(multiValueArgs)
CMAKE_PARSE_ARGUMENTS(PARSE "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
TRIBITS_ADD_EXECUTABLE(${EXE_NAME} TESTONLY ADDED_EXE_TARGET_NAME_OUT TEST_NAME ${PARSE_UNPARSED_ARGUMENTS})
IF(WIN32)
ADD_TEST(NAME ${TEST_NAME} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH} COMMAND ${TEST_NAME}${CMAKE_EXECUTABLE_SUFFIX})
ELSE()
ADD_TEST(NAME ${TEST_NAME} COMMAND ${TEST_NAME})
ENDIF()
ADD_DEPENDENCIES(check ${TEST_NAME})
IF(PARSE_FAIL_REGULAR_EXPRESSION)
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES FAIL_REGULAR_EXPRESSION ${PARSE_FAIL_REGULAR_EXPRESSION})
ENDIF()
IF(PARSE_PASS_REGULAR_EXPRESSION)
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES PASS_REGULAR_EXPRESSION ${PARSE_PASS_REGULAR_EXPRESSION})
ENDIF()
IF(PARSE_WILL_FAIL)
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES WILL_FAIL ${PARSE_WILL_FAIL})
ENDIF()
IF(PARSE_ADDED_TESTS_NAMES_OUT)
SET(${PARSE_ADDED_TESTS_NAMES_OUT} ${TEST_NAME} PARENT_SCOPE)
ENDIF()
IF(PARSE_ADDED_EXE_TARGET_NAME_OUT)
SET(${PARSE_ADDED_EXE_TARGET_NAME_OUT} ${TEST_NAME} PARENT_SCOPE)
ENDIF()
ENDFUNCTION()
MACRO(TIBITS_CREATE_IMPORTED_TPL_LIBRARY TPL_NAME)
ADD_INTERFACE_LIBRARY(TPL_LIB_${TPL_NAME})
TARGET_LINK_LIBRARIES(TPL_LIB_${TPL_NAME} LINK_PUBLIC ${TPL_${TPL_NAME}_LIBRARIES})
TARGET_INCLUDE_DIRECTORIES(TPL_LIB_${TPL_NAME} INTERFACE ${TPL_${TPL_NAME}_INCLUDE_DIRS})
ENDMACRO()
FUNCTION(TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES TPL_NAME)
SET(options MUST_FIND_ALL_LIBS MUST_FIND_ALL_HEADERS NO_PRINT_ENABLE_SUCCESS_FAIL)
SET(oneValueArgs)
SET(multiValueArgs REQUIRED_HEADERS REQUIRED_LIBS_NAMES)
CMAKE_PARSE_ARGUMENTS(PARSE "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
SET(_${TPL_NAME}_ENABLE_SUCCESS TRUE)
IF (PARSE_REQUIRED_LIBS_NAMES)
FIND_LIBRARY(TPL_${TPL_NAME}_LIBRARIES NAMES ${PARSE_REQUIRED_LIBS_NAMES})
IF(NOT TPL_${TPL_NAME}_LIBRARIES)
SET(_${TPL_NAME}_ENABLE_SUCCESS FALSE)
ENDIF()
ENDIF()
IF (PARSE_REQUIRED_HEADERS)
FIND_PATH(TPL_${TPL_NAME}_INCLUDE_DIRS NAMES ${PARSE_REQUIRED_HEADERS})
IF(NOT TPL_${TPL_NAME}_INCLUDE_DIRS)
SET(_${TPL_NAME}_ENABLE_SUCCESS FALSE)
ENDIF()
ENDIF()
IF (_${TPL_NAME}_ENABLE_SUCCESS)
TIBITS_CREATE_IMPORTED_TPL_LIBRARY(${TPL_NAME})
ENDIF()
ENDFUNCTION()
MACRO(TRIBITS_PROCESS_TPL_DEP_FILE TPL_FILE)
GET_FILENAME_COMPONENT(TPL_NAME ${TPL_FILE} NAME_WE)
INCLUDE("${TPL_FILE}")
IF(TARGET TPL_LIB_${TPL_NAME})
MESSAGE(STATUS "Found tpl library: ${TPL_NAME}")
SET(TPL_ENABLE_${TPL_NAME} TRUE)
ELSE()
MESSAGE(STATUS "Tpl library not found: ${TPL_NAME}")
SET(TPL_ENABLE_${TPL_NAME} FALSE)
ENDIF()
ENDMACRO()
MACRO(PREPEND_TARGET_SET VARNAME TARGET_NAME TYPE)
IF(TYPE STREQUAL "REQUIRED")
SET(REQUIRED TRUE)
ELSE()
SET(REQUIRED FALSE)
ENDIF()
IF(TARGET ${TARGET_NAME})
PREPEND_GLOBAL_SET(${VARNAME} ${TARGET_NAME})
ELSE()
IF(REQUIRED)
MESSAGE(FATAL_ERROR "Missing dependency ${TARGET_NAME}")
ENDIF()
ENDIF()
ENDMACRO()
MACRO(TRIBITS_APPEND_PACKAGE_DEPS DEP_LIST TYPE)
FOREACH(DEP ${ARGN})
PREPEND_GLOBAL_SET(${DEP_LIST} PACKAGE_${DEP})
ENDFOREACH()
ENDMACRO()
MACRO(TRIBITS_APPEND_TPLS_DEPS DEP_LIST TYPE)
FOREACH(DEP ${ARGN})
PREPEND_TARGET_SET(${DEP_LIST} TPL_LIB_${DEP} ${TYPE})
ENDFOREACH()
ENDMACRO()
MACRO(TRIBITS_ENABLE_TPLS)
FOREACH(TPL ${ARGN})
IF(TARGET ${TPL})
GLOBAL_SET(${PACKAGE_NAME}_ENABLE_${TPL} TRUE)
ELSE()
GLOBAL_SET(${PACKAGE_NAME}_ENABLE_${TPL} FALSE)
ENDIF()
ENDFOREACH()
ENDMACRO()
MACRO(TRIBITS_PACKAGE_DEFINE_DEPENDENCIES)
SET(options)
SET(oneValueArgs)
SET(multiValueArgs
LIB_REQUIRED_PACKAGES
LIB_OPTIONAL_PACKAGES
TEST_REQUIRED_PACKAGES
TEST_OPTIONAL_PACKAGES
LIB_REQUIRED_TPLS
LIB_OPTIONAL_TPLS
TEST_REQUIRED_TPLS
TEST_OPTIONAL_TPLS
REGRESSION_EMAIL_LIST
SUBPACKAGES_DIRS_CLASSIFICATIONS_OPTREQS
)
CMAKE_PARSE_ARGUMENTS(PARSE "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
GLOBAL_SET(${PACKAGE_NAME}_DEPS "")
TRIBITS_APPEND_PACKAGE_DEPS(${PACKAGE_NAME}_DEPS REQUIRED ${PARSE_LIB_REQUIRED_PACKAGES})
TRIBITS_APPEND_PACKAGE_DEPS(${PACKAGE_NAME}_DEPS OPTIONAL ${PARSE_LIB_OPTIONAL_PACKAGES})
TRIBITS_APPEND_TPLS_DEPS(${PACKAGE_NAME}_DEPS REQUIRED ${PARSE_LIB_REQUIRED_TPLS})
TRIBITS_APPEND_TPLS_DEPS(${PACKAGE_NAME}_DEPS OPTIONAL ${PARSE_LIB_OPTIONAL_TPLS})
GLOBAL_SET(${PACKAGE_NAME}_TEST_DEPS "")
TRIBITS_APPEND_PACKAGE_DEPS(${PACKAGE_NAME}_TEST_DEPS REQUIRED ${PARSE_TEST_REQUIRED_PACKAGES})
TRIBITS_APPEND_PACKAGE_DEPS(${PACKAGE_NAME}_TEST_DEPS OPTIONAL ${PARSE_TEST_OPTIONAL_PACKAGES})
TRIBITS_APPEND_TPLS_DEPS(${PACKAGE_NAME}_TEST_DEPS REQUIRED ${PARSE_TEST_REQUIRED_TPLS})
TRIBITS_APPEND_TPLS_DEPS(${PACKAGE_NAME}_TEST_DEPS OPTIONAL ${PARSE_TEST_OPTIONAL_TPLS})
TRIBITS_ENABLE_TPLS(${PARSE_LIB_REQUIRED_TPLS} ${PARSE_LIB_OPTIONAL_TPLS} ${PARSE_TEST_REQUIRED_TPLS} ${PARSE_TEST_OPTIONAL_TPLS})
ENDMACRO()
MACRO(TRIBITS_SUBPACKAGE NAME)
SET(PACKAGE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
SET(PARENT_PACKAGE_NAME ${PACKAGE_NAME})
SET(PACKAGE_NAME ${PACKAGE_NAME}${NAME})
STRING(TOUPPER ${PACKAGE_NAME} PACKAGE_NAME_UC)
ADD_INTERFACE_LIBRARY(PACKAGE_${PACKAGE_NAME})
GLOBAL_SET(${PACKAGE_NAME}_LIBS "")
INCLUDE(${PACKAGE_SOURCE_DIR}/cmake/Dependencies.cmake)
ENDMACRO(TRIBITS_SUBPACKAGE)
MACRO(TRIBITS_SUBPACKAGE_POSTPROCESS)
TARGET_LINK_AND_INCLUDE_LIBRARIES(PACKAGE_${PACKAGE_NAME} ${${PACKAGE_NAME}_LIBS})
ENDMACRO(TRIBITS_SUBPACKAGE_POSTPROCESS)
MACRO(TRIBITS_PACKAGE_DECL NAME)
PROJECT(${NAME})
STRING(TOUPPER ${PROJECT_NAME} PROJECT_NAME_UC)
SET(PACKAGE_NAME ${PROJECT_NAME})
STRING(TOUPPER ${PACKAGE_NAME} PACKAGE_NAME_UC)
SET(TRIBITS_DEPS_DIR "${CMAKE_SOURCE_DIR}/cmake/deps")
FILE(GLOB TPLS_FILES "${TRIBITS_DEPS_DIR}/*.cmake")
FOREACH(TPL_FILE ${TPLS_FILES})
TRIBITS_PROCESS_TPL_DEP_FILE(${TPL_FILE})
ENDFOREACH()
ENDMACRO()
MACRO(TRIBITS_PROCESS_SUBPACKAGES)
FILE(GLOB SUBPACKAGES RELATIVE ${CMAKE_SOURCE_DIR} */cmake/Dependencies.cmake)
FOREACH(SUBPACKAGE ${SUBPACKAGES})
GET_FILENAME_COMPONENT(SUBPACKAGE_CMAKE ${SUBPACKAGE} DIRECTORY)
GET_FILENAME_COMPONENT(SUBPACKAGE_DIR ${SUBPACKAGE_CMAKE} DIRECTORY)
ADD_SUBDIRECTORY(${SUBPACKAGE_DIR})
ENDFOREACH()
ENDMACRO(TRIBITS_PROCESS_SUBPACKAGES)
MACRO(TRIBITS_PACKAGE_DEF)
ENDMACRO(TRIBITS_PACKAGE_DEF)
MACRO(TRIBITS_EXCLUDE_AUTOTOOLS_FILES)
ENDMACRO(TRIBITS_EXCLUDE_AUTOTOOLS_FILES)
MACRO(TRIBITS_EXCLUDE_FILES)
ENDMACRO(TRIBITS_EXCLUDE_FILES)
MACRO(TRIBITS_PACKAGE_POSTPROCESS)
ENDMACRO(TRIBITS_PACKAGE_POSTPROCESS)
diff --git a/lib/kokkos/config/configure_compton_cpu.sh b/lib/kokkos/config/configure_compton_cpu.sh
old mode 100755
new mode 100644
diff --git a/lib/kokkos/config/configure_compton_mic.sh b/lib/kokkos/config/configure_compton_mic.sh
old mode 100755
new mode 100644
diff --git a/lib/kokkos/config/configure_kokkos.sh b/lib/kokkos/config/configure_kokkos.sh
old mode 100755
new mode 100644
diff --git a/lib/kokkos/config/configure_kokkos_nvidia.sh b/lib/kokkos/config/configure_kokkos_nvidia.sh
old mode 100755
new mode 100644
diff --git a/lib/kokkos/config/configure_shannon.sh b/lib/kokkos/config/configure_shannon.sh
old mode 100755
new mode 100644
diff --git a/lib/kokkos/config/kokkos-trilinos-integration-procedure.txt b/lib/kokkos/config/kokkos-trilinos-integration-procedure.txt
index 9f56f2fd4..961e4186e 100644
--- a/lib/kokkos/config/kokkos-trilinos-integration-procedure.txt
+++ b/lib/kokkos/config/kokkos-trilinos-integration-procedure.txt
@@ -1,153 +1,164 @@
// -------------------------------------------------------------------------------- //
The following steps are for workstations/servers with the SEMS environment installed.
// -------------------------------------------------------------------------------- //
Summary:
- Step 1: Rigorous testing of Kokkos' develop branch for each backend (Serial, OpenMP, Threads, Cuda) with all supported compilers.
- Step 2: Snapshot Kokkos' develop branch into current Trilinos develop branch.
- Step 3: Build and test Trilinos with combinations of compilers, types, backends.
- Step 4: Promote Kokkos develop branch to master if the snapshot does not cause any new tests to fail; else track/fix causes of new failures.
- Step 5: Snapshot Kokkos tagged master branch into Trilinos and push Trilinos.
// -------------------------------------------------------------------------------- //
// -------------------------------------------------------------------------------- //
Step 1:
1.1. Update kokkos develop branch (NOT a fork)
(From kokkos directory):
git fetch --all
git checkout develop
git reset --hard origin/develop
1.2. Create a testing directory - here the directory is created within the kokkos directory
mkdir testing
cd testing
1.3. Run the test_all_sandia script; various compiler and build-list options can be specified
../config/test_all_sandia
1.4 Clean repository of untracked files
cd ../
git clean -df
// -------------------------------------------------------------------------------- //
Step 2:
2.1 Update Trilinos develop branch
(From Trilinos directory):
git checkout develop
git fetch --all
git reset --hard origin/develop
git clean -df
2.2 Snapshot Kokkos into Trilinos - this requires python/2.7.9 and that both Trilinos and Kokkos be clean - no untracked or modified files
module load python/2.7.9
python KOKKOS_PATH/config/snapshot.py KOKKOS_PATH TRILINOS_PATH/packages
// -------------------------------------------------------------------------------- //
Step 3:
3.1. Build and test Trilinos with 3 different configurations; a configure-all script is provided in Trilinos and should be modified to test each of the following 3 configurations with appropriate environment variable(s):
- GCC/4.7.2-OpenMP/Complex
Run tests with the following environment variable:
export OMP_NUM_THREADS=2
- Intel/15.0.2-Serial/NoComplex
- GCC/4.8.4/CUDA/7.5.18-Cuda/Serial/NoComplex
Run tests with the following environment variables:
export CUDA_LAUNCH_BLOCKING=1
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
mkdir Build
cd Build
cp TRILINOS_PATH/sampleScripts/Sandia-SEMS/configure-all ./
** Set the path to Trilinos appropriately within the configure-all script **
source $SEMS_MODULE_ROOT/utils/sems-modules-init.sh kokkos
source configure-all
make -k (-k means "keep going" to get past build errors; -j12 can also be specified to build with 12 threads, for example)
ctest
3.2. Compare the failed test output to the test output on the dashboard ( testing.sandia.gov/cdash select Trilinos ); investigate and fix problems if new tests fail after the Kokkos snapshot
// -------------------------------------------------------------------------------- //
-Step 4:
- 4.1. Once all Trilinos tests pass promote Kokkos develop branch to master on Github
+Step 4: Once all Trilinos tests pass, promote the Kokkos develop branch to master on Github
+ 4.1. Generate Changelog (You need a github API token)
+
+ Close all open issues with the "InDevelop" tag on Github
+
+ (Not from kokkos directory)
+ github_changelog_generator kokkos/kokkos --token TOKEN --no-pull-requests --include-labels 'InDevelop' --enhancement-labels 'enhancement,Feature Request' --future-release 'NEWTAG' --between-tags 'NEWTAG,OLDTAG'
+
+ (Copy the new section from the generated CHANGELOG.md to the kokkos/CHANGELOG.md)
+ (Make desired changes to CHANGELOG.md to enhance clarity)
+ (Commit and push the CHANGELOG to develop)
+ 4.2. Merge develop into master
+
- DO NOT fast-forward the merge!!!!
(From kokkos directory):
git checkout master
git fetch --all
# Ensure we are on the current origin/master
git reset --hard origin/master
git merge --no-ff origin/develop
- 4.2. Update the tag in kokkos/config/master_history.txt
+ 4.3. Update the tag in kokkos/config/master_history.txt
Tag description: MajorNumber.MinorNumber.WeeksSinceMinorNumberUpdate
Tag format: #.#.##
# Prepend master_history.txt with
# tag: #.#.##
# date: mm/dd/yyyy
# master: sha1
# develop: sha1
# -----------------------
git commit --amend -a
git tag -a #.#.##
tag: #.#.##
date: mm/dd/yyyy
master: sha1
develop: sha1
git push --follow-tags origin master
// -------------------------------------------------------------------------------- //
Step 5:
5.1. Make sure Trilinos is up-to-date - chances are other changes have been committed since the integration testing process began. If a substantial change has occurred that may be affected by the snapshot the testing procedure may need to be repeated
(From Trilinos directory):
git checkout develop
git fetch --all
git reset --hard origin/develop
git clean -df
5.2. Snapshot Kokkos master branch into Trilinos
(From kokkos directory):
git fetch --all
git checkout tags/#.#.##
git clean -df
python KOKKOS_PATH/config/snapshot.py KOKKOS_PATH TRILINOS_PATH/packages
5.3. Push the updated develop branch of Trilinos to Github - congratulations!!!
(From Trilinos directory):
git push
// -------------------------------------------------------------------------------- //
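A worked sketch of the tagging commands in step 4.3, filled in with the 2.02.07 entry recorded in config/master_history.txt below; passing the tag description via -m is one possible way to supply it, and the sha1 values are the ones listed for that release.

  git commit --amend -a        # after prepending the new entry to config/master_history.txt
  git tag -a 2.02.07 -m "tag: 2.02.07 date: 12/16/2016 master: 4b4cc4ba develop: 382c0966"
  git push --follow-tags origin master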
diff --git a/lib/kokkos/config/master_history.txt b/lib/kokkos/config/master_history.txt
index f2eb67457..78c512cce 100644
--- a/lib/kokkos/config/master_history.txt
+++ b/lib/kokkos/config/master_history.txt
@@ -1,3 +1,6 @@
tag: 2.01.00 date: 07:21:2016 master: xxxxxxxx develop: fa6dfcc4
tag: 2.01.06 date: 09:02:2016 master: 9afaa87f develop: 555f1a3a
-
+tag: 2.01.10 date: 09:27:2016 master: e4119325 develop: e6cda11e
+tag: 2.02.00 date: 10:30:2016 master: 6c90a581 develop: ca3dd56e
+tag: 2.02.01 date: 11:01:2016 master: 9c698c86 develop: b0072304
+tag: 2.02.07 date: 12:16:2016 master: 4b4cc4ba develop: 382c0966
diff --git a/lib/kokkos/config/nvcc_wrapper b/lib/kokkos/config/nvcc_wrapper
index 6093cb61b..cb206cf88 100755
--- a/lib/kokkos/config/nvcc_wrapper
+++ b/lib/kokkos/config/nvcc_wrapper
@@ -1,280 +1,284 @@
#!/bin/bash
#
# This shell script (nvcc_wrapper) wraps both the host compiler and
# NVCC, if you are building legacy C or C++ code with CUDA enabled.
# The script remedies some differences between the interface of NVCC
# and that of the host compiler, in particular for linking.
# It also means that a legacy code doesn't need separate .cu files;
# it can just use .cpp files.
#
# Default settings: change those according to your machine. For
# example, you may have two different wrappers with either icpc
# or g++ as their back-end compiler. The defaults can be overwritten
# by using the usual arguments (e.g., -arch=sm_30 -ccbin icpc).
default_arch="sm_35"
#default_arch="sm_50"
#
# The default C++ compiler.
#
host_compiler=${NVCC_WRAPPER_DEFAULT_COMPILER:-"g++"}
#host_compiler="icpc"
#host_compiler="/usr/local/gcc/4.8.3/bin/g++"
#host_compiler="/usr/local/gcc/4.9.1/bin/g++"
#
# Internal variables
#
# C++ files
cpp_files=""
# Host compiler arguments
xcompiler_args=""
# Cuda (NVCC) only arguments
cuda_args=""
# Arguments for both NVCC and Host compiler
shared_args=""
# Linker arguments
xlinker_args=""
# Object files passable to NVCC
object_files=""
# Link objects for the host linker only
object_files_xlinker=""
# Shared libraries with version numbers are not handled correctly by NVCC
shared_versioned_libraries_host=""
shared_versioned_libraries=""
# Does the User set the architecture
arch_set=0
# Does the user overwrite the host compiler
ccbin_set=0
#Error code of compilation
error_code=0
# Do a dry run without actually compiling
dry_run=0
# Skip NVCC compilation and use host compiler directly
host_only=0
# Enable workaround for CUDA 6.5 for pragma ident
replace_pragma_ident=0
# Mark first host compiler argument
first_xcompiler_arg=1
temp_dir=${TMPDIR:-/tmp}
# Check if we have an optimization argument already
optimization_applied=0
#echo "Arguments: $# $@"
while [ $# -gt 0 ]
do
case $1 in
#show the executed command
--show|--nvcc-wrapper-show)
dry_run=1
;;
#run host compilation only
--host-only)
host_only=1
;;
#replace '#pragma ident' with '#ident'; this is needed to compile OpenMPI due to a configure script bug and the non-standardized behavior of pragma with macros
--replace-pragma-ident)
replace_pragma_ident=1
;;
#handle source files to be compiled as cuda files
*.cpp|*.cxx|*.cc|*.C|*.c++|*.cu)
cpp_files="$cpp_files $1"
;;
# Ensure we only have one optimization flag because NVCC doesn't allow multiple
-O*)
if [ $optimization_applied -eq 1 ]; then
echo "nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the first is used because nvcc can only accept a single optimization setting."
else
shared_args="$shared_args $1"
optimization_applied=1
fi
;;
#Handle shared args (valid for both nvcc and the host compiler)
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
shared_args="$shared_args $1"
;;
#Handle shared args that have an argument
-o|-MT)
shared_args="$shared_args $1 $2"
shift
;;
#Handle known nvcc args
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
cuda_args="$cuda_args $1"
;;
+ #Handle more known nvcc args
+ --expt-extended-lambda|--expt-relaxed-constexpr)
+ cuda_args="$cuda_args $1"
+ ;;
#Handle known nvcc args that have an argument
-rdc|-maxrregcount|--default-stream)
cuda_args="$cuda_args $1 $2"
shift
;;
#Handle c++11 setting
--std=c++11|-std=c++11)
shared_args="$shared_args $1"
;;
#strip off -std=c++98 due to nvcc warnings; Tribits will place both -std=c++11 and -std=c++98
-std=c++98|--std=c++98)
;;
#strip off -pedantic because it produces endless warnings about #LINE added by the preprocessor
-pedantic|-Wpedantic|-ansi)
;;
#strip -Xcompiler because we add it
-Xcompiler)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$2"
fi
shift
;;
#strip of "-x cu" because we add that
-x)
if [[ $2 != "cu" ]]; then
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="-x,$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,-x,$2"
fi
fi
shift
;;
#Handle -ccbin (if it's not set we can set it to a default value)
-ccbin)
cuda_args="$cuda_args $1 $2"
ccbin_set=1
host_compiler=$2
shift
;;
#Handle -arch argument (if it's not set, use a default)
-arch*)
cuda_args="$cuda_args $1"
arch_set=1
;;
#Handle -Xcudafe argument
-Xcudafe)
cuda_args="$cuda_args -Xcudafe $2"
shift
;;
#Handle args that should be sent to the linker
-Wl*)
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
host_linker_args="$host_linker_args ${1:4:${#1}}"
;;
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
*.a|*.so|*.o|*.obj)
object_files="$object_files $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle object files which always need to use "-Xlinker": -x cu applies to all input files, so give them to linker, except if only linking
*.dylib)
object_files="$object_files -Xlinker $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle shared libraries with *.so.* names, which nvcc can't handle.
*.so.*)
shared_versioned_libraries_host="$shared_versioned_libraries_host $1"
shared_versioned_libraries="$shared_versioned_libraries -Xlinker $1"
;;
#All other args are sent to the host compiler
*)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args=$1
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$1"
fi
;;
esac
shift
done
#Add default host compiler if necessary
if [ $ccbin_set -ne 1 ]; then
cuda_args="$cuda_args -ccbin $host_compiler"
fi
#Add architecture command
if [ $arch_set -ne 1 ]; then
cuda_args="$cuda_args -arch=$default_arch"
fi
#Compose compilation command
nvcc_command="nvcc $cuda_args $shared_args $xlinker_args $shared_versioned_libraries"
if [ $first_xcompiler_arg -eq 0 ]; then
nvcc_command="$nvcc_command -Xcompiler $xcompiler_args"
fi
#Compose host only command
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
if [ $replace_pragma_ident -eq 1 ]; then
cpp_files2=""
for file in $cpp_files
do
var=`grep pragma ${file} | grep ident | grep "#"`
if [ "${#var}" -gt 0 ]
then
sed 's/#[\ \t]*pragma[\ \t]*ident/#ident/g' $file > $temp_dir/nvcc_wrapper_tmp_$file
cpp_files2="$cpp_files2 $temp_dir/nvcc_wrapper_tmp_$file"
else
cpp_files2="$cpp_files2 $file"
fi
done
cpp_files=$cpp_files2
#echo $cpp_files
fi
if [ "$cpp_files" ]; then
nvcc_command="$nvcc_command $object_files_xlinker -x cu $cpp_files"
else
nvcc_command="$nvcc_command $object_files"
fi
if [ "$cpp_files" ]; then
host_command="$host_command $object_files $cpp_files"
else
host_command="$host_command $object_files"
fi
#Print command for dryrun
if [ $dry_run -eq 1 ]; then
if [ $host_only -eq 1 ]; then
echo $host_command
else
echo $nvcc_command
fi
exit 0
fi
#Run compilation command
if [ $host_only -eq 1 ]; then
$host_command
else
$nvcc_command
fi
error_code=$?
#Report error code
exit $error_code
diff --git a/lib/kokkos/config/test_all_sandia b/lib/kokkos/config/test_all_sandia
index aac036a8f..21b8bbff6 100755
--- a/lib/kokkos/config/test_all_sandia
+++ b/lib/kokkos/config/test_all_sandia
@@ -1,539 +1,651 @@
#!/bin/bash -e
#
# Global config
#
set -o pipefail
# Determine current machine
MACHINE=""
HOSTNAME=$(hostname)
if [[ "$HOSTNAME" =~ (white|ride).* ]]; then
MACHINE=white
elif [[ "$HOSTNAME" =~ .*bowman.* ]]; then
MACHINE=bowman
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
MACHINE=shepard
+elif [[ "$HOSTNAME" =~ apollo ]]; then
+ MACHINE=apollo
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
MACHINE=sems
else
echo "Unrecognized machine" >&2
exit 1
fi
GCC_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
IBM_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
INTEL_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
CLANG_BUILD_LIST="Pthread,Serial,Pthread_Serial"
CUDA_BUILD_LIST="Cuda_OpenMP,Cuda_Pthread,Cuda_Serial"
+CUDA_IBM_BUILD_LIST="Cuda_OpenMP,Cuda_Serial"
GCC_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wignored-qualifiers,-Wempty-body,-Wclobbered,-Wuninitialized"
IBM_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CLANG_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
INTEL_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CUDA_WARNING_FLAGS=""
# Default. Machine specific can override
DEBUG=False
ARGS=""
CUSTOM_BUILD_LIST=""
DRYRUN=False
BUILD_ONLY=False
declare -i NUM_JOBS_TO_RUN_IN_PARALLEL=3
TEST_SCRIPT=False
SKIP_HWLOC=False
+SPOT_CHECK=False
-ARCH_FLAG=""
+PRINT_HELP=False
+OPT_FLAG=""
+KOKKOS_OPTIONS=""
+
+
+#
+# Handle arguments
+#
+
+while [[ $# > 0 ]]
+do
+key="$1"
+case $key in
+--kokkos-path*)
+KOKKOS_PATH="${key#*=}"
+;;
+--build-list*)
+CUSTOM_BUILD_LIST="${key#*=}"
+;;
+--debug*)
+DEBUG=True
+;;
+--build-only*)
+BUILD_ONLY=True
+;;
+--test-script*)
+TEST_SCRIPT=True
+;;
+--skip-hwloc*)
+SKIP_HWLOC=True
+;;
+--num*)
+NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
+;;
+--dry-run*)
+DRYRUN=True
+;;
+--spot-check*)
+SPOT_CHECK=True
+;;
+--arch*)
+ARCH_FLAG="--arch=${key#*=}"
+;;
+--opt-flag*)
+OPT_FLAG="${key#*=}"
+;;
+--with-cuda-options*)
+KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
+;;
+--help*)
+PRINT_HELP=True
+;;
+*)
+# args, just append
+ARGS="$ARGS $1"
+;;
+esac
+shift
+done
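+
+# Example invocations (illustrative only; compiler names must match the
+# COMPILERS list configured for the detected machine below):
+#   ./config/test_all_sandia --spot-check
+#   ./config/test_all_sandia --arch=SNB,Kepler35 --opt-flag=-O2 gcc/5.3.0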
+
+SCRIPT_KOKKOS_ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd )
+
+# set kokkos path
+if [ -z "$KOKKOS_PATH" ]; then
+ KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
+else
+ # Ensure KOKKOS_PATH is abs path
+ KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
+fi
#
# Machine specific config
#
if [ "$MACHINE" = "sems" ]; then
- source /projects/modulefiles/utils/sems-modules-init.sh
- source /projects/modulefiles/utils/kokkos-modules-init.sh
+ source /projects/sems/modulefiles/utils/sems-modules-init.sh
+
+ BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
+ CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
+ CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
- BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>/base,hwloc/1.10.1/<COMPILER_NAME>/<COMPILER_VERSION>/base"
- CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/4.7.2/base"
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG=""
+ fi
+ if [ "$SPOT_CHECK" = "True" ]; then
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
+ "gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
+ "intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
+ "clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
+ "cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ )
+ else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
- "cuda/6.5.14 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ "clang/3.7.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
+ "clang/3.8.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
+ "clang/3.9.0 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ "cuda/8.0.44 $CUDA8_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
+ fi
elif [ "$MACHINE" = "white" ]; then
source /etc/profile.d/modules.sh
SKIP_HWLOC=True
export SLURM_TASKS_PER_NODE=32
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
- CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/4.9.2"
+ CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
# Don't do pthread on white
GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
# Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("gcc/4.9.2 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "gcc/5.3.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
+ "cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
-
- ARCH_FLAG="--arch=Power8"
- NUM_JOBS_TO_RUN_IN_PARALLEL=8
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=Power8,Kepler37"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "bowman" ]; then
source /etc/profile.d/modules.sh
SKIP_HWLOC=True
export SLURM_TASKS_PER_NODE=32
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "intel/17.0.064 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
)
- ARCH_FLAG="--arch=KNL"
- NUM_JOBS_TO_RUN_IN_PARALLEL=8
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=KNL"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "shepard" ]; then
source /etc/profile.d/modules.sh
SKIP_HWLOC=True
export SLURM_TASKS_PER_NODE=32
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "intel/17.0.064 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
)
- ARCH_FLAG="--arch=HSW"
- NUM_JOBS_TO_RUN_IN_PARALLEL=8
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=HSW"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
+
+elif [ "$MACHINE" = "apollo" ]; then
+ source /projects/sems/modulefiles/utils/sems-modules-init.sh
+ module use /home/projects/modulefiles/local/x86-64
+ module load kokkos-env
+
+ module load sems-git
+ module load sems-tex
+ module load sems-cmake/3.5.2
+ module load sems-gdb
+
+ SKIP_HWLOC=True
+
+ BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
+ CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
+ CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
+
+ CLANG_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,cuda/8.0.44"
+ NVCC_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0"
+
+ BUILD_LIST_CUDA_NVCC="Cuda_Serial,Cuda_OpenMP"
+ BUILD_LIST_CUDA_CLANG="Cuda_Serial,Cuda_Pthread"
+ BUILD_LIST_CLANG="Serial,Pthread,OpenMP"
+ if [ "$SPOT_CHECK" = "True" ]; then
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
+ "gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
+ "intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
+ "clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
+ "clang/head $CLANG_MODULE_LIST "Cuda_Pthread" clang++ $CUDA_WARNING_FLAGS"
+ "cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ )
+ else
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ "clang/head $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
+ "clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
+ "gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "gcc/5.3.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "gcc/6.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
+ "clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
+ "cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ "cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ )
+ fi
+
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=SNB,Kepler35"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
else
echo "Unhandled machine $MACHINE" >&2
exit 1
fi
+
+
export OMP_NUM_THREADS=4
declare -i NUM_RESULTS_TO_KEEP=7
RESULT_ROOT_PREFIX=TestAll
-SCRIPT_KOKKOS_ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd )
-
-#
-# Handle arguments
-#
-
-while [[ $# > 0 ]]
-do
-key="$1"
-case $key in
---kokkos-path*)
-KOKKOS_PATH="${key#*=}"
-;;
---build-list*)
-CUSTOM_BUILD_LIST="${key#*=}"
-;;
---debug*)
-DEBUG=True
-;;
---build-only*)
-BUILD_ONLY=True
-;;
---test-script*)
-TEST_SCRIPT=True
-;;
---skip-hwloc*)
-SKIP_HWLOC=True
-;;
---num*)
-NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
-;;
---dry-run*)
-DRYRUN=True
-;;
---help)
+if [ "$PRINT_HELP" = "True" ]; then
echo "test_all_sandia <ARGS> <OPTIONS>:"
echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
echo " Defaults to root repo containing this script"
echo "--debug: Run tests in debug. Defaults to False"
echo "--test-script: Test this script, not Kokkos"
echo "--skip-hwloc: Do not do hwloc tests"
echo "--num=N: Number of jobs to run in parallel "
echo "--dry-run: Just print what would be executed"
echo "--build-only: Just do builds, don't run anything"
+echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
+echo "--arch=ARCHITECTURE: overwrite architecture flags"
+echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
echo "--build-list=BUILD,BUILD,BUILD..."
echo " Provide a comma-separated list of builds instead of running all builds"
echo " Valid items:"
echo " OpenMP, Pthread, Serial, OpenMP_Serial, Pthread_Serial"
echo " Cuda_OpenMP, Cuda_Pthread, Cuda_Serial"
echo ""
echo "ARGS: list of expressions matching compilers to test"
echo " supported compilers sems"
for COMPILER_DATA in "${COMPILERS[@]}"; do
ARR=($COMPILER_DATA)
COMPILER=${ARR[0]}
echo " $COMPILER"
done
echo ""
echo "Examples:"
echo " Run all tests"
echo " % test_all_sandia"
echo ""
echo " Run all gcc tests"
echo " % test_all_sandia gcc"
echo ""
echo " Run all gcc/4.7.2 and all intel tests"
echo " % test_all_sandia gcc/4.7.2 intel"
echo ""
echo " Run all tests in debug"
echo " % test_all_sandia --debug"
echo ""
echo " Run gcc/4.7.2 and only do OpenMP and OpenMP_Serial builds"
echo " % test_all_sandia gcc/4.7.2 --build-list=OpenMP,OpenMP_Serial"
echo ""
echo "If you want to kill the tests, do:"
echo " hit ctrl-z"
echo " % kill -9 %1"
echo
exit 0
-;;
-*)
-# args, just append
-ARGS="$ARGS $1"
-;;
-esac
-shift
-done
-
-# set kokkos path
-if [ -z "$KOKKOS_PATH" ]; then
- KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
-else
- # Ensure KOKKOS_PATH is abs path
- KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
fi
# set build type
if [ "$DEBUG" = "True" ]; then
BUILD_TYPE=debug
else
BUILD_TYPE=release
fi
# If no args provided, do all compilers
if [ -z "$ARGS" ]; then
ARGS='?'
fi
# Process args to figure out which compilers to test
COMPILERS_TO_TEST=""
for ARG in $ARGS; do
for COMPILER_DATA in "${COMPILERS[@]}"; do
ARR=($COMPILER_DATA)
COMPILER=${ARR[0]}
if [[ "$COMPILER" = $ARG* ]]; then
if [[ "$COMPILERS_TO_TEST" != *${COMPILER}* ]]; then
COMPILERS_TO_TEST="$COMPILERS_TO_TEST $COMPILER"
else
echo "Tried to add $COMPILER twice"
fi
fi
done
done
#
# Functions
#
# get_compiler_name <COMPILER>
get_compiler_name() {
echo $1 | cut -d/ -f1
}
# get_compiler_version <COMPILER>
get_compiler_version() {
echo $1 | cut -d/ -f2
}
# Do not call directly
get_compiler_data() {
local compiler=$1
local item=$2
local compiler_name=$(get_compiler_name $compiler)
local compiler_vers=$(get_compiler_version $compiler)
local compiler_data
for compiler_data in "${COMPILERS[@]}" ; do
local arr=($compiler_data)
if [ "$compiler" = "${arr[0]}" ]; then
echo "${arr[$item]}" | tr , ' ' | sed -e "s/<COMPILER_NAME>/$compiler_name/g" -e "s/<COMPILER_VERSION>/$compiler_vers/g"
return 0
fi
done
# Not found
echo "Unreconized compiler $compiler" >&2
exit 1
}
#
# For all getters, usage: <GETTER> <COMPILER>
#
get_compiler_modules() {
get_compiler_data $1 1
}
get_compiler_build_list() {
get_compiler_data $1 2
}
get_compiler_exe_name() {
get_compiler_data $1 3
}
get_compiler_warning_flags() {
get_compiler_data $1 4
}
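# Example (hypothetical): on a SEMS machine, "get_compiler_modules gcc/4.8.4"
# expands BASE_MODULE_LIST and prints
#   sems-env kokkos-env sems-gcc/4.8.4 kokkos-hwloc/1.10.1/base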
run_cmd() {
echo "RUNNING: $*"
if [ "$DRYRUN" != "True" ]; then
eval "$* 2>&1"
fi
}
# report_and_log_test_result <SUCCESS> <DESC> <COMMENT>
report_and_log_test_result() {
# Use sane var names
local success=$1; local desc=$2; local comment=$3;
if [ "$success" = "0" ]; then
echo " PASSED $desc"
echo $comment > $PASSED_DIR/$desc
else
# For failures, comment should be the name of the phase that failed
echo " FAILED $desc" >&2
echo $comment > $FAILED_DIR/$desc
cat ${desc}.${comment}.log
fi
}
setup_env() {
local compiler=$1
local compiler_modules=$(get_compiler_modules $compiler)
module purge
local mod
for mod in $compiler_modules; do
echo "Loading module $mod"
module load $mod 2>&1
# It is ridiculously hard to check for the success of a loaded
# module. Module does not return error codes and piping to grep
# causes module to run in a subshell.
module list 2>&1 | grep "$mod" >& /dev/null || return 1
done
return 0
}
# single_build_and_test <COMPILER> <BUILD> <BUILD_TYPE>
single_build_and_test() {
# Use sane var names
local compiler=$1; local build=$2; local build_type=$3;
# set up env
mkdir -p $ROOT_DIR/$compiler/"${build}-$build_type"
cd $ROOT_DIR/$compiler/"${build}-$build_type"
local desc=$(echo "${compiler}-${build}-${build_type}" | sed 's:/:-:g')
setup_env $compiler >& ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
# Set up flags
local compiler_warning_flags=$(get_compiler_warning_flags $compiler)
local compiler_exe=$(get_compiler_exe_name $compiler)
if [[ "$build_type" = hwloc* ]]; then
local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
fi
+ if [[ "$OPT_FLAG" = "" ]]; then
+ OPT_FLAG="-O3"
+ fi
+
if [[ "$build_type" = *debug* ]]; then
local extra_args="$extra_args --debug"
local cxxflags="-g $compiler_warning_flags"
else
- local cxxflags="-O3 $compiler_warning_flags"
+ local cxxflags="$OPT_FLAG $compiler_warning_flags"
fi
if [[ "$compiler" == cuda* ]]; then
cxxflags="--keep --keep-dir=$(pwd) $cxxflags"
export TMPDIR=$(pwd)
fi
- # cxxflags="-DKOKKOS_USING_EXP_VIEW=1 $cxxflags"
+ if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
+ local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
+ fi
echo " Starting job $desc"
local comment="no_comment"
if [ "$TEST_SCRIPT" = "True" ]; then
local rand=$[ 1 + $[ RANDOM % 10 ]]
sleep $rand
if [ $rand -gt 5 ]; then
run_cmd ls fake_problem >& ${desc}.configure.log || { report_and_log_test_result 1 $desc configure && return 0; }
fi
else
run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
local -i build_start_time=$(date +%s)
run_cmd make build-test >& ${desc}.build.log || { report_and_log_test_result 1 ${desc} build && return 0; }
local -i build_end_time=$(date +%s)
comment="build_time=$(($build_end_time-$build_start_time))"
if [[ "$BUILD_ONLY" == False ]]; then
run_cmd make test >& ${desc}.test.log || { report_and_log_test_result 1 ${desc} test && return 0; }
local -i run_end_time=$(date +%s)
comment="$comment run_time=$(($run_end_time-$build_end_time))"
fi
fi
report_and_log_test_result 0 $desc "$comment"
return 0
}
# wait_for_jobs <NUM-JOBS>
wait_for_jobs() {
local -i max_jobs=$1
local -i num_active_jobs=$(jobs | wc -l)
while [ $num_active_jobs -ge $max_jobs ]
do
sleep 1
num_active_jobs=$(jobs | wc -l)
jobs >& /dev/null
done
}
# run_in_background <COMPILER> <BUILD> <BUILD_TYPE>
run_in_background() {
local compiler=$1
local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
- if [[ "$BUILD_ONLY" == True ]]; then
- num_jobs=8
- else
+ # don't override command line input
+ # if [[ "$BUILD_ONLY" == True ]]; then
+ # num_jobs=8
+ # else
if [[ "$compiler" == cuda* ]]; then
num_jobs=1
fi
- fi
+ # fi
wait_for_jobs $num_jobs
single_build_and_test $* &
}
# build_and_test_all <COMPILER>
build_and_test_all() {
# Get compiler data
local compiler=$1
if [ -z "$CUSTOM_BUILD_LIST" ]; then
local compiler_build_list=$(get_compiler_build_list $compiler)
else
local compiler_build_list=$(echo "$CUSTOM_BUILD_LIST" | tr , ' ')
fi
# do builds
local build
for build in $compiler_build_list
do
run_in_background $compiler $build $BUILD_TYPE
# If not cuda, do a hwloc test too
if [[ "$compiler" != cuda* && "$SKIP_HWLOC" == False ]]; then
run_in_background $compiler $build "hwloc-$BUILD_TYPE"
fi
done
return 0
}
get_test_root_dir() {
local existing_results=$(find . -maxdepth 1 -name "$RESULT_ROOT_PREFIX*" | sort)
local -i num_existing_results=$(echo $existing_results | tr ' ' '\n' | wc -l)
local -i num_to_delete=${num_existing_results}-${NUM_RESULTS_TO_KEEP}
if [ $num_to_delete -gt 0 ]; then
/bin/rm -rf $(echo $existing_results | tr ' ' '\n' | head -n $num_to_delete)
fi
echo $(pwd)/${RESULT_ROOT_PREFIX}_$(date +"%Y-%m-%d_%H.%M.%S")
}
wait_summarize_and_exit() {
wait_for_jobs 1
echo "#######################################################"
echo "PASSED TESTS"
echo "#######################################################"
local passed_test
for passed_test in $(\ls -1 $PASSED_DIR | sort)
do
echo $passed_test $(cat $PASSED_DIR/$passed_test)
done
echo "#######################################################"
echo "FAILED TESTS"
echo "#######################################################"
local failed_test
local -i rv=0
for failed_test in $(\ls -1 $FAILED_DIR | sort)
do
echo $failed_test "("$(cat $FAILED_DIR/$failed_test)" failed)"
rv=$rv+1
done
exit $rv
}
#
# Main
#
ROOT_DIR=$(get_test_root_dir)
mkdir -p $ROOT_DIR
cd $ROOT_DIR
PASSED_DIR=$ROOT_DIR/results/passed
FAILED_DIR=$ROOT_DIR/results/failed
mkdir -p $PASSED_DIR
mkdir -p $FAILED_DIR
echo "Going to test compilers: " $COMPILERS_TO_TEST
for COMPILER in $COMPILERS_TO_TEST; do
echo "Testing compiler $COMPILER"
build_and_test_all $COMPILER
done
wait_summarize_and_exit
diff --git a/lib/kokkos/config/trilinos-integration/prepare_trilinos_repos.sh b/lib/kokkos/config/trilinos-integration/prepare_trilinos_repos.sh
new file mode 100755
index 000000000..d2a7a533d
--- /dev/null
+++ b/lib/kokkos/config/trilinos-integration/prepare_trilinos_repos.sh
@@ -0,0 +1,50 @@
+#!/bin/bash -le
+
+export TRILINOS_UPDATED_PATH=${PWD}/trilinos-update
+export TRILINOS_PRISTINE_PATH=${PWD}/trilinos-pristine
+
+#rm -rf ${KOKKOS_PATH}
+#rm -rf ${TRILINOS_UPDATED_PATH}
+#rm -rf ${TRILINOS_PRISTINE_PATH}
+
+#Already done:
+if [ ! -d "${TRILINOS_UPDATED_PATH}" ]; then
+ git clone https://github.com/trilinos/trilinos ${TRILINOS_UPDATED_PATH}
+fi
+if [ ! -d "${TRILINOS_PRISTINE_PATH}" ]; then
+ git clone https://github.com/trilinos/trilinos ${TRILINOS_PRISTINE_PATH}
+fi
+
+cd ${TRILINOS_UPDATED_PATH}
+git checkout develop
+git reset --hard origin/develop
+git pull
+cd ..
+
+python kokkos/config/snapshot.py ${KOKKOS_PATH} ${TRILINOS_UPDATED_PATH}/packages
+
+cd ${TRILINOS_UPDATED_PATH}
+echo ""
+echo ""
+echo "Trilinos State:"
+git log --pretty=oneline --since=2.days
+SHA=`git log --pretty=oneline --since=2.days | head -n 2 | tail -n 1 | awk '{print $1}'`
+cd ..
+
+cd ${TRILINOS_PRISTINE_PATH}
+git status
+git log --pretty=oneline --since=2.days
+echo "Checkout develop"
+git checkout develop
+echo "Pull"
+git pull
+echo "Checkout SHA"
+git checkout ${SHA}
+cd ..
+
+cd ${TRILINOS_PRISTINE_PATH}
+echo ""
+echo ""
+echo "Trilinos Pristine State:"
+git log --pretty=oneline --since=2.days
+cd ..
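+
+# A possible way to drive this script (assumptions: KOKKOS_PATH is exported and
+# a kokkos clone sits in the working directory, as the snapshot call above expects):
+#   export KOKKOS_PATH=${PWD}/kokkos
+#   ./kokkos/config/trilinos-integration/prepare_trilinos_repos.sh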
diff --git a/lib/kokkos/containers/performance_tests/CMakeLists.txt b/lib/kokkos/containers/performance_tests/CMakeLists.txt
index 726d40345..403ac746f 100644
--- a/lib/kokkos/containers/performance_tests/CMakeLists.txt
+++ b/lib/kokkos/containers/performance_tests/CMakeLists.txt
@@ -1,37 +1,37 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
-INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
+INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
SET(SOURCES
TestMain.cpp
TestCuda.cpp
)
IF(Kokkos_ENABLE_Pthread)
LIST( APPEND SOURCES TestThreads.cpp)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
LIST( APPEND SOURCES TestOpenMP.cpp)
ENDIF()
# Per #374, we always want to build this test, but we only want to run
# it as a PERFORMANCE test. That's why we separate building the test
# from running the test.
TRIBITS_ADD_EXECUTABLE(
PerfTestExec
SOURCES ${SOURCES}
COMM serial mpi
TESTONLYLIBS kokkos_gtest
)
TRIBITS_ADD_TEST(
PerformanceTest
NAME PerfTestExec
COMM serial mpi
NUM_MPI_PROCS 1
CATEGORIES PERFORMANCE
FAIL_REGULAR_EXPRESSION " FAILED "
)
diff --git a/lib/kokkos/containers/performance_tests/Makefile b/lib/kokkos/containers/performance_tests/Makefile
index e7abaf44c..fa3bc7770 100644
--- a/lib/kokkos/containers/performance_tests/Makefile
+++ b/lib/kokkos/containers/performance_tests/Makefile
@@ -1,81 +1,78 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../TPL/gtest
vpath %.cpp ${KOKKOS_PATH}/containers/performance_tests
default: build_all
echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?= -lpthread
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda.o TestMain.o gtest-all.o
TARGETS += KokkosContainers_PerformanceTest_Cuda
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
OBJ_THREADS = TestThreads.o TestMain.o gtest-all.o
TARGETS += KokkosContainers_PerformanceTest_Threads
TEST_TARGETS += test-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP = TestOpenMP.o TestMain.o gtest-all.o
TARGETS += KokkosContainers_PerformanceTest_OpenMP
TEST_TARGETS += test-openmp
endif
KokkosContainers_PerformanceTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Cuda
KokkosContainers_PerformanceTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Threads
KokkosContainers_PerformanceTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_OpenMP
test-cuda: KokkosContainers_PerformanceTest_Cuda
./KokkosContainers_PerformanceTest_Cuda
test-threads: KokkosContainers_PerformanceTest_Threads
./KokkosContainers_PerformanceTest_Threads
test-openmp: KokkosContainers_PerformanceTest_OpenMP
./KokkosContainers_PerformanceTest_OpenMP
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
diff --git a/lib/kokkos/containers/performance_tests/TestCuda.cpp b/lib/kokkos/containers/performance_tests/TestCuda.cpp
index 8183adaa6..e7afad905 100644
--- a/lib/kokkos/containers/performance_tests/TestCuda.cpp
+++ b/lib/kokkos/containers/performance_tests/TestCuda.cpp
@@ -1,109 +1,109 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdint.h>
#include <string>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <fstream>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_HAVE_CUDA )
#include <TestDynRankView.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <TestGlobal2LocalIds.hpp>
#include <TestUnorderedMapPerformance.hpp>
namespace Performance {
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase()
{
std::cout << std::setprecision(5) << std::scientific;
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
}
static void TearDownTestCase()
{
Kokkos::Cuda::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
TEST_F( cuda, dynrankview_perf )
{
std::cout << "Cuda" << std::endl;
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
- test_dynrankview_op_perf<Kokkos::Cuda>( 4096 );
+ test_dynrankview_op_perf<Kokkos::Cuda>( 40960 );
}
TEST_F( cuda, global_2_local)
{
std::cout << "Cuda" << std::endl;
std::cout << "size, create, generate, fill, find" << std::endl;
for (unsigned i=Performance::begin_id_size; i<=Performance::end_id_size; i *= Performance::id_step)
test_global_to_local_ids<Kokkos::Cuda>(i);
}
TEST_F( cuda, unordered_map_performance_near)
{
Perf::run_performance_tests<Kokkos::Cuda,true>("cuda-near");
}
TEST_F( cuda, unordered_map_performance_far)
{
Perf::run_performance_tests<Kokkos::Cuda,false>("cuda-far");
}
}
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
diff --git a/lib/kokkos/containers/performance_tests/TestDynRankView.hpp b/lib/kokkos/containers/performance_tests/TestDynRankView.hpp
index aab6e6988..d96a3f743 100644
--- a/lib/kokkos/containers/performance_tests/TestDynRankView.hpp
+++ b/lib/kokkos/containers/performance_tests/TestDynRankView.hpp
@@ -1,265 +1,265 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
#ifndef KOKKOS_TEST_DYNRANKVIEW_HPP
#define KOKKOS_TEST_DYNRANKVIEW_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_DynRankView.hpp>
#include <vector>
#include <impl/Kokkos_Timer.hpp>
// Compare performance of DynRankView to View, specific focus on the parenthesis operators
namespace Performance {
//View functor
template <typename DeviceType>
struct InitViewFunctor {
typedef Kokkos::View<double***, DeviceType> inviewtype;
inviewtype _inview;
InitViewFunctor( inviewtype &inview_ ) : _inview(inview_)
{}
KOKKOS_INLINE_FUNCTION
void operator()(const int i) const {
for (unsigned j = 0; j < _inview.dimension(1); ++j) {
for (unsigned k = 0; k < _inview.dimension(2); ++k) {
_inview(i,j,k) = i/2 -j*j + k/3;
}
}
}
struct SumComputationTest
{
typedef Kokkos::View<double***, DeviceType> inviewtype;
inviewtype _inview;
typedef Kokkos::View<double*, DeviceType> outviewtype;
outviewtype _outview;
KOKKOS_INLINE_FUNCTION
SumComputationTest(inviewtype &inview_ , outviewtype &outview_) : _inview(inview_), _outview(outview_) {}
KOKKOS_INLINE_FUNCTION
void operator()(const int i) const {
for (unsigned j = 0; j < _inview.dimension(1); ++j) {
for (unsigned k = 0; k < _inview.dimension(2); ++k) {
_outview(i) += _inview(i,j,k) ;
}
}
}
};
};
template <typename DeviceType>
struct InitStrideViewFunctor {
typedef Kokkos::View<double***, Kokkos::LayoutStride, DeviceType> inviewtype;
inviewtype _inview;
InitStrideViewFunctor( inviewtype &inview_ ) : _inview(inview_)
{}
KOKKOS_INLINE_FUNCTION
void operator()(const int i) const {
for (unsigned j = 0; j < _inview.dimension(1); ++j) {
for (unsigned k = 0; k < _inview.dimension(2); ++k) {
_inview(i,j,k) = i/2 -j*j + k/3;
}
}
}
};
template <typename DeviceType>
struct InitViewRank7Functor {
typedef Kokkos::View<double*******, DeviceType> inviewtype;
inviewtype _inview;
InitViewRank7Functor( inviewtype &inview_ ) : _inview(inview_)
{}
KOKKOS_INLINE_FUNCTION
void operator()(const int i) const {
for (unsigned j = 0; j < _inview.dimension(1); ++j) {
for (unsigned k = 0; k < _inview.dimension(2); ++k) {
_inview(i,j,k,0,0,0,0) = i/2 -j*j + k/3;
}
}
}
};
//DynRankView functor
template <typename DeviceType>
struct InitDynRankViewFunctor {
typedef Kokkos::DynRankView<double, DeviceType> inviewtype;
inviewtype _inview;
InitDynRankViewFunctor( inviewtype &inview_ ) : _inview(inview_)
{}
KOKKOS_INLINE_FUNCTION
void operator()(const int i) const {
for (unsigned j = 0; j < _inview.dimension(1); ++j) {
for (unsigned k = 0; k < _inview.dimension(2); ++k) {
_inview(i,j,k) = i/2 -j*j + k/3;
}
}
}
struct SumComputationTest
{
typedef Kokkos::DynRankView<double, DeviceType> inviewtype;
inviewtype _inview;
typedef Kokkos::DynRankView<double, DeviceType> outviewtype;
outviewtype _outview;
KOKKOS_INLINE_FUNCTION
SumComputationTest(inviewtype &inview_ , outviewtype &outview_) : _inview(inview_), _outview(outview_) {}
KOKKOS_INLINE_FUNCTION
void operator()(const int i) const {
for (unsigned j = 0; j < _inview.dimension(1); ++j) {
for (unsigned k = 0; k < _inview.dimension(2); ++k) {
_outview(i) += _inview(i,j,k) ;
}
}
}
};
};
template <typename DeviceType>
void test_dynrankview_op_perf( const int par_size )
{
typedef DeviceType execution_space;
typedef typename execution_space::size_type size_type;
- const size_type dim2 = 900;
- const size_type dim3 = 300;
+ const size_type dim2 = 90;
+ const size_type dim3 = 30;
double elapsed_time_view = 0;
double elapsed_time_compview = 0;
double elapsed_time_strideview = 0;
double elapsed_time_view_rank7 = 0;
double elapsed_time_drview = 0;
double elapsed_time_compdrview = 0;
Kokkos::Timer timer;
{
Kokkos::View<double***,DeviceType> testview("testview",par_size,dim2,dim3);
typedef InitViewFunctor<DeviceType> FunctorType;
timer.reset();
Kokkos::RangePolicy<DeviceType> policy(0,par_size);
Kokkos::parallel_for( policy , FunctorType(testview) );
DeviceType::fence();
elapsed_time_view = timer.seconds();
std::cout << " View time (init only): " << elapsed_time_view << std::endl;
timer.reset();
Kokkos::View<double*,DeviceType> sumview("sumview",par_size);
Kokkos::parallel_for( policy , typename FunctorType::SumComputationTest(testview, sumview) );
DeviceType::fence();
elapsed_time_compview = timer.seconds();
std::cout << " View sum computation time: " << elapsed_time_view << std::endl;
Kokkos::View<double***,Kokkos::LayoutStride, DeviceType> teststrideview = Kokkos::subview(testview, Kokkos::ALL, Kokkos::ALL,Kokkos::ALL);
typedef InitStrideViewFunctor<DeviceType> FunctorStrideType;
timer.reset();
Kokkos::parallel_for( policy , FunctorStrideType(teststrideview) );
DeviceType::fence();
elapsed_time_strideview = timer.seconds();
std::cout << " Strided View time (init only): " << elapsed_time_strideview << std::endl;
}
{
Kokkos::View<double*******,DeviceType> testview("testview",par_size,dim2,dim3,1,1,1,1);
typedef InitViewRank7Functor<DeviceType> FunctorType;
timer.reset();
Kokkos::RangePolicy<DeviceType> policy(0,par_size);
Kokkos::parallel_for( policy , FunctorType(testview) );
DeviceType::fence();
elapsed_time_view_rank7 = timer.seconds();
std::cout << " View Rank7 time (init only): " << elapsed_time_view_rank7 << std::endl;
}
{
Kokkos::DynRankView<double,DeviceType> testdrview("testdrview",par_size,dim2,dim3);
typedef InitDynRankViewFunctor<DeviceType> FunctorType;
timer.reset();
Kokkos::RangePolicy<DeviceType> policy(0,par_size);
Kokkos::parallel_for( policy , FunctorType(testdrview) );
DeviceType::fence();
elapsed_time_drview = timer.seconds();
std::cout << " DynRankView time (init only): " << elapsed_time_drview << std::endl;
timer.reset();
Kokkos::DynRankView<double,DeviceType> sumview("sumview",par_size);
Kokkos::parallel_for( policy , typename FunctorType::SumComputationTest(testdrview, sumview) );
DeviceType::fence();
elapsed_time_compdrview = timer.seconds();
std::cout << " DynRankView sum computation time: " << elapsed_time_compdrview << std::endl;
}
std::cout << " Ratio of View to DynRankView time: " << elapsed_time_view / elapsed_time_drview << std::endl; //expect < 1
std::cout << " Ratio of View to DynRankView sum computation time: " << elapsed_time_compview / elapsed_time_compdrview << std::endl; //expect < 1
std::cout << " Ratio of View to View Rank7 time: " << elapsed_time_view / elapsed_time_view_rank7 << std::endl; //expect < 1
std::cout << " Ratio of StrideView to DynRankView time: " << elapsed_time_strideview / elapsed_time_drview << std::endl; //expect < 1
std::cout << " Ratio of DynRankView to View Rank7 time: " << elapsed_time_drview / elapsed_time_view_rank7 << std::endl; //expect ?
timer.reset();
} //end test_dynrankview
} //end Performance
#endif
diff --git a/lib/kokkos/containers/src/Kokkos_DualView.hpp b/lib/kokkos/containers/src/Kokkos_DualView.hpp
index 1230df4d9..3a0196ee4 100644
--- a/lib/kokkos/containers/src/Kokkos_DualView.hpp
+++ b/lib/kokkos/containers/src/Kokkos_DualView.hpp
@@ -1,982 +1,626 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_DualView.hpp
/// \brief Declaration and definition of Kokkos::DualView.
///
/// This header file declares and defines Kokkos::DualView and its
/// related nonmember functions.
#ifndef KOKKOS_DUALVIEW_HPP
#define KOKKOS_DUALVIEW_HPP
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
namespace Kokkos {
/* \class DualView
* \brief Container to manage mirroring a Kokkos::View that lives
* in device memory with a Kokkos::View that lives in host memory.
*
* This class provides capabilities to manage data which exists in two
* memory spaces at the same time. It keeps views of the same layout
* on two memory spaces as well as modified flags for both
* allocations. Users are responsible for setting the modified flags
* manually if they change the data in either memory space, by calling
* the modify() method templated on the device where they modified the
* data. Users may synchronize data by calling the sync() function,
* templated on the device towards which they want to synchronize
* (i.e., the target of the one-way copy operation).
*
* The DualView class also provides convenience methods such as
* realloc, resize and capacity which call the appropriate methods of
* the underlying Kokkos::View objects.
*
* The four template arguments are the same as those of Kokkos::View.
* (Please refer to that class' documentation for a detailed
* description.)
*
* \tparam DataType The type of the entries stored in the container.
*
* \tparam Layout The array's layout in memory.
*
* \tparam Device The Kokkos Device type. If its memory space is
* not the same as the host's memory space, then DualView will
* contain two separate Views: one in device memory, and one in
* host memory. Otherwise, DualView will only store one View.
*
* \tparam MemoryTraits (optional) The user's intended memory access
* behavior. Please see the documentation of Kokkos::View for
* examples. The default suffices for most users.
*/
template< class DataType ,
class Arg1Type = void ,
class Arg2Type = void ,
class Arg3Type = void>
class DualView : public ViewTraits< DataType , Arg1Type , Arg2Type, Arg3Type >
{
public:
//! \name Typedefs for device types and various Kokkos::View specializations.
//@{
typedef ViewTraits< DataType , Arg1Type , Arg2Type, Arg3Type > traits ;
//! The Kokkos Host Device type.
typedef typename traits::host_mirror_space host_mirror_space ;
//! The type of a Kokkos::View on the device.
typedef View< typename traits::data_type ,
Arg1Type ,
Arg2Type ,
Arg3Type > t_dev ;
/// \typedef t_host
/// \brief The type of a Kokkos::View host mirror of \c t_dev.
typedef typename t_dev::HostMirror t_host ;
//! The type of a const Kokkos::View on the device.
typedef View< typename traits::const_data_type ,
Arg1Type ,
Arg2Type ,
Arg3Type > t_dev_const ;
/// \typedef t_host_const
/// \brief The type of a const View host mirror of \c t_dev_const.
typedef typename t_dev_const::HostMirror t_host_const;
//! The type of a const, random-access View on the device.
typedef View< typename traits::const_data_type ,
typename traits::array_layout ,
typename traits::device_type ,
Kokkos::MemoryTraits<Kokkos::RandomAccess> > t_dev_const_randomread ;
/// \typedef t_host_const_randomread
/// \brief The type of a const, random-access View host mirror of
/// \c t_dev_const_randomread.
typedef typename t_dev_const_randomread::HostMirror t_host_const_randomread;
//! The type of an unmanaged View on the device.
typedef View< typename traits::data_type ,
typename traits::array_layout ,
typename traits::device_type ,
MemoryUnmanaged> t_dev_um;
//! The type of an unmanaged View host mirror of \c t_dev_um.
typedef View< typename t_host::data_type ,
typename t_host::array_layout ,
typename t_host::device_type ,
MemoryUnmanaged> t_host_um;
//! The type of a const unmanaged View on the device.
typedef View< typename traits::const_data_type ,
typename traits::array_layout ,
typename traits::device_type ,
MemoryUnmanaged> t_dev_const_um;
//! The type of a const unmanaged View host mirror of \c t_dev_const_um.
typedef View<typename t_host::const_data_type,
typename t_host::array_layout,
typename t_host::device_type,
MemoryUnmanaged> t_host_const_um;
//! The type of a const, random-access, unmanaged View on the device.
typedef View< typename t_host::const_data_type ,
typename t_host::array_layout ,
typename t_host::device_type ,
Kokkos::MemoryTraits<Kokkos::Unmanaged|Kokkos::RandomAccess> > t_dev_const_randomread_um ;
/// \typedef t_host_const_randomread_um
/// \brief The type of a const, random-access, unmanaged View host mirror of
/// \c t_dev_const_randomread_um.
typedef typename t_dev_const_randomread::HostMirror t_host_const_randomread_um;
//@}
//! \name The two View instances.
//@{
t_dev d_view;
t_host h_view;
//@}
//! \name Counters to keep track of changes ("modified" flags)
//@{
View<unsigned int,LayoutLeft,typename t_host::execution_space> modified_device;
View<unsigned int,LayoutLeft,typename t_host::execution_space> modified_host;
//@}
//! \name Constructors
//@{
/// \brief Empty constructor.
///
/// Both device and host View objects are constructed using their
/// default constructors. The "modified" flags are both initialized
/// to "unmodified."
DualView () :
modified_device (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_device")),
modified_host (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_host"))
{}
/// \brief Constructor that allocates View objects on both host and device.
///
/// This constructor works like the analogous constructor of View.
/// The first argument is a string label, which is entirely for your
/// benefit. (Different DualView objects may have the same label if
/// you like.) The arguments that follow are the dimensions of the
/// View objects. For example, if the View has three dimensions,
/// the first three integer arguments will be nonzero, and you may
/// omit the integer arguments that follow.
DualView (const std::string& label,
const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0,
const size_t n7 = 0)
: d_view (label, n0, n1, n2, n3, n4, n5, n6, n7)
, h_view (create_mirror_view (d_view)) // without UVM, host View mirrors
, modified_device (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_device"))
, modified_host (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_host"))
{}
//! Copy constructor (shallow copy)
template<class SS, class LS, class DS, class MS>
DualView (const DualView<SS,LS,DS,MS>& src) :
d_view (src.d_view),
h_view (src.h_view),
modified_device (src.modified_device),
modified_host (src.modified_host)
{}
//! Subview constructor
template< class SD, class S1 , class S2 , class S3
, class Arg0 , class ... Args >
DualView( const DualView<SD,S1,S2,S3> & src
, const Arg0 & arg0
, Args ... args
)
: d_view( Kokkos::subview( src.d_view , arg0 , args ... ) )
, h_view( Kokkos::subview( src.h_view , arg0 , args ... ) )
, modified_device (src.modified_device)
, modified_host (src.modified_host)
{}
/// \brief Create DualView from existing device and host View objects.
///
/// This constructor assumes that the device and host View objects
/// are synchronized. You, the caller, are responsible for making
/// sure this is the case before calling this constructor. After
/// this constructor returns, you may use DualView's sync() and
/// modify() methods to ensure synchronization of the View objects.
///
/// \param d_view_ Device View
/// \param h_view_ Host View (must have type t_host = t_dev::HostMirror)
DualView (const t_dev& d_view_, const t_host& h_view_) :
d_view (d_view_),
h_view (h_view_),
modified_device (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_device")),
modified_host (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_host"))
{
-#if ! KOKKOS_USING_EXP_VIEW
- Impl::assert_shapes_are_equal (d_view.shape (), h_view.shape ());
-#else
if ( int(d_view.rank) != int(h_view.rank) ||
d_view.dimension_0() != h_view.dimension_0() ||
d_view.dimension_1() != h_view.dimension_1() ||
d_view.dimension_2() != h_view.dimension_2() ||
d_view.dimension_3() != h_view.dimension_3() ||
d_view.dimension_4() != h_view.dimension_4() ||
d_view.dimension_5() != h_view.dimension_5() ||
d_view.dimension_6() != h_view.dimension_6() ||
d_view.dimension_7() != h_view.dimension_7() ||
d_view.stride_0() != h_view.stride_0() ||
d_view.stride_1() != h_view.stride_1() ||
d_view.stride_2() != h_view.stride_2() ||
d_view.stride_3() != h_view.stride_3() ||
d_view.stride_4() != h_view.stride_4() ||
d_view.stride_5() != h_view.stride_5() ||
d_view.stride_6() != h_view.stride_6() ||
d_view.stride_7() != h_view.stride_7() ||
d_view.span() != h_view.span() ) {
Kokkos::Impl::throw_runtime_exception("DualView constructed with incompatible views");
}
-#endif
}
//@}
//! \name Methods for synchronizing, marking as modified, and getting Views.
//@{
/// \brief Return a View on a specific device \c Device.
///
/// Please don't be afraid of the if_c expression in the return
/// value's type. That just tells the method what the return type
/// should be: t_dev if the \c Device template parameter matches
/// this DualView's device type, else t_host.
///
/// For example, suppose you create a DualView on Cuda, like this:
/// \code
/// typedef Kokkos::DualView<float, Kokkos::LayoutRight, Kokkos::Cuda> dual_view_type;
/// dual_view_type DV ("my dual view", 100);
/// \endcode
/// If you want to get the CUDA device View, do this:
/// \code
/// typename dual_view_type::t_dev cudaView = DV.view<Kokkos::Cuda> ();
/// \endcode
/// and if you want to get the host mirror of that View, do this:
/// \code
/// typedef typename Kokkos::HostSpace::execution_space host_device_type;
/// typename dual_view_type::t_host hostView = DV.view<host_device_type> ();
/// \endcode
template< class Device >
KOKKOS_INLINE_FUNCTION
const typename Impl::if_c<
- Impl::is_same<typename t_dev::memory_space,
+ std::is_same<typename t_dev::memory_space,
typename Device::memory_space>::value,
t_dev,
t_host>::type& view () const
{
return Impl::if_c<
- Impl::is_same<
+ std::is_same<
typename t_dev::memory_space,
typename Device::memory_space>::value,
t_dev,
t_host >::select (d_view , h_view);
}
/// \brief Update data on device or host only if data in the other
/// space has been marked as modified.
///
/// If \c Device is the same as this DualView's device type, then
/// copy data from host to device. Otherwise, copy data from device
/// to host. In either case, only copy if the source of the copy
/// has been modified.
///
/// This is a one-way synchronization only. If the target of the
/// copy has been modified, this operation will discard those
/// modifications. It will also reset both device and host modified
/// flags.
///
/// \note This method doesn't know on its own whether you modified
/// the data in either View. You must manually mark modified data
/// as modified, by calling the modify() method with the
/// appropriate template parameter.
template<class Device>
void sync( const typename Impl::enable_if<
- ( Impl::is_same< typename traits::data_type , typename traits::non_const_data_type>::value) ||
- ( Impl::is_same< Device , int>::value)
+ ( std::is_same< typename traits::data_type , typename traits::non_const_data_type>::value) ||
+ ( std::is_same< Device , int>::value)
, int >::type& = 0)
{
const unsigned int dev =
Impl::if_c<
- Impl::is_same<
+ std::is_same<
typename t_dev::memory_space,
typename Device::memory_space>::value ,
unsigned int,
unsigned int>::select (1, 0);
if (dev) { // if Device is the same as DualView's device type
if ((modified_host () > 0) && (modified_host () >= modified_device ())) {
deep_copy (d_view, h_view);
modified_host() = modified_device() = 0;
}
} else { // hopefully Device is the same as DualView's host type
if ((modified_device () > 0) && (modified_device () >= modified_host ())) {
deep_copy (h_view, d_view);
modified_host() = modified_device() = 0;
}
}
- if(Impl::is_same<typename t_host::memory_space,typename t_dev::memory_space>::value) {
+ if(std::is_same<typename t_host::memory_space,typename t_dev::memory_space>::value) {
t_dev::execution_space::fence();
t_host::execution_space::fence();
}
}
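// Illustrative sketch (not part of the original header): the usual
// modify-then-sync protocol, assuming a DualView DV as in the examples above:
//   DV.modify<Kokkos::HostSpace::execution_space> ();  // host data was changed
//   DV.sync<Kokkos::Cuda> ();                          // copies host -> device if needed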
template<class Device>
void sync ( const typename Impl::enable_if<
- ( ! Impl::is_same< typename traits::data_type , typename traits::non_const_data_type>::value ) ||
- ( Impl::is_same< Device , int>::value)
+ ( ! std::is_same< typename traits::data_type , typename traits::non_const_data_type>::value ) ||
+ ( std::is_same< Device , int>::value)
, int >::type& = 0 )
{
const unsigned int dev =
Impl::if_c<
- Impl::is_same<
+ std::is_same<
typename t_dev::memory_space,
typename Device::memory_space>::value,
unsigned int,
unsigned int>::select (1, 0);
if (dev) { // if Device is the same as DualView's device type
if ((modified_host () > 0) && (modified_host () >= modified_device ())) {
Impl::throw_runtime_exception("Calling sync on a DualView with a const datatype.");
}
} else { // hopefully Device is the same as DualView's host type
if ((modified_device () > 0) && (modified_device () >= modified_host ())) {
Impl::throw_runtime_exception("Calling sync on a DualView with a const datatype.");
}
}
}
template<class Device>
bool need_sync() const
{
const unsigned int dev =
Impl::if_c<
- Impl::is_same<
+ std::is_same<
typename t_dev::memory_space,
typename Device::memory_space>::value ,
unsigned int,
unsigned int>::select (1, 0);
if (dev) { // if Device is the same as DualView's device type
if ((modified_host () > 0) && (modified_host () >= modified_device ())) {
return true;
}
} else { // hopefully Device is the same as DualView's host type
if ((modified_device () > 0) && (modified_device () >= modified_host ())) {
return true;
}
}
return false;
}
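// Illustrative sketch (not part of the original header): need_sync() can be
// used to check whether a sync() in the given direction would actually copy:
//   if (DV.need_sync<Kokkos::Cuda> ()) {
//     DV.sync<Kokkos::Cuda> ();
//   }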
/// \brief Mark data as modified on the given device \c Device.
///
/// If \c Device is the same as this DualView's device type, then
/// mark the device's data as modified. Otherwise, mark the host's
/// data as modified.
template<class Device>
void modify () {
const unsigned int dev =
Impl::if_c<
- Impl::is_same<
+ std::is_same<
typename t_dev::memory_space,
typename Device::memory_space>::value,
unsigned int,
unsigned int>::select (1, 0);
if (dev) { // if Device is the same as DualView's device type
// Increment the device's modified count.
modified_device () = (modified_device () > modified_host () ?
modified_device () : modified_host ()) + 1;
} else { // hopefully Device is the same as DualView's host type
// Increment the host's modified count.
modified_host () = (modified_device () > modified_host () ?
modified_device () : modified_host ()) + 1;
}
}
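// Illustrative sketch (not part of the original header): after writing to one
// side, mark that side as modified so a later sync() knows what to copy:
//   auto d = DV.view<Kokkos::Cuda> ();   // get the device View and write to it
//   DV.modify<Kokkos::Cuda> ();          // record that the device copy is newer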
//@}
//! \name Methods for reallocating or resizing the View objects.
//@{
/// \brief Reallocate both View objects.
///
/// This discards any existing contents of the objects, and resets
/// their modified flags. It does <i>not</i> copy the old contents
/// of either View into the new View objects.
void realloc( const size_t n0 = 0 ,
const size_t n1 = 0 ,
const size_t n2 = 0 ,
const size_t n3 = 0 ,
const size_t n4 = 0 ,
const size_t n5 = 0 ,
const size_t n6 = 0 ,
const size_t n7 = 0 ) {
::Kokkos::realloc(d_view,n0,n1,n2,n3,n4,n5,n6,n7);
h_view = create_mirror_view( d_view );
/* Reset dirty flags */
modified_device() = modified_host() = 0;
}
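// Illustrative sketch (not part of the original header): realloc() gives both
// Views fresh storage of the requested shape and clears the modified flags;
// the old contents are discarded:
//   DV.realloc (200);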
/// \brief Resize both views, copying old contents into new if necessary.
///
/// This method only copies the old contents into the new View
/// object on the side (device or host) that was most recently marked as modified.
void resize( const size_t n0 = 0 ,
const size_t n1 = 0 ,
const size_t n2 = 0 ,
const size_t n3 = 0 ,
const size_t n4 = 0 ,
const size_t n5 = 0 ,
const size_t n6 = 0 ,
const size_t n7 = 0 ) {
if(modified_device() >= modified_host()) {
/* Resize on Device */
::Kokkos::resize(d_view,n0,n1,n2,n3,n4,n5,n6,n7);
h_view = create_mirror_view( d_view );
/* Mark Device copy as modified */
modified_device() = modified_device()+1;
} else {
/* Realloc on Device */
::Kokkos::realloc(d_view,n0,n1,n2,n3,n4,n5,n6,n7);
t_host temp_view = create_mirror_view( d_view );
/* Remap on Host */
Kokkos::deep_copy( temp_view , h_view );
h_view = temp_view;
/* Mark Host copy as modified */
modified_host() = modified_host()+1;
}
}
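// Illustrative sketch (not part of the original header): resize() preserves
// the contents of whichever side was most recently marked as modified:
//   DV.modify<Kokkos::Cuda> ();
//   DV.resize (200);   // device contents are carried over; host gets a new mirror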
//@}
//! \name Methods for getting capacity, stride, or dimension(s).
//@{
//! The allocation size (same as Kokkos::View::capacity).
size_t capacity() const {
-#if KOKKOS_USING_EXP_VIEW
return d_view.span();
-#else
- return d_view.capacity();
-#endif
}
//! Get stride(s) for each dimension.
template< typename iType>
void stride(iType* stride_) const {
d_view.stride(stride_);
}
/* \brief return size of dimension 0 */
size_t dimension_0() const {return d_view.dimension_0();}
/* \brief return size of dimension 1 */
size_t dimension_1() const {return d_view.dimension_1();}
/* \brief return size of dimension 2 */
size_t dimension_2() const {return d_view.dimension_2();}
/* \brief return size of dimension 3 */
size_t dimension_3() const {return d_view.dimension_3();}
/* \brief return size of dimension 4 */
size_t dimension_4() const {return d_view.dimension_4();}
/* \brief return size of dimension 5 */
size_t dimension_5() const {return d_view.dimension_5();}
/* \brief return size of dimension 6 */
size_t dimension_6() const {return d_view.dimension_6();}
/* \brief return size of dimension 7 */
size_t dimension_7() const {return d_view.dimension_7();}
//@}
};
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
//
// Partial specializations of Kokkos::subview() for DualView objects.
//
-#if KOKKOS_USING_EXP_VIEW
-
namespace Kokkos {
namespace Impl {
template< class D, class A1, class A2, class A3, class ... Args >
struct DualViewSubview {
typedef typename Kokkos::Experimental::Impl::ViewMapping
< void
, Kokkos::ViewTraits< D, A1, A2, A3 >
, Args ...
>::traits_type dst_traits ;
typedef Kokkos::DualView
< typename dst_traits::data_type
, typename dst_traits::array_layout
, typename dst_traits::device_type
, typename dst_traits::memory_traits
> type ;
};
} /* namespace Impl */
template< class D , class A1 , class A2 , class A3 , class ... Args >
typename Impl::DualViewSubview<D,A1,A2,A3,Args...>::type
subview( const DualView<D,A1,A2,A3> & src , Args ... args )
{
return typename
Impl::DualViewSubview<D,A1,A2,A3,Args...>::type( src , args ... );
}
} /* namespace Kokkos */
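// Illustrative sketch (not part of the original header): a subview of a
// DualView slices the device and host Views identically and shares the
// parent's modified flags:
//   auto sub = Kokkos::subview (DV, std::make_pair (0, 10));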
-#else
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-//
-// Partial specializations of Kokkos::subview() for DualView objects.
-//
-
-namespace Kokkos {
-namespace Impl {
-
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- , class SubArg4_type , class SubArg5_type , class SubArg6_type , class SubArg7_type
- >
-struct ViewSubview< DualView< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , SubArg4_type , SubArg5_type , SubArg6_type , SubArg7_type >
-{
-private:
-
- typedef DualView< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type > SrcViewType ;
-
- enum { V0 = Impl::is_same< SubArg0_type , void >::value ? 1 : 0 };
- enum { V1 = Impl::is_same< SubArg1_type , void >::value ? 1 : 0 };
- enum { V2 = Impl::is_same< SubArg2_type , void >::value ? 1 : 0 };
- enum { V3 = Impl::is_same< SubArg3_type , void >::value ? 1 : 0 };
- enum { V4 = Impl::is_same< SubArg4_type , void >::value ? 1 : 0 };
- enum { V5 = Impl::is_same< SubArg5_type , void >::value ? 1 : 0 };
- enum { V6 = Impl::is_same< SubArg6_type , void >::value ? 1 : 0 };
- enum { V7 = Impl::is_same< SubArg7_type , void >::value ? 1 : 0 };
-
- // The source view rank must be equal to the input argument rank
- // Once a void argument is encountered all subsequent arguments must be void.
- enum { InputRank =
- Impl::StaticAssert<( SrcViewType::rank ==
- ( V0 ? 0 : (
- V1 ? 1 : (
- V2 ? 2 : (
- V3 ? 3 : (
- V4 ? 4 : (
- V5 ? 5 : (
- V6 ? 6 : (
- V7 ? 7 : 8 ))))))) ))
- &&
- ( SrcViewType::rank ==
- ( 8 - ( V0 + V1 + V2 + V3 + V4 + V5 + V6 + V7 ) ) )
- >::value ? SrcViewType::rank : 0 };
-
- enum { R0 = Impl::ViewOffsetRange< SubArg0_type >::is_range ? 1 : 0 };
- enum { R1 = Impl::ViewOffsetRange< SubArg1_type >::is_range ? 1 : 0 };
- enum { R2 = Impl::ViewOffsetRange< SubArg2_type >::is_range ? 1 : 0 };
- enum { R3 = Impl::ViewOffsetRange< SubArg3_type >::is_range ? 1 : 0 };
- enum { R4 = Impl::ViewOffsetRange< SubArg4_type >::is_range ? 1 : 0 };
- enum { R5 = Impl::ViewOffsetRange< SubArg5_type >::is_range ? 1 : 0 };
- enum { R6 = Impl::ViewOffsetRange< SubArg6_type >::is_range ? 1 : 0 };
- enum { R7 = Impl::ViewOffsetRange< SubArg7_type >::is_range ? 1 : 0 };
-
- enum { OutputRank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
- + unsigned(R4) + unsigned(R5) + unsigned(R6) + unsigned(R7) };
-
- // Reverse
- enum { R0_rev = 0 == InputRank ? 0u : (
- 1 == InputRank ? unsigned(R0) : (
- 2 == InputRank ? unsigned(R1) : (
- 3 == InputRank ? unsigned(R2) : (
- 4 == InputRank ? unsigned(R3) : (
- 5 == InputRank ? unsigned(R4) : (
- 6 == InputRank ? unsigned(R5) : (
- 7 == InputRank ? unsigned(R6) : unsigned(R7) ))))))) };
-
- typedef typename SrcViewType::array_layout SrcViewLayout ;
-
- // Choose array layout, attempting to preserve original layout if at all possible.
- typedef typename Impl::if_c<
- ( // Same Layout IF
- // OutputRank 0
- ( OutputRank == 0 )
- ||
- // OutputRank 1 or 2, InputLayout Left, Interval 0
- // because single stride one or second index has a stride.
- ( OutputRank <= 2 && R0 && Impl::is_same<SrcViewLayout,LayoutLeft>::value )
- ||
- // OutputRank 1 or 2, InputLayout Right, Interval [InputRank-1]
- // because single stride one or second index has a stride.
- ( OutputRank <= 2 && R0_rev && Impl::is_same<SrcViewLayout,LayoutRight>::value )
- ), SrcViewLayout , Kokkos::LayoutStride >::type OutputViewLayout ;
-
- // Choose data type as a purely dynamic rank array to accomodate a runtime range.
- typedef typename Impl::if_c< OutputRank == 0 , typename SrcViewType::value_type ,
- typename Impl::if_c< OutputRank == 1 , typename SrcViewType::value_type *,
- typename Impl::if_c< OutputRank == 2 , typename SrcViewType::value_type **,
- typename Impl::if_c< OutputRank == 3 , typename SrcViewType::value_type ***,
- typename Impl::if_c< OutputRank == 4 , typename SrcViewType::value_type ****,
- typename Impl::if_c< OutputRank == 5 , typename SrcViewType::value_type *****,
- typename Impl::if_c< OutputRank == 6 , typename SrcViewType::value_type ******,
- typename Impl::if_c< OutputRank == 7 , typename SrcViewType::value_type *******,
- typename SrcViewType::value_type ********
- >::type >::type >::type >::type >::type >::type >::type >::type OutputData ;
-
- // Choose space.
- // If the source view's template arg1 or arg2 is a space then use it,
- // otherwise use the source view's execution space.
-
- typedef typename Impl::if_c< Impl::is_space< SrcArg1Type >::value , SrcArg1Type ,
- typename Impl::if_c< Impl::is_space< SrcArg2Type >::value , SrcArg2Type , typename SrcViewType::execution_space
- >::type >::type OutputSpace ;
-
-public:
-
- // If keeping the layout then match non-data type arguments
- // else keep execution space and memory traits.
- typedef typename
- Impl::if_c< Impl::is_same< SrcViewLayout , OutputViewLayout >::value
- , Kokkos::DualView< OutputData , SrcArg1Type , SrcArg2Type , SrcArg3Type >
- , Kokkos::DualView< OutputData , OutputViewLayout , OutputSpace
- , typename SrcViewType::memory_traits >
- >::type type ;
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-namespace Kokkos {
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , void , void , void
- , void , void , void , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , void , void , void
- , void , void , void , void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0);
- sub_view.h_view = subview(src.h_view,arg0);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , void , void
- , void , void , void , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , void , void
- , void , void , void , void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1);
- sub_view.h_view = subview(src.h_view,arg0,arg1);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 , class ArgType2 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , void
- , void , void , void , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 ,
- const ArgType2 & arg2 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , void
- , void , void , void , void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1,arg2);
- sub_view.h_view = subview(src.h_view,arg0,arg1,arg2);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , void , void , void , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 ,
- const ArgType2 & arg2 ,
- const ArgType3 & arg3 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , void , void , void , void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3);
- sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
- class ArgType4 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , void , void , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 ,
- const ArgType2 & arg2 ,
- const ArgType3 & arg3 ,
- const ArgType4 & arg4 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , void , void ,void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4);
- sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
- class ArgType4 , class ArgType5 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , ArgType5 , void , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 ,
- const ArgType2 & arg2 ,
- const ArgType3 & arg3 ,
- const ArgType4 & arg4 ,
- const ArgType5 & arg5 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , ArgType5 , void , void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4,arg5);
- sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4,arg5);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
- class ArgType4 , class ArgType5 , class ArgType6 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , ArgType5 , ArgType6 , void
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 ,
- const ArgType2 & arg2 ,
- const ArgType3 & arg3 ,
- const ArgType4 & arg4 ,
- const ArgType5 & arg5 ,
- const ArgType6 & arg6 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , ArgType5 , ArgType6 , void
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6);
- sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-template< class D , class A1 , class A2 , class A3 ,
- class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
- class ArgType4 , class ArgType5 , class ArgType6 , class ArgType7 >
-typename Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , ArgType5 , ArgType6 , ArgType7
- >::type
-subview( const DualView<D,A1,A2,A3> & src ,
- const ArgType0 & arg0 ,
- const ArgType1 & arg1 ,
- const ArgType2 & arg2 ,
- const ArgType3 & arg3 ,
- const ArgType4 & arg4 ,
- const ArgType5 & arg5 ,
- const ArgType6 & arg6 ,
- const ArgType7 & arg7 )
-{
- typedef typename
- Impl::ViewSubview< DualView<D,A1,A2,A3>
- , ArgType0 , ArgType1 , ArgType2 , ArgType3
- , ArgType4 , ArgType5 , ArgType6 , ArgType7
- >::type
- DstViewType ;
- DstViewType sub_view;
- sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6,arg7);
- sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6,arg7);
- sub_view.modified_device = src.modified_device;
- sub_view.modified_host = src.modified_host;
- return sub_view;
-}
-
-} // namespace Kokkos
-
-#endif /* KOKKOS_USING_EXP_VIEW */
-
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
//
// Partial specialization of Kokkos::deep_copy() for DualView objects.
//
template< class DT , class DL , class DD , class DM ,
class ST , class SL , class SD , class SM >
void
deep_copy (DualView<DT,DL,DD,DM> dst, // trust me, this must not be a reference
const DualView<ST,SL,SD,SM>& src )
{
if (src.modified_device () >= src.modified_host ()) {
deep_copy (dst.d_view, src.d_view);
dst.template modify<typename DualView<DT,DL,DD,DM>::device_type> ();
} else {
deep_copy (dst.h_view, src.h_view);
dst.template modify<typename DualView<DT,DL,DD,DM>::host_mirror_space> ();
}
}
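// Illustrative sketch (not part of the original header): deep_copy between
// DualViews copies the side of src that was most recently modified and marks
// the corresponding side of dst as modified, so dst may still need a sync():
//   Kokkos::deep_copy (dst_dv, src_dv);
//   dst_dv.sync<Kokkos::HostSpace::execution_space> ();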
template< class ExecutionSpace ,
class DT , class DL , class DD , class DM ,
class ST , class SL , class SD , class SM >
void
deep_copy (const ExecutionSpace& exec ,
DualView<DT,DL,DD,DM> dst, // trust me, this must not be a reference
const DualView<ST,SL,SD,SM>& src )
{
if (src.modified_device () >= src.modified_host ()) {
deep_copy (exec, dst.d_view, src.d_view);
dst.template modify<typename DualView<DT,DL,DD,DM>::device_type> ();
} else {
deep_copy (exec, dst.h_view, src.h_view);
dst.template modify<typename DualView<DT,DL,DD,DM>::host_mirror_space> ();
}
}
} // namespace Kokkos
#endif
diff --git a/lib/kokkos/containers/src/Kokkos_DynRankView.hpp b/lib/kokkos/containers/src/Kokkos_DynRankView.hpp
index f72277700..1ac92b9d1 100644
--- a/lib/kokkos/containers/src/Kokkos_DynRankView.hpp
+++ b/lib/kokkos/containers/src/Kokkos_DynRankView.hpp
@@ -1,1834 +1,1968 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_DynRankView.hpp
/// \brief Declaration and definition of Kokkos::Experimental::DynRankView.
///
/// This header file declares and defines Kokkos::Experimental::DynRankView and its
/// related nonmember functions.
/*
* Changes from View
* 1. The rank of the DynRankView is returned by the method rank()
* 2. Max rank of a DynRankView is 7
* 3. subview name is subdynrankview
* 4. Every subdynrankview is returned with LayoutStride
*
* NEW: Redesigned DynRankView
* 5. subview function name now available
* 6. Copy and Copy-Assign View to DynRankView
* 7. deep_copy between Views and DynRankViews
* 8. rank( view ); returns the rank of View or DynRankView
*/
#ifndef KOKKOS_DYNRANKVIEW_HPP
#define KOKKOS_DYNRANKVIEW_HPP
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <type_traits>
namespace Kokkos {
namespace Experimental {
template< typename DataType , class ... Properties >
class DynRankView; //forward declare
namespace Impl {
template <typename Specialize>
struct DynRankDimTraits {
enum : size_t{unspecified = ~size_t(0)};
// Compute the rank of the view from the nonzero dimension arguments.
KOKKOS_INLINE_FUNCTION
static size_t computeRank( const size_t N0
, const size_t N1
, const size_t N2
, const size_t N3
, const size_t N4
, const size_t N5
, const size_t N6
, const size_t N7 )
{
return
( (N6 == unspecified && N5 == unspecified && N4 == unspecified && N3 == unspecified && N2 == unspecified && N1 == unspecified && N0 == unspecified) ? 0
: ( (N6 == unspecified && N5 == unspecified && N4 == unspecified && N3 == unspecified && N2 == unspecified && N1 == unspecified) ? 1
: ( (N6 == unspecified && N5 == unspecified && N4 == unspecified && N3 == unspecified && N2 == unspecified) ? 2
: ( (N6 == unspecified && N5 == unspecified && N4 == unspecified && N3 == unspecified) ? 3
: ( (N6 == unspecified && N5 == unspecified && N4 == unspecified) ? 4
: ( (N6 == unspecified && N5 == unspecified) ? 5
: ( (N6 == unspecified) ? 6
: 7 ) ) ) ) ) ) );
}
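// Illustrative example (not part of the original header): with N0 = 10,
// N1 = 20 and N2 through N7 equal to `unspecified`, computeRank returns 2;
// with all eight arguments unspecified it returns 0.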
// Compute the rank of the view from the nonzero layout arguments.
template <typename Layout>
KOKKOS_INLINE_FUNCTION
static size_t computeRank( const Layout& layout )
{
return computeRank( layout.dimension[0]
, layout.dimension[1]
, layout.dimension[2]
, layout.dimension[3]
, layout.dimension[4]
, layout.dimension[5]
, layout.dimension[6]
, layout.dimension[7] );
}
// Create the layout for the rank-7 view.
// Non-strided Layout
template <typename Layout>
KOKKOS_INLINE_FUNCTION
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutRight>::value || std::is_same<Layout , Kokkos::LayoutLeft>::value) , Layout >::type createLayout( const Layout& layout )
{
return Layout( layout.dimension[0] != unspecified ? layout.dimension[0] : 1
, layout.dimension[1] != unspecified ? layout.dimension[1] : 1
, layout.dimension[2] != unspecified ? layout.dimension[2] : 1
, layout.dimension[3] != unspecified ? layout.dimension[3] : 1
, layout.dimension[4] != unspecified ? layout.dimension[4] : 1
, layout.dimension[5] != unspecified ? layout.dimension[5] : 1
, layout.dimension[6] != unspecified ? layout.dimension[6] : 1
, layout.dimension[7] != unspecified ? layout.dimension[7] : 1
);
}
// LayoutStride
template <typename Layout>
KOKKOS_INLINE_FUNCTION
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutStride>::value) , Layout>::type createLayout( const Layout& layout )
{
return Layout( layout.dimension[0] != unspecified ? layout.dimension[0] : 1
, layout.stride[0]
, layout.dimension[1] != unspecified ? layout.dimension[1] : 1
, layout.stride[1]
, layout.dimension[2] != unspecified ? layout.dimension[2] : 1
, layout.stride[2]
, layout.dimension[3] != unspecified ? layout.dimension[3] : 1
, layout.stride[3]
, layout.dimension[4] != unspecified ? layout.dimension[4] : 1
, layout.stride[4]
, layout.dimension[5] != unspecified ? layout.dimension[5] : 1
, layout.stride[5]
, layout.dimension[6] != unspecified ? layout.dimension[6] : 1
, layout.stride[6]
, layout.dimension[7] != unspecified ? layout.dimension[7] : 1
, layout.stride[7]
);
}
// Create a view from the given dimension arguments.
// This is only necessary because the shmem constructor doesn't take a layout.
template <typename ViewType, typename ViewArg>
static ViewType createView( const ViewArg& arg
, const size_t N0
, const size_t N1
, const size_t N2
, const size_t N3
, const size_t N4
, const size_t N5
, const size_t N6
, const size_t N7 )
{
return ViewType( arg
, N0 != unspecified ? N0 : 1
, N1 != unspecified ? N1 : 1
, N2 != unspecified ? N2 : 1
, N3 != unspecified ? N3 : 1
, N4 != unspecified ? N4 : 1
, N5 != unspecified ? N5 : 1
, N6 != unspecified ? N6 : 1
, N7 != unspecified ? N7 : 1 );
}
};
// Non-strided Layout
template <typename Layout , typename iType>
KOKKOS_INLINE_FUNCTION
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutRight>::value || std::is_same<Layout , Kokkos::LayoutLeft>::value) && std::is_integral<iType>::value , Layout >::type reconstructLayout( const Layout& layout , iType dynrank )
{
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
, dynrank > 1 ? layout.dimension[1] : ~size_t(0)
, dynrank > 2 ? layout.dimension[2] : ~size_t(0)
, dynrank > 3 ? layout.dimension[3] : ~size_t(0)
, dynrank > 4 ? layout.dimension[4] : ~size_t(0)
, dynrank > 5 ? layout.dimension[5] : ~size_t(0)
, dynrank > 6 ? layout.dimension[6] : ~size_t(0)
, dynrank > 7 ? layout.dimension[7] : ~size_t(0)
);
}
// LayoutStride
template <typename Layout , typename iType>
KOKKOS_INLINE_FUNCTION
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutStride>::value) && std::is_integral<iType>::value , Layout >::type reconstructLayout( const Layout& layout , iType dynrank )
{
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
, dynrank > 0 ? layout.stride[0] : (0)
, dynrank > 1 ? layout.dimension[1] : ~size_t(0)
, dynrank > 1 ? layout.stride[1] : (0)
, dynrank > 2 ? layout.dimension[2] : ~size_t(0)
, dynrank > 2 ? layout.stride[2] : (0)
, dynrank > 3 ? layout.dimension[3] : ~size_t(0)
, dynrank > 3 ? layout.stride[3] : (0)
, dynrank > 4 ? layout.dimension[4] : ~size_t(0)
, dynrank > 4 ? layout.stride[4] : (0)
, dynrank > 5 ? layout.dimension[5] : ~size_t(0)
, dynrank > 5 ? layout.stride[5] : (0)
, dynrank > 6 ? layout.dimension[6] : ~size_t(0)
, dynrank > 6 ? layout.stride[6] : (0)
, dynrank > 7 ? layout.dimension[7] : ~size_t(0)
, dynrank > 7 ? layout.stride[7] : (0)
);
}
- template < typename DynRankViewType , typename iType >
- void verify_dynrankview_rank ( iType N , const DynRankViewType &drv )
- {
- if ( static_cast<iType>(drv.rank()) > N )
- {
- Kokkos::abort( "Need at least rank arguments to the operator()" );
- }
+
+/** \brief Debug bounds-checking routines */
+// Enhanced debug checking - most infrastructure matches that of the functions in
+// Kokkos_ViewMapping; additionally checks that arguments beyond the rank are 0
+template< unsigned , typename iType0 , class MapType >
+KOKKOS_INLINE_FUNCTION
+bool dyn_rank_view_verify_operator_bounds( const iType0 & , const MapType & )
+{ return true ; }
+
+template< unsigned R , typename iType0 , class MapType , typename iType1 , class ... Args >
+KOKKOS_INLINE_FUNCTION
+bool dyn_rank_view_verify_operator_bounds
+ ( const iType0 & rank
+ , const MapType & map
+ , const iType1 & i
+ , Args ... args
+ )
+{
+ if ( static_cast<iType0>(R) < rank ) {
+ return ( size_t(i) < map.extent(R) )
+ && dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
+ }
+ else if ( i != 0 ) {
+ printf("DynRankView Debug Bounds Checking Error: at rank %u\n Extra arguments beyond the rank must be zero \n",R);
+ return ( false )
+ && dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
}
+ else {
+ return ( true )
+ && dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
+ }
+}
+
+template< unsigned , class MapType >
+inline
+void dyn_rank_view_error_operator_bounds( char * , int , const MapType & )
+{}
+
+template< unsigned R , class MapType , class iType , class ... Args >
+inline
+void dyn_rank_view_error_operator_bounds
+ ( char * buf
+ , int len
+ , const MapType & map
+ , const iType & i
+ , Args ... args
+ )
+{
+ const int n =
+ snprintf(buf,len," %lu < %lu %c"
+ , static_cast<unsigned long>(i)
+ , static_cast<unsigned long>( map.extent(R) )
+ , ( sizeof...(Args) ? ',' : ')' )
+ );
+ dyn_rank_view_error_operator_bounds<R+1>(buf+n,len-n,map,args...);
+}
+
+// op_rank = rank of the operator version that was called
+template< typename iType0 , typename iType1 , class MapType , class ... Args >
+KOKKOS_INLINE_FUNCTION
+void dyn_rank_view_verify_operator_bounds
+ ( const iType0 & op_rank , const iType1 & rank , const char* label , const MapType & map , Args ... args )
+{
+ if ( static_cast<iType0>(rank) > op_rank ) {
+ Kokkos::abort( "DynRankView Bounds Checking Error: Need at least rank arguments to the operator()" );
+ }
+
+ if ( ! dyn_rank_view_verify_operator_bounds<0>( rank , map , args ... ) ) {
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ enum { LEN = 1024 };
+ char buffer[ LEN ];
+ int n = snprintf(buffer,LEN,"DynRankView bounds error of view %s (", label);
+ dyn_rank_view_error_operator_bounds<0>( buffer + n , LEN - n , map , args ... );
+ Kokkos::Impl::throw_runtime_exception(std::string(buffer));
+#else
+ Kokkos::abort("DynRankView bounds error");
+#endif
+ }
+}
/** \brief Assign compatible default mappings */
struct ViewToDynRankViewTag {};
template< class DstTraits , class SrcTraits >
class ViewMapping< DstTraits , SrcTraits ,
typename std::enable_if<(
std::is_same< typename DstTraits::memory_space , typename SrcTraits::memory_space >::value
&&
std::is_same< typename DstTraits::specialize , void >::value
&&
std::is_same< typename SrcTraits::specialize , void >::value
&&
(
std::is_same< typename DstTraits::array_layout , typename SrcTraits::array_layout >::value
||
(
(
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutStride >::value
)
&&
(
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutStride >::value
)
)
)
) , ViewToDynRankViewTag >::type >
{
private:
enum { is_assignable_value_type =
std::is_same< typename DstTraits::value_type
, typename SrcTraits::value_type >::value ||
std::is_same< typename DstTraits::value_type
, typename SrcTraits::const_value_type >::value };
enum { is_assignable_layout =
std::is_same< typename DstTraits::array_layout
, typename SrcTraits::array_layout >::value ||
std::is_same< typename DstTraits::array_layout
, Kokkos::LayoutStride >::value
};
public:
enum { is_assignable = is_assignable_value_type &&
is_assignable_layout };
typedef ViewMapping< DstTraits , void > DstType ;
typedef ViewMapping< SrcTraits , void > SrcType ;
template < typename DT , typename ... DP , typename ST , typename ... SP >
KOKKOS_INLINE_FUNCTION
static void assign( Kokkos::Experimental::DynRankView< DT , DP...> & dst , const Kokkos::View< ST , SP... > & src )
{
static_assert( is_assignable_value_type
, "View assignment must have same value type or const = non-const" );
static_assert( is_assignable_layout
, "View assignment must have compatible layout or have rank <= 1" );
// Removed dimension checks...
typedef typename DstType::offset_type dst_offset_type ;
dst.m_map.m_offset = dst_offset_type(std::integral_constant<unsigned,0>() , src.layout() ); //Check this for integer input1 for padding, etc
dst.m_map.m_handle = Kokkos::Experimental::Impl::ViewDataHandle< DstTraits >::assign( src.m_map.m_handle , src.m_track );
dst.m_track.assign( src.m_track , DstTraits::is_managed );
dst.m_rank = src.Rank ;
}
};
} //end Impl
/* \class DynRankView
* \brief Container that creates a Kokkos view with rank determined at runtime.
* Essentially this is a rank 7 view that wraps the access operators
* to yield the functionality of a view
*
* Changes from View
* 1. The rank of the DynRankView is returned by the method rank()
* 2. Max rank of a DynRankView is 7
* 3. subview name is subdynrankview
* 4. Every subdynrankview is returned with LayoutStride
*
* NEW: Redesigned DynRankView
* 5. subview function name now available
* 6. Copy and Copy-Assign View to DynRankView
* 7. deep_copy between Views and DynRankViews
* 8. rank( view ); returns the rank of View or DynRankView
*
*/
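// Illustrative sketch (not part of the original header): the rank of a
// DynRankView is fixed at construction from the number of extents given,
// while the C++ type stays the same for every rank:
//   Kokkos::Experimental::DynRankView<double> a ("a", 10, 20);     // rank 2
//   Kokkos::Experimental::DynRankView<double> b ("b", 10, 20, 30); // rank 3
//   // a.rank() == 2, b.rank() == 3, and decltype(a) == decltype(b)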
template< class > struct is_dyn_rank_view : public std::false_type {};
template< class D, class ... P >
struct is_dyn_rank_view< Kokkos::Experimental::DynRankView<D,P...> > : public std::true_type {};
template< typename DataType , class ... Properties >
class DynRankView : public ViewTraits< DataType , Properties ... >
{
static_assert( !std::is_array<DataType>::value && !std::is_pointer<DataType>::value , "Cannot template DynRankView with array or pointer datatype - must be pod" );
private:
template < class , class ... > friend class DynRankView ;
-// template < class , class ... > friend class Kokkos::Experimental::View ; //unnecessary now...
template < class , class ... > friend class Impl::ViewMapping ;
public:
typedef ViewTraits< DataType , Properties ... > drvtraits ;
typedef View< DataType******* , Properties...> view_type ;
typedef ViewTraits< DataType******* , Properties ... > traits ;
private:
typedef Kokkos::Experimental::Impl::ViewMapping< traits , void > map_type ;
typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
track_type m_track ;
map_type m_map ;
unsigned m_rank;
public:
KOKKOS_INLINE_FUNCTION
view_type & DownCast() const { return ( view_type & ) (*this); }
KOKKOS_INLINE_FUNCTION
const view_type & ConstDownCast() const { return (const view_type & ) (*this); }
//Types below - at least the HostMirror requires the value_type, NOT the rank 7 data_type of the traits
/** \brief Compatible view of array of scalar types */
typedef DynRankView< typename drvtraits::scalar_array_type ,
typename drvtraits::array_layout ,
typename drvtraits::device_type ,
typename drvtraits::memory_traits >
array_type ;
/** \brief Compatible view of const data type */
typedef DynRankView< typename drvtraits::const_data_type ,
typename drvtraits::array_layout ,
typename drvtraits::device_type ,
typename drvtraits::memory_traits >
const_type ;
/** \brief Compatible view of non-const data type */
typedef DynRankView< typename drvtraits::non_const_data_type ,
typename drvtraits::array_layout ,
typename drvtraits::device_type ,
typename drvtraits::memory_traits >
non_const_type ;
/** \brief Compatible HostMirror view */
typedef DynRankView< typename drvtraits::non_const_data_type ,
typename drvtraits::array_layout ,
typename drvtraits::host_mirror_space >
HostMirror ;
//----------------------------------------
// Domain rank and extents
// enum { Rank = map_type::Rank }; //Will be dyn rank of 7 always, keep the enum?
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
typename std::enable_if< std::is_integral<iType>::value , size_t >::type
extent( const iType & r ) const
{ return m_map.extent(r); }
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
typename std::enable_if< std::is_integral<iType>::value , int >::type
extent_int( const iType & r ) const
{ return static_cast<int>(m_map.extent(r)); }
KOKKOS_INLINE_FUNCTION constexpr
typename traits::array_layout layout() const
{ return m_map.layout(); }
//----------------------------------------
/* Deprecate all 'dimension' functions in favor of
* ISO/C++ vocabulary 'extent'.
*/
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
typename std::enable_if< std::is_integral<iType>::value , size_t >::type
dimension( const iType & r ) const { return extent( r ); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return m_map.dimension_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return m_map.dimension_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return m_map.dimension_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return m_map.dimension_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return m_map.dimension_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return m_map.dimension_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return m_map.dimension_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return m_map.dimension_7(); }
//----------------------------------------
KOKKOS_INLINE_FUNCTION constexpr size_t size() const { return m_map.dimension_0() *
m_map.dimension_1() *
m_map.dimension_2() *
m_map.dimension_3() *
m_map.dimension_4() *
m_map.dimension_5() *
m_map.dimension_6() *
m_map.dimension_7(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return m_map.stride_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return m_map.stride_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return m_map.stride_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return m_map.stride_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return m_map.stride_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return m_map.stride_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return m_map.stride_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return m_map.stride_7(); }
template< typename iType >
KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { m_map.stride(s); }
//----------------------------------------
// Range span is the span which contains all members.
typedef typename map_type::reference_type reference_type ;
typedef typename map_type::pointer_type pointer_type ;
enum { reference_type_is_lvalue_reference = std::is_lvalue_reference< reference_type >::value };
KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return m_map.span(); }
// Deprecated, use 'span()' instead
KOKKOS_INLINE_FUNCTION constexpr size_t capacity() const { return m_map.span(); }
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return m_map.span_is_contiguous(); }
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const { return m_map.data(); }
// Deprecated, use 'span_is_contiguous()' instead
KOKKOS_INLINE_FUNCTION constexpr bool is_contiguous() const { return m_map.span_is_contiguous(); }
// Deprecated, use 'data()' instead
KOKKOS_INLINE_FUNCTION constexpr pointer_type ptr_on_device() const { return m_map.data(); }
//----------------------------------------
// Allow specializations to query their specialized map
KOKKOS_INLINE_FUNCTION
const Kokkos::Experimental::Impl::ViewMapping< traits , void > &
implementation_map() const { return m_map ; }
//----------------------------------------
private:
enum {
is_layout_left = std::is_same< typename traits::array_layout
, Kokkos::LayoutLeft >::value ,
is_layout_right = std::is_same< typename traits::array_layout
, Kokkos::LayoutRight >::value ,
is_layout_stride = std::is_same< typename traits::array_layout
, Kokkos::LayoutStride >::value ,
is_default_map =
std::is_same< typename traits::specialize , void >::value &&
( is_layout_left || is_layout_right || is_layout_stride )
};
+ template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
+ { KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
+
+ template< class Space > struct verify_space<Space,false>
+ { KOKKOS_FORCEINLINE_FUNCTION static void check()
+ { Kokkos::abort("Kokkos::DynRankView ERROR: attempt to access inaccessible memory space"); };
+ };
+
// Bounds checking macros
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
-#define KOKKOS_VIEW_OPERATOR_VERIFY( N , ARG ) \
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace \
- < Kokkos::Impl::ActiveExecutionMemorySpace , typename traits::memory_space >::verify(); \
- Kokkos::Experimental::Impl::verify_dynrankview_rank ( N , *this ) ; \
- Kokkos::Experimental::Impl::view_verify_operator_bounds ARG ;
+// rank of the calling operator - included as first argument in ARG
+#define KOKKOS_VIEW_OPERATOR_VERIFY( ARG ) \
+ DynRankView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check(); \
+ Kokkos::Experimental::Impl::dyn_rank_view_verify_operator_bounds ARG ;
#else
-#define KOKKOS_VIEW_OPERATOR_VERIFY( N , ARG ) \
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace \
- < Kokkos::Impl::ActiveExecutionMemorySpace , typename traits::memory_space >::verify();
+#define KOKKOS_VIEW_OPERATOR_VERIFY( ARG ) \
+ DynRankView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
#endif
public:
KOKKOS_INLINE_FUNCTION
constexpr unsigned rank() const { return m_rank; }
//operators ()
// Rank 0
KOKKOS_INLINE_FUNCTION
reference_type operator()() const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 0 , ( implementation_map() ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (0 , this->rank() , NULL , m_map) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (0 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map) )
+ #endif
return implementation_map().reference();
//return m_map.reference(0,0,0,0,0,0,0);
}
// Rank 1
// This assumes a contiguous underlying memory (i.e. no padding, no striding...)
template< typename iType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< std::is_same<typename drvtraits::value_type, typename drvtraits::scalar_array_type>::value && std::is_integral<iType>::value, reference_type>::type
operator[](const iType & i0) const
{
return data()[i0];
}
// This assumes a contiguous underlying memory (i.e. no padding, no striding...
// AND a Trilinos/Sacado scalar type )
template< typename iType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !std::is_same<typename drvtraits::value_type, typename drvtraits::scalar_array_type>::value && std::is_integral<iType>::value, reference_type>::type
operator[](const iType & i0) const
{
// auto map = implementation_map();
const size_t dim_scalar = m_map.dimension_scalar();
const size_t bytes = this->span() / dim_scalar;
typedef Kokkos::View<DataType*, typename traits::array_layout, typename traits::device_type, Kokkos::MemoryTraits<Kokkos::Unmanaged | traits::memory_traits::RandomAccess | traits::memory_traits::Atomic> > tmp_view_type;
tmp_view_type rankone_view(this->data(), bytes, dim_scalar);
return rankone_view(i0);
}
+ // Rank 1 parenthesis
template< typename iType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
operator()(const iType & i0 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 1 , ( m_map , i0 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
+ #endif
return m_map.reference(i0);
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
operator()(const iType & i0 ) const
{
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
+ #endif
return m_map.reference(i0,0,0,0,0,0,0);
}
// Rank 2
template< typename iType0 , typename iType1 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 2 , ( m_map , i0 , i1 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
+ #endif
return m_map.reference(i0,i1);
}
template< typename iType0 , typename iType1 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 2 , ( m_map , i0 , i1 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
+ #endif
return m_map.reference(i0,i1,0,0,0,0,0);
}
// Rank 3
template< typename iType0 , typename iType1 , typename iType2 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 3 , ( m_map , i0 , i1 , i2 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
+ #endif
return m_map.reference(i0,i1,i2);
}
template< typename iType0 , typename iType1 , typename iType2 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 3 , ( m_map , i0 , i1 , i2 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
+ #endif
return m_map.reference(i0,i1,i2,0,0,0,0);
}
// Rank 4
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 4 , ( m_map , i0 , i1 , i2 , i3 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
+ #endif
return m_map.reference(i0,i1,i2,i3);
}
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 4 , ( m_map , i0 , i1 , i2 , i3 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
+ #endif
return m_map.reference(i0,i1,i2,i3,0,0,0);
}
// Rank 5
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 5 , ( m_map , i0 , i1 , i2 , i3 , i4 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4);
}
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 5 , ( m_map , i0 , i1 , i2 , i3 , i4 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,0,0);
}
// Rank 6
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 6 , ( m_map , i0 , i1 , i2 , i3 , i4 , i5 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,i5);
}
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 6 , ( m_map , i0 , i1 , i2 , i3 , i4 , i5 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
}
// Rank 7
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 , typename iType6 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value && std::is_integral<iType6>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( 7 , ( m_map , i0 , i1 , i2 , i3 , i4 , i5 , i6 ) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (7 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5 , i6) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (7 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
}
#undef KOKKOS_VIEW_OPERATOR_VERIFY
//----------------------------------------
// Standard constructor, destructor, and assignment operators...
KOKKOS_INLINE_FUNCTION
~DynRankView() {}
KOKKOS_INLINE_FUNCTION
DynRankView() : m_track(), m_map(), m_rank() {} //Default ctor
KOKKOS_INLINE_FUNCTION
DynRankView( const DynRankView & rhs ) : m_track( rhs.m_track ), m_map( rhs.m_map ), m_rank(rhs.m_rank) {}
KOKKOS_INLINE_FUNCTION
DynRankView( DynRankView && rhs ) : m_track( rhs.m_track ), m_map( rhs.m_map ), m_rank(rhs.m_rank) {}
KOKKOS_INLINE_FUNCTION
DynRankView & operator = ( const DynRankView & rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
KOKKOS_INLINE_FUNCTION
DynRankView & operator = ( DynRankView && rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
//----------------------------------------
// Compatible view copy constructor and assignment
// may assign unmanaged from managed.
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
DynRankView( const DynRankView<RT,RP...> & rhs )
: m_track( rhs.m_track , traits::is_managed )
, m_map()
, m_rank(rhs.m_rank)
{
typedef typename DynRankView<RT,RP...> ::traits SrcTraits ;
typedef Kokkos::Experimental::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
static_assert( Mapping::is_assignable , "Incompatible DynRankView copy construction" );
Mapping::assign( m_map , rhs.m_map , rhs.m_track );
}
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
DynRankView & operator = (const DynRankView<RT,RP...> & rhs )
{
typedef typename DynRankView<RT,RP...> ::traits SrcTraits ;
typedef Kokkos::Experimental::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
static_assert( Mapping::is_assignable , "Incompatible DynRankView copy assignment" );
Mapping::assign( m_map , rhs.m_map , rhs.m_track );
m_track.assign( rhs.m_track , traits::is_managed );
m_rank = rhs.rank();
return *this;
}
// Experimental
// Copy/Assign View to DynRankView
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
DynRankView( const View<RT,RP...> & rhs )
: m_track()
, m_map()
, m_rank( rhs.Rank )
{
typedef typename View<RT,RP...>::traits SrcTraits ;
typedef Kokkos::Experimental::Impl::ViewMapping< traits , SrcTraits , Kokkos::Experimental::Impl::ViewToDynRankViewTag > Mapping ;
static_assert( Mapping::is_assignable , "Incompatible DynRankView copy construction" );
Mapping::assign( *this , rhs );
}
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
DynRankView & operator = ( const View<RT,RP...> & rhs )
{
typedef typename View<RT,RP...>::traits SrcTraits ;
typedef Kokkos::Experimental::Impl::ViewMapping< traits , SrcTraits , Kokkos::Experimental::Impl::ViewToDynRankViewTag > Mapping ;
static_assert( Mapping::is_assignable , "Incompatible View to DynRankView copy assignment" );
Mapping::assign( *this , rhs );
return *this ;
}
//----------------------------------------
// Allocation tracking properties
KOKKOS_INLINE_FUNCTION
int use_count() const
{ return m_track.use_count(); }
inline
const std::string label() const
{ return m_track.template get_label< typename traits::memory_space >(); }
//----------------------------------------
// Allocation according to allocation properties and array layout
// unused arg_layout dimensions must be set to ~size_t(0) so that rank deduction can properly take place
template< class ... P >
explicit inline
DynRankView( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< ! Impl::ViewCtorProp< P... >::has_pointer
, typename traits::array_layout
>::type const & arg_layout
)
: m_track()
, m_map()
, m_rank( Impl::DynRankDimTraits<typename traits::specialize>::computeRank(arg_layout) )
{
// Append layout and spaces if not input
typedef Impl::ViewCtorProp< P ... > alloc_prop_input ;
// use 'std::integral_constant<unsigned,I>' for non-types
// to avoid duplicate class error.
typedef Impl::ViewCtorProp
< P ...
, typename std::conditional
< alloc_prop_input::has_label
, std::integral_constant<unsigned,0>
, typename std::string
>::type
, typename std::conditional
< alloc_prop_input::has_memory_space
, std::integral_constant<unsigned,1>
, typename traits::device_type::memory_space
>::type
, typename std::conditional
< alloc_prop_input::has_execution_space
, std::integral_constant<unsigned,2>
, typename traits::device_type::execution_space
>::type
> alloc_prop ;
static_assert( traits::is_managed
, "View allocation constructor requires managed memory" );
if ( alloc_prop::initialize &&
! alloc_prop::execution_space::is_initialized() ) {
// If initializing view data then
// the execution space must be initialized.
Kokkos::Impl::throw_runtime_exception("Constructing DynRankView and initializing data with uninitialized execution space");
}
// Copy the input allocation properties with possibly defaulted properties
alloc_prop prop( arg_prop );
//------------------------------------------------------------
#if defined( KOKKOS_HAVE_CUDA )
// If allocating in CudaUVMSpace must fence before and after
// the allocation to protect against possible concurrent access
// on the CPU and the GPU.
// Fence using the trait's execution space (which will be Kokkos::Cuda)
// to avoid incomplete type errors from using Kokkos::Cuda directly.
if ( std::is_same< Kokkos::CudaUVMSpace , typename traits::device_type::memory_space >::value ) {
traits::device_type::memory_space::execution_space::fence();
}
#endif
//------------------------------------------------------------
Kokkos::Experimental::Impl::SharedAllocationRecord<> *
record = m_map.allocate_shared( prop , Impl::DynRankDimTraits<typename traits::specialize>::createLayout(arg_layout) );
//------------------------------------------------------------
#if defined( KOKKOS_HAVE_CUDA )
if ( std::is_same< Kokkos::CudaUVMSpace , typename traits::device_type::memory_space >::value ) {
traits::device_type::memory_space::execution_space::fence();
}
#endif
//------------------------------------------------------------
// Setup and initialization complete, start tracking
m_track.assign_allocated_record_to_uninitialized( record );
}
// Wrappers
template< class ... P >
explicit KOKKOS_INLINE_FUNCTION
DynRankView( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< Impl::ViewCtorProp< P... >::has_pointer
, typename traits::array_layout
>::type const & arg_layout
)
: m_track() // No memory tracking
, m_map( arg_prop , Impl::DynRankDimTraits<typename traits::specialize>::createLayout(arg_layout) )
, m_rank( Impl::DynRankDimTraits<typename traits::specialize>::computeRank(arg_layout) )
{
static_assert(
std::is_same< pointer_type
, typename Impl::ViewCtorProp< P... >::pointer_type
>::value ,
"Constructing DynRankView to wrap user memory must supply matching pointer type" );
}
//----------------------------------------
//Constructor(s)
// Simple dimension-only layout
template< class ... P >
explicit inline
DynRankView( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< ! Impl::ViewCtorProp< P... >::has_pointer
, size_t
>::type const arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
, const size_t arg_N4 = ~size_t(0)
, const size_t arg_N5 = ~size_t(0)
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0)
)
: DynRankView( arg_prop
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3 , arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
template< class ... P >
explicit KOKKOS_INLINE_FUNCTION
DynRankView( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< Impl::ViewCtorProp< P... >::has_pointer
, size_t
>::type const arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
, const size_t arg_N4 = ~size_t(0)
, const size_t arg_N5 = ~size_t(0)
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0)
)
: DynRankView( arg_prop
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3 , arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
// Allocate with label and layout
template< typename Label >
explicit inline
DynRankView( const Label & arg_label
, typename std::enable_if<
Kokkos::Experimental::Impl::is_view_label<Label>::value ,
typename traits::array_layout >::type const & arg_layout
)
: DynRankView( Impl::ViewCtorProp< std::string >( arg_label ) , arg_layout )
{}
// Allocate with label and dimensions; must disambiguate from the subview constructor
template< typename Label >
explicit inline
DynRankView( const Label & arg_label
, typename std::enable_if<
Kokkos::Experimental::Impl::is_view_label<Label>::value ,
const size_t >::type arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
, const size_t arg_N4 = ~size_t(0)
, const size_t arg_N5 = ~size_t(0)
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0)
)
: DynRankView( Impl::ViewCtorProp< std::string >( arg_label )
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3 , arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
// For backward compatibility
explicit inline
DynRankView( const ViewAllocateWithoutInitializing & arg_prop
, const typename traits::array_layout & arg_layout
)
: DynRankView( Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing )
, Impl::DynRankDimTraits<typename traits::specialize>::createLayout(arg_layout)
)
{}
explicit inline
DynRankView( const ViewAllocateWithoutInitializing & arg_prop
, const size_t arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
, const size_t arg_N4 = ~size_t(0)
, const size_t arg_N5 = ~size_t(0)
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0)
)
: DynRankView(Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing ), arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6, arg_N7 )
{}
//----------------------------------------
// Memory span required to wrap these dimensions.
static constexpr size_t required_allocation_size(
const size_t arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
{
return map_type::memory_span(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) );
}
explicit KOKKOS_INLINE_FUNCTION
DynRankView( pointer_type arg_ptr
, const size_t arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
, const size_t arg_N4 = ~size_t(0)
, const size_t arg_N5 = ~size_t(0)
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0)
)
: DynRankView( Impl::ViewCtorProp<pointer_type>(arg_ptr) , arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6, arg_N7 )
{}
explicit KOKKOS_INLINE_FUNCTION
DynRankView( pointer_type arg_ptr
, typename traits::array_layout & arg_layout
)
: DynRankView( Impl::ViewCtorProp<pointer_type>(arg_ptr) , arg_layout )
{}
//----------------------------------------
// Shared scratch memory constructor
static inline
size_t shmem_size( const size_t arg_N0 = ~size_t(0) ,
const size_t arg_N1 = ~size_t(0) ,
const size_t arg_N2 = ~size_t(0) ,
const size_t arg_N3 = ~size_t(0) ,
const size_t arg_N4 = ~size_t(0) ,
const size_t arg_N5 = ~size_t(0) ,
const size_t arg_N6 = ~size_t(0) ,
const size_t arg_N7 = ~size_t(0) )
{
const size_t num_passed_args =
( arg_N0 != ~size_t(0) ) + ( arg_N1 != ~size_t(0) ) + ( arg_N2 != ~size_t(0) ) +
( arg_N3 != ~size_t(0) ) + ( arg_N4 != ~size_t(0) ) + ( arg_N5 != ~size_t(0) ) +
( arg_N6 != ~size_t(0) ) + ( arg_N7 != ~size_t(0) );
if ( std::is_same<typename traits::specialize , void>::value && num_passed_args != traits::rank_dynamic ) {
Kokkos::abort( "Kokkos::View::shmem_size() rank_dynamic != number of arguments.\n" );
}
{}
return map_type::memory_span(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) );
}
explicit KOKKOS_INLINE_FUNCTION
DynRankView( const typename traits::execution_space::scratch_memory_space & arg_space
, const typename traits::array_layout & arg_layout )
: DynRankView( Impl::ViewCtorProp<pointer_type>(
reinterpret_cast<pointer_type>(
arg_space.get_shmem( map_type::memory_span(
Impl::DynRankDimTraits<typename traits::specialize>::createLayout( arg_layout ) //is this correct?
) ) ) )
, arg_layout )
{}
explicit KOKKOS_INLINE_FUNCTION
DynRankView( const typename traits::execution_space::scratch_memory_space & arg_space
, const size_t arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
, const size_t arg_N4 = ~size_t(0)
, const size_t arg_N5 = ~size_t(0)
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0) )
: DynRankView( Impl::ViewCtorProp<pointer_type>(
reinterpret_cast<pointer_type>(
arg_space.get_shmem(
map_type::memory_span(
Impl::DynRankDimTraits<typename traits::specialize>::createLayout(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) ) ) ) )
)
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
};
template < typename D , class ... P >
KOKKOS_INLINE_FUNCTION
constexpr unsigned rank( const DynRankView<D , P...> & DRV ) { return DRV.rank(); } // Needed during the transition to a common constexpr rank() method shared by View and DynRankView
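// ---------------------------------------------------------------------------
// Editor's note: illustrative usage sketch, not part of the upstream patch.
// It shows how a DynRankView deduces its run-time rank from the number of
// dimensions supplied at construction and how the free function rank() above
// queries it.  Guarded out so it cannot affect compilation; the function name
// example_dynrankview_rank is hypothetical.
#if 0
inline void example_dynrankview_rank()
{
  // Unused trailing dimensions default to ~size_t(0), so the rank is deduced
  // from the number of dimensions actually passed.
  Kokkos::Experimental::DynRankView<double> a( "a" , 10 , 3 );  // rank 2
  Kokkos::Experimental::DynRankView<double> b( "b" , 10 );      // rank 1

  const unsigned ra = rank( a );  // 2, resolved at run time
  const unsigned rb = rank( b );  // 1

  a( 5 , 2 ) = 1.0;  // operator() uses the stored run-time rank
  b( 7 )     = 2.0;
  (void) ra; (void) rb;
}
#endif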
//----------------------------------------------------------------------------
// Subview mapping.
// Deduce destination view type from source view traits and subview arguments
namespace Impl {
struct DynRankSubviewTag {};
template< class SrcTraits , class ... Args >
struct ViewMapping
< typename std::enable_if<(
std::is_same< typename SrcTraits::specialize , void >::value
&&
(
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutStride >::value
)
), DynRankSubviewTag >::type
, SrcTraits
, Args ... >
{
private:
enum
{ RZ = false
, R0 = bool(is_integral_extent<0,Args...>::value)
, R1 = bool(is_integral_extent<1,Args...>::value)
, R2 = bool(is_integral_extent<2,Args...>::value)
, R3 = bool(is_integral_extent<3,Args...>::value)
, R4 = bool(is_integral_extent<4,Args...>::value)
, R5 = bool(is_integral_extent<5,Args...>::value)
, R6 = bool(is_integral_extent<6,Args...>::value)
};
enum { rank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
+ unsigned(R4) + unsigned(R5) + unsigned(R6) };
typedef Kokkos::LayoutStride array_layout ;
typedef typename SrcTraits::value_type value_type ;
typedef value_type******* data_type ;
public:
- typedef Kokkos::Experimental::ViewTraits
+ typedef Kokkos::ViewTraits
< data_type
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > traits_type ;
- typedef Kokkos::Experimental::View
+ typedef Kokkos::View
< data_type
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > type ;
template< class MemoryTraits >
struct apply {
static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
- typedef Kokkos::Experimental::ViewTraits
+ typedef Kokkos::ViewTraits
< data_type
, array_layout
, typename SrcTraits::device_type
, MemoryTraits > traits_type ;
- typedef Kokkos::Experimental::View
+ typedef Kokkos::View
< data_type
, array_layout
, typename SrcTraits::device_type
, MemoryTraits > type ;
};
typedef typename SrcTraits::dimension dimension ;
template < class Arg0 = int, class Arg1 = int, class Arg2 = int, class Arg3 = int, class Arg4 = int, class Arg5 = int, class Arg6 = int >
struct ExtentGenerator {
KOKKOS_INLINE_FUNCTION
static SubviewExtents< 7 , rank > generator ( const dimension & dim , Arg0 arg0 = Arg0(), Arg1 arg1 = Arg1(), Arg2 arg2 = Arg2(), Arg3 arg3 = Arg3(), Arg4 arg4 = Arg4(), Arg5 arg5 = Arg5(), Arg6 arg6 = Arg6() )
{
return SubviewExtents< 7 , rank>( dim , arg0 , arg1 , arg2 , arg3 , arg4 , arg5 , arg6 );
}
};
typedef DynRankView< value_type , array_layout , typename SrcTraits::device_type , typename SrcTraits::memory_traits > ret_type;
template < typename T , class ... P >
KOKKOS_INLINE_FUNCTION
static ret_type subview( const unsigned src_rank , Kokkos::Experimental::DynRankView< T , P...> const & src
, Args ... args )
{
typedef ViewMapping< traits_type, void > DstType ;
typedef typename std::conditional< (rank==0) , ViewDimension<>
, typename std::conditional< (rank==1) , ViewDimension<0>
, typename std::conditional< (rank==2) , ViewDimension<0,0>
, typename std::conditional< (rank==3) , ViewDimension<0,0,0>
, typename std::conditional< (rank==4) , ViewDimension<0,0,0,0>
, typename std::conditional< (rank==5) , ViewDimension<0,0,0,0,0>
, typename std::conditional< (rank==6) , ViewDimension<0,0,0,0,0,0>
, ViewDimension<0,0,0,0,0,0,0>
>::type >::type >::type >::type >::type >::type >::type DstDimType ;
typedef ViewOffset< DstDimType , Kokkos::LayoutStride > dst_offset_type ;
typedef typename DstType::handle_type dst_handle_type ;
ret_type dst ;
const SubviewExtents< 7 , rank > extents =
ExtentGenerator< Args ... >::generator( src.m_map.m_offset.m_dim , args... ) ;
dst_offset_type tempdst( src.m_map.m_offset , extents ) ;
dst.m_track = src.m_track ;
dst.m_map.m_offset.m_dim.N0 = tempdst.m_dim.N0 ;
dst.m_map.m_offset.m_dim.N1 = tempdst.m_dim.N1 ;
dst.m_map.m_offset.m_dim.N2 = tempdst.m_dim.N2 ;
dst.m_map.m_offset.m_dim.N3 = tempdst.m_dim.N3 ;
dst.m_map.m_offset.m_dim.N4 = tempdst.m_dim.N4 ;
dst.m_map.m_offset.m_dim.N5 = tempdst.m_dim.N5 ;
dst.m_map.m_offset.m_dim.N6 = tempdst.m_dim.N6 ;
dst.m_map.m_offset.m_stride.S0 = tempdst.m_stride.S0 ;
dst.m_map.m_offset.m_stride.S1 = tempdst.m_stride.S1 ;
dst.m_map.m_offset.m_stride.S2 = tempdst.m_stride.S2 ;
dst.m_map.m_offset.m_stride.S3 = tempdst.m_stride.S3 ;
dst.m_map.m_offset.m_stride.S4 = tempdst.m_stride.S4 ;
dst.m_map.m_offset.m_stride.S5 = tempdst.m_stride.S5 ;
dst.m_map.m_offset.m_stride.S6 = tempdst.m_stride.S6 ;
dst.m_map.m_handle = dst_handle_type( src.m_map.m_handle +
src.m_map.m_offset( extents.domain_offset(0)
, extents.domain_offset(1)
, extents.domain_offset(2)
, extents.domain_offset(3)
, extents.domain_offset(4)
, extents.domain_offset(5)
, extents.domain_offset(6)
) );
dst.m_rank = ( src_rank > 0 ? unsigned(R0) : 0 )
+ ( src_rank > 1 ? unsigned(R1) : 0 )
+ ( src_rank > 2 ? unsigned(R2) : 0 )
+ ( src_rank > 3 ? unsigned(R3) : 0 )
+ ( src_rank > 4 ? unsigned(R4) : 0 )
+ ( src_rank > 5 ? unsigned(R5) : 0 )
+ ( src_rank > 6 ? unsigned(R6) : 0 ) ;
return dst ;
}
};
} // end Impl
template< class V , class ... Args >
using Subdynrankview = typename Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , V , Args... >::ret_type ;
template< class D , class ... P , class ...Args >
KOKKOS_INLINE_FUNCTION
Subdynrankview< ViewTraits<D******* , P...> , Args... >
subdynrankview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args...args)
{
if ( src.rank() > sizeof...(Args) ) //allow sizeof...(Args) >= src.rank(), ignore the remaining args
{ Kokkos::abort("subdynrankview: num of args must be >= rank of the source DynRankView"); }
- typedef Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , Kokkos::Experimental::ViewTraits< D*******, P... > , Args... > metafcn ;
+ typedef Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , Kokkos::ViewTraits< D*******, P... > , Args... > metafcn ;
return metafcn::subview( src.rank() , src , args... );
}
//Wrapper to allow subview function name
template< class D , class ... P , class ...Args >
KOKKOS_INLINE_FUNCTION
Subdynrankview< ViewTraits<D******* , P...> , Args... >
subview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args...args)
{
return subdynrankview( src , args... );
}
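// ---------------------------------------------------------------------------
// Editor's note: illustrative sketch, not part of the upstream patch.
// subdynrankview() (and its subview() alias above) takes one argument per
// rank of the source view; an integral argument removes a rank, a range
// argument (Kokkos::ALL(), std::pair) keeps it.  Guarded out; the example_
// name is hypothetical.
#if 0
inline void example_subdynrankview()
{
  Kokkos::Experimental::DynRankView<double> a( "a" , 10 , 20 , 30 );  // rank 3

  // Fix the first index: the result has rank 2 with extents 20 x 30.
  auto s1 = subdynrankview( a , 4 , Kokkos::ALL() , Kokkos::ALL() );

  // Fix two indices and take a sub-range of the last: rank 1, extent 5.
  auto s2 = subdynrankview( a , 4 , 7 , std::make_pair( 10 , 15 ) );

  (void) s1; (void) s2;
}
#endif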
} // namespace Experimental
} // namespace Kokkos
namespace Kokkos {
namespace Experimental {
// overload == and !=
template< class LT , class ... LP , class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
bool operator == ( const DynRankView<LT,LP...> & lhs ,
const DynRankView<RT,RP...> & rhs )
{
// Same data, layout, dimensions
typedef ViewTraits<LT,LP...> lhs_traits ;
typedef ViewTraits<RT,RP...> rhs_traits ;
return
std::is_same< typename lhs_traits::const_value_type ,
typename rhs_traits::const_value_type >::value &&
std::is_same< typename lhs_traits::array_layout ,
typename rhs_traits::array_layout >::value &&
std::is_same< typename lhs_traits::memory_space ,
typename rhs_traits::memory_space >::value &&
lhs.rank() == rhs.rank() &&
lhs.data() == rhs.data() &&
lhs.span() == rhs.span() &&
lhs.dimension(0) == rhs.dimension(0) &&
lhs.dimension(1) == rhs.dimension(1) &&
lhs.dimension(2) == rhs.dimension(2) &&
lhs.dimension(3) == rhs.dimension(3) &&
lhs.dimension(4) == rhs.dimension(4) &&
lhs.dimension(5) == rhs.dimension(5) &&
lhs.dimension(6) == rhs.dimension(6) &&
lhs.dimension(7) == rhs.dimension(7);
}
template< class LT , class ... LP , class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
bool operator != ( const DynRankView<LT,LP...> & lhs ,
const DynRankView<RT,RP...> & rhs )
{
return ! ( operator==(lhs,rhs) );
}
} //end Experimental
} //end Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
template< class OutputView , typename Enable = void >
struct DynRankViewFill {
typedef typename OutputView::traits::const_value_type const_value_type ;
const OutputView output ;
const_value_type input ;
KOKKOS_INLINE_FUNCTION
void operator()( const size_t i0 ) const
{
const size_t n1 = output.dimension_1();
const size_t n2 = output.dimension_2();
const size_t n3 = output.dimension_3();
const size_t n4 = output.dimension_4();
const size_t n5 = output.dimension_5();
const size_t n6 = output.dimension_6();
for ( size_t i1 = 0 ; i1 < n1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < n2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < n3 ; ++i3 ) {
for ( size_t i4 = 0 ; i4 < n4 ; ++i4 ) {
for ( size_t i5 = 0 ; i5 < n5 ; ++i5 ) {
for ( size_t i6 = 0 ; i6 < n6 ; ++i6 ) {
output(i0,i1,i2,i3,i4,i5,i6) = input ;
}}}}}}
}
DynRankViewFill( const OutputView & arg_out , const_value_type & arg_in )
: output( arg_out ), input( arg_in )
{
typedef typename OutputView::execution_space execution_space ;
typedef Kokkos::RangePolicy< execution_space > Policy ;
const Kokkos::Impl::ParallelFor< DynRankViewFill , Policy > closure( *this , Policy( 0 , output.dimension_0() ) );
closure.execute();
execution_space::fence();
}
};
template< class OutputView >
struct DynRankViewFill< OutputView , typename std::enable_if< OutputView::Rank == 0 >::type > {
DynRankViewFill( const OutputView & dst , const typename OutputView::const_value_type & src )
{
Kokkos::Impl::DeepCopy< typename OutputView::memory_space , Kokkos::HostSpace >
( dst.data() , & src , sizeof(typename OutputView::const_value_type) );
}
};
template< class OutputView , class InputView , class ExecSpace = typename OutputView::execution_space >
struct DynRankViewRemap {
const OutputView output ;
const InputView input ;
const size_t n0 ;
const size_t n1 ;
const size_t n2 ;
const size_t n3 ;
const size_t n4 ;
const size_t n5 ;
const size_t n6 ;
const size_t n7 ;
DynRankViewRemap( const OutputView & arg_out , const InputView & arg_in )
: output( arg_out ), input( arg_in )
, n0( std::min( (size_t)arg_out.dimension_0() , (size_t)arg_in.dimension_0() ) )
, n1( std::min( (size_t)arg_out.dimension_1() , (size_t)arg_in.dimension_1() ) )
, n2( std::min( (size_t)arg_out.dimension_2() , (size_t)arg_in.dimension_2() ) )
, n3( std::min( (size_t)arg_out.dimension_3() , (size_t)arg_in.dimension_3() ) )
, n4( std::min( (size_t)arg_out.dimension_4() , (size_t)arg_in.dimension_4() ) )
, n5( std::min( (size_t)arg_out.dimension_5() , (size_t)arg_in.dimension_5() ) )
, n6( std::min( (size_t)arg_out.dimension_6() , (size_t)arg_in.dimension_6() ) )
, n7( std::min( (size_t)arg_out.dimension_7() , (size_t)arg_in.dimension_7() ) )
{
typedef Kokkos::RangePolicy< ExecSpace > Policy ;
const Kokkos::Impl::ParallelFor< DynRankViewRemap , Policy > closure( *this , Policy( 0 , n0 ) );
closure.execute();
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_t i0 ) const
{
for ( size_t i1 = 0 ; i1 < n1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < n2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < n3 ; ++i3 ) {
for ( size_t i4 = 0 ; i4 < n4 ; ++i4 ) {
for ( size_t i5 = 0 ; i5 < n5 ; ++i5 ) {
for ( size_t i6 = 0 ; i6 < n6 ; ++i6 ) {
output(i0,i1,i2,i3,i4,i5,i6) = input(i0,i1,i2,i3,i4,i5,i6);
}}}}}}
}
};
} /* namespace Impl */
} /* namespace Experimental */
} /* namespace Kokkos */
namespace Kokkos {
namespace Experimental {
/** \brief Deep copy a value from Host memory into a view. */
template< class DT , class ... DP >
inline
void deep_copy
( const DynRankView<DT,DP...> & dst
, typename ViewTraits<DT,DP...>::const_value_type & value
, typename std::enable_if<
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value
>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::non_const_value_type ,
typename ViewTraits<DT,DP...>::value_type >::value
, "deep_copy requires non-const type" );
Kokkos::Experimental::Impl::DynRankViewFill< DynRankView<DT,DP...> >( dst , value );
}
/** \brief Deep copy into a value in Host memory from a view. */
template< class ST , class ... SP >
inline
void deep_copy
( typename ViewTraits<ST,SP...>::non_const_value_type & dst
, const DynRankView<ST,SP...> & src
, typename std::enable_if<
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value
>::type * = 0 )
{
if ( src.rank() != 0 )
{
Kokkos::abort("deep_copy: source DynRankView must have rank zero");
}
typedef ViewTraits<ST,SP...> src_traits ;
typedef typename src_traits::memory_space src_memory_space ;
Kokkos::Impl::DeepCopy< HostSpace , src_memory_space >( & dst , src.data() , sizeof(ST) );
}
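// ---------------------------------------------------------------------------
// Editor's note: illustrative sketch, not part of the upstream patch.
// The two deep_copy overloads above fill every entry of a DynRankView with a
// single value (via DynRankViewFill) and read the value of a rank-0 view back
// into host memory, respectively.  Guarded out; the example_ name is
// hypothetical.
#if 0
inline void example_dynrankview_value_copy()
{
  Kokkos::Experimental::DynRankView<double> a( "a" , 100 , 3 );
  deep_copy( a , 1.5 );                                // set every a(i,j) to 1.5

  Kokkos::Experimental::DynRankView<double> s( "s" );  // rank 0 (no dimensions given)
  double host_value = 0.0;
  deep_copy( host_value , s );                         // copy s() back to the host
  (void) host_value;
}
#endif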
//----------------------------------------------------------------------------
/** \brief A deep copy between views of the default specialization, compatible type,
* same rank, same contiguous layout.
*/
template< class DstType , class SrcType >
inline
void deep_copy
( const DstType & dst
, const SrcType & src
, typename std::enable_if<(
std::is_same< typename DstType::traits::specialize , void >::value &&
std::is_same< typename SrcType::traits::specialize , void >::value
&&
( Kokkos::Experimental::is_dyn_rank_view<DstType>::value || Kokkos::Experimental::is_dyn_rank_view<SrcType>::value)
)>::type * = 0 )
{
static_assert(
std::is_same< typename DstType::traits::value_type ,
typename DstType::traits::non_const_value_type >::value
, "deep_copy requires non-const destination type" );
typedef DstType dst_type ;
typedef SrcType src_type ;
typedef typename dst_type::execution_space dst_execution_space ;
typedef typename src_type::execution_space src_execution_space ;
typedef typename dst_type::memory_space dst_memory_space ;
typedef typename src_type::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
enum { SrcExecCanAccessDst =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename src_execution_space::memory_space , dst_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< src_execution_space , dst_memory_space >::accessible };
if ( (void *) dst.data() != (void*) src.data() ) {
// Concern: if the views overlap, a parallel copy will be erroneous.
// ...
// If the value types match and the views have equal layout, equal dimensions,
// equal span, and contiguous memory, then a byte-wise copy can be used.
if ( rank(src) == 0 && rank(dst) == 0 )
{
typedef typename dst_type::value_type value_type ;
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , sizeof(value_type) );
}
else if ( std::is_same< typename DstType::traits::value_type ,
typename SrcType::traits::non_const_value_type >::value &&
(
( std::is_same< typename DstType::traits::array_layout ,
typename SrcType::traits::array_layout >::value
&&
( std::is_same< typename DstType::traits::array_layout ,
typename Kokkos::LayoutLeft>::value
||
std::is_same< typename DstType::traits::array_layout ,
typename Kokkos::LayoutRight>::value
)
)
||
(
rank(dst) == 1
&&
rank(src) == 1
)
) &&
dst.span_is_contiguous() &&
src.span_is_contiguous() &&
dst.span() == src.span() &&
dst.dimension_0() == src.dimension_0() &&
dst.dimension_1() == src.dimension_1() &&
dst.dimension_2() == src.dimension_2() &&
dst.dimension_3() == src.dimension_3() &&
dst.dimension_4() == src.dimension_4() &&
dst.dimension_5() == src.dimension_5() &&
dst.dimension_6() == src.dimension_6() &&
dst.dimension_7() == src.dimension_7() ) {
const size_t nbytes = sizeof(typename dst_type::value_type) * dst.span();
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , nbytes );
}
else if ( std::is_same< typename DstType::traits::value_type ,
typename SrcType::traits::non_const_value_type >::value &&
(
( std::is_same< typename DstType::traits::array_layout ,
typename SrcType::traits::array_layout >::value
&&
std::is_same< typename DstType::traits::array_layout ,
typename Kokkos::LayoutStride>::value
)
||
(
rank(dst) == 1
&&
rank(src) == 1
)
) &&
dst.span_is_contiguous() &&
src.span_is_contiguous() &&
dst.span() == src.span() &&
dst.dimension_0() == src.dimension_0() &&
dst.dimension_1() == src.dimension_1() &&
dst.dimension_2() == src.dimension_2() &&
dst.dimension_3() == src.dimension_3() &&
dst.dimension_4() == src.dimension_4() &&
dst.dimension_5() == src.dimension_5() &&
dst.dimension_6() == src.dimension_6() &&
dst.dimension_7() == src.dimension_7() &&
dst.stride_0() == src.stride_0() &&
dst.stride_1() == src.stride_1() &&
dst.stride_2() == src.stride_2() &&
dst.stride_3() == src.stride_3() &&
dst.stride_4() == src.stride_4() &&
dst.stride_5() == src.stride_5() &&
dst.stride_6() == src.stride_6() &&
dst.stride_7() == src.stride_7()
) {
const size_t nbytes = sizeof(typename dst_type::value_type) * dst.span();
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , nbytes );
}
else if ( DstExecCanAccessSrc ) {
// Copying between views in accessible memory spaces whose data are non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::DynRankViewRemap< dst_type , src_type >( dst , src );
}
else if ( SrcExecCanAccessDst ) {
// Copying between views in accessible memory spaces whose data are non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::DynRankViewRemap< dst_type , src_type , src_execution_space >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
}
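// ---------------------------------------------------------------------------
// Editor's note: illustrative sketch, not part of the upstream patch.
// The view-to-view deep_copy above performs a byte-wise copy when value type,
// layout, dimensions and spans match and the spans are contiguous; otherwise
// it falls back to an element-wise DynRankViewRemap in whichever execution
// space can reach both memory spaces.  Guarded out; the example_ name is
// hypothetical.
#if 0
inline void example_dynrankview_deep_copy()
{
  Kokkos::Experimental::DynRankView<double> d_a( "d_a" , 50 , 4 );
  auto h_a = create_mirror_view( d_a );   // host copy (or d_a itself on host-only builds)

  deep_copy( h_a , 0.0 );                 // fill on the host
  deep_copy( d_a , h_a );                 // same type/layout/extents: byte-wise path
}
#endif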
} //end Experimental
} //end Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
// Deduce Mirror Types
template<class Space, class T, class ... P>
struct MirrorDRViewType {
// The incoming view_type
typedef typename Kokkos::Experimental::DynRankView<T,P...> src_view_type;
// The memory space for the mirror view
typedef typename Space::memory_space memory_space;
// Check whether it is the same memory space
enum { is_same_memspace = std::is_same<memory_space,typename src_view_type::memory_space>::value };
// The array_layout
typedef typename src_view_type::array_layout array_layout;
// The data type (we probably want it non-const since otherwise we can't even deep_copy to it).
typedef typename src_view_type::non_const_data_type data_type;
// The destination view type if it is not the same memory space
typedef Kokkos::Experimental::DynRankView<data_type,array_layout,Space> dest_view_type;
// If it is the same memory_space, return the existing view_type.
// This will also keep the unmanaged trait if necessary
typedef typename std::conditional<is_same_memspace,src_view_type,dest_view_type>::type view_type;
};
template<class Space, class T, class ... P>
struct MirrorDRVType {
// The incoming view_type
typedef typename Kokkos::Experimental::DynRankView<T,P...> src_view_type;
// The memory space for the mirror view
typedef typename Space::memory_space memory_space;
// Check whether it is the same memory space
enum { is_same_memspace = std::is_same<memory_space,typename src_view_type::memory_space>::value };
// The array_layout
typedef typename src_view_type::array_layout array_layout;
// The data type (we probably want it non-const since otherwise we can't even deep_copy to it).
typedef typename src_view_type::non_const_data_type data_type;
// The destination view type if it is not the same memory space
typedef Kokkos::Experimental::DynRankView<data_type,array_layout,Space> view_type;
};
}
template< class T , class ... P >
inline
typename DynRankView<T,P...>::HostMirror
create_mirror( const DynRankView<T,P...> & src
, typename std::enable_if<
- ! std::is_same< typename Kokkos::Experimental::ViewTraits<T,P...>::array_layout
+ ! std::is_same< typename Kokkos::ViewTraits<T,P...>::array_layout
, Kokkos::LayoutStride >::value
>::type * = 0
)
{
typedef DynRankView<T,P...> src_type ;
typedef typename src_type::HostMirror dst_type ;
return dst_type( std::string( src.label() ).append("_mirror")
, Impl::reconstructLayout(src.layout(), src.rank()) );
}
template< class T , class ... P >
inline
typename DynRankView<T,P...>::HostMirror
create_mirror( const DynRankView<T,P...> & src
, typename std::enable_if<
- std::is_same< typename Kokkos::Experimental::ViewTraits<T,P...>::array_layout
+ std::is_same< typename Kokkos::ViewTraits<T,P...>::array_layout
, Kokkos::LayoutStride >::value
>::type * = 0
)
{
typedef DynRankView<T,P...> src_type ;
typedef typename src_type::HostMirror dst_type ;
return dst_type( std::string( src.label() ).append("_mirror")
, Impl::reconstructLayout(src.layout(), src.rank()) );
}
// Create a mirror in a new space (specialization for different space)
template<class Space, class T, class ... P>
typename Impl::MirrorDRVType<Space,T,P ...>::view_type create_mirror(const Space& , const Kokkos::Experimental::DynRankView<T,P...> & src) {
return typename Impl::MirrorDRVType<Space,T,P ...>::view_type(src.label(), Impl::reconstructLayout(src.layout(), src.rank()) );
}
template< class T , class ... P >
inline
typename DynRankView<T,P...>::HostMirror
create_mirror_view( const DynRankView<T,P...> & src
, typename std::enable_if<(
std::is_same< typename DynRankView<T,P...>::memory_space
, typename DynRankView<T,P...>::HostMirror::memory_space
>::value
&&
std::is_same< typename DynRankView<T,P...>::data_type
, typename DynRankView<T,P...>::HostMirror::data_type
>::value
)>::type * = 0
)
{
return src ;
}
template< class T , class ... P >
inline
typename DynRankView<T,P...>::HostMirror
create_mirror_view( const DynRankView<T,P...> & src
, typename std::enable_if< ! (
std::is_same< typename DynRankView<T,P...>::memory_space
, typename DynRankView<T,P...>::HostMirror::memory_space
>::value
&&
std::is_same< typename DynRankView<T,P...>::data_type
, typename DynRankView<T,P...>::HostMirror::data_type
>::value
)>::type * = 0
)
{
return Kokkos::Experimental::create_mirror( src );
}
// Create a mirror view in a new space (specialization for same space)
template<class Space, class T, class ... P>
typename Impl::MirrorDRViewType<Space,T,P ...>::view_type
create_mirror_view(const Space& , const Kokkos::Experimental::DynRankView<T,P...> & src
, typename std::enable_if<Impl::MirrorDRViewType<Space,T,P ...>::is_same_memspace>::type* = 0 ) {
return src;
}
// Create a mirror view in a new space (specialization for different space)
template<class Space, class T, class ... P>
typename Impl::MirrorDRViewType<Space,T,P ...>::view_type
create_mirror_view(const Space& , const Kokkos::Experimental::DynRankView<T,P...> & src
, typename std::enable_if<!Impl::MirrorDRViewType<Space,T,P ...>::is_same_memspace>::type* = 0 ) {
return typename Impl::MirrorDRViewType<Space,T,P ...>::view_type(src.label(), Impl::reconstructLayout(src.layout(), src.rank()) );
}
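// ---------------------------------------------------------------------------
// Editor's note: illustrative sketch, not part of the upstream patch.
// create_mirror() always allocates a new HostMirror (appending "_mirror" to
// the label), while create_mirror_view() returns the source view unchanged
// when its memory space and data type already match the HostMirror, avoiding
// an unnecessary allocation.  Guarded out; the example_ name is hypothetical.
#if 0
inline void example_dynrankview_mirrors()
{
  Kokkos::Experimental::DynRankView<int> d_v( "d_v" , 8 , 8 , 8 );

  auto m1 = create_mirror( d_v );       // always a new host allocation, label "d_v_mirror"
  auto m2 = create_mirror_view( d_v );  // may alias d_v when it is already host accessible

  (void) m1; (void) m2;
}
#endif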
} //end Experimental
} //end Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
/** \brief Resize a view, copying the old data into the new allocation at the corresponding indices. */
template< class T , class ... P >
inline
void resize( DynRankView<T,P...> & v ,
const size_t n0 = ~size_t(0) ,
const size_t n1 = ~size_t(0) ,
const size_t n2 = ~size_t(0) ,
const size_t n3 = ~size_t(0) ,
const size_t n4 = ~size_t(0) ,
const size_t n5 = ~size_t(0) ,
const size_t n6 = ~size_t(0) ,
const size_t n7 = ~size_t(0) )
{
typedef DynRankView<T,P...> drview_type ;
- static_assert( Kokkos::Experimental::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
+ static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
drview_type v_resized( v.label(), n0, n1, n2, n3, n4, n5, n6 );
Kokkos::Experimental::Impl::DynRankViewRemap< drview_type , drview_type >( v_resized, v );
v = v_resized ;
}
/** \brief Reallocate a view; the old contents are discarded, not copied. */
template< class T , class ... P >
inline
void realloc( DynRankView<T,P...> & v ,
const size_t n0 = ~size_t(0) ,
const size_t n1 = ~size_t(0) ,
const size_t n2 = ~size_t(0) ,
const size_t n3 = ~size_t(0) ,
const size_t n4 = ~size_t(0) ,
const size_t n5 = ~size_t(0) ,
const size_t n6 = ~size_t(0) ,
const size_t n7 = ~size_t(0) )
{
typedef DynRankView<T,P...> drview_type ;
- static_assert( Kokkos::Experimental::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
+ static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
const std::string label = v.label();
v = drview_type(); // Deallocate first, if the only view to allocation
v = drview_type( label, n0, n1, n2, n3, n4, n5, n6 );
}
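// ---------------------------------------------------------------------------
// Editor's note: illustrative sketch, not part of the upstream patch.
// resize() above preserves existing entries at their indices by remapping
// them into the new allocation, while realloc() replaces the allocation and
// discards the old contents.  Guarded out; the example_ name is hypothetical.
#if 0
inline void example_dynrankview_resize_realloc()
{
  Kokkos::Experimental::DynRankView<double> v( "v" , 10 );
  deep_copy( v , 3.0 );

  resize( v , 20 );    // v(0)..v(9) still hold 3.0; v(10)..v(19) are new
  realloc( v , 5 );    // fresh allocation of extent 5, old contents discarded
}
#endif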
} //end Experimental
} //end Kokkos
using Kokkos::Experimental::is_dyn_rank_view ;
namespace Kokkos {
template< typename D , class ... P >
using DynRankView = Kokkos::Experimental::DynRankView< D , P... > ;
using Kokkos::Experimental::deep_copy ;
using Kokkos::Experimental::create_mirror ;
using Kokkos::Experimental::create_mirror_view ;
using Kokkos::Experimental::subdynrankview ;
using Kokkos::Experimental::subview ;
using Kokkos::Experimental::resize ;
using Kokkos::Experimental::realloc ;
} //end Kokkos
#endif
diff --git a/lib/kokkos/containers/src/Kokkos_DynamicView.hpp b/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
index fb364f0bf..3277c007d 100644
--- a/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
+++ b/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
@@ -1,494 +1,494 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_DYNAMIC_VIEW_HPP
#define KOKKOS_DYNAMIC_VIEW_HPP
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
namespace Kokkos {
namespace Experimental {
/** \brief Dynamic views are restricted to rank one and carry no layout specification.
* Subviews are not allowed.
*/
template< typename DataType , typename ... P >
-class DynamicView : public Kokkos::Experimental::ViewTraits< DataType , P ... >
+class DynamicView : public Kokkos::ViewTraits< DataType , P ... >
{
public:
typedef ViewTraits< DataType , P ... > traits ;
private:
template< class , class ... > friend class DynamicView ;
typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
static_assert( traits::rank == 1 && traits::rank_dynamic == 1
, "DynamicView must be rank-one" );
static_assert( std::is_trivial< typename traits::value_type >::value &&
std::is_same< typename traits::specialize , void >::value
, "DynamicView must have trivial data type" );
+
+ template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
+ { KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
+
+ template< class Space > struct verify_space<Space,false>
+ { KOKKOS_FORCEINLINE_FUNCTION static void check()
+ { Kokkos::abort("Kokkos::DynamicView ERROR: attempt to access inaccessible memory space"); };
+ };
+
public:
typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
private:
memory_pool m_pool ;
track_type m_track ;
typename traits::value_type ** m_chunks ;
unsigned m_chunk_shift ;
unsigned m_chunk_mask ;
unsigned m_chunk_max ;
public:
//----------------------------------------------------------------------
/** \brief Compatible view of array of scalar types */
typedef DynamicView< typename traits::data_type ,
typename traits::device_type >
array_type ;
/** \brief Compatible view of const data type */
typedef DynamicView< typename traits::const_data_type ,
typename traits::device_type >
const_type ;
/** \brief Compatible view of non-const data type */
typedef DynamicView< typename traits::non_const_data_type ,
typename traits::device_type >
non_const_type ;
/** \brief Must be accessible everywhere */
typedef DynamicView HostMirror ;
//----------------------------------------------------------------------
enum { Rank = 1 };
KOKKOS_INLINE_FUNCTION constexpr size_t size() const
{
return
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
+ Kokkos::Impl::MemorySpaceAccess
< Kokkos::Impl::ActiveExecutionMemorySpace
, typename traits::memory_space
- >::value
+ >::accessible
? // Runtime size is at the end of the chunk pointer array
(*reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max ))
<< m_chunk_shift
: 0 ;
}
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
size_t extent( const iType & r ) const
{ return r == 0 ? size() : 1 ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
size_t extent_int( const iType & r ) const
{ return r == 0 ? size() : 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return size(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return 0 ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { *s = 0 ; }
//----------------------------------------------------------------------
// Range span is the span which contains all members.
typedef typename traits::value_type & reference_type ;
typedef typename traits::value_type * pointer_type ;
enum { reference_type_is_lvalue_reference = std::is_lvalue_reference< reference_type >::value };
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return false ; }
KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const { return 0 ; }
//----------------------------------------
template< typename I0 , class ... Args >
KOKKOS_INLINE_FUNCTION
reference_type operator()( const I0 & i0 , const Args & ... args ) const
{
static_assert( Kokkos::Impl::are_integral<I0,Args...>::value
, "Indices must be integral type" );
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
- < Kokkos::Impl::ActiveExecutionMemorySpace
- , typename traits::memory_space
- >::verify();
+ DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
// Which chunk is being indexed.
const uintptr_t ic = uintptr_t( i0 >> m_chunk_shift );
typename traits::value_type * volatile * const ch = m_chunks + ic ;
// Do bounds checking if enabled or if the chunk pointer is zero.
// If not bounds checking then we assume a non-zero pointer is valid.
#if ! defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
if ( 0 == *ch )
#endif
{
// Verify that allocation of the requested chunk is in progress.
// The allocated chunk counter is m_chunks[ m_chunk_max ]
const uintptr_t n =
*reinterpret_cast<uintptr_t volatile *>( m_chunks + m_chunk_max );
if ( n <= ic ) {
Kokkos::abort("Kokkos::DynamicView array bounds error");
}
// Allocation of this chunk is in progress
// so wait for allocation to complete.
while ( 0 == *ch );
}
return (*ch)[ i0 & m_chunk_mask ];
}
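// Editor's note: worked example of the chunk indexing above (illustrative,
// not part of the upstream patch; the numbers are hypothetical).  With
// value_type = double and a memory pool whose minimum block size is 8192
// bytes, each chunk holds 8192 / sizeof(double) = 1024 entries, so
// m_chunk_shift = 10 and m_chunk_mask = 1023.  Indexing i0 = 2500 gives
//   ic        = 2500 >> 10  = 2    (third chunk)
//   i0 & mask = 2500 & 1023 = 452  (offset within that chunk)
// i.e. element 2500 lives at m_chunks[2][452].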
//----------------------------------------
/** \brief Resizing in parallel can only increase the array size,
* never decrease it.
*/
KOKKOS_INLINE_FUNCTION
void resize_parallel( size_t n ) const
{
typedef typename traits::value_type value_type ;
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
- < Kokkos::Impl::ActiveExecutionMemorySpace
- , typename traits::memory_space >::verify();
+ DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
if ( m_chunk_max < NC ) {
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
- printf("DynamicView::resize_parallel(%lu) m_chunk_max(%lu) NC(%lu)\n"
+ printf("DynamicView::resize_parallel(%lu) m_chunk_max(%u) NC(%lu)\n"
, n , m_chunk_max , NC );
#endif
Kokkos::abort("DynamicView::resize_parallel exceeded maximum size");
}
typename traits::value_type * volatile * const ch = m_chunks ;
// The allocated chunk counter is m_chunks[ m_chunk_max ]
uintptr_t volatile * const pc =
reinterpret_cast<uintptr_t volatile*>( m_chunks + m_chunk_max );
// Potentially concurrent iteration of allocation to the required size.
for ( uintptr_t jc = *pc ; jc < NC ; ) {
// Claim the 'jc' chunk to-be-allocated index
const uintptr_t jc_try = jc ;
// Jump iteration to the chunk counter.
jc = atomic_compare_exchange( pc , jc_try , jc_try + 1 );
if ( jc_try == jc ) {
ch[jc_try] = reinterpret_cast<value_type*>(
m_pool.allocate( sizeof(value_type) << m_chunk_shift ));
Kokkos::memory_fence();
}
}
}
/** \brief Resizing in serial can either grow or shrink the array size. */
inline
void resize_serial( size_t n )
{
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
- < Kokkos::Impl::ActiveExecutionMemorySpace
- , typename traits::memory_space >::verify();
+ DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
if ( m_chunk_max < NC ) {
Kokkos::abort("DynamicView::resize_serial exceeded maximum size");
}
uintptr_t * const pc =
reinterpret_cast<uintptr_t*>( m_chunks + m_chunk_max );
if ( *pc < NC ) {
while ( *pc < NC ) {
m_chunks[*pc] = reinterpret_cast<typename traits::value_type*>(
m_pool.allocate( sizeof(typename traits::value_type) << m_chunk_shift ) );
++*pc ;
}
}
else {
while ( NC + 1 <= *pc ) {
--*pc ;
m_pool.deallocate( m_chunks[*pc]
, sizeof(typename traits::value_type) << m_chunk_shift );
m_chunks[*pc] = 0 ;
}
}
}
//----------------------------------------------------------------------
~DynamicView() = default ;
DynamicView() = default ;
DynamicView( DynamicView && ) = default ;
DynamicView( const DynamicView & ) = default ;
DynamicView & operator = ( DynamicView && ) = default ;
DynamicView & operator = ( const DynamicView & ) = default ;
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
DynamicView( const DynamicView<RT,RP...> & rhs )
: m_pool( rhs.m_pool )
, m_track( rhs.m_track )
, m_chunks( rhs.m_chunks )
, m_chunk_shift( rhs.m_chunk_shift )
, m_chunk_mask( rhs.m_chunk_mask )
, m_chunk_max( rhs.m_chunk_max )
{
}
//----------------------------------------------------------------------
struct Destroy {
memory_pool m_pool ;
typename traits::value_type ** m_chunks ;
unsigned m_chunk_max ;
bool m_destroy ;
// Initialize or destroy the array of chunk pointers.
// The entry at index m_chunk_max is the allocation counter.
KOKKOS_INLINE_FUNCTION
void operator()( unsigned i ) const
{
if ( m_destroy && i < m_chunk_max && 0 != m_chunks[i] ) {
m_pool.deallocate( m_chunks[i] , m_pool.get_min_block_size() );
}
m_chunks[i] = 0 ;
}
void execute( bool arg_destroy )
{
typedef Kokkos::RangePolicy< typename traits::execution_space > Range ;
m_destroy = arg_destroy ;
Kokkos::Impl::ParallelFor<Destroy,Range>
closure( *this , Range(0, m_chunk_max + 1) );
closure.execute();
traits::execution_space::fence();
}
void construct_shared_allocation()
{ execute( false ); }
void destroy_shared_allocation()
{ execute( true ); }
Destroy() = default ;
Destroy( Destroy && ) = default ;
Destroy( const Destroy & ) = default ;
Destroy & operator = ( Destroy && ) = default ;
Destroy & operator = ( const Destroy & ) = default ;
Destroy( const memory_pool & arg_pool
, typename traits::value_type ** arg_chunk
, const unsigned arg_chunk_max )
: m_pool( arg_pool )
, m_chunks( arg_chunk )
, m_chunk_max( arg_chunk_max )
, m_destroy( false )
{}
};
/**\brief Allocation constructor
*
* Memory is allocated in chunks from the memory pool.
* The chunk size conforms to the memory pool's chunk size.
* A maximum size is required in order to allocate a
* chunk-pointer array.
*/
explicit inline
DynamicView( const std::string & arg_label
, const memory_pool & arg_pool
, const size_t arg_size_max )
: m_pool( arg_pool )
, m_track()
, m_chunks(0)
// The memory pool chunk is guaranteed to be a power of two
, m_chunk_shift(
Kokkos::Impl::integral_power_of_two(
m_pool.get_min_block_size()/sizeof(typename traits::value_type)) )
, m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
, m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
{
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
- < Kokkos::Impl::ActiveExecutionMemorySpace
- , typename traits::memory_space >::verify();
+ DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
// A functor to deallocate all of the chunks upon final destruction
typedef typename traits::memory_space memory_space ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< memory_space , Destroy > record_type ;
// Allocate chunk pointers and allocation counter
record_type * const record =
record_type::allocate( memory_space()
, arg_label
, ( sizeof(pointer_type) * ( m_chunk_max + 1 ) ) );
m_chunks = reinterpret_cast<pointer_type*>( record->data() );
record->m_destroy = Destroy( m_pool , m_chunks , m_chunk_max );
// Initialize to zero
record->m_destroy.construct_shared_allocation();
m_track.assign_allocated_record_to_uninitialized( record );
}
};
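// ---------------------------------------------------------------------------
// Editor's note: illustrative sketch, not part of the upstream patch.
// A DynamicView draws its storage in chunks from a MemoryPool and needs an
// upper bound on its size so the chunk-pointer array can be allocated;
// resize_serial() then grows or shrinks the set of allocated chunks.
// Guarded out; the example_ name is hypothetical, and construction of the
// pool is omitted because the MemoryPool constructor signature differs
// between Kokkos versions.
#if 0
inline void example_dynamicview(
  const Kokkos::Experimental::MemoryPool< Kokkos::DefaultHostExecutionSpace::device_type > & pool )
{
  typedef Kokkos::DefaultHostExecutionSpace::device_type device ;

  // The upper bound (1 << 20 entries) only sizes the chunk-pointer array.
  Kokkos::Experimental::DynamicView< double* , device > v( "v" , pool , 1 << 20 );

  v.resize_serial( 1000 );   // allocate enough chunks for 1000 entries
  v( 999 ) = 42.0;           // valid once the chunks exist
}
#endif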
} // namespace Experimental
} // namespace Kokkos
namespace Kokkos {
namespace Experimental {
template< class T , class ... P >
inline
typename Kokkos::Experimental::DynamicView<T,P...>::HostMirror
create_mirror_view( const Kokkos::Experimental::DynamicView<T,P...> & src )
{
return src ;
}
template< class T , class ... DP , class ... SP >
inline
void deep_copy( const View<T,DP...> & dst
, const DynamicView<T,SP...> & src
)
{
typedef View<T,DP...> dst_type ;
typedef DynamicView<T,SP...> src_type ;
typedef typename ViewTraits<T,DP...>::execution_space dst_execution_space ;
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
if ( DstExecCanAccessSrc ) {
// Copying between views in accessible memory spaces whose data are non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
template< class T , class ... DP , class ... SP >
inline
void deep_copy( const DynamicView<T,DP...> & dst
, const View<T,SP...> & src
)
{
typedef DynamicView<T,DP...> dst_type ;
typedef View<T,SP...> src_type ;
typedef typename ViewTraits<T,DP...>::execution_space dst_execution_space ;
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
if ( DstExecCanAccessSrc ) {
// Copying between views in accessible memory spaces whose data are non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
} // namespace Experimental
} // namespace Kokkos
#endif /* #ifndef KOKKOS_DYNAMIC_VIEW_HPP */
diff --git a/lib/kokkos/containers/src/Kokkos_ErrorReporter.hpp b/lib/kokkos/containers/src/Kokkos_ErrorReporter.hpp
new file mode 100644
index 000000000..4c90e4c23
--- /dev/null
+++ b/lib/kokkos/containers/src/Kokkos_ErrorReporter.hpp
@@ -0,0 +1,196 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_EXPERIMENTAL_ERROR_REPORTER_HPP
+#define KOKKOS_EXPERIMENTAL_ERROR_REPORTER_HPP
+
+#include <vector>
+#include <Kokkos_Core.hpp>
+#include <Kokkos_View.hpp>
+#include <Kokkos_DualView.hpp>
+
+namespace Kokkos {
+namespace Experimental {
+
+template <typename ReportType, typename DeviceType>
+class ErrorReporter
+{
+public:
+
+ typedef ReportType report_type;
+ typedef DeviceType device_type;
+ typedef typename device_type::execution_space execution_space;
+
+ ErrorReporter(int max_results)
+ : m_numReportsAttempted(""),
+ m_reports("", max_results),
+ m_reporters("", max_results)
+ {
+ clear();
+ }
+
+ int getCapacity() const { return m_reports.h_view.dimension_0(); }
+
+ int getNumReports();
+
+ int getNumReportAttempts();
+
+ void getReports(std::vector<int> &reporters_out, std::vector<report_type> &reports_out);
+ void getReports( typename Kokkos::View<int*, typename DeviceType::execution_space >::HostMirror &reporters_out,
+ typename Kokkos::View<report_type*, typename DeviceType::execution_space >::HostMirror &reports_out);
+
+ void clear();
+
+ void resize(const size_t new_size);
+
+ bool full() {return (getNumReportAttempts() >= getCapacity()); }
+
+ KOKKOS_INLINE_FUNCTION
+ bool add_report(int reporter_id, report_type report) const
+ {
+ int idx = Kokkos::atomic_fetch_add(&m_numReportsAttempted(), 1);
+
+ if (idx >= 0 && (idx < static_cast<int>(m_reports.d_view.dimension_0()))) {
+ m_reporters.d_view(idx) = reporter_id;
+ m_reports.d_view(idx) = report;
+ return true;
+ }
+ else {
+ return false;
+ }
+ }
+
+private:
+
+ typedef Kokkos::View<report_type *, execution_space> reports_view_t;
+ typedef Kokkos::DualView<report_type *, execution_space> reports_dualview_t;
+
+ typedef typename reports_dualview_t::host_mirror_space host_mirror_space;
+ Kokkos::View<int, execution_space> m_numReportsAttempted;
+ reports_dualview_t m_reports;
+ Kokkos::DualView<int *, execution_space> m_reporters;
+
+};
+
+
+template <typename ReportType, typename DeviceType>
+inline int ErrorReporter<ReportType, DeviceType>::getNumReports()
+{
+ int num_reports = 0;
+ Kokkos::deep_copy(num_reports,m_numReportsAttempted);
+ if (num_reports > static_cast<int>(m_reports.h_view.dimension_0())) {
+ num_reports = m_reports.h_view.dimension_0();
+ }
+ return num_reports;
+}
+
+template <typename ReportType, typename DeviceType>
+inline int ErrorReporter<ReportType, DeviceType>::getNumReportAttempts()
+{
+ int num_reports = 0;
+ Kokkos::deep_copy(num_reports,m_numReportsAttempted);
+ return num_reports;
+}
+
+template <typename ReportType, typename DeviceType>
+void ErrorReporter<ReportType, DeviceType>::getReports(std::vector<int> &reporters_out, std::vector<report_type> &reports_out)
+{
+ int num_reports = getNumReports();
+ reporters_out.clear();
+ reporters_out.reserve(num_reports);
+ reports_out.clear();
+ reports_out.reserve(num_reports);
+
+ if (num_reports > 0) {
+ m_reports.template sync<host_mirror_space>();
+ m_reporters.template sync<host_mirror_space>();
+
+ for (int i = 0; i < num_reports; ++i) {
+ reporters_out.push_back(m_reporters.h_view(i));
+ reports_out.push_back(m_reports.h_view(i));
+ }
+ }
+}
+
+template <typename ReportType, typename DeviceType>
+void ErrorReporter<ReportType, DeviceType>::getReports(
+ typename Kokkos::View<int*, typename DeviceType::execution_space >::HostMirror &reporters_out,
+ typename Kokkos::View<report_type*, typename DeviceType::execution_space >::HostMirror &reports_out)
+{
+ int num_reports = getNumReports();
+ reporters_out = typename Kokkos::View<int*, typename DeviceType::execution_space >::HostMirror("ErrorReport::reporters_out",num_reports);
+ reports_out = typename Kokkos::View<report_type*, typename DeviceType::execution_space >::HostMirror("ErrorReport::reports_out",num_reports);
+
+ if (num_reports > 0) {
+ m_reports.template sync<host_mirror_space>();
+ m_reporters.template sync<host_mirror_space>();
+
+ for (int i = 0; i < num_reports; ++i) {
+ reporters_out(i) = m_reporters.h_view(i);
+ reports_out(i) = m_reports.h_view(i);
+ }
+ }
+}
+
+template <typename ReportType, typename DeviceType>
+void ErrorReporter<ReportType, DeviceType>::clear()
+{
+ int num_reports=0;
+ Kokkos::deep_copy(m_numReportsAttempted, num_reports);
+ m_reports.template modify<execution_space>();
+ m_reporters.template modify<execution_space>();
+}
+
+template <typename ReportType, typename DeviceType>
+void ErrorReporter<ReportType, DeviceType>::resize(const size_t new_size)
+{
+ m_reports.resize(new_size);
+ m_reporters.resize(new_size);
+ Kokkos::fence();
+}
+
+
+} // namespace Experimental
+} // namespace Kokkos
+
+#endif
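A minimal sketch of how the new ErrorReporter could be used, assuming a trivially copyable report struct and device-lambda support; `data` and `n` below are illustrative placeholders, not part of this patch:

    struct OverflowReport { int index; double value; };   // hypothetical report type

    typedef Kokkos::Experimental::ErrorReporter<OverflowReport, Kokkos::DefaultExecutionSpace> reporter_type;
    reporter_type reporter(100);                           // capacity for 100 reports

    Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) {
      if (data(i) < 0.0) {
        OverflowReport r;
        r.index = i;
        r.value = data(i);
        reporter.add_report(i, r);                         // thread-safe; returns false when full
      }
    });

    std::vector<int> reporters;
    std::vector<OverflowReport> reports;
    reporter.getReports(reporters, reports);               // host-side copy of recorded reports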
diff --git a/lib/kokkos/containers/src/Kokkos_SegmentedView.hpp b/lib/kokkos/containers/src/Kokkos_SegmentedView.hpp
deleted file mode 100644
index 5dd7a98b8..000000000
--- a/lib/kokkos/containers/src/Kokkos_SegmentedView.hpp
+++ /dev/null
@@ -1,531 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_SEGMENTED_VIEW_HPP_
-#define KOKKOS_SEGMENTED_VIEW_HPP_
-
-#include <Kokkos_Core.hpp>
-#include <impl/Kokkos_Error.hpp>
-#include <cstdio>
-
-#if ! KOKKOS_USING_EXP_VIEW
-
-namespace Kokkos {
-namespace Experimental {
-
-namespace Impl {
-
-template<class DataType, class Arg1Type, class Arg2Type, class Arg3Type>
-struct delete_segmented_view;
-
-template<class MemorySpace>
-inline
-void DeviceSetAllocatableMemorySize(size_t) {}
-
-#if defined( KOKKOS_HAVE_CUDA )
-
-template<>
-inline
-void DeviceSetAllocatableMemorySize<Kokkos::CudaSpace>(size_t size) {
-#ifdef __CUDACC__
- size_t size_limit;
- cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
- if(size_limit<size)
- cudaDeviceSetLimit(cudaLimitMallocHeapSize,2*size);
- cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
-#endif
-}
-
-template<>
-inline
-void DeviceSetAllocatableMemorySize<Kokkos::CudaUVMSpace>(size_t size) {
-#ifdef __CUDACC__
- size_t size_limit;
- cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
- if(size_limit<size)
- cudaDeviceSetLimit(cudaLimitMallocHeapSize,2*size);
- cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
-#endif
-}
-
-#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
-
-}
-
-template< class DataType ,
- class Arg1Type = void ,
- class Arg2Type = void ,
- class Arg3Type = void>
-class SegmentedView : public Kokkos::ViewTraits< DataType , Arg1Type , Arg2Type, Arg3Type >
-{
-public:
- //! \name Typedefs for device types and various Kokkos::View specializations.
- //@{
- typedef Kokkos::ViewTraits< DataType , Arg1Type , Arg2Type, Arg3Type > traits ;
-
- //! The type of a Kokkos::View on the device.
- typedef Kokkos::View< typename traits::data_type ,
- typename traits::array_layout ,
- typename traits::memory_space ,
- Kokkos::MemoryUnmanaged > t_dev ;
-
-
-private:
- Kokkos::View<t_dev*,typename traits::memory_space> segments_;
-
- Kokkos::View<int,typename traits::memory_space> realloc_lock;
- Kokkos::View<int,typename traits::memory_space> nsegments_;
-
- size_t segment_length_;
- size_t segment_length_m1_;
- int max_segments_;
-
- int segment_length_log2;
-
- // Dimensions, cardinality, capacity, and offset computation for
- // multidimensional array view of contiguous memory.
- // Inherits from Impl::Shape
- typedef Kokkos::Impl::ViewOffset< typename traits::shape_type
- , typename traits::array_layout
- > offset_map_type ;
-
- offset_map_type m_offset_map ;
-
- typedef Kokkos::View< typename traits::array_intrinsic_type ,
- typename traits::array_layout ,
- typename traits::memory_space ,
- typename traits::memory_traits > array_type ;
-
- typedef Kokkos::View< typename traits::const_data_type ,
- typename traits::array_layout ,
- typename traits::memory_space ,
- typename traits::memory_traits > const_type ;
-
- typedef Kokkos::View< typename traits::non_const_data_type ,
- typename traits::array_layout ,
- typename traits::memory_space ,
- typename traits::memory_traits > non_const_type ;
-
- typedef Kokkos::View< typename traits::non_const_data_type ,
- typename traits::array_layout ,
- HostSpace ,
- void > HostMirror ;
-
- template< bool Accessible >
- KOKKOS_INLINE_FUNCTION
- typename Kokkos::Impl::enable_if< Accessible , typename traits::size_type >::type
- dimension_0_intern() const { return nsegments_() * segment_length_ ; }
-
- template< bool Accessible >
- KOKKOS_INLINE_FUNCTION
- typename Kokkos::Impl::enable_if< ! Accessible , typename traits::size_type >::type
- dimension_0_intern() const
- {
- // In Host space
- int n = 0 ;
-#if ! defined( __CUDA_ARCH__ )
- Kokkos::Impl::DeepCopy< HostSpace , typename traits::memory_space >( & n , nsegments_.ptr_on_device() , sizeof(int) );
-#endif
-
- return n * segment_length_ ;
- }
-
-public:
-
- enum { Rank = traits::rank };
-
- KOKKOS_INLINE_FUNCTION offset_map_type shape() const { return m_offset_map ; }
-
- /* \brief return (current) size of dimension 0 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_0() const {
- enum { Accessible = Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
- Kokkos::Impl::ActiveExecutionMemorySpace, typename traits::memory_space >::value };
- int n = SegmentedView::dimension_0_intern< Accessible >();
- return n ;
- }
-
- /* \brief return size of dimension 1 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_1() const { return m_offset_map.N1 ; }
- /* \brief return size of dimension 2 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_2() const { return m_offset_map.N2 ; }
- /* \brief return size of dimension 3 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_3() const { return m_offset_map.N3 ; }
- /* \brief return size of dimension 4 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_4() const { return m_offset_map.N4 ; }
- /* \brief return size of dimension 5 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_5() const { return m_offset_map.N5 ; }
- /* \brief return size of dimension 6 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_6() const { return m_offset_map.N6 ; }
- /* \brief return size of dimension 7 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_7() const { return m_offset_map.N7 ; }
-
- /* \brief return size of dimension 2 */
- KOKKOS_INLINE_FUNCTION typename traits::size_type size() const {
- return dimension_0() *
- m_offset_map.N1 * m_offset_map.N2 * m_offset_map.N3 * m_offset_map.N4 *
- m_offset_map.N5 * m_offset_map.N6 * m_offset_map.N7 ;
- }
-
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- typename traits::size_type dimension( const iType & i ) const {
- if(i==0)
- return dimension_0();
- else
- return Kokkos::Impl::dimension( m_offset_map , i );
- }
-
- KOKKOS_INLINE_FUNCTION
- typename traits::size_type capacity() {
- return segments_.dimension_0() *
- m_offset_map.N1 * m_offset_map.N2 * m_offset_map.N3 * m_offset_map.N4 *
- m_offset_map.N5 * m_offset_map.N6 * m_offset_map.N7;
- }
-
- KOKKOS_INLINE_FUNCTION
- typename traits::size_type get_num_segments() {
- enum { Accessible = Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
- Kokkos::Impl::ActiveExecutionMemorySpace, typename traits::memory_space >::value };
- int n = SegmentedView::dimension_0_intern< Accessible >();
- return n/segment_length_ ;
- }
-
- KOKKOS_INLINE_FUNCTION
- typename traits::size_type get_max_segments() {
- return max_segments_;
- }
-
- /// \brief Constructor that allocates View objects with an initial length of 0.
- ///
- /// This constructor works mostly like the analogous constructor of View.
- /// The first argument is a string label, which is entirely for your
- /// benefit. (Different SegmentedView objects may have the same label if
- /// you like.) The second argument 'view_length' is the size of the segments.
- /// This number must be a power of two. The third argument n0 is the maximum
- /// value for the first dimension of the segmented view. The maximal allocatable
- /// number of Segments is thus: (n0+view_length-1)/view_length.
- /// The arguments that follow are the other dimensions of the (1-7) of the
- /// View objects. For example, for a View with 3 runtime dimensions,
- /// the first 4 integer arguments will be nonzero:
- /// SegmentedView("Name",32768,10000000,8,4). This allocates a SegmentedView
- /// with a maximum of 306 segments of dimension (32768,8,4). The logical size of
- /// the segmented view is (n,8,4) with n between 0 and 10000000.
- /// You may omit the integer arguments that follow.
- template< class LabelType >
- SegmentedView(const LabelType & label ,
- const size_t view_length ,
- const size_t n0 ,
- const size_t n1 = 0 ,
- const size_t n2 = 0 ,
- const size_t n3 = 0 ,
- const size_t n4 = 0 ,
- const size_t n5 = 0 ,
- const size_t n6 = 0 ,
- const size_t n7 = 0
- ): segment_length_(view_length),segment_length_m1_(view_length-1)
- {
- segment_length_log2 = -1;
- size_t l = segment_length_;
- while(l>0) {
- l>>=1;
- segment_length_log2++;
- }
- l = 1<<segment_length_log2;
- if(l!=segment_length_)
- Kokkos::Impl::throw_runtime_exception("Kokkos::SegmentedView requires a 'power of 2' segment length");
-
- max_segments_ = (n0+segment_length_m1_)/segment_length_;
-
- Impl::DeviceSetAllocatableMemorySize<typename traits::memory_space>(segment_length_*max_segments_*sizeof(typename traits::value_type));
-
- segments_ = Kokkos::View<t_dev*,typename traits::execution_space>(label , max_segments_);
- realloc_lock = Kokkos::View<int,typename traits::execution_space>("Lock");
- nsegments_ = Kokkos::View<int,typename traits::execution_space>("nviews");
- m_offset_map.assign( n0, n1, n2, n3, n4, n5, n6, n7, n0*n1*n2*n3*n4*n5*n6*n7 );
-
- }
-
- KOKKOS_INLINE_FUNCTION
- SegmentedView(const SegmentedView& src):
- segments_(src.segments_),
- realloc_lock (src.realloc_lock),
- nsegments_ (src.nsegments_),
- segment_length_(src.segment_length_),
- segment_length_m1_(src.segment_length_m1_),
- max_segments_ (src.max_segments_),
- segment_length_log2(src.segment_length_log2),
- m_offset_map (src.m_offset_map)
- {}
-
- KOKKOS_INLINE_FUNCTION
- SegmentedView& operator= (const SegmentedView& src) {
- segments_ = src.segments_;
- realloc_lock = src.realloc_lock;
- nsegments_ = src.nsegments_;
- segment_length_= src.segment_length_;
- segment_length_m1_= src.segment_length_m1_;
- max_segments_ = src.max_segments_;
- segment_length_log2= src.segment_length_log2;
- m_offset_map = src.m_offset_map;
- return *this;
- }
-
- ~SegmentedView() {
- if ( !segments_.tracker().ref_counting()) { return; }
- size_t ref_count = segments_.tracker().ref_count();
- if(ref_count == 1u) {
- Kokkos::fence();
- typename Kokkos::View<int,typename traits::execution_space>::HostMirror h_nviews("h_nviews");
- Kokkos::deep_copy(h_nviews,nsegments_);
- Kokkos::parallel_for(h_nviews(),Impl::delete_segmented_view<DataType , Arg1Type , Arg2Type, Arg3Type>(*this));
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- t_dev get_segment(const int& i) const {
- return segments_[i];
- }
-
- template< class MemberType>
- KOKKOS_INLINE_FUNCTION
- void grow (MemberType& team_member, const size_t& growSize) const {
- if (growSize>max_segments_*segment_length_) {
- printf ("Exceeding maxSize: %lu %lu\n", growSize, max_segments_*segment_length_);
- return;
- }
-
- if(team_member.team_rank()==0) {
- bool too_small = growSize > segment_length_ * nsegments_();
- if (too_small) {
- while(Kokkos::atomic_compare_exchange(&realloc_lock(),0,1) )
- ; // get the lock
- too_small = growSize > segment_length_ * nsegments_(); // Recheck once we have the lock
- if(too_small) {
- while(too_small) {
- const size_t alloc_size = segment_length_*m_offset_map.N1*m_offset_map.N2*m_offset_map.N3*
- m_offset_map.N4*m_offset_map.N5*m_offset_map.N6*m_offset_map.N7;
- typename traits::non_const_value_type* const ptr = new typename traits::non_const_value_type[alloc_size];
-
- segments_(nsegments_()) =
- t_dev(ptr,segment_length_,m_offset_map.N1,m_offset_map.N2,m_offset_map.N3,m_offset_map.N4,m_offset_map.N5,m_offset_map.N6,m_offset_map.N7);
- nsegments_()++;
- too_small = growSize > segment_length_ * nsegments_();
- }
- }
- realloc_lock() = 0; //release the lock
- }
- }
- team_member.team_barrier();
- }
-
- KOKKOS_INLINE_FUNCTION
- void grow_non_thread_safe (const size_t& growSize) const {
- if (growSize>max_segments_*segment_length_) {
- printf ("Exceeding maxSize: %lu %lu\n", growSize, max_segments_*segment_length_);
- return;
- }
- bool too_small = growSize > segment_length_ * nsegments_();
- if(too_small) {
- while(too_small) {
- const size_t alloc_size = segment_length_*m_offset_map.N1*m_offset_map.N2*m_offset_map.N3*
- m_offset_map.N4*m_offset_map.N5*m_offset_map.N6*m_offset_map.N7;
- typename traits::non_const_value_type* const ptr =
- new typename traits::non_const_value_type[alloc_size];
-
- segments_(nsegments_()) =
- t_dev (ptr, segment_length_, m_offset_map.N1, m_offset_map.N2,
- m_offset_map.N3, m_offset_map.N4, m_offset_map.N5,
- m_offset_map.N6, m_offset_map.N7);
- nsegments_()++;
- too_small = growSize > segment_length_ * nsegments_();
- }
- }
- }
-
- template< typename iType0 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value && traits::rank == 1 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_));
- }
-
- template< typename iType0 , typename iType1 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- traits::rank == 2 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1);
- }
-
- template< typename iType0 , typename iType1 , typename iType2 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- std::is_integral<iType2>::value &&
- traits::rank == 3 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2);
- }
-
- template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- std::is_integral<iType2>::value &&
- std::is_integral<iType3>::value &&
- traits::rank == 4 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3);
- }
-
- template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
- typename iType4 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- std::is_integral<iType2>::value &&
- std::is_integral<iType3>::value &&
- std::is_integral<iType4>::value &&
- traits::rank == 5 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
- const iType4 & i4 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4);
- }
-
- template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
- typename iType4 , typename iType5 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- std::is_integral<iType2>::value &&
- std::is_integral<iType3>::value &&
- std::is_integral<iType4>::value &&
- std::is_integral<iType5>::value &&
- traits::rank == 6 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
- const iType4 & i4 , const iType5 & i5 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4,i5);
- }
-
- template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
- typename iType4 , typename iType5 , typename iType6 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- std::is_integral<iType2>::value &&
- std::is_integral<iType3>::value &&
- std::is_integral<iType4>::value &&
- std::is_integral<iType5>::value &&
- std::is_integral<iType6>::value &&
- traits::rank == 7 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
- const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4,i5,i6);
- }
-
- template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
- typename iType4 , typename iType5 , typename iType6 , typename iType7 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<iType0>::value &&
- std::is_integral<iType1>::value &&
- std::is_integral<iType2>::value &&
- std::is_integral<iType3>::value &&
- std::is_integral<iType4>::value &&
- std::is_integral<iType5>::value &&
- std::is_integral<iType6>::value &&
- std::is_integral<iType7>::value &&
- traits::rank == 8 )
- , typename traits::value_type &
- >::type
- operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
- const iType4 & i4 , const iType5 & i5 , const iType6 & i6 , const iType7 & i7 ) const
- {
- return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4,i5,i6,i7);
- }
-};
-
-namespace Impl {
-template<class DataType, class Arg1Type, class Arg2Type, class Arg3Type>
-struct delete_segmented_view {
- typedef SegmentedView<DataType , Arg1Type , Arg2Type, Arg3Type> view_type;
- typedef typename view_type::execution_space execution_space;
-
- view_type view_;
- delete_segmented_view(view_type view):view_(view) {
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator() (int i) const {
- delete [] view_.get_segment(i).ptr_on_device();
- }
-};
-
-}
-}
-}
-
-#endif
-
-#endif
diff --git a/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp b/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
index 7a916c6ef..8646d2779 100644
--- a/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
+++ b/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
@@ -1,848 +1,848 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_UnorderedMap.hpp
/// \brief Declaration and definition of Kokkos::UnorderedMap.
///
/// This header file declares and defines Kokkos::UnorderedMap and its
/// related nonmember functions.
#ifndef KOKKOS_UNORDERED_MAP_HPP
#define KOKKOS_UNORDERED_MAP_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Functional.hpp>
#include <Kokkos_Bitset.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_UnorderedMap_impl.hpp>
#include <iostream>
#include <stdint.h>
#include <stdexcept>
namespace Kokkos {
enum { UnorderedMapInvalidIndex = ~0u };
/// \brief First element of the return value of UnorderedMap::insert().
///
/// Inserting an element into an UnorderedMap is not guaranteed to
/// succeed. There are three possible conditions:
/// <ol>
/// <li> <tt>INSERT_FAILED</tt>: The insert failed. This usually
/// means that the UnorderedMap ran out of space. </li>
/// <li> <tt>INSERT_SUCCESS</tt>: The insert succeeded, and the key
/// did <i>not</i> exist in the table before. </li>
/// <li> <tt>INSERT_EXISTING</tt>: The insert succeeded, and the key
/// <i>did</i> exist in the table before. The new value was
/// ignored and the old value was left in place. </li>
/// </ol>
class UnorderedMapInsertResult
{
private:
enum Status{
SUCCESS = 1u << 31
, EXISTING = 1u << 30
, FREED_EXISTING = 1u << 29
, LIST_LENGTH_MASK = ~(SUCCESS | EXISTING | FREED_EXISTING)
};
public:
/// Did the map successfully insert the key/value pair
KOKKOS_FORCEINLINE_FUNCTION
bool success() const { return (m_status & SUCCESS); }
/// Was the key already present in the map
KOKKOS_FORCEINLINE_FUNCTION
bool existing() const { return (m_status & EXISTING); }
/// Did the map fail to insert the key due to insufficient capacity
KOKKOS_FORCEINLINE_FUNCTION
bool failed() const { return m_index == UnorderedMapInvalidIndex; }
/// Did the map lose a race to insert a duplicate key/value pair
/// where an index was claimed that then had to be released
KOKKOS_FORCEINLINE_FUNCTION
bool freed_existing() const { return (m_status & FREED_EXISTING); }
/// How many iterations through the insert loop did it take before the
/// map returned
KOKKOS_FORCEINLINE_FUNCTION
uint32_t list_position() const { return (m_status & LIST_LENGTH_MASK); }
/// Index where the key can be found as long as the insert did not fail
KOKKOS_FORCEINLINE_FUNCTION
uint32_t index() const { return m_index; }
KOKKOS_FORCEINLINE_FUNCTION
UnorderedMapInsertResult()
: m_index(UnorderedMapInvalidIndex)
, m_status(0)
{}
KOKKOS_FORCEINLINE_FUNCTION
void increment_list_position()
{
m_status += (list_position() < LIST_LENGTH_MASK) ? 1u : 0u;
}
KOKKOS_FORCEINLINE_FUNCTION
void set_existing(uint32_t i, bool arg_freed_existing)
{
m_index = i;
m_status = EXISTING | (arg_freed_existing ? FREED_EXISTING : 0u) | list_position();
}
KOKKOS_FORCEINLINE_FUNCTION
void set_success(uint32_t i)
{
m_index = i;
m_status = SUCCESS | list_position();
}
private:
uint32_t m_index;
uint32_t m_status;
};
/// \class UnorderedMap
/// \brief Thread-safe, performance-portable lookup table.
///
/// This class provides a lookup table. In terms of functionality,
/// this class compares to std::unordered_map (new in C++11).
/// "Unordered" means that keys are not stored in any particular
/// order, unlike (for example) std::map. "Thread-safe" means that
/// lookups, insertion, and deletion are safe to call by multiple
/// threads in parallel. "Performance-portable" means that parallel
/// performance of these operations is reasonable, on multiple
/// hardware platforms. Platforms on which performance has been
/// tested include conventional Intel x86 multicore processors, Intel
/// Xeon Phi ("MIC"), and NVIDIA GPUs.
///
/// Parallel performance portability entails design decisions that
/// might differ from one's expectation for a sequential interface.
/// This particularly affects insertion of single elements. In an
/// interface intended for sequential use, insertion might reallocate
/// memory if the original allocation did not suffice to hold the new
/// element. In this class, insertion does <i>not</i> reallocate
/// memory. This means that it might fail. insert() returns an enum
/// which indicates whether the insert failed. There are three
/// possible conditions:
/// <ol>
/// <li> <tt>INSERT_FAILED</tt>: The insert failed. This usually
/// means that the UnorderedMap ran out of space. </li>
/// <li> <tt>INSERT_SUCCESS</tt>: The insert succeeded, and the key
/// did <i>not</i> exist in the table before. </li>
/// <li> <tt>INSERT_EXISTING</tt>: The insert succeeded, and the key
/// <i>did</i> exist in the table before. The new value was
/// ignored and the old value was left in place. </li>
/// </ol>
///
/// \tparam Key Type of keys of the lookup table. If \c const, users
/// are not allowed to add or remove keys, though they are allowed
/// to change values. In that case, the implementation may make
/// optimizations specific to the <tt>Device</tt>. For example, if
/// <tt>Device</tt> is \c Cuda, it may use texture fetches to access
/// keys.
///
/// \tparam Value Type of values stored in the lookup table. You may use
/// \c void here, in which case the table will be a set of keys. If
/// \c const, users are not allowed to change entries.
/// In that case, the implementation may make
/// optimizations specific to the \c Device, such as using texture
/// fetches to access values.
///
/// \tparam Device The Kokkos Device type.
///
/// \tparam Hasher Definition of the hash function for instances of
/// <tt>Key</tt>. The default will calculate a bitwise hash.
///
/// \tparam EqualTo Definition of the equality function for instances of
/// <tt>Key</tt>. The default will do a bitwise equality comparison.
///
template < typename Key
, typename Value
, typename Device = Kokkos::DefaultExecutionSpace
, typename Hasher = pod_hash<typename Impl::remove_const<Key>::type>
, typename EqualTo = pod_equal_to<typename Impl::remove_const<Key>::type>
>
class UnorderedMap
{
private:
typedef typename ViewTraits<Key,Device,void,void>::host_mirror_space host_mirror_space ;
public:
//! \name Public types and constants
//@{
//key_types
typedef Key declared_key_type;
typedef typename Impl::remove_const<declared_key_type>::type key_type;
typedef typename Impl::add_const<key_type>::type const_key_type;
//value_types
typedef Value declared_value_type;
typedef typename Impl::remove_const<declared_value_type>::type value_type;
typedef typename Impl::add_const<value_type>::type const_value_type;
typedef Device execution_space;
typedef Hasher hasher_type;
typedef EqualTo equal_to_type;
typedef uint32_t size_type;
//map_types
typedef UnorderedMap<declared_key_type,declared_value_type,execution_space,hasher_type,equal_to_type> declared_map_type;
typedef UnorderedMap<key_type,value_type,execution_space,hasher_type,equal_to_type> insertable_map_type;
typedef UnorderedMap<const_key_type,value_type,execution_space,hasher_type,equal_to_type> modifiable_map_type;
typedef UnorderedMap<const_key_type,const_value_type,execution_space,hasher_type,equal_to_type> const_map_type;
- static const bool is_set = Impl::is_same<void,value_type>::value;
- static const bool has_const_key = Impl::is_same<const_key_type,declared_key_type>::value;
- static const bool has_const_value = is_set || Impl::is_same<const_value_type,declared_value_type>::value;
+ static const bool is_set = std::is_same<void,value_type>::value;
+ static const bool has_const_key = std::is_same<const_key_type,declared_key_type>::value;
+ static const bool has_const_value = is_set || std::is_same<const_value_type,declared_value_type>::value;
static const bool is_insertable_map = !has_const_key && (is_set || !has_const_value);
static const bool is_modifiable_map = has_const_key && !has_const_value;
static const bool is_const_map = has_const_key && has_const_value;
typedef UnorderedMapInsertResult insert_result;
typedef UnorderedMap<Key,Value,host_mirror_space,Hasher,EqualTo> HostMirror;
typedef Impl::UnorderedMapHistogram<const_map_type> histogram_type;
//@}
private:
enum { invalid_index = ~static_cast<size_type>(0) };
typedef typename Impl::if_c< is_set, int, declared_value_type>::type impl_value_type;
typedef typename Impl::if_c< is_insertable_map
, View< key_type *, execution_space>
, View< const key_type *, execution_space, MemoryTraits<RandomAccess> >
>::type key_type_view;
typedef typename Impl::if_c< is_insertable_map || is_modifiable_map
, View< impl_value_type *, execution_space>
, View< const impl_value_type *, execution_space, MemoryTraits<RandomAccess> >
>::type value_type_view;
typedef typename Impl::if_c< is_insertable_map
, View< size_type *, execution_space>
, View< const size_type *, execution_space, MemoryTraits<RandomAccess> >
>::type size_type_view;
typedef typename Impl::if_c< is_insertable_map
, Bitset< execution_space >
, ConstBitset< execution_space>
>::type bitset_type;
enum { modified_idx = 0, erasable_idx = 1, failed_insert_idx = 2 };
enum { num_scalars = 3 };
typedef View< int[num_scalars], LayoutLeft, execution_space> scalars_view;
public:
//! \name Public member functions
//@{
UnorderedMap()
: m_bounded_insert()
, m_hasher()
, m_equal_to()
, m_size()
, m_available_indexes()
, m_hash_lists()
, m_next_index()
, m_keys()
, m_values()
, m_scalars()
{}
/// \brief Constructor
///
/// \param capacity_hint [in] Initial guess of how many unique keys will be inserted into the map
/// \param hash [in] Hasher function for \c Key instances. The
/// default value usually suffices.
UnorderedMap( size_type capacity_hint, hasher_type hasher = hasher_type(), equal_to_type equal_to = equal_to_type() )
: m_bounded_insert(true)
, m_hasher(hasher)
, m_equal_to(equal_to)
, m_size()
, m_available_indexes(calculate_capacity(capacity_hint))
, m_hash_lists(ViewAllocateWithoutInitializing("UnorderedMap hash list"), Impl::find_hash_size(capacity()))
, m_next_index(ViewAllocateWithoutInitializing("UnorderedMap next index"), capacity()+1) // +1 so that the *_at functions can always return a valid reference
, m_keys("UnorderedMap keys",capacity()+1)
, m_values("UnorderedMap values",(is_set? 1 : capacity()+1))
, m_scalars("UnorderedMap scalars")
{
if (!is_insertable_map) {
throw std::runtime_error("Cannot construct a non-insertable (i.e. const key_type) unordered_map");
}
Kokkos::deep_copy(m_hash_lists, invalid_index);
Kokkos::deep_copy(m_next_index, invalid_index);
}
void reset_failed_insert_flag()
{
reset_flag(failed_insert_idx);
}
histogram_type get_histogram()
{
return histogram_type(*this);
}
//! Clear all entries in the table.
void clear()
{
m_bounded_insert = true;
if (capacity() == 0) return;
m_available_indexes.clear();
Kokkos::deep_copy(m_hash_lists, invalid_index);
Kokkos::deep_copy(m_next_index, invalid_index);
{
const key_type tmp = key_type();
Kokkos::deep_copy(m_keys,tmp);
}
if (is_set){
const impl_value_type tmp = impl_value_type();
Kokkos::deep_copy(m_values,tmp);
}
{
Kokkos::deep_copy(m_scalars, 0);
}
}
/// \brief Change the capacity of the map
///
/// If there are no failed inserts, the current size of the map is
/// used as a lower bound for the requested capacity.
/// If the map is not empty, has no failed inserts, and the capacity
/// changes, then the current data is copied into the resized /
/// rehashed map.
///
/// This is <i>not</i> a device function; it may <i>not</i> be
/// called in a parallel kernel.
bool rehash(size_type requested_capacity = 0)
{
const bool bounded_insert = (capacity() == 0) || (size() == 0u);
return rehash(requested_capacity, bounded_insert );
}
bool rehash(size_type requested_capacity, bool bounded_insert)
{
if(!is_insertable_map) return false;
const size_type curr_size = size();
requested_capacity = (requested_capacity < curr_size) ? curr_size : requested_capacity;
insertable_map_type tmp(requested_capacity, m_hasher, m_equal_to);
if (curr_size) {
tmp.m_bounded_insert = false;
Impl::UnorderedMapRehash<insertable_map_type> f(tmp,*this);
f.apply();
}
tmp.m_bounded_insert = bounded_insert;
*this = tmp;
return true;
}
/// \brief The number of entries in the table.
///
/// This method has undefined behavior when erasable() is true.
///
/// Note that this is not a device function; it cannot be called in
/// a parallel kernel. The value is not stored as a variable; it
/// must be computed.
size_type size() const
{
if( capacity() == 0u ) return 0u;
if (modified()) {
m_size = m_available_indexes.count();
reset_flag(modified_idx);
}
return m_size;
}
/// \brief Whether any insert() calls have failed.
///
/// This is <i>not</i> a device function; it may <i>not</i> be
/// called in a parallel kernel. The value is not stored as a
/// variable; it must be computed.
bool failed_insert() const
{
return get_flag(failed_insert_idx);
}
bool erasable() const
{
return is_insertable_map ? get_flag(erasable_idx) : false;
}
bool begin_erase()
{
bool result = !erasable();
if (is_insertable_map && result) {
execution_space::fence();
set_flag(erasable_idx);
execution_space::fence();
}
return result;
}
bool end_erase()
{
bool result = erasable();
if (is_insertable_map && result) {
execution_space::fence();
Impl::UnorderedMapErase<declared_map_type> f(*this);
f.apply();
execution_space::fence();
reset_flag(erasable_idx);
}
return result;
}
/// \brief The maximum number of entries that the table can hold.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_FORCEINLINE_FUNCTION
size_type capacity() const
{ return m_available_indexes.size(); }
/// \brief The number of hash table "buckets."
///
/// This is different from the number of entries that the table can
/// hold. Each key hashes to an index in [0, hash_capacity() - 1].
/// That index can hold zero or more entries. This class decides
/// what hash_capacity() should be, given the user's upper bound on
/// the number of entries the table must be able to hold.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
size_type hash_capacity() const
{ return m_hash_lists.dimension_0(); }
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel. As discussed in the class documentation, it need not
/// succeed. The return value tells you if it did.
///
/// \param k [in] The key to attempt to insert.
/// \param v [in] The corresponding value to attempt to insert. If
/// using this class as a set (with Value = void), then you need not
/// provide this value.
KOKKOS_INLINE_FUNCTION
insert_result insert(key_type const& k, impl_value_type const&v = impl_value_type()) const
{
insert_result result;
if ( !is_insertable_map || capacity() == 0u || m_scalars((int)erasable_idx) ) {
return result;
}
if ( !m_scalars((int)modified_idx) ) {
m_scalars((int)modified_idx) = true;
}
int volatile & failed_insert_ref = m_scalars((int)failed_insert_idx) ;
const size_type hash_value = m_hasher(k);
const size_type hash_list = hash_value % m_hash_lists.dimension_0();
size_type * curr_ptr = & m_hash_lists[ hash_list ];
size_type new_index = invalid_index ;
// Route the multiply through double to avoid 32-bit integer overflow
size_type index_hint = static_cast<size_type>( (static_cast<double>(hash_list) * capacity()) / m_hash_lists.dimension_0());
size_type find_attempts = 0;
enum { bounded_find_attempts = 32u };
const size_type max_attempts = (m_bounded_insert && (bounded_find_attempts < m_available_indexes.max_hint()) ) ?
bounded_find_attempts :
m_available_indexes.max_hint();
bool not_done = true ;
#if defined( __MIC__ )
#pragma noprefetch
#endif
while ( not_done ) {
// Continue searching the unordered list for this key,
// list will only be appended during insert phase.
// Need volatile_load as other threads may be appending.
size_type curr = volatile_load(curr_ptr);
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
#if defined( __MIC__ )
#pragma noprefetch
#endif
while ( curr != invalid_index && ! m_equal_to( volatile_load(&m_keys[curr]), k) ) {
result.increment_list_position();
index_hint = curr;
curr_ptr = &m_next_index[curr];
curr = volatile_load(curr_ptr);
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
}
//------------------------------------------------------------
// If key already present then return that index.
if ( curr != invalid_index ) {
const bool free_existing = new_index != invalid_index;
if ( free_existing ) {
// Previously claimed an unused entry that was not inserted.
// Release this unused entry immediately.
if (!m_available_indexes.reset(new_index) ) {
printf("Unable to free existing\n");
}
}
result.set_existing(curr, free_existing);
not_done = false ;
}
//------------------------------------------------------------
// Key is not currently in the map.
// If the thread has claimed an entry try to insert now.
else {
//------------------------------------------------------------
// If have not already claimed an unused entry then do so now.
if (new_index == invalid_index) {
bool found = false;
// use the hash_list as the flag for the search direction
Kokkos::tie(found, index_hint) = m_available_indexes.find_any_unset_near( index_hint, hash_list );
// found an index and this thread set it
if ( !found && ++find_attempts >= max_attempts ) {
failed_insert_ref = true;
not_done = false ;
}
else if (m_available_indexes.set(index_hint) ) {
new_index = index_hint;
// Set key and value
KOKKOS_NONTEMPORAL_PREFETCH_STORE(&m_keys[new_index]);
m_keys[new_index] = k ;
if (!is_set) {
KOKKOS_NONTEMPORAL_PREFETCH_STORE(&m_values[new_index]);
m_values[new_index] = v ;
}
// Do not proceed until key and value are updated in global memory
memory_fence();
}
}
else if (failed_insert_ref) {
not_done = false;
}
// Attempt to append claimed entry into the list.
// Another thread may also be trying to append the same list so protect with atomic.
if ( new_index != invalid_index &&
curr == atomic_compare_exchange(curr_ptr, static_cast<size_type>(invalid_index), new_index) ) {
// Succeeded in appending
result.set_success(new_index);
not_done = false ;
}
}
} // while ( not_done )
return result ;
}
KOKKOS_INLINE_FUNCTION
bool erase(key_type const& k) const
{
bool result = false;
if(is_insertable_map && 0u < capacity() && m_scalars((int)erasable_idx)) {
if ( ! m_scalars((int)modified_idx) ) {
m_scalars((int)modified_idx) = true;
}
size_type index = find(k);
if (valid_at(index)) {
m_available_indexes.reset(index);
result = true;
}
}
return result;
}
/// \brief Find the given key \c k, if it exists in the table.
///
/// \return If the key exists in the table, the index of the
/// value corresponding to that key; otherwise, an invalid index.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
size_type find( const key_type & k) const
{
size_type curr = 0u < capacity() ? m_hash_lists( m_hasher(k) % m_hash_lists.dimension_0() ) : invalid_index ;
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
while (curr != invalid_index && !m_equal_to( m_keys[curr], k) ) {
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
curr = m_next_index[curr];
}
return curr;
}
/// \brief Does the key exist in the map
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
bool exists( const key_type & k) const
{
return valid_at(find(k));
}
/// \brief Get the value with \c i as its direct index.
///
/// \param i [in] Index directly into the array of entries.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
///
/// 'const value_type' via Cuda texture fetch must return by value.
KOKKOS_FORCEINLINE_FUNCTION
typename Impl::if_c< (is_set || has_const_value), impl_value_type, impl_value_type &>::type
value_at(size_type i) const
{
return m_values[ is_set ? 0 : (i < capacity() ? i : capacity()) ];
}
/// \brief Get the key with \c i as its direct index.
///
/// \param i [in] Index directly into the array of entries.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_FORCEINLINE_FUNCTION
key_type key_at(size_type i) const
{
return m_keys[ i < capacity() ? i : capacity() ];
}
KOKKOS_FORCEINLINE_FUNCTION
bool valid_at(size_type i) const
{
return m_available_indexes.test(i);
}
template <typename SKey, typename SValue>
UnorderedMap( UnorderedMap<SKey,SValue,Device,Hasher,EqualTo> const& src,
typename Impl::enable_if< Impl::UnorderedMapCanAssign<declared_key_type,declared_value_type,SKey,SValue>::value,int>::type = 0
)
: m_bounded_insert(src.m_bounded_insert)
, m_hasher(src.m_hasher)
, m_equal_to(src.m_equal_to)
, m_size(src.m_size)
, m_available_indexes(src.m_available_indexes)
, m_hash_lists(src.m_hash_lists)
, m_next_index(src.m_next_index)
, m_keys(src.m_keys)
, m_values(src.m_values)
, m_scalars(src.m_scalars)
{}
template <typename SKey, typename SValue>
typename Impl::enable_if< Impl::UnorderedMapCanAssign<declared_key_type,declared_value_type,SKey,SValue>::value
,declared_map_type & >::type
operator=( UnorderedMap<SKey,SValue,Device,Hasher,EqualTo> const& src)
{
m_bounded_insert = src.m_bounded_insert;
m_hasher = src.m_hasher;
m_equal_to = src.m_equal_to;
m_size = src.m_size;
m_available_indexes = src.m_available_indexes;
m_hash_lists = src.m_hash_lists;
m_next_index = src.m_next_index;
m_keys = src.m_keys;
m_values = src.m_values;
m_scalars = src.m_scalars;
return *this;
}
template <typename SKey, typename SValue, typename SDevice>
- typename Impl::enable_if< Impl::is_same< typename Impl::remove_const<SKey>::type, key_type>::value &&
- Impl::is_same< typename Impl::remove_const<SValue>::type, value_type>::value
+ typename Impl::enable_if< std::is_same< typename Impl::remove_const<SKey>::type, key_type>::value &&
+ std::is_same< typename Impl::remove_const<SValue>::type, value_type>::value
>::type
create_copy_view( UnorderedMap<SKey, SValue, SDevice, Hasher,EqualTo> const& src)
{
if (m_hash_lists.ptr_on_device() != src.m_hash_lists.ptr_on_device()) {
insertable_map_type tmp;
tmp.m_bounded_insert = src.m_bounded_insert;
tmp.m_hasher = src.m_hasher;
tmp.m_equal_to = src.m_equal_to;
tmp.m_size = src.size();
tmp.m_available_indexes = bitset_type( src.capacity() );
tmp.m_hash_lists = size_type_view( ViewAllocateWithoutInitializing("UnorderedMap hash list"), src.m_hash_lists.dimension_0() );
tmp.m_next_index = size_type_view( ViewAllocateWithoutInitializing("UnorderedMap next index"), src.m_next_index.dimension_0() );
tmp.m_keys = key_type_view( ViewAllocateWithoutInitializing("UnorderedMap keys"), src.m_keys.dimension_0() );
tmp.m_values = value_type_view( ViewAllocateWithoutInitializing("UnorderedMap values"), src.m_values.dimension_0() );
tmp.m_scalars = scalars_view("UnorderedMap scalars");
Kokkos::deep_copy(tmp.m_available_indexes, src.m_available_indexes);
typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, typename SDevice::memory_space > raw_deep_copy;
raw_deep_copy(tmp.m_hash_lists.ptr_on_device(), src.m_hash_lists.ptr_on_device(), sizeof(size_type)*src.m_hash_lists.dimension_0());
raw_deep_copy(tmp.m_next_index.ptr_on_device(), src.m_next_index.ptr_on_device(), sizeof(size_type)*src.m_next_index.dimension_0());
raw_deep_copy(tmp.m_keys.ptr_on_device(), src.m_keys.ptr_on_device(), sizeof(key_type)*src.m_keys.dimension_0());
if (!is_set) {
raw_deep_copy(tmp.m_values.ptr_on_device(), src.m_values.ptr_on_device(), sizeof(impl_value_type)*src.m_values.dimension_0());
}
raw_deep_copy(tmp.m_scalars.ptr_on_device(), src.m_scalars.ptr_on_device(), sizeof(int)*num_scalars );
*this = tmp;
}
}
//@}
private: // private member functions
bool modified() const
{
return get_flag(modified_idx);
}
void set_flag(int flag) const
{
typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, Kokkos::HostSpace > raw_deep_copy;
const int true_ = true;
raw_deep_copy(m_scalars.ptr_on_device() + flag, &true_, sizeof(int));
}
void reset_flag(int flag) const
{
typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, Kokkos::HostSpace > raw_deep_copy;
const int false_ = false;
raw_deep_copy(m_scalars.ptr_on_device() + flag, &false_, sizeof(int));
}
bool get_flag(int flag) const
{
typedef Kokkos::Impl::DeepCopy< Kokkos::HostSpace, typename execution_space::memory_space > raw_deep_copy;
int result = false;
raw_deep_copy(&result, m_scalars.ptr_on_device() + flag, sizeof(int));
return result;
}
static uint32_t calculate_capacity(uint32_t capacity_hint)
{
// increase by ~16% (factor 7/6) and round up to the nearest multiple of 128
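// e.g. capacity_hint = 1000 -> 7*1000/6 = 1166 -> ((1166+127)/128)*128 = 1280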
return capacity_hint ? ((static_cast<uint32_t>(7ull*capacity_hint/6u) + 127u)/128u)*128u : 128u;
}
private: // private members
bool m_bounded_insert;
hasher_type m_hasher;
equal_to_type m_equal_to;
mutable size_type m_size;
bitset_type m_available_indexes;
size_type_view m_hash_lists;
size_type_view m_next_index;
key_type_view m_keys;
value_type_view m_values;
scalars_view m_scalars;
template <typename KKey, typename VValue, typename DDevice, typename HHash, typename EEqualTo>
friend class UnorderedMap;
template <typename UMap>
friend struct Impl::UnorderedMapErase;
template <typename UMap>
friend struct Impl::UnorderedMapHistogram;
template <typename UMap>
friend struct Impl::UnorderedMapPrint;
};
// Specialization of deep_copy for two UnorderedMap objects.
template < typename DKey, typename DT, typename DDevice
, typename SKey, typename ST, typename SDevice
, typename Hasher, typename EqualTo >
inline void deep_copy( UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo> & dst
, const UnorderedMap<SKey, ST, SDevice, Hasher, EqualTo> & src )
{
dst.create_copy_view(src);
}
} // namespace Kokkos
#endif //KOKKOS_UNORDERED_MAP_HPP
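For context, a minimal sketch of the capacity-bounded insert semantics documented above, assuming device-lambda support; `keys`, `values`, and `n` are illustrative placeholders, not part of this patch:

    Kokkos::UnorderedMap<int, double> map(10000);          // capacity hint, rounded up internally

    Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) {
      Kokkos::UnorderedMapInsertResult r = map.insert(keys(i), values(i));
      // r.success(): newly inserted; r.existing(): key already present;
      // r.failed(): out of capacity -- insert never reallocates on the device.
      (void) r;
    });

    if (map.failed_insert()) {                             // host-side check
      map.rehash(2 * map.capacity());                      // grow, then re-run the insert pass
    }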
diff --git a/lib/kokkos/containers/unit_tests/CMakeLists.txt b/lib/kokkos/containers/unit_tests/CMakeLists.txt
index 7fff0f835..b9d860f32 100644
--- a/lib/kokkos/containers/unit_tests/CMakeLists.txt
+++ b/lib/kokkos/containers/unit_tests/CMakeLists.txt
@@ -1,40 +1,40 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
-INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
+INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
SET(SOURCES
UnitTestMain.cpp
TestCuda.cpp
)
SET(LIBRARIES kokkoscore)
IF(Kokkos_ENABLE_Pthread)
LIST( APPEND SOURCES
TestThreads.cpp
)
ENDIF()
IF(Kokkos_ENABLE_Serial)
LIST( APPEND SOURCES
TestSerial.cpp
)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
LIST( APPEND SOURCES
TestOpenMP.cpp
)
ENDIF()
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest
SOURCES ${SOURCES}
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
diff --git a/lib/kokkos/containers/unit_tests/Makefile b/lib/kokkos/containers/unit_tests/Makefile
index 48e3ff61d..c45e2be05 100644
--- a/lib/kokkos/containers/unit_tests/Makefile
+++ b/lib/kokkos/containers/unit_tests/Makefile
@@ -1,92 +1,89 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../TPL/gtest
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests
default: build_all
echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?= -lpthread
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
TARGETS += KokkosContainers_UnitTest_Cuda
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
OBJ_THREADS = TestThreads.o UnitTestMain.o gtest-all.o
TARGETS += KokkosContainers_UnitTest_Threads
TEST_TARGETS += test-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP = TestOpenMP.o UnitTestMain.o gtest-all.o
TARGETS += KokkosContainers_UnitTest_OpenMP
TEST_TARGETS += test-openmp
endif
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
OBJ_SERIAL = TestSerial.o UnitTestMain.o gtest-all.o
TARGETS += KokkosContainers_UnitTest_Serial
TEST_TARGETS += test-serial
endif
KokkosContainers_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Cuda
KokkosContainers_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Threads
KokkosContainers_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_OpenMP
KokkosContainers_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Serial
test-cuda: KokkosContainers_UnitTest_Cuda
./KokkosContainers_UnitTest_Cuda
test-threads: KokkosContainers_UnitTest_Threads
./KokkosContainers_UnitTest_Threads
test-openmp: KokkosContainers_UnitTest_OpenMP
./KokkosContainers_UnitTest_OpenMP
test-serial: KokkosContainers_UnitTest_Serial
./KokkosContainers_UnitTest_Serial
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
diff --git a/lib/kokkos/containers/unit_tests/TestCuda.cpp b/lib/kokkos/containers/unit_tests/TestCuda.cpp
index e30160b24..6be38cd7a 100644
--- a/lib/kokkos/containers/unit_tests/TestCuda.cpp
+++ b/lib/kokkos/containers/unit_tests/TestCuda.cpp
@@ -1,227 +1,229 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <iostream>
#include <iomanip>
#include <stdint.h>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_Vector.hpp>
#include <TestBitset.hpp>
#include <TestUnorderedMap.hpp>
#include <TestStaticCrsGraph.hpp>
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
-#include <TestSegmentedView.hpp>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
+#include <Kokkos_ErrorReporter.hpp>
+#include <TestErrorReporter.hpp>
+
//----------------------------------------------------------------------------
#ifdef KOKKOS_HAVE_CUDA
namespace Test {
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase()
{
std::cout << std::setprecision(5) << std::scientific;
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
}
static void TearDownTestCase()
{
Kokkos::Cuda::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
TEST_F( cuda , dyn_view_api) {
TestDynViewAPI< double , Kokkos::Cuda >();
}
TEST_F( cuda , staticcrsgraph )
{
TestStaticCrsGraph::run_test_graph< Kokkos::Cuda >();
TestStaticCrsGraph::run_test_graph2< Kokkos::Cuda >();
}
void cuda_test_insert_close( uint32_t num_nodes
, uint32_t num_inserts
, uint32_t num_duplicates
)
{
test_insert< Kokkos::Cuda >( num_nodes, num_inserts, num_duplicates, true);
}
void cuda_test_insert_far( uint32_t num_nodes
, uint32_t num_inserts
, uint32_t num_duplicates
)
{
test_insert< Kokkos::Cuda >( num_nodes, num_inserts, num_duplicates, false);
}
void cuda_test_failed_insert( uint32_t num_nodes )
{
test_failed_insert< Kokkos::Cuda >( num_nodes );
}
void cuda_test_deep_copy( uint32_t num_nodes )
{
test_deep_copy< Kokkos::Cuda >( num_nodes );
}
void cuda_test_vector_combinations(unsigned int size)
{
test_vector_combinations<int,Kokkos::Cuda>(size);
}
void cuda_test_dualview_combinations(unsigned int size)
{
test_dualview_combinations<int,Kokkos::Cuda>(size);
}
-void cuda_test_segmented_view(unsigned int size)
-{
- test_segmented_view<double,Kokkos::Cuda>(size);
-}
-
void cuda_test_bitset()
{
test_bitset<Kokkos::Cuda>();
}
/*TEST_F( cuda, bitset )
{
cuda_test_bitset();
}*/
#define CUDA_INSERT_TEST( name, num_nodes, num_inserts, num_duplicates, repeat ) \
TEST_F( cuda, UnorderedMap_insert_##name##_##num_nodes##_##num_inserts##_##num_duplicates##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
cuda_test_insert_##name(num_nodes,num_inserts,num_duplicates); \
}
#define CUDA_FAILED_INSERT_TEST( num_nodes, repeat ) \
TEST_F( cuda, UnorderedMap_failed_insert_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
cuda_test_failed_insert(num_nodes); \
}
#define CUDA_ASSIGNEMENT_TEST( num_nodes, repeat ) \
TEST_F( cuda, UnorderedMap_assignment_operators_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
cuda_test_assignment_operators(num_nodes); \
}
#define CUDA_DEEP_COPY( num_nodes, repeat ) \
TEST_F( cuda, UnorderedMap_deep_copy##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
cuda_test_deep_copy(num_nodes); \
}
#define CUDA_VECTOR_COMBINE_TEST( size ) \
TEST_F( cuda, vector_combination##size##x) { \
cuda_test_vector_combinations(size); \
}
#define CUDA_DUALVIEW_COMBINE_TEST( size ) \
TEST_F( cuda, dualview_combination##size##x) { \
cuda_test_dualview_combinations(size); \
}
-#define CUDA_SEGMENTEDVIEW_TEST( size ) \
- TEST_F( cuda, segmentedview_##size##x) { \
- cuda_test_segmented_view(size); \
- }
-
CUDA_DUALVIEW_COMBINE_TEST( 10 )
CUDA_VECTOR_COMBINE_TEST( 10 )
CUDA_VECTOR_COMBINE_TEST( 3057 )
CUDA_INSERT_TEST(close, 100000, 90000, 100, 500)
CUDA_INSERT_TEST(far, 100000, 90000, 100, 500)
CUDA_DEEP_COPY( 10000, 1 )
CUDA_FAILED_INSERT_TEST( 10000, 1000 )
-CUDA_SEGMENTEDVIEW_TEST( 200 )
#undef CUDA_INSERT_TEST
#undef CUDA_FAILED_INSERT_TEST
#undef CUDA_ASSIGNEMENT_TEST
#undef CUDA_DEEP_COPY
#undef CUDA_VECTOR_COMBINE_TEST
#undef CUDA_DUALVIEW_COMBINE_TEST
-#undef CUDA_SEGMENTEDVIEW_TEST
TEST_F( cuda , dynamic_view )
{
typedef TestDynamicView< double , Kokkos::CudaUVMSpace >
TestDynView ;
for ( int i = 0 ; i < 10 ; ++i ) {
TestDynView::run( 100000 + 100 * i );
}
}
+#if defined(KOKKOS_CLASS_LAMBDA)
+TEST_F(cuda, ErrorReporterViaLambda)
+{
+ TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Cuda>>();
+}
+#endif
+
+TEST_F(cuda, ErrorReporter)
+{
+ TestErrorReporter<ErrorReporterDriver<Kokkos::Cuda>>();
+}
+
}
#endif /* #ifdef KOKKOS_HAVE_CUDA */
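The new ErrorReporter tests registered above follow one pattern: instantiate TestErrorReporter<> with a driver type templated on the backend. As a hedged illustration only, the same check written for a host backend would look like the sketch below; the "serial" fixture and the KOKKOS_HAVE_SERIAL guard are assumptions for illustration and are not part of this diff.

// Sketch only (not part of the diff): mirrors the Cuda ErrorReporter tests above.
// TestErrorReporter<> and ErrorReporterDriver<> come from the new
// TestErrorReporter.hpp added further down; the "serial" gtest fixture is assumed
// to exist (analogous to the "cuda" fixture above).
#ifdef KOKKOS_HAVE_SERIAL
TEST_F( serial , ErrorReporter )
{
  TestErrorReporter< ErrorReporterDriver< Kokkos::Serial > >();
}
#endif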
diff --git a/lib/kokkos/containers/unit_tests/TestDynViewAPI.hpp b/lib/kokkos/containers/unit_tests/TestDynViewAPI.hpp
index e71ccc009..d06277864 100644
--- a/lib/kokkos/containers/unit_tests/TestDynViewAPI.hpp
+++ b/lib/kokkos/containers/unit_tests/TestDynViewAPI.hpp
@@ -1,1559 +1,1558 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Test {
template< class T , class ... P >
size_t allocation_count( const Kokkos::Experimental::DynRankView<T,P...> & view )
{
const size_t card = view.size();
const size_t alloc = view.span();
return card <= alloc ? alloc : 0 ;
}
/*--------------------------------------------------------------------------*/
template< typename T, class DeviceType>
struct TestViewOperator
{
typedef DeviceType execution_space ;
static const unsigned N = 100 ;
static const unsigned D = 3 ;
typedef Kokkos::Experimental::DynRankView< T , execution_space > view_type ;
const view_type v1 ;
const view_type v2 ;
TestViewOperator()
: v1( "v1" , N , D )
, v2( "v2" , N , D )
{}
static void testit()
{
Kokkos::parallel_for( N , TestViewOperator() );
}
KOKKOS_INLINE_FUNCTION
void operator()( const unsigned i ) const
{
const unsigned X = 0 ;
const unsigned Y = 1 ;
const unsigned Z = 2 ;
v2(i,X) = v1(i,X);
v2(i,Y) = v1(i,Y);
v2(i,Z) = v1(i,Z);
}
};
/*--------------------------------------------------------------------------*/
template< class DataType ,
class DeviceType ,
unsigned Rank >
struct TestViewOperator_LeftAndRight ;
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 7 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0, unsigned N1, unsigned N2, unsigned N3, unsigned N4, unsigned N5, unsigned N6 )
: left( "left" , N0, N1, N2, N3, N4, N5, N6 )
, right( "right" , N0, N1, N2, N3, N4, N5, N6 )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0, unsigned N1, unsigned N2, unsigned N3, unsigned N4, unsigned N5, unsigned N6 )
{
TestViewOperator_LeftAndRight driver(N0, N1, N2, N3, N4, N5, N6 );
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6 ) -
& left( 0, 0, 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6 ) -
& right( 0, 0, 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 6 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0, unsigned N1, unsigned N2, unsigned N3, unsigned N4, unsigned N5 )
: left( "left" , N0, N1, N2, N3, N4, N5 )
, right( "right" , N0, N1, N2, N3, N4, N5 )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0, unsigned N1, unsigned N2, unsigned N3, unsigned N4, unsigned N5)
{
TestViewOperator_LeftAndRight driver (N0, N1, N2, N3, N4, N5);
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5 ) -
& left( 0, 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
{
const long j = & right( i0, i1, i2, i3, i4, i5 ) -
& right( 0, 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 5 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0, unsigned N1, unsigned N2, unsigned N3, unsigned N4 )
: left( "left" , N0, N1, N2, N3, N4 )
, right( "right" , N0, N1, N2, N3, N4 )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0, unsigned N1, unsigned N2, unsigned N3, unsigned N4)
{
TestViewOperator_LeftAndRight driver(N0, N1, N2, N3, N4);
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4 ) -
& left( 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
if ( & left( i0, i1, i2, i3, i4 ) !=
& left_stride( i0, i1, i2, i3, i4 ) ) { update |= 4 ; }
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
{
const long j = & right( i0, i1, i2, i3, i4 ) -
& right( 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
if ( & right( i0, i1, i2, i3, i4 ) !=
& right_stride( i0, i1, i2, i3, i4 ) ) { update |= 8 ; }
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 4 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0, unsigned N1, unsigned N2, unsigned N3)
: left( "left" , N0, N1, N2, N3 )
, right( "right" , N0, N1, N2, N3 )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0, unsigned N1, unsigned N2, unsigned N3)
{
TestViewOperator_LeftAndRight driver (N0, N1, N2, N3);
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3 ) -
& left( 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
{
const long j = & right( i0, i1, i2, i3 ) -
& right( 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 3 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0, unsigned N1, unsigned N2)
: left( std::string("left") , N0, N1, N2 )
, right( std::string("right") , N0, N1, N2 )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0, unsigned N1, unsigned N2)
{
TestViewOperator_LeftAndRight driver (N0, N1, N2);
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2 ) -
& left( 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
if ( & left(i0,i1,i2) != & left_stride(i0,i1,i2) ) { update |= 4 ; }
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
{
const long j = & right( i0, i1, i2 ) -
& right( 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
if ( & right(i0,i1,i2) != & right_stride(i0,i1,i2) ) { update |= 8 ; }
}
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
{
if ( & left(i0,i1,i2) != & left(i0,i1,i2,0,0,0,0) ) { update |= 3 ; }
if ( & right(i0,i1,i2) != & right(i0,i1,i2,0,0,0,0) ) { update |= 3 ; }
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 2 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0, unsigned N1)
: left( "left" , N0, N1 )
, right( "right" , N0, N1 )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0, unsigned N1)
{
TestViewOperator_LeftAndRight driver(N0, N1);
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1 ) -
& left( 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
{
const long j = & right( i0, i1 ) -
& right( 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
{
if ( & left(i0,i1) != & left(i0,i1,0,0,0,0,0) ) { update |= 3 ; }
if ( & right(i0,i1) != & right(i0,i1,0,0,0,0,0) ) { update |= 3 ; }
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 1 >
{
typedef DeviceType execution_space ;
typedef typename execution_space::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
Experimental::DynRankView< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight(unsigned N0)
: left( "left" , N0 )
, right( "right" , N0 )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit(unsigned N0)
{
TestViewOperator_LeftAndRight driver (N0) ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
if ( & left(i0) != & left(i0,0,0,0,0,0,0) ) { update |= 3 ; }
if ( & right(i0) != & right(i0,0,0,0,0,0,0) ) { update |= 3 ; }
if ( & left(i0) != & left_stride(i0) ) { update |= 4 ; }
if ( & right(i0) != & right_stride(i0) ) { update |= 8 ; }
}
}
};
/*--------------------------------------------------------------------------*/
template< typename T, class DeviceType >
class TestDynViewAPI
{
public:
typedef DeviceType device ;
enum { N0 = 1000 ,
N1 = 3 ,
N2 = 5 ,
N3 = 7 };
typedef Kokkos::Experimental::DynRankView< T , device > dView0 ;
typedef Kokkos::Experimental::DynRankView< const T , device > const_dView0 ;
typedef Kokkos::Experimental::DynRankView< T, device, Kokkos::MemoryUnmanaged > dView0_unmanaged ;
typedef typename dView0::host_mirror_space host_drv_space ;
- typedef Kokkos::Experimental::View< T , device > View0 ;
- typedef Kokkos::Experimental::View< T* , device > View1 ;
- typedef Kokkos::Experimental::View< T******* , device > View7 ;
+ typedef Kokkos::View< T , device > View0 ;
+ typedef Kokkos::View< T* , device > View1 ;
+ typedef Kokkos::View< T******* , device > View7 ;
typedef typename View0::host_mirror_space host_view_space ;
TestDynViewAPI()
{
run_test_resize_realloc();
run_test_mirror();
run_test_scalar();
run_test();
run_test_const();
run_test_subview();
run_test_subview_strided();
run_test_vector();
TestViewOperator< T , device >::testit();
TestViewOperator_LeftAndRight< int , device , 7 >::testit(2,3,4,2,3,4,2);
TestViewOperator_LeftAndRight< int , device , 6 >::testit(2,3,4,2,3,4);
TestViewOperator_LeftAndRight< int , device , 5 >::testit(2,3,4,2,3);
TestViewOperator_LeftAndRight< int , device , 4 >::testit(2,3,4,2);
TestViewOperator_LeftAndRight< int , device , 3 >::testit(2,3,4);
TestViewOperator_LeftAndRight< int , device , 2 >::testit(2,3);
TestViewOperator_LeftAndRight< int , device , 1 >::testit(2);
}
static void run_test_resize_realloc()
{
dView0 drv0("drv0", 10, 20, 30);
ASSERT_EQ( drv0.rank(), 3);
Kokkos::Experimental::resize(drv0, 5, 10);
ASSERT_EQ( drv0.rank(), 2);
ASSERT_EQ( drv0.dimension_0(), 5);
ASSERT_EQ( drv0.dimension_1(), 10);
ASSERT_EQ( drv0.dimension_2(), 1);
Kokkos::Experimental::realloc(drv0, 10, 20);
ASSERT_EQ( drv0.rank(), 2);
ASSERT_EQ( drv0.dimension_0(), 10);
ASSERT_EQ( drv0.dimension_1(), 20);
ASSERT_EQ( drv0.dimension_2(), 1);
}
static void run_test_mirror()
{
typedef Kokkos::Experimental::DynRankView< int , host_drv_space > view_type ;
typedef typename view_type::HostMirror mirror_type ;
view_type a("a");
mirror_type am = Kokkos::Experimental::create_mirror_view(a);
mirror_type ax = Kokkos::Experimental::create_mirror(a);
ASSERT_EQ( & a() , & am() );
ASSERT_EQ( a.rank() , am.rank() );
ASSERT_EQ( ax.rank() , am.rank() );
if (Kokkos::HostSpace::execution_space::is_initialized() )
{
Kokkos::DynRankView<double, Kokkos::LayoutLeft, Kokkos::HostSpace> a_h("A",1000);
auto a_h2 = Kokkos::create_mirror(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror(typename device::memory_space(),a_h);
int equal_ptr_h_h2 = (a_h.data() ==a_h2.data())?1:0;
int equal_ptr_h_d = (a_h.data() ==a_d. data())?1:0;
int equal_ptr_h2_d = (a_h2.data()==a_d. data())?1:0;
ASSERT_EQ(equal_ptr_h_h2,0);
ASSERT_EQ(equal_ptr_h_d ,0);
ASSERT_EQ(equal_ptr_h2_d,0);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
ASSERT_EQ(a_h.rank(),a_h2.rank());
ASSERT_EQ(a_h.rank(),a_d.rank());
}
if (Kokkos::HostSpace::execution_space::is_initialized() )
{
Kokkos::DynRankView<double, Kokkos::LayoutRight, Kokkos::HostSpace> a_h("A",1000);
auto a_h2 = Kokkos::create_mirror(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror(typename device::memory_space(),a_h);
int equal_ptr_h_h2 = (a_h.data() ==a_h2.data())?1:0;
int equal_ptr_h_d = (a_h.data() ==a_d. data())?1:0;
int equal_ptr_h2_d = (a_h2.data()==a_d. data())?1:0;
ASSERT_EQ(equal_ptr_h_h2,0);
ASSERT_EQ(equal_ptr_h_d ,0);
ASSERT_EQ(equal_ptr_h2_d,0);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
ASSERT_EQ(a_h.rank(),a_h2.rank());
ASSERT_EQ(a_h.rank(),a_d.rank());
}
if (Kokkos::HostSpace::execution_space::is_initialized() )
{
Kokkos::DynRankView<double, Kokkos::LayoutLeft, Kokkos::HostSpace> a_h("A",1000);
auto a_h2 = Kokkos::create_mirror_view(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror_view(typename device::memory_space(),a_h);
int equal_ptr_h_h2 = a_h.data() ==a_h2.data()?1:0;
int equal_ptr_h_d = a_h.data() ==a_d. data()?1:0;
int equal_ptr_h2_d = a_h2.data()==a_d. data()?1:0;
int is_same_memspace = std::is_same<Kokkos::HostSpace,typename device::memory_space>::value?1:0;
ASSERT_EQ(equal_ptr_h_h2,1);
ASSERT_EQ(equal_ptr_h_d ,is_same_memspace);
ASSERT_EQ(equal_ptr_h2_d ,is_same_memspace);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
ASSERT_EQ(a_h.rank(),a_h2.rank());
ASSERT_EQ(a_h.rank(),a_d.rank());
}
if (Kokkos::HostSpace::execution_space::is_initialized() )
{
Kokkos::DynRankView<double, Kokkos::LayoutRight, Kokkos::HostSpace> a_h("A",1000);
auto a_h2 = Kokkos::create_mirror_view(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror_view(typename device::memory_space(),a_h);
int equal_ptr_h_h2 = a_h.data() ==a_h2.data()?1:0;
int equal_ptr_h_d = a_h.data() ==a_d. data()?1:0;
int equal_ptr_h2_d = a_h2.data()==a_d. data()?1:0;
int is_same_memspace = std::is_same<Kokkos::HostSpace,typename device::memory_space>::value?1:0;
ASSERT_EQ(equal_ptr_h_h2,1);
ASSERT_EQ(equal_ptr_h_d ,is_same_memspace);
ASSERT_EQ(equal_ptr_h2_d ,is_same_memspace);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
ASSERT_EQ(a_h.rank(),a_h2.rank());
ASSERT_EQ(a_h.rank(),a_d.rank());
}
if (Kokkos::HostSpace::execution_space::is_initialized() )
{
typedef Kokkos::DynRankView< int , Kokkos::LayoutStride , Kokkos::HostSpace > view_stride_type ;
unsigned order[] = { 6,5,4,3,2,1,0 }, dimen[] = { N0, N1, N2, 2, 2, 2, 2 }; //LayoutRight equivalent
view_stride_type a_h( "a" , Kokkos::LayoutStride::order_dimensions(7, order, dimen) );
auto a_h2 = Kokkos::create_mirror_view(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror_view(typename device::memory_space(),a_h);
int equal_ptr_h_h2 = a_h.data() ==a_h2.data()?1:0;
int equal_ptr_h_d = a_h.data() ==a_d. data()?1:0;
int equal_ptr_h2_d = a_h2.data()==a_d. data()?1:0;
int is_same_memspace = std::is_same<Kokkos::HostSpace,typename device::memory_space>::value?1:0;
ASSERT_EQ(equal_ptr_h_h2,1);
ASSERT_EQ(equal_ptr_h_d ,is_same_memspace);
ASSERT_EQ(equal_ptr_h2_d ,is_same_memspace);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
ASSERT_EQ(a_h.rank(),a_h2.rank());
ASSERT_EQ(a_h.rank(),a_d.rank());
}
}
static void run_test_scalar()
{
typedef typename dView0::HostMirror hView0 ; //HostMirror of DynRankView is a DynRankView
dView0 dx , dy ;
hView0 hx , hy ;
dx = dView0( "dx" );
dy = dView0( "dy" );
hx = Kokkos::Experimental::create_mirror( dx );
hy = Kokkos::Experimental::create_mirror( dy );
hx() = 1 ;
Kokkos::Experimental::deep_copy( dx , hx );
Kokkos::Experimental::deep_copy( dy , dx );
Kokkos::Experimental::deep_copy( hy , dy );
ASSERT_EQ( hx(), hy() );
ASSERT_EQ( dx.rank() , hx.rank() );
ASSERT_EQ( dy.rank() , hy.rank() );
//View - DynRankView Interoperability tests
// deep_copy DynRankView to View
View0 vx("vx");
Kokkos::deep_copy( vx , dx );
ASSERT_EQ( rank(dx) , rank(vx) );
View0 vy("vy");
Kokkos::deep_copy( vy , dy );
ASSERT_EQ( rank(dy) , rank(vy) );
// deep_copy View to DynRankView
dView0 dxx("dxx");
Kokkos::deep_copy( dxx , vx );
ASSERT_EQ( rank(dxx) , rank(vx) );
View7 vcast = dx.ConstDownCast();
ASSERT_EQ( dx.dimension_0() , vcast.dimension_0() );
ASSERT_EQ( dx.dimension_1() , vcast.dimension_1() );
ASSERT_EQ( dx.dimension_2() , vcast.dimension_2() );
ASSERT_EQ( dx.dimension_3() , vcast.dimension_3() );
ASSERT_EQ( dx.dimension_4() , vcast.dimension_4() );
View7 vcast1( dy.ConstDownCast() );
ASSERT_EQ( dy.dimension_0() , vcast1.dimension_0() );
ASSERT_EQ( dy.dimension_1() , vcast1.dimension_1() );
ASSERT_EQ( dy.dimension_2() , vcast1.dimension_2() );
ASSERT_EQ( dy.dimension_3() , vcast1.dimension_3() );
ASSERT_EQ( dy.dimension_4() , vcast1.dimension_4() );
//View - DynRankView Interoperability tests
// copy View to DynRankView
dView0 dfromvx( vx );
auto hmx = Kokkos::create_mirror_view(dfromvx) ;
Kokkos::deep_copy(hmx , dfromvx);
auto hvx = Kokkos::create_mirror_view(vx) ;
Kokkos::deep_copy(hvx , vx);
ASSERT_EQ( rank(hvx) , rank(hmx) );
ASSERT_EQ( hvx.dimension_0() , hmx.dimension_0() );
ASSERT_EQ( hvx.dimension_1() , hmx.dimension_1() );
// copy-assign View to DynRankView
dView0 dfromvy = vy ;
auto hmy = Kokkos::create_mirror_view(dfromvy) ;
Kokkos::deep_copy(hmy , dfromvy);
auto hvy = Kokkos::create_mirror_view(vy) ;
Kokkos::deep_copy(hvy , vy);
ASSERT_EQ( rank(hvy) , rank(hmy) );
ASSERT_EQ( hvy.dimension_0() , hmy.dimension_0() );
ASSERT_EQ( hvy.dimension_1() , hmy.dimension_1() );
View7 vtest1("vtest1",2,2,2,2,2,2,2);
dView0 dfromv1( vtest1 );
ASSERT_EQ( dfromv1.rank() , vtest1.Rank );
ASSERT_EQ( dfromv1.dimension_0() , vtest1.dimension_0() );
ASSERT_EQ( dfromv1.dimension_1() , vtest1.dimension_1() );
ASSERT_EQ( dfromv1.use_count() , vtest1.use_count() );
dView0 dfromv2( vcast );
ASSERT_EQ( dfromv2.rank() , vcast.Rank );
ASSERT_EQ( dfromv2.dimension_0() , vcast.dimension_0() );
ASSERT_EQ( dfromv2.dimension_1() , vcast.dimension_1() );
ASSERT_EQ( dfromv2.use_count() , vcast.use_count() );
dView0 dfromv3 = vcast1;
ASSERT_EQ( dfromv3.rank() , vcast1.Rank );
ASSERT_EQ( dfromv3.dimension_0() , vcast1.dimension_0() );
ASSERT_EQ( dfromv3.dimension_1() , vcast1.dimension_1() );
ASSERT_EQ( dfromv3.use_count() , vcast1.use_count() );
}
static void run_test()
{
// mfh 14 Feb 2014: This test doesn't actually create instances of
// these types. In order to avoid "declared but unused typedef"
// warnings, we declare empty instances of these types, with the
// usual "(void)" marker to avoid compiler warnings for unused
// variables.
typedef typename dView0::HostMirror hView0 ;
{
hView0 thing;
(void) thing;
}
dView0 d_uninitialized(Kokkos::ViewAllocateWithoutInitializing("uninit"),10,20);
ASSERT_TRUE( d_uninitialized.data() != nullptr );
ASSERT_EQ( d_uninitialized.rank() , 2 );
ASSERT_EQ( d_uninitialized.dimension_0() , 10 );
ASSERT_EQ( d_uninitialized.dimension_1() , 20 );
ASSERT_EQ( d_uninitialized.dimension_2() , 1 );
dView0 dx , dy , dz ;
hView0 hx , hy , hz ;
ASSERT_TRUE( Kokkos::Experimental::is_dyn_rank_view<dView0>::value );
ASSERT_FALSE( Kokkos::Experimental::is_dyn_rank_view< Kokkos::View<double> >::value );
ASSERT_TRUE( dx.ptr_on_device() == 0 ); //Okay with UVM
ASSERT_TRUE( dy.ptr_on_device() == 0 ); //Okay with UVM
ASSERT_TRUE( dz.ptr_on_device() == 0 ); //Okay with UVM
ASSERT_TRUE( hx.ptr_on_device() == 0 );
ASSERT_TRUE( hy.ptr_on_device() == 0 );
ASSERT_TRUE( hz.ptr_on_device() == 0 );
ASSERT_EQ( dx.dimension_0() , 0u ); //Okay with UVM
ASSERT_EQ( dy.dimension_0() , 0u ); //Okay with UVM
ASSERT_EQ( dz.dimension_0() , 0u ); //Okay with UVM
ASSERT_EQ( hx.dimension_0() , 0u );
ASSERT_EQ( hy.dimension_0() , 0u );
ASSERT_EQ( hz.dimension_0() , 0u );
ASSERT_EQ( dx.rank() , 0u ); //Okay with UVM
ASSERT_EQ( hx.rank() , 0u );
dx = dView0( "dx" , N1 , N2 , N3 );
dy = dView0( "dy" , N1 , N2 , N3 );
hx = hView0( "hx" , N1 , N2 , N3 );
hy = hView0( "hy" , N1 , N2 , N3 );
ASSERT_EQ( dx.dimension_0() , unsigned(N1) ); //Okay with UVM
ASSERT_EQ( dy.dimension_0() , unsigned(N1) ); //Okay with UVM
ASSERT_EQ( hx.dimension_0() , unsigned(N1) );
ASSERT_EQ( hy.dimension_0() , unsigned(N1) );
ASSERT_EQ( dx.rank() , 3 ); //Okay with UVM
ASSERT_EQ( hx.rank() , 3 );
dx = dView0( "dx" , N0 , N1 , N2 , N3 );
dy = dView0( "dy" , N0 , N1 , N2 , N3 );
hx = hView0( "hx" , N0 , N1 , N2 , N3 );
hy = hView0( "hy" , N0 , N1 , N2 , N3 );
ASSERT_EQ( dx.dimension_0() , unsigned(N0) );
ASSERT_EQ( dy.dimension_0() , unsigned(N0) );
ASSERT_EQ( hx.dimension_0() , unsigned(N0) );
ASSERT_EQ( hy.dimension_0() , unsigned(N0) );
ASSERT_EQ( dx.rank() , 4 );
ASSERT_EQ( dy.rank() , 4 );
ASSERT_EQ( hx.rank() , 4 );
ASSERT_EQ( hy.rank() , 4 );
ASSERT_EQ( dx.use_count() , size_t(1) );
dView0_unmanaged unmanaged_dx = dx;
ASSERT_EQ( dx.use_count() , size_t(1) );
dView0_unmanaged unmanaged_from_ptr_dx = dView0_unmanaged(dx.ptr_on_device(),
dx.dimension_0(),
dx.dimension_1(),
dx.dimension_2(),
dx.dimension_3());
{
// Destruction of this view should be harmless
const_dView0 unmanaged_from_ptr_const_dx( dx.ptr_on_device() ,
dx.dimension_0() ,
dx.dimension_1() ,
dx.dimension_2() ,
dx.dimension_3() );
}
const_dView0 const_dx = dx ;
ASSERT_EQ( dx.use_count() , size_t(2) );
{
const_dView0 const_dx2;
const_dx2 = const_dx;
ASSERT_EQ( dx.use_count() , size_t(3) );
const_dx2 = dy;
ASSERT_EQ( dx.use_count() , size_t(2) );
const_dView0 const_dx3(dx);
ASSERT_EQ( dx.use_count() , size_t(3) );
dView0_unmanaged dx4_unmanaged(dx);
ASSERT_EQ( dx.use_count() , size_t(3) );
}
ASSERT_EQ( dx.use_count() , size_t(2) );
ASSERT_FALSE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( const_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_from_ptr_dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
ASSERT_NE( dx , dy );
ASSERT_EQ( dx.dimension_0() , unsigned(N0) );
ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
ASSERT_EQ( dx.dimension_2() , unsigned(N2) );
ASSERT_EQ( dx.dimension_3() , unsigned(N3) );
ASSERT_EQ( dy.dimension_0() , unsigned(N0) );
ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
ASSERT_EQ( dy.dimension_2() , unsigned(N2) );
ASSERT_EQ( dy.dimension_3() , unsigned(N3) );
ASSERT_EQ( unmanaged_from_ptr_dx.capacity(),unsigned(N0)*unsigned(N1)*unsigned(N2)*unsigned(N3) );
hx = Kokkos::Experimental::create_mirror( dx );
hy = Kokkos::Experimental::create_mirror( dy );
ASSERT_EQ( hx.rank() , dx.rank() );
ASSERT_EQ( hy.rank() , dy.rank() );
ASSERT_EQ( hx.dimension_0() , unsigned(N0) );
ASSERT_EQ( hx.dimension_1() , unsigned(N1) );
ASSERT_EQ( hx.dimension_2() , unsigned(N2) );
ASSERT_EQ( hx.dimension_3() , unsigned(N3) );
ASSERT_EQ( hy.dimension_0() , unsigned(N0) );
ASSERT_EQ( hy.dimension_1() , unsigned(N1) );
ASSERT_EQ( hy.dimension_2() , unsigned(N2) );
ASSERT_EQ( hy.dimension_3() , unsigned(N3) );
// T v1 = hx() ; // Generates compile error as intended
// T v2 = hx(0,0) ; // Generates compile error as intended
// hx(0,0) = v2 ; // Generates compile error as intended
-/*
-#if ! KOKKOS_USING_EXP_VIEW
+#if 0 /* Asynchronous deep copies not implemented for dynamic rank view */
// Testing with asynchronous deep copy with respect to device
{
size_t count = 0 ;
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
hx(ip,i1,i2,i3) = ++count ;
}}}}
Kokkos::deep_copy(typename hView0::execution_space(), dx , hx );
Kokkos::deep_copy(typename hView0::execution_space(), dy , dx );
Kokkos::deep_copy(typename hView0::execution_space(), hy , dy );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
}}}}
Kokkos::deep_copy(typename hView0::execution_space(), dx , T(0) );
Kokkos::deep_copy(typename hView0::execution_space(), hx , dx );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
}}}}
}
// Testing with asynchronous deep copy with respect to host
{
size_t count = 0 ;
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
hx(ip,i1,i2,i3) = ++count ;
}}}}
Kokkos::deep_copy(typename dView0::execution_space(), dx , hx );
Kokkos::deep_copy(typename dView0::execution_space(), dy , dx );
Kokkos::deep_copy(typename dView0::execution_space(), hy , dy );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
}}}}
Kokkos::deep_copy(typename dView0::execution_space(), dx , T(0) );
Kokkos::deep_copy(typename dView0::execution_space(), hx , dx );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
}}}}
}
-#endif */ // #if ! KOKKOS_USING_EXP_VIEW
+#endif
// Testing with synchronous deep copy
{
size_t count = 0 ;
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
hx(ip,i1,i2,i3) = ++count ;
}}}}
Kokkos::Experimental::deep_copy( dx , hx );
Kokkos::Experimental::deep_copy( dy , dx );
Kokkos::Experimental::deep_copy( hy , dy );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
}}}}
Kokkos::Experimental::deep_copy( dx , T(0) );
Kokkos::Experimental::deep_copy( hx , dx );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
}}}}
// ASSERT_EQ( hx(0,0,0,0,0,0,0,0) , T(0) ); //Test rank8 op behaves properly - if implemented
}
dz = dx ; ASSERT_EQ( dx, dz); ASSERT_NE( dy, dz);
dz = dy ; ASSERT_EQ( dy, dz); ASSERT_NE( dx, dz);
dx = dView0();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
dy = dView0();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
dz = dView0();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
//View - DynRankView Interoperability tests
// deep_copy from view to dynrankview
const int testdim = 4;
dView0 dxx("dxx",testdim);
View1 vxx("vxx",testdim);
auto hvxx = Kokkos::create_mirror_view(vxx);
for (int i = 0; i < testdim; ++i)
{ hvxx(i) = i; }
Kokkos::deep_copy(vxx,hvxx);
Kokkos::deep_copy(dxx,vxx);
auto hdxx = Kokkos::create_mirror_view(dxx);
Kokkos::deep_copy(hdxx,dxx);
for (int i = 0; i < testdim; ++i)
{ ASSERT_EQ( hvxx(i) , hdxx(i) ); }
ASSERT_EQ( rank(hdxx) , rank(hvxx) );
ASSERT_EQ( hdxx.dimension_0() , testdim );
ASSERT_EQ( hdxx.dimension_0() , hvxx.dimension_0() );
// deep_copy from dynrankview to view
View1 vdxx("vdxx",testdim);
auto hvdxx = Kokkos::create_mirror_view(vdxx);
Kokkos::deep_copy(hvdxx , hdxx);
ASSERT_EQ( rank(hdxx) , rank(hvdxx) );
ASSERT_EQ( hvdxx.dimension_0() , testdim );
ASSERT_EQ( hdxx.dimension_0() , hvdxx.dimension_0() );
for (int i = 0; i < testdim; ++i)
{ ASSERT_EQ( hvxx(i) , hvdxx(i) ); }
}
typedef T DataType ;
static void
check_auto_conversion_to_const(
const Kokkos::Experimental::DynRankView< const DataType , device > & arg_const ,
const Kokkos::Experimental::DynRankView< DataType , device > & arg )
{
ASSERT_TRUE( arg_const == arg );
}
static void run_test_const()
{
typedef Kokkos::Experimental::DynRankView< DataType , device > typeX ;
typedef Kokkos::Experimental::DynRankView< const DataType , device > const_typeX ;
typedef Kokkos::Experimental::DynRankView< const DataType , device , Kokkos::MemoryRandomAccess > const_typeR ;
typeX x( "X", 2 );
const_typeX xc = x ;
const_typeR xr = x ;
ASSERT_TRUE( xc == x );
ASSERT_TRUE( x == xc );
// For CUDA the constant random access View does not return
// an lvalue reference due to retrieving through texture cache
// therefore not allowed to query the underlying pointer.
#if defined(KOKKOS_HAVE_CUDA)
if ( ! std::is_same< typename device::execution_space , Kokkos::Cuda >::value )
#endif
{
ASSERT_TRUE( x.ptr_on_device() == xr.ptr_on_device() );
}
// typeX xf = xc ; // setting non-const from const must not compile
check_auto_conversion_to_const( x , x );
}
static void run_test_subview()
{
typedef Kokkos::Experimental::DynRankView< const T , device > cdView ;
typedef Kokkos::Experimental::DynRankView< T , device > dView ;
// LayoutStride required for all returned DynRankView subdynrankview's
typedef Kokkos::Experimental::DynRankView< T , Kokkos::LayoutStride , device > sdView ;
dView0 d0( "d0" );
cdView s0 = d0 ;
// N0 = 1000,N1 = 3,N2 = 5,N3 = 7
unsigned order[] = { 6,5,4,3,2,1,0 }, dimen[] = { N0, N1, N2, 2, 2, 2, 2 }; //LayoutRight equivalent
sdView d7( "d7" , Kokkos::LayoutStride::order_dimensions(7, order, dimen) );
ASSERT_EQ( d7.rank() , 7 );
sdView ds0 = Kokkos::subdynrankview( d7 , 1 , 1 , 1 , 1 , 1 , 1 , 1 );
ASSERT_EQ( ds0.rank() , 0 );
//Basic test - ALL
sdView dsALL = Kokkos::Experimental::subdynrankview( d7 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() );
ASSERT_EQ( dsALL.rank() , 7 );
// Send a value to final rank returning rank 6 subview
sdView dsm1 = Kokkos::Experimental::subdynrankview( d7 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , 1 );
ASSERT_EQ( dsm1.rank() , 6 );
// Send a std::pair as argument to a rank
sdView dssp = Kokkos::Experimental::subdynrankview( d7 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , std::pair<unsigned,unsigned>(1,2) );
ASSERT_EQ( dssp.rank() , 7 );
// Send a kokkos::pair as argument to a rank; take default layout as input
dView0 dd0("dd0" , N0 , N1 , N2 , 2 , 2 , 2 , 2 ); //default layout
ASSERT_EQ( dd0.rank() , 7 );
sdView dtkp = Kokkos::Experimental::subdynrankview( dd0 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::pair<unsigned,unsigned>(0,1) );
ASSERT_EQ( dtkp.rank() , 7 );
// Return rank 7 subview, taking a pair as one argument, layout stride input
sdView ds7 = Kokkos::Experimental::subdynrankview( d7 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::pair<unsigned,unsigned>(0,1) );
ASSERT_EQ( ds7.rank() , 7 );
// Default Layout DynRankView
dView dv6("dv6" , N0 , N1 , N2 , N3 , 2 , 2 );
ASSERT_EQ( dv6.rank() , 6 );
// DynRankView with LayoutRight
typedef Kokkos::Experimental::DynRankView< T , Kokkos::LayoutRight , device > drView ;
drView dr5( "dr5" , N0 , N1 , N2 , 2 , 2 );
ASSERT_EQ( dr5.rank() , 5 );
// LayoutStride but arranged as LayoutRight
// NOTE: unused arg_layout dimensions must be set to ~size_t(0) so that
// rank deduction can properly take place
unsigned order5[] = { 4,3,2,1,0 }, dimen5[] = { N0, N1, N2, 2, 2 };
Kokkos::LayoutStride ls = Kokkos::LayoutStride::order_dimensions(5, order5, dimen5);
ls.dimension[5] = ~size_t(0);
ls.dimension[6] = ~size_t(0);
ls.dimension[7] = ~size_t(0);
sdView d5("d5", ls);
ASSERT_EQ( d5.rank() , 5 );
// LayoutStride arranged as LayoutRight - commented out as example that fails unit test
// unsigned order5[] = { 4,3,2,1,0 }, dimen5[] = { N0, N1, N2, 2, 2 };
// sdView d5( "d5" , Kokkos::LayoutStride::order_dimensions(5, order5, dimen5) );
//
// Fails the following unit test:
// ASSERT_EQ( d5.rank() , dr5.rank() );
//
// Explanation: In construction of the Kokkos::LayoutStride below, since the
// remaining dimensions are not specified, they will default to values of 0
// rather than ~size_t(0).
// When passed to the DynRankView constructor the default dimensions (of 0)
// will be counted toward the dynamic rank and returning an incorrect value
// (i.e. rank 7 rather than 5).
// Check LayoutRight dr5 and LayoutStride d5 dimensions agree (as they should)
ASSERT_EQ( d5.dimension_0() , dr5.dimension_0() );
ASSERT_EQ( d5.dimension_1() , dr5.dimension_1() );
ASSERT_EQ( d5.dimension_2() , dr5.dimension_2() );
ASSERT_EQ( d5.dimension_3() , dr5.dimension_3() );
ASSERT_EQ( d5.dimension_4() , dr5.dimension_4() );
ASSERT_EQ( d5.dimension_5() , dr5.dimension_5() );
ASSERT_EQ( d5.rank() , dr5.rank() );
// Rank 5 subview of rank 5 dynamic rank view, layout stride input
sdView ds5 = Kokkos::Experimental::subdynrankview( d5 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::pair<unsigned,unsigned>(0,1) );
ASSERT_EQ( ds5.rank() , 5 );
// Pass in extra ALL arguments beyond the rank of the DynRank View.
// This behavior is allowed - ignore the extra ALL arguments when
// the src.rank() < number of arguments, but be careful!
sdView ds5plus = Kokkos::Experimental::subdynrankview( d5 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::pair<unsigned,unsigned>(0,1) , Kokkos::ALL() );
ASSERT_EQ( ds5.rank() , ds5plus.rank() );
ASSERT_EQ( ds5.dimension_0() , ds5plus.dimension_0() );
ASSERT_EQ( ds5.dimension_4() , ds5plus.dimension_4() );
ASSERT_EQ( ds5.dimension_5() , ds5plus.dimension_5() );
#if ! defined( KOKKOS_HAVE_CUDA ) || defined ( KOKKOS_USE_CUDA_UVM )
ASSERT_EQ( & ds5(1,1,1,1,0) - & ds5plus(1,1,1,1,0) , 0 );
ASSERT_EQ( & ds5(1,1,1,1,0,0) - & ds5plus(1,1,1,1,0,0) , 0 ); // passing argument to rank beyond the view's rank is allowed iff it is a 0.
#endif
// Similar test to rank 5 above, but create rank 4 subview
// Check that the rank contracts (ds4 and ds4plus) and that subdynrankview can accept extra args (ds4plus)
sdView ds4 = Kokkos::Experimental::subdynrankview( d5 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , 0 );
sdView ds4plus = Kokkos::Experimental::subdynrankview( d5 , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , Kokkos::ALL() , 0 , Kokkos::ALL() );
ASSERT_EQ( ds4.rank() , ds4plus.rank() );
ASSERT_EQ( ds4.rank() , 4 );
ASSERT_EQ( ds4.dimension_0() , ds4plus.dimension_0() );
ASSERT_EQ( ds4.dimension_4() , ds4plus.dimension_4() );
ASSERT_EQ( ds4.dimension_5() , ds4plus.dimension_5() );
}
static void run_test_subview_strided()
{
typedef Kokkos::Experimental::DynRankView < int , Kokkos::LayoutLeft , host_drv_space > drview_left ;
typedef Kokkos::Experimental::DynRankView < int , Kokkos::LayoutRight , host_drv_space > drview_right ;
typedef Kokkos::Experimental::DynRankView < int , Kokkos::LayoutStride , host_drv_space > drview_stride ;
drview_left xl2( "xl2", 100 , 200 );
drview_right xr2( "xr2", 100 , 200 );
drview_stride yl1 = Kokkos::Experimental::subdynrankview( xl2 , 0 , Kokkos::ALL() );
drview_stride yl2 = Kokkos::Experimental::subdynrankview( xl2 , 1 , Kokkos::ALL() );
drview_stride ys1 = Kokkos::Experimental::subdynrankview( xr2 , 0 , Kokkos::ALL() );
drview_stride ys2 = Kokkos::Experimental::subdynrankview( xr2 , 1 , Kokkos::ALL() );
drview_stride yr1 = Kokkos::Experimental::subdynrankview( xr2 , 0 , Kokkos::ALL() );
drview_stride yr2 = Kokkos::Experimental::subdynrankview( xr2 , 1 , Kokkos::ALL() );
ASSERT_EQ( yl1.dimension_0() , xl2.dimension_1() );
ASSERT_EQ( yl2.dimension_0() , xl2.dimension_1() );
ASSERT_EQ( yr1.dimension_0() , xr2.dimension_1() );
ASSERT_EQ( yr2.dimension_0() , xr2.dimension_1() );
ASSERT_EQ( & yl1(0) - & xl2(0,0) , 0 );
ASSERT_EQ( & yl2(0) - & xl2(1,0) , 0 );
ASSERT_EQ( & yr1(0) - & xr2(0,0) , 0 );
ASSERT_EQ( & yr2(0) - & xr2(1,0) , 0 );
drview_left xl4( "xl4", 10 , 20 , 30 , 40 );
drview_right xr4( "xr4", 10 , 20 , 30 , 40 );
//Replace subdynrankview with subview - test
drview_stride yl4 = Kokkos::Experimental::subview( xl4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
drview_stride yr4 = Kokkos::Experimental::subview( xr4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
ASSERT_EQ( yl4.dimension_0() , xl4.dimension_1() );
ASSERT_EQ( yl4.dimension_1() , xl4.dimension_3() );
ASSERT_EQ( yr4.dimension_0() , xr4.dimension_1() );
ASSERT_EQ( yr4.dimension_1() , xr4.dimension_3() );
ASSERT_EQ( yl4.rank() , 2);
ASSERT_EQ( yr4.rank() , 2);
ASSERT_EQ( & yl4(4,4) - & xl4(1,4,2,4) , 0 );
ASSERT_EQ( & yr4(4,4) - & xr4(1,4,2,4) , 0 );
}
static void run_test_vector()
{
static const unsigned Length = 1000 , Count = 8 ;
typedef typename Kokkos::Experimental::DynRankView< T , Kokkos::LayoutLeft , host_drv_space > multivector_type ;
typedef typename Kokkos::Experimental::DynRankView< T , Kokkos::LayoutRight , host_drv_space > multivector_right_type ;
multivector_type mv = multivector_type( "mv" , Length , Count );
multivector_right_type mv_right = multivector_right_type( "mv" , Length , Count );
typedef typename Kokkos::Experimental::DynRankView< T , Kokkos::LayoutStride , host_drv_space > svector_type ;
typedef typename Kokkos::Experimental::DynRankView< T , Kokkos::LayoutStride , host_drv_space > smultivector_type ;
typedef typename Kokkos::Experimental::DynRankView< const T , Kokkos::LayoutStride , host_drv_space > const_svector_right_type ;
typedef typename Kokkos::Experimental::DynRankView< const T , Kokkos::LayoutStride , host_drv_space > const_svector_type ;
typedef typename Kokkos::Experimental::DynRankView< const T , Kokkos::LayoutStride , host_drv_space > const_smultivector_type ;
svector_type v1 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 0 );
svector_type v2 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 1 );
svector_type v3 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 2 );
svector_type rv1 = Kokkos::Experimental::subdynrankview( mv_right , 0 , Kokkos::ALL() );
svector_type rv2 = Kokkos::Experimental::subdynrankview( mv_right , 1 , Kokkos::ALL() );
svector_type rv3 = Kokkos::Experimental::subdynrankview( mv_right , 2 , Kokkos::ALL() );
smultivector_type mv1 = Kokkos::Experimental::subdynrankview( mv , std::make_pair( 1 , 998 ) ,
std::make_pair( 2 , 5 ) );
smultivector_type mvr1 =
Kokkos::Experimental::subdynrankview( mv_right ,
std::make_pair( 1 , 998 ) ,
std::make_pair( 2 , 5 ) );
const_svector_type cv1 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL(), 0 );
const_svector_type cv2 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL(), 1 );
const_svector_type cv3 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL(), 2 );
svector_type vr1 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 0 );
svector_type vr2 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 1 );
svector_type vr3 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 2 );
const_svector_right_type cvr1 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 0 );
const_svector_right_type cvr2 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 1 );
const_svector_right_type cvr3 = Kokkos::Experimental::subdynrankview( mv , Kokkos::ALL() , 2 );
ASSERT_TRUE( & v1[0] == & v1(0) );
ASSERT_TRUE( & v1[0] == & mv(0,0) );
ASSERT_TRUE( & v2[0] == & mv(0,1) );
ASSERT_TRUE( & v3[0] == & mv(0,2) );
ASSERT_TRUE( & cv1[0] == & mv(0,0) );
ASSERT_TRUE( & cv2[0] == & mv(0,1) );
ASSERT_TRUE( & cv3[0] == & mv(0,2) );
ASSERT_TRUE( & vr1[0] == & mv(0,0) );
ASSERT_TRUE( & vr2[0] == & mv(0,1) );
ASSERT_TRUE( & vr3[0] == & mv(0,2) );
ASSERT_TRUE( & cvr1[0] == & mv(0,0) );
ASSERT_TRUE( & cvr2[0] == & mv(0,1) );
ASSERT_TRUE( & cvr3[0] == & mv(0,2) );
ASSERT_TRUE( & mv1(0,0) == & mv( 1 , 2 ) );
ASSERT_TRUE( & mv1(1,1) == & mv( 2 , 3 ) );
ASSERT_TRUE( & mv1(3,2) == & mv( 4 , 4 ) );
ASSERT_TRUE( & mvr1(0,0) == & mv_right( 1 , 2 ) );
ASSERT_TRUE( & mvr1(1,1) == & mv_right( 2 , 3 ) );
ASSERT_TRUE( & mvr1(3,2) == & mv_right( 4 , 4 ) );
const_svector_type c_cv1( v1 );
typename svector_type::const_type c_cv2( v2 );
typename const_svector_type::const_type c_ccv2( v2 );
const_smultivector_type cmv( mv );
typename smultivector_type::const_type cmvX( cmv );
typename const_smultivector_type::const_type ccmvX( cmv );
}
};
} // namespace Test
/*--------------------------------------------------------------------------*/
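Editor's note: the tests above verify that taking a subview of a DynRankView with one index fixed produces a rank-reduced, strided view whose extents match the parent. The following standalone sketch (not part of the patch) condenses that behavior; the view name "a" and the sizes are illustrative only, and default layout/device template arguments are assumed.

#include <Kokkos_Core.hpp>
#include <Kokkos_DynRankView.hpp>
#include <cassert>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // Rank is a runtime property of DynRankView: this one is rank 2.
    Kokkos::Experimental::DynRankView<double> a("a", 10, 20);

    // Fixing the first index and keeping the second with Kokkos::ALL()
    // yields a rank-1, LayoutStride subview, as the unit tests assert.
    auto row = Kokkos::Experimental::subdynrankview(a, 3, Kokkos::ALL());

    assert(a.rank()   == 2);
    assert(row.rank() == 1);
    assert(row.dimension_0() == a.dimension_1());
  }
  Kokkos::finalize();
  return 0;
}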
diff --git a/lib/kokkos/containers/unit_tests/TestErrorReporter.hpp b/lib/kokkos/containers/unit_tests/TestErrorReporter.hpp
new file mode 100644
index 000000000..c431b62a5
--- /dev/null
+++ b/lib/kokkos/containers/unit_tests/TestErrorReporter.hpp
@@ -0,0 +1,227 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_TEST_EXPERIMENTAL_ERROR_REPORTER_HPP
+#define KOKKOS_TEST_EXPERIMENTAL_ERROR_REPORTER_HPP
+
+#include <gtest/gtest.h>
+#include <iostream>
+#include <Kokkos_Core.hpp>
+
+namespace Test {
+
+// Just save the data in the report. Informative text goes in operator<<(..).
+template <typename DataType1, typename DataType2, typename DataType3>
+struct ThreeValReport
+{
+ DataType1 m_data1;
+ DataType2 m_data2;
+ DataType3 m_data3;
+
+};
+
+template <typename DataType1, typename DataType2, typename DataType3>
+std::ostream &operator<<(std::ostream & os, const ThreeValReport<DataType1, DataType2, DataType3> &val)
+{
+ return os << "{" << val.m_data1 << " " << val.m_data2 << " " << val.m_data3 << "}";
+}
+
+template<typename ReportType>
+void checkReportersAndReportsAgree(const std::vector<int> &reporters,
+ const std::vector<ReportType> &reports)
+{
+ for (size_t i = 0; i < reports.size(); ++i) {
+ EXPECT_EQ(1, reporters[i] % 2);
+ EXPECT_EQ(reporters[i], reports[i].m_data1);
+ }
+}
+
+
+template <typename DeviceType>
+struct ErrorReporterDriverBase {
+
+ typedef ThreeValReport<int, int, double> report_type;
+ typedef Kokkos::Experimental::ErrorReporter<report_type, DeviceType> error_reporter_type;
+ error_reporter_type m_errorReporter;
+
+ ErrorReporterDriverBase(int reporter_capacity, int test_size)
+ : m_errorReporter(reporter_capacity) { }
+
+ KOKKOS_INLINE_FUNCTION bool error_condition(const int work_idx) const { return (work_idx % 2 != 0); }
+
+ void check_expectations(int reporter_capacity, int test_size)
+ {
+ int num_reported = m_errorReporter.getNumReports();
+ int num_attempts = m_errorReporter.getNumReportAttempts();
+
+ int expected_num_reports = std::min(reporter_capacity, test_size / 2);
+ EXPECT_EQ(expected_num_reports, num_reported);
+ EXPECT_EQ(test_size / 2, num_attempts);
+
+ bool expect_full = (reporter_capacity <= (test_size / 2));
+ bool reported_full = m_errorReporter.full();
+ EXPECT_EQ(expect_full, reported_full);
+ }
+};
+
+template <typename ErrorReporterDriverType>
+void TestErrorReporter()
+{
+ typedef ErrorReporterDriverType tester_type;
+ std::vector<int> reporters;
+ std::vector<typename tester_type::report_type> reports;
+
+ tester_type test1(100, 10);
+ test1.m_errorReporter.getReports(reporters, reports);
+ checkReportersAndReportsAgree(reporters, reports);
+
+ tester_type test2(10, 100);
+ test2.m_errorReporter.getReports(reporters, reports);
+ checkReportersAndReportsAgree(reporters, reports);
+
+ typename Kokkos::View<int*, typename ErrorReporterDriverType::execution_space >::HostMirror view_reporters;
+ typename Kokkos::View<typename tester_type::report_type*, typename ErrorReporterDriverType::execution_space >::HostMirror
+ view_reports;
+ test2.m_errorReporter.getReports(view_reporters, view_reports);
+
+ int num_reports = view_reporters.extent(0);
+ reporters.clear();
+ reports.clear();
+ reporters.reserve(num_reports);
+ reports.reserve(num_reports);
+
+ for (int i = 0; i < num_reports; ++i) {
+ reporters.push_back(view_reporters(i));
+ reports.push_back(view_reports(i));
+ }
+ checkReportersAndReportsAgree(reporters, reports);
+
+}
+
+
+template <typename DeviceType>
+struct ErrorReporterDriver : public ErrorReporterDriverBase<DeviceType>
+{
+ typedef ErrorReporterDriverBase<DeviceType> driver_base;
+ typedef typename driver_base::error_reporter_type::execution_space execution_space;
+
+ ErrorReporterDriver(int reporter_capacity, int test_size)
+ : driver_base(reporter_capacity, test_size)
+ {
+ execute(reporter_capacity, test_size);
+
+ // Test that clear() and resize() work across memory spaces.
+ if (reporter_capacity < test_size) {
+ driver_base::m_errorReporter.clear();
+ driver_base::m_errorReporter.resize(test_size);
+ execute(test_size, test_size);
+ }
+ }
+
+ void execute(int reporter_capacity, int test_size)
+ {
+ Kokkos::parallel_for(Kokkos::RangePolicy<execution_space>(0,test_size), *this);
+ driver_base::check_expectations(reporter_capacity, test_size);
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const int work_idx) const
+ {
+ if (driver_base::error_condition(work_idx)) {
+ double val = M_PI * static_cast<double>(work_idx);
+ typename driver_base::report_type report = {work_idx, -2*work_idx, val};
+ driver_base::m_errorReporter.add_report(work_idx, report);
+ }
+ }
+};
+
+#if defined(KOKKOS_CLASS_LAMBDA)
+template <typename DeviceType>
+struct ErrorReporterDriverUseLambda : public ErrorReporterDriverBase<DeviceType>
+{
+
+ typedef ErrorReporterDriverBase<DeviceType> driver_base;
+ typedef typename driver_base::error_reporter_type::execution_space execution_space;
+
+ ErrorReporterDriverUseLambda(int reporter_capacity, int test_size)
+ : driver_base(reporter_capacity, test_size)
+ {
+ Kokkos::parallel_for(Kokkos::RangePolicy<execution_space>(0,test_size), KOKKOS_CLASS_LAMBDA (const int work_idx) {
+ if (driver_base::error_condition(work_idx)) {
+ double val = M_PI * static_cast<double>(work_idx);
+ typename driver_base::report_type report = {work_idx, -2*work_idx, val};
+ driver_base::m_errorReporter.add_report(work_idx, report);
+ }
+ });
+ driver_base::check_expectations(reporter_capacity, test_size);
+ }
+
+};
+#endif
+
+
+#ifdef KOKKOS_HAVE_OPENMP
+struct ErrorReporterDriverNativeOpenMP : public ErrorReporterDriverBase<Kokkos::OpenMP>
+{
+ typedef ErrorReporterDriverBase<Kokkos::OpenMP> driver_base;
+ typedef typename driver_base::error_reporter_type::execution_space execution_space;
+
+ ErrorReporterDriverNativeOpenMP(int reporter_capacity, int test_size)
+ : driver_base(reporter_capacity, test_size)
+ {
+#pragma omp parallel for
+ for(int work_idx = 0; work_idx < test_size; ++work_idx)
+ {
+ if (driver_base::error_condition(work_idx)) {
+ double val = M_PI * static_cast<double>(work_idx);
+ typename driver_base::report_type report = {work_idx, -2*work_idx, val};
+ driver_base::m_errorReporter.add_report(work_idx, report);
+ }
+ };
+ driver_base::check_expectations(reporter_capacity, test_size);
+ }
+};
+#endif
+
+} // namespace Test
+#endif // #ifndef KOKKOS_TEST_EXPERIMENTAL_ERROR_REPORTER_HPP
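Editor's note: the drivers above show the full usage cycle of the new Kokkos::Experimental::ErrorReporter: construct with a capacity, call add_report() from parallel work, then query counts and copy the reports back on the host. The sketch below (not part of the patch) condenses that cycle using only calls that appear in the header; the SimpleReport struct, the host execution space choice, and the sizes are illustrative assumptions.

#include <Kokkos_Core.hpp>
#include <Kokkos_ErrorReporter.hpp>
#include <vector>
#include <iostream>

struct SimpleReport { int idx; double val; };

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    typedef Kokkos::DefaultHostExecutionSpace HostExec;
    // Capacity 16: attempts beyond capacity are counted but not stored.
    Kokkos::Experimental::ErrorReporter<SimpleReport, HostExec> reporter(16);

    Kokkos::parallel_for(Kokkos::RangePolicy<HostExec>(0, 100),
                         [=](const int i) {
      if (i % 10 == 0) {                 // the "error" condition
        SimpleReport r = {i, 0.5 * i};
        reporter.add_report(i, r);       // record reporter id + payload
      }
    });

    std::vector<int>          who;
    std::vector<SimpleReport> what;
    reporter.getReports(who, what);      // copy the stored reports to the host
    std::cout << reporter.getNumReports() << " of "
              << reporter.getNumReportAttempts() << " attempts stored, full="
              << reporter.full() << "\n";
  }
  Kokkos::finalize();
  return 0;
}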
diff --git a/lib/kokkos/containers/unit_tests/TestOpenMP.cpp b/lib/kokkos/containers/unit_tests/TestOpenMP.cpp
index a4319f39f..598a296c7 100644
--- a/lib/kokkos/containers/unit_tests/TestOpenMP.cpp
+++ b/lib/kokkos/containers/unit_tests/TestOpenMP.cpp
@@ -1,182 +1,194 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_Vector.hpp>
//----------------------------------------------------------------------------
#include <TestBitset.hpp>
#include <TestUnorderedMap.hpp>
#include <TestStaticCrsGraph.hpp>
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
-#include <TestSegmentedView.hpp>
#include <TestComplex.hpp>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
+#include <Kokkos_ErrorReporter.hpp>
+#include <TestErrorReporter.hpp>
+
#include <iomanip>
namespace Test {
#ifdef KOKKOS_HAVE_OPENMP
class openmp : public ::testing::Test {
protected:
static void SetUpTestCase()
{
std::cout << std::setprecision(5) << std::scientific;
unsigned threads_count = 4 ;
if ( Kokkos::hwloc::available() ) {
threads_count = Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa();
}
Kokkos::OpenMP::initialize( threads_count );
}
static void TearDownTestCase()
{
Kokkos::OpenMP::finalize();
}
};
TEST_F( openmp, complex )
{
testComplex<Kokkos::OpenMP> ();
}
TEST_F( openmp, dyn_view_api) {
TestDynViewAPI< double , Kokkos::OpenMP >();
}
TEST_F( openmp, bitset )
{
test_bitset<Kokkos::OpenMP>();
}
TEST_F( openmp , staticcrsgraph )
{
TestStaticCrsGraph::run_test_graph< Kokkos::OpenMP >();
TestStaticCrsGraph::run_test_graph2< Kokkos::OpenMP >();
}
#define OPENMP_INSERT_TEST( name, num_nodes, num_inserts, num_duplicates, repeat, near ) \
TEST_F( openmp, UnorderedMap_insert_##name##_##num_nodes##_##num_inserts##_##num_duplicates##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_insert<Kokkos::OpenMP>(num_nodes,num_inserts,num_duplicates, near); \
}
#define OPENMP_FAILED_INSERT_TEST( num_nodes, repeat ) \
TEST_F( openmp, UnorderedMap_failed_insert_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_failed_insert<Kokkos::OpenMP>(num_nodes); \
}
#define OPENMP_ASSIGNEMENT_TEST( num_nodes, repeat ) \
TEST_F( openmp, UnorderedMap_assignment_operators_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_assignement_operators<Kokkos::OpenMP>(num_nodes); \
}
#define OPENMP_DEEP_COPY( num_nodes, repeat ) \
TEST_F( openmp, UnorderedMap_deep_copy##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_deep_copy<Kokkos::OpenMP>(num_nodes); \
}
#define OPENMP_VECTOR_COMBINE_TEST( size ) \
TEST_F( openmp, vector_combination##size##x) { \
test_vector_combinations<int,Kokkos::OpenMP>(size); \
}
#define OPENMP_DUALVIEW_COMBINE_TEST( size ) \
TEST_F( openmp, dualview_combination##size##x) { \
test_dualview_combinations<int,Kokkos::OpenMP>(size); \
}
-#define OPENMP_SEGMENTEDVIEW_TEST( size ) \
- TEST_F( openmp, segmentedview_##size##x) { \
- test_segmented_view<double,Kokkos::OpenMP>(size); \
- }
-
OPENMP_INSERT_TEST(close, 100000, 90000, 100, 500, true)
OPENMP_INSERT_TEST(far, 100000, 90000, 100, 500, false)
OPENMP_FAILED_INSERT_TEST( 10000, 1000 )
OPENMP_DEEP_COPY( 10000, 1 )
OPENMP_VECTOR_COMBINE_TEST( 10 )
OPENMP_VECTOR_COMBINE_TEST( 3057 )
OPENMP_DUALVIEW_COMBINE_TEST( 10 )
-OPENMP_SEGMENTEDVIEW_TEST( 10000 )
#undef OPENMP_INSERT_TEST
#undef OPENMP_FAILED_INSERT_TEST
#undef OPENMP_ASSIGNEMENT_TEST
#undef OPENMP_DEEP_COPY
#undef OPENMP_VECTOR_COMBINE_TEST
#undef OPENMP_DUALVIEW_COMBINE_TEST
-#undef OPENMP_SEGMENTEDVIEW_TEST
#endif
TEST_F( openmp , dynamic_view )
{
typedef TestDynamicView< double , Kokkos::OpenMP >
TestDynView ;
for ( int i = 0 ; i < 10 ; ++i ) {
TestDynView::run( 100000 + 100 * i );
}
}
+#if defined(KOKKOS_CLASS_LAMBDA)
+TEST_F(openmp, ErrorReporterViaLambda)
+{
+ TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::OpenMP>>();
+}
+#endif
+
+TEST_F(openmp, ErrorReporter)
+{
+ TestErrorReporter<ErrorReporterDriver<Kokkos::OpenMP>>();
+}
+
+TEST_F(openmp, ErrorReporterNativeOpenMP)
+{
+ TestErrorReporter<ErrorReporterDriverNativeOpenMP>();
+}
+
} // namespace Test
diff --git a/lib/kokkos/containers/unit_tests/TestSegmentedView.hpp b/lib/kokkos/containers/unit_tests/TestSegmentedView.hpp
deleted file mode 100644
index bfd66d12a..000000000
--- a/lib/kokkos/containers/unit_tests/TestSegmentedView.hpp
+++ /dev/null
@@ -1,708 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_TEST_SEGMENTEDVIEW_HPP
-#define KOKKOS_TEST_SEGMENTEDVIEW_HPP
-
-#include <gtest/gtest.h>
-#include <iostream>
-#include <cstdlib>
-#include <cstdio>
-#include <Kokkos_Core.hpp>
-
-#if ! KOKKOS_USING_EXP_VIEW
-
-#include <Kokkos_SegmentedView.hpp>
-#include <impl/Kokkos_Timer.hpp>
-
-namespace Test {
-
-namespace Impl {
-
- template<class ViewType , class ExecutionSpace, int Rank = ViewType::Rank>
- struct GrowTest;
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 1> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+team_member.team_size());
- value += team_idx + team_member.team_rank();
-
- if((a.dimension_0()>team_idx+team_member.team_rank()) &&
- (a.dimension(0)>team_idx+team_member.team_rank()))
- a(team_idx+team_member.team_rank()) = team_idx+team_member.team_rank();
-
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 2> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+ team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- value += team_idx + team_member.team_rank() + 13*k;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++) {
- a(team_idx+ team_member.team_rank(),k) =
- team_idx+ team_member.team_rank() + 13*k;
- }
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 3> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+ team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- for( typename ExecutionSpace::size_type l=0;l<3;l++)
- value += team_idx + team_member.team_rank() + 13*k + 3*l;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- a(team_idx+ team_member.team_rank(),k,l) =
- team_idx+ team_member.team_rank() + 13*k + 3*l;
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 4> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+ team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- for( typename ExecutionSpace::size_type l=0;l<3;l++)
- for( typename ExecutionSpace::size_type m=0;m<2;m++)
- value += team_idx + team_member.team_rank() + 13*k + 3*l + 7*m;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- a(team_idx+ team_member.team_rank(),k,l,m) =
- team_idx+ team_member.team_rank() + 13*k + 3*l + 7*m;
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 5> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+ team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- for( typename ExecutionSpace::size_type l=0;l<3;l++)
- for( typename ExecutionSpace::size_type m=0;m<2;m++)
- for( typename ExecutionSpace::size_type n=0;n<3;n++)
- value +=
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- a(team_idx+ team_member.team_rank(),k,l,m,n) =
- team_idx+ team_member.team_rank() + 13*k + 3*l + 7*m + 5*n;
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 6> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+ team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- for( typename ExecutionSpace::size_type l=0;l<3;l++)
- for( typename ExecutionSpace::size_type m=0;m<2;m++)
- for( typename ExecutionSpace::size_type n=0;n<3;n++)
- for( typename ExecutionSpace::size_type o=0;o<2;o++)
- value +=
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o ;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
- a(team_idx+ team_member.team_rank(),k,l,m,n,o) =
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o ;
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 7> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- a.grow(team_member , team_idx+ team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- for( typename ExecutionSpace::size_type l=0;l<3;l++)
- for( typename ExecutionSpace::size_type m=0;m<2;m++)
- for( typename ExecutionSpace::size_type n=0;n<3;n++)
- for( typename ExecutionSpace::size_type o=0;o<2;o++)
- for( typename ExecutionSpace::size_type p=0;p<4;p++)
- value +=
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p ;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
- for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
- a(team_idx+ team_member.team_rank(),k,l,m,n,o,p) =
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p ;
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct GrowTest<ViewType , ExecutionSpace , 8> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- GrowTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
- a.grow(team_member , team_idx + team_member.team_size());
-
- for( typename ExecutionSpace::size_type k=0;k<7;k++)
- for( typename ExecutionSpace::size_type l=0;l<3;l++)
- for( typename ExecutionSpace::size_type m=0;m<2;m++)
- for( typename ExecutionSpace::size_type n=0;n<3;n++)
- for( typename ExecutionSpace::size_type o=0;o<2;o++)
- for( typename ExecutionSpace::size_type p=0;p<4;p++)
- for( typename ExecutionSpace::size_type q=0;q<3;q++)
- value +=
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p + 17*q;
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
- for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
- for( typename ExecutionSpace::size_type q=0;q<a.dimension_7();q++)
- a(team_idx+ team_member.team_rank(),k,l,m,n,o,p,q) =
- team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p + 17*q;
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace, int Rank = ViewType::Rank>
- struct VerifyTest;
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 1> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- value += a(team_idx+ team_member.team_rank());
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 2> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- value += a(team_idx+ team_member.team_rank(),k);
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 3> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- value += a(team_idx+ team_member.team_rank(),k,l);
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 4> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- value += a(team_idx+ team_member.team_rank(),k,l,m);
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 5> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- value += a(team_idx+ team_member.team_rank(),k,l,m,n);
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 6> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
- value += a(team_idx+ team_member.team_rank(),k,l,m,n,o);
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 7> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
- for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
- value += a(team_idx+ team_member.team_rank(),k,l,m,n,o,p);
- }
- }
- };
-
- template<class ViewType , class ExecutionSpace>
- struct VerifyTest<ViewType , ExecutionSpace , 8> {
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
- typedef typename Policy::member_type team_type;
- typedef double value_type;
-
- ViewType a;
-
- VerifyTest(ViewType in):a(in) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator() (team_type team_member, double& value) const {
- unsigned int team_idx = team_member.league_rank() * team_member.team_size();
-
- if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
- (a.dimension(0)>team_idx+ team_member.team_rank())) {
- for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
- for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
- for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
- for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
- for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
- for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
- for( typename ExecutionSpace::size_type q=0;q<a.dimension_7();q++)
- value += a(team_idx+ team_member.team_rank(),k,l,m,n,o,p,q);
- }
- }
- };
-
- template <typename Scalar, class ExecutionSpace>
- struct test_segmented_view
- {
- typedef test_segmented_view<Scalar,ExecutionSpace> self_type;
-
- typedef Scalar scalar_type;
- typedef ExecutionSpace execution_space;
- typedef Kokkos::TeamPolicy<execution_space> Policy;
-
- double result;
- double reference;
-
- template <class ViewType>
- void run_me(ViewType a, int max_length){
- const int team_size = Policy::team_size_max( GrowTest<ViewType,execution_space>(a) );
- const int nteams = max_length/team_size;
-
- reference = 0;
- result = 0;
-
- Kokkos::parallel_reduce(Policy(nteams,team_size),GrowTest<ViewType,execution_space>(a),reference);
- Kokkos::fence();
- Kokkos::parallel_reduce(Policy(nteams,team_size),VerifyTest<ViewType,execution_space>(a),result);
- Kokkos::fence();
- }
-
-
- test_segmented_view(unsigned int size,int rank)
- {
- reference = 0;
- result = 0;
-
- const int dim_1 = 7;
- const int dim_2 = 3;
- const int dim_3 = 2;
- const int dim_4 = 3;
- const int dim_5 = 2;
- const int dim_6 = 4;
- //const int dim_7 = 3;
-
- if(rank==1) {
- typedef Kokkos::Experimental::SegmentedView<Scalar*,Kokkos::LayoutLeft,ExecutionSpace> rank1_view;
- run_me< rank1_view >(rank1_view("Rank1",128,size), size);
- }
- if(rank==2) {
- typedef Kokkos::Experimental::SegmentedView<Scalar**,Kokkos::LayoutLeft,ExecutionSpace> rank2_view;
- run_me< rank2_view >(rank2_view("Rank2",128,size,dim_1), size);
- }
- if(rank==3) {
- typedef Kokkos::Experimental::SegmentedView<Scalar*[7][3][2],Kokkos::LayoutRight,ExecutionSpace> rank3_view;
- run_me< rank3_view >(rank3_view("Rank3",128,size), size);
- }
- if(rank==4) {
- typedef Kokkos::Experimental::SegmentedView<Scalar****,Kokkos::LayoutRight,ExecutionSpace> rank4_view;
- run_me< rank4_view >(rank4_view("Rank4",128,size,dim_1,dim_2,dim_3), size);
- }
- if(rank==5) {
- typedef Kokkos::Experimental::SegmentedView<Scalar*[7][3][2][3],Kokkos::LayoutLeft,ExecutionSpace> rank5_view;
- run_me< rank5_view >(rank5_view("Rank5",128,size), size);
- }
- if(rank==6) {
- typedef Kokkos::Experimental::SegmentedView<Scalar*****[2],Kokkos::LayoutRight,ExecutionSpace> rank6_view;
- run_me< rank6_view >(rank6_view("Rank6",128,size,dim_1,dim_2,dim_3,dim_4), size);
- }
- if(rank==7) {
- typedef Kokkos::Experimental::SegmentedView<Scalar*******,Kokkos::LayoutLeft,ExecutionSpace> rank7_view;
- run_me< rank7_view >(rank7_view("Rank7",128,size,dim_1,dim_2,dim_3,dim_4,dim_5,dim_6), size);
- }
- if(rank==8) {
- typedef Kokkos::Experimental::SegmentedView<Scalar*****[2][4][3],Kokkos::LayoutLeft,ExecutionSpace> rank8_view;
- run_me< rank8_view >(rank8_view("Rank8",128,size,dim_1,dim_2,dim_3,dim_4), size);
- }
- }
-
- };
-
-} // namespace Impl
-
-
-
-
-template <typename Scalar, class ExecutionSpace>
-void test_segmented_view(unsigned int size)
-{
- {
- typedef Kokkos::Experimental::SegmentedView<Scalar*****[2][4][3],Kokkos::LayoutLeft,ExecutionSpace> view_type;
- view_type a("A",128,size,7,3,2,3);
- double reference;
-
- Impl::GrowTest<view_type,ExecutionSpace> f(a);
-
- const int team_size = Kokkos::TeamPolicy<ExecutionSpace>::team_size_max( f );
- const int nteams = (size+team_size-1)/team_size;
-
- Kokkos::parallel_reduce(Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),f,reference);
-
- size_t real_size = ((size+127)/128)*128;
-
- ASSERT_EQ(real_size,a.dimension_0());
- ASSERT_EQ(7,a.dimension_1());
- ASSERT_EQ(3,a.dimension_2());
- ASSERT_EQ(2,a.dimension_3());
- ASSERT_EQ(3,a.dimension_4());
- ASSERT_EQ(2,a.dimension_5());
- ASSERT_EQ(4,a.dimension_6());
- ASSERT_EQ(3,a.dimension_7());
- ASSERT_EQ(real_size,a.dimension(0));
- ASSERT_EQ(7,a.dimension(1));
- ASSERT_EQ(3,a.dimension(2));
- ASSERT_EQ(2,a.dimension(3));
- ASSERT_EQ(3,a.dimension(4));
- ASSERT_EQ(2,a.dimension(5));
- ASSERT_EQ(4,a.dimension(6));
- ASSERT_EQ(3,a.dimension(7));
- ASSERT_EQ(8,a.Rank);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,1);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,2);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,3);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,4);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,5);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,6);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,7);
- ASSERT_EQ(test.reference,test.result);
- }
- {
- Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,8);
- ASSERT_EQ(test.reference,test.result);
- }
-
-}
-
-
-} // namespace Test
-
-#else
-
-template <typename Scalar, class ExecutionSpace>
-void test_segmented_view(unsigned int ) {}
-
-#endif
-
-#endif /* #ifndef KOKKOS_TEST_SEGMENTEDVIEW_HPP */
-
diff --git a/lib/kokkos/containers/unit_tests/TestSerial.cpp b/lib/kokkos/containers/unit_tests/TestSerial.cpp
index a7c42d279..2be27ea61 100644
--- a/lib/kokkos/containers/unit_tests/TestSerial.cpp
+++ b/lib/kokkos/containers/unit_tests/TestSerial.cpp
@@ -1,175 +1,183 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if ! defined(KOKKOS_HAVE_SERIAL)
# error "It doesn't make sense to build this file unless the Kokkos::Serial device is enabled. If you see this message, it probably means that there is an error in Kokkos' CMake build infrastructure."
#else
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_Vector.hpp>
#include <TestBitset.hpp>
#include <TestUnorderedMap.hpp>
#include <TestStaticCrsGraph.hpp>
#include <TestVector.hpp>
#include <TestDualView.hpp>
-#include <TestSegmentedView.hpp>
#include <TestDynamicView.hpp>
#include <TestComplex.hpp>
#include <iomanip>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
+#include <Kokkos_ErrorReporter.hpp>
+#include <TestErrorReporter.hpp>
+
namespace Test {
class serial : public ::testing::Test {
protected:
static void SetUpTestCase () {
std::cout << std::setprecision(5) << std::scientific;
Kokkos::Serial::initialize ();
}
static void TearDownTestCase () {
Kokkos::Serial::finalize ();
}
};
TEST_F( serial, dyn_view_api) {
TestDynViewAPI< double , Kokkos::Serial >();
}
TEST_F( serial , staticcrsgraph )
{
TestStaticCrsGraph::run_test_graph< Kokkos::Serial >();
TestStaticCrsGraph::run_test_graph2< Kokkos::Serial >();
}
TEST_F( serial, complex )
{
testComplex<Kokkos::Serial> ();
}
TEST_F( serial, bitset )
{
test_bitset<Kokkos::Serial> ();
}
#define SERIAL_INSERT_TEST( name, num_nodes, num_inserts, num_duplicates, repeat, near ) \
TEST_F( serial, UnorderedMap_insert_##name##_##num_nodes##_##num_inserts##_##num_duplicates##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_insert<Kokkos::Serial> (num_nodes, num_inserts, num_duplicates, near); \
}
#define SERIAL_FAILED_INSERT_TEST( num_nodes, repeat ) \
TEST_F( serial, UnorderedMap_failed_insert_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_failed_insert<Kokkos::Serial> (num_nodes); \
}
#define SERIAL_ASSIGNEMENT_TEST( num_nodes, repeat ) \
TEST_F( serial, UnorderedMap_assignment_operators_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_assignement_operators<Kokkos::Serial> (num_nodes); \
}
#define SERIAL_DEEP_COPY( num_nodes, repeat ) \
TEST_F( serial, UnorderedMap_deep_copy##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_deep_copy<Kokkos::Serial> (num_nodes); \
}
#define SERIAL_VECTOR_COMBINE_TEST( size ) \
TEST_F( serial, vector_combination##size##x) { \
test_vector_combinations<int,Kokkos::Serial>(size); \
}
#define SERIAL_DUALVIEW_COMBINE_TEST( size ) \
TEST_F( serial, dualview_combination##size##x) { \
test_dualview_combinations<int,Kokkos::Serial>(size); \
}
-#define SERIAL_SEGMENTEDVIEW_TEST( size ) \
- TEST_F( serial, segmentedview_##size##x) { \
- test_segmented_view<double,Kokkos::Serial>(size); \
- }
-
SERIAL_INSERT_TEST(close, 100000, 90000, 100, 500, true)
SERIAL_INSERT_TEST(far, 100000, 90000, 100, 500, false)
SERIAL_FAILED_INSERT_TEST( 10000, 1000 )
SERIAL_DEEP_COPY( 10000, 1 )
SERIAL_VECTOR_COMBINE_TEST( 10 )
SERIAL_VECTOR_COMBINE_TEST( 3057 )
SERIAL_DUALVIEW_COMBINE_TEST( 10 )
-SERIAL_SEGMENTEDVIEW_TEST( 10000 )
#undef SERIAL_INSERT_TEST
#undef SERIAL_FAILED_INSERT_TEST
#undef SERIAL_ASSIGNEMENT_TEST
#undef SERIAL_DEEP_COPY
#undef SERIAL_VECTOR_COMBINE_TEST
#undef SERIAL_DUALVIEW_COMBINE_TEST
-#undef SERIAL_SEGMENTEDVIEW_TEST
TEST_F( serial , dynamic_view )
{
typedef TestDynamicView< double , Kokkos::Serial >
TestDynView ;
for ( int i = 0 ; i < 10 ; ++i ) {
TestDynView::run( 100000 + 100 * i );
}
}
+#if defined(KOKKOS_CLASS_LAMBDA)
+TEST_F(serial, ErrorReporterViaLambda)
+{
+ TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Serial>>();
+}
+#endif
+
+TEST_F(serial, ErrorReporter)
+{
+ TestErrorReporter<ErrorReporterDriver<Kokkos::Serial>>();
+}
+
+
} // namespace Test
#endif // KOKKOS_HAVE_SERIAL
diff --git a/lib/kokkos/containers/unit_tests/TestThreads.cpp b/lib/kokkos/containers/unit_tests/TestThreads.cpp
index 58277528d..3b34006a0 100644
--- a/lib/kokkos/containers/unit_tests/TestThreads.cpp
+++ b/lib/kokkos/containers/unit_tests/TestThreads.cpp
@@ -1,188 +1,194 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_HAVE_PTHREAD )
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_Vector.hpp>
#include <iomanip>
//----------------------------------------------------------------------------
#include <TestBitset.hpp>
#include <TestUnorderedMap.hpp>
#include <TestStaticCrsGraph.hpp>
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
-#include <TestSegmentedView.hpp>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
+#include <Kokkos_ErrorReporter.hpp>
+#include <TestErrorReporter.hpp>
+
namespace Test {
class threads : public ::testing::Test {
protected:
static void SetUpTestCase()
{
std::cout << std::setprecision(5) << std::scientific;
unsigned num_threads = 4;
if (Kokkos::hwloc::available()) {
num_threads = Kokkos::hwloc::get_available_numa_count()
* Kokkos::hwloc::get_available_cores_per_numa()
// * Kokkos::hwloc::get_available_threads_per_core()
;
}
std::cout << "Threads: " << num_threads << std::endl;
Kokkos::Threads::initialize( num_threads );
}
static void TearDownTestCase()
{
Kokkos::Threads::finalize();
}
};
TEST_F( threads , dyn_view_api) {
TestDynViewAPI< double , Kokkos::Threads >();
}
TEST_F( threads , staticcrsgraph )
{
TestStaticCrsGraph::run_test_graph< Kokkos::Threads >();
TestStaticCrsGraph::run_test_graph2< Kokkos::Threads >();
}
/*TEST_F( threads, bitset )
{
test_bitset<Kokkos::Threads>();
}*/
#define THREADS_INSERT_TEST( name, num_nodes, num_inserts, num_duplicates, repeat, near ) \
TEST_F( threads, UnorderedMap_insert_##name##_##num_nodes##_##num_inserts##_##num_duplicates##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_insert<Kokkos::Threads>(num_nodes,num_inserts,num_duplicates, near); \
}
#define THREADS_FAILED_INSERT_TEST( num_nodes, repeat ) \
TEST_F( threads, UnorderedMap_failed_insert_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_failed_insert<Kokkos::Threads>(num_nodes); \
}
#define THREADS_ASSIGNEMENT_TEST( num_nodes, repeat ) \
TEST_F( threads, UnorderedMap_assignment_operators_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_assignement_operators<Kokkos::Threads>(num_nodes); \
}
#define THREADS_DEEP_COPY( num_nodes, repeat ) \
TEST_F( threads, UnorderedMap_deep_copy##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
test_deep_copy<Kokkos::Threads>(num_nodes); \
}
#define THREADS_VECTOR_COMBINE_TEST( size ) \
TEST_F( threads, vector_combination##size##x) { \
test_vector_combinations<int,Kokkos::Threads>(size); \
}
#define THREADS_DUALVIEW_COMBINE_TEST( size ) \
TEST_F( threads, dualview_combination##size##x) { \
test_dualview_combinations<int,Kokkos::Threads>(size); \
}
-#define THREADS_SEGMENTEDVIEW_TEST( size ) \
- TEST_F( threads, segmentedview_##size##x) { \
- test_segmented_view<double,Kokkos::Threads>(size); \
- }
-
-
THREADS_INSERT_TEST(far, 100000, 90000, 100, 500, false)
THREADS_FAILED_INSERT_TEST( 10000, 1000 )
THREADS_DEEP_COPY( 10000, 1 )
THREADS_VECTOR_COMBINE_TEST( 10 )
THREADS_VECTOR_COMBINE_TEST( 3057 )
THREADS_DUALVIEW_COMBINE_TEST( 10 )
-THREADS_SEGMENTEDVIEW_TEST( 10000 )
#undef THREADS_INSERT_TEST
#undef THREADS_FAILED_INSERT_TEST
#undef THREADS_ASSIGNEMENT_TEST
#undef THREADS_DEEP_COPY
#undef THREADS_VECTOR_COMBINE_TEST
#undef THREADS_DUALVIEW_COMBINE_TEST
-#undef THREADS_SEGMENTEDVIEW_TEST
-
TEST_F( threads , dynamic_view )
{
typedef TestDynamicView< double , Kokkos::Threads >
TestDynView ;
for ( int i = 0 ; i < 10 ; ++i ) {
TestDynView::run( 100000 + 100 * i );
}
}
+
+#if defined(KOKKOS_CLASS_LAMBDA)
+TEST_F(threads, ErrorReporterViaLambda)
+{
+ TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Threads>>();
+}
+#endif
+
+TEST_F(threads, ErrorReporter)
+{
+ TestErrorReporter<ErrorReporterDriver<Kokkos::Threads>>();
+}
+
} // namespace Test
#endif /* #if defined( KOKKOS_HAVE_PTHREAD ) */
diff --git a/lib/kokkos/core/cmake/Dependencies.cmake b/lib/kokkos/core/cmake/Dependencies.cmake
index 34ff0be5d..ae9a20c50 100644
--- a/lib/kokkos/core/cmake/Dependencies.cmake
+++ b/lib/kokkos/core/cmake/Dependencies.cmake
@@ -1,4 +1,6 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREAD DLlib
TEST_OPTIONAL_TPLS CUSPARSE
)
+
+TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)
\ No newline at end of file
diff --git a/lib/kokkos/core/cmake/KokkosCore_config.h.in b/lib/kokkos/core/cmake/KokkosCore_config.h.in
index 27e3ba1c3..9359b5a32 100644
--- a/lib/kokkos/core/cmake/KokkosCore_config.h.in
+++ b/lib/kokkos/core/cmake/KokkosCore_config.h.in
@@ -1,57 +1,67 @@
#ifndef KOKKOS_CORE_CONFIG_H
#define KOKKOS_CORE_CONFIG_H
/* The trivial 'src/build_common.sh' creates a config
* that must stay in sync with this file.
*/
#cmakedefine KOKKOS_FOR_SIERRA
#if !defined( KOKKOS_FOR_SIERRA )
#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUDA
// mfh 16 Sep 2014: If passed in on the command line, that overrides
// any value of KOKKOS_USE_CUDA_UVM here. Doing this should prevent build
// warnings like this one:
//
// packages/kokkos/core/src/KokkosCore_config.h:13:1: warning: "KOKKOS_USE_CUDA_UVM" redefined
//
// At some point, we should edit the test-build scripts in
// Trilinos/cmake/ctest/drivers/perseus/, and take
// -DKOKKOS_USE_CUDA_UVM from the command-line arguments there. I
// hesitate to do that now, because I'm not sure if all the files are
// including KokkosCore_config.h (or a header file that includes it) like
// they should.
#if ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_USE_CUDA_UVM
#endif // ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_SERIAL
#cmakedefine KOKKOS_HAVE_QTHREAD
#cmakedefine KOKKOS_HAVE_Winthread
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_HAVE_CXX11
#cmakedefine KOKKOS_HAVE_CUSPARSE
#cmakedefine KOKKOS_ENABLE_PROFILING_INTERNAL
#ifdef KOKKOS_ENABLE_PROFILING_INTERNAL
#define KOKKOS_ENABLE_PROFILING 1
#else
#define KOKKOS_ENABLE_PROFILING 0
#endif
+#cmakedefine KOKKOS_HAVE_CUDA_RDC
+#ifdef KOKKOS_HAVE_CUDA_RDC
+#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1
+#endif
+
+#cmakedefine KOKKOS_HAVE_CUDA_LAMBDA
+#ifdef KOKKOS_HAVE_CUDA_LAMBDA
+#define KOKKOS_CUDA_USE_LAMBDA 1
+#endif
+
// Don't forbid users from defining this macro on the command line,
// but still make sure that CMake logic can control its definition.
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
#cmakedefine KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
#endif // KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
#endif // KOKKOS_FOR_SIERRA
#endif // KOKKOS_CORE_CONFIG_H
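
The guards above around KOKKOS_USE_CUDA_UVM and KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA follow a common configuration-header pattern: the generated header only supplies a value when the macro was not already passed on the compiler command line, so a -D flag always wins over the CMake default. A minimal stand-alone sketch of that precedence rule, using a hypothetical MY_FEATURE macro (not a Kokkos name):

/* config_sketch.cpp -- hypothetical stand-alone illustration, not Kokkos code. */
#include <cstdio>

/* In the generated header the next line would be a '#cmakedefine'; here the */
/* CMake-enabled outcome is hard-coded so the sketch compiles on its own.    */
#if !defined(MY_FEATURE)
#define MY_FEATURE 1   /* default chosen by the build system */
#endif

int main() {
  /* Building with  g++ -DMY_FEATURE=0 config_sketch.cpp  overrides the default. */
  std::printf("MY_FEATURE = %d\n", MY_FEATURE);
  return 0;
}
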
diff --git a/lib/kokkos/core/perf_test/CMakeLists.txt b/lib/kokkos/core/perf_test/CMakeLists.txt
index d93ca14d9..cae52f140 100644
--- a/lib/kokkos/core/perf_test/CMakeLists.txt
+++ b/lib/kokkos/core/perf_test/CMakeLists.txt
@@ -1,29 +1,29 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
-INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
+INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
SET(SOURCES
PerfTestMain.cpp
PerfTestHost.cpp
PerfTestCuda.cpp
)
# Per #374, we always want to build this test, but we only want to run
# it as a PERFORMANCE test. That's why we separate building the test
# from running the test.
TRIBITS_ADD_EXECUTABLE(
PerfTestExec
SOURCES ${SOURCES}
COMM serial mpi
TESTONLYLIBS kokkos_gtest
)
-TRIBITS_ADD_EXECUTABLE_AND_TEST(
+TRIBITS_ADD_TEST(
PerfTest
NAME PerfTestExec
COMM serial mpi
NUM_MPI_PROCS 1
CATEGORIES PERFORMANCE
FAIL_REGULAR_EXPRESSION " FAILED "
)
diff --git a/lib/kokkos/core/perf_test/Makefile b/lib/kokkos/core/perf_test/Makefile
index 8fa1fbfc3..85f869971 100644
--- a/lib/kokkos/core/perf_test/Makefile
+++ b/lib/kokkos/core/perf_test/Makefile
@@ -1,66 +1,63 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../tpls/gtest
vpath %.cpp ${KOKKOS_PATH}/core/perf_test
default: build_all
echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?= -lpthread
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
TEST_TARGETS =
TARGETS =
OBJ_PERF = PerfTestHost.o PerfTestCuda.o PerfTestMain.o gtest-all.o
TARGETS += KokkosCore_PerformanceTest
TEST_TARGETS += test-performance
OBJ_ATOMICS = test_atomic.o
TARGETS += KokkosCore_PerformanceTest_Atomics
TEST_TARGETS += test-atomic
KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest
KokkosCore_PerformanceTest_Atomics: $(OBJ_ATOMICS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Atomics
test-performance: KokkosCore_PerformanceTest
./KokkosCore_PerformanceTest
test-atomic: KokkosCore_PerformanceTest_Atomics
./KokkosCore_PerformanceTest_Atomics
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
diff --git a/lib/kokkos/core/perf_test/PerfTestHost.cpp b/lib/kokkos/core/perf_test/PerfTestHost.cpp
index 6a0f2efad..4a05eecfe 100644
--- a/lib/kokkos/core/perf_test/PerfTestHost.cpp
+++ b/lib/kokkos/core/perf_test/PerfTestHost.cpp
@@ -1,104 +1,115 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_HAVE_OPENMP )
typedef Kokkos::OpenMP TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::OpenMP" ;
#elif defined( KOKKOS_HAVE_PTHREAD )
typedef Kokkos::Threads TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Threads" ;
#elif defined( KOKKOS_HAVE_SERIAL )
typedef Kokkos::Serial TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Serial" ;
#else
# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
#endif
#include <impl/Kokkos_Timer.hpp>
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
//------------------------------------------------------------------------
namespace Test {
class host : public ::testing::Test {
protected:
static void SetUpTestCase()
{
- const unsigned team_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned threads_per_team = 4 ;
-
- TestHostDevice::initialize( team_count * threads_per_team );
+ if(Kokkos::hwloc::available()) {
+ const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
+ const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
+ const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
+
+ unsigned threads_count = 0 ;
+
+ threads_count = std::max( 1u , numa_count )
+ * std::max( 2u , cores_per_numa * threads_per_core );
+
+ TestHostDevice::initialize( threads_count );
+ } else {
+ const unsigned thread_count = 4 ;
+ TestHostDevice::initialize( thread_count );
+ }
}
static void TearDownTestCase()
{
TestHostDevice::finalize();
}
};
TEST_F( host, hexgrad ) {
EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
TEST_F( host, gramschmidt ) {
EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
} // namespace Test
diff --git a/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_View.hpp b/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_View.hpp
deleted file mode 100644
index 4ed7d8e2a..000000000
--- a/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_View.hpp
+++ /dev/null
@@ -1,334 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
-#define KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
-
-/* only compile this file if CUDA is enabled for Kokkos */
-#if defined( KOKKOS_HAVE_CUDA )
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-template<>
-struct ViewOperatorBoundsErrorAbort< Kokkos::CudaSpace > {
- KOKKOS_INLINE_FUNCTION
- static void apply( const size_t rank
- , const size_t n0 , const size_t n1
- , const size_t n2 , const size_t n3
- , const size_t n4 , const size_t n5
- , const size_t n6 , const size_t n7
- , const size_t i0 , const size_t i1
- , const size_t i2 , const size_t i3
- , const size_t i4 , const size_t i5
- , const size_t i6 , const size_t i7 )
- {
- const int r =
- ( n0 <= i0 ? 0 :
- ( n1 <= i1 ? 1 :
- ( n2 <= i2 ? 2 :
- ( n3 <= i3 ? 3 :
- ( n4 <= i4 ? 4 :
- ( n5 <= i5 ? 5 :
- ( n6 <= i6 ? 6 : 7 )))))));
- const size_t n =
- ( n0 <= i0 ? n0 :
- ( n1 <= i1 ? n1 :
- ( n2 <= i2 ? n2 :
- ( n3 <= i3 ? n3 :
- ( n4 <= i4 ? n4 :
- ( n5 <= i5 ? n5 :
- ( n6 <= i6 ? n6 : n7 )))))));
- const size_t i =
- ( n0 <= i0 ? i0 :
- ( n1 <= i1 ? i1 :
- ( n2 <= i2 ? i2 :
- ( n3 <= i3 ? i3 :
- ( n4 <= i4 ? i4 :
- ( n5 <= i5 ? i5 :
- ( n6 <= i6 ? i6 : i7 )))))));
- printf("Cuda view array bounds error index %d : FAILED %lu < %lu\n" , r , i , n );
- Kokkos::Impl::cuda_abort("Cuda view array bounds error");
- }
-};
-
-} // namespace Impl
-} // namespace Experimental
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-// Cuda Texture fetches can be performed for 4, 8 and 16 byte objects (int,int2,int4)
-// Via reinterpret_cast this can be used to support all scalar types of those sizes.
-// Any other scalar type falls back to either normal reads out of global memory,
-// or using the __ldg intrinsic on Kepler GPUs or newer (Compute Capability >= 3.0)
-
-template< typename ValueType , typename AliasType >
-struct CudaTextureFetch {
-
- ::cudaTextureObject_t m_obj ;
- const ValueType * m_ptr ;
- int m_offset ;
-
- // Dereference operator pulls through the texture object and returns by value
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- ValueType operator[]( const iType & i ) const
- {
-#if defined( __CUDA_ARCH__ ) && ( 300 <= __CUDA_ARCH__ )
- AliasType v = tex1Dfetch<AliasType>( m_obj , i + m_offset );
- return *(reinterpret_cast<ValueType*> (&v));
-#else
- return m_ptr[ i ];
-#endif
- }
-
- // Pointer to referenced memory
- KOKKOS_INLINE_FUNCTION
- operator const ValueType * () const { return m_ptr ; }
-
-
- KOKKOS_INLINE_FUNCTION
- CudaTextureFetch() : m_obj() , m_ptr() , m_offset() {}
-
- KOKKOS_INLINE_FUNCTION
- ~CudaTextureFetch() {}
-
- KOKKOS_INLINE_FUNCTION
- CudaTextureFetch( const CudaTextureFetch & rhs )
- : m_obj( rhs.m_obj )
- , m_ptr( rhs.m_ptr )
- , m_offset( rhs.m_offset )
- {}
-
- KOKKOS_INLINE_FUNCTION
- CudaTextureFetch( CudaTextureFetch && rhs )
- : m_obj( rhs.m_obj )
- , m_ptr( rhs.m_ptr )
- , m_offset( rhs.m_offset )
- {}
-
- KOKKOS_INLINE_FUNCTION
- CudaTextureFetch & operator = ( const CudaTextureFetch & rhs )
- {
- m_obj = rhs.m_obj ;
- m_ptr = rhs.m_ptr ;
- m_offset = rhs.m_offset ;
- return *this ;
- }
-
- KOKKOS_INLINE_FUNCTION
- CudaTextureFetch & operator = ( CudaTextureFetch && rhs )
- {
- m_obj = rhs.m_obj ;
- m_ptr = rhs.m_ptr ;
- m_offset = rhs.m_offset ;
- return *this ;
- }
-
- // Texture object spans the entire allocation.
- // This handle may view a subset of the allocation, so an offset is required.
- template< class CudaMemorySpace >
- inline explicit
- CudaTextureFetch( const ValueType * const arg_ptr
- , Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > & record
- )
- : m_obj( record.template attach_texture_object< AliasType >() )
- , m_ptr( arg_ptr )
- , m_offset( record.attach_texture_object_offset( reinterpret_cast<const AliasType*>( arg_ptr ) ) )
- {}
-};
-
-#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
-
-template< typename ValueType , typename AliasType >
-struct CudaLDGFetch {
-
- const ValueType * m_ptr ;
-
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- ValueType operator[]( const iType & i ) const
- {
- AliasType v = __ldg(reinterpret_cast<AliasType*>(&m_ptr[i]));
- return *(reinterpret_cast<ValueType*> (&v));
- }
-
- KOKKOS_INLINE_FUNCTION
- operator const ValueType * () const { return m_ptr ; }
-
- KOKKOS_INLINE_FUNCTION
- CudaLDGFetch() : m_ptr() {}
-
- KOKKOS_INLINE_FUNCTION
- ~CudaLDGFetch() {}
-
- KOKKOS_INLINE_FUNCTION
- CudaLDGFetch( const CudaLDGFetch & rhs )
- : m_ptr( rhs.m_ptr )
- {}
-
- KOKKOS_INLINE_FUNCTION
- CudaLDGFetch( CudaLDGFetch && rhs )
- : m_ptr( rhs.m_ptr )
- {}
-
- KOKKOS_INLINE_FUNCTION
- CudaLDGFetch & operator = ( const CudaLDGFetch & rhs )
- {
- m_ptr = rhs.m_ptr ;
- return *this ;
- }
-
- KOKKOS_INLINE_FUNCTION
- CudaLDGFetch & operator = ( CudaLDGFetch && rhs )
- {
- m_ptr = rhs.m_ptr ;
- return *this ;
- }
-
- template< class CudaMemorySpace >
- inline explicit
- CudaLDGFetch( const ValueType * const arg_ptr
- , Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > const &
- )
- : m_ptr( arg_ptr )
- {}
-};
-
-#endif
-
-} // namespace Impl
-} // namespace Experimental
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-/** \brief Replace Default ViewDataHandle with Cuda texture fetch specialization
- * if 'const' value type, CudaSpace and random access.
- */
-template< class Traits >
-class ViewDataHandle< Traits ,
- typename std::enable_if<(
- // Is Cuda memory space
- ( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
- std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value )
- &&
- // Is a trivial const value of 4, 8, or 16 bytes
- std::is_trivial<typename Traits::const_value_type>::value
- &&
- std::is_same<typename Traits::const_value_type,typename Traits::value_type>::value
- &&
- ( sizeof(typename Traits::const_value_type) == 4 ||
- sizeof(typename Traits::const_value_type) == 8 ||
- sizeof(typename Traits::const_value_type) == 16 )
- &&
- // Random access trait
- ( Traits::memory_traits::RandomAccess != 0 )
- )>::type >
-{
-public:
-
- using track_type = Kokkos::Experimental::Impl::SharedAllocationTracker ;
-
- using value_type = typename Traits::const_value_type ;
- using return_type = typename Traits::const_value_type ; // NOT a reference
-
- using alias_type = typename std::conditional< ( sizeof(value_type) == 4 ) , int ,
- typename std::conditional< ( sizeof(value_type) == 8 ) , ::int2 ,
- typename std::conditional< ( sizeof(value_type) == 16 ) , ::int4 , void
- >::type
- >::type
- >::type ;
-
-#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
- using handle_type = Kokkos::Experimental::Impl::CudaLDGFetch< value_type , alias_type > ;
-#else
- using handle_type = Kokkos::Experimental::Impl::CudaTextureFetch< value_type , alias_type > ;
-#endif
-
- KOKKOS_INLINE_FUNCTION
- static handle_type const & assign( handle_type const & arg_handle , track_type const & /* arg_tracker */ )
- {
- return arg_handle ;
- }
-
- KOKKOS_INLINE_FUNCTION
- static handle_type assign( value_type * arg_data_ptr, track_type const & arg_tracker )
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- // Assignment of texture = non-texture requires creation of a texture object
- // which can only occur on the host. In addition, 'get_record' is only valid
- // if called in a host execution space
- return handle_type( arg_data_ptr , arg_tracker.template get_record< typename Traits::memory_space >() );
-#else
- Kokkos::Impl::cuda_abort("Cannot create Cuda texture object from within a Cuda kernel");
- return handle_type();
-#endif
- }
-};
-
-}
-}
-}
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
-#endif /* #ifndef KOKKOS_CUDA_VIEW_HPP */
-
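
The comment in the removed CudaTextureFetch header describes reading any 4-, 8-, or 16-byte scalar through a fixed integer alias type (int, int2, int4), so the texture-object and __ldg paths only ever see a handful of widths. A minimal host-only sketch of that alias-load idea, with std::memcpy standing in for the device-side reinterpret_cast and intrinsics (the names below are hypothetical, not Kokkos API):

// alias_fetch_sketch.cpp -- host-only illustration; the removed Kokkos code
// used tex1Dfetch<AliasType>() or __ldg() on the device instead of memcpy.
#include <cstdint>
#include <cstdio>
#include <cstring>

template <typename ValueType, typename AliasType>
ValueType alias_load(const ValueType* ptr, int i) {
  static_assert(sizeof(ValueType) == sizeof(AliasType),
                "alias type must match the value size");
  AliasType raw;                            // e.g. std::int64_t standing in for ::int2
  std::memcpy(&raw, ptr + i, sizeof(raw));  // device code would fetch 'raw' via __ldg
  ValueType v;
  std::memcpy(&v, &raw, sizeof(v));         // reinterpret the alias bits as the value
  return v;
}

int main() {
  const double data[3] = {1.5, 2.5, 3.5};
  std::printf("%g\n", alias_load<double, std::int64_t>(data, 1));  // prints 2.5
  return 0;
}
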
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp b/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
index a4f372d65..8abf2292d 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
@@ -1,829 +1,914 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdlib.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <algorithm>
+#include <atomic>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_HAVE_CUDA
#include <Kokkos_Core.hpp>
#include <Kokkos_Cuda.hpp>
#include <Kokkos_CudaSpace.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
+#if (KOKKOS_ENABLE_PROFILING)
+#include <impl/Kokkos_Profiling_Interface.hpp>
+#endif
+
+
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace {
+
+ static std::atomic<int> num_uvm_allocations(0) ;
+
cudaStream_t get_deep_copy_stream() {
static cudaStream_t s = 0;
if( s == 0) {
cudaStreamCreate ( &s );
}
return s;
}
}
DeepCopy<CudaSpace,CudaSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<HostSpace,CudaSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<CudaSpace,HostSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<CudaSpace,CudaSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
DeepCopy<HostSpace,CudaSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
DeepCopy<CudaSpace,HostSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
void DeepCopyAsyncCuda( void * dst , const void * src , size_t n) {
cudaStream_t s = get_deep_copy_stream();
CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , s ) );
cudaStreamSynchronize(s);
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
void CudaSpace::access_error()
{
const std::string msg("Kokkos::CudaSpace::access_error attempt to execute Cuda function from non-Cuda space" );
Kokkos::Impl::throw_runtime_exception( msg );
}
void CudaSpace::access_error( const void * const )
{
const std::string msg("Kokkos::CudaSpace::access_error attempt to execute Cuda function from non-Cuda space" );
Kokkos::Impl::throw_runtime_exception( msg );
}
+
/*--------------------------------------------------------------------------*/
bool CudaUVMSpace::available()
{
#if defined( CUDA_VERSION ) && ( 6000 <= CUDA_VERSION ) && !defined(__APPLE__)
enum { UVM_available = true };
#else
enum { UVM_available = false };
#endif
return UVM_available;
}
/*--------------------------------------------------------------------------*/
+int CudaUVMSpace::number_of_allocations()
+{
+ return Kokkos::Impl::num_uvm_allocations.load();
+}
+
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
CudaSpace::CudaSpace()
: m_device( Kokkos::Cuda().cuda_device() )
{
}
CudaUVMSpace::CudaUVMSpace()
: m_device( Kokkos::Cuda().cuda_device() )
{
}
CudaHostPinnedSpace::CudaHostPinnedSpace()
{
}
void * CudaSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
CUDA_SAFE_CALL( cudaMalloc( &ptr, arg_alloc_size ) );
return ptr ;
}
void * CudaUVMSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
- CUDA_SAFE_CALL( cudaMallocManaged( &ptr, arg_alloc_size , cudaMemAttachGlobal ) );
+ enum { max_uvm_allocations = 65536 };
+
+ if ( arg_alloc_size > 0 )
+ {
+ Kokkos::Impl::num_uvm_allocations++;
+
+ if ( Kokkos::Impl::num_uvm_allocations.load() > max_uvm_allocations ) {
+ Kokkos::Impl::throw_runtime_exception( "CudaUVM error: The maximum limit of UVM allocations (currently 65536) has been exceeded." ) ;
+ }
+
+ CUDA_SAFE_CALL( cudaMallocManaged( &ptr, arg_alloc_size , cudaMemAttachGlobal ) );
+ }
return ptr ;
}
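
The new CudaUVMSpace::allocate / deallocate code above tracks live cudaMallocManaged allocations in a global std::atomic counter and throws once a fixed limit is exceeded. A minimal stand-alone sketch of that counting scheme, with plain malloc/free and a small hypothetical cap standing in for the CUDA calls and the 65536 limit:

// uvm_count_sketch.cpp -- stand-alone illustration of the capped allocation
// counter introduced above; malloc/free and the tiny cap are placeholders,
// not Kokkos API.
#include <atomic>
#include <cstdlib>
#include <stdexcept>

static std::atomic<int> num_allocations(0);

void* counted_allocate(std::size_t n) {
  if (n == 0) return nullptr;
  if (++num_allocations > 4 /* hypothetical cap */) {
    --num_allocations;
    throw std::runtime_error("allocation limit exceeded");
  }
  return std::malloc(n);
}

void counted_deallocate(void* p) {
  if (p != nullptr) {
    --num_allocations;
    std::free(p);
  }
}

int main() {
  void* a = counted_allocate(16);
  counted_deallocate(a);
  return num_allocations.load();  // 0 when every allocate is paired with a deallocate
}
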
void * CudaHostPinnedSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
CUDA_SAFE_CALL( cudaHostAlloc( &ptr, arg_alloc_size , cudaHostAllocDefault ) );
return ptr ;
}
void CudaSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
} catch(...) {}
}
void CudaUVMSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
- CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
+ if ( arg_alloc_ptr != nullptr ) {
+ Kokkos::Impl::num_uvm_allocations--;
+ CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
+ }
} catch(...) {}
}
void CudaHostPinnedSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
CUDA_SAFE_CALL( cudaFreeHost( arg_alloc_ptr ) );
} catch(...) {}
}
+constexpr const char* CudaSpace::name() {
+ return m_name;
+}
+
+constexpr const char* CudaUVMSpace::name() {
+ return m_name;
+}
+
+constexpr const char* CudaHostPinnedSpace::name() {
+ return m_name;
+}
+
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaSpace , void >::s_root_record ;
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::s_root_record ;
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::s_root_record ;
::cudaTextureObject_t
SharedAllocationRecord< Kokkos::CudaSpace , void >::
attach_texture_object( const unsigned sizeof_alias
, void * const alloc_ptr
, size_t const alloc_size )
{
enum { TEXTURE_BOUND_1D = 1u << 27 };
if ( ( alloc_ptr == 0 ) || ( sizeof_alias * TEXTURE_BOUND_1D <= alloc_size ) ) {
std::ostringstream msg ;
msg << "Kokkos::CudaSpace ERROR: Cannot attach texture object to"
<< " alloc_ptr(" << alloc_ptr << ")"
<< " alloc_size(" << alloc_size << ")"
<< " max_size(" << ( sizeof_alias * TEXTURE_BOUND_1D ) << ")" ;
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
::cudaTextureObject_t tex_obj ;
struct cudaResourceDesc resDesc ;
struct cudaTextureDesc texDesc ;
memset( & resDesc , 0 , sizeof(resDesc) );
memset( & texDesc , 0 , sizeof(texDesc) );
resDesc.resType = cudaResourceTypeLinear ;
resDesc.res.linear.desc = ( sizeof_alias == 4 ? cudaCreateChannelDesc< int >() :
( sizeof_alias == 8 ? cudaCreateChannelDesc< ::int2 >() :
/* sizeof_alias == 16 */ cudaCreateChannelDesc< ::int4 >() ) );
resDesc.res.linear.sizeInBytes = alloc_size ;
resDesc.res.linear.devPtr = alloc_ptr ;
CUDA_SAFE_CALL( cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL ) );
return tex_obj ;
}
std::string
SharedAllocationRecord< Kokkos::CudaSpace , void >::get_label() const
{
SharedAllocationHeader header ;
Kokkos::Impl::DeepCopy< Kokkos::HostSpace , Kokkos::CudaSpace >( & header , RecordBase::head() , sizeof(SharedAllocationHeader) );
return std::string( header.m_label );
}
std::string
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_label() const
{
return std::string( RecordBase::head()->m_label );
}
std::string
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_label() const
{
return std::string( RecordBase::head()->m_label );
}
SharedAllocationRecord< Kokkos::CudaSpace , void > *
SharedAllocationRecord< Kokkos::CudaSpace , void >::
allocate( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void > *
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
allocate( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > *
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
allocate( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::CudaSpace , void >::
~SharedAllocationRecord()
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+
+ SharedAllocationHeader header ;
+ Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( & header , RecordBase::m_alloc_ptr , sizeof(SharedAllocationHeader) );
+
+ Kokkos::Profiling::deallocateData(
+ Kokkos::Profiling::SpaceHandle(Kokkos::CudaSpace::name()),header.m_label,
+ data(),size());
+ }
+ #endif
+
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
~SharedAllocationRecord()
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::fence(); //Make sure I can access the label ...
+ Kokkos::Profiling::deallocateData(
+ Kokkos::Profiling::SpaceHandle(Kokkos::CudaUVMSpace::name()),RecordBase::m_alloc_ptr->m_label,
+ data(),size());
+ }
+ #endif
+
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
~SharedAllocationRecord()
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::deallocateData(
+ Kokkos::Profiling::SpaceHandle(Kokkos::CudaHostPinnedSpace::name()),RecordBase::m_alloc_ptr->m_label,
+ data(),size());
+ }
+ #endif
+
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaSpace , void >::
SharedAllocationRecord( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_tex_obj( 0 )
, m_space( arg_space )
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
+ }
+ #endif
+
SharedAllocationHeader header ;
// Fill in the Header information
header.m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( header.m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
// Copy to device memory
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( RecordBase::m_alloc_ptr , & header , sizeof(SharedAllocationHeader) );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
SharedAllocationRecord( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_tex_obj( 0 )
, m_space( arg_space )
{
- // Fill in the Header information, directly accessible via UVM
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
+ }
+ #endif
+ // Fill in the Header information, directly accessible via UVM
RecordBase::m_alloc_ptr->m_record = this ;
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
SharedAllocationRecord( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
+ }
+ #endif
// Fill in the Header information, directly accessible via UVM
RecordBase::m_alloc_ptr->m_record = this ;
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::CudaSpace , void >::
allocate_tracked( const Kokkos::CudaSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaSpace,CudaSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
void * SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
allocate_tracked( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
+
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaUVMSpace,CudaUVMSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
void * SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
allocate_tracked( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaHostPinnedSpace,CudaHostPinnedSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
//----------------------------------------------------------------------------
SharedAllocationRecord< Kokkos::CudaSpace , void > *
SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordBase = SharedAllocationRecord< void , void > ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaSpace , void > ;
#if 0
// Copy the header from the allocation
Header head ;
Header const * const head_cuda = alloc_ptr ? Header::get_header( alloc_ptr ) : (Header*) 0 ;
if ( alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , head_cuda , sizeof(SharedAllocationHeader) );
}
RecordCuda * const record = alloc_ptr ? static_cast< RecordCuda * >( head.m_record ) : (RecordCuda *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head_cuda ) {
- Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
+ Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
}
#else
// Iterating the list to search for the record among all allocations
// requires obtaining the root of the list and then locking the list.
RecordCuda * const record = static_cast< RecordCuda * >( RecordBase::find( & s_root_record , alloc_ptr ) );
if ( record == 0 ) {
- Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
+ Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
}
#endif
return record ;
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void > *
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaUVMSpace , void > ;
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
- Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record ERROR" ) );
+ Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record ERROR" ) );
}
return static_cast< RecordCuda * >( h->m_record );
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > *
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > ;
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
- Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record ERROR" ) );
+ Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record ERROR" ) );
}
return static_cast< RecordCuda * >( h->m_record );
}
// Iterate records to print orphaned memory ...
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail )
{
SharedAllocationRecord< void , void > * r = & s_root_record ;
char buffer[256] ;
SharedAllocationHeader head ;
if ( detail ) {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
}
else {
head.m_label[0] = 0 ;
}
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda addr( 0x%.12lx ) list( 0x%.12lx 0x%.12lx ) extent[ 0x%.12lx + %.8ld ] count(%d) dealloc(0x%.12lx) %s\n";
}
else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda addr( 0x%.12llx ) list( 0x%.12llx 0x%.12llx ) extent[ 0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}
snprintf( buffer , 256
, format_string
, reinterpret_cast<uintptr_t>( r )
, reinterpret_cast<uintptr_t>( r->m_prev )
, reinterpret_cast<uintptr_t>( r->m_next )
, reinterpret_cast<uintptr_t>( r->m_alloc_ptr )
, r->m_alloc_size
, r->m_count
, reinterpret_cast<uintptr_t>( r->m_dealloc )
, head.m_label
);
std::cout << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
else {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda [ 0x%.12lx + %ld ] %s\n";
}
else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda [ 0x%.12llx + %ld ] %s\n";
}
snprintf( buffer , 256
, format_string
, reinterpret_cast< uintptr_t >( r->data() )
, r->size()
, head.m_label
);
}
else {
snprintf( buffer , 256 , "Cuda [ 0 + 0 ]\n" );
}
std::cout << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
}
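
print_records above selects its printf format string at run time by comparing sizeof(uintptr_t) against unsigned long and unsigned long long. For reference, the usual portable alternative is the PRIxPTR macro from <cinttypes>; a small sketch of that approach (an aside only, not what the patch does):

// prixptr_sketch.cpp -- printing a pointer-sized integer portably.
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
  int x = 0;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&x);
  // PRIxPTR expands to the correct length modifier for uintptr_t on this platform.
  std::printf("addr( 0x%.12" PRIxPTR " )\n", addr);
  return 0;
}
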
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaUVMSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaUVM" , & s_root_record , detail );
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaHostPinnedSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaHostPinned" , & s_root_record , detail );
}
} // namespace Impl
-} // namespace Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace {
__global__ void init_lock_array_kernel_atomic() {
unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
if(i<CUDA_SPACE_ATOMIC_MASK+1)
kokkos_impl_cuda_lock_arrays.atomic[i] = 0;
}
__global__ void init_lock_array_kernel_scratch_threadid(int N) {
unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
if(i<N) {
kokkos_impl_cuda_lock_arrays.scratch[i] = 0;
kokkos_impl_cuda_lock_arrays.threadid[i] = 0;
}
}
}
namespace Impl {
int* atomic_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(CUDA_SPACE_ATOMIC_MASK+1));
return ptr;
}
int* scratch_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(Cuda::concurrency()));
return ptr;
}
int* threadid_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(Cuda::concurrency()));
return ptr;
}
void init_lock_arrays_cuda_space() {
static int is_initialized = 0;
if(! is_initialized) {
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
init_lock_array_kernel_atomic<<<(CUDA_SPACE_ATOMIC_MASK+255)/256,256>>>();
init_lock_array_kernel_scratch_threadid<<<(Kokkos::Cuda::concurrency()+255)/256,256>>>(Kokkos::Cuda::concurrency());
}
}
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
static void* ptr = NULL;
static size_t current_size = 0;
if(current_size == 0) {
current_size = bytes;
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
}
if(bytes > current_size) {
current_size = bytes;
ptr = Kokkos::kokkos_realloc<Kokkos::CudaSpace>(ptr,current_size);
}
if((bytes < current_size) && (force_shrink)) {
current_size = bytes;
Kokkos::kokkos_free<Kokkos::CudaSpace>(ptr);
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
}
return ptr;
}
}
}
#endif // KOKKOS_HAVE_CUDA
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
index 2d8d07d07..59e79bba2 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
@@ -1,778 +1,778 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/*--------------------------------------------------------------------------*/
/* Kokkos interfaces */
#include <Kokkos_Core.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_HAVE_CUDA
#include <Cuda/Kokkos_Cuda_Error.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
/*--------------------------------------------------------------------------*/
/* Standard 'C' libraries */
#include <stdlib.h>
/* Standard 'C++' libraries */
#include <vector>
#include <iostream>
#include <sstream>
#include <string>
#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
__device__ __constant__
unsigned long kokkos_impl_cuda_constant_memory_buffer[ Kokkos::Impl::CudaTraits::ConstantMemoryUsage / sizeof(unsigned long) ] ;
__device__ __constant__
Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
#endif
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace {
__global__
void query_cuda_kernel_arch( int * d_arch )
{
#if defined( __CUDA_ARCH__ )
*d_arch = __CUDA_ARCH__ ;
#else
*d_arch = 0 ;
#endif
}
/** Query what compute capability is actually launched to the device: */
int cuda_kernel_arch()
{
int * d_arch = 0 ;
cudaMalloc( (void **) & d_arch , sizeof(int) );
query_cuda_kernel_arch<<<1,1>>>( d_arch );
int arch = 0 ;
cudaMemcpy( & arch , d_arch , sizeof(int) , cudaMemcpyDefault );
cudaFree( d_arch );
return arch ;
}
bool cuda_launch_blocking()
{
const char * env = getenv("CUDA_LAUNCH_BLOCKING");
if (env == 0) return false;
return atoi(env);
}
}
void cuda_device_synchronize()
{
// static const bool launch_blocking = cuda_launch_blocking();
// if (!launch_blocking) {
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
// }
}
void cuda_internal_error_throw( cudaError e , const char * name, const char * file, const int line )
{
std::ostringstream out ;
out << name << " error( " << cudaGetErrorName(e) << "): " << cudaGetErrorString(e);
if (file) {
out << " " << file << ":" << line;
}
throw_runtime_exception( out.str() );
}
//----------------------------------------------------------------------------
// Some significant cuda device properties:
//
// cudaDeviceProp::name : Text label for device
// cudaDeviceProp::major : Device major number
// cudaDeviceProp::minor : Device minor number
// cudaDeviceProp::warpSize : number of threads per warp
// cudaDeviceProp::multiProcessorCount : number of multiprocessors
// cudaDeviceProp::sharedMemPerBlock : capacity of shared memory per block
// cudaDeviceProp::totalConstMem : capacity of constant memory
// cudaDeviceProp::totalGlobalMem : capacity of global memory
// cudaDeviceProp::maxGridSize[3] : maximum grid size
//
// Section 4.4.2.4 of the CUDA Toolkit Reference Manual
//
// struct cudaDeviceProp {
// char name[256];
// size_t totalGlobalMem;
// size_t sharedMemPerBlock;
// int regsPerBlock;
// int warpSize;
// size_t memPitch;
// int maxThreadsPerBlock;
// int maxThreadsDim[3];
// int maxGridSize[3];
// size_t totalConstMem;
// int major;
// int minor;
// int clockRate;
// size_t textureAlignment;
// int deviceOverlap;
// int multiProcessorCount;
// int kernelExecTimeoutEnabled;
// int integrated;
// int canMapHostMemory;
// int computeMode;
// int concurrentKernels;
// int ECCEnabled;
// int pciBusID;
// int pciDeviceID;
// int tccDriver;
// int asyncEngineCount;
// int unifiedAddressing;
// int memoryClockRate;
// int memoryBusWidth;
// int l2CacheSize;
// int maxThreadsPerMultiProcessor;
// };
namespace {
class CudaInternalDevices {
public:
enum { MAXIMUM_DEVICE_COUNT = 64 };
struct cudaDeviceProp m_cudaProp[ MAXIMUM_DEVICE_COUNT ] ;
int m_cudaDevCount ;
CudaInternalDevices();
static const CudaInternalDevices & singleton();
};
CudaInternalDevices::CudaInternalDevices()
{
// See 'cudaSetDeviceFlags' for host-device thread interaction
// Section 4.4.2.6 of the CUDA Toolkit Reference Manual
CUDA_SAFE_CALL (cudaGetDeviceCount( & m_cudaDevCount ) );
if(m_cudaDevCount > MAXIMUM_DEVICE_COUNT) {
Kokkos::abort("Sorry, you have more GPUs per node than we thought anybody would ever have. Please report this to github.com/kokkos/kokkos.");
}
for ( int i = 0 ; i < m_cudaDevCount ; ++i ) {
CUDA_SAFE_CALL( cudaGetDeviceProperties( m_cudaProp + i , i ) );
}
}
const CudaInternalDevices & CudaInternalDevices::singleton()
{
static CudaInternalDevices self ; return self ;
}
}
//----------------------------------------------------------------------------
class CudaInternal {
private:
CudaInternal( const CudaInternal & );
CudaInternal & operator = ( const CudaInternal & );
public:
typedef Cuda::size_type size_type ;
int m_cudaDev ;
int m_cudaArch ;
unsigned m_multiProcCount ;
unsigned m_maxWarpCount ;
unsigned m_maxBlock ;
unsigned m_maxSharedWords ;
size_type m_scratchSpaceCount ;
size_type m_scratchFlagsCount ;
size_type m_scratchUnifiedCount ;
size_type m_scratchUnifiedSupported ;
size_type m_streamCount ;
size_type * m_scratchSpace ;
size_type * m_scratchFlags ;
size_type * m_scratchUnified ;
cudaStream_t * m_stream ;
static int was_initialized;
static int was_finalized;
static CudaInternal & singleton();
int verify_is_initialized( const char * const label ) const ;
int is_initialized() const
{ return 0 != m_scratchSpace && 0 != m_scratchFlags ; }
void initialize( int cuda_device_id , int stream_count );
void finalize();
void print_configuration( std::ostream & ) const ;
~CudaInternal();
CudaInternal()
: m_cudaDev( -1 )
, m_cudaArch( -1 )
, m_multiProcCount( 0 )
, m_maxWarpCount( 0 )
, m_maxBlock( 0 )
, m_maxSharedWords( 0 )
, m_scratchSpaceCount( 0 )
, m_scratchFlagsCount( 0 )
, m_scratchUnifiedCount( 0 )
, m_scratchUnifiedSupported( 0 )
, m_streamCount( 0 )
, m_scratchSpace( 0 )
, m_scratchFlags( 0 )
, m_scratchUnified( 0 )
, m_stream( 0 )
{}
size_type * scratch_space( const size_type size );
size_type * scratch_flags( const size_type size );
size_type * scratch_unified( const size_type size );
};
int CudaInternal::was_initialized = 0;
int CudaInternal::was_finalized = 0;
//----------------------------------------------------------------------------
void CudaInternal::print_configuration( std::ostream & s ) const
{
const CudaInternalDevices & dev_info = CudaInternalDevices::singleton();
#if defined( KOKKOS_HAVE_CUDA )
s << "macro KOKKOS_HAVE_CUDA : defined" << std::endl ;
#endif
#if defined( CUDA_VERSION )
s << "macro CUDA_VERSION = " << CUDA_VERSION
<< " = version " << CUDA_VERSION / 1000
<< "." << ( CUDA_VERSION % 1000 ) / 10
<< std::endl ;
#endif
for ( int i = 0 ; i < dev_info.m_cudaDevCount ; ++i ) {
s << "Kokkos::Cuda[ " << i << " ] "
<< dev_info.m_cudaProp[i].name
<< " capability " << dev_info.m_cudaProp[i].major << "." << dev_info.m_cudaProp[i].minor
<< ", Total Global Memory: " << human_memory_size(dev_info.m_cudaProp[i].totalGlobalMem)
<< ", Shared Memory per Block: " << human_memory_size(dev_info.m_cudaProp[i].sharedMemPerBlock);
if ( m_cudaDev == i ) s << " : Selected" ;
s << std::endl ;
}
}
//----------------------------------------------------------------------------
CudaInternal::~CudaInternal()
{
if ( m_stream ||
m_scratchSpace ||
m_scratchFlags ||
m_scratchUnified ) {
std::cerr << "Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()"
<< std::endl ;
std::cerr.flush();
}
m_cudaDev = -1 ;
m_cudaArch = -1 ;
m_multiProcCount = 0 ;
m_maxWarpCount = 0 ;
m_maxBlock = 0 ;
m_maxSharedWords = 0 ;
m_scratchSpaceCount = 0 ;
m_scratchFlagsCount = 0 ;
m_scratchUnifiedCount = 0 ;
m_scratchUnifiedSupported = 0 ;
m_streamCount = 0 ;
m_scratchSpace = 0 ;
m_scratchFlags = 0 ;
m_scratchUnified = 0 ;
m_stream = 0 ;
}
int CudaInternal::verify_is_initialized( const char * const label ) const
{
if ( m_cudaDev < 0 ) {
std::cerr << "Kokkos::Cuda::" << label << " : ERROR device not initialized" << std::endl ;
}
return 0 <= m_cudaDev ;
}
CudaInternal & CudaInternal::singleton()
{
static CudaInternal self ;
return self ;
}
void CudaInternal::initialize( int cuda_device_id , int stream_count )
{
if ( was_finalized ) Kokkos::abort("Calling Cuda::initialize after Cuda::finalize is illegal\n");
was_initialized = 1;
if ( is_initialized() ) return;
enum { WordSize = sizeof(size_type) };
if ( ! HostSpace::execution_space::is_initialized() ) {
const std::string msg("Cuda::initialize ERROR : HostSpace::execution_space is not initialized");
throw_runtime_exception( msg );
}
const CudaInternalDevices & dev_info = CudaInternalDevices::singleton();
const bool ok_init = 0 == m_scratchSpace || 0 == m_scratchFlags ;
const bool ok_id = 0 <= cuda_device_id &&
cuda_device_id < dev_info.m_cudaDevCount ;
- // Need device capability 2.0 or better
+ // Need device capability 3.0 or better
const bool ok_dev = ok_id &&
- ( 2 <= dev_info.m_cudaProp[ cuda_device_id ].major &&
+ ( 3 <= dev_info.m_cudaProp[ cuda_device_id ].major &&
0 <= dev_info.m_cudaProp[ cuda_device_id ].minor );
if ( ok_init && ok_dev ) {
const struct cudaDeviceProp & cudaProp =
dev_info.m_cudaProp[ cuda_device_id ];
m_cudaDev = cuda_device_id ;
CUDA_SAFE_CALL( cudaSetDevice( m_cudaDev ) );
CUDA_SAFE_CALL( cudaDeviceReset() );
Kokkos::Impl::cuda_device_synchronize();
// Query what compute capability architecture a kernel executes:
m_cudaArch = cuda_kernel_arch();
if ( m_cudaArch != cudaProp.major * 100 + cudaProp.minor * 10 ) {
std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability "
<< ( m_cudaArch / 100 ) << "." << ( ( m_cudaArch % 100 ) / 10 )
<< " on device with compute capability "
<< cudaProp.major << "." << cudaProp.minor
<< " , this will likely reduce potential performance."
<< std::endl ;
}
// number of multiprocessors
m_multiProcCount = cudaProp.multiProcessorCount ;
//----------------------------------
// Maximum number of warps,
// at most one warp per thread in a warp for reduction.
// HCE 2012-February :
// Found bug in CUDA 4.1 that sometimes a kernel launch would fail
// if the thread count == 1024 and a functor is passed to the kernel.
// Copying the kernel to constant memory and then launching with
// thread count == 1024 would work fine.
//
// HCE 2012-October :
// All compute capabilities support at least 16 warps (512 threads).
// However, we have found that 8 warps typically gives better performance.
m_maxWarpCount = 8 ;
// m_maxWarpCount = cudaProp.maxThreadsPerBlock / Impl::CudaTraits::WarpSize ;
if ( Impl::CudaTraits::WarpSize < m_maxWarpCount ) {
m_maxWarpCount = Impl::CudaTraits::WarpSize ;
}
m_maxSharedWords = cudaProp.sharedMemPerBlock / WordSize ;
//----------------------------------
// Maximum number of blocks:
- m_maxBlock = m_cudaArch < 300 ? 65535 : cudaProp.maxGridSize[0] ;
+ m_maxBlock = cudaProp.maxGridSize[0] ;
//----------------------------------
m_scratchUnifiedSupported = cudaProp.unifiedAddressing ;
if ( ! m_scratchUnifiedSupported ) {
std::cout << "Kokkos::Cuda device "
<< cudaProp.name << " capability "
<< cudaProp.major << "." << cudaProp.minor
<< " does not support unified virtual address space"
<< std::endl ;
}
//----------------------------------
// Multiblock reduction uses scratch flags for counters
// and scratch space for partial reduction values.
// Allocate some initial space. This will grow as needed.
{
const unsigned reduce_block_count = m_maxWarpCount * Impl::CudaTraits::WarpSize ;
(void) scratch_unified( 16 * sizeof(size_type) );
(void) scratch_flags( reduce_block_count * 2 * sizeof(size_type) );
(void) scratch_space( reduce_block_count * 16 * sizeof(size_type) );
}
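// Illustrative arithmetic (a hedged sketch, assuming 4-byte size_type,
// WarpSize == 32 and the m_maxWarpCount == 8 chosen above):
//   reduce_block_count = 8 * 32 = 256
//   scratch_unified : 16 * 4        =    64 bytes
//   scratch_flags   : 256 * 2 * 4   =  2048 bytes
//   scratch_space   : 256 * 16 * 4  = 16384 bytes
// Each request is further rounded up to whole ScratchGrain units by the
// scratch_* allocators defined below.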
//----------------------------------
if ( stream_count ) {
m_stream = (cudaStream_t*) ::malloc( stream_count * sizeof(cudaStream_t) );
m_streamCount = stream_count ;
for ( size_type i = 0 ; i < m_streamCount ; ++i ) m_stream[i] = 0 ;
}
}
else {
std::ostringstream msg ;
msg << "Kokkos::Cuda::initialize(" << cuda_device_id << ") FAILED" ;
if ( ! ok_init ) {
msg << " : Already initialized" ;
}
if ( ! ok_id ) {
msg << " : Device identifier out of range "
<< "[0.." << dev_info.m_cudaDevCount << "]" ;
}
else if ( ! ok_dev ) {
msg << " : Device " ;
msg << dev_info.m_cudaProp[ cuda_device_id ].major ;
msg << "." ;
msg << dev_info.m_cudaProp[ cuda_device_id ].minor ;
- msg << " has insufficient capability, required 2.0 or better" ;
+ msg << " has insufficient capability, required 3.0 or better" ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
#ifdef KOKKOS_CUDA_USE_UVM
if(!cuda_launch_blocking()) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_LAUNCH_BLOCKING=1." << std::endl;
std::cout << " The code must call Cuda::fence() after each kernel" << std::endl;
std::cout << " or will likely crash when accessing data on the host." << std::endl;
}
const char * env_force_device_alloc = getenv("CUDA_MANAGED_FORCE_DEVICE_ALLOC");
bool force_device_alloc;
if (env_force_device_alloc == 0) force_device_alloc=false;
else force_device_alloc=atoi(env_force_device_alloc)!=0;
const char * env_visible_devices = getenv("CUDA_VISIBLE_DEVICES");
bool visible_devices_one=true;
if (env_visible_devices == 0) visible_devices_one=false;
if(!visible_devices_one && !force_device_alloc) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or " << std::endl;
std::cout << " setting CUDA_VISIBLE_DEVICES." << std::endl;
std::cout << " This could on multi GPU systems lead to severe performance" << std::endl;
std::cout << " penalties." << std::endl;
}
#endif
cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
// Init the array used for arbitrarily sized atomics
Impl::init_lock_arrays_cuda_space();
#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
}
//----------------------------------------------------------------------------
typedef Cuda::size_type ScratchGrain[ Impl::CudaTraits::WarpSize ] ;
enum { sizeScratchGrain = sizeof(ScratchGrain) };
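// Rounding sketch (hedged, assuming 4-byte size_type and WarpSize == 32,
// i.e. sizeScratchGrain == 128 bytes): a request of 200 bytes becomes
//   count      = ( 200 + 128 - 1 ) / 128    = 2 grains
//   allocation = 2 * sizeof(ScratchGrain)   = 256 bytes
// scratch_flags(), scratch_space() and scratch_unified() below all apply
// this rounding before (re)allocating.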
Cuda::size_type *
CudaInternal::scratch_flags( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_flags") && m_scratchFlagsCount * sizeScratchGrain < size ) {
m_scratchFlagsCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaSpace()
, "InternalScratchFlags"
, ( sizeof( ScratchGrain ) * m_scratchFlagsCount ) );
Record::increment( r );
m_scratchFlags = reinterpret_cast<size_type *>( r->data() );
CUDA_SAFE_CALL( cudaMemset( m_scratchFlags , 0 , m_scratchFlagsCount * sizeScratchGrain ) );
}
return m_scratchFlags ;
}
Cuda::size_type *
CudaInternal::scratch_space( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_space") && m_scratchSpaceCount * sizeScratchGrain < size ) {
m_scratchSpaceCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaSpace()
, "InternalScratchSpace"
, ( sizeof( ScratchGrain ) * m_scratchSpaceCount ) );
Record::increment( r );
m_scratchSpace = reinterpret_cast<size_type *>( r->data() );
}
return m_scratchSpace ;
}
Cuda::size_type *
CudaInternal::scratch_unified( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_unified") &&
m_scratchUnifiedSupported && m_scratchUnifiedCount * sizeScratchGrain < size ) {
m_scratchUnifiedCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaHostPinnedSpace()
, "InternalScratchUnified"
, ( sizeof( ScratchGrain ) * m_scratchUnifiedCount ) );
Record::increment( r );
m_scratchUnified = reinterpret_cast<size_type *>( r->data() );
}
return m_scratchUnified ;
}
//----------------------------------------------------------------------------
void CudaInternal::finalize()
{
was_finalized = 1;
if ( 0 != m_scratchSpace || 0 != m_scratchFlags ) {
atomic_lock_array_cuda_space_ptr(false);
scratch_lock_array_cuda_space_ptr(false);
threadid_lock_array_cuda_space_ptr(false);
if ( m_stream ) {
for ( size_type i = 1 ; i < m_streamCount ; ++i ) {
cudaStreamDestroy( m_stream[i] );
m_stream[i] = 0 ;
}
::free( m_stream );
}
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< CudaSpace > RecordCuda ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< CudaHostPinnedSpace > RecordHost ;
RecordCuda::decrement( RecordCuda::get_record( m_scratchFlags ) );
RecordCuda::decrement( RecordCuda::get_record( m_scratchSpace ) );
RecordHost::decrement( RecordHost::get_record( m_scratchUnified ) );
m_cudaDev = -1 ;
m_multiProcCount = 0 ;
m_maxWarpCount = 0 ;
m_maxBlock = 0 ;
m_maxSharedWords = 0 ;
m_scratchSpaceCount = 0 ;
m_scratchFlagsCount = 0 ;
m_scratchUnifiedCount = 0 ;
m_streamCount = 0 ;
m_scratchSpace = 0 ;
m_scratchFlags = 0 ;
m_scratchUnified = 0 ;
m_stream = 0 ;
}
}
//----------------------------------------------------------------------------
Cuda::size_type cuda_internal_multiprocessor_count()
{ return CudaInternal::singleton().m_multiProcCount ; }
Cuda::size_type cuda_internal_maximum_warp_count()
{ return CudaInternal::singleton().m_maxWarpCount ; }
Cuda::size_type cuda_internal_maximum_grid_count()
{ return CudaInternal::singleton().m_maxBlock ; }
Cuda::size_type cuda_internal_maximum_shared_words()
{ return CudaInternal::singleton().m_maxSharedWords ; }
Cuda::size_type * cuda_internal_scratch_space( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_space( size ); }
Cuda::size_type * cuda_internal_scratch_flags( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_flags( size ); }
Cuda::size_type * cuda_internal_scratch_unified( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_unified( size ); }
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
Cuda::size_type Cuda::detect_device_count()
{ return Impl::CudaInternalDevices::singleton().m_cudaDevCount ; }
int Cuda::concurrency() {
return 131072;
}
int Cuda::is_initialized()
{ return Impl::CudaInternal::singleton().is_initialized(); }
void Cuda::initialize( const Cuda::SelectDevice config , size_t num_instances )
{
Impl::CudaInternal::singleton().initialize( config.cuda_device_id , num_instances );
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
std::vector<unsigned>
Cuda::detect_device_arch()
{
const Impl::CudaInternalDevices & s = Impl::CudaInternalDevices::singleton();
std::vector<unsigned> output( s.m_cudaDevCount );
for ( int i = 0 ; i < s.m_cudaDevCount ; ++i ) {
output[i] = s.m_cudaProp[i].major * 100 + s.m_cudaProp[i].minor ;
}
return output ;
}
Cuda::size_type Cuda::device_arch()
{
const int dev_id = Impl::CudaInternal::singleton().m_cudaDev ;
int dev_arch = 0 ;
if ( 0 <= dev_id ) {
const struct cudaDeviceProp & cudaProp =
Impl::CudaInternalDevices::singleton().m_cudaProp[ dev_id ] ;
dev_arch = cudaProp.major * 100 + cudaProp.minor ;
}
return dev_arch ;
}
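// Usage sketch (hedged; assumes Cuda::initialize() has already selected a
// device). Both queries encode compute capability major.minor as
// major * 100 + minor, e.g. a capability 3.5 device is reported as 305:
//
//   std::vector<unsigned> archs   = Kokkos::Cuda::detect_device_arch();
//   Kokkos::Cuda::size_type current = Kokkos::Cuda::device_arch();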
void Cuda::finalize()
{
Impl::CudaInternal::singleton().finalize();
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
Cuda::Cuda()
: m_device( Impl::CudaInternal::singleton().m_cudaDev )
, m_stream( 0 )
{
Impl::CudaInternal::singleton().verify_is_initialized( "Cuda instance constructor" );
}
Cuda::Cuda( const int instance_id )
: m_device( Impl::CudaInternal::singleton().m_cudaDev )
, m_stream(
Impl::CudaInternal::singleton().verify_is_initialized( "Cuda instance constructor" )
? Impl::CudaInternal::singleton().m_stream[ instance_id % Impl::CudaInternal::singleton().m_streamCount ]
: 0 )
{}
void Cuda::print_configuration( std::ostream & s , const bool )
{ Impl::CudaInternal::singleton().print_configuration( s ); }
bool Cuda::sleep() { return false ; }
bool Cuda::wake() { return true ; }
void Cuda::fence()
{
Kokkos::Impl::cuda_device_synchronize();
}
} // namespace Kokkos
#endif // KOKKOS_HAVE_CUDA
//----------------------------------------------------------------------------
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
index 7afa06fdf..12a639fd4 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
@@ -1,1926 +1,1926 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_PARALLEL_HPP
#define KOKKOS_CUDA_PARALLEL_HPP
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_ReduceScan.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <Kokkos_Vectorization.hpp>
#if (KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <typeinfo>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Type >
struct CudaJoinFunctor {
typedef Type value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
volatile const value_type & input )
{ update += input ; }
};
class CudaTeamMember {
private:
typedef Kokkos::Cuda execution_space ;
typedef execution_space::scratch_memory_space scratch_memory_space ;
void * m_team_reduce ;
scratch_memory_space m_team_shared ;
int m_league_rank ;
int m_league_size ;
public:
-#if defined( __CUDA_ARCH__ )
-
- __device__ inline
+ KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
- __device__ inline
+ KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
- __device__ inline
+ KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & thread_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
- __device__ inline int league_rank() const { return m_league_rank ; }
- __device__ inline int league_size() const { return m_league_size ; }
- __device__ inline int team_rank() const { return threadIdx.y ; }
- __device__ inline int team_size() const { return blockDim.y ; }
+ KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
+ KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
+ KOKKOS_INLINE_FUNCTION int team_rank() const {
+ #ifdef __CUDA_ARCH__
+ return threadIdx.y ;
+ #else
+ return 1;
+ #endif
+ }
+ KOKKOS_INLINE_FUNCTION int team_size() const {
+ #ifdef __CUDA_ARCH__
+ return blockDim.y ;
+ #else
+ return 1;
+ #endif
+ }
- __device__ inline void team_barrier() const { __syncthreads(); }
+ KOKKOS_INLINE_FUNCTION void team_barrier() const {
+ #ifdef __CUDA_ARCH__
+ __syncthreads();
+ #endif
+ }
template<class ValueType>
- __device__ inline void team_broadcast(ValueType& value, const int& thread_id) const {
+ KOKKOS_INLINE_FUNCTION void team_broadcast(ValueType& value, const int& thread_id) const {
+ #ifdef __CUDA_ARCH__
__shared__ ValueType sh_val;
if(threadIdx.x == 0 && threadIdx.y == thread_id) {
sh_val = value;
}
team_barrier();
value = sh_val;
team_barrier();
+ #endif
}
-#ifdef KOKKOS_HAVE_CXX11
template< class ValueType, class JoinOp >
- __device__ inline
+ KOKKOS_INLINE_FUNCTION
typename JoinOp::value_type team_reduce( const ValueType & value
- , const JoinOp & op_in ) const
- {
+ , const JoinOp & op_in ) const {
+ #ifdef __CUDA_ARCH__
typedef JoinLambdaAdapter<ValueType,JoinOp> JoinOpFunctor ;
const JoinOpFunctor op(op_in);
ValueType * const base_data = (ValueType *) m_team_reduce ;
-#else
- template< class JoinOp >
- __device__ inline
- typename JoinOp::value_type team_reduce( const typename JoinOp::value_type & value
- , const JoinOp & op ) const
- {
- typedef JoinOp JoinOpFunctor ;
- typename JoinOp::value_type * const base_data = (typename JoinOp::value_type *) m_team_reduce ;
-#endif
__syncthreads(); // Don't write in to shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y ] = value ;
Impl::cuda_intra_block_reduce_scan<false,JoinOpFunctor,void>( op , base_data );
return base_data[ blockDim.y - 1 ];
+ #else
+ return typename JoinOp::value_type();
+ #endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename Type >
- __device__ inline Type team_scan( const Type & value , Type * const global_accum ) const
- {
+ KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const {
+ #ifdef __CUDA_ARCH__
Type * const base_data = (Type *) m_team_reduce ;
__syncthreads(); // Don't write in to shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y + 1 ] = value ;
Impl::cuda_intra_block_reduce_scan<true,Impl::CudaJoinFunctor<Type>,void>( Impl::CudaJoinFunctor<Type>() , base_data + 1 );
if ( global_accum ) {
if ( blockDim.y == threadIdx.y + 1 ) {
base_data[ blockDim.y ] = atomic_fetch_add( global_accum , base_data[ blockDim.y ] );
}
__syncthreads(); // Wait for atomic
base_data[ threadIdx.y ] += base_data[ blockDim.y ] ;
}
return base_data[ threadIdx.y ];
+ #else
+ return Type();
+ #endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
- __device__ inline Type team_scan( const Type & value ) const
- { return this->template team_scan<Type>( value , 0 ); }
+ KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const {
+ return this->template team_scan<Type>( value , 0 );
+ }
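// Usage sketch (hedged; the functor body and per-thread contribution are
// illustrative assumptions): an exclusive prefix sum of one value per
// thread, where the highest team_rank() recovers the team total.
//
//   KOKKOS_INLINE_FUNCTION
//   void operator()( const typename Kokkos::TeamPolicy<Kokkos::Cuda>::member_type & team ) const
//   {
//     const int my_count = 1 ;                          // per-thread contribution
//     const int offset   = team.team_scan( my_count );  // exclusive scan over team_rank()
//     const int total    = offset + my_count ;          // team total on the highest rank
//   }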
//----------------------------------------
// Private for the driver
- __device__ inline
+ KOKKOS_INLINE_FUNCTION
CudaTeamMember( void * shared
, const int shared_begin
, const int shared_size
, void* scratch_level_1_ptr
, const int scratch_level_1_size
, const int arg_league_rank
, const int arg_league_size )
: m_team_reduce( shared )
, m_team_shared( ((char *)shared) + shared_begin , shared_size, scratch_level_1_ptr, scratch_level_1_size)
- , m_league_rank( arg_league_rank )
- , m_league_size( arg_league_size )
+ , m_league_rank( arg_league_rank )
+ , m_league_size( arg_league_size )
{}
-#else
-
- const execution_space::scratch_memory_space & team_shmem() const
- { return m_team_shared.set_team_thread_mode(0, 1,0) ; }
- const execution_space::scratch_memory_space & team_scratch(const int& level) const
- { return m_team_shared.set_team_thread_mode(level,1,0) ; }
- const execution_space::scratch_memory_space & thread_scratch(const int& level) const
- { return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
-
- int league_rank() const {return 0;}
- int league_size() const {return 1;}
- int team_rank() const {return 0;}
- int team_size() const {return 1;}
-
- void team_barrier() const {}
- template<class ValueType>
- void team_broadcast(ValueType& value, const int& thread_id) const {}
-
- template< class JoinOp >
- typename JoinOp::value_type team_reduce( const typename JoinOp::value_type & value
- , const JoinOp & op ) const {return typename JoinOp::value_type();}
-
- template< typename Type >
- Type team_scan( const Type & value , Type * const global_accum ) const {return Type();}
-
- template< typename Type >
- Type team_scan( const Type & value ) const {return Type();}
-
- //----------------------------------------
- // Private for the driver
-
- CudaTeamMember( void * shared
- , const int shared_begin
- , const int shared_end
- , void* scratch_level_1_ptr
- , const int scratch_level_1_size
- , const int arg_league_rank
- , const int arg_league_size );
-
-#endif /* #if ! defined( __CUDA_ARCH__ ) */
-
};
} // namespace Impl
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Cuda , Properties ... >: public PolicyTraits<Properties ... >
{
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
private:
enum { MAX_WARP = 8 };
int m_league_size ;
int m_team_size ;
int m_vector_length ;
int m_team_scratch_size[2] ;
int m_thread_scratch_size[2] ;
int m_chunk_size;
public:
//! Execution space of this execution policy
typedef Kokkos::Cuda execution_space ;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_vector_length = p.m_vector_length;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
int team_size_max( const FunctorType & functor )
{
int n = MAX_WARP * Impl::CudaTraits::WarpSize ;
for ( ; n ; n >>= 1 ) {
const int shmem_size =
/* for global reduce */ Impl::cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,typename traits::work_tag>( functor , n )
/* for team reduce */ + ( n + 2 ) * sizeof(double)
/* for team shared */ + Impl::FunctorTeamShmemSize< FunctorType >::value( functor , n );
if ( shmem_size < Impl::CudaTraits::SharedMemoryCapacity ) break ;
}
return n ;
}
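// Search sketch (hedged, assuming the usual 48 KiB shared-memory capacity):
// n starts at MAX_WARP * WarpSize == 256 and is halved (256, 128, 64, ...)
// until the global-reduce + team-reduce + team-shared footprint for that n
// fits, so the returned team size is always a power of two (or zero if even
// a single-thread team does not fit).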
template< class FunctorType >
static int team_size_recommended( const FunctorType & functor )
{ return team_size_max( functor ); }
template< class FunctorType >
static int team_size_recommended( const FunctorType & functor , const int vector_length)
{
int max = team_size_max( functor )/vector_length;
if(max<1) max = 1;
return max;
}
inline static
int vector_length_max()
{ return Impl::CudaTraits::WarpSize; }
//----------------------------------------
inline int vector_length() const { return m_vector_length ; }
inline int team_size() const { return m_team_size ; }
inline int league_size() const { return m_league_size ; }
inline int scratch_size(int level, int team_size_ = -1) const {
if(team_size_<0) team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level];
}
inline size_t team_scratch_size(int level) const {
return m_team_scratch_size[level];
}
inline size_t thread_scratch_size(int level) const {
return m_thread_scratch_size[level];
}
TeamPolicyInternal()
: m_league_size( 0 )
, m_team_size( 0 )
, m_vector_length( 0 )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
- , m_chunk_size ( 32 )
+ , m_chunk_size ( 32 )
{}
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_
, int team_size_request
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( team_size_request )
, m_vector_length( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
// Make sure total block size is permissible
if ( m_team_size * m_vector_length > 1024 ) {
Impl::throw_runtime_exception(std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024."));
}
}
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_
, const Kokkos::AUTO_t & /* team_size_request */
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( -1 )
, m_vector_length( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
}
TeamPolicyInternal( int league_size_
, int team_size_request
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( team_size_request )
, m_vector_length ( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
// Make sure total block size is permissible
if ( m_team_size * m_vector_length > 1024 ) {
Impl::throw_runtime_exception(std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024."));
}
}
TeamPolicyInternal( int league_size_
, const Kokkos::AUTO_t & /* team_size_request */
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( -1 )
, m_vector_length ( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
}
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
typedef Kokkos::Impl::CudaTeamMember member_type ;
};
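// Construction sketch (hedged; the league size, vector length, scratch
// amount and the name some_functor are illustrative assumptions):
//
//   Kokkos::TeamPolicy<Kokkos::Cuda> policy( 128 , Kokkos::AUTO , 4 );
//   Kokkos::parallel_for( policy.set_scratch_size( 0 , Kokkos::PerTeam( 1024 ) ) ,
//                         some_functor );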
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
const FunctorType m_functor ;
- const Policy m_policy ;
+ const Policy m_policy ;
ParallelFor() = delete ;
ParallelFor & operator = ( const ParallelFor & ) = delete ;
template< class TagType >
inline __device__
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member i ) const
{ m_functor( i ); }
template< class TagType >
inline __device__
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member i ) const
{ m_functor( TagType() , i ); }
public:
typedef FunctorType functor_type ;
inline
__device__
void operator()(void) const
{
const Member work_stride = blockDim.y * gridDim.x ;
const Member work_end = m_policy.end();
for ( Member
iwork = m_policy.begin() + threadIdx.y + blockDim.y * blockIdx.x ;
iwork < work_end ;
iwork += work_stride ) {
this-> template exec_range< WorkTag >( iwork );
}
}
inline
void execute() const
{
const int nwork = m_policy.end() - m_policy.begin();
const dim3 block( 1 , CudaTraits::WarpSize * cuda_internal_maximum_warp_count(), 1);
const dim3 grid( std::min( ( nwork + block.y - 1 ) / block.y , cuda_internal_maximum_grid_count() ) , 1 , 1);
CudaParallelLaunch< ParallelFor >( *this , grid , block , 0 );
}
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::Cuda
>
{
private:
typedef TeamPolicyInternal< Kokkos::Cuda , Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
public:
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
private:
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.z == 1
// shared memory utilization:
//
// [ team reduce space ]
// [ team shared space ]
//
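// Offset sketch (hedged, for a team of size T): the team-reduce segment
// occupies m_shmem_begin == sizeof(double) * ( T + 2 ) bytes at offset 0,
// and the team shared space of m_shmem_size bytes follows at offset
// m_shmem_begin; execute() requests m_shmem_begin + m_shmem_size bytes of
// dynamic shared memory per block.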
const FunctorType m_functor ;
const size_type m_league_size ;
const size_type m_team_size ;
const size_type m_vector_size ;
const size_type m_shmem_begin ;
const size_type m_shmem_size ;
void* m_scratch_ptr[2] ;
const int m_scratch_size[2] ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const Member & member ) const
{ m_functor( member ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const Member & member ) const
{ m_functor( TagType() , member ); }
public:
__device__ inline
void operator()(void) const
{
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >(
typename Policy::member_type( kokkos_impl_cuda_shared_memory<void>()
, m_shmem_begin
, m_shmem_size
, m_scratch_ptr[1]
, m_scratch_size[1]
, league_rank
, m_league_size ) );
}
}
inline
void execute() const
{
const int shmem_size_total = m_shmem_begin + m_shmem_size ;
const dim3 grid( int(m_league_size) , 1 , 1 );
const dim3 block( int(m_vector_size) , int(m_team_size) , 1 );
CudaParallelLaunch< ParallelFor >( *this, grid, block, shmem_size_total ); // copy to device and execute
}
- ParallelFor( const FunctorType & arg_functor
- , const Policy & arg_policy
+ ParallelFor( const FunctorType & arg_functor
+ , const Policy & arg_policy
)
: m_functor( arg_functor )
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelFor >( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
, m_shmem_begin( sizeof(double) * ( m_team_size + 2 ) )
, m_shmem_size( arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( m_functor , m_team_size ) )
, m_scratch_ptr{NULL,NULL}
, m_scratch_size{arg_policy.scratch_size(0,m_team_size),arg_policy.scratch_size(1,m_team_size)}
{
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
const int shmem_size_total = m_shmem_begin + m_shmem_size ;
if ( CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelFor< Cuda > insufficient shared memory"));
}
if ( int(m_team_size) >
int(Kokkos::Impl::cuda_get_max_block_size< ParallelFor >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelFor< Cuda > requested too large team size."));
}
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
- , Kokkos::Cuda
+ , Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
public:
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::value_type value_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.x == blockDim.z == 1
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type * m_unified_space ;
// Shall we use the shfl-based reduction or not (only use it for statically sized types of more than 128 bits)
enum { UseShflReduction = ((sizeof(value_type)>2*sizeof(double)) && ValueTraits::StaticValueSize) };
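// Selection sketch (hedged, assuming 8-byte double, so the threshold
// 2 * sizeof(double) is 16 bytes):
//   value_type double                         ->  8 bytes -> shared-memory path
//   statically sized struct of three doubles  -> 24 bytes -> shuffle path
// Dynamically sized reductions (StaticValueSize == 0) always take the
// shared-memory path.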
// Some crutch to do function overloading
private:
typedef double DummyShflReductionType;
typedef int DummySHMEMReductionType;
public:
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update ) const
{ m_functor( i , update ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update ) const
{ m_functor( TagType() , i , update ); }
__device__ inline
void operator() () const {
run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0) );
}
__device__ inline
void run(const DummySHMEMReductionType& ) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) / sizeof(size_type) );
{
reference_type value =
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , kokkos_impl_cuda_shared_memory<size_type>() + threadIdx.y * word_count.value );
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , value );
}
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTag>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
// This is the final block with the final result at the final threads' location
size_type * const shared = kokkos_impl_cuda_shared_memory<size_type>() + ( blockDim.y - 1 ) * word_count.value ;
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
for ( unsigned i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i]; }
}
}
__device__ inline
void run(const DummyShflReductionType&) const
{
value_type value;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &value);
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , value );
}
pointer_type const result = (pointer_type) (m_unified_space ? m_unified_space : m_scratch_space) ;
int max_active_thread = range.end()-range.begin() < blockDim.y ? range.end() - range.begin():blockDim.y;
max_active_thread = (max_active_thread == 0)?blockDim.y:max_active_thread;
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTag>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,max_active_thread)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
}
// Determine block size constrained by shared memory:
static inline
unsigned local_block_size( const FunctorType & f )
{
unsigned n = CudaTraits::WarpSize * 8 ;
while ( n && CudaTraits::SharedMemoryCapacity < cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( f , n ) ) { n >>= 1 ; }
return n ;
}
inline
void execute()
{
const int nwork = m_policy.end() - m_policy.begin();
if ( nwork ) {
const int block_size = local_block_size( m_functor );
-
+
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_size /* block_size == max block_count */ );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
-
+
// REQUIRED ( 1 , N , 1 )
const dim3 block( 1 , block_size , 1 );
// Required grid.x <= block.y
const dim3 grid( std::min( int(block.y) , int( ( nwork + block.y - 1 ) / block.y ) ) , 1 , 1 );
-
+
const int shmem = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( m_functor , block.y );
-
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem ); // copy to device and execute
-
+
Cuda::fence();
-
+
if ( m_result_ptr ) {
if ( m_unified_space ) {
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
}
else {
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
DeepCopy<HostSpace,CudaSpace>( m_result_ptr , m_scratch_space , size );
}
}
}
else {
if (m_result_ptr) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
}
}
}
template< class HostViewType >
- ParallelReduce( const FunctorType & arg_functor
- , const Policy & arg_policy
+ ParallelReduce( const FunctorType & arg_functor
+ , const Policy & arg_policy
, const HostViewType & arg_result
, typename std::enable_if<
Kokkos::is_view< HostViewType >::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
{ }
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ReducerType & reducer)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType, class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::Cuda
>
{
private:
typedef TeamPolicyInternal< Kokkos::Cuda, Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef typename ValueTraits::value_type value_type ;
-
public:
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
enum { UseShflReduction = (true && ValueTraits::StaticValueSize) };
private:
typedef double DummyShflReductionType;
typedef int DummySHMEMReductionType;
-
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.z == 1
// shared memory utilization:
//
// [ global reduce space ]
// [ team reduce space ]
// [ team shared space ]
//
const FunctorType m_functor ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type * m_unified_space ;
size_type m_team_begin ;
size_type m_shmem_begin ;
size_type m_shmem_size ;
void* m_scratch_ptr[2] ;
int m_scratch_size[2] ;
const size_type m_league_size ;
const size_type m_team_size ;
const size_type m_vector_size ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const Member & member , reference_type update ) const
{ m_functor( member , update ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const Member & member , reference_type update ) const
{ m_functor( TagType() , member , update ); }
public:
__device__ inline
void operator() () const {
run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0) );
}
__device__ inline
void run(const DummySHMEMReductionType&) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) / sizeof(size_type) );
reference_type value =
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , kokkos_impl_cuda_shared_memory<size_type>() + threadIdx.y * word_count.value );
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >
( Member( kokkos_impl_cuda_shared_memory<char>() + m_team_begin
, m_shmem_begin
, m_shmem_size
, m_scratch_ptr[1]
, m_scratch_size[1]
, league_rank
, m_league_size )
, value );
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,FunctorType,WorkTag>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
// This is the final block with the final result at the final threads' location
size_type * const shared = kokkos_impl_cuda_shared_memory<size_type>() + ( blockDim.y - 1 ) * word_count.value ;
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
for ( unsigned i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i]; }
}
}
__device__ inline
void run(const DummyShflReductionType&) const
{
value_type value;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &value);
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >
( Member( kokkos_impl_cuda_shared_memory<char>() + m_team_begin
, m_shmem_begin
, m_shmem_size
, m_scratch_ptr[1]
, m_scratch_size[1]
, league_rank
, m_league_size )
, value );
}
pointer_type const result = (pointer_type) (m_unified_space ? m_unified_space : m_scratch_space) ;
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<FunctorType,ValueJoin,WorkTag>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,blockDim.y)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
}
inline
void execute()
{
- const int block_count = UseShflReduction? std::min( m_league_size , size_type(1024) )
- :std::min( m_league_size , m_team_size );
+ const int nwork = m_league_size * m_team_size ;
+ if ( nwork ) {
+ const int block_count = UseShflReduction? std::min( m_league_size , size_type(1024) )
+ :std::min( m_league_size , m_team_size );
- m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_count );
- m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
- m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
+ m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_count );
+ m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
+ m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
- const dim3 block( m_vector_size , m_team_size , 1 );
- const dim3 grid( block_count , 1 , 1 );
- const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
+ const dim3 block( m_vector_size , m_team_size , 1 );
+ const dim3 grid( block_count , 1 , 1 );
+ const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
- CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem_size_total ); // copy to device and execute
+ CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem_size_total ); // copy to device and execute
- Cuda::fence();
+ Cuda::fence();
- if ( m_result_ptr ) {
- if ( m_unified_space ) {
- const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
- for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
+ if ( m_result_ptr ) {
+ if ( m_unified_space ) {
+ const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
+ }
+ else {
+ const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
+ DeepCopy<HostSpace,CudaSpace>( m_result_ptr, m_scratch_space, size );
+ }
}
- else {
- const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
- DeepCopy<HostSpace,CudaSpace>( m_result_ptr, m_scratch_space, size );
+ }
+ else {
+ if (m_result_ptr) {
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
}
}
}
template< class HostViewType >
- ParallelReduce( const FunctorType & arg_functor
- , const Policy & arg_policy
+ ParallelReduce( const FunctorType & arg_functor
+ , const Policy & arg_policy
, const HostViewType & arg_result
, typename std::enable_if<
Kokkos::is_view< HostViewType >::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
, m_team_begin( 0 )
, m_shmem_begin( 0 )
, m_shmem_size( 0 )
, m_scratch_ptr{NULL,NULL}
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
- arg_policy.vector_length() )
+ arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
- , m_scratch_size{arg_policy.scratch_size(0,m_team_size),arg_policy.scratch_size(1,m_team_size)}
+ , m_scratch_size{
+ arg_policy.scratch_size(0,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
+ Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
+ arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
+ arg_policy.vector_length() )
+ ), arg_policy.scratch_size(1,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
+ Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
+ arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
+ arg_policy.vector_length() )
+ )}
{
// Return Init value if the number of worksets is zero
if( arg_policy.league_size() == 0) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , arg_result.ptr_on_device() );
return ;
}
m_team_begin = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( arg_functor , m_team_size );
m_shmem_begin = sizeof(double) * ( m_team_size + 2 );
m_shmem_size = arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , m_team_size );
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
m_scratch_size[0] = m_shmem_size;
m_scratch_size[1] = arg_policy.scratch_size(1,m_team_size);
// The global parallel_reduce does not support vector_length other than 1 at the moment
if( (arg_policy.vector_length() > 1) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a vector length of greater than 1 is not currently supported for CUDA for dynamic sized reduction types.");
if( (m_team_size < 32) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a team_size smaller than 32 is not currently supported with CUDA for dynamic sized reduction types.");
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
if (! Kokkos::Impl::is_integral_power_of_two( m_team_size ) && !UseShflReduction ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > bad team size"));
}
if ( CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory"));
}
if ( m_team_size >
Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length()) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too large team size."));
}
}
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ReducerType & reducer)
: m_functor( arg_functor )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
, m_team_begin( 0 )
, m_shmem_begin( 0 )
, m_shmem_size( 0 )
, m_scratch_ptr{NULL,NULL}
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
{
// Return Init value if the number of worksets is zero
if( arg_policy.league_size() == 0) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
return ;
}
m_team_begin = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( arg_functor , m_team_size );
m_shmem_begin = sizeof(double) * ( m_team_size + 2 );
m_shmem_size = arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , m_team_size );
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
m_scratch_size[0] = m_shmem_size;
m_scratch_size[1] = arg_policy.scratch_size(1,m_team_size);
// The global parallel_reduce does not support vector_length other than 1 at the moment
if( (arg_policy.vector_length() > 1) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a vector length of greater than 1 is not currently supported for CUDA for dynamic sized reduction types.");
if( (m_team_size < 32) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a team_size smaller than 32 is not currently supported with CUDA for dynamic sized reduction types.");
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
if ( (! Kokkos::Impl::is_integral_power_of_two( m_team_size ) && !UseShflReduction ) ||
CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > bad team size"));
}
if ( int(m_team_size) >
int(Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too large team size."));
}
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueOps< FunctorType, WorkTag > ValueOps ;
public:
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
private:
// Algorithmic constraints:
// (a) blockDim.y is a power of two
// (b) blockDim.x == blockDim.z == 1
// (c) gridDim.x <= blockDim.y * blockDim.y
// (d) gridDim.y == gridDim.z == 1
const FunctorType m_functor ;
const Policy m_policy ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type m_final ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update , const bool final_result ) const
{ m_functor( i , update , final_result ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update , const bool final_result ) const
{ m_functor( TagType() , i , update , final_result ); }
//----------------------------------------
__device__ inline
void initial(void) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( m_functor ) / sizeof(size_type) );
size_type * const shared_value = kokkos_impl_cuda_shared_memory<size_type>() + word_count.value * threadIdx.y ;
ValueInit::init( m_functor , shared_value );
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_value ) , false );
}
// Reduce and scan, writing out scan of blocks' totals and block-groups' totals.
// Blocks' scan values are written to 'blockIdx.x' location.
// Block-groups' scan values are at: i = ( j * blockDim.y - 1 ) for i < gridDim.x
cuda_single_inter_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , blockIdx.x , gridDim.x , kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags );
}
//----------------------------------------
__device__ inline
void final(void) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( m_functor ) / sizeof(size_type) );
// Use shared memory as an exclusive scan: { 0 , value[0] , value[1] , value[2] , ... }
size_type * const shared_data = kokkos_impl_cuda_shared_memory<size_type>();
size_type * const shared_prefix = shared_data + word_count.value * threadIdx.y ;
size_type * const shared_accum = shared_data + word_count.value * ( blockDim.y + 1 );
// Starting value for this thread block is the previous block's total.
if ( blockIdx.x ) {
size_type * const block_total = m_scratch_space + word_count.value * ( blockIdx.x - 1 );
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) { shared_accum[i] = block_total[i] ; }
}
else if ( 0 == threadIdx.y ) {
ValueInit::init( m_functor , shared_accum );
}
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( typename Policy::member_type iwork_base = range.begin(); iwork_base < range.end() ; iwork_base += blockDim.y ) {
const typename Policy::member_type iwork = iwork_base + threadIdx.y ;
__syncthreads(); // Don't overwrite previous iteration values until they are used
ValueInit::init( m_functor , shared_prefix + word_count.value );
// Copy previous block's accumulation total into thread[0] prefix and inclusive scan value of this block
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) {
shared_data[i + word_count.value] = shared_data[i] = shared_accum[i] ;
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); } // Protect against large scan values.
// Call functor to accumulate inclusive scan value for this work item
if ( iwork < range.end() ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_prefix + word_count.value ) , false );
}
// Scan block values into locations shared_data[1..blockDim.y]
- cuda_intra_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , ValueTraits::pointer_type(shared_data+word_count.value) );
+ cuda_intra_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , typename ValueTraits::pointer_type(shared_data+word_count.value) );
{
size_type * const block_total = shared_data + word_count.value * blockDim.y ;
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) { shared_accum[i] = block_total[i]; }
}
// Call functor with exclusive scan value
if ( iwork < range.end() ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_prefix ) , true );
}
}
}
public:
//----------------------------------------
__device__ inline
void operator()(void) const
{
if ( ! m_final ) {
initial();
}
else {
final();
}
}
// Determine block size constrained by shared memory:
static inline
unsigned local_block_size( const FunctorType & f )
{
// blockDim.y must be power of two = 128 (4 warps) or 256 (8 warps) or 512 (16 warps)
// gridDim.x <= blockDim.y * blockDim.y
//
// 4 warps was 10% faster than 8 warps and 20% faster than 16 warps in unit testing
unsigned n = CudaTraits::WarpSize * 4 ;
while ( n && CudaTraits::SharedMemoryCapacity < cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( f , n ) ) { n >>= 1 ; }
return n ;
}
inline
void execute()
{
const int nwork = m_policy.end() - m_policy.begin();
if ( nwork ) {
enum { GridMaxComputeCapability_2x = 0x0ffff };
-
+
const int block_size = local_block_size( m_functor );
-
+
const int grid_max =
( block_size * block_size ) < GridMaxComputeCapability_2x ?
( block_size * block_size ) : GridMaxComputeCapability_2x ;
-
+
// At most 'max_grid' blocks:
const int max_grid = std::min( int(grid_max) , int(( nwork + block_size - 1 ) / block_size ));
-
+
// How much work per block:
const int work_per_block = ( nwork + max_grid - 1 ) / max_grid ;
-
+
// How many block are really needed for this much work:
const int grid_x = ( nwork + work_per_block - 1 ) / work_per_block ;
-
+
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( m_functor ) * grid_x );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) * 1 );
-
+
const dim3 grid( grid_x , 1 , 1 );
const dim3 block( 1 , block_size , 1 ); // REQUIRED DIMENSIONS ( 1 , N , 1 )
const int shmem = ValueTraits::value_size( m_functor ) * ( block_size + 2 );
-
+
m_final = false ;
CudaParallelLaunch< ParallelScan >( *this, grid, block, shmem ); // copy to device and execute
-
+
m_final = true ;
CudaParallelLaunch< ParallelScan >( *this, grid, block, shmem ); // copy to device and execute
}
}
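// Worked sizing example (illustrative numbers, not part of this patch): for nwork = 10000
// and block_size = 128, grid_max = 128*128 = 16384, max_grid = min(16384,(10000+127)/128) = 79,
// work_per_block = (10000+78)/79 = 127, and grid_x = (10000+126)/127 = 79; both launches
// (initial pass, then final pass) reuse this grid and block shape.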
ParallelScan( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_final( false )
{ }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
const iType increment;
const CudaTeamMember& thread;
#ifdef __CUDA_ARCH__
__device__ inline
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
start( threadIdx.y ),
end( count ),
increment( blockDim.y ),
thread(thread_)
{}
__device__ inline
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_):
start( begin_+threadIdx.y ),
end( end_ ),
increment( blockDim.y ),
thread(thread_)
{}
#else
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
start( 0 ),
end( count ),
increment( 1 ),
thread(thread_)
{}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_):
start( begin_ ),
end( end_ ),
increment( 1 ),
thread(thread_)
{}
#endif
};
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
const iType increment;
#ifdef __CUDA_ARCH__
__device__ inline
- ThreadVectorRangeBoundariesStruct (const CudaTeamMember& thread, const iType& count):
+ ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
start( threadIdx.x ),
end( count ),
increment( blockDim.x )
{}
+ __device__ inline
+ ThreadVectorRangeBoundariesStruct (const iType& count):
+ start( threadIdx.x ),
+ end( count ),
+ increment( blockDim.x )
+ {}
#else
KOKKOS_INLINE_FUNCTION
- ThreadVectorRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
+ ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
start( 0 ),
end( count ),
increment( 1 )
{}
+ KOKKOS_INLINE_FUNCTION
+ ThreadVectorRangeBoundariesStruct (const iType& count):
+ start( 0 ),
+ end( count ),
+ increment( 1 )
+ {}
#endif
};
} // namespace Impl
template<typename iType>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>
- TeamThreadRange(const Impl::CudaTeamMember& thread, const iType& count) {
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>(thread,count);
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >
+TeamThreadRange( const Impl::CudaTeamMember & thread, const iType & count ) {
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, count );
}
-template<typename iType>
+template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>
- TeamThreadRange(const Impl::CudaTeamMember& thread, const iType& begin, const iType& end) {
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>(thread,begin,end);
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::CudaTeamMember >
+TeamThreadRange( const Impl::CudaTeamMember & thread, const iType1 & begin, const iType2 & end ) {
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
- ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
+ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::CudaTeamMember> PerTeam(const Impl::CudaTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::CudaTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::CudaTeamMember> PerThread(const Impl::CudaTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::CudaTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries, const Lambda& lambda) {
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
#endif
}
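// Usage sketch (illustrative; 'a', 'n', 'league_size', 'team_size' are placeholder names,
// not part of this interface):
//
//   typedef Kokkos::TeamPolicy< Kokkos::Cuda > policy_type ;
//   Kokkos::parallel_for( policy_type( league_size , team_size ) ,
//     KOKKOS_LAMBDA( const policy_type::member_type & team ) {
//       Kokkos::parallel_for( Kokkos::TeamThreadRange( team , n ) ,
//         [&]( const int i ) { a( team.league_rank() , i ) = i ; } );
//     } );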
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
#ifdef __CUDA_ARCH__
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
- Impl::cuda_intra_warp_reduction(result,[&] (ValueType& dst, const ValueType& src) { dst+=src; });
- Impl::cuda_inter_warp_reduction(result,[&] (ValueType& dst, const ValueType& src) { dst+=src; });
-
+ Impl::cuda_intra_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
+ { dst+=src; });
+ Impl::cuda_inter_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
+ { dst+=src; });
#endif
}
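// Usage sketch (inside a team lambda; the View 'x' and extent 'n' are placeholders):
//
//   double team_sum = 0 ;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team , n ) ,
//     [&]( const int i , double & lsum ) { lsum += x( i ) ; } , team_sum ) ;
//   // afterwards every thread of the team holds the same 'team_sum'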
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
#ifdef __CUDA_ARCH__
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
Impl::cuda_intra_warp_reduction(result, join );
Impl::cuda_inter_warp_reduction(result, join );
init_result = result;
#endif
}
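// Usage sketch with a custom join (max-reduction; 'x' and 'n' are placeholders; note the
// initial value must be the neutral element of the join):
//
//   double team_max = -1.0e300 ;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team , n ) ,
//     [&]( const int i , double & lmax ) { if ( x(i) > lmax ) lmax = x(i) ; } ,
//     [&]( double & dst , const double & src ) { if ( src > dst ) dst = src ; } ,
//     team_max ) ;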
} //namespace Kokkos
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
#endif
}
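// Usage sketch (two-level nesting inside a team lambda; 'b', 'n', 'm' are placeholders):
//
//   Kokkos::parallel_for( Kokkos::TeamThreadRange( team , n ) , [&]( const int i ) {
//     Kokkos::parallel_for( Kokkos::ThreadVectorRange( team , m ) ,
//       [&]( const int j ) { b( i , j ) *= 2.0 ; } );
//   } );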
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
#ifdef __CUDA_ARCH__
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
if (loop_boundaries.increment > 1)
result += shfl_down(result, 1,loop_boundaries.increment);
if (loop_boundaries.increment > 2)
result += shfl_down(result, 2,loop_boundaries.increment);
if (loop_boundaries.increment > 4)
result += shfl_down(result, 4,loop_boundaries.increment);
if (loop_boundaries.increment > 8)
result += shfl_down(result, 8,loop_boundaries.increment);
if (loop_boundaries.increment > 16)
result += shfl_down(result, 16,loop_boundaries.increment);
result = shfl(result,0,loop_boundaries.increment);
#endif
}
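// Note: each shfl_down step above folds the value held 'delta' lanes higher into the
// current lane, so after log2(increment) executed steps lane 0 holds the full sum, and
// the closing shfl(result,0,increment) broadcasts that total so every vector lane
// returns the same 'result'.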
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
#ifdef __CUDA_ARCH__
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
if (loop_boundaries.increment > 1)
join( result, shfl_down(result, 1,loop_boundaries.increment));
if (loop_boundaries.increment > 2)
join( result, shfl_down(result, 2,loop_boundaries.increment));
if (loop_boundaries.increment > 4)
join( result, shfl_down(result, 4,loop_boundaries.increment));
if (loop_boundaries.increment > 8)
join( result, shfl_down(result, 8,loop_boundaries.increment));
if (loop_boundaries.increment > 16)
join( result, shfl_down(result, 16,loop_boundaries.increment));
init_result = shfl(result,0,loop_boundaries.increment);
#endif
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const FunctorType & lambda) {
#ifdef __CUDA_ARCH__
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
const int VectorLength = blockDim.x;
iType loop_bound = ((loop_boundaries.end+VectorLength-1)/VectorLength) * VectorLength;
for(int _i = threadIdx.x; _i < loop_bound; _i += VectorLength) {
value_type val = value_type();
if(_i<loop_boundaries.end)
lambda(_i , val , false);
value_type tmp = val;
value_type result_i;
if(threadIdx.x%VectorLength == 0)
result_i = tmp;
if (VectorLength > 1) {
const value_type tmp2 = shfl_up(tmp, 1,VectorLength);
if(threadIdx.x > 0)
tmp+=tmp2;
}
if(threadIdx.x%VectorLength == 1)
result_i = tmp;
if (VectorLength > 3) {
const value_type tmp2 = shfl_up(tmp, 2,VectorLength);
if(threadIdx.x > 1)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 2) &&
(threadIdx.x%VectorLength < 4))
result_i = tmp;
if (VectorLength > 7) {
const value_type tmp2 = shfl_up(tmp, 4,VectorLength);
if(threadIdx.x > 3)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 4) &&
(threadIdx.x%VectorLength < 8))
result_i = tmp;
if (VectorLength > 15) {
const value_type tmp2 = shfl_up(tmp, 8,VectorLength);
if(threadIdx.x > 7)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 8) &&
(threadIdx.x%VectorLength < 16))
result_i = tmp;
if (VectorLength > 31) {
const value_type tmp2 = shfl_up(tmp, 16,VectorLength);
if(threadIdx.x > 15)
tmp+=tmp2;
}
if (threadIdx.x%VectorLength >= 16)
result_i = tmp;
val = scan_val + result_i - val;
scan_val += shfl(tmp,VectorLength-1,VectorLength);
if(_i<loop_boundaries.end)
lambda(_i , val , true);
}
#endif
}
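// Usage sketch (exclusive prefix sum over the vector lanes of one thread; the Views
// 'in'/'out' and extent 'm' are placeholders):
//
//   Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team , m ) ,
//     [&]( const int j , double & partial , const bool final ) {
//       if ( final ) out( j ) = partial ;   // 'partial' holds the exclusive prefix here
//       partial += in( j ) ;                // contribute j whether or not 'final' is set
//     } );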
}
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda();
#endif
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) lambda();
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda(val);
val = shfl(val,0,blockDim.x);
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
#endif
}
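// Usage sketch (inside a team lambda; 'compute_value' and 'counter' are placeholders):
//
//   double val = 0 ;
//   Kokkos::single( Kokkos::PerThread( team ) , [&]( double & v ) { v = compute_value() ; } , val ) ;
//   // only vector lane 0 runs the lambda; all lanes then receive 'val' via shuffle
//
//   Kokkos::single( Kokkos::PerTeam( team ) , [&]() { counter( team.league_rank() ) += 1 ; } ) ;
//   // only thread (0,0) of the team runs the lambda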
}
namespace Kokkos {
namespace Impl {
template< class FunctorType, class ExecPolicy, class ValueType , class Tag = typename ExecPolicy::work_tag>
struct CudaFunctorAdapter {
const FunctorType f;
typedef ValueType value_type;
CudaFunctorAdapter(const FunctorType& f_):f(f_) {}
__device__ inline
void operator() (typename ExecPolicy::work_tag, const typename ExecPolicy::member_type& i, ValueType& val) const {
//Insert Static Assert with decltype on ValueType equals third argument type of FunctorType::operator()
f(typename ExecPolicy::work_tag(), i,val);
}
};
template< class FunctorType, class ExecPolicy, class ValueType >
struct CudaFunctorAdapter<FunctorType,ExecPolicy,ValueType,void> {
const FunctorType f;
typedef ValueType value_type;
CudaFunctorAdapter(const FunctorType& f_):f(f_) {}
__device__ inline
void operator() (const typename ExecPolicy::member_type& i, ValueType& val) const {
//Insert Static Assert with decltype on ValueType equals second argument type of FunctorType::operator()
f(i,val);
}
__device__ inline
void operator() (typename ExecPolicy::member_type& i, ValueType& val) const {
//Insert Static Assert with decltype on ValueType equals second argument type of FunctorType::operator()
f(i,val);
}
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasInit {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasInit<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::init ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasJoin {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasJoin<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::join ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasFinal {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasFinal<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::final ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasShmemSize {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasShmemSize<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::team_shmem_size ) >::type > {
enum {value = true};
};
template< class FunctorType, bool Enable =
( FunctorDeclaresValueType<FunctorType,void>::value) ||
( ReduceFunctorHasInit<FunctorType>::value ) ||
( ReduceFunctorHasJoin<FunctorType>::value ) ||
( ReduceFunctorHasFinal<FunctorType>::value ) ||
( ReduceFunctorHasShmemSize<FunctorType>::value )
>
struct IsNonTrivialReduceFunctor {
enum {value = false};
};
template< class FunctorType>
struct IsNonTrivialReduceFunctor<FunctorType, true> {
enum {value = true};
};
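// Note: the ReduceFunctorHas* detectors above work because 'sizeof( & FunctorType::member )'
// is only well-formed when the functor actually declares that member, so SFINAE selects the
// 'true' specialization exactly for functors providing init/join/final/team_shmem_size.
// Illustrative check (hypothetical functor types):
//
//   struct HasInit    { KOKKOS_INLINE_FUNCTION void init( double & v ) const { v = 0 ; } };
//   struct LambdaLike { KOKKOS_INLINE_FUNCTION void operator()( const int , double & ) const {} };
//   static_assert(   ReduceFunctorHasInit< HasInit    >::value , "init detected" );
//   static_assert( ! ReduceFunctorHasInit< LambdaLike >::value , "no init member" );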
template<class FunctorType, class ResultType, class Tag, bool Enable = IsNonTrivialReduceFunctor<FunctorType>::value >
struct FunctorReferenceType {
typedef ResultType& reference_type;
};
template<class FunctorType, class ResultType, class Tag>
struct FunctorReferenceType<FunctorType, ResultType, Tag, true> {
typedef typename Kokkos::Impl::FunctorValueTraits< FunctorType ,Tag >::reference_type reference_type;
};
template< class FunctorTypeIn, class ExecPolicy, class ValueType>
struct ParallelReduceFunctorType<FunctorTypeIn,ExecPolicy,ValueType,Cuda> {
enum {FunctorHasValueType = IsNonTrivialReduceFunctor<FunctorTypeIn>::value };
typedef typename Kokkos::Impl::if_c<FunctorHasValueType, FunctorTypeIn, Impl::CudaFunctorAdapter<FunctorTypeIn,ExecPolicy,ValueType> >::type functor_type;
static functor_type functor(const FunctorTypeIn& functor_in) {
return Impl::if_c<FunctorHasValueType,FunctorTypeIn,functor_type>::select(functor_in,functor_type(functor_in));
}
};
}
} // namespace Kokkos
#endif /* defined( __CUDACC__ ) */
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
-
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
index 1778f631c..f30a0a891 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
@@ -1,433 +1,444 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_REDUCESCAN_HPP
#define KOKKOS_CUDA_REDUCESCAN_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Cuda/Kokkos_Cuda_Vectorization.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//Shfl based reductions
/*
* Algorithmic constraints:
* (a) threads with same threadIdx.y have same value
* (b) blockDim.x == power of two
* (c) blockDim.z == 1
*/
template< class ValueType , class JoinOp>
__device__
inline void cuda_intra_warp_reduction( ValueType& result,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
unsigned int shift = 1;
//Reduce over values from threads with different threadIdx.y
while(blockDim.x * shift < 32 ) {
const ValueType tmp = shfl_down(result, blockDim.x*shift,32u);
//Only join if upper thread is active (this allows a non-power-of-two blockDim.y)
if(threadIdx.y + shift < max_active_thread)
join(result , tmp);
shift*=2;
}
result = shfl(result,0,32);
}
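// Worked example: for blockDim.x == 4 a warp holds 8 distinct threadIdx.y rows, and the
// loop above uses shfl_down offsets of 4, 8 and 16 lanes (shift = 1, 2, 4) to fold those
// rows together before the closing shfl broadcasts row 0's result to the whole warp.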
template< class ValueType , class JoinOp>
__device__
inline void cuda_inter_warp_reduction( ValueType& value,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
#define STEP_WIDTH 4
__shared__ char sh_result[sizeof(ValueType)*STEP_WIDTH];
ValueType* result = (ValueType*) & sh_result;
const unsigned step = 32 / blockDim.x;
unsigned shift = STEP_WIDTH;
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
if(id < STEP_WIDTH ) {
result[id] = value;
}
__syncthreads();
while (shift<=max_active_thread/step) {
if(shift<=id && shift+STEP_WIDTH>id && threadIdx.x==0) {
join(result[id%STEP_WIDTH],value);
}
__syncthreads();
shift+=STEP_WIDTH;
}
value = result[0];
for(int i = 1; (i*step<max_active_thread) && i<STEP_WIDTH; i++)
join(value,result[i]);
}
template< class ValueType , class JoinOp>
__device__
inline void cuda_intra_block_reduction( ValueType& value,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
cuda_intra_warp_reduction(value,join,max_active_thread);
cuda_inter_warp_reduction(value,join,max_active_thread);
}
template< class FunctorType , class JoinOp , class ArgTag = void >
__device__
bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgTag >::reference_type value,
typename FunctorValueTraits< FunctorType , ArgTag >::reference_type neutral,
const JoinOp& join,
Cuda::size_type * const m_scratch_space,
typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type const result,
Cuda::size_type * const m_scratch_flags,
const int max_active_thread = blockDim.y) {
+#ifdef __CUDA_ARCH__
typedef typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type pointer_type;
typedef typename FunctorValueTraits< FunctorType , ArgTag >::value_type value_type;
//Do the intra-block reduction with shfl operations and static shared memory
cuda_intra_block_reduction(value,join,max_active_thread);
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
//One thread in the block writes block result to global scratch_memory
if(id == 0 ) {
pointer_type global = ((pointer_type) m_scratch_space) + blockIdx.x;
*global = value;
}
//One warp of last block performs inter block reduction through loading the block values from global scratch_memory
bool last_block = false;
__syncthreads();
if ( id < 32 ) {
Cuda::size_type count;
//Figure out whether this is the last block
if(id == 0)
count = Kokkos::atomic_fetch_add(m_scratch_flags,1);
count = Kokkos::shfl(count,0,32);
//Last block does the inter block reduction
if( count == gridDim.x - 1) {
//set flag back to zero
if(id == 0)
*m_scratch_flags = 0;
last_block = true;
value = neutral;
pointer_type const volatile global = (pointer_type) m_scratch_space ;
//Reduce all global values with splitting work over threads in one warp
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
for(int i=id; i<gridDim.x; i+=step_size) {
value_type tmp = global[i];
join(value, tmp);
}
//Perform shfl reductions within the warp; only join if the contribution is valid (allows gridDim.x to be non-power-of-two and < 32)
if (blockDim.x*blockDim.y > 1) {
value_type tmp = Kokkos::shfl_down(value, 1,32);
if( id + 1 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < gridDim.x )
join(value, tmp);
}
}
}
//The last block has in its thread=0 the global reduction value through "value"
return last_block;
+#else
+ return true;
+#endif
}
//----------------------------------------------------------------------------
// See section B.17 of Cuda C Programming Guide Version 3.2
// for discussion of
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
// Maximize shared memory and minimize L1 cache:
// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared );
// For 2.0 capability: 48 KB shared and 16 KB L1
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
* (a) blockDim.y is a power of two
* (b) blockDim.y <= 512
* (c) blockDim.x == blockDim.z == 1
*/
template< bool DoScan , class FunctorType , class ArgTag >
__device__
void cuda_intra_block_reduce_scan( const FunctorType & functor ,
const typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type base_data )
{
typedef FunctorValueTraits< FunctorType , ArgTag > ValueTraits ;
typedef FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
const unsigned value_count = ValueTraits::value_count( functor );
const unsigned BlockSizeMask = blockDim.y - 1 ;
// Must have power of two thread count
if ( BlockSizeMask & blockDim.y ) { Kokkos::abort("Cuda::cuda_intra_block_scan requires power-of-two blockDim"); }
#define BLOCK_REDUCE_STEP( R , TD , S ) \
if ( ! ( R & ((1<<(S+1))-1) ) ) { ValueJoin::join( functor , TD , (TD - (value_count<<S)) ); }
#define BLOCK_SCAN_STEP( TD , N , S ) \
if ( N == (1<<S) ) { ValueJoin::join( functor , TD , (TD - (value_count<<S))); }
const unsigned rtid_intra = threadIdx.y ^ BlockSizeMask ;
const pointer_type tdata_intra = base_data + value_count * threadIdx.y ;
{ // Intra-warp reduction:
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,0)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,1)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,2)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,3)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,4)
}
__syncthreads(); // Wait for all warps to reduce
{ // Inter-warp reduce-scan by a single warp to avoid extra synchronizations
const unsigned rtid_inter = ( threadIdx.y ^ BlockSizeMask ) << CudaTraits::WarpIndexShift ;
if ( rtid_inter < blockDim.y ) {
const pointer_type tdata_inter = base_data + value_count * ( rtid_inter ^ BlockSizeMask );
if ( (1<<5) < BlockSizeMask ) { BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,5) }
if ( (1<<6) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,6) }
if ( (1<<7) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,7) }
if ( (1<<8) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,8) }
if ( DoScan ) {
int n = ( rtid_inter & 32 ) ? 32 : (
( rtid_inter & 64 ) ? 64 : (
( rtid_inter & 128 ) ? 128 : (
( rtid_inter & 256 ) ? 256 : 0 )));
if ( ! ( rtid_inter + n < blockDim.y ) ) n = 0 ;
- BLOCK_SCAN_STEP(tdata_inter,n,8)
- BLOCK_SCAN_STEP(tdata_inter,n,7)
- BLOCK_SCAN_STEP(tdata_inter,n,6)
- BLOCK_SCAN_STEP(tdata_inter,n,5)
+ __threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,8)
+ __threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,7)
+ __threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,6)
+ __threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,5)
}
}
}
__syncthreads(); // Wait for inter-warp reduce-scan to complete
if ( DoScan ) {
int n = ( rtid_intra & 1 ) ? 1 : (
( rtid_intra & 2 ) ? 2 : (
( rtid_intra & 4 ) ? 4 : (
( rtid_intra & 8 ) ? 8 : (
( rtid_intra & 16 ) ? 16 : 0 ))));
if ( ! ( rtid_intra + n < blockDim.y ) ) n = 0 ;
-
+ #ifdef KOKKOS_CUDA_CLANG_WORKAROUND
+ BLOCK_SCAN_STEP(tdata_intra,n,4) __syncthreads();//__threadfence_block();
+ BLOCK_SCAN_STEP(tdata_intra,n,3) __syncthreads();//__threadfence_block();
+ BLOCK_SCAN_STEP(tdata_intra,n,2) __syncthreads();//__threadfence_block();
+ BLOCK_SCAN_STEP(tdata_intra,n,1) __syncthreads();//__threadfence_block();
+ BLOCK_SCAN_STEP(tdata_intra,n,0) __syncthreads();
+ #else
BLOCK_SCAN_STEP(tdata_intra,n,4) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,3) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,2) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,1) __threadfence_block();
- BLOCK_SCAN_STEP(tdata_intra,n,0)
+ BLOCK_SCAN_STEP(tdata_intra,n,0) __threadfence_block();
+ #endif
}
#undef BLOCK_SCAN_STEP
#undef BLOCK_REDUCE_STEP
}
//----------------------------------------------------------------------------
/**\brief Input value-per-thread starting at 'shared_data'.
* Reduction value at last thread's location.
*
* If 'DoScan' then write blocks' scan values and block-groups' scan values.
*
* Global reduce result is in the last threads' 'shared_data' location.
*/
template< bool DoScan , class FunctorType , class ArgTag >
__device__
bool cuda_single_inter_block_reduce_scan( const FunctorType & functor ,
const Cuda::size_type block_id ,
const Cuda::size_type block_count ,
Cuda::size_type * const shared_data ,
Cuda::size_type * const global_data ,
Cuda::size_type * const global_flags )
{
typedef Cuda::size_type size_type ;
typedef FunctorValueTraits< FunctorType , ArgTag > ValueTraits ;
typedef FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
typedef FunctorValueInit< FunctorType , ArgTag > ValueInit ;
typedef FunctorValueOps< FunctorType , ArgTag > ValueOps ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
// '__ffs' = position of the least significant bit set to 1.
// 'blockDim.y' is guaranteed to be a power of two so this
// is the integral shift value that can replace an integral divide.
const unsigned BlockSizeShift = __ffs( blockDim.y ) - 1 ;
const unsigned BlockSizeMask = blockDim.y - 1 ;
// Must have power of two thread count
if ( BlockSizeMask & blockDim.y ) { Kokkos::abort("Cuda::cuda_single_inter_block_reduce_scan requires power-of-two blockDim"); }
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( functor ) / sizeof(size_type) );
// Reduce the accumulation for the entire block.
cuda_intra_block_reduce_scan<false,FunctorType,ArgTag>( functor , pointer_type(shared_data) );
{
// Write accumulation total to global scratch space.
// Accumulation total is the last thread's data.
size_type * const shared = shared_data + word_count.value * BlockSizeMask ;
size_type * const global = global_data + word_count.value * block_id ;
#if (__CUDA_ARCH__ < 500)
for ( size_type i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i] ; }
#else
for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
#endif
}
// Contributing blocks note that their contribution has been completed via an atomic-increment flag
// If this block is not the last block to contribute to this group then the block is done.
const bool is_last_block =
! __syncthreads_or( threadIdx.y ? 0 : ( 1 + atomicInc( global_flags , block_count - 1 ) < block_count ) );
if ( is_last_block ) {
const size_type b = ( long(block_count) * long(threadIdx.y) ) >> BlockSizeShift ;
const size_type e = ( long(block_count) * long( threadIdx.y + 1 ) ) >> BlockSizeShift ;
{
void * const shared_ptr = shared_data + word_count.value * threadIdx.y ;
reference_type shared_value = ValueInit::init( functor , shared_ptr );
for ( size_type i = b ; i < e ; ++i ) {
ValueJoin::join( functor , shared_ptr , global_data + word_count.value * i );
}
}
cuda_intra_block_reduce_scan<DoScan,FunctorType,ArgTag>( functor , pointer_type(shared_data) );
if ( DoScan ) {
size_type * const shared_value = shared_data + word_count.value * ( threadIdx.y ? threadIdx.y - 1 : blockDim.y );
if ( ! threadIdx.y ) { ValueInit::init( functor , shared_value ); }
// Join previous inclusive scan value to each member
for ( size_type i = b ; i < e ; ++i ) {
size_type * const global_value = global_data + word_count.value * i ;
ValueJoin::join( functor , shared_value , global_value );
ValueOps ::copy( functor , global_value , shared_value );
}
}
}
return is_last_block ;
}
// Size in bytes required for inter block reduce or scan
template< bool DoScan , class FunctorType , class ArgTag >
inline
unsigned cuda_single_inter_block_reduce_scan_shmem( const FunctorType & functor , const unsigned BlockSize )
{
return ( BlockSize + 2 ) * Impl::FunctorValueTraits< FunctorType , ArgTag >::value_size( functor );
}
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( __CUDACC__ ) */
#endif /* KOKKOS_CUDA_REDUCESCAN_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
index 701d267e1..d56de5db6 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
@@ -1,179 +1,179 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
-#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::Cuda > ;
//----------------------------------------------------------------------------
__device__
void TaskQueueSpecialization< Kokkos::Cuda >::driver
( TaskQueueSpecialization< Kokkos::Cuda >::queue_type * const queue )
{
using Member = TaskExec< Kokkos::Cuda > ;
using Queue = TaskQueue< Kokkos::Cuda > ;
using task_root_type = TaskBase< Kokkos::Cuda , void , void > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member single_exec( 1 );
Member team_exec( blockDim.y );
const int warp_lane = threadIdx.x + threadIdx.y * blockDim.x ;
union {
task_root_type * ptr ;
int raw[2] ;
} task ;
// Loop until all queues are empty and no tasks in flight
do {
// Each team lead attempts to acquire either a thread team task
// or collection of single thread tasks for the team.
if ( 0 == warp_lane ) {
task.ptr = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
// Loop by priority and then type
for ( int i = 0 ; i < Queue::NumQueue && end == task.ptr ; ++i ) {
for ( int j = 0 ; j < 2 && end == task.ptr ; ++j ) {
task.ptr = Queue::pop_task( & queue->m_ready[i][j] );
}
}
#if 0
printf("TaskQueue<Cuda>::driver(%d,%d) task(%lx)\n",threadIdx.z,blockIdx.x
, uintptr_t(task.ptr));
#endif
}
// shuffle broadcast
task.raw[0] = __shfl( task.raw[0] , 0 );
task.raw[1] = __shfl( task.raw[1] , 0 );
if ( 0 == task.ptr ) break ; // 0 == queue->m_ready_count
if ( end != task.ptr ) {
if ( task_root_type::TaskTeam == task.ptr->m_task_type ) {
// Thread Team Task
(*task.ptr->m_apply)( task.ptr , & team_exec );
}
else if ( 0 == threadIdx.y ) {
// Single Thread Task
(*task.ptr->m_apply)( task.ptr , & single_exec );
}
if ( 0 == warp_lane ) {
queue->complete( task.ptr );
}
}
} while(1);
}
namespace {
__global__
void cuda_task_queue_execute( TaskQueue< Kokkos::Cuda > * queue )
{ TaskQueueSpecialization< Kokkos::Cuda >::driver( queue ); }
}
void TaskQueueSpecialization< Kokkos::Cuda >::execute
( TaskQueue< Kokkos::Cuda > * const queue )
{
const int warps_per_block = 4 ;
const dim3 grid( Kokkos::Impl::cuda_internal_multiprocessor_count() , 1 , 1 );
const dim3 block( 1 , Kokkos::Impl::CudaTraits::WarpSize , warps_per_block );
const int shared = 0 ;
const cudaStream_t stream = 0 ;
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
#if 0
printf("cuda_task_queue_execute before\n");
#endif
// Query the stack size, in bytes:
//
// size_t stack_size = 0 ;
// CUDA_SAFE_CALL( cudaDeviceGetLimit( & stack_size , cudaLimitStackSize ) );
//
// If not large enough then set the stack size, in bytes:
//
// CUDA_SAFE_CALL( cudaDeviceSetLimit( cudaLimitStackSize , stack_size ) );
cuda_task_queue_execute<<< grid , block , shared , stream >>>( queue );
CUDA_SAFE_CALL( cudaGetLastError() );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
#if 0
printf("cuda_task_queue_execute after\n");
#endif
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG ) */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
index 9d9347cc8..479294f30 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
@@ -1,519 +1,523 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
#define KOKKOS_IMPL_CUDA_TASK_HPP
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
template< typename TaskType >
__global__
void set_cuda_task_base_apply_function_pointer
( TaskBase<Kokkos::Cuda,void,void>::function_type * ptr )
{ *ptr = TaskType::apply ; }
}
template<>
class TaskQueueSpecialization< Kokkos::Cuda >
{
public:
using execution_space = Kokkos::Cuda ;
using memory_space = Kokkos::CudaUVMSpace ;
using queue_type = TaskQueue< execution_space > ;
static
void iff_single_thread_recursive_execute( queue_type * const ) {}
__device__
static void driver( queue_type * const );
static
void execute( queue_type * const );
template< typename FunctorType >
static
void proc_set_apply( TaskBase<execution_space,void,void>::function_type * ptr )
{
using TaskType = TaskBase< execution_space
, typename FunctorType::value_type
, FunctorType > ;
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
set_cuda_task_base_apply_function_pointer<TaskType><<<1,1>>>(ptr);
CUDA_SAFE_CALL( cudaGetLastError() );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
}
};
extern template class TaskQueue< Kokkos::Cuda > ;
//----------------------------------------------------------------------------
-/**\brief Impl::TaskExec<Cuda> is the TaskPolicy<Cuda>::member_type
+/**\brief Impl::TaskExec<Cuda> is the TaskScheduler<Cuda>::member_type
* passed to tasks running in a Cuda space.
*
* Cuda thread blocks for tasking are dimensioned:
* blockDim.x == vector length
* blockDim.y == team size
* blockDim.z == number of teams
* where
* blockDim.x * blockDim.y == WarpSize
*
* Both single thread and thread team tasks are run by a full Cuda warp.
* A single thread task is called by warp lane #0 and the remaining
* lanes of the warp are idle.
*/
template<>
class TaskExec< Kokkos::Cuda >
{
private:
TaskExec( TaskExec && ) = delete ;
TaskExec( TaskExec const & ) = delete ;
TaskExec & operator = ( TaskExec && ) = delete ;
TaskExec & operator = ( TaskExec const & ) = delete ;
friend class Kokkos::Impl::TaskQueue< Kokkos::Cuda > ;
friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::Cuda > ;
const int m_team_size ;
__device__
TaskExec( int arg_team_size = blockDim.y )
: m_team_size( arg_team_size ) {}
public:
#if defined( __CUDA_ARCH__ )
__device__ void team_barrier() { /* __threadfence_block(); */ }
__device__ int team_rank() const { return threadIdx.y ; }
__device__ int team_size() const { return m_team_size ; }
#else
__host__ void team_barrier() {}
__host__ int team_rank() const { return 0 ; }
__host__ int team_size() const { return 0 ; }
#endif
};
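// Note on the dimensioning above: the launch in Kokkos_Cuda_Task.cpp uses
// block( 1 , WarpSize , warps_per_block ), i.e. vector length 1 and team size 32, so each
// team (or single-thread task run by lane 0) occupies exactly one warp; more generally the
// constraint blockDim.x * blockDim.y == WarpSize means, e.g., that a vector length of 8
// would imply a team size of 4.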
//----------------------------------------------------------------------------
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType, TaskExec< Kokkos::Cuda > >
{
typedef iType index_type;
const iType start ;
const iType end ;
const iType increment ;
const TaskExec< Kokkos::Cuda > & thread;
#if defined( __CUDA_ARCH__ )
__device__ inline
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count)
: start( threadIdx.y )
, end(arg_count)
, increment( blockDim.y )
, thread(arg_thread)
{}
__device__ inline
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread
, const iType & arg_start
, const iType & arg_end
)
: start( arg_start + threadIdx.y )
, end( arg_end)
, increment( blockDim.y )
, thread( arg_thread )
{}
#else
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count);
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread
, const iType & arg_start
, const iType & arg_end
);
#endif
};
//----------------------------------------------------------------------------
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType, TaskExec< Kokkos::Cuda > >
{
typedef iType index_type;
const iType start ;
const iType end ;
const iType increment ;
const TaskExec< Kokkos::Cuda > & thread;
#if defined( __CUDA_ARCH__ )
__device__ inline
ThreadVectorRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count)
: start( threadIdx.x )
, end(arg_count)
, increment( blockDim.x )
, thread(arg_thread)
{}
#else
ThreadVectorRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count);
#endif
};
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
namespace Kokkos {
template<typename iType>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
-TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread
- , const iType & count )
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >
+TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread, const iType & count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >(thread,count);
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >( thread, count );
}
-template<typename iType>
+template<typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
-TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread, const iType & start , const iType & end )
+Impl::TeamThreadRangeBoundariesStruct
+ < typename std::common_type<iType1,iType2>::type
+ , Impl::TaskExec< Kokkos::Cuda > >
+TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread
+ , const iType1 & begin, const iType2 & end )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Cuda > >(thread,start,end);
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >(
+ thread, iType(begin), iType(end) );
}
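The rewritten overload above is templated on two index types and folds them through std::common_type, so a call mixing, say, an int begin with a size_t end now deduces a single index type instead of failing to compile. A standalone sketch of the same deduction (half_open_length is a hypothetical stand-in, not a Kokkos function):

  #include <cstddef>
  #include <cstdio>
  #include <type_traits>

  // Mirrors the two-type TeamThreadRange overload: both bounds are converted
  // to their std::common_type before being used.
  template< typename T1, typename T2 >
  typename std::common_type<T1,T2>::type half_open_length( const T1 & begin, const T2 & end )
  {
    typedef typename std::common_type<T1,T2>::type iType;
    return iType(end) - iType(begin);
  }

  int main() {
    int         begin = 0;
    std::size_t end   = 100;   // mixed int / size_t bounds, as the new overload accepts
    std::printf( "%zu\n", (std::size_t) half_open_length( begin, end ) );
    return 0;
  }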
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
ThreadVectorRange( const Impl::TaskExec< Kokkos::Cuda > & thread
, const iType & count )
{
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >(thread,count);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.
*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Cuda > >& loop_boundaries
, const Lambda& lambda
)
{
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i);
}
}
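Given the boundaries struct defined earlier (start = threadIdx.y, increment = blockDim.y), each team member of the warp walks a strided slice of 0..N-1. A host-side model of that partition, with rank standing in for threadIdx.y and team_size for blockDim.y:

  #include <cstdio>

  int main() {
    const int N = 10, team_size = 4;                   // team_size plays the role of blockDim.y
    for ( int rank = 0; rank < team_size; ++rank ) {   // rank plays the role of threadIdx.y
      std::printf( "member %d:", rank );
      for ( int i = rank; i < N; i += team_size ) {    // start = rank, increment = team_size
        std::printf( " %d", i );                       // indices handled by this member
      }
      std::printf( "\n" );
    }
    return 0;
  }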
// reduce across corresponding lanes between team members within warp
// assume stride*team_size == warp_size
template< typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void strided_shfl_warp_reduction
(const JoinType& join,
ValueType& val,
int team_size,
int stride)
{
for (int lane_delta=(team_size*stride)>>1; lane_delta>=stride; lane_delta>>=1) {
join(val, Kokkos::shfl_down(val, lane_delta, team_size*stride));
}
}
// multiple within-warp non-strided reductions
template< typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void multi_shfl_warp_reduction
(const JoinType& join,
ValueType& val,
int vec_length)
{
for (int lane_delta=vec_length>>1; lane_delta; lane_delta>>=1) {
join(val, Kokkos::shfl_down(val, lane_delta, vec_length));
}
}
// broadcast within warp
template< class ValueType >
KOKKOS_INLINE_FUNCTION
ValueType shfl_warp_broadcast
(ValueType& val,
int src_lane,
int width)
{
return Kokkos::shfl(val, src_lane, width);
}
// all-reduce across corresponding vector lanes between team members within warp
-// assume vec_length*team_size == warp_size
+// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const JoinType& join,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
strided_shfl_warp_reduction<ValueType, JoinType>(
join,
initialized_result,
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
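The reduction above accumulates a per-member partial in the strided loop and then combines partials across team members with strided_shfl_warp_reduction; the overload that follows is identical except that the join is hard-coded to a sum. A host-side model of the strided combine, with an array standing in for the 32 warp lanes and plain addition standing in for join (out-of-range lanes are simply left untouched here; only lanes 0..stride-1 matter at the end):

  #include <cstdio>

  int main() {
    const int warp_size = 32, stride = 4;        // stride models blockDim.x (vector length)
    const int team_size = warp_size / stride;    // models blockDim.y
    double val[warp_size];
    for ( int lane = 0; lane < warp_size; ++lane ) val[lane] = 1.0;   // each lane contributes 1

    for ( int lane_delta = (team_size*stride) >> 1; lane_delta >= stride; lane_delta >>= 1 )
      for ( int lane = 0; lane + lane_delta < warp_size; ++lane )
        val[lane] += val[lane + lane_delta];     // join( val, shfl_down( val, lane_delta ) )

    // Lanes 0..stride-1 now hold the per-vector-index totals (team_size each).
    for ( int v = 0; v < stride; ++v ) std::printf( "vec %d total = %g\n", v, val[v] );
    return 0;
  }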
// all-reduce across corresponding vector lanes between team members within warp
// if no join() provided, use sum
-// assume vec_length*team_size == warp_size
+// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result) {
//TODO what is the point of creating this temporary?
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
strided_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { val1 += val2; },
initialized_result,
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
// all-reduce within team members within warp
-// assume vec_length*team_size == warp_size
+// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const JoinType& join,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
multi_shfl_warp_reduction<ValueType, JoinType>(join, initialized_result, blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
// all-reduce within team members within warp
// if no join() provided, use sum
-// assume vec_length*team_size == warp_size
+// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
//initialized_result = multi_shfl_warp_reduction(
multi_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { val1 += val2; },
initialized_result,
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
// scan across corresponding vector lanes between team members within warp
-// assume vec_length*team_size == warp_size
+// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename ValueType, typename iType, class Lambda >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda) {
ValueType accum = 0 ;
ValueType val, y, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
val = 0;
lambda(i,val,false);
// intra-blockDim.y exclusive scan on 'val'
// accum = accumulated, sum in total for this iteration
// INCLUSIVE scan
for( int offset = blockDim.x ; offset < Impl::CudaTraits::WarpSize ; offset <<= 1 ) {
y = Kokkos::shfl_up(val, offset, Impl::CudaTraits::WarpSize);
if(threadIdx.y*blockDim.x >= offset) { val += y; }
}
// pass accum to all threads
local_total = shfl_warp_broadcast<ValueType>(val,
threadIdx.x+Impl::CudaTraits::WarpSize-blockDim.x,
Impl::CudaTraits::WarpSize);
// make EXCLUSIVE scan by shifting values over one
val = Kokkos::shfl_up(val, blockDim.x, Impl::CudaTraits::WarpSize);
if ( threadIdx.y == 0 ) { val = 0 ; }
val += accum;
lambda(i,val,true);
accum += local_total;
}
}
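The scan above first builds an INCLUSIVE scan across team members with shfl_up, broadcasts the last member's total, then shifts everything by one member (zeroing member 0) to obtain the EXCLUSIVE prefix that is handed back to the lambda; accum carries that prefix between outer loop iterations. A serial host model of the inclusive-to-exclusive conversion for one outer iteration (the reverse inner loop stands in for the simultaneous shuffle):

  #include <cstdio>

  int main() {
    const int team_size = 8;
    int vals[team_size] = { 3, 1, 4, 1, 5, 9, 2, 6 };   // one contribution per team member

    // Inclusive scan: each member adds the value 'offset' members below it.
    for ( int offset = 1; offset < team_size; offset <<= 1 )
      for ( int m = team_size - 1; m >= offset; --m )
        vals[m] += vals[m - offset];

    int local_total = vals[team_size - 1];              // value broadcast from the last member

    // Exclusive scan: shift by one member and zero the first slot.
    for ( int m = team_size - 1; m > 0; --m ) vals[m] = vals[m-1];
    vals[0] = 0;

    for ( int m = 0; m < team_size; ++m ) std::printf( "%d ", vals[m] );
    std::printf( "| total %d\n", local_total );
    return 0;
  }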
// scan within team member (vector) within warp
-// assume vec_length*team_size == warp_size
+// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda)
{
ValueType accum = 0 ;
ValueType val, y, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
val = 0;
lambda(i,val,false);
// intra-blockDim.x exclusive scan on 'val'
// accum = accumulated, sum in total for this iteration
// INCLUSIVE scan
for( int offset = 1 ; offset < blockDim.x ; offset <<= 1 ) {
y = Kokkos::shfl_up(val, offset, blockDim.x);
if(threadIdx.x >= offset) { val += y; }
}
// pass accum to all threads
local_total = shfl_warp_broadcast<ValueType>(val, blockDim.x-1, blockDim.x);
// make EXCLUSIVE scan by shifting values over one
val = Kokkos::shfl_up(val, 1, blockDim.x);
if ( threadIdx.x == 0 ) { val = 0 ; }
val += accum;
lambda(i,val,true);
accum += local_total;
}
}
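Both parallel_scan overloads call the user lambda twice per index: once with final == false so it can add its contribution to val, and once with final == true so it can consume the exclusive prefix (for example as a write offset). A serial sketch of that contract (counts and offsets are made-up illustration data, not Kokkos names):

  #include <cstdio>

  int main() {
    const int N = 5;
    const int counts[N] = { 2, 0, 3, 1, 4 };  // per-item contribution
    int offsets[N];

    int accum = 0;                            // running exclusive prefix (the 'accum' above)
    for ( int i = 0; i < N; ++i ) {
      int val = 0;
      val += counts[i];                       // lambda( i, val, false ): contribute
      const int local_total = val;
      offsets[i] = accum;                     // lambda( i, accum, true ): consume the prefix
      accum += local_total;
    }

    for ( int i = 0; i < N; ++i ) std::printf( "offsets[%d] = %d\n", i, offsets[i] );
    return 0;
  }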
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_CUDA_TASK_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp
deleted file mode 100644
index bb3cd2640..000000000
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp
+++ /dev/null
@@ -1,932 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-// Experimental unified task-data parallel manycore LDRD
-
-#include <stdio.h>
-#include <iostream>
-#include <sstream>
-#include <Kokkos_Core.hpp>
-#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
-
-#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY )
-
-// #define DETAILED_PRINT
-
-//----------------------------------------------------------------------------
-
-#define QLOCK reinterpret_cast<void*>( ~((uintptr_t)0) )
-#define QDENIED reinterpret_cast<void*>( ~((uintptr_t)0) - 1 )
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-void CudaTaskPolicyQueue::Destroy::destroy_shared_allocation()
-{
- // Verify the queue is empty
-
- if ( m_policy->m_count_ready ||
- m_policy->m_team[0] ||
- m_policy->m_team[1] ||
- m_policy->m_team[2] ||
- m_policy->m_serial[0] ||
- m_policy->m_serial[1] ||
- m_policy->m_serial[2] ) {
- Kokkos::abort("CudaTaskPolicyQueue ERROR : Attempt to destroy non-empty queue" );
- }
-
- m_policy->~CudaTaskPolicyQueue();
-
- Kokkos::Cuda::fence();
-}
-
-CudaTaskPolicyQueue::
-~CudaTaskPolicyQueue()
-{
-}
-
-CudaTaskPolicyQueue::
-CudaTaskPolicyQueue
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity
- , const unsigned arg_team_size
- )
- : m_space( Kokkos::CudaUVMSpace()
- , arg_task_max_size * arg_task_max_count * 1.2
- , 16 /* log2(superblock size) */
- )
- , m_team { 0 , 0 , 0 }
- , m_serial { 0 , 0 , 0 }
- , m_team_size( 32 /* 1 warps */ )
- , m_default_dependence_capacity( arg_task_default_dependence_capacity )
- , m_count_ready(0)
-{
- constexpr int max_team_size = 32 * 16 /* 16 warps */ ;
-
- const int target_team_size =
- std::min( int(arg_team_size) , max_team_size );
-
- while ( m_team_size < target_team_size ) { m_team_size *= 2 ; }
-}
-
-//-----------------------------------------------------------------------
-// Called by each block & thread
-
-__device__
-void Kokkos::Experimental::Impl::CudaTaskPolicyQueue::driver()
-{
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
-#define IS_TEAM_LEAD ( threadIdx.x == 0 && threadIdx.y == 0 )
-
-#ifdef DETAILED_PRINT
-if ( IS_TEAM_LEAD ) {
- printf( "CudaTaskPolicyQueue::driver() begin on %d with count %d\n"
- , blockIdx.x , m_count_ready );
-}
-#endif
-
- // Each thread block must iterate this loop synchronously
-  // to ensure team-execution of the team-task
-
- __shared__ task_root_type * team_task ;
-
- __syncthreads();
-
- do {
-
- if ( IS_TEAM_LEAD ) {
- if ( 0 == m_count_ready ) {
- team_task = q_denied ; // All queues are empty and no running tasks
- }
- else {
- team_task = 0 ;
- for ( int i = 0 ; i < int(NPRIORITY) && 0 == team_task ; ++i ) {
- if ( ( i < 2 /* regular queue */ )
- || ( ! m_space.is_empty() /* waiting for memory */ ) ) {
- team_task = pop_ready_task( & m_team[i] );
- }
- }
- }
- }
-
- __syncthreads();
-
-#ifdef DETAILED_PRINT
-if ( IS_TEAM_LEAD && 0 != team_task ) {
- printf( "CudaTaskPolicyQueue::driver() (%d) team_task(0x%lx)\n"
- , blockIdx.x
- , (unsigned long) team_task );
-}
-#endif
-
- // team_task == q_denied if all queues are empty
- // team_task == 0 if no team tasks available
-
- if ( q_denied != team_task ) {
- if ( 0 != team_task ) {
-
- Kokkos::Impl::CudaTeamMember
- member( kokkos_impl_cuda_shared_memory<void>()
- , 16 /* shared_begin */
- , team_task->m_shmem_size /* shared size */
- , 0 /* scratch level 1 pointer */
- , 0 /* scratch level 1 size */
- , 0 /* league rank */
- , 1 /* league size */
- );
-
- (*team_task->m_team)( team_task , member );
-
-        // A __syncthreads was called and, if the task
-        // completed, the functor was destroyed.
-
- if ( IS_TEAM_LEAD ) {
- complete_executed_task( team_task );
- }
- }
- else {
- // One thread of one warp performs this serial task
- if ( threadIdx.x == 0 &&
- 0 == ( threadIdx.y % 32 ) ) {
- task_root_type * task = 0 ;
- for ( int i = 0 ; i < int(NPRIORITY) && 0 == task ; ++i ) {
- if ( ( i < 2 /* regular queue */ )
- || ( ! m_space.is_empty() /* waiting for memory */ ) ) {
- task = pop_ready_task( & m_serial[i] );
- }
- }
-
-#ifdef DETAILED_PRINT
-if ( 0 != task ) {
- printf( "CudaTaskPolicyQueue::driver() (%2d)(%d) single task(0x%lx)\n"
- , blockIdx.x
- , threadIdx.y
- , (unsigned long) task );
-}
-#endif
-
- if ( task ) {
- (*task->m_serial)( task );
- complete_executed_task( task );
- }
- }
-
- __syncthreads();
- }
- }
- } while ( q_denied != team_task );
-
-#ifdef DETAILED_PRINT
-if ( IS_TEAM_LEAD ) {
- printf( "CudaTaskPolicyQueue::driver() end on %d with count %d\n"
- , blockIdx.x , m_count_ready );
-}
-#endif
-
-#undef IS_TEAM_LEAD
-}
-
-//-----------------------------------------------------------------------
-
-__device__
-CudaTaskPolicyQueue::task_root_type *
-CudaTaskPolicyQueue::pop_ready_task(
- CudaTaskPolicyQueue::task_root_type * volatile * const queue )
-{
- task_root_type * const q_lock = reinterpret_cast<task_root_type*>(QLOCK);
- task_root_type * task = 0 ;
- task_root_type * const task_claim = *queue ;
-
- if ( ( q_lock != task_claim ) && ( 0 != task_claim ) ) {
-
- // Queue is not locked and not null, try to claim head of queue.
- // Is a race among threads to claim the queue.
-
- if ( task_claim == atomic_compare_exchange(queue,task_claim,q_lock) ) {
-
-      // Acquired the task, which must be in the waiting state.
-
- const int claim_state =
- atomic_compare_exchange( & task_claim->m_state
- , int(TASK_STATE_WAITING)
- , int(TASK_STATE_EXECUTING) );
-
- task_root_type * lock_verify = 0 ;
-
- if ( claim_state == int(TASK_STATE_WAITING) ) {
-
- // Transitioned this task from waiting to executing
- // Update the queue to the next entry and release the lock
-
- task_root_type * const next =
- *((task_root_type * volatile *) & task_claim->m_next );
-
- *((task_root_type * volatile *) & task_claim->m_next ) = 0 ;
-
- lock_verify = atomic_compare_exchange( queue , q_lock , next );
- }
-
- if ( ( claim_state != int(TASK_STATE_WAITING) ) |
- ( q_lock != lock_verify ) ) {
-
- printf( "CudaTaskPolicyQueue::pop_ready_task(0x%lx) task(0x%lx) state(%d) ERROR %s\n"
- , (unsigned long) queue
- , (unsigned long) task
- , claim_state
- , ( claim_state != int(TASK_STATE_WAITING)
- ? "NOT WAITING"
- : "UNLOCK" ) );
- Kokkos::abort("CudaTaskPolicyQueue::pop_ready_task");
- }
-
- task = task_claim ;
- }
- }
- return task ;
-}
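The removed pop_ready_task above claims the queue head by CAS-ing a lock sentinel into the head pointer, detaches the head node, and unlocks by publishing the next node; the task-state transition and error checks wrap that core. A minimal host-side model of just the lock-sentinel pop using std::atomic (Node, LOCK and pop_head are illustrative stand-ins, not the Kokkos types):

  #include <atomic>
  #include <cstdint>
  #include <cstdio>

  struct Node { Node * next ; int payload ; };

  static Node * const LOCK = reinterpret_cast<Node*>( ~ std::uintptr_t(0) );   // models QLOCK

  Node * pop_head( std::atomic<Node*> & queue )
  {
    Node * head = queue.load();
    if ( head == LOCK || head == nullptr ) return nullptr;               // locked or empty
    if ( ! queue.compare_exchange_strong( head, LOCK ) ) return nullptr; // lost the race
    Node * const next = head->next ;                                     // exclusive access to 'head'
    head->next = nullptr ;
    queue.store( next );                                                 // unlock: publish the tail
    return head ;
  }

  int main() {
    Node b { nullptr, 2 }, a { & b, 1 };
    std::atomic<Node*> queue( & a );
    for ( Node * n = pop_head( queue ); n ; n = pop_head( queue ) )
      std::printf( "popped %d\n", n->payload );
    return 0;
  }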
-
-//-----------------------------------------------------------------------
-
-__device__
-void CudaTaskPolicyQueue::complete_executed_task(
- CudaTaskPolicyQueue::task_root_type * task )
-{
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
-
-#ifdef DETAILED_PRINT
-printf( "CudaTaskPolicyQueue::complete_executed_task(0x%lx) state(%d) (%d)(%d,%d)\n"
- , (unsigned long) task
- , task->m_state
- , blockIdx.x
- , threadIdx.x
- , threadIdx.y
- );
-#endif
-
- // State is either executing or if respawned then waiting,
- // try to transition from executing to complete.
- // Reads the current value.
-
- const int state_old =
- atomic_compare_exchange( & task->m_state
- , int(Kokkos::Experimental::TASK_STATE_EXECUTING)
- , int(Kokkos::Experimental::TASK_STATE_COMPLETE) );
-
- if ( int(Kokkos::Experimental::TASK_STATE_WAITING) == state_old ) {
- /* Task requested a respawn so reschedule it */
- schedule_task( task , false /* not initial spawn */ );
- }
- else if ( int(Kokkos::Experimental::TASK_STATE_EXECUTING) == state_old ) {
- /* Task is complete */
-
- // Clear dependences of this task before locking wait queue
-
- task->clear_dependence();
-
- // Stop other tasks from adding themselves to this task's wait queue.
- // The wait queue is updated concurrently so guard with an atomic.
-
- task_root_type * wait_queue = *((task_root_type * volatile *) & task->m_wait );
- task_root_type * wait_queue_old = 0 ;
-
- do {
- wait_queue_old = wait_queue ;
- wait_queue = atomic_compare_exchange( & task->m_wait , wait_queue_old , q_denied );
- } while ( wait_queue_old != wait_queue );
-
- // The task has been removed from ready queue and
- // execution is complete so decrement the reference count.
- // The reference count was incremented by the initial spawning.
- // The task may be deleted if this was the last reference.
-
- task_root_type::assign( & task , 0 );
-
- // Pop waiting tasks and schedule them
- while ( wait_queue ) {
- task_root_type * const x = wait_queue ; wait_queue = x->m_next ; x->m_next = 0 ;
- schedule_task( x , false /* not initial spawn */ );
- }
- }
- else {
- printf( "CudaTaskPolicyQueue::complete_executed_task(0x%lx) ERROR state_old(%d) dep_size(%d)\n"
- , (unsigned long)( task )
- , int(state_old)
- , task->m_dep_size
- );
- Kokkos::abort("CudaTaskPolicyQueue::complete_executed_task" );
- }
-
- // If the task was respawned it may have already been
- // put in a ready queue and the count incremented.
- // By decrementing the count last it will never go to zero
- // with a ready or executing task.
-
- atomic_fetch_add( & m_count_ready , -1 );
-}
-
-__device__
-void TaskMember< Kokkos::Cuda , void , void >::latch_add( const int k )
-{
- typedef TaskMember< Kokkos::Cuda , void , void > task_root_type ;
-
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
- const bool ok_input = 0 < k ;
-
- const int count = ok_input ? atomic_fetch_add( & m_dep_size , -k ) - k
- : k ;
-
- const bool ok_count = 0 <= count ;
-
- const int state = 0 != count ? TASK_STATE_WAITING :
- atomic_compare_exchange( & m_state
- , TASK_STATE_WAITING
- , TASK_STATE_COMPLETE );
-
- const bool ok_state = state == TASK_STATE_WAITING ;
-
- if ( ! ok_count || ! ok_state ) {
- printf( "CudaTaskPolicyQueue::latch_add[0x%lx](%d) ERROR %s %d\n"
- , (unsigned long) this
- , k
- , ( ! ok_input ? "Non-positive input" :
- ( ! ok_count ? "Negative count" : "Bad State" ) )
- , ( ! ok_input ? k :
- ( ! ok_count ? count : state ) )
- );
- Kokkos::abort( "CudaTaskPolicyQueue::latch_add ERROR" );
- }
- else if ( 0 == count ) {
- // Stop other tasks from adding themselves to this latch's wait queue.
- // The wait queue is updated concurrently so guard with an atomic.
-
- CudaTaskPolicyQueue & policy = *m_policy ;
- task_root_type * wait_queue = *((task_root_type * volatile *) &m_wait);
- task_root_type * wait_queue_old = 0 ;
-
- do {
- wait_queue_old = wait_queue ;
- wait_queue = atomic_compare_exchange( & m_wait , wait_queue_old , q_denied );
- } while ( wait_queue_old != wait_queue );
-
- // Pop waiting tasks and schedule them
- while ( wait_queue ) {
- task_root_type * const x = wait_queue ; wait_queue = x->m_next ; x->m_next = 0 ;
- policy.schedule_task( x , false /* not initial spawn */ );
- }
- }
-}
-
-//----------------------------------------------------------------------------
-
-void CudaTaskPolicyQueue::reschedule_task(
- CudaTaskPolicyQueue::task_root_type * const task )
-{
- // Reschedule transitions from executing back to waiting.
- const int old_state =
- atomic_compare_exchange( & task->m_state
- , int(TASK_STATE_EXECUTING)
- , int(TASK_STATE_WAITING) );
-
- if ( old_state != int(TASK_STATE_EXECUTING) ) {
-
- printf( "CudaTaskPolicyQueue::reschedule_task(0x%lx) ERROR state(%d)\n"
- , (unsigned long) task
- , old_state
- );
- Kokkos::abort("CudaTaskPolicyQueue::reschedule" );
- }
-}
-
-KOKKOS_FUNCTION
-void CudaTaskPolicyQueue::schedule_task(
- CudaTaskPolicyQueue::task_root_type * const task ,
- const bool initial_spawn )
-{
- task_root_type * const q_lock = reinterpret_cast<task_root_type*>(QLOCK);
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
- //----------------------------------------
- // State is either constructing or already waiting.
- // If constructing then transition to waiting.
-
- {
- const int old_state = atomic_compare_exchange( & task->m_state
- , int(TASK_STATE_CONSTRUCTING)
- , int(TASK_STATE_WAITING) );
-
- // Head of linked list of tasks waiting on this task
- task_root_type * const waitTask =
- *((task_root_type * volatile const *) & task->m_wait );
-
- // Member of linked list of tasks waiting on some other task
- task_root_type * const next =
- *((task_root_type * volatile const *) & task->m_next );
-
- // An incomplete and non-executing task has:
- // task->m_state == TASK_STATE_CONSTRUCTING or TASK_STATE_WAITING
- // task->m_wait != q_denied
- // task->m_next == 0
- //
- if ( ( q_denied == waitTask ) ||
- ( 0 != next ) ||
- ( old_state != int(TASK_STATE_CONSTRUCTING) &&
- old_state != int(TASK_STATE_WAITING) ) ) {
- printf( "CudaTaskPolicyQueue::schedule_task(0x%lx) STATE ERROR: state(%d) wait(0x%lx) next(0x%lx)\n"
- , (unsigned long) task
- , old_state
- , (unsigned long) waitTask
- , (unsigned long) next );
- Kokkos::abort("CudaTaskPolicyQueue::schedule" );
- }
- }
-
- //----------------------------------------
-
- if ( initial_spawn ) {
- // The initial spawn of a task increments the reference count
- // for the task's existence in either a waiting or ready queue
- // until the task has completed.
- // Completing the task's execution is the matching
- // decrement of the reference count.
- task_root_type::assign( 0 , task );
- }
-
- //----------------------------------------
- // Insert this task into a dependence task that is not complete.
- // Push on to that task's wait queue.
-
- bool attempt_insert_in_queue = true ;
-
- task_root_type * volatile * queue =
- task->m_dep_size ? & task->m_dep[0]->m_wait : (task_root_type **) 0 ;
-
- for ( int i = 0 ; attempt_insert_in_queue && ( 0 != queue ) ; ) {
-
- task_root_type * const head_value_old = *queue ;
-
- if ( q_denied == head_value_old ) {
- // Wait queue is closed because task is complete,
- // try again with the next dependence wait queue.
- ++i ;
- queue = i < task->m_dep_size ? & task->m_dep[i]->m_wait
- : (task_root_type **) 0 ;
- }
- else {
-
- // Wait queue is open and not denied.
- // Have exclusive access to this task.
-      // Assign m_next assuming a successful insertion into the queue.
- // Fence the memory assignment before attempting the CAS.
-
- *((task_root_type * volatile *) & task->m_next ) = head_value_old ;
-
- memory_fence();
-
- // Attempt to insert this task into the queue.
- // If fails then continue the attempt.
-
- attempt_insert_in_queue =
- head_value_old != atomic_compare_exchange(queue,head_value_old,task);
- }
- }
-
- //----------------------------------------
- // All dependences are complete, insert into the ready list
-
- if ( attempt_insert_in_queue ) {
-
- // Increment the count of ready tasks.
- // Count will be decremented when task is complete.
-
- atomic_fetch_add( & m_count_ready , 1 );
-
- queue = task->m_queue ;
-
- while ( attempt_insert_in_queue ) {
-
- // A locked queue is being popped.
-
- task_root_type * const head_value_old = *queue ;
-
- if ( q_lock != head_value_old ) {
- // Read the head of ready queue,
- // if same as previous value then CAS locks the ready queue
-
- // Have exclusive access to this task,
- // assign to head of queue, assuming successful insert
- // Fence assignment before attempting insert.
- *((task_root_type * volatile *) & task->m_next ) = head_value_old ;
-
- memory_fence();
-
- attempt_insert_in_queue =
- head_value_old != atomic_compare_exchange(queue,head_value_old,task);
- }
- }
- }
-}
-
-void CudaTaskPolicyQueue::deallocate_task
- ( CudaTaskPolicyQueue::task_root_type * const task )
-{
- m_space.deallocate( task , task->m_size_alloc );
-}
-
-KOKKOS_FUNCTION
-CudaTaskPolicyQueue::task_root_type *
-CudaTaskPolicyQueue::allocate_task
- ( const unsigned arg_sizeof_task
- , const unsigned arg_dep_capacity
- , const unsigned arg_team_shmem
- )
-{
- const unsigned base_size = arg_sizeof_task +
- ( arg_sizeof_task % sizeof(task_root_type*)
- ? sizeof(task_root_type*) - arg_sizeof_task % sizeof(task_root_type*)
- : 0 );
-
- const unsigned dep_capacity
- = ~0u == arg_dep_capacity
- ? m_default_dependence_capacity
- : arg_dep_capacity ;
-
- const unsigned size_alloc =
- base_size + sizeof(task_root_type*) * dep_capacity ;
-
- task_root_type * const task =
- reinterpret_cast<task_root_type*>( m_space.allocate( size_alloc ) );
-
- if ( task != 0 ) {
-
- // Initialize task's root and value data structure
- // Calling function must copy construct the functor.
-
- new( (void*) task ) task_root_type();
-
- task->m_policy = this ;
- task->m_size_alloc = size_alloc ;
- task->m_dep_capacity = dep_capacity ;
- task->m_shmem_size = arg_team_shmem ;
-
- if ( dep_capacity ) {
- task->m_dep =
- reinterpret_cast<task_root_type**>(
- reinterpret_cast<unsigned char*>(task) + base_size );
-
- for ( unsigned i = 0 ; i < dep_capacity ; ++i )
- task->task_root_type::m_dep[i] = 0 ;
- }
- }
- return task ;
-}
-
-//----------------------------------------------------------------------------
-
-void CudaTaskPolicyQueue::add_dependence
- ( CudaTaskPolicyQueue::task_root_type * const after
- , CudaTaskPolicyQueue::task_root_type * const before
- )
-{
- if ( ( after != 0 ) && ( before != 0 ) ) {
-
- int const state = *((volatile const int *) & after->m_state );
-
- // Only add dependence during construction or during execution.
- // Both tasks must have the same policy.
- // Dependence on non-full memory cannot be mixed with any other dependence.
-
- const bool ok_state =
- Kokkos::Experimental::TASK_STATE_CONSTRUCTING == state ||
- Kokkos::Experimental::TASK_STATE_EXECUTING == state ;
-
- const bool ok_capacity =
- after->m_dep_size < after->m_dep_capacity ;
-
- const bool ok_policy =
- after->m_policy == this && before->m_policy == this ;
-
- if ( ok_state && ok_capacity && ok_policy ) {
-
- ++after->m_dep_size ;
-
- task_root_type::assign( after->m_dep + (after->m_dep_size-1) , before );
-
- memory_fence();
- }
- else {
-
-printf( "CudaTaskPolicyQueue::add_dependence( 0x%lx , 0x%lx ) ERROR %s\n"
- , (unsigned long) after
- , (unsigned long) before
- , ( ! ok_state ? "Task not constructing or executing" :
- ( ! ok_capacity ? "Task Exceeded dependence capacity"
- : "Tasks from different policies" )) );
-
- Kokkos::abort("CudaTaskPolicyQueue::add_dependence ERROR");
- }
- }
-}
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-TaskPolicy< Kokkos::Cuda >::TaskPolicy
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity
- , const unsigned arg_task_team_size
- )
- : m_track()
- , m_policy(0)
-{
-  // Allocate the queue data structure in UVM space
-
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord
- < Kokkos::CudaUVMSpace , Impl::CudaTaskPolicyQueue::Destroy > record_type ;
-
- record_type * record =
- record_type::allocate( Kokkos::CudaUVMSpace()
- , "CudaUVM task queue"
- , sizeof(Impl::CudaTaskPolicyQueue)
- );
-
- m_policy = reinterpret_cast< Impl::CudaTaskPolicyQueue * >( record->data() );
-
- // Tasks are allocated with application's task size + sizeof(task_root_type)
-
- const size_t full_task_size_estimate =
- arg_task_max_size +
- sizeof(task_root_type) +
- sizeof(task_root_type*) * arg_task_default_dependence_capacity ;
-
- new( m_policy )
- Impl::CudaTaskPolicyQueue( arg_task_max_count
- , full_task_size_estimate
- , arg_task_default_dependence_capacity
- , arg_task_team_size );
-
- record->m_destroy.m_policy = m_policy ;
-
- m_track.assign_allocated_record_to_uninitialized( record );
-}
-
-__global__
-static void kokkos_cuda_task_policy_queue_driver
- ( Kokkos::Experimental::Impl::CudaTaskPolicyQueue * queue )
-{
- queue->driver();
-}
-
-void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Cuda > & policy )
-{
- const dim3 grid( Kokkos::Impl::cuda_internal_multiprocessor_count() , 1 , 1 );
- const dim3 block( 1 , policy.m_policy->m_team_size , 1 );
-
- const int shared = 0 ; // Kokkos::Impl::CudaTraits::SharedMemoryUsage / 2 ;
- const cudaStream_t stream = 0 ;
-
-
-#ifdef DETAILED_PRINT
-printf("kokkos_cuda_task_policy_queue_driver grid(%d,%d,%d) block(%d,%d,%d) shared(%d) policy(0x%lx)\n"
- , grid.x , grid.y , grid.z
- , block.x , block.y , block.z
- , shared
- , (unsigned long)( policy.m_policy ) );
-fflush(stdout);
-#endif
-
- CUDA_SAFE_CALL( cudaDeviceSynchronize() );
-
-/*
- CUDA_SAFE_CALL(
- cudaFuncSetCacheConfig( kokkos_cuda_task_policy_queue_driver
- , cudaFuncCachePreferL1 ) );
-
- CUDA_SAFE_CALL( cudaGetLastError() );
-*/
-
- kokkos_cuda_task_policy_queue_driver<<< grid , block , shared , stream >>>
- ( policy.m_policy );
-
- CUDA_SAFE_CALL( cudaGetLastError() );
-
- CUDA_SAFE_CALL( cudaDeviceSynchronize() );
-
-#ifdef DETAILED_PRINT
-printf("kokkos_cuda_task_policy_queue_driver end\n");
-fflush(stdout);
-#endif
-
-}
-
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-typedef TaskMember< Kokkos::Cuda , void , void > Task ;
-
-__host__ __device__
-Task::~TaskMember()
-{
-}
-
-__host__ __device__
-void Task::assign( Task ** const lhs_ptr , Task * rhs )
-{
- Task * const q_denied = reinterpret_cast<Task*>(QDENIED);
-
- // Increment rhs reference count.
- if ( rhs ) { atomic_fetch_add( & rhs->m_ref_count , 1 ); }
-
- if ( 0 == lhs_ptr ) return ;
-
- // Must have exclusive access to *lhs_ptr.
- // Assign the pointer and retrieve the previous value.
- // Cannot use atomic exchange since *lhs_ptr may be
- // in Cuda register space.
-
-#if 0
-
- Task * const old_lhs = *((Task*volatile*)lhs_ptr);
-
- *((Task*volatile*)lhs_ptr) = rhs ;
-
- Kokkos::memory_fence();
-
-#else
-
- Task * const old_lhs = *lhs_ptr ;
-
- *lhs_ptr = rhs ;
-
-#endif
-
- if ( old_lhs && rhs && old_lhs->m_policy != rhs->m_policy ) {
- Kokkos::abort( "Kokkos::Impl::TaskMember<Kokkos::Cuda>::assign ERROR different queues");
- }
-
- if ( old_lhs ) {
-
- Kokkos::memory_fence();
-
- // Decrement former lhs reference count.
- // If reference count is zero task must be complete, then delete task.
- // Task is ready for deletion when wait == q_denied
-
- int const count = atomic_fetch_add( & (old_lhs->m_ref_count) , -1 ) - 1 ;
- int const state = old_lhs->m_state ;
- Task * const wait = *((Task * const volatile *) & old_lhs->m_wait );
-
- const bool ok_count = 0 <= count ;
-
- // If count == 0 then will be deleting
- // and must either be constructing or complete.
- const bool ok_state = 0 < count ? true :
- ( ( state == int(TASK_STATE_CONSTRUCTING) && wait == 0 ) ||
- ( state == int(TASK_STATE_COMPLETE) && wait == q_denied ) )
- &&
- old_lhs->m_next == 0 &&
- old_lhs->m_dep_size == 0 ;
-
- if ( ! ok_count || ! ok_state ) {
-
- printf( "%s Kokkos::Impl::TaskManager<Kokkos::Cuda>::assign ERROR deleting task(0x%lx) m_ref_count(%d) m_state(%d) m_wait(0x%ld)\n"
-#if defined( KOKKOS_ACTIVE_EXECUTION_SPACE_CUDA )
- , "CUDA "
-#else
- , "HOST "
-#endif
- , (unsigned long) old_lhs
- , count
- , state
- , (unsigned long) wait );
- Kokkos::abort( "Kokkos::Impl::TaskMember<Kokkos::Cuda>::assign ERROR deleting");
- }
-
- if ( count == 0 ) {
- // When 'count == 0' this thread has exclusive access to 'old_lhs'
-
-#ifdef DETAILED_PRINT
-printf( "Task::assign(...) old_lhs(0x%lx) deallocate\n"
- , (unsigned long) old_lhs
- );
-#endif
-
- old_lhs->m_policy->deallocate_task( old_lhs );
- }
- }
-}
-
-//----------------------------------------------------------------------------
-
-__device__
-int Task::get_dependence() const
-{
- return m_dep_size ;
-}
-
-__device__
-Task * Task::get_dependence( int i ) const
-{
- Task * const t = ((Task*volatile*)m_dep)[i] ;
-
- if ( Kokkos::Experimental::TASK_STATE_EXECUTING != m_state || i < 0 || m_dep_size <= i || 0 == t ) {
-
-printf( "TaskMember< Cuda >::get_dependence ERROR : task[%lx]{ state(%d) dep_size(%d) dep[%d] = %lx }\n"
- , (unsigned long) this
- , m_state
- , m_dep_size
- , i
- , (unsigned long) t
- );
-
- Kokkos::abort("TaskMember< Cuda >::get_dependence ERROR");
- }
-
- return t ;
-}
-
-//----------------------------------------------------------------------------
-
-__device__ __host__
-void Task::clear_dependence()
-{
- for ( int i = m_dep_size - 1 ; 0 <= i ; --i ) {
- assign( m_dep + i , 0 );
- }
-
- *((volatile int *) & m_dep_size ) = 0 ;
-
- memory_fence();
-}
-
-//----------------------------------------------------------------------------
-
-
-//----------------------------------------------------------------------------
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_TaskPolicy.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_TaskPolicy.hpp
deleted file mode 100644
index e71512f03..000000000
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_TaskPolicy.hpp
+++ /dev/null
@@ -1,833 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-// Experimental unified task-data parallel manycore LDRD
-
-#ifndef KOKKOS_CUDA_TASKPOLICY_HPP
-#define KOKKOS_CUDA_TASKPOLICY_HPP
-
-#include <Kokkos_Core_fwd.hpp>
-#include <Kokkos_Cuda.hpp>
-#include <Kokkos_TaskPolicy.hpp>
-
-#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY )
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-struct CudaTaskPolicyQueue ;
-
-/** \brief Base class for all Kokkos::Cuda tasks */
-template<>
-class TaskMember< Kokkos::Cuda , void , void > {
-public:
-
- template< class > friend class Kokkos::Experimental::TaskPolicy ;
- friend struct CudaTaskPolicyQueue ;
-
- typedef void (* function_single_type) ( TaskMember * );
- typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::CudaTeamMember & );
-
-private:
-
- CudaTaskPolicyQueue * m_policy ;
- TaskMember * volatile * m_queue ;
- function_team_type m_team ; ///< Apply function on CUDA
- function_single_type m_serial ; ///< Apply function on CUDA
- TaskMember ** m_dep ; ///< Dependences
- TaskMember * m_wait ; ///< Linked list of tasks waiting on this task
- TaskMember * m_next ; ///< Linked list of tasks waiting on a different task
- int m_dep_capacity ; ///< Capacity of dependences
- int m_dep_size ; ///< Actual count of dependences
- int m_size_alloc ;
- int m_shmem_size ;
- int m_ref_count ; ///< Reference count
- int m_state ; ///< State of the task
-
-
- TaskMember( TaskMember && ) = delete ;
- TaskMember( const TaskMember & ) = delete ;
- TaskMember & operator = ( TaskMember && ) = delete ;
- TaskMember & operator = ( const TaskMember & ) = delete ;
-
-protected:
-
- KOKKOS_INLINE_FUNCTION
- TaskMember()
- : m_policy(0)
- , m_queue(0)
- , m_team(0)
- , m_serial(0)
- , m_dep(0)
- , m_wait(0)
- , m_next(0)
- , m_size_alloc(0)
- , m_dep_capacity(0)
- , m_dep_size(0)
- , m_shmem_size(0)
- , m_ref_count(0)
- , m_state( TASK_STATE_CONSTRUCTING )
- {}
-
-public:
-
- KOKKOS_FUNCTION
- ~TaskMember();
-
- KOKKOS_INLINE_FUNCTION
- int reference_count() const
- { return *((volatile int *) & m_ref_count ); }
-
- // Cannot use the function pointer to verify the type
- // since the function pointer is not unique between
-  // Host and Cuda. Don't run verification for Cuda.
- // Assume testing on Host-only back-end will catch such errors.
-
- template< typename ResultType >
- KOKKOS_INLINE_FUNCTION static
- TaskMember * verify_type( TaskMember * t ) { return t ; }
-
- //----------------------------------------
-  /* Inheritance Requirements on task types:
- *
- * class DerivedTaskType
- * : public TaskMember< Cuda , DerivedType::value_type , FunctorType >
- * { ... };
- *
- * class TaskMember< Cuda , DerivedType::value_type , FunctorType >
- * : public TaskMember< Cuda , DerivedType::value_type , void >
- * , public Functor
- * { ... };
- *
- * If value_type != void
- * class TaskMember< Cuda , value_type , void >
- * : public TaskMember< Cuda , void , void >
- *
- * Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
- *
- */
- //----------------------------------------
- // If after the 'apply' the task's state is waiting
- // then it will be rescheduled and called again.
- // Otherwise the functor must be destroyed.
-
- template< class DerivedTaskType , class Tag >
- __device__ static
- void apply_single(
- typename std::enable_if
- <( std::is_same< Tag , void >::value &&
- std::is_same< typename DerivedTaskType::result_type , void >::value
- ), TaskMember * >::type t )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- functor_type * const f =
- static_cast< functor_type * >( static_cast< DerivedTaskType * >(t) );
-
- f->apply();
-
- if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
-
- template< class DerivedTaskType , class Tag >
- __device__ static
- void apply_single(
- typename std::enable_if
- <( std::is_same< Tag , void >::value &&
- ! std::is_same< typename DerivedTaskType::result_type , void >::value
- ), TaskMember * >::type t )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- DerivedTaskType * const self = static_cast< DerivedTaskType * >(t);
- functor_type * const f = static_cast< functor_type * >( self );
-
- f->apply( self->m_result );
-
- if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
-
- template< class DerivedTaskType , class Tag >
- __device__
- void set_apply_single()
- {
- m_serial = & TaskMember::template apply_single<DerivedTaskType,Tag> ;
- }
-
- //----------------------------------------
-
- template< class DerivedTaskType , class Tag >
- __device__ static
- void apply_team(
- typename std::enable_if
- <( std::is_same<Tag,void>::value &&
- std::is_same<typename DerivedTaskType::result_type,void>::value
- ), TaskMember * >::type t
- , Kokkos::Impl::CudaTeamMember & member
- )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- functor_type * const f =
- static_cast< functor_type * >( static_cast< DerivedTaskType * >(t) );
-
- f->apply( member );
-
- __syncthreads(); // Wait for team to finish calling function
-
- if ( threadIdx.x == 0 &&
- threadIdx.y == 0 &&
- t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
-
- template< class DerivedTaskType , class Tag >
- __device__ static
- void apply_team(
- typename std::enable_if
- <( std::is_same<Tag,void>::value &&
- ! std::is_same<typename DerivedTaskType::result_type,void>::value
- ), TaskMember * >::type t
- , Kokkos::Impl::CudaTeamMember & member
- )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- DerivedTaskType * const self = static_cast< DerivedTaskType * >(t);
- functor_type * const f = static_cast< functor_type * >( self );
-
- f->apply( member , self->m_result );
-
- __syncthreads(); // Wait for team to finish calling function
-
- if ( threadIdx.x == 0 &&
- threadIdx.y == 0 &&
- t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
-
- template< class DerivedTaskType , class Tag >
- __device__
- void set_apply_team()
- {
- m_team = & TaskMember::template apply_team<DerivedTaskType,Tag> ;
- }
-
- //----------------------------------------
-
- KOKKOS_FUNCTION static
- void assign( TaskMember ** const lhs , TaskMember * const rhs );
-
- __device__
- TaskMember * get_dependence( int i ) const ;
-
- __device__
- int get_dependence() const ;
-
- KOKKOS_FUNCTION void clear_dependence();
-
- __device__
- void latch_add( const int k );
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION static
- void construct_result( TaskMember * const ) {}
-
- typedef FutureValueTypeIsVoidError get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const { return get_result_type() ; }
-
- KOKKOS_INLINE_FUNCTION
- Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
-
-};
-
-/** \brief A Future< Kokkos::Cuda , ResultType > will cast
- * from TaskMember< Kokkos::Cuda , void , void >
- * to TaskMember< Kokkos::Cuda , ResultType , void >
- * to query the result.
- */
-template< class ResultType >
-class TaskMember< Kokkos::Cuda , ResultType , void >
- : public TaskMember< Kokkos::Cuda , void , void >
-{
-public:
-
- typedef ResultType result_type ;
-
- result_type m_result ;
-
- typedef const result_type & get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const { return m_result ; }
-
- KOKKOS_INLINE_FUNCTION static
- void construct_result( TaskMember * const ptr )
- {
- new((void*)(& ptr->m_result)) result_type();
- }
-
- TaskMember() = delete ;
- TaskMember( TaskMember && ) = delete ;
- TaskMember( const TaskMember & ) = delete ;
- TaskMember & operator = ( TaskMember && ) = delete ;
- TaskMember & operator = ( const TaskMember & ) = delete ;
-};
-
-/** \brief Callback functions will cast
- * from TaskMember< Kokkos::Cuda , void , void >
- * to TaskMember< Kokkos::Cuda , ResultType , FunctorType >
- * to execute work functions.
- */
-template< class ResultType , class FunctorType >
-class TaskMember< Kokkos::Cuda , ResultType , FunctorType >
- : public TaskMember< Kokkos::Cuda , ResultType , void >
- , public FunctorType
-{
-public:
- typedef ResultType result_type ;
- typedef FunctorType functor_type ;
-
- KOKKOS_INLINE_FUNCTION static
- void copy_construct( TaskMember * const ptr
- , const functor_type & arg_functor )
- {
- typedef TaskMember< Kokkos::Cuda , ResultType , void > base_type ;
-
- new((void*)static_cast<FunctorType*>(ptr)) functor_type( arg_functor );
-
- base_type::construct_result( static_cast<base_type*>( ptr ) );
- }
-
- TaskMember() = delete ;
- TaskMember( TaskMember && ) = delete ;
- TaskMember( const TaskMember & ) = delete ;
- TaskMember & operator = ( TaskMember && ) = delete ;
- TaskMember & operator = ( const TaskMember & ) = delete ;
-};
-
-//----------------------------------------------------------------------------
-
-namespace {
-
-template< class DerivedTaskType , class Tag >
-__global__
-void cuda_set_apply_single( DerivedTaskType * task )
-{
- typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Cuda , void , void >
- task_root_type ;
-
- task->task_root_type::template set_apply_single< DerivedTaskType , Tag >();
-}
-
-template< class DerivedTaskType , class Tag >
-__global__
-void cuda_set_apply_team( DerivedTaskType * task )
-{
- typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Cuda , void , void >
- task_root_type ;
-
- task->task_root_type::template set_apply_team< DerivedTaskType , Tag >();
-}
-
-} /* namespace */
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-struct CudaTaskPolicyQueue {
-
- enum { NPRIORITY = 3 };
-
- // Must use UVM so that tasks can be created in both
- // Host and Cuda space.
-
- typedef Kokkos::Experimental::MemoryPool< Kokkos::CudaUVMSpace >
- memory_space ;
-
- typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Cuda , void , void >
- task_root_type ;
-
- memory_space m_space ;
- task_root_type * m_team[ NPRIORITY ] ;
- task_root_type * m_serial[ NPRIORITY ];
- int m_team_size ;
- int m_default_dependence_capacity ;
- int volatile m_count_ready ; ///< Ready plus executing tasks
-
- // Execute tasks until all non-waiting tasks are complete
- __device__
- void driver();
-
- __device__ static
- task_root_type * pop_ready_task( task_root_type * volatile * const queue );
-
- // When a task finishes executing.
- __device__
- void complete_executed_task( task_root_type * );
-
- KOKKOS_FUNCTION void schedule_task( task_root_type * const
- , const bool initial_spawn = true );
- KOKKOS_FUNCTION void reschedule_task( task_root_type * const );
- KOKKOS_FUNCTION
- void add_dependence( task_root_type * const after
- , task_root_type * const before );
-
-
- CudaTaskPolicyQueue() = delete ;
- CudaTaskPolicyQueue( CudaTaskPolicyQueue && ) = delete ;
- CudaTaskPolicyQueue( const CudaTaskPolicyQueue & ) = delete ;
- CudaTaskPolicyQueue & operator = ( CudaTaskPolicyQueue && ) = delete ;
- CudaTaskPolicyQueue & operator = ( const CudaTaskPolicyQueue & ) = delete ;
-
-
- ~CudaTaskPolicyQueue();
-
- // Construct only on the Host
- CudaTaskPolicyQueue
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity
- , const unsigned arg_task_team_size
- );
-
- struct Destroy {
- CudaTaskPolicyQueue * m_policy ;
- void destroy_shared_allocation();
- };
-
- //----------------------------------------
- /** \brief Allocate and construct a task.
- *
- * Allocate space for DerivedTaskType followed
- * by TaskMember*[ dependence_capacity ]
- */
- KOKKOS_FUNCTION
- task_root_type *
- allocate_task( const unsigned arg_sizeof_task
- , const unsigned arg_dep_capacity
- , const unsigned arg_team_shmem = 0 );
-
- KOKKOS_FUNCTION void deallocate_task( task_root_type * const );
-};
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-void wait( TaskPolicy< Kokkos::Cuda > & );
-
-template<>
-class TaskPolicy< Kokkos::Cuda >
-{
-public:
-
- typedef Kokkos::Cuda execution_space ;
- typedef TaskPolicy execution_policy ;
- typedef Kokkos::Impl::CudaTeamMember member_type ;
-
-private:
-
- typedef Impl::TaskMember< Kokkos::Cuda , void , void > task_root_type ;
- typedef Kokkos::Experimental::MemoryPool< Kokkos::CudaUVMSpace > memory_space ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
-
- track_type m_track ;
- Impl::CudaTaskPolicyQueue * m_policy ;
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION static
- const task_root_type * get_task_root( const FunctorType * f )
- {
- typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
- return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION static
- task_root_type * get_task_root( FunctorType * f )
- {
- typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
- return static_cast< task_root_type * >( static_cast< task_type * >(f) );
- }
-
-public:
-
- TaskPolicy
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity = 4
- , const unsigned arg_task_team_size = 0 /* choose default */
- );
-
- KOKKOS_FUNCTION TaskPolicy() = default ;
- KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
-
- KOKKOS_FUNCTION
- int allocated_task_count() const { return 0 ; }
-
- //----------------------------------------
- // Create serial-thread task
- // Main process and tasks must use different functions
- // to work around CUDA limitation where __host__ __device__
- // functions are not allowed to invoke templated __global__ functions.
-
- template< class FunctorType >
- Future< typename FunctorType::value_type , execution_space >
- proc_create( const FunctorType & arg_functor
- , const unsigned arg_dep_capacity = ~0u ) const
- {
- typedef typename FunctorType::value_type value_type ;
-
- typedef Impl::TaskMember< execution_space , value_type , FunctorType >
- task_type ;
-
- task_type * const task =
- static_cast<task_type*>(
- m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity ) );
-
- if ( task ) {
- // The root part of the class has been constructed.
- // Must now construct the functor and result specific part.
-
- task_type::copy_construct( task , arg_functor );
-
- // Setting the apply pointer on the device requires code
- // executing on the GPU. This function is called on the
- // host process so a kernel must be run.
-
- // Launching a kernel will cause the allocated task in
- // UVM memory to be copied to the GPU.
- // Synchronize to guarantee non-concurrent access
- // between host and device.
-
- CUDA_SAFE_CALL( cudaDeviceSynchronize() );
-
- Impl::cuda_set_apply_single<task_type,void><<<1,1>>>( task );
-
- CUDA_SAFE_CALL( cudaGetLastError() );
- CUDA_SAFE_CALL( cudaDeviceSynchronize() );
- }
-
- return Future< value_type , execution_space >( task );
- }
-
- template< class FunctorType >
- __device__
- Future< typename FunctorType::value_type , execution_space >
- task_create( const FunctorType & arg_functor
- , const unsigned arg_dep_capacity = ~0u ) const
- {
- typedef typename FunctorType::value_type value_type ;
-
- typedef Impl::TaskMember< execution_space , value_type , FunctorType >
- task_type ;
-
- task_type * const task =
- static_cast<task_type*>(
- m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity ) );
-
- if ( task ) {
- // The root part of the class has been constructed.
- // Must now construct the functor and result specific part.
-
- task_type::copy_construct( task , arg_functor );
-
- // Setting the apply pointer on the device requires code
- // executing on the GPU. If this function is called on the
- // Host then a kernel must be run.
-
- task->task_root_type::template set_apply_single< task_type , void >();
- }
-
- return Future< value_type , execution_space >( task );
- }
-
- //----------------------------------------
- // Create thread-team task
- // Main process and tasks must use different functions
- // to work around CUDA limitation where __host__ __device__
- // functions are not allowed to invoke templated __global__ functions.
-
- template< class FunctorType >
- Future< typename FunctorType::value_type , execution_space >
- proc_create_team( const FunctorType & arg_functor
- , const unsigned arg_dep_capacity = ~0u ) const
- {
- typedef typename FunctorType::value_type value_type ;
-
- typedef Impl::TaskMember< execution_space , value_type , FunctorType >
- task_type ;
-
- const unsigned team_shmem_size =
- Kokkos::Impl::FunctorTeamShmemSize< FunctorType >::value
- ( arg_functor , m_policy->m_team_size );
-
- task_type * const task =
- static_cast<task_type*>(
- m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity , team_shmem_size ) );
-
- if ( task ) {
- // The root part of the class has been constructed.
- // Must now construct the functor and result specific part.
-
- task_type::copy_construct( task , arg_functor );
-
- // Setting the apply pointer on the device requires code
- // executing on the GPU. This function is called on the
- // host process so a kernel must be run.
-
- // Launching a kernel will cause the allocated task in
- // UVM memory to be copied to the GPU.
- // Synchronize to guarantee non-concurrent access
- // between host and device.
-
- CUDA_SAFE_CALL( cudaDeviceSynchronize() );
-
- Impl::cuda_set_apply_team<task_type,void><<<1,1>>>( task );
-
- CUDA_SAFE_CALL( cudaGetLastError() );
- CUDA_SAFE_CALL( cudaDeviceSynchronize() );
- }
-
- return Future< value_type , execution_space >( task );
- }
-
- template< class FunctorType >
- __device__
- Future< typename FunctorType::value_type , execution_space >
- task_create_team( const FunctorType & arg_functor
- , const unsigned arg_dep_capacity = ~0u ) const
- {
- typedef typename FunctorType::value_type value_type ;
-
- typedef Impl::TaskMember< execution_space , value_type , FunctorType >
- task_type ;
-
- const unsigned team_shmem_size =
- Kokkos::Impl::FunctorTeamShmemSize< FunctorType >::value
- ( arg_functor , m_policy->m_team_size );
-
- task_type * const task =
- static_cast<task_type*>(
- m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity , team_shmem_size ) );
-
- if ( task ) {
- // The root part of the class has been constructed.
- // Must now construct the functor and result specific part.
-
- task_type::copy_construct( task , arg_functor );
-
- // Setting the apply pointer on the device requires code
- // executing on the GPU. If this function is called on the
- // Host then a kernel must be run.
-
- task->task_root_type::template set_apply_team< task_type , void >();
- }
-
- return Future< value_type , execution_space >( task );
- }
-
- //----------------------------------------
-
- Future< Latch , execution_space >
- KOKKOS_INLINE_FUNCTION
- create_latch( const int N ) const
- {
- task_root_type * const task =
- m_policy->allocate_task( sizeof(task_root_type) , 0 , 0 );
- task->m_dep_size = N ; // Using m_dep_size for latch counter
- task->m_state = TASK_STATE_WAITING ;
- return Future< Latch , execution_space >( task );
- }
-
- //----------------------------------------
-
- template< class A1 , class A2 , class A3 , class A4 >
- KOKKOS_INLINE_FUNCTION
- void add_dependence( const Future<A1,A2> & after
- , const Future<A3,A4> & before
- , typename std::enable_if
- < std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
- &&
- std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
- >::type * = 0
- ) const
- { m_policy->add_dependence( after.m_task , before.m_task ); }
-
- template< class FunctorType , class A3 , class A4 >
- KOKKOS_INLINE_FUNCTION
- void add_dependence( FunctorType * task_functor
- , const Future<A3,A4> & before
- , typename std::enable_if
- < std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
- >::type * = 0
- ) const
- { m_policy->add_dependence( get_task_root(task_functor) , before.m_task ); }
-
-
- template< class ValueType >
- KOKKOS_INLINE_FUNCTION
- const Future< ValueType , execution_space > &
- spawn( const Future< ValueType , execution_space > & f
- , const bool priority = false ) const
- {
- if ( f.m_task ) {
- f.m_task->m_queue =
- ( f.m_task->m_team != 0
- ? & ( m_policy->m_team[ priority ? 0 : 1 ] )
- : & ( m_policy->m_serial[ priority ? 0 : 1 ] ) );
- m_policy->schedule_task( f.m_task );
- }
- return f ;
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void respawn( FunctorType * task_functor
- , const bool priority = false ) const
- {
- task_root_type * const t = get_task_root(task_functor);
- t->m_queue =
- ( t->m_team != 0 ? & ( m_policy->m_team[ priority ? 0 : 1 ] )
- : & ( m_policy->m_serial[ priority ? 0 : 1 ] ) );
- m_policy->reschedule_task( t );
- }
-
- // When a create method fails by returning a null Future
- // the task that called the create method may respawn
- // with a dependence on memory becoming available.
- // This is a race as more than one task may be respawned
- // with this need.
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void respawn_needing_memory( FunctorType * task_functor ) const
- {
- task_root_type * const t = get_task_root(task_functor);
- t->m_queue =
- ( t->m_team != 0 ? & ( m_policy->m_team[ 2 ] )
- : & ( m_policy->m_serial[ 2 ] ) );
- m_policy->reschedule_task( t );
- }
-
- //----------------------------------------
- // Functions for an executing task functor to query dependences,
- // set new dependences, and respawn itself.
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< void , execution_space >
- get_dependence( const FunctorType * task_functor , int i ) const
- {
- return Future<void,execution_space>(
- get_task_root(task_functor)->get_dependence(i)
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- int get_dependence( const FunctorType * task_functor ) const
- { return get_task_root(task_functor)->get_dependence(); }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void clear_dependence( FunctorType * task_functor ) const
- { get_task_root(task_functor)->clear_dependence(); }
-
- //----------------------------------------
-
- __device__
- static member_type member_single()
- {
- return
- member_type( 0 /* shared memory pointer */
- , 0 /* shared memory begin offset */
- , 0 /* shared memory end offset */
- , 0 /* scratch level_1 pointer */
- , 0 /* scratch level_1 size */
- , 0 /* league rank */
- , 1 /* league size */ );
- }
-
- friend void wait( TaskPolicy< Kokkos::Cuda > & );
-};
-
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
-#endif /* #ifndef KOKKOS_CUDA_TASKPOLICY_HPP */
-
-
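
The removed TaskPolicy code above works around a CUDA restriction worth spelling out: a __host__ __device__ member function may not launch a templated __global__ kernel, so task creation is split into proc_create* (host side, which launches a one-thread kernel to record the device-side apply pointer and synchronizes around it because the task object lives in UVM) and task_create* (device side, which assigns the pointer directly). The following standalone sketch uses hypothetical names (Task, apply_functor, set_apply, *_sketch) and is not the removed Kokkos code itself; it only shows the shape of the workaround.

#include <cuda_runtime.h>

struct Task {
  void (*apply)(Task *);   // device-side entry point filled in at creation
};

template< class FunctorType >
__device__ void apply_functor( Task * t ) { /* run the FunctorType stored with t */ }

// The host cannot form a usable pointer to a __device__ function,
// hence this one-thread helper kernel.
template< class FunctorType >
__global__ void set_apply( Task * t ) { t->apply = & apply_functor<FunctorType> ; }

template< class FunctorType >
void proc_create_sketch( Task * t )    // called from the host
{
  cudaDeviceSynchronize();              // task lives in UVM: avoid concurrent access
  set_apply<FunctorType><<<1,1>>>( t ); // record the device function pointer
  cudaDeviceSynchronize();
}

template< class FunctorType >
__device__ void task_create_sketch( Task * t )  // called from within a running task
{
  t->apply = & apply_functor<FunctorType> ;     // already on the device: assign directly
}
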
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_View.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_View.hpp
index 92f6fc1f5..b505b766a 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_View.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_View.hpp
@@ -1,93 +1,306 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_CUDA_VIEW_HPP
-#define KOKKOS_CUDA_VIEW_HPP
-
-#include <Kokkos_Macros.hpp>
+#ifndef KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
+#define KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
/* only compile this file if CUDA is enabled for Kokkos */
-#ifdef KOKKOS_HAVE_CUDA
+#if defined( KOKKOS_HAVE_CUDA )
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
-#include <cstring>
+namespace Kokkos {
+namespace Experimental {
+namespace Impl {
-#include <Kokkos_HostSpace.hpp>
-#include <Kokkos_CudaSpace.hpp>
-#include <impl/Kokkos_Shape.hpp>
-#include <Kokkos_View.hpp>
+// Cuda Texture fetches can be performed for 4, 8 and 16 byte objects (int,int2,int4)
+// Via reinterpret_cast this can be used to support all scalar types of those sizes.
+// Any other scalar type falls back to either normal reads out of global memory
+// or to the __ldg intrinsic on Kepler GPUs or newer (Compute Capability >= 3.0).
+
+template< typename ValueType , typename AliasType >
+struct CudaTextureFetch {
+
+ ::cudaTextureObject_t m_obj ;
+ const ValueType * m_ptr ;
+ int m_offset ;
+
+ // Dereference operator pulls through the texture object and returns by value
+ template< typename iType >
+ KOKKOS_INLINE_FUNCTION
+ ValueType operator[]( const iType & i ) const
+ {
+#if defined( __CUDA_ARCH__ ) && ( 300 <= __CUDA_ARCH__ )
+ AliasType v = tex1Dfetch<AliasType>( m_obj , i + m_offset );
+ return *(reinterpret_cast<ValueType*> (&v));
+#else
+ return m_ptr[ i ];
+#endif
+ }
+
+ // Pointer to referenced memory
+ KOKKOS_INLINE_FUNCTION
+ operator const ValueType * () const { return m_ptr ; }
+
+
+ KOKKOS_INLINE_FUNCTION
+ CudaTextureFetch() : m_obj() , m_ptr() , m_offset() {}
+
+ KOKKOS_INLINE_FUNCTION
+ ~CudaTextureFetch() {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaTextureFetch( const CudaTextureFetch & rhs )
+ : m_obj( rhs.m_obj )
+ , m_ptr( rhs.m_ptr )
+ , m_offset( rhs.m_offset )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaTextureFetch( CudaTextureFetch && rhs )
+ : m_obj( rhs.m_obj )
+ , m_ptr( rhs.m_ptr )
+ , m_offset( rhs.m_offset )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaTextureFetch & operator = ( const CudaTextureFetch & rhs )
+ {
+ m_obj = rhs.m_obj ;
+ m_ptr = rhs.m_ptr ;
+ m_offset = rhs.m_offset ;
+ return *this ;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ CudaTextureFetch & operator = ( CudaTextureFetch && rhs )
+ {
+ m_obj = rhs.m_obj ;
+ m_ptr = rhs.m_ptr ;
+ m_offset = rhs.m_offset ;
+ return *this ;
+ }
+
+ // Texture object spans the entire allocation.
+ // This handle may view a subset of the allocation, so an offset is required.
+ template< class CudaMemorySpace >
+ inline explicit
+ CudaTextureFetch( const ValueType * const arg_ptr
+ , Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > & record
+ )
+ : m_obj( record.template attach_texture_object< AliasType >() )
+ , m_ptr( arg_ptr )
+ , m_offset( record.attach_texture_object_offset( reinterpret_cast<const AliasType*>( arg_ptr ) ) )
+ {}
+
+ // Texture object spans the entire allocation.
+ // This handle may view a subset of the allocation, so an offset is required.
+ KOKKOS_INLINE_FUNCTION
+ CudaTextureFetch( const CudaTextureFetch & rhs , size_t offset )
+ : m_obj( rhs.m_obj )
+ , m_ptr( rhs.m_ptr + offset)
+ , m_offset( offset + rhs.m_offset )
+ {}
+};
+
+#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
+
+template< typename ValueType , typename AliasType >
+struct CudaLDGFetch {
+
+ const ValueType * m_ptr ;
+
+ template< typename iType >
+ KOKKOS_INLINE_FUNCTION
+ ValueType operator[]( const iType & i ) const
+ {
+ #ifdef __CUDA_ARCH__
+ AliasType v = __ldg(reinterpret_cast<const AliasType*>(&m_ptr[i]));
+ return *(reinterpret_cast<ValueType*> (&v));
+ #else
+ return m_ptr[i];
+ #endif
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ operator const ValueType * () const { return m_ptr ; }
+
+ KOKKOS_INLINE_FUNCTION
+ CudaLDGFetch() : m_ptr() {}
+
+ KOKKOS_INLINE_FUNCTION
+ ~CudaLDGFetch() {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaLDGFetch( const CudaLDGFetch & rhs )
+ : m_ptr( rhs.m_ptr )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaLDGFetch( CudaLDGFetch && rhs )
+ : m_ptr( rhs.m_ptr )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaLDGFetch & operator = ( const CudaLDGFetch & rhs )
+ {
+ m_ptr = rhs.m_ptr ;
+ return *this ;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ CudaLDGFetch & operator = ( CudaLDGFetch && rhs )
+ {
+ m_ptr = rhs.m_ptr ;
+ return *this ;
+ }
+
+ template< class CudaMemorySpace >
+ inline explicit
+ CudaLDGFetch( const ValueType * const arg_ptr
+ , Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > const &
+ )
+ : m_ptr( arg_ptr )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ CudaLDGFetch( CudaLDGFetch const rhs ,size_t offset)
+ : m_ptr( rhs.m_ptr + offset )
+ {}
+
+};
+
+#endif
+
+} // namespace Impl
+} // namespace Experimental
+} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+namespace Experimental {
namespace Impl {
-template<>
-struct AssertShapeBoundsAbort< CudaSpace >
+/** \brief Replace Default ViewDataHandle with Cuda texture fetch specialization
+ * if 'const' value type, CudaSpace and random access.
+ */
+template< class Traits >
+class ViewDataHandle< Traits ,
+ typename std::enable_if<(
+ // Is Cuda memory space
+ ( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
+ std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value )
+ &&
+ // Is a trivial const value of 4, 8, or 16 bytes
+ std::is_trivial<typename Traits::const_value_type>::value
+ &&
+ std::is_same<typename Traits::const_value_type,typename Traits::value_type>::value
+ &&
+ ( sizeof(typename Traits::const_value_type) == 4 ||
+ sizeof(typename Traits::const_value_type) == 8 ||
+ sizeof(typename Traits::const_value_type) == 16 )
+ &&
+ // Random access trait
+ ( Traits::memory_traits::RandomAccess != 0 )
+ )>::type >
{
+public:
+
+ using track_type = Kokkos::Experimental::Impl::SharedAllocationTracker ;
+
+ using value_type = typename Traits::const_value_type ;
+ using return_type = typename Traits::const_value_type ; // NOT a reference
+
+ using alias_type = typename std::conditional< ( sizeof(value_type) == 4 ) , int ,
+ typename std::conditional< ( sizeof(value_type) == 8 ) , ::int2 ,
+ typename std::conditional< ( sizeof(value_type) == 16 ) , ::int4 , void
+ >::type
+ >::type
+ >::type ;
+
+#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
+ using handle_type = Kokkos::Experimental::Impl::CudaLDGFetch< value_type , alias_type > ;
+#else
+ using handle_type = Kokkos::Experimental::Impl::CudaTextureFetch< value_type , alias_type > ;
+#endif
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type const & assign( handle_type const & arg_handle , track_type const & /* arg_tracker */ )
+ {
+ return arg_handle ;
+ }
+
KOKKOS_INLINE_FUNCTION
- static void apply( const size_t /* rank */ ,
- const size_t /* n0 */ , const size_t /* n1 */ ,
- const size_t /* n2 */ , const size_t /* n3 */ ,
- const size_t /* n4 */ , const size_t /* n5 */ ,
- const size_t /* n6 */ , const size_t /* n7 */ ,
-
- const size_t /* arg_rank */ ,
- const size_t /* i0 */ , const size_t /* i1 */ ,
- const size_t /* i2 */ , const size_t /* i3 */ ,
- const size_t /* i4 */ , const size_t /* i5 */ ,
- const size_t /* i6 */ , const size_t /* i7 */ )
+ static handle_type const assign( handle_type const & arg_handle , size_t offset )
{
- Kokkos::abort("Kokkos::View array bounds violation");
+ return handle_type(arg_handle,offset) ;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( value_type * arg_data_ptr, track_type const & arg_tracker )
+ {
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ // Assignment of texture = non-texture requires creation of a texture object
+ // which can only occur on the host. In addition, 'get_record' is only valid
+ // if called in a host execution space
+ return handle_type( arg_data_ptr , arg_tracker.template get_record< typename Traits::memory_space >() );
+#else
+ Kokkos::Impl::cuda_abort("Cannot create Cuda texture object from within a Cuda kernel");
+ return handle_type();
+#endif
}
};
+}
}
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif // KOKKOS_HAVE_CUDA
+#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_VIEW_HPP */
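
The handle types added above read each element through an alias type of matching size (int, int2, or int4) and reinterpret the bytes back to the scalar type, either via a texture object or via __ldg. As a standalone, hedged sketch (hypothetical function, not part of this patch), the per-element trick for an 8-byte double looks like the following; the Kokkos handles do the same thing behind the View's element access.

#include <cuda_runtime.h>

__device__ double ldg_double( const double * ptr , int i )
{
#if defined( __CUDA_ARCH__ ) && ( 350 <= __CUDA_ARCH__ )
  // Fetch through the 8-byte alias type using the read-only data cache,
  // then reinterpret the bytes as the original scalar type.
  int2 v = __ldg( reinterpret_cast<const int2 *>( ptr + i ) );
  return *reinterpret_cast<double *>( &v );
#else
  return ptr[ i ];   // plain global-memory load otherwise
#endif
}

A View only picks up one of these handles when the value type is const and trivial, is 4, 8, or 16 bytes wide, and the RandomAccess memory trait is requested, e.g. (usual Kokkos 2.x spelling, where x is an existing View<double*,Kokkos::CudaSpace>):

Kokkos::View< const double * , Kokkos::CudaSpace ,
              Kokkos::MemoryTraits< Kokkos::RandomAccess > > xr = x ;
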
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_abort.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_abort.hpp
index deb955ccd..60903b757 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_abort.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_abort.hpp
@@ -1,119 +1,87 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_ABORT_HPP
#define KOKKOS_CUDA_ABORT_HPP
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#include "Kokkos_Macros.hpp"
-#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined( KOKKOS_HAVE_CUDA )
+#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
#include <cuda.h>
-#if ! defined( CUDA_VERSION ) || ( CUDA_VERSION < 4010 )
-#error "Cuda version 4.1 or greater required"
-#endif
-
-#if ( __CUDA_ARCH__ < 200 )
-#error "Cuda device capability 2.0 or greater required"
-#endif
-
extern "C" {
/* Cuda runtime function, declared in <crt/device_runtime.h>
* Requires capability 2.x or better.
*/
extern __device__ void __assertfail(
const void *message,
const void *file,
unsigned int line,
const void *function,
size_t charsize);
}
namespace Kokkos {
namespace Impl {
__device__ inline
void cuda_abort( const char * const message )
{
#ifndef __APPLE__
const char empty[] = "" ;
__assertfail( (const void *) message ,
(const void *) empty ,
(unsigned int) 0 ,
(const void *) empty ,
sizeof(char) );
#endif
}
} // namespace Impl
} // namespace Kokkos
-
-#else
-
-namespace Kokkos {
-namespace Impl {
-KOKKOS_INLINE_FUNCTION
-void cuda_abort( const char * const ) {}
-}
-}
-
-#endif /* #if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
-namespace Kokkos {
-__device__ inline
-void abort( const char * const message ) { Kokkos::Impl::cuda_abort(message); }
-}
-#endif /* defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
+#endif /* #if defined(__CUDACC__) && defined( KOKKOS_HAVE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */
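
For context, a hedged usage sketch of the device-side abort kept above: inside a kernel the message is routed through the CUDA runtime's __assertfail, which terminates the kernel much like a failed device-side assert. The kernel below is hypothetical and only illustrates the call; user code would normally reach it through Kokkos::abort rather than the Impl function (assumes a CUDA build).

#include <Kokkos_Core.hpp>

__global__ void check_positive( const double * x , int n )
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x ;
  if ( i < n && x[i] <= 0.0 ) {
    Kokkos::Impl::cuda_abort( "check_positive: non-positive entry" );
  }
}
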
diff --git a/lib/kokkos/core/src/Kokkos_Atomic.hpp b/lib/kokkos/core/src/Kokkos_Atomic.hpp
index 6d37d69a6..3102402b8 100644
--- a/lib/kokkos/core/src/Kokkos_Atomic.hpp
+++ b/lib/kokkos/core/src/Kokkos_Atomic.hpp
@@ -1,305 +1,312 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Atomic.hpp
/// \brief Atomic functions
///
/// This header file defines prototypes for the following atomic functions:
/// - exchange
/// - compare and exchange
/// - add
///
/// Supported types include:
/// - signed and unsigned 4 and 8 byte integers
/// - float
/// - double
///
/// They are implemented through GCC compatible intrinsics, OpenMP
/// directives and native CUDA intrinsics.
///
/// Including this header file requires one of the following
/// compilers:
/// - NVCC (for CUDA device code only)
/// - GCC (for host code only)
/// - Intel (for host code only)
/// - A compiler that supports OpenMP 3.1 (for host code only)
#ifndef KOKKOS_ATOMIC_HPP
#define KOKKOS_ATOMIC_HPP
#include <Kokkos_Macros.hpp>
#include <Kokkos_HostSpace.hpp>
#include <impl/Kokkos_Traits.hpp>
//----------------------------------------------------------------------------
#if defined(_WIN32)
#define KOKKOS_ATOMICS_USE_WINDOWS
#else
-#if defined( __CUDA_ARCH__ ) && defined( KOKKOS_HAVE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
// Compiling NVIDIA device code, must use Cuda atomics:
#define KOKKOS_ATOMICS_USE_CUDA
+#endif
-#elif ! defined( KOKKOS_ATOMICS_USE_GCC ) && \
- ! defined( KOKKOS_ATOMICS_USE_INTEL ) && \
- ! defined( KOKKOS_ATOMICS_USE_OMP31 )
+#if ! defined( KOKKOS_ATOMICS_USE_GCC ) && \
+ ! defined( KOKKOS_ATOMICS_USE_INTEL ) && \
+ ! defined( KOKKOS_ATOMICS_USE_OMP31 )
// The atomic implementation for non-Cuda compilation has not been pre-selected.
// Choose the best implementation for the detected compiler.
// Preference: GCC, INTEL, OMP31
#if defined( KOKKOS_COMPILER_GNU ) || \
defined( KOKKOS_COMPILER_CLANG ) || \
- ( defined ( KOKKOS_COMPILER_NVCC ) && defined ( __GNUC__ ) )
+ ( defined ( KOKKOS_COMPILER_NVCC ) )
#define KOKKOS_ATOMICS_USE_GCC
#elif defined( KOKKOS_COMPILER_INTEL ) || \
defined( KOKKOS_COMPILER_CRAYC )
#define KOKKOS_ATOMICS_USE_INTEL
#elif defined( _OPENMP ) && ( 201107 <= _OPENMP )
#define KOKKOS_ATOMICS_USE_OMP31
#else
#error "KOKKOS_ATOMICS_USE : Unsupported compiler"
#endif
#endif /* Not pre-selected atomic implementation */
#endif
//----------------------------------------------------------------------------
// Forward declaration of functions supporting arbitrary sized atomics
// This is necessary since Kokkos_Atomic.hpp is internally included very early
// through Kokkos_HostSpace.hpp as well as the allocation tracker.
#ifdef KOKKOS_HAVE_CUDA
namespace Kokkos {
namespace Impl {
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
+#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
+extern
+#endif
__device__ inline
bool lock_address_cuda_space(void* ptr);
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
+#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
+extern
+#endif
__device__ inline
void unlock_address_cuda_space(void* ptr);
}
}
#endif
namespace Kokkos {
template <typename T>
KOKKOS_INLINE_FUNCTION
void atomic_add(volatile T * const dest, const T src);
// Atomic increment
template<typename T>
KOKKOS_INLINE_FUNCTION
void atomic_increment(volatile T* a);
template<typename T>
KOKKOS_INLINE_FUNCTION
void atomic_decrement(volatile T* a);
}
namespace Kokkos {
inline
const char * atomic_query_version()
{
#if defined( KOKKOS_ATOMICS_USE_CUDA )
return "KOKKOS_ATOMICS_USE_CUDA" ;
#elif defined( KOKKOS_ATOMICS_USE_GCC )
return "KOKKOS_ATOMICS_USE_GCC" ;
#elif defined( KOKKOS_ATOMICS_USE_INTEL )
return "KOKKOS_ATOMICS_USE_INTEL" ;
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
return "KOKKOS_ATOMICS_USE_OMP31" ;
#elif defined( KOKKOS_ATOMICS_USE_WINDOWS )
return "KOKKOS_ATOMICS_USE_WINDOWS";
#endif
}
} // namespace Kokkos
#ifdef _WIN32
#include "impl/Kokkos_Atomic_Windows.hpp"
#else
//----------------------------------------------------------------------------
// Atomic Assembly
//
// Implements CAS128-bit in assembly
#include "impl/Kokkos_Atomic_Assembly.hpp"
//----------------------------------------------------------------------------
// Atomic exchange
//
// template< typename T >
// T atomic_exchange( volatile T* const dest , const T val )
// { T tmp = *dest ; *dest = val ; return tmp ; }
#include "impl/Kokkos_Atomic_Exchange.hpp"
//----------------------------------------------------------------------------
// Atomic compare-and-exchange
//
// template<class T>
// bool atomic_compare_exchange_strong(volatile T* const dest, const T compare, const T val)
// { bool equal = compare == *dest ; if ( equal ) { *dest = val ; } return equal ; }
#include "impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp"
//----------------------------------------------------------------------------
// Atomic fetch and add
//
// template<class T>
// T atomic_fetch_add(volatile T* const dest, const T val)
// { T tmp = *dest ; *dest += val ; return tmp ; }
#include "impl/Kokkos_Atomic_Fetch_Add.hpp"
//----------------------------------------------------------------------------
// Atomic increment
//
// template<class T>
// T atomic_increment(volatile T* const dest)
// { dest++; }
#include "impl/Kokkos_Atomic_Increment.hpp"
//----------------------------------------------------------------------------
// Atomic Decrement
//
// template<class T>
// T atomic_decrement(volatile T* const dest)
// { dest--; }
#include "impl/Kokkos_Atomic_Decrement.hpp"
//----------------------------------------------------------------------------
// Atomic fetch and sub
//
// template<class T>
// T atomic_fetch_sub(volatile T* const dest, const T val)
// { T tmp = *dest ; *dest -= val ; return tmp ; }
#include "impl/Kokkos_Atomic_Fetch_Sub.hpp"
//----------------------------------------------------------------------------
// Atomic fetch and or
//
// template<class T>
// T atomic_fetch_or(volatile T* const dest, const T val)
// { T tmp = *dest ; *dest = tmp | val ; return tmp ; }
#include "impl/Kokkos_Atomic_Fetch_Or.hpp"
//----------------------------------------------------------------------------
// Atomic fetch and and
//
// template<class T>
// T atomic_fetch_and(volatile T* const dest, const T val)
// { T tmp = *dest ; *dest = tmp & val ; return tmp ; }
#include "impl/Kokkos_Atomic_Fetch_And.hpp"
#endif /*Not _WIN32*/
//----------------------------------------------------------------------------
// Memory fence
//
// All loads and stores from this thread will be globally consistent before continuing
//
// void memory_fence() {...};
#include "impl/Kokkos_Memory_Fence.hpp"
//----------------------------------------------------------------------------
// Provide volatile_load and safe_load
//
// T volatile_load(T const volatile * const ptr);
//
// T const& safe_load(T const * const ptr);
// XEON PHI
// T safe_load(T const * const ptr
#include "impl/Kokkos_Volatile_Load.hpp"
#ifndef _WIN32
#include "impl/Kokkos_Atomic_Generic.hpp"
#endif
//----------------------------------------------------------------------------
// This atomic-style macro should be an inlined function, not a macro
-#if defined( KOKKOS_COMPILER_GNU ) && !defined(__PGIC__)
+#if defined( KOKKOS_COMPILER_GNU ) && !defined(__PGIC__) && !defined(__CUDA_ARCH__)
#define KOKKOS_NONTEMPORAL_PREFETCH_LOAD(addr) __builtin_prefetch(addr,0,0)
#define KOKKOS_NONTEMPORAL_PREFETCH_STORE(addr) __builtin_prefetch(addr,1,0)
#else
#define KOKKOS_NONTEMPORAL_PREFETCH_LOAD(addr) ((void)0)
#define KOKKOS_NONTEMPORAL_PREFETCH_STORE(addr) ((void)0)
#endif
//----------------------------------------------------------------------------
#endif /* KOKKOS_ATOMIC_HPP */
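
A hedged usage sketch of the atomics declared above (Kokkos 2.x spellings, assuming a build with lambda support so KOKKOS_LAMBDA is available): a histogram fill with atomic_add, plus a "store the minimum" loop built from atomic_compare_exchange_strong, whose semantics match the pseudo-code comments in this header. The view names and the modulo hashing are illustrative only.

#include <Kokkos_Core.hpp>

void histogram_and_min( Kokkos::View<int *> hist ,
                        Kokkos::View<int *> current_min ,
                        Kokkos::View<const int *> samples )
{
  Kokkos::parallel_for( samples.dimension_0() , KOKKOS_LAMBDA( const int i ) {
    // Concurrent increments of the same bin are safe.
    Kokkos::atomic_add( & hist( samples(i) % hist.dimension_0() ) , 1 );

    // Classic compare-and-swap loop: retry until our smaller value lands
    // or another thread has already stored something at least as small.
    int old = current_min(0);
    while ( samples(i) < old &&
            ! Kokkos::atomic_compare_exchange_strong( & current_min(0) , old , samples(i) ) ) {
      old = current_min(0);
    }
  });
}
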
diff --git a/lib/kokkos/core/src/Kokkos_Concepts.hpp b/lib/kokkos/core/src/Kokkos_Concepts.hpp
index 82a342eec..af83e5cac 100644
--- a/lib/kokkos/core/src/Kokkos_Concepts.hpp
+++ b/lib/kokkos/core/src/Kokkos_Concepts.hpp
@@ -1,78 +1,342 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_CONCEPTS_HPP
#define KOKKOS_CORE_CONCEPTS_HPP
#include <type_traits>
+// Needed for 'is_space<S>::host_mirror_space
+#include <Kokkos_Core_fwd.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
namespace Kokkos {
+
//Schedules for Execution Policies
struct Static {};
struct Dynamic {};
//Schedule Wrapper Type
template<class T>
struct Schedule
{
static_assert( std::is_same<T,Static>::value
|| std::is_same<T,Dynamic>::value
, "Kokkos: Invalid Schedule<> type."
);
- using schedule_type = Schedule<T>;
+ using schedule_type = Schedule ;
using type = T;
};
//Specify Iteration Index Type
template<typename T>
struct IndexType
{
static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
- using index_type = IndexType<T>;
+ using index_type = IndexType ;
using type = T;
};
} // namespace Kokkos
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+#define KOKKOS_IMPL_IS_CONCEPT( CONCEPT ) \
+ template< typename T > struct is_ ## CONCEPT { \
+ private: \
+ template< typename , typename = std::true_type > struct have : std::false_type {}; \
+ template< typename U > struct have<U,typename std::is_same<U,typename U:: CONCEPT >::type> : std::true_type {}; \
+ public: \
+ enum { value = is_ ## CONCEPT::template have<T>::value }; \
+ };
+
+// Public concept:
+
+KOKKOS_IMPL_IS_CONCEPT( memory_space )
+KOKKOS_IMPL_IS_CONCEPT( memory_traits )
+KOKKOS_IMPL_IS_CONCEPT( execution_space )
+KOKKOS_IMPL_IS_CONCEPT( execution_policy )
+KOKKOS_IMPL_IS_CONCEPT( array_layout )
+
+namespace Impl {
+
+// For backward compatibility:
+
+using Kokkos::is_memory_space ;
+using Kokkos::is_memory_traits ;
+using Kokkos::is_execution_space ;
+using Kokkos::is_execution_policy ;
+using Kokkos::is_array_layout ;
+
+// Implementation concept:
+
+KOKKOS_IMPL_IS_CONCEPT( iteration_pattern )
+KOKKOS_IMPL_IS_CONCEPT( schedule_type )
+KOKKOS_IMPL_IS_CONCEPT( index_type )
+
+}
+
+#undef KOKKOS_IMPL_IS_CONCEPT
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+template< class ExecutionSpace , class MemorySpace >
+struct Device {
+ static_assert( Kokkos::is_execution_space<ExecutionSpace>::value
+ , "Execution space is not valid" );
+ static_assert( Kokkos::is_memory_space<MemorySpace>::value
+ , "Memory space is not valid" );
+ typedef ExecutionSpace execution_space;
+ typedef MemorySpace memory_space;
+ typedef Device<execution_space,memory_space> device_type;
+};
+
+
+template< typename T >
+struct is_space {
+private:
+
+ template< typename , typename = void >
+ struct exe : std::false_type { typedef void space ; };
+
+ template< typename , typename = void >
+ struct mem : std::false_type { typedef void space ; };
+
+ template< typename , typename = void >
+ struct dev : std::false_type { typedef void space ; };
+
+ template< typename U >
+ struct exe<U,typename std::conditional<true,void,typename U::execution_space>::type>
+ : std::is_same<U,typename U::execution_space>::type
+ { typedef typename U::execution_space space ; };
+
+ template< typename U >
+ struct mem<U,typename std::conditional<true,void,typename U::memory_space>::type>
+ : std::is_same<U,typename U::memory_space>::type
+ { typedef typename U::memory_space space ; };
+
+ template< typename U >
+ struct dev<U,typename std::conditional<true,void,typename U::device_type>::type>
+ : std::is_same<U,typename U::device_type>::type
+ { typedef typename U::device_type space ; };
+
+ typedef typename is_space::template exe<T> is_exe ;
+ typedef typename is_space::template mem<T> is_mem ;
+ typedef typename is_space::template dev<T> is_dev ;
+
+public:
+
+ enum { value = is_exe::value || is_mem::value || is_dev::value };
+
+ typedef typename is_exe::space execution_space ;
+ typedef typename is_mem::space memory_space ;
+
+ // For backward compatibility, deprecated in favor of
+ // Kokkos::Impl::HostMirror<S>::host_mirror_space
+
+ typedef typename std::conditional
+ < std::is_same< memory_space , Kokkos::HostSpace >::value
+#if defined( KOKKOS_HAVE_CUDA )
+ || std::is_same< memory_space , Kokkos::CudaUVMSpace >::value
+ || std::is_same< memory_space , Kokkos::CudaHostPinnedSpace >::value
+#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
+ , memory_space
+ , Kokkos::HostSpace
+ >::type host_memory_space ;
+
+#if defined( KOKKOS_HAVE_CUDA )
+ typedef typename std::conditional
+ < std::is_same< execution_space , Kokkos::Cuda >::value
+ , Kokkos::DefaultHostExecutionSpace , execution_space
+ >::type host_execution_space ;
+#else
+ typedef execution_space host_execution_space ;
+#endif
+
+ typedef typename std::conditional
+ < std::is_same< execution_space , host_execution_space >::value &&
+ std::is_same< memory_space , host_memory_space >::value
+ , T , Kokkos::Device< host_execution_space , host_memory_space >
+ >::type host_mirror_space ;
+};
+
+// For backward compatibility
+
+namespace Impl {
+
+using Kokkos::is_space ;
+
+}
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+/**\brief Access relationship between DstMemorySpace and SrcMemorySpace
+ *
+ * The default case can assume accessibility for the same space.
+ * Specializations must be defined for different memory spaces.
+ */
+template< typename DstMemorySpace , typename SrcMemorySpace >
+struct MemorySpaceAccess {
+
+ static_assert( Kokkos::is_memory_space< DstMemorySpace >::value &&
+ Kokkos::is_memory_space< SrcMemorySpace >::value
+ , "template arguments must be memory spaces" );
+
+ /**\brief Can a View (or pointer) to memory in SrcMemorySpace
+ * be assigned to a View (or pointer) to memory marked DstMemorySpace.
+ *
+ * 1. DstMemorySpace::execution_space == SrcMemorySpace::execution_space
+ * 2. All execution spaces that can access DstMemorySpace can also access
+ * SrcMemorySpace.
+ */
+ enum { assignable = std::is_same<DstMemorySpace,SrcMemorySpace>::value };
+
+ /**\brief For all DstExecSpace::memory_space == DstMemorySpace
+ * DstExecSpace can access SrcMemorySpace.
+ */
+ enum { accessible = assignable };
+
+ /**\brief Does a DeepCopy capability exist
+ * to DstMemorySpace from SrcMemorySpace
+ */
+ enum { deepcopy = assignable };
+};
+
+
+/**\brief Can AccessSpace access MemorySpace ?
+ *
+ * Requires:
+ * Kokkos::is_space< AccessSpace >::value
+ * Kokkos::is_memory_space< MemorySpace >::value
+ *
+ * Can AccessSpace::execution_space access MemorySpace ?
+ * enum : bool { accessible };
+ *
+ * Is View<AccessSpace::memory_space> assignable from View<MemorySpace> ?
+ * enum : bool { assignable };
+ *
+ * If ! accessible then which intercessory memory space
+ * should be used to deep copy memory so that
+ * AccessSpace::execution_space
+ * can get access.
+ * When AccessSpace::memory_space == Kokkos::HostSpace
+ * then space is the View host mirror space.
+ */
+template< typename AccessSpace , typename MemorySpace >
+struct SpaceAccessibility {
+private:
+
+ static_assert( Kokkos::is_space< AccessSpace >::value
+ , "template argument #1 must be a Kokkos space" );
+
+ static_assert( Kokkos::is_memory_space< MemorySpace >::value
+ , "template argument #2 must be a Kokkos memory space" );
+
+ // The input AccessSpace may be a Device<ExecSpace,MemSpace>
+ // verify that it is a valid combination of spaces.
+ static_assert( Kokkos::Impl::MemorySpaceAccess
+ < typename AccessSpace::execution_space::memory_space
+ , typename AccessSpace::memory_space
+ >::accessible
+ , "template argument #1 is an invalid space" );
+
+ typedef Kokkos::Impl::MemorySpaceAccess
+ < typename AccessSpace::execution_space::memory_space , MemorySpace >
+ exe_access ;
+
+ typedef Kokkos::Impl::MemorySpaceAccess
+ < typename AccessSpace::memory_space , MemorySpace >
+ mem_access ;
+
+public:
+
+ /**\brief Can AccessSpace::execution_space access MemorySpace ?
+ *
+ * Default based upon memory space accessibility.
+ * Specialization required for other relationships.
+ */
+ enum { accessible = exe_access::accessible };
+
+ /**\brief Can assign to AccessSpace from MemorySpace ?
+ *
+ * Default based upon memory space accessibility.
+ * Specialization required for other relationships.
+ */
+ enum { assignable =
+ is_memory_space< AccessSpace >::value && mem_access::assignable };
+
+ /**\brief Can deep copy to AccessSpace::memory_space from MemorySpace ? */
+ enum { deepcopy = mem_access::deepcopy };
+
+ // What intercessory space for AccessSpace::execution_space
+ // to be able to access MemorySpace?
+ // If same memory space or not accessible use the AccessSpace
+ // else construct a device with execution space and memory space.
+ typedef typename std::conditional
+ < std::is_same<typename AccessSpace::memory_space,MemorySpace>::value ||
+ ! exe_access::accessible
+ , AccessSpace
+ , Kokkos::Device< typename AccessSpace::execution_space , MemorySpace >
+ >::type space ;
+};
+
+}} // namespace Kokkos::Impl
+
+//----------------------------------------------------------------------------
+
#endif // KOKKOS_CORE_CONCEPTS_HPP
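
A hedged sketch of how the traits added above are meant to be queried at compile time (the typedef names below are illustrative, not part of the patch): the is_* detectors feed static_asserts, and SpaceAccessibility answers whether one space can reach another space's memory and which space should stage a deep copy when it cannot.

#include <Kokkos_Core.hpp>

static_assert( Kokkos::is_execution_space< Kokkos::DefaultExecutionSpace >::value ,
               "the default execution space models the execution-space concept" );
static_assert( Kokkos::is_memory_space< Kokkos::HostSpace >::value ,
               "HostSpace models the memory-space concept" );

// Can the default execution space read HostSpace memory directly,
// and if not, through which space should a deep copy be staged?
typedef Kokkos::Impl::SpaceAccessibility
  < Kokkos::DefaultExecutionSpace , Kokkos::HostSpace > host_access ;

enum { host_directly_accessible = host_access::accessible };

typedef host_access::space staging_space ;  // the access space itself, or a Device<exec,mem>
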
diff --git a/lib/kokkos/core/src/Kokkos_Core.hpp b/lib/kokkos/core/src/Kokkos_Core.hpp
index 7cde4610e..266f750d3 100644
--- a/lib/kokkos/core/src/Kokkos_Core.hpp
+++ b/lib/kokkos/core/src/Kokkos_Core.hpp
@@ -1,174 +1,164 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_HPP
#define KOKKOS_CORE_HPP
//----------------------------------------------------------------------------
// Include the execution space header files for the enabled execution spaces.
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_HAVE_SERIAL )
#include <Kokkos_Serial.hpp>
#endif
#if defined( KOKKOS_HAVE_OPENMP )
#include <Kokkos_OpenMP.hpp>
#endif
#if defined( KOKKOS_HAVE_PTHREAD )
#include <Kokkos_Threads.hpp>
#endif
#if defined( KOKKOS_HAVE_CUDA )
#include <Kokkos_Cuda.hpp>
#endif
#include <Kokkos_MemoryPool.hpp>
#include <Kokkos_Pair.hpp>
#include <Kokkos_Array.hpp>
#include <Kokkos_View.hpp>
#include <Kokkos_Vectorization.hpp>
#include <Kokkos_Atomic.hpp>
#include <Kokkos_hwloc.hpp>
+#include <Kokkos_Timer.hpp>
#ifdef KOKKOS_HAVE_CXX11
#include <Kokkos_Complex.hpp>
#endif
//----------------------------------------------------------------------------
namespace Kokkos {
struct InitArguments {
int num_threads;
int num_numa;
int device_id;
InitArguments() {
num_threads = -1;
num_numa = -1;
device_id = -1;
}
};
void initialize(int& narg, char* arg[]);
void initialize(const InitArguments& args = InitArguments());
/** \brief Finalize the spaces that were initialized via Kokkos::initialize */
void finalize();
/** \brief Finalize all known execution spaces */
void finalize_all();
void fence();
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
/* Allocate memory from a memory space.
 * The allocation is tracked in the Kokkos memory tracking system, so
* leaked memory can be identified.
*/
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_malloc( const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
allocate_tracked( MemorySpace() , arg_alloc_label , arg_alloc_size );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_malloc( const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
allocate_tracked( MemorySpace() , "no-label" , arg_alloc_size );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void kokkos_free( void * arg_alloc )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
deallocate_tracked( arg_alloc );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_realloc( void * arg_alloc , const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
reallocate_tracked( arg_alloc , arg_alloc_size );
}
-} // namespace Experimental
} // namespace Kokkos
-
-namespace Kokkos {
-
-using Kokkos::Experimental::kokkos_malloc ;
-using Kokkos::Experimental::kokkos_realloc ;
-using Kokkos::Experimental::kokkos_free ;
-
-}
-
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif
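
A hedged usage sketch of the tracked allocation calls above, which this change moves from Kokkos::Experimental into namespace Kokkos (the label, sizes, and choice of CudaSpace are illustrative; any enabled memory space works):

#include <Kokkos_Core.hpp>

void resize_example( const size_t n )
{
  // Allocate n doubles in CudaSpace, tracked under the label "force_buffer".
  double * d = static_cast<double *>(
    Kokkos::kokkos_malloc< Kokkos::CudaSpace >( "force_buffer" , n * sizeof(double) ) );

  // Grow the allocation through the same tracking system.
  d = static_cast<double *>(
    Kokkos::kokkos_realloc< Kokkos::CudaSpace >( d , 2 * n * sizeof(double) ) );

  // Release it; tracking is what allows leaked memory to be identified.
  Kokkos::kokkos_free< Kokkos::CudaSpace >( d );
}
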
diff --git a/lib/kokkos/core/src/Kokkos_Core_fwd.hpp b/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
index e9648b59b..0f5ef9200 100644
--- a/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
+++ b/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
@@ -1,247 +1,248 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_FWD_HPP
#define KOKKOS_CORE_FWD_HPP
//----------------------------------------------------------------------------
// Kokkos_Macros.hpp does introspection on configuration options
// and compiler environment then sets a collection of #define macros.
#include <Kokkos_Macros.hpp>
+#include <impl/Kokkos_Utilities.hpp>
//----------------------------------------------------------------------------
// Have assumed a 64bit build (8byte pointers) throughout the code base.
static_assert( sizeof(void*) == 8
, "Kokkos assumes 64-bit build; i.e., 8-byte pointers" );
//----------------------------------------------------------------------------
namespace Kokkos {
struct AUTO_t {
KOKKOS_INLINE_FUNCTION
constexpr const AUTO_t & operator()() const { return *this ; }
};
namespace {
/**\brief Token to indicate that a parameter's value is to be automatically selected */
constexpr AUTO_t AUTO = Kokkos::AUTO_t();
}
struct InvalidType {};
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Forward declarations for class inter-relationships
namespace Kokkos {
class HostSpace ; ///< Memory space for main process and CPU execution spaces
#ifdef KOKKOS_HAVE_HBWSPACE
namespace Experimental {
class HBWSpace ; /// Memory space for hbw_malloc from memkind (e.g. for KNL processor)
}
#endif
#if defined( KOKKOS_HAVE_SERIAL )
class Serial ; ///< Execution space main process on CPU
#endif // defined( KOKKOS_HAVE_SERIAL )
#if defined( KOKKOS_HAVE_PTHREAD )
class Threads ; ///< Execution space with pthreads back-end
#endif
#if defined( KOKKOS_HAVE_OPENMP )
class OpenMP ; ///< OpenMP execution space
#endif
#if defined( KOKKOS_HAVE_CUDA )
class CudaSpace ; ///< Memory space on Cuda GPU
class CudaUVMSpace ; ///< Memory space on Cuda GPU with UVM
class CudaHostPinnedSpace ; ///< Memory space on Host accessible to Cuda GPU
class Cuda ; ///< Execution space for Cuda GPU
#endif
template<class ExecutionSpace, class MemorySpace>
struct Device;
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Set the default execution space.
/// Define Kokkos::DefaultExecutionSpace as per configuration option
/// or chosen from the enabled execution spaces in the following order:
/// Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
namespace Kokkos {
#if defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA )
typedef Cuda DefaultExecutionSpace ;
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
typedef OpenMP DefaultExecutionSpace ;
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
typedef Threads DefaultExecutionSpace ;
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL )
typedef Serial DefaultExecutionSpace ;
#else
# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads."
#endif
#if defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
typedef OpenMP DefaultHostExecutionSpace ;
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
typedef Threads DefaultHostExecutionSpace ;
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL )
typedef Serial DefaultHostExecutionSpace ;
#elif defined ( KOKKOS_HAVE_OPENMP )
typedef OpenMP DefaultHostExecutionSpace ;
#elif defined ( KOKKOS_HAVE_PTHREAD )
typedef Threads DefaultHostExecutionSpace ;
#elif defined ( KOKKOS_HAVE_SERIAL )
typedef Serial DefaultHostExecutionSpace ;
#else
# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads."
#endif
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Detect the active execution space and define its memory space.
// This is used to verify whether a running kernel can access
// a given memory space.
namespace Kokkos {
namespace Impl {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) && defined (KOKKOS_HAVE_CUDA)
typedef Kokkos::CudaSpace ActiveExecutionMemorySpace ;
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
typedef Kokkos::HostSpace ActiveExecutionMemorySpace ;
#else
typedef void ActiveExecutionMemorySpace ;
#endif
template< class ActiveSpace , class MemorySpace >
struct VerifyExecutionCanAccessMemorySpace {
enum {value = 0};
};
template< class Space >
struct VerifyExecutionCanAccessMemorySpace< Space , Space >
{
enum {value = 1};
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void *) {}
};
} // namespace Impl
} // namespace Kokkos
#define KOKKOS_RESTRICT_EXECUTION_TO_DATA( DATA_SPACE , DATA_PTR ) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \
Kokkos::Impl::ActiveExecutionMemorySpace , DATA_SPACE >::verify( DATA_PTR )
#define KOKKOS_RESTRICT_EXECUTION_TO_( DATA_SPACE ) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \
Kokkos::Impl::ActiveExecutionMemorySpace , DATA_SPACE >::verify()
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
void fence();
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class Functor
, class Policy
- , class EnableFunctor = void
+ , class EnableFunctor = void
, class EnablePolicy = void
>
struct FunctorPolicyExecutionSpace;
//----------------------------------------------------------------------------
/// \class ParallelFor
/// \brief Implementation of the ParallelFor operator that has a
/// partial specialization for the device.
///
/// This is an implementation detail of parallel_for. Users should
/// skip this and go directly to the nonmember function parallel_for.
template< class FunctorType , class ExecPolicy , class ExecutionSpace =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
> class ParallelFor ;
/// \class ParallelReduce
/// \brief Implementation detail of parallel_reduce.
///
/// This is an implementation detail of parallel_reduce. Users should
/// skip this and go directly to the nonmember function parallel_reduce.
template< class FunctorType , class ExecPolicy , class ReducerType = InvalidType, class ExecutionSpace =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
> class ParallelReduce ;
/// \class ParallelScan
/// \brief Implementation detail of parallel_scan.
///
/// This is an implementation detail of parallel_scan. Users should
/// skip this and go directly to the documentation of the nonmember
/// template function Kokkos::parallel_scan.
-template< class FunctorType , class ExecPolicy , class ExecutionSapce =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
+template< class FunctorType , class ExecPolicy , class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
> class ParallelScan ;
}}
#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_Cuda.hpp b/lib/kokkos/core/src/Kokkos_Cuda.hpp
index 3130ee319..84ae5ee04 100644
--- a/lib/kokkos/core/src/Kokkos_Cuda.hpp
+++ b/lib/kokkos/core/src/Kokkos_Cuda.hpp
@@ -1,274 +1,304 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_HPP
#define KOKKOS_CUDA_HPP
#include <Kokkos_Core_fwd.hpp>
// If CUDA execution space is enabled then use this header file.
#if defined( KOKKOS_HAVE_CUDA )
#include <iosfwd>
#include <vector>
#include <Kokkos_CudaSpace.hpp>
#include <Kokkos_Parallel.hpp>
-#include <Kokkos_TaskPolicy.hpp>
+#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class CudaExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/// \class Cuda
/// \brief Kokkos Execution Space that uses CUDA to run on GPUs.
///
/// An "execution space" represents a parallel execution model. It tells Kokkos
/// how to parallelize the execution of kernels in a parallel_for or
/// parallel_reduce. For example, the Threads execution space uses Pthreads or
/// C++11 threads on a CPU, the OpenMP execution space uses the OpenMP language
/// extensions, and the Serial execution space executes "parallel" kernels
/// sequentially. The Cuda execution space uses NVIDIA's CUDA programming
/// model to execute kernels in parallel on GPUs.
class Cuda {
public:
//! \name Type declarations that all Kokkos execution spaces must provide.
//@{
//! Tag this class as a kokkos execution space
typedef Cuda execution_space ;
#if defined( KOKKOS_USE_CUDA_UVM )
//! This execution space's preferred memory space.
typedef CudaUVMSpace memory_space ;
#else
//! This execution space's preferred memory space.
typedef CudaSpace memory_space ;
#endif
//! This execution space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
//! The size_type best suited for this execution space.
typedef memory_space::size_type size_type ;
//! This execution space's preferred array layout.
typedef LayoutLeft array_layout ;
//! This execution space's preferred scratch memory space.
typedef ScratchMemorySpace< Cuda > scratch_memory_space ;
//@}
//--------------------------------------------------
//! \name Functions that all Kokkos devices must implement.
//@{
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
KOKKOS_INLINE_FUNCTION static int in_parallel() {
#if defined( __CUDA_ARCH__ )
return true;
#else
return false;
#endif
}
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume less resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence();
//! Free any resources being consumed by the device.
static void finalize();
//! Has been initialized
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
//! Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
//@}
//--------------------------------------------------
//! \name Cuda space instances
~Cuda() {}
Cuda();
explicit Cuda( const int instance_id );
Cuda( Cuda && ) = default ;
Cuda( const Cuda & ) = default ;
Cuda & operator = ( Cuda && ) = default ;
Cuda & operator = ( const Cuda & ) = default ;
//--------------------------------------------------------------------------
//! \name Device-specific functions
//@{
struct SelectDevice {
int cuda_device_id ;
SelectDevice() : cuda_device_id(0) {}
explicit SelectDevice( int id ) : cuda_device_id( id ) {}
};
//! Initialize, telling the CUDA run-time library which device to use.
static void initialize( const SelectDevice = SelectDevice()
, const size_t num_instances = 1 );
/// \brief Cuda device architecture of the selected device.
///
/// This matches the __CUDA_ARCH__ specification.
static size_type device_arch();
//! Query device count.
static size_type detect_device_count();
/** \brief Detect the available devices and their architecture
* as defined by the __CUDA_ARCH__ specification.
*/
static std::vector<unsigned> detect_device_arch();
cudaStream_t cuda_stream() const { return m_stream ; }
int cuda_device() const { return m_device ; }
//@}
//--------------------------------------------------------------------------
private:
cudaStream_t m_stream ;
int m_device ;
};
} // namespace Kokkos
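// A minimal usage sketch (not part of this header; assumes a CUDA-enabled build).
// It exercises only members declared above: print_configuration(), concurrency(),
// detect_device_count(), and the static fence().
#include <Kokkos_Core.hpp>
#include <iostream>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );   // brings up Cuda and its host execution space
  Kokkos::Cuda::print_configuration( std::cout , true /* detail */ );
  std::cout << "device count : " << Kokkos::Cuda::detect_device_count() << std::endl ;
  std::cout << "concurrency  : " << Kokkos::Cuda::concurrency()         << std::endl ;
  Kokkos::Cuda::fence();               // wait for any dispatched kernels
  Kokkos::finalize();
  return 0 ;
}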
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
+template<>
+struct MemorySpaceAccess
+ < Kokkos::CudaSpace
+ , Kokkos::Cuda::scratch_memory_space
+ >
+{
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = false };
+};
+
+#if defined( KOKKOS_USE_CUDA_UVM )
+
+// If forcing use of UVM everywhere
+// then must assume that CudaUVMSpace
+// can be a stand-in for CudaSpace.
+// This will fail when a strange host-side execution space
+// defines CudaUVMSpace as its preferred memory space.
+
+template<>
+struct MemorySpaceAccess
+ < Kokkos::CudaUVMSpace
+ , Kokkos::Cuda::scratch_memory_space
+ >
+{
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = false };
+};
+
+#endif
+
+
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::CudaSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { value = true };
KOKKOS_INLINE_FUNCTION static void verify( void ) { }
KOKKOS_INLINE_FUNCTION static void verify( const void * ) { }
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::HostSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { value = false };
inline static void verify( void ) { CudaSpace::access_error(); }
inline static void verify( const void * p ) { CudaSpace::access_error(p); }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_View.hpp>
-
-#include <Cuda/KokkosExp_Cuda_View.hpp>
-
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
#include <Cuda/Kokkos_Cuda_Task.hpp>
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_CudaSpace.hpp b/lib/kokkos/core/src/Kokkos_CudaSpace.hpp
index cd728895d..fd9b0ad12 100644
--- a/lib/kokkos/core/src/Kokkos_CudaSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_CudaSpace.hpp
@@ -1,802 +1,944 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDASPACE_HPP
#define KOKKOS_CUDASPACE_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_HAVE_CUDA )
#include <iosfwd>
#include <typeinfo>
#include <string>
#include <Kokkos_HostSpace.hpp>
#include <Cuda/Kokkos_Cuda_abort.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Cuda on-device memory management */
class CudaSpace {
public:
//! Tag this class as a kokkos memory space
typedef CudaSpace memory_space ;
typedef Kokkos::Cuda execution_space ;
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef unsigned int size_type ;
/*--------------------------------*/
CudaSpace();
CudaSpace( CudaSpace && rhs ) = default ;
CudaSpace( const CudaSpace & rhs ) = default ;
CudaSpace & operator = ( CudaSpace && rhs ) = default ;
CudaSpace & operator = ( const CudaSpace & rhs ) = default ;
~CudaSpace() = default ;
/**\brief Allocate untracked memory in the cuda space */
void * allocate( const size_t arg_alloc_size ) const ;
/**\brief Deallocate untracked memory in the cuda space */
void deallocate( void * const arg_alloc_ptr
, const size_t arg_alloc_size ) const ;
+ /**\brief Return Name of the MemorySpace */
+ static constexpr const char* name();
+
/*--------------------------------*/
/** \brief Error reporting for HostSpace attempt to access CudaSpace */
static void access_error();
static void access_error( const void * const );
private:
int m_device ; ///< Which Cuda device
- // friend class Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > ;
+ static constexpr const char* m_name = "Cuda";
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > ;
};
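// A minimal sketch (not part of this header; assumes a CUDA-enabled build): the
// untracked allocate()/deallocate() pair declared above hands back raw device
// memory with no reference counting, so the caller must return the same size.
#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    Kokkos::CudaSpace space ;
    const size_t bytes = 1024 * sizeof(double) ;
    void * ptr = space.allocate( bytes );   // untracked device allocation
    // ... wrap in an unmanaged View or pass to kernels ...
    space.deallocate( ptr , bytes );
  }
  Kokkos::finalize();
  return 0 ;
}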
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_arrays_cuda_space();
/// \brief Retrieve the pointer to the lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function retrieves the lock array pointer.
/// If the array is not yet allocated it will do so.
int* atomic_lock_array_cuda_space_ptr(bool deallocate = false);
/// \brief Retrieve the pointer to the scratch array for team and thread private global memory.
///
/// Team and Thread private scratch allocations in
/// global memory are acquired via locks.
/// This function retrieves the lock array pointer.
/// If the array is not yet allocated it will do so.
int* scratch_lock_array_cuda_space_ptr(bool deallocate = false);
/// \brief Retrieve the pointer to the scratch array for unique identifiers.
///
/// Unique identifiers in the range 0-Cuda::concurrency
/// are provided via locks.
/// This function retrieves the lock array pointer.
/// If the array is not yet allocated it will do so.
int* threadid_lock_array_cuda_space_ptr(bool deallocate = false);
}
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Cuda memory that is accessible to Host execution space
* through Cuda's unified virtual memory (UVM) runtime.
*/
class CudaUVMSpace {
public:
//! Tag this class as a kokkos memory space
typedef CudaUVMSpace memory_space ;
typedef Cuda execution_space ;
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef unsigned int size_type ;
/** \brief If UVM capability is available */
static bool available();
+
+ /*--------------------------------*/
+ /** \brief CudaUVMSpace specific routine */
+ static int number_of_allocations();
+
+ /*--------------------------------*/
+
+
/*--------------------------------*/
CudaUVMSpace();
CudaUVMSpace( CudaUVMSpace && rhs ) = default ;
CudaUVMSpace( const CudaUVMSpace & rhs ) = default ;
CudaUVMSpace & operator = ( CudaUVMSpace && rhs ) = default ;
CudaUVMSpace & operator = ( const CudaUVMSpace & rhs ) = default ;
~CudaUVMSpace() = default ;
/**\brief Allocate untracked memory in the cuda space */
void * allocate( const size_t arg_alloc_size ) const ;
/**\brief Deallocate untracked memory in the cuda space */
void deallocate( void * const arg_alloc_ptr
, const size_t arg_alloc_size ) const ;
+ /**\brief Return Name of the MemorySpace */
+ static constexpr const char* name();
+
/*--------------------------------*/
private:
-
int m_device ; ///< Which Cuda device
+
+ static constexpr const char* m_name = "CudaUVM";
+
};
} // namespace Kokkos
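// A minimal sketch (not part of this header; assumes CUDA with UVM available):
// a View placed in CudaUVMSpace is reachable from both Cuda kernels and host
// code, which is what the access-trait specializations below encode.
#include <Kokkos_Core.hpp>
#include <cstdio>

struct InitFunctor {
  Kokkos::View<int*, Kokkos::CudaUVMSpace> v ;
  KOKKOS_INLINE_FUNCTION void operator()( const int i ) const { v(i) = i ; }
};

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<int*, Kokkos::CudaUVMSpace> v( "uvm" , 16 );
    Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::Cuda>(0,16) , InitFunctor{ v } );
    Kokkos::fence();                                   // required before host access
    int sum = 0 ;
    for ( int i = 0 ; i < 16 ; ++i ) sum += v(i);      // legal: UVM is host-accessible
    std::printf( "sum = %d\n" , sum );
  }
  Kokkos::finalize();
  return 0 ;
}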
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Host memory that is accessible to Cuda execution space
* through Cuda's host-pinned memory allocation.
*/
class CudaHostPinnedSpace {
public:
//! Tag this class as a kokkos memory space
/** \brief Memory is in HostSpace so use the HostSpace::execution_space */
typedef HostSpace::execution_space execution_space ;
typedef CudaHostPinnedSpace memory_space ;
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef unsigned int size_type ;
/*--------------------------------*/
CudaHostPinnedSpace();
CudaHostPinnedSpace( CudaHostPinnedSpace && rhs ) = default ;
CudaHostPinnedSpace( const CudaHostPinnedSpace & rhs ) = default ;
CudaHostPinnedSpace & operator = ( CudaHostPinnedSpace && rhs ) = default ;
CudaHostPinnedSpace & operator = ( const CudaHostPinnedSpace & rhs ) = default ;
~CudaHostPinnedSpace() = default ;
/**\brief Allocate untracked memory in the space */
void * allocate( const size_t arg_alloc_size ) const ;
/**\brief Deallocate untracked memory in the space */
void deallocate( void * const arg_alloc_ptr
, const size_t arg_alloc_size ) const ;
+ /**\brief Return Name of the MemorySpace */
+ static constexpr const char* name();
+
+private:
+
+ static constexpr const char* m_name = "CudaHostPinned";
+
/*--------------------------------*/
};
} // namespace Kokkos
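// A minimal sketch (not part of this header; assumes a CUDA-enabled build):
// host-pinned memory lives on the host but is visible to Cuda, so it is a
// natural staging buffer for transfers into a CudaSpace View.
#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<double*, Kokkos::CudaHostPinnedSpace> staging( "staging" , 256 );
    Kokkos::View<double*, Kokkos::CudaSpace>           device ( "device"  , 256 );
    for ( int i = 0 ; i < 256 ; ++i ) staging(i) = 2.0 * i ;   // filled on the host
    Kokkos::deep_copy( device , staging );                     // pinned -> device
  }
  Kokkos::finalize();
  return 0 ;
}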
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaSpace >::assignable , "" );
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+
+//----------------------------------------
+
+template<>
+struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace > {
+ enum { assignable = false };
+ enum { accessible = false };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace > {
+ // HostSpace::execution_space != CudaUVMSpace::execution_space
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace > {
+ // HostSpace::execution_space == CudaHostPinnedSpace::execution_space
+ enum { assignable = true };
+ enum { accessible = true };
+ enum { deepcopy = true };
+};
+
+//----------------------------------------
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace > {
+ enum { assignable = false };
+ enum { accessible = false };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaUVMSpace > {
+ // CudaSpace::execution_space == CudaUVMSpace::execution_space
+ enum { assignable = true };
+ enum { accessible = true };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace > {
+ // CudaSpace::execution_space != CudaHostPinnedSpace::execution_space
+ enum { assignable = false };
+ enum { accessible = true }; // CudaSpace::execution_space
+ enum { deepcopy = true };
+};
+
+//----------------------------------------
+// CudaUVMSpace::execution_space == Cuda
+// CudaUVMSpace accessible to both Cuda and Host
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace > {
+ enum { assignable = false };
+ enum { accessible = false }; // Cuda cannot access HostSpace
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace > {
+ // CudaUVMSpace::execution_space == CudaSpace::execution_space
+ // Can access CudaUVMSpace from Host but cannot access CudaSpace from Host
+ enum { assignable = false };
+
+ // CudaUVMSpace::execution_space can access CudaSpace
+ enum { accessible = true };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace > {
+ // CudaUVMSpace::execution_space != CudaHostPinnedSpace::execution_space
+ enum { assignable = false };
+ enum { accessible = true }; // CudaUVMSpace::execution_space
+ enum { deepcopy = true };
+};
+
+
+//----------------------------------------
+// CudaHostPinnedSpace::execution_space == HostSpace::execution_space
+// CudaHostPinnedSpace accessible to both Cuda and Host
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace > {
+ enum { assignable = false }; // Cannot access from Cuda
+ enum { accessible = true }; // CudaHostPinnedSpace::execution_space
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace > {
+ enum { assignable = false }; // Cannot access from Host
+ enum { accessible = false };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace > {
+ enum { assignable = false }; // different execution_space
+ enum { accessible = true }; // same accessibility
+ enum { deepcopy = true };
+};
+
+//----------------------------------------
+
+}} // namespace Kokkos::Impl
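// A compile-time sketch (not part of this header): the Impl traits defined above
// can be queried directly; the assertions below restate a few of the encoded
// relationships and hold only in a CUDA-enabled build.
#include <Kokkos_Core.hpp>

static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace ,
                                                Kokkos::CudaUVMSpace >::accessible ,
               "host code may read/write CudaUVMSpace" );
static_assert( ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace ,
                                                  Kokkos::CudaSpace >::accessible ,
               "host code may not touch CudaSpace directly" );
static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace ,
                                                Kokkos::CudaHostPinnedSpace >::accessible ,
               "Cuda kernels may read/write host-pinned memory" );

int main() { return 0 ; }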
+
+/*--------------------------------------------------------------------------*/
+/*--------------------------------------------------------------------------*/
+
+namespace Kokkos {
+namespace Impl {
+
void DeepCopyAsyncCuda( void * dst , const void * src , size_t n);
template<> struct DeepCopy< CudaSpace , CudaSpace , Cuda>
{
DeepCopy( void * dst , const void * src , size_t );
DeepCopy( const Cuda & , void * dst , const void * src , size_t );
};
template<> struct DeepCopy< CudaSpace , HostSpace , Cuda >
{
DeepCopy( void * dst , const void * src , size_t );
DeepCopy( const Cuda & , void * dst , const void * src , size_t );
};
template<> struct DeepCopy< HostSpace , CudaSpace , Cuda >
{
DeepCopy( void * dst , const void * src , size_t );
DeepCopy( const Cuda & , void * dst , const void * src , size_t );
};
template<class ExecutionSpace> struct DeepCopy< CudaSpace , CudaSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< CudaSpace , HostSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , HostSpace , Cuda>( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace>
struct DeepCopy< HostSpace , CudaSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace>
struct DeepCopy< CudaSpace , CudaUVMSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace>
struct DeepCopy< CudaSpace , CudaHostPinnedSpace , ExecutionSpace>
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , HostSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace>
struct DeepCopy< CudaUVMSpace , CudaSpace , ExecutionSpace>
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace>
struct DeepCopy< CudaUVMSpace , CudaUVMSpace , ExecutionSpace>
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace>
struct DeepCopy< CudaUVMSpace , CudaHostPinnedSpace , ExecutionSpace>
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , HostSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< CudaUVMSpace , HostSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< CudaSpace , HostSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< CudaHostPinnedSpace , CudaSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< CudaHostPinnedSpace , CudaUVMSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< CudaHostPinnedSpace , CudaHostPinnedSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , HostSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< CudaHostPinnedSpace , HostSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , HostSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< HostSpace , CudaUVMSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , CudaSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
template<class ExecutionSpace> struct DeepCopy< HostSpace , CudaHostPinnedSpace , ExecutionSpace >
{
inline
DeepCopy( void * dst , const void * src , size_t n )
{ (void) DeepCopy< HostSpace , HostSpace , Cuda >( dst , src , n ); }
inline
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n )
{
exec.fence();
DeepCopyAsyncCuda (dst,src,n);
}
};
} // namespace Impl
} // namespace Kokkos
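// A minimal sketch (not part of this header; assumes a CUDA-enabled build):
// Kokkos::deep_copy on Views routes through the DeepCopy<dst,src,exec>
// specializations above; mirror views give the usual host<->device round trip.
#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<double*, Kokkos::CudaSpace> d( "device" , 64 );
    auto h = Kokkos::create_mirror_view( d );   // HostSpace view of the same shape
    for ( int i = 0 ; i < 64 ; ++i ) h(i) = 1.0 ;
    Kokkos::deep_copy( d , h );                 // HostSpace -> CudaSpace
    Kokkos::deep_copy( h , d );                 // CudaSpace -> HostSpace
  }
  Kokkos::finalize();
  return 0 ;
}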
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/** Running in CudaSpace attempting to access HostSpace: error */
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::CudaSpace , Kokkos::HostSpace >
{
enum { value = false };
KOKKOS_INLINE_FUNCTION static void verify( void )
{ Kokkos::abort("Cuda code attempted to access HostSpace memory"); }
KOKKOS_INLINE_FUNCTION static void verify( const void * )
{ Kokkos::abort("Cuda code attempted to access HostSpace memory"); }
};
/** Running in CudaSpace accessing CudaUVMSpace: ok */
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::CudaSpace , Kokkos::CudaUVMSpace >
{
enum { value = true };
KOKKOS_INLINE_FUNCTION static void verify( void ) { }
KOKKOS_INLINE_FUNCTION static void verify( const void * ) { }
};
/** Running in CudaSpace accessing CudaHostPinnedSpace: ok */
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >
{
enum { value = true };
KOKKOS_INLINE_FUNCTION static void verify( void ) { }
KOKKOS_INLINE_FUNCTION static void verify( const void * ) { }
};
/** Running in CudaSpace attempting to access an unknown space: error */
template< class OtherSpace >
struct VerifyExecutionCanAccessMemorySpace<
typename enable_if< ! is_same<Kokkos::CudaSpace,OtherSpace>::value , Kokkos::CudaSpace >::type ,
OtherSpace >
{
enum { value = false };
KOKKOS_INLINE_FUNCTION static void verify( void )
{ Kokkos::abort("Cuda code attempted to access unknown Space memory"); }
KOKKOS_INLINE_FUNCTION static void verify( const void * )
{ Kokkos::abort("Cuda code attempted to access unknown Space memory"); }
};
//----------------------------------------------------------------------------
/** Running in HostSpace attempting to access CudaSpace */
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::CudaSpace >
{
enum { value = false };
inline static void verify( void ) { CudaSpace::access_error(); }
inline static void verify( const void * p ) { CudaSpace::access_error(p); }
};
/** Running in HostSpace accessing CudaUVMSpace is OK */
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::CudaUVMSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
/** Running in HostSpace accessing CudaHostPinnedSpace is OK */
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >
{
enum { value = true };
KOKKOS_INLINE_FUNCTION static void verify( void ) {}
KOKKOS_INLINE_FUNCTION static void verify( const void * ) {}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
template<>
class SharedAllocationRecord< Kokkos::CudaSpace , void >
: public SharedAllocationRecord< void , void >
{
private:
friend class SharedAllocationRecord< Kokkos::CudaUVMSpace , void > ;
typedef SharedAllocationRecord< void , void > RecordBase ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
static void deallocate( RecordBase * );
static ::cudaTextureObject_t
attach_texture_object( const unsigned sizeof_alias
, void * const alloc_ptr
, const size_t alloc_size );
static RecordBase s_root_record ;
::cudaTextureObject_t m_tex_obj ;
const Kokkos::CudaSpace m_space ;
protected:
~SharedAllocationRecord();
SharedAllocationRecord() : RecordBase(), m_tex_obj(0), m_space() {}
SharedAllocationRecord( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
std::string get_label() const ;
static SharedAllocationRecord * allocate( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
template< typename AliasType >
inline
::cudaTextureObject_t attach_texture_object()
{
static_assert( ( std::is_same< AliasType , int >::value ||
std::is_same< AliasType , ::int2 >::value ||
std::is_same< AliasType , ::int4 >::value )
, "Cuda texture fetch only supported for alias types of int, ::int2, or ::int4" );
if ( m_tex_obj == 0 ) {
m_tex_obj = attach_texture_object( sizeof(AliasType)
, (void*) RecordBase::m_alloc_ptr
, RecordBase::m_alloc_size );
}
return m_tex_obj ;
}
template< typename AliasType >
inline
int attach_texture_object_offset( const AliasType * const ptr )
{
// Texture object is attached to the entire allocation range
return ptr - reinterpret_cast<AliasType*>( RecordBase::m_alloc_ptr );
}
static void print_records( std::ostream & , const Kokkos::CudaSpace & , bool detail = false );
};
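// A hedged sketch (not part of this header; assumes a CUDA-enabled build): these
// Impl records back Kokkos::View allocations and are rarely called directly, but
// the tracked allocate/deallocate pair declared above can be used as follows.
#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    typedef Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
    void * ptr = Record::allocate_tracked( Kokkos::CudaSpace() , "scratch" , 4096 );
    // ... use ptr from Cuda kernels; the allocation is reference counted ...
    Record::deallocate_tracked( ptr );
  }
  Kokkos::finalize();
  return 0 ;
}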
template<>
class SharedAllocationRecord< Kokkos::CudaUVMSpace , void >
: public SharedAllocationRecord< void , void >
{
private:
typedef SharedAllocationRecord< void , void > RecordBase ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
static void deallocate( RecordBase * );
static RecordBase s_root_record ;
::cudaTextureObject_t m_tex_obj ;
const Kokkos::CudaUVMSpace m_space ;
protected:
~SharedAllocationRecord();
SharedAllocationRecord() : RecordBase(), m_tex_obj(0), m_space() {}
SharedAllocationRecord( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
std::string get_label() const ;
static SharedAllocationRecord * allocate( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
);
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
template< typename AliasType >
inline
::cudaTextureObject_t attach_texture_object()
{
static_assert( ( std::is_same< AliasType , int >::value ||
std::is_same< AliasType , ::int2 >::value ||
std::is_same< AliasType , ::int4 >::value )
, "Cuda texture fetch only supported for alias types of int, ::int2, or ::int4" );
if ( m_tex_obj == 0 ) {
m_tex_obj = SharedAllocationRecord< Kokkos::CudaSpace , void >::
attach_texture_object( sizeof(AliasType)
, (void*) RecordBase::m_alloc_ptr
, RecordBase::m_alloc_size );
}
return m_tex_obj ;
}
template< typename AliasType >
inline
int attach_texture_object_offset( const AliasType * const ptr )
{
// Texture object is attached to the entire allocation range
return ptr - reinterpret_cast<AliasType*>( RecordBase::m_alloc_ptr );
}
static void print_records( std::ostream & , const Kokkos::CudaUVMSpace & , bool detail = false );
};
template<>
class SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >
: public SharedAllocationRecord< void , void >
{
private:
typedef SharedAllocationRecord< void , void > RecordBase ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
static void deallocate( RecordBase * );
static RecordBase s_root_record ;
const Kokkos::CudaHostPinnedSpace m_space ;
protected:
~SharedAllocationRecord();
SharedAllocationRecord() : RecordBase(), m_space() {}
SharedAllocationRecord( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
std::string get_label() const ;
static SharedAllocationRecord * allocate( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
);
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
static void print_records( std::ostream & , const Kokkos::CudaHostPinnedSpace & , bool detail = false );
};
} // namespace Impl
-} // namespace Experimental
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
#endif /* #define KOKKOS_CUDASPACE_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_ExecPolicy.hpp b/lib/kokkos/core/src/Kokkos_ExecPolicy.hpp
index 5834fc04d..db4d67ae7 100644
--- a/lib/kokkos/core/src/Kokkos_ExecPolicy.hpp
+++ b/lib/kokkos/core/src/Kokkos_ExecPolicy.hpp
@@ -1,570 +1,569 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXECPOLICY_HPP
#define KOKKOS_EXECPOLICY_HPP
#include <Kokkos_Core_fwd.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_StaticAssert.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Tags.hpp>
#include <impl/Kokkos_AnalyzePolicy.hpp>
#include <Kokkos_Concepts.hpp>
#include <iostream>
+
//----------------------------------------------------------------------------
namespace Kokkos {
/** \brief Execution policy for work over a range of an integral type.
*
* Valid template argument options:
*
* With a specified execution space:
* < ExecSpace , WorkTag , { IntConst | IntType } >
* < ExecSpace , WorkTag , void >
* < ExecSpace , { IntConst | IntType } , void >
* < ExecSpace , void , void >
*
* With the default execution space:
* < WorkTag , { IntConst | IntType } , void >
* < WorkTag , void , void >
* < { IntConst | IntType } , void , void >
* < void , void , void >
*
* IntType is a fundamental integral type
* IntConst is an Impl::integral_constant< IntType , Blocking >
*
* Blocking is the granularity of partitioning the range among threads.
*/
template<class ... Properties>
class RangePolicy
: public Impl::PolicyTraits<Properties ... >
{
private:
-
typedef Impl::PolicyTraits<Properties ... > traits;
typename traits::execution_space m_space ;
typename traits::index_type m_begin ;
typename traits::index_type m_end ;
typename traits::index_type m_granularity ;
typename traits::index_type m_granularity_mask ;
-public:
+public:
//! Tag this class as an execution policy
typedef RangePolicy execution_policy;
typedef typename traits::index_type member_type ;
KOKKOS_INLINE_FUNCTION const typename traits::execution_space & space() const { return m_space ; }
KOKKOS_INLINE_FUNCTION member_type begin() const { return m_begin ; }
KOKKOS_INLINE_FUNCTION member_type end() const { return m_end ; }
-
//TODO: find a better workaround for Clang's unusual instantiation order.
// This operator exists only to work around an instantiation error: the RangePolicy is passed
// into FunctorValueTraits, which applies decltype to the operator even though the first
// argument of parallel_for clearly does not match.
void operator()(const int&) const {}
RangePolicy(const RangePolicy&) = default;
RangePolicy(RangePolicy&&) = default;
inline RangePolicy() : m_space(), m_begin(0), m_end(0) {}
/** \brief Total range */
inline
RangePolicy( const typename traits::execution_space & work_space
, const member_type work_begin
, const member_type work_end
)
: m_space( work_space )
, m_begin( work_begin < work_end ? work_begin : 0 )
, m_end( work_begin < work_end ? work_end : 0 )
, m_granularity(0)
, m_granularity_mask(0)
{
set_auto_chunk_size();
}
/** \brief Total range */
inline
RangePolicy( const member_type work_begin
, const member_type work_end
)
: RangePolicy( typename traits::execution_space()
, work_begin , work_end )
{}
- public:
-
- /** \brief return chunk_size */
- inline member_type chunk_size() const {
- return m_granularity;
- }
+public:
+ /** \brief return chunk_size */
+ inline member_type chunk_size() const {
+ return m_granularity;
+ }
+
+ /** \brief set chunk_size to a discrete value*/
+ inline RangePolicy set_chunk_size(int chunk_size_) const {
+ RangePolicy p = *this;
+ p.m_granularity = chunk_size_;
+ p.m_granularity_mask = p.m_granularity - 1;
+ return p;
+ }
- /** \brief set chunk_size to a discrete value*/
- inline RangePolicy set_chunk_size(int chunk_size_) const {
- RangePolicy p = *this;
- p.m_granularity = chunk_size_;
- p.m_granularity_mask = p.m_granularity - 1;
- return p;
- }
+private:
+ /** \brief finalize chunk_size if it was set to AUTO*/
+ inline void set_auto_chunk_size() {
+
+ typename traits::index_type concurrency = traits::execution_space::concurrency();
+ if( concurrency==0 ) concurrency=1;
+
+ if(m_granularity > 0) {
+ if(!Impl::is_integral_power_of_two( m_granularity ))
+ Kokkos::abort("RangePolicy blocking granularity must be power of two" );
+ }
+
+ member_type new_chunk_size = 1;
+ while(new_chunk_size*100*concurrency < m_end-m_begin)
+ new_chunk_size *= 2;
+ if(new_chunk_size < 128) {
+ new_chunk_size = 1;
+ while( (new_chunk_size*40*concurrency < m_end-m_begin ) && (new_chunk_size<128) )
+ new_chunk_size*=2;
+ }
+ m_granularity = new_chunk_size;
+ m_granularity_mask = m_granularity - 1;
+ }
- private:
- /** \brief finalize chunk_size if it was set to AUTO*/
- inline void set_auto_chunk_size() {
-
- typename traits::index_type concurrency = traits::execution_space::concurrency();
- if( concurrency==0 ) concurrency=1;
-
- if(m_granularity > 0) {
- if(!Impl::is_integral_power_of_two( m_granularity ))
- Kokkos::abort("RangePolicy blocking granularity must be power of two" );
- }
-
-
- member_type new_chunk_size = 1;
- while(new_chunk_size*100*concurrency < m_end-m_begin)
- new_chunk_size *= 2;
- if(new_chunk_size < 128) {
- new_chunk_size = 1;
- while( (new_chunk_size*40*concurrency < m_end-m_begin ) && (new_chunk_size<128) )
- new_chunk_size*=2;
- }
- m_granularity = new_chunk_size;
- m_granularity_mask = m_granularity - 1;
- }
-
- public:
+public:
/** \brief Subrange for a partition's rank and size.
*
* Typically used to partition a range over a group of threads.
*/
struct WorkRange {
typedef typename RangePolicy::work_tag work_tag ;
typedef typename RangePolicy::member_type member_type ;
KOKKOS_INLINE_FUNCTION member_type begin() const { return m_begin ; }
KOKKOS_INLINE_FUNCTION member_type end() const { return m_end ; }
/** \brief Subrange for a partition's rank and size.
*
* Typically used to partition a range over a group of threads.
*/
KOKKOS_INLINE_FUNCTION
WorkRange( const RangePolicy & range
, const int part_rank
, const int part_size
)
: m_begin(0), m_end(0)
{
if ( part_size ) {
// Split evenly among partitions, then round up to the granularity.
const member_type work_part =
( ( ( ( range.end() - range.begin() ) + ( part_size - 1 ) ) / part_size )
+ range.m_granularity_mask ) & ~member_type(range.m_granularity_mask);
m_begin = range.begin() + work_part * part_rank ;
m_end = m_begin + work_part ;
if ( range.end() < m_begin ) m_begin = range.end() ;
if ( range.end() < m_end ) m_end = range.end() ;
}
}
- private:
- member_type m_begin ;
- member_type m_end ;
- WorkRange();
- WorkRange & operator = ( const WorkRange & );
+ private:
+ member_type m_begin ;
+ member_type m_end ;
+ WorkRange();
+ WorkRange & operator = ( const WorkRange & );
};
};
-
} // namespace Kokkos
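// A minimal usage sketch (not part of this header) for the RangePolicy defined
// above: a half-open [begin,end) range on the default execution space, with an
// explicit chunk (granularity) override via set_chunk_size().
#include <Kokkos_Core.hpp>

struct Saxpy {
  double a ;
  Kokkos::View<double*> x , y ;
  KOKKOS_INLINE_FUNCTION void operator()( const int i ) const { y(i) += a * x(i); }
};

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    const int N = 1 << 20 ;
    Kokkos::View<double*> x( "x" , N ) , y( "y" , N );
    Kokkos::RangePolicy<> policy( 0 , N );              // chunk size chosen automatically
    Kokkos::parallel_for( policy.set_chunk_size(128)    // power-of-two granularity
                        , Saxpy{ 2.0 , x , y } );
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0 ;
}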
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
-
template< class ExecSpace, class ... Properties>
class TeamPolicyInternal: public Impl::PolicyTraits<Properties ... > {
private:
typedef Impl::PolicyTraits<Properties ... > traits;
public:
//----------------------------------------
/** \brief Query maximum team size for a given functor.
*
* This size takes into account execution space concurrency limitations and
* scratch memory space limitations for reductions, team reduce/scan, and
* team shared memory.
+ *
+ * This function only works for single-operator functors.
+ * With multi-operator functors it cannot be determined
+ * which operator will be called.
*/
template< class FunctorType >
static int team_size_max( const FunctorType & );
/** \brief Query recommended team size for a given functor.
*
* This size takes into account execution space concurrency limitations and
* scratch memory space limitations for reductions, team reduce/scan, and
* team shared memory.
+ *
+ * This function only works for single-operator functors.
+ * With multi-operator functors it cannot be determined
+ * which operator will be called.
*/
template< class FunctorType >
static int team_size_recommended( const FunctorType & );
template< class FunctorType >
static int team_size_recommended( const FunctorType & , const int&);
//----------------------------------------
/** \brief Construct policy with the given instance of the execution space */
TeamPolicyInternal( const typename traits::execution_space & , int league_size_request , int team_size_request , int vector_length_request = 1 );
TeamPolicyInternal( const typename traits::execution_space & , int league_size_request , const Kokkos::AUTO_t & , int vector_length_request = 1 );
/** \brief Construct policy with the default instance of the execution space */
TeamPolicyInternal( int league_size_request , int team_size_request , int vector_length_request = 1 );
TeamPolicyInternal( int league_size_request , const Kokkos::AUTO_t & , int vector_length_request = 1 );
/* TeamPolicyInternal( int league_size_request , int team_size_request );
TeamPolicyInternal( int league_size_request , const Kokkos::AUTO_t & );*/
/** \brief The actual league size (number of teams) of the policy.
*
* This may be smaller than the requested league size due to limitations
* of the execution space.
*/
KOKKOS_INLINE_FUNCTION int league_size() const ;
/** \brief The actual team size (number of threads per team) of the policy.
*
* This may be smaller than the requested team size due to limitations
* of the execution space.
*/
KOKKOS_INLINE_FUNCTION int team_size() const ;
inline typename traits::index_type chunk_size() const ;
inline TeamPolicyInternal set_chunk_size(int chunk_size) const ;
/** \brief Parallel execution of a functor calls the functor once with
* each member of the execution policy.
*/
struct member_type {
/** \brief Handle to the currently executing team shared scratch memory */
KOKKOS_INLINE_FUNCTION
typename traits::execution_space::scratch_memory_space team_shmem() const ;
/** \brief Rank of this team within the league of teams */
KOKKOS_INLINE_FUNCTION int league_rank() const ;
/** \brief Number of teams in the league */
KOKKOS_INLINE_FUNCTION int league_size() const ;
/** \brief Rank of this thread within this team */
KOKKOS_INLINE_FUNCTION int team_rank() const ;
/** \brief Number of threads in this team */
KOKKOS_INLINE_FUNCTION int team_size() const ;
/** \brief Barrier among the threads of this team */
KOKKOS_INLINE_FUNCTION void team_barrier() const ;
/** \brief Intra-team reduction. Returns join of all values of the team members. */
template< class JoinOp >
KOKKOS_INLINE_FUNCTION
typename JoinOp::value_type team_reduce( const typename JoinOp::value_type
, const JoinOp & ) const ;
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const ;
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const ;
};
};
-}
-namespace Impl {
struct PerTeamValue {
int value;
PerTeamValue(int arg);
};
struct PerThreadValue {
int value;
PerThreadValue(int arg);
};
+
}
Impl::PerTeamValue PerTeam(const int& arg);
Impl::PerThreadValue PerThread(const int& arg);
-
/** \brief Execution policy for parallel work over a league of teams of threads.
*
* The work functor is called for each thread of each team such that
* the team's member threads are guaranteed to be concurrent.
*
* The team's threads have access to team shared scratch memory and
* team collective operations.
*
* If the WorkTag is non-void then the first calling argument of the
* work functor's parentheses operator is 'const WorkTag &'.
* This allows a functor to have multiple work member functions.
*
* Order of template arguments does not matter, since the implementation
* uses variadic templates. Any or all of the template arguments can
* be omitted.
*
* Possible template arguments and their default values:
* ExecutionSpace (DefaultExecutionSpace): where to execute code. Must be enabled.
* WorkTag (none): Tag which is used as the first argument for the functor operator.
* Schedule<Type> (Schedule<Static>): Scheduling Policy (Dynamic, or Static).
* IndexType<Type> (IndexType<ExecutionSpace::size_type>): Integer index type used to iterate over the index space.
*/
template< class ... Properties>
class TeamPolicy: public
Impl::TeamPolicyInternal<
typename Impl::PolicyTraits<Properties ... >::execution_space,
Properties ...> {
typedef Impl::TeamPolicyInternal<
typename Impl::PolicyTraits<Properties ... >::execution_space,
Properties ...> internal_policy;
typedef Impl::PolicyTraits<Properties ... > traits;
public:
typedef TeamPolicy execution_policy;
TeamPolicy& operator = (const TeamPolicy&) = default;
/** \brief Construct policy with the given instance of the execution space */
TeamPolicy( const typename traits::execution_space & , int league_size_request , int team_size_request , int vector_length_request = 1 )
: internal_policy(typename traits::execution_space(),league_size_request,team_size_request, vector_length_request) {}
TeamPolicy( const typename traits::execution_space & , int league_size_request , const Kokkos::AUTO_t & , int vector_length_request = 1 )
: internal_policy(typename traits::execution_space(),league_size_request,Kokkos::AUTO(), vector_length_request) {}
/** \brief Construct policy with the default instance of the execution space */
TeamPolicy( int league_size_request , int team_size_request , int vector_length_request = 1 )
: internal_policy(league_size_request,team_size_request, vector_length_request) {}
TeamPolicy( int league_size_request , const Kokkos::AUTO_t & , int vector_length_request = 1 )
: internal_policy(league_size_request,Kokkos::AUTO(), vector_length_request) {}
/* TeamPolicy( int league_size_request , int team_size_request )
: internal_policy(league_size_request,team_size_request) {}
TeamPolicy( int league_size_request , const Kokkos::AUTO_t & )
: internal_policy(league_size_request,Kokkos::AUTO()) {}*/
private:
TeamPolicy(const internal_policy& p):internal_policy(p) {}
public:
inline TeamPolicy set_chunk_size(int chunk) const {
return TeamPolicy(internal_policy::set_chunk_size(chunk));
};
inline TeamPolicy set_scratch_size(const int& level, const Impl::PerTeamValue& per_team) const {
return TeamPolicy(internal_policy::set_scratch_size(level,per_team));
};
inline TeamPolicy set_scratch_size(const int& level, const Impl::PerThreadValue& per_thread) const {
return TeamPolicy(internal_policy::set_scratch_size(level,per_thread));
};
inline TeamPolicy set_scratch_size(const int& level, const Impl::PerTeamValue& per_team, const Impl::PerThreadValue& per_thread) const {
return TeamPolicy(internal_policy::set_scratch_size(level, per_team, per_thread));
};
inline TeamPolicy set_scratch_size(const int& level, const Impl::PerThreadValue& per_thread, const Impl::PerTeamValue& per_team) const {
return TeamPolicy(internal_policy::set_scratch_size(level, per_team, per_thread));
};
};
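// A minimal usage sketch (not part of this header) for the TeamPolicy wrapper
// defined above, including a PerTeam scratch request; the functor strides each
// team's threads across the columns of one row.
#include <Kokkos_Core.hpp>

typedef Kokkos::TeamPolicy<>::member_type team_member ;

struct TeamFill {
  Kokkos::View<double**> a ;
  KOKKOS_INLINE_FUNCTION void operator()( const team_member & team ) const {
    const int row = team.league_rank();
    for ( int j = team.team_rank() ; j < (int) a.dimension_1() ; j += team.team_size() )
      a( row , j ) = 1.0 ;
  }
};

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<double**> a( "a" , 100 , 32 );
    Kokkos::TeamPolicy<> policy( 100 , Kokkos::AUTO );   // one team per row
    Kokkos::parallel_for( policy.set_scratch_size( 0 , Kokkos::PerTeam( 1024 ) )
                        , TeamFill{ a } );
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0 ;
}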
-} // namespace Kokkos
-
-namespace Kokkos {
-
namespace Impl {
template<typename iType, class TeamMemberType>
struct TeamThreadRangeBoundariesStruct {
private:
KOKKOS_INLINE_FUNCTION static
iType ibegin( const iType & arg_begin
, const iType & arg_end
, const iType & arg_rank
, const iType & arg_size
)
{
return arg_begin + ( ( arg_end - arg_begin + arg_size - 1 ) / arg_size ) * arg_rank ;
}
KOKKOS_INLINE_FUNCTION static
iType iend( const iType & arg_begin
, const iType & arg_end
, const iType & arg_rank
, const iType & arg_size
)
{
const iType end_ = arg_begin + ( ( arg_end - arg_begin + arg_size - 1 ) / arg_size ) * ( arg_rank + 1 );
return end_ < arg_end ? end_ : arg_end ;
}
public:
typedef iType index_type;
const iType start;
const iType end;
enum {increment = 1};
const TeamMemberType& thread;
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct( const TeamMemberType& arg_thread
- , const iType& arg_end
- )
+ , const iType& arg_end
+ )
: start( ibegin( 0 , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
, end( iend( 0 , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
, thread( arg_thread )
{}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct( const TeamMemberType& arg_thread
, const iType& arg_begin
, const iType& arg_end
)
: start( ibegin( arg_begin , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
, end( iend( arg_begin , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
, thread( arg_thread )
{}
};
- template<typename iType, class TeamMemberType>
- struct ThreadVectorRangeBoundariesStruct {
- typedef iType index_type;
- enum {start = 0};
- const iType end;
- enum {increment = 1};
+template<typename iType, class TeamMemberType>
+struct ThreadVectorRangeBoundariesStruct {
+ typedef iType index_type;
+ enum {start = 0};
+ const iType end;
+ enum {increment = 1};
- KOKKOS_INLINE_FUNCTION
- ThreadVectorRangeBoundariesStruct (const TeamMemberType& thread, const iType& count):
- end( count )
- {}
- };
+ KOKKOS_INLINE_FUNCTION
+ ThreadVectorRangeBoundariesStruct ( const TeamMemberType, const iType& count ) : end( count ) {}
+ KOKKOS_INLINE_FUNCTION
+ ThreadVectorRangeBoundariesStruct ( const iType& count ) : end( count ) {}
+};
- template<class TeamMemberType>
- struct ThreadSingleStruct {
- const TeamMemberType& team_member;
- KOKKOS_INLINE_FUNCTION
- ThreadSingleStruct(const TeamMemberType& team_member_):team_member(team_member_){}
- };
+template<class TeamMemberType>
+struct ThreadSingleStruct {
+ const TeamMemberType& team_member;
+ KOKKOS_INLINE_FUNCTION
+ ThreadSingleStruct( const TeamMemberType& team_member_ ) : team_member( team_member_ ) {}
+};
+
+template<class TeamMemberType>
+struct VectorSingleStruct {
+ const TeamMemberType& team_member;
+ KOKKOS_INLINE_FUNCTION
+ VectorSingleStruct( const TeamMemberType& team_member_ ) : team_member( team_member_ ) {}
+};
- template<class TeamMemberType>
- struct VectorSingleStruct {
- const TeamMemberType& team_member;
- KOKKOS_INLINE_FUNCTION
- VectorSingleStruct(const TeamMemberType& team_member_):team_member(team_member_){}
- };
} // namespace Impl
/** \brief Execution policy for parallel work over the threads within a team.
*
* The range is split over all threads in a team. The mapping scheme depends on the architecture.
* This policy is used together with a parallel pattern as a nested layer within a kernel launched
* with the TeamPolicy. This variant expects a single count, so the range is [0,count).
*/
template<typename iType, class TeamMemberType>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType> TeamThreadRange(const TeamMemberType&, const iType& count);
+Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType>
+TeamThreadRange( const TeamMemberType&, const iType& count );
/** \brief Execution policy for parallel work over the threads within a team.
*
* The range is split over all threads in a team. The mapping scheme depends on the architecture.
* This policy is used together with a parallel pattern as a nested layer within a kernel launched
* with the TeamPolicy. This variant expects a begin and end, so the range is [begin,end).
*/
-template<typename iType, class TeamMemberType>
+template<typename iType1, typename iType2, class TeamMemberType>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType> TeamThreadRange(const TeamMemberType&, const iType& begin, const iType& end);
+Impl::TeamThreadRangeBoundariesStruct<typename std::common_type<iType1, iType2>::type, TeamMemberType>
+TeamThreadRange( const TeamMemberType&, const iType1& begin, const iType2& end );
/** \brief Execution policy for a vector parallel loop.
*
* The range is split over all vector lanes in a thread. The mapping scheme depends on the architecture.
* This policy is used together with a parallel pattern as a nested layer within a kernel launched
* with the TeamPolicy. This variant expects a single count, so the range is [0,count).
*/
template<typename iType, class TeamMemberType>
KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,TeamMemberType> ThreadVectorRange(const TeamMemberType&, const iType& count);
+Impl::ThreadVectorRangeBoundariesStruct<iType,TeamMemberType>
+ThreadVectorRange( const TeamMemberType&, const iType& count );
} // namespace Kokkos
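// --- Usage sketch (editor addition, not part of the patched header) ---------
// Nested parallelism with the TeamThreadRange declared above.  Assumes
// <Kokkos_Core.hpp> is included; the view 'a' and its extents are illustrative.
//
//   void nested_parallelism_example( Kokkos::View<double**> a ) {
//     typedef Kokkos::TeamPolicy<> policy_t;
//     Kokkos::parallel_for( policy_t( (int) a.dimension_0(), Kokkos::AUTO() ),
//       KOKKOS_LAMBDA( const policy_t::member_type & team ) {
//         const int i = team.league_rank();                    // one team per row
//         Kokkos::parallel_for( Kokkos::TeamThreadRange( team, (int) a.dimension_1() ),
//           [&] ( const int j ) {
//             a(i,j) = i + j;                                   // each (i,j) touched once
//           } );
//       } );
//   }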
-
#endif /* #define KOKKOS_EXECPOLICY_HPP */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-
diff --git a/lib/kokkos/core/src/Kokkos_HBWSpace.hpp b/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
index e02689b0f..10e735fe0 100644
--- a/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
@@ -1,312 +1,337 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_HBWSPACE_HPP
#define KOKKOS_HBWSPACE_HPP
#include <Kokkos_HostSpace.hpp>
-#include <impl/Kokkos_HBWAllocators.hpp>
/*--------------------------------------------------------------------------*/
#ifdef KOKKOS_HAVE_HBWSPACE
namespace Kokkos {
namespace Experimental {
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_array_hbw_space();
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
bool lock_address_hbw_space(void* ptr);
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
void unlock_address_hbw_space(void* ptr);
} // namespace Impl
} // namespace Experimental
} // namespace Kokkos
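// --- Usage sketch (editor addition): the protocol implied by the lock
// functions declared above.  The busy-wait loop and the double payload are
// illustrative assumptions about how an arbitrary-size atomic update would
// use them; they are not taken from this patch.
inline void hbw_locked_update_example( double * payload , double delta )
{
  void * const ptr = payload ;
  while ( ! Kokkos::Experimental::Impl::lock_address_hbw_space( ptr ) )
    ; // lock for this hash bucket is held by another thread -- spin until acquired
  *payload += delta ; // non-intrinsic read-modify-write performed under the lock
  Kokkos::Experimental::Impl::unlock_address_hbw_space( ptr );
}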
namespace Kokkos {
namespace Experimental {
/// \class HBWSpace
/// \brief Memory management for high-bandwidth host memory.
///
/// HBWSpace is a memory space that governs high-bandwidth host memory:
/// memory that is CPU-accessible but allocated from the high-bandwidth pool.
class HBWSpace {
public:
//! Tag this class as a kokkos memory space
typedef HBWSpace memory_space ;
typedef size_t size_type ;
/// \typedef execution_space
/// \brief Default execution space for this memory space.
///
/// Every memory space has a default execution space. This is
/// useful for things like initializing a View (which happens in
/// parallel using the View's default execution space).
#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
typedef Kokkos::OpenMP execution_space ;
#elif defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
typedef Kokkos::Threads execution_space ;
#elif defined( KOKKOS_HAVE_OPENMP )
typedef Kokkos::OpenMP execution_space ;
#elif defined( KOKKOS_HAVE_PTHREAD )
typedef Kokkos::Threads execution_space ;
#elif defined( KOKKOS_HAVE_SERIAL )
typedef Kokkos::Serial execution_space ;
#else
# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
#endif
//! This memory space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
/*--------------------------------*/
/* Functions unique to the HBWSpace */
static int in_parallel();
static void register_in_parallel( int (*)() );
/*--------------------------------*/
/**\brief Default memory space instance */
HBWSpace();
HBWSpace( const HBWSpace & rhs ) = default ;
HBWSpace & operator = ( const HBWSpace & ) = default ;
~HBWSpace() = default ;
/**\brief Non-default memory space instance to choose allocation mechanism, if available */
enum AllocationMechanism { STD_MALLOC , POSIX_MEMALIGN , POSIX_MMAP , INTEL_MM_ALLOC };
explicit
HBWSpace( const AllocationMechanism & );
/**\brief Allocate untracked memory in the space */
void * allocate( const size_t arg_alloc_size ) const ;
/**\brief Deallocate untracked memory in the space */
void deallocate( void * const arg_alloc_ptr
, const size_t arg_alloc_size ) const ;
+ /**\brief Return Name of the MemorySpace */
+ static constexpr const char* name();
+
private:
AllocationMechanism m_alloc_mech ;
-
- friend class Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > ;
+ static constexpr const char* m_name = "HBW";
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > ;
};
} // namespace Experimental
} // namespace Kokkos
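// --- Usage sketch (editor addition): untracked allocation through the
// HBWSpace declared above.  The 4096-byte size is an illustrative assumption;
// the caller must pair allocate() with deallocate() and pass the same size.
inline void hbw_untracked_allocation_example()
{
  Kokkos::Experimental::HBWSpace space ;        // default allocation mechanism
  const size_t nbytes = 4096 ;
  void * const p = space.allocate( nbytes );    // untracked: no SharedAllocationRecord
  /* ... use p ... */
  space.deallocate( p , nbytes );               // size must match the allocation
}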
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
template<>
class SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >
: public SharedAllocationRecord< void , void >
{
private:
friend Kokkos::Experimental::HBWSpace ;
typedef SharedAllocationRecord< void , void > RecordBase ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
static void deallocate( RecordBase * );
/**\brief Root record for tracked allocations from this HBWSpace instance */
static RecordBase s_root_record ;
const Kokkos::Experimental::HBWSpace m_space ;
protected:
~SharedAllocationRecord();
SharedAllocationRecord() = default ;
SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
inline
std::string get_label() const
{
return std::string( RecordBase::head()->m_label );
}
KOKKOS_INLINE_FUNCTION static
SharedAllocationRecord * allocate( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
#else
return (SharedAllocationRecord *) 0 ;
#endif
}
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
static void print_records( std::ostream & , const Kokkos::Experimental::HBWSpace & , bool detail = false );
};
} // namespace Impl
-} // namespace Experimental
} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::Experimental::HBWSpace >::assignable , "" );
+
+template<>
+struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace > {
+ enum { assignable = true };
+ enum { accessible = true };
+ enum { deepcopy = true };
+};
+
+template<>
+struct MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace> {
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = true };
+};
+
+}}
+
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<class ExecutionSpace>
struct DeepCopy<Experimental::HBWSpace,Experimental::HBWSpace,ExecutionSpace> {
DeepCopy( void * dst , const void * src , size_t n ) {
memcpy( dst , src , n );
}
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
exec.fence();
memcpy( dst , src , n );
}
};
template<class ExecutionSpace>
struct DeepCopy<HostSpace,Experimental::HBWSpace,ExecutionSpace> {
DeepCopy( void * dst , const void * src , size_t n ) {
memcpy( dst , src , n );
}
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
exec.fence();
memcpy( dst , src , n );
}
};
template<class ExecutionSpace>
struct DeepCopy<Experimental::HBWSpace,HostSpace,ExecutionSpace> {
DeepCopy( void * dst , const void * src , size_t n ) {
memcpy( dst , src , n );
}
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
exec.fence();
memcpy( dst , src , n );
}
};
} // namespace Impl
} // namespace Kokkos
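// --- Usage sketch (editor addition): the DeepCopy specializations above are
// what allow Kokkos::deep_copy between HostSpace and HBWSpace views in both
// directions.  Assumes <Kokkos_Core.hpp> is included; view names and extents
// are illustrative.
//
//   void hbw_deep_copy_example() {
//     Kokkos::View<double*, Kokkos::Experimental::HBWSpace> a( "a_hbw" , 1000 );
//     Kokkos::View<double*, Kokkos::HostSpace>               b( "b_host", 1000 );
//     Kokkos::deep_copy( a , b );   // HostSpace -> HBWSpace (memcpy, per DeepCopy above)
//     Kokkos::deep_copy( b , a );   // HBWSpace  -> HostSpace
//   }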
namespace Kokkos {
namespace Impl {
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
template<>
struct VerifyExecutionCanAccessMemorySpace< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
#endif
#endif /* #define KOKKOS_HBWSPACE_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_HostSpace.hpp b/lib/kokkos/core/src/Kokkos_HostSpace.hpp
index 5fe686559..0292dd8a6 100644
--- a/lib/kokkos/core/src/Kokkos_HostSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_HostSpace.hpp
@@ -1,275 +1,317 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_HOSTSPACE_HPP
#define KOKKOS_HOSTSPACE_HPP
#include <cstring>
#include <string>
#include <iosfwd>
#include <typeinfo>
#include <Kokkos_Core_fwd.hpp>
+#include <Kokkos_Concepts.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Error.hpp>
-
-#include <impl/KokkosExp_SharedAlloc.hpp>
+#include <impl/Kokkos_SharedAlloc.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_array_host_space();
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
bool lock_address_host_space(void* ptr);
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
void unlock_address_host_space(void* ptr);
} // namespace Impl
} // namespace Kokkos
namespace Kokkos {
/// \class HostSpace
/// \brief Memory management for host memory.
///
/// HostSpace is a memory space that governs host memory. "Host"
/// memory means the usual CPU-accessible memory.
class HostSpace {
public:
//! Tag this class as a kokkos memory space
typedef HostSpace memory_space ;
typedef size_t size_type ;
/// \typedef execution_space
/// \brief Default execution space for this memory space.
///
/// Every memory space has a default execution space. This is
/// useful for things like initializing a View (which happens in
/// parallel using the View's default execution space).
#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
typedef Kokkos::OpenMP execution_space ;
#elif defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
typedef Kokkos::Threads execution_space ;
#elif defined( KOKKOS_HAVE_OPENMP )
typedef Kokkos::OpenMP execution_space ;
#elif defined( KOKKOS_HAVE_PTHREAD )
typedef Kokkos::Threads execution_space ;
#elif defined( KOKKOS_HAVE_SERIAL )
typedef Kokkos::Serial execution_space ;
#else
# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
#endif
//! This memory space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
/*--------------------------------*/
/* Functions unique to the HostSpace */
static int in_parallel();
static void register_in_parallel( int (*)() );
/*--------------------------------*/
/**\brief Default memory space instance */
HostSpace();
HostSpace( HostSpace && rhs ) = default ;
HostSpace( const HostSpace & rhs ) = default ;
HostSpace & operator = ( HostSpace && ) = default ;
HostSpace & operator = ( const HostSpace & ) = default ;
~HostSpace() = default ;
/**\brief Non-default memory space instance to choose allocation mechanism, if available */
enum AllocationMechanism { STD_MALLOC , POSIX_MEMALIGN , POSIX_MMAP , INTEL_MM_ALLOC };
explicit
HostSpace( const AllocationMechanism & );
/**\brief Allocate untracked memory in the space */
void * allocate( const size_t arg_alloc_size ) const ;
/**\brief Deallocate untracked memory in the space */
void deallocate( void * const arg_alloc_ptr
, const size_t arg_alloc_size ) const ;
+ /**\brief Return Name of the MemorySpace */
+ static constexpr const char* name();
+
private:
AllocationMechanism m_alloc_mech ;
+ static constexpr const char* m_name = "Host";
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > ;
+};
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
+
+
+template< typename S >
+struct HostMirror {
+private:
- friend class Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > ;
+ // If input execution space can access HostSpace then keep it.
+ // Example: Kokkos::OpenMP can access, Kokkos::Cuda cannot
+ enum { keep_exe = Kokkos::Impl::MemorySpaceAccess
+ < typename S::execution_space::memory_space , Kokkos::HostSpace >
+ ::accessible };
+
+ // If HostSpace can access memory space then keep it.
+ // Example: Cannot access Kokkos::CudaSpace, can access Kokkos::CudaUVMSpace
+ enum { keep_mem = Kokkos::Impl::MemorySpaceAccess
+ < Kokkos::HostSpace , typename S::memory_space >::accessible };
+
+public:
+
+ typedef typename std::conditional
+ < keep_exe && keep_mem /* Can keep whole space */
+ , S
+ , typename std::conditional
+ < keep_mem /* Can keep memory space, use default Host execution space */
+ , Kokkos::Device< Kokkos::HostSpace::execution_space
+ , typename S::memory_space >
+ , Kokkos::HostSpace
+ >::type
+ >::type Space ;
};
+} // namespace Impl
} // namespace Kokkos
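// --- Sketch (editor addition): what the HostMirror metafunction above
// resolves to.  A space whose memory is already host-accessible is kept
// as-is; a device-only space falls back to HostSpace.  Illustration only
// (assumes the full <Kokkos_Core.hpp> so the execution-space types are
// complete):
//
//   static_assert( std::is_same< Kokkos::Impl::HostMirror< Kokkos::HostSpace >::Space
//                              , Kokkos::HostSpace >::value , "kept as-is" );
//
//   // With CUDA enabled: HostMirror< Kokkos::CudaSpace >::Space is Kokkos::HostSpace,
//   // while HostMirror< Kokkos::CudaUVMSpace >::Space keeps the UVM memory space
//   // paired with the default host execution space.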
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
template<>
class SharedAllocationRecord< Kokkos::HostSpace , void >
: public SharedAllocationRecord< void , void >
{
private:
friend Kokkos::HostSpace ;
typedef SharedAllocationRecord< void , void > RecordBase ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
static void deallocate( RecordBase * );
/**\brief Root record for tracked allocations from this HostSpace instance */
static RecordBase s_root_record ;
const Kokkos::HostSpace m_space ;
protected:
~SharedAllocationRecord();
SharedAllocationRecord() = default ;
SharedAllocationRecord( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
inline
std::string get_label() const
{
return std::string( RecordBase::head()->m_label );
}
KOKKOS_INLINE_FUNCTION static
SharedAllocationRecord * allocate( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
#else
return (SharedAllocationRecord *) 0 ;
#endif
}
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
static void print_records( std::ostream & , const Kokkos::HostSpace & , bool detail = false );
};
} // namespace Impl
-} // namespace Experimental
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class DstSpace, class SrcSpace, class ExecutionSpace = typename DstSpace::execution_space> struct DeepCopy ;
template<class ExecutionSpace>
struct DeepCopy<HostSpace,HostSpace,ExecutionSpace> {
DeepCopy( void * dst , const void * src , size_t n ) {
memcpy( dst , src , n );
}
DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
exec.fence();
memcpy( dst , src , n );
}
};
} // namespace Impl
} // namespace Kokkos
#endif /* #define KOKKOS_HOSTSPACE_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_Layout.hpp b/lib/kokkos/core/src/Kokkos_Layout.hpp
index c77c33703..8ffbc8bb0 100644
--- a/lib/kokkos/core/src/Kokkos_Layout.hpp
+++ b/lib/kokkos/core/src/Kokkos_Layout.hpp
@@ -1,233 +1,239 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Layout.hpp
/// \brief Declaration of various \c MemoryLayout options.
#ifndef KOKKOS_LAYOUT_HPP
#define KOKKOS_LAYOUT_HPP
#include <stddef.h>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Tags.hpp>
namespace Kokkos {
enum { ARRAY_LAYOUT_MAX_RANK = 8 };
//----------------------------------------------------------------------------
/// \struct LayoutLeft
/// \brief Memory layout tag indicating left-to-right (Fortran scheme)
/// striding of multi-indices.
///
/// This is an example of a \c MemoryLayout template parameter of
/// View. The memory layout describes how View maps from a
/// multi-index (i0, i1, ..., ik) to a memory location.
///
/// "Layout left" indicates a mapping where the leftmost index i0
/// refers to contiguous access, and strides increase for dimensions
/// going right from there (i1, i2, ...). This layout imitates how
/// Fortran stores multi-dimensional arrays. For the special case of
/// a two-dimensional array, "layout left" is also called "column
/// major."
struct LayoutLeft {
//! Tag this class as a kokkos array layout
typedef LayoutLeft array_layout ;
size_t dimension[ ARRAY_LAYOUT_MAX_RANK ];
LayoutLeft( LayoutLeft const & ) = default ;
LayoutLeft( LayoutLeft && ) = default ;
LayoutLeft & operator = ( LayoutLeft const & ) = default ;
LayoutLeft & operator = ( LayoutLeft && ) = default ;
KOKKOS_INLINE_FUNCTION
- constexpr
+ explicit constexpr
LayoutLeft( size_t N0 = 0 , size_t N1 = 0 , size_t N2 = 0 , size_t N3 = 0
, size_t N4 = 0 , size_t N5 = 0 , size_t N6 = 0 , size_t N7 = 0 )
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 } {}
};
//----------------------------------------------------------------------------
/// \struct LayoutRight
/// \brief Memory layout tag indicating right-to-left (C or
/// lexicographical scheme) striding of multi-indices.
///
/// This is an example of a \c MemoryLayout template parameter of
/// View. The memory layout describes how View maps from a
/// multi-index (i0, i1, ..., ik) to a memory location.
///
/// "Right layout" indicates a mapping where the rightmost index ik
/// refers to contiguous access, and strides increase for dimensions
/// going left from there. This layout imitates how C stores
/// multi-dimensional arrays. For the special case of a
/// two-dimensional array, "layout right" is also called "row major."
struct LayoutRight {
//! Tag this class as a kokkos array layout
typedef LayoutRight array_layout ;
size_t dimension[ ARRAY_LAYOUT_MAX_RANK ];
LayoutRight( LayoutRight const & ) = default ;
LayoutRight( LayoutRight && ) = default ;
LayoutRight & operator = ( LayoutRight const & ) = default ;
LayoutRight & operator = ( LayoutRight && ) = default ;
KOKKOS_INLINE_FUNCTION
- constexpr
+ explicit constexpr
LayoutRight( size_t N0 = 0 , size_t N1 = 0 , size_t N2 = 0 , size_t N3 = 0
, size_t N4 = 0 , size_t N5 = 0 , size_t N6 = 0 , size_t N7 = 0 )
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 } {}
};
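// --- Sketch (editor addition): the two mappings described above, written out
// for a rank-2 view of extents (N0,N1).  These helpers are illustrative only
// and are not used anywhere in Kokkos.
KOKKOS_INLINE_FUNCTION
size_t example_offset_layout_left ( size_t i0 , size_t i1 , size_t N0 , size_t /*N1*/ )
{ return i0 + N0 * i1 ; } // leftmost index is the stride-1 index (Fortran / column major)
KOKKOS_INLINE_FUNCTION
size_t example_offset_layout_right( size_t i0 , size_t i1 , size_t /*N0*/ , size_t N1 )
{ return i1 + N1 * i0 ; } // rightmost index is the stride-1 index (C / row major)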
//----------------------------------------------------------------------------
/// \struct LayoutStride
/// \brief Memory layout tag indicating arbitrarily strided
/// multi-index mapping into contiguous memory.
struct LayoutStride {
//! Tag this class as a kokkos array layout
typedef LayoutStride array_layout ;
size_t dimension[ ARRAY_LAYOUT_MAX_RANK ] ;
size_t stride[ ARRAY_LAYOUT_MAX_RANK ] ;
+ LayoutStride( LayoutStride const & ) = default ;
+ LayoutStride( LayoutStride && ) = default ;
+ LayoutStride & operator = ( LayoutStride const & ) = default ;
+ LayoutStride & operator = ( LayoutStride && ) = default ;
+
/** \brief Compute strides from ordered dimensions.
*
* Values of order uniquely form the set [0..rank)
* and specify ordering of the dimensions.
* Order = {0,1,2,...} is LayoutLeft
* Order = {...,2,1,0} is LayoutRight
*/
template< typename iTypeOrder , typename iTypeDimen >
KOKKOS_INLINE_FUNCTION static
LayoutStride order_dimensions( int const rank
, iTypeOrder const * const order
, iTypeDimen const * const dimen )
{
LayoutStride tmp ;
// Verify valid rank order:
int check_input = ARRAY_LAYOUT_MAX_RANK < rank ? 0 : int( 1 << rank ) - 1 ;
for ( int r = 0 ; r < ARRAY_LAYOUT_MAX_RANK ; ++r ) {
tmp.dimension[r] = 0 ;
tmp.stride[r] = 0 ;
check_input &= ~int( 1 << order[r] );
}
if ( 0 == check_input ) {
size_t n = 1 ;
for ( int r = 0 ; r < rank ; ++r ) {
tmp.stride[ order[r] ] = n ;
n *= ( dimen[order[r]] );
tmp.dimension[r] = dimen[r];
}
}
return tmp ;
}
- KOKKOS_INLINE_FUNCTION constexpr
+ KOKKOS_INLINE_FUNCTION
+ explicit constexpr
LayoutStride( size_t N0 = 0 , size_t S0 = 0
, size_t N1 = 0 , size_t S1 = 0
, size_t N2 = 0 , size_t S2 = 0
, size_t N3 = 0 , size_t S3 = 0
, size_t N4 = 0 , size_t S4 = 0
, size_t N5 = 0 , size_t S5 = 0
, size_t N6 = 0 , size_t S6 = 0
, size_t N7 = 0 , size_t S7 = 0
)
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 }
, stride { S0 , S1 , S2 , S3 , S4 , S5 , S6 , S7 }
{}
};
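// --- Usage sketch (editor addition): recovering LayoutLeft / LayoutRight
// strides from order_dimensions().  Extents {2,3,4} are illustrative.  The
// order arrays are padded to ARRAY_LAYOUT_MAX_RANK because the input check
// above reads all eight slots.
inline void layout_stride_order_example()
{
  const int order_left [ ARRAY_LAYOUT_MAX_RANK ] = { 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 }; // leftmost fastest
  const int order_right[ ARRAY_LAYOUT_MAX_RANK ] = { 2 , 1 , 0 , 3 , 4 , 5 , 6 , 7 }; // rightmost fastest
  const int dims[3] = { 2 , 3 , 4 };
  const LayoutStride left  = LayoutStride::order_dimensions( 3 , order_left  , dims );
  const LayoutStride right = LayoutStride::order_dimensions( 3 , order_right , dims );
  // left.stride  == { 1 , 2 , 6 , 0 , ... }  : the strides LayoutLeft  would give
  // right.stride == { 12 , 4 , 1 , 0 , ... } : the strides LayoutRight would give
  (void) left ; (void) right ;
}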
//----------------------------------------------------------------------------
/// \struct LayoutTileLeft
/// \brief Memory layout tag indicating left-to-right (Fortran scheme)
/// striding of multi-indices by tiles.
///
/// This is an example of a \c MemoryLayout template parameter of
/// View. The memory layout describes how View maps from a
/// multi-index (i0, i1, ..., ik) to a memory location.
///
/// "Tiled layout" indicates a mapping to contiguously stored
/// <tt>ArgN0</tt> by <tt>ArgN1</tt> tiles for the rightmost two
/// dimensions. Indices are LayoutLeft within each tile, and the
/// tiles themselves are arranged using LayoutLeft. Note that the
/// dimensions <tt>ArgN0</tt> and <tt>ArgN1</tt> of the tiles must be
/// compile-time constants. This speeds up index calculations. If
/// both tile dimensions are powers of two, Kokkos can optimize
/// further.
template < unsigned ArgN0 , unsigned ArgN1 ,
bool IsPowerOfTwo = ( Impl::is_integral_power_of_two(ArgN0) &&
Impl::is_integral_power_of_two(ArgN1) )
>
struct LayoutTileLeft {
static_assert( Impl::is_integral_power_of_two(ArgN0) &&
Impl::is_integral_power_of_two(ArgN1)
, "LayoutTileLeft must be given power-of-two tile dimensions" );
//! Tag this class as a kokkos array layout
typedef LayoutTileLeft<ArgN0,ArgN1,IsPowerOfTwo> array_layout ;
enum { N0 = ArgN0 };
enum { N1 = ArgN1 };
size_t dimension[ ARRAY_LAYOUT_MAX_RANK ] ;
LayoutTileLeft( LayoutTileLeft const & ) = default ;
LayoutTileLeft( LayoutTileLeft && ) = default ;
LayoutTileLeft & operator = ( LayoutTileLeft const & ) = default ;
LayoutTileLeft & operator = ( LayoutTileLeft && ) = default ;
KOKKOS_INLINE_FUNCTION
- constexpr
+ explicit constexpr
LayoutTileLeft( size_t argN0 = 0 , size_t argN1 = 0 , size_t argN2 = 0 , size_t argN3 = 0
, size_t argN4 = 0 , size_t argN5 = 0 , size_t argN6 = 0 , size_t argN7 = 0
)
: dimension { argN0 , argN1 , argN2 , argN3 , argN4 , argN5 , argN6 , argN7 } {}
};
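// --- Sketch (editor addition): the tiled mapping described above, written out
// for a rank-2 view.  With compile-time tile extents (T0,T1) and NT0 tiles in
// dimension 0, the standard formula (an illustration, not code from this
// patch) is
//
//   offset(i0,i1) = ( i0 % T0 ) + T0 * ( i1 % T1 )                // LayoutLeft inside the tile
//                 + T0 * T1 * ( ( i0 / T0 ) + NT0 * ( i1 / T1 ) ) // tiles themselves LayoutLeft
//
// and when T0,T1 are powers of two the %, / reduce to masks and shifts.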
} // namespace Kokkos
#endif // #ifndef KOKKOS_LAYOUT_HPP
diff --git a/lib/kokkos/core/src/Kokkos_Macros.hpp b/lib/kokkos/core/src/Kokkos_Macros.hpp
index 7d1e59af5..fbe699deb 100644
--- a/lib/kokkos/core/src/Kokkos_Macros.hpp
+++ b/lib/kokkos/core/src/Kokkos_Macros.hpp
@@ -1,470 +1,505 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MACROS_HPP
#define KOKKOS_MACROS_HPP
//----------------------------------------------------------------------------
/** Pick up configure/build options via #define macros:
*
* KOKKOS_HAVE_CUDA Kokkos::Cuda execution and memory spaces
* KOKKOS_HAVE_PTHREAD Kokkos::Threads execution space
* KOKKOS_HAVE_QTHREAD Kokkos::Qthread execution space
* KOKKOS_HAVE_OPENMP Kokkos::OpenMP execution space
* KOKKOS_HAVE_HWLOC HWLOC library is available
* KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK insert array bounds checks (expensive!)
* KOKKOS_HAVE_CXX11 enable C++11 features
*
* KOKKOS_HAVE_MPI negotiate MPI/execution space interactions
*
* KOKKOS_USE_CUDA_UVM Use CUDA UVM for Cuda memory space
*/
#ifndef KOKKOS_DONT_INCLUDE_CORE_CONFIG_H
#include <KokkosCore_config.h>
#endif
//----------------------------------------------------------------------------
/** Pick up compiler specific #define macros:
*
* Macros for known compilers evaluate to an integral version value
*
* KOKKOS_COMPILER_NVCC
* KOKKOS_COMPILER_GNU
* KOKKOS_COMPILER_INTEL
* KOKKOS_COMPILER_IBM
* KOKKOS_COMPILER_CRAYC
* KOKKOS_COMPILER_APPLECC
* KOKKOS_COMPILER_CLANG
* KOKKOS_COMPILER_PGI
*
* Macros for which compiler extension to use for atomics on intrinsic types
*
* KOKKOS_ATOMICS_USE_CUDA
* KOKKOS_ATOMICS_USE_GNU
* KOKKOS_ATOMICS_USE_INTEL
* KOKKOS_ATOMICS_USE_OPENMP31
*
* A suite of 'KOKKOS_HAVE_PRAGMA_...' are defined for internal use.
*
* Macros for marking functions to run in an execution space:
*
* KOKKOS_FUNCTION
* KOKKOS_INLINE_FUNCTION request compiler to inline
* KOKKOS_FORCEINLINE_FUNCTION force compiler to inline, use with care!
*/
//----------------------------------------------------------------------------
#if defined( KOKKOS_HAVE_CUDA ) && defined( __CUDACC__ )
/* Compiling with a CUDA compiler.
*
* Include <cuda.h> to pick up the CUDA_VERSION macro defined as:
* CUDA_VERSION = ( MAJOR_VERSION * 1000 ) + ( MINOR_VERSION * 10 )
*
* When generating device code the __CUDA_ARCH__ macro is defined as:
* __CUDA_ARCH__ = ( MAJOR_CAPABILITY * 100 ) + ( MINOR_CAPABILITY * 10 )
*/
#include <cuda_runtime.h>
#include <cuda.h>
#if ! defined( CUDA_VERSION )
#error "#include <cuda.h> did not define CUDA_VERSION"
#endif
-#if ( CUDA_VERSION < 6050 )
-// CUDA supports (inofficially) C++11 in device code starting with
-// version 6.5. This includes auto type and device code internal
+#if ( CUDA_VERSION < 7000 )
+// CUDA supports C++11 in device code starting with
+// version 7.0. This includes auto type and device code internal
// lambdas.
-#error "Cuda version 6.5 or greater required"
+#error "Cuda version 7.0 or greater required"
#endif
#if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
/* Compiling with CUDA compiler for device code. */
#error "Cuda device capability >= 3.0 is required"
#endif
#ifdef KOKKOS_CUDA_USE_LAMBDA
-#if ( CUDA_VERSION < 7000 )
-// CUDA supports C++11 lambdas generated in host code to be given
-// to the device starting with version 7.5. But the release candidate (7.5.6)
-// still identifies as 7.0
-#error "Cuda version 7.5 or greater required for host-to-device Lambda support"
-#endif
-#if ( CUDA_VERSION < 8000 )
-#define KOKKOS_LAMBDA [=]__device__
+#if ( CUDA_VERSION < 7050 )
+ // CUDA supports C++11 lambdas generated in host code to be given
+ // to the device starting with version 7.5. But the release candidate (7.5.6)
+ // still identifies as 7.0
+ #error "Cuda version 7.5 or greater required for host-to-device Lambda support"
+#endif
+#if ( CUDA_VERSION < 8000 ) && defined(__NVCC__)
+ #define KOKKOS_LAMBDA [=]__device__
#else
-#define KOKKOS_LAMBDA [=]__host__ __device__
+ #define KOKKOS_LAMBDA [=]__host__ __device__
+ #if defined( KOKKOS_HAVE_CXX1Z )
+ #define KOKKOS_CLASS_LAMBDA [=,*this] __host__ __device__
+ #endif
#endif
#define KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
#endif
#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( __CUDACC__ ) */
#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
// Cuda version 8.0 still needs the functor wrapper
- #if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA /* && (CUDA_VERSION < 8000) */ )
+ #if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA /* && (CUDA_VERSION < 8000) */ ) && defined(__NVCC__)
#define KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
#endif
#endif
/*--------------------------------------------------------------------------*/
/* Language info: C++, CUDA, OPENMP */
-#if defined( __CUDA_ARCH__ ) && defined( KOKKOS_HAVE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
// Compiling Cuda code to 'ptx'
#define KOKKOS_FORCEINLINE_FUNCTION __device__ __host__ __forceinline__
#define KOKKOS_INLINE_FUNCTION __device__ __host__ inline
#define KOKKOS_FUNCTION __device__ __host__
-
#endif /* #if defined( __CUDA_ARCH__ ) */
#if defined( _OPENMP )
/* Compiling with OpenMP.
* The value of _OPENMP is an integer value YYYYMM
* where YYYY and MM are the year and month designation
* of the supported OpenMP API version.
*/
#endif /* #if defined( _OPENMP ) */
/*--------------------------------------------------------------------------*/
/* Mapping compiler built-ins to KOKKOS_COMPILER_*** macros */
#if defined( __NVCC__ )
// NVIDIA compiler is being used.
// Code is parsed and separated into host and device code.
// Host code is compiled again with another compiler.
// Device code is compile to 'ptx'.
#define KOKKOS_COMPILER_NVCC __NVCC__
#else
#if defined( KOKKOS_HAVE_CXX11 ) && ! defined( KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA )
+ #if !defined (KOKKOS_HAVE_CUDA) // Compiling with clang for Cuda does not work with LAMBDAs either
// CUDA (including version 6.5) does not support giving lambdas as
// arguments to global functions. Thus it is not currently possible
// to dispatch lambdas from the host.
#define KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
+ #endif
#endif
#endif /* #if defined( __NVCC__ ) */
#if defined( KOKKOS_HAVE_CXX11 ) && !defined (KOKKOS_LAMBDA)
#define KOKKOS_LAMBDA [=]
#endif
-#if ! defined( __CUDA_ARCH__ ) /* Not compiling Cuda code to 'ptx'. */
+#if defined( KOKKOS_HAVE_CXX1Z ) && !defined (KOKKOS_CLASS_LAMBDA)
+ #define KOKKOS_CLASS_LAMBDA [=,*this]
+#endif
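// --- Usage sketch (editor addition): how the two macros above are meant to be
// used.  KOKKOS_LAMBDA captures by value (adding __host__ __device__ when a
// CUDA compiler is active); KOKKOS_CLASS_LAMBDA additionally captures *this by
// value (C++1z) so a member function can dispatch a lambda that reads class
// members.  The names below are illustrative.
//
//   Kokkos::parallel_for( N , KOKKOS_LAMBDA( const int i ) { /* ... */ } );
//
//   struct Functor {
//     double scale ;
//     void run( int N ) const {
//       Kokkos::parallel_for( N , KOKKOS_CLASS_LAMBDA( const int i ) {
//         (void)( scale * i );   // 'scale' read from the captured copy of *this
//       } );
//     }
//   };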
+
+//#if ! defined( __CUDA_ARCH__ ) /* Not compiling Cuda code to 'ptx'. */
/* Intel compiler for host code */
#if defined( __INTEL_COMPILER )
#define KOKKOS_COMPILER_INTEL __INTEL_COMPILER
#elif defined( __ICC )
// Old define
#define KOKKOS_COMPILER_INTEL __ICC
#elif defined( __ECC )
// Very old define
#define KOKKOS_COMPILER_INTEL __ECC
#endif
/* CRAY compiler for host code */
#if defined( _CRAYC )
#define KOKKOS_COMPILER_CRAYC _CRAYC
#endif
#if defined( __IBMCPP__ )
// IBM C++
#define KOKKOS_COMPILER_IBM __IBMCPP__
#elif defined( __IBMC__ )
#define KOKKOS_COMPILER_IBM __IBMC__
#endif
#if defined( __APPLE_CC__ )
#define KOKKOS_COMPILER_APPLECC __APPLE_CC__
#endif
#if defined (__clang__) && !defined (KOKKOS_COMPILER_INTEL)
#define KOKKOS_COMPILER_CLANG __clang_major__*100+__clang_minor__*10+__clang_patchlevel__
#endif
#if ! defined( __clang__ ) && ! defined( KOKKOS_COMPILER_INTEL ) &&defined( __GNUC__ )
#define KOKKOS_COMPILER_GNU __GNUC__*100+__GNUC_MINOR__*10+__GNUC_PATCHLEVEL__
#if ( 472 > KOKKOS_COMPILER_GNU )
#error "Compiling with GCC version earlier than 4.7.2 is not supported."
#endif
#endif
#if defined( __PGIC__ ) && ! defined( __GNUC__ )
#define KOKKOS_COMPILER_PGI __PGIC__*100+__PGIC_MINOR__*10+__PGIC_PATCHLEVEL__
#if ( 1540 > KOKKOS_COMPILER_PGI )
#error "Compiling with PGI version earlier than 15.4 is not supported."
#endif
#endif
-#endif /* #if ! defined( __CUDA_ARCH__ ) */
+//#endif /* #if ! defined( __CUDA_ARCH__ ) */
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Intel compiler macros */
#if defined( KOKKOS_COMPILER_INTEL )
#define KOKKOS_HAVE_PRAGMA_UNROLL 1
#define KOKKOS_HAVE_PRAGMA_IVDEP 1
#define KOKKOS_HAVE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_HAVE_PRAGMA_VECTOR 1
#define KOKKOS_HAVE_PRAGMA_SIMD 1
+ #define KOKKOS_RESTRICT __restrict__
+
+ #ifndef KOKKOS_ALIGN
+ #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
+ #endif
+
+ #ifndef KOKKOS_ALIGN_PTR
+ #define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
+ #endif
+
+ #ifndef KOKKOS_ALIGN_SIZE
+ #define KOKKOS_ALIGN_SIZE 64
+ #endif
+
#if ( 1400 > KOKKOS_COMPILER_INTEL )
#if ( 1300 > KOKKOS_COMPILER_INTEL )
#error "Compiling with Intel version earlier than 13.0 is not supported. Official minimal version is 14.0."
#else
#warning "Compiling with Intel version 13.x probably works but is not officially supported. Official minimal version is 14.0."
#endif
#endif
- #if ( 1200 <= KOKKOS_COMPILER_INTEL ) && ! defined( KOKKOS_ENABLE_ASM ) && ! defined( _WIN32 )
+ #if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( _WIN32 )
#define KOKKOS_ENABLE_ASM 1
#endif
- #if ( 1200 <= KOKKOS_COMPILER_INTEL ) && ! defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
#if !defined (_WIN32)
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#else
#define KOKKOS_FORCEINLINE_FUNCTION inline
#endif
#endif
#if defined( __MIC__ )
// Compiling for Xeon Phi
#endif
#endif
/*--------------------------------------------------------------------------*/
/* Cray compiler macros */
#if defined( KOKKOS_COMPILER_CRAYC )
#endif
/*--------------------------------------------------------------------------*/
/* IBM Compiler macros */
#if defined( KOKKOS_COMPILER_IBM )
#define KOKKOS_HAVE_PRAGMA_UNROLL 1
//#define KOKKOS_HAVE_PRAGMA_IVDEP 1
//#define KOKKOS_HAVE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_HAVE_PRAGMA_VECTOR 1
//#define KOKKOS_HAVE_PRAGMA_SIMD 1
#endif
/*--------------------------------------------------------------------------*/
/* CLANG compiler macros */
#if defined( KOKKOS_COMPILER_CLANG )
//#define KOKKOS_HAVE_PRAGMA_UNROLL 1
//#define KOKKOS_HAVE_PRAGMA_IVDEP 1
//#define KOKKOS_HAVE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_HAVE_PRAGMA_VECTOR 1
//#define KOKKOS_HAVE_PRAGMA_SIMD 1
#if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
#endif
/*--------------------------------------------------------------------------*/
/* GNU Compiler macros */
#if defined( KOKKOS_COMPILER_GNU )
//#define KOKKOS_HAVE_PRAGMA_UNROLL 1
//#define KOKKOS_HAVE_PRAGMA_IVDEP 1
//#define KOKKOS_HAVE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_HAVE_PRAGMA_VECTOR 1
//#define KOKKOS_HAVE_PRAGMA_SIMD 1
#if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
- #if ! defined( KOKKOS_ENABLE_ASM ) && \
- ! ( defined( __powerpc) || \
- defined(__powerpc__) || \
- defined(__powerpc64__) || \
- defined(__POWERPC__) || \
- defined(__ppc__) || \
- defined(__ppc64__) || \
- defined(__PGIC__) )
+ #if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( __PGIC__ ) && \
+ ( defined( __amd64 ) || \
+ defined( __amd64__ ) || \
+ defined( __x86_64 ) || \
+ defined( __x86_64__ ) )
#define KOKKOS_ENABLE_ASM 1
#endif
#endif
/*--------------------------------------------------------------------------*/
#if defined( KOKKOS_COMPILER_PGI )
#define KOKKOS_HAVE_PRAGMA_UNROLL 1
#define KOKKOS_HAVE_PRAGMA_IVDEP 1
//#define KOKKOS_HAVE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_HAVE_PRAGMA_VECTOR 1
//#define KOKKOS_HAVE_PRAGMA_SIMD 1
#endif
/*--------------------------------------------------------------------------*/
#if defined( KOKKOS_COMPILER_NVCC )
#if defined(__CUDA_ARCH__ )
#define KOKKOS_HAVE_PRAGMA_UNROLL 1
#endif
#endif
//----------------------------------------------------------------------------
/** Define function marking macros if compiler specific macros are undefined: */
#if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline
#endif
#if ! defined( KOKKOS_INLINE_FUNCTION )
#define KOKKOS_INLINE_FUNCTION inline
#endif
#if ! defined( KOKKOS_FUNCTION )
#define KOKKOS_FUNCTION /**/
#endif
+
+//----------------------------------------------------------------------------
+///** Define empty macro for restrict if necessary: */
+
+#if ! defined(KOKKOS_RESTRICT)
+#define KOKKOS_RESTRICT
+#endif
+
//----------------------------------------------------------------------------
/** Define Macro for alignment: */
+#if ! defined KOKKOS_ALIGN_SIZE
+#define KOKKOS_ALIGN_SIZE 16
+#endif
+
+#if ! defined(KOKKOS_ALIGN)
+#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
+#endif
+
+#if ! defined(KOKKOS_ALIGN_PTR)
+#define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
+#endif
+
#if ! defined(KOKKOS_ALIGN_16)
-#define KOKKOS_ALIGN_16 __attribute__((aligned(16)))
+#define KOKKOS_ALIGN_16 KOKKOS_ALIGN(16)
#endif
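// --- Usage sketch (editor addition): typical use of the portability macros
// defined above; the array length and the axpy function are illustrative.
//
//   double buffer[ 128 ] KOKKOS_ALIGN( KOKKOS_ALIGN_SIZE );   // over-aligned storage
//
//   void axpy( int n , double a ,
//              const double * KOKKOS_RESTRICT x ,
//              double * KOKKOS_RESTRICT y )     // no-alias hint; empty unless defined above
//   { for ( int i = 0 ; i < n ; ++i ) y[i] += a * x[i] ; }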
//----------------------------------------------------------------------------
/** Determine the default execution space for parallel dispatch.
* There is zero or one default execution space specified.
*/
#if 1 < ( ( defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA ) ? 1 : 0 ) + \
( defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP ) ? 1 : 0 ) + \
( defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS ) ? 1 : 0 ) + \
( defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL ) ? 1 : 0 ) )
#error "More than one KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_* specified" ;
#endif
/** If default is not specified then chose from enabled execution spaces.
* Priority: CUDA, OPENMP, THREADS, SERIAL
*/
#if defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA )
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
#elif defined ( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL )
#elif defined ( KOKKOS_HAVE_CUDA )
#define KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA
#elif defined ( KOKKOS_HAVE_OPENMP )
#define KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP
#elif defined ( KOKKOS_HAVE_PTHREAD )
#define KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS
#else
#define KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL
#endif
//----------------------------------------------------------------------------
/** Determine for what space the code is being compiled: */
#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined (KOKKOS_HAVE_CUDA)
#define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
#else
#define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#if ( defined( _POSIX_C_SOURCE ) && _POSIX_C_SOURCE >= 200112L ) || \
( defined( _XOPEN_SOURCE ) && _XOPEN_SOURCE >= 600 )
#if defined(KOKKOS_ENABLE_PERFORMANCE_POSIX_MEMALIGN)
#define KOKKOS_POSIX_MEMALIGN_AVAILABLE 1
#endif
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/**Enable Profiling by default**/
#ifndef KOKKOS_ENABLE_PROFILING
#define KOKKOS_ENABLE_PROFILING 1
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/* Transitional macro to change between old and new View
* are no longer supported.
*/
-#if defined( KOKKOS_USING_DEPRECATED_VIEW )
-#error "Kokkos deprecated View has been removed"
-#endif
-
#define KOKKOS_USING_EXP_VIEW 1
#define KOKKOS_USING_EXPERIMENTAL_VIEW
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_MACROS_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_MemoryPool.hpp b/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
index d843f7c9a..e4f895b7d 100644
--- a/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
+++ b/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
@@ -1,1523 +1,1558 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MEMORYPOOL_HPP
#define KOKKOS_MEMORYPOOL_HPP
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_Atomic.hpp>
#include <impl/Kokkos_BitOps.hpp>
#include <impl/Kokkos_Error.hpp>
-#include <impl/KokkosExp_SharedAlloc.hpp>
+#include <impl/Kokkos_SharedAlloc.hpp>
#include <limits>
#include <algorithm>
#include <chrono>
// How should errors be handled? In general, production code should return a
// value indicating failure so the user can decide how the error is handled.
// While experimental, code can abort instead. If KOKKOS_MEMPOOL_PRINTERR is
// defined, the code will abort with an error message. Otherwise, the code will
// return with a value indicating failure when possible, or do nothing instead.
//#define KOKKOS_MEMPOOL_PRINTERR
//#define KOKKOS_MEMPOOL_PRINT_INFO
//#define KOKKOS_MEMPOOL_PRINT_CONSTRUCTOR_INFO
//#define KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
//#define KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
//#define KOKKOS_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
//#define KOKKOS_MEMPOOL_PRINT_PAGE_INFO
//#define KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
-// A superblock is considered full when this percentage of its pages are full.
-#define KOKKOS_MEMPOOL_SB_FULL_FRACTION 0.80
-
-// A page is considered full when this percentage of its blocks are full.
-#define KOKKOS_MEMPOOL_PAGE_FULL_FRACTION 0.875 // 28 / 32
-
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace MempoolImpl {
template < typename T, typename ExecutionSpace >
struct initialize_array {
typedef ExecutionSpace execution_space;
typedef typename ExecutionSpace::size_type size_type;
T * m_data;
T m_value;
initialize_array( T * d, size_t size, T v ) : m_data( d ), m_value( v )
{
Kokkos::parallel_for( size, *this );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const { m_data[i] = m_value; }
};
template <typename Bitset>
struct bitset_count
{
typedef typename Bitset::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef typename Bitset::size_type value_type;
typedef typename Bitset::word_type word_type;
word_type * m_words;
value_type & m_result;
bitset_count( word_type * w, value_type num_words, value_type & r )
: m_words( w ), m_result( r )
{
parallel_reduce( num_words, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
- void operator()( size_type i, value_type & count) const
+ void operator()( size_type i, value_type & count ) const
{
count += Kokkos::Impl::bit_count( m_words[i] );
}
};
template < typename Device >
class Bitset {
public:
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space memory_space;
typedef unsigned word_type;
typedef unsigned size_type;
typedef Kokkos::Impl::DeepCopy< memory_space, Kokkos::HostSpace > raw_deep_copy;
// Define some constants.
enum {
// Size of bitset word. Should be 32.
WORD_SIZE = sizeof(word_type) * CHAR_BIT,
LG_WORD_SIZE = Kokkos::Impl::integral_power_of_two( WORD_SIZE ),
WORD_MASK = WORD_SIZE - 1
};
private:
word_type * m_words;
size_type m_size;
size_type m_num_words;
word_type m_last_word_mask;
public:
~Bitset() = default;
Bitset() = default;
Bitset( Bitset && ) = default;
Bitset( const Bitset & ) = default;
Bitset & operator = ( Bitset && ) = default;
Bitset & operator = ( const Bitset & ) = default;
void init( void * w, size_type s )
{
// Assumption: The size of the memory pointed to by w is a multiple of
// sizeof(word_type).
m_words = reinterpret_cast<word_type*>( w );
m_size = s;
m_num_words = ( s + WORD_SIZE - 1 ) >> LG_WORD_SIZE;
m_last_word_mask = m_size & WORD_MASK ? ( word_type(1) << ( m_size & WORD_MASK ) ) - 1 : 0;
reset();
}
size_type size() const { return m_size; }
size_type count() const
{
- size_type val;
+ size_type val = 0;
bitset_count< Bitset > bc( m_words, m_num_words, val );
return val;
}
void set()
{
// Set all the bits.
initialize_array< word_type, execution_space > ia( m_words, m_num_words, ~word_type(0) );
if ( m_last_word_mask ) {
// Clear the unused bits in the last block.
raw_deep_copy( m_words + ( m_num_words - 1 ), &m_last_word_mask, sizeof(word_type) );
}
}
void reset()
{
initialize_array< word_type, execution_space > ia( m_words, m_num_words, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
bool test( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word_type mask = word_type(1) << ( i & WORD_MASK );
return word & mask;
}
KOKKOS_FORCEINLINE_FUNCTION
bool set( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
return !( atomic_fetch_or( &m_words[ word_pos ], mask ) & mask );
}
KOKKOS_FORCEINLINE_FUNCTION
bool reset( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
return atomic_fetch_and( &m_words[ word_pos ], ~mask ) & mask;
}
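+ // Atomically set bit 'i'. Returns whether this call actually set the bit
+ // and the value of the containing word immediately before the update.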
+ KOKKOS_FORCEINLINE_FUNCTION
+ Kokkos::pair< bool, word_type >
+ fetch_word_set( size_type i ) const
+ {
+ size_type word_pos = i >> LG_WORD_SIZE;
+ word_type mask = word_type(1) << ( i & WORD_MASK );
+
+ Kokkos::pair<bool, word_type> result;
+ result.second = atomic_fetch_or( &m_words[ word_pos ], mask );
+ result.first = !( result.second & mask );
+
+ return result;
+ }
+
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
fetch_word_reset( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
Kokkos::pair<bool, word_type> result;
result.second = atomic_fetch_and( &m_words[ word_pos ], ~mask );
result.first = result.second & mask;
return result;
}
KOKKOS_FORCEINLINE_FUNCTION
- Kokkos::pair< bool, size_type >
- set_any_in_word( size_type i, word_type & prev_val ) const
+ Kokkos::pair< bool, word_type >
+ set_any_in_word( size_type & pos ) const
{
- prev_val = 0;
-
- size_type word_pos = i >> LG_WORD_SIZE;
+ size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
// Loop until there are no more unset bits in the word.
while ( ~word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( ~word );
// Try to set the bit.
- word_type mask = word_type(1) << bit;
+ word_type mask = word_type(1) << bit;
word = atomic_fetch_or( &m_words[ word_pos ], mask );
if ( !( word & mask ) ) {
// Successfully set the bit.
- prev_val = word;
+ pos = ( word_pos << LG_WORD_SIZE ) + bit;
- return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
+ return Kokkos::pair<bool, word_type>( true, word );
}
}
// Didn't find a free bit in this word.
- return Kokkos::pair<bool, size_type>( false, i );
+ return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
- Kokkos::pair< bool, size_type >
- set_any_in_word( size_type i, word_type & prev_val, word_type word_mask ) const
+ Kokkos::pair< bool, word_type >
+ set_any_in_word( size_type & pos, word_type word_mask ) const
{
- prev_val = 0;
-
- size_type word_pos = i >> LG_WORD_SIZE;
+ size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word = ( ~word ) & word_mask;
// Loop until there are no more unset bits in the word.
while ( word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to set the bit.
- word_type mask = word_type(1) << bit;
+ word_type mask = word_type(1) << bit;
word = atomic_fetch_or( &m_words[ word_pos ], mask );
if ( !( word & mask ) ) {
// Successfully set the bit.
- prev_val = word;
+ pos = ( word_pos << LG_WORD_SIZE ) + bit;
- return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
+ return Kokkos::pair<bool, word_type>( true, word );
}
word = ( ~word ) & word_mask;
}
// Didn't find a free bit in this word.
- return Kokkos::pair<bool, size_type>( false, i );
+ return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
- Kokkos::pair< bool, size_type >
- reset_any_in_word( size_type i, word_type & prev_val ) const
+ Kokkos::pair< bool, word_type >
+ reset_any_in_word( size_type & pos ) const
{
- prev_val = 0;
-
- size_type word_pos = i >> LG_WORD_SIZE;
+ size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
// Loop until there are no more set bits in the word.
while ( word ) {
// Find the first set bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to reset the bit.
- word_type mask = word_type(1) << bit;
+ word_type mask = word_type(1) << bit;
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
if ( word & mask ) {
// Successfully reset the bit.
- prev_val = word;
+ pos = ( word_pos << LG_WORD_SIZE ) + bit;
- return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
+ return Kokkos::pair<bool, word_type>( true, word );
}
}
// Didn't find a set bit in this word.
- return Kokkos::pair<bool, size_type>( false, i );
+ return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
- Kokkos::pair< bool, size_type >
- reset_any_in_word( size_type i, word_type & prev_val, word_type word_mask ) const
+ Kokkos::pair< bool, word_type >
+ reset_any_in_word( size_type & pos, word_type word_mask ) const
{
- prev_val = 0;
-
- size_type word_pos = i >> LG_WORD_SIZE;
+ size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word = word & word_mask;
// Loop until there are no more set bits in the word.
while ( word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to reset the bit.
- word_type mask = word_type(1) << bit;
+ word_type mask = word_type(1) << bit;
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
if ( word & mask ) {
// Successfully reset the bit.
- prev_val = word;
+ pos = ( word_pos << LG_WORD_SIZE ) + bit;
- return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
+ return Kokkos::pair<bool, word_type>( true, word );
}
word = word & word_mask;
}
// Didn't find a set bit in this word.
- return Kokkos::pair<bool, size_type>( false, i );
+ return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
};
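// The sketch below is illustrative only and is never called by the pool: it
// shows how the Bitset above is driven. The function name and its use of
// caller-provided storage are assumptions made for the example, and it
// presumes a host-accessible Device so the atomic set/test calls can be made
// directly from host code.
template < typename Device >
inline bool bitset_usage_sketch( Bitset< Device > & bs, void * words, unsigned nbits )
{
  bs.init( words, nbits );          // Adopt caller-provided storage; all bits cleared.
  const bool first = bs.set( 5 );   // true: this call flipped bit 5 from 0 to 1.
  const bool again = bs.set( 5 );   // false: bit 5 was already set.
  return first && !again && bs.test( 5 ) && ( bs.count() == 1u );
}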
template < typename UInt32View, typename BSHeaderView, typename SBHeaderView,
typename MempoolBitset >
struct create_histogram {
typedef typename UInt32View::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef Kokkos::pair< double, uint32_t > value_type;
size_t m_start;
UInt32View m_page_histogram;
BSHeaderView m_blocksize_info;
SBHeaderView m_sb_header;
MempoolBitset m_sb_blocks;
size_t m_lg_max_sb_blocks;
uint32_t m_lg_min_block_size;
uint32_t m_blocks_per_page;
value_type & m_result;
create_histogram( size_t start, size_t end, UInt32View ph, BSHeaderView bsi,
SBHeaderView sbh, MempoolBitset sbb, size_t lmsb,
uint32_t lmbs, uint32_t bpp, value_type & r )
: m_start( start ), m_page_histogram( ph ), m_blocksize_info( bsi ),
m_sb_header( sbh ), m_sb_blocks( sbb ), m_lg_max_sb_blocks( lmsb ),
m_lg_min_block_size( lmbs ), m_blocks_per_page( bpp ), m_result( r )
{
Kokkos::parallel_reduce( end - start, *this, m_result );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{
v.first = 0.0;
v.second = 0;
}
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{
dst.first += src.first;
dst.second += src.second;
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
size_type i2 = i + m_start;
uint32_t lg_block_size = m_sb_header(i2).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size != 0 ) {
uint32_t block_size_id = lg_block_size - m_lg_min_block_size;
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
uint32_t total_allocated_blocks = 0;
for ( uint32_t j = 0; j < pages_per_sb; ++j ) {
unsigned start_pos = ( i2 << m_lg_max_sb_blocks ) + j * m_blocks_per_page;
unsigned end_pos = start_pos + m_blocks_per_page;
uint32_t page_allocated_blocks = 0;
for ( unsigned k = start_pos; k < end_pos; ++k ) {
page_allocated_blocks += m_sb_blocks.test( k );
}
total_allocated_blocks += page_allocated_blocks;
- atomic_fetch_add( &m_page_histogram(page_allocated_blocks), 1 );
+ atomic_increment( &m_page_histogram(page_allocated_blocks) );
}
r.first += double(total_allocated_blocks) / blocks_per_sb;
r.second += blocks_per_sb;
}
}
};
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
template < typename UInt32View, typename SBHeaderView, typename MempoolBitset >
struct count_allocated_blocks {
typedef typename UInt32View::execution_space execution_space;
typedef typename execution_space::size_type size_type;
UInt32View m_num_allocated_blocks;
SBHeaderView m_sb_header;
MempoolBitset m_sb_blocks;
size_t m_sb_size;
size_t m_lg_max_sb_blocks;
count_allocated_blocks( size_t num_sb, UInt32View nab, SBHeaderView sbh,
MempoolBitset sbb, size_t sbs, size_t lmsb )
: m_num_allocated_blocks( nab ), m_sb_header( sbh ),
m_sb_blocks( sbb ), m_sb_size( sbs ), m_lg_max_sb_blocks( lmsb )
{
Kokkos::parallel_for( num_sb, *this );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
uint32_t lg_block_size = m_sb_header(i).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size != 0 ) {
// Count the allocated blocks in the superblock.
uint32_t blocks_per_sb = lg_block_size > 0 ? m_sb_size >> lg_block_size : 0;
unsigned start_pos = i << m_lg_max_sb_blocks;
unsigned end_pos = start_pos + blocks_per_sb;
uint32_t count = 0;
for ( unsigned j = start_pos; j < end_pos; ++j ) {
count += m_sb_blocks.test( j );
}
m_num_allocated_blocks(i) = count;
}
}
};
#endif
} // namespace MempoolImpl
/// \class MemoryPool
/// \brief Bitset based memory manager for pools of same-sized chunks of memory.
/// \tparam Device Kokkos device that gives the execution and memory space the
/// allocator will be used in.
///
/// MemoryPool is a memory space that can be on host or device. It provides a
/// pool memory allocator for fast allocation of same-sized chunks of memory.
/// The memory is only accessible on the host / device this allocator is
/// associated with.
///
/// This allocator is based on ideas from the following GPU allocators:
/// Halloc (https://github.com/canonizer/halloc).
/// ScatterAlloc (https://github.com/ComputationalRadiationPhysics/scatteralloc)
template < typename Device >
class MemoryPool {
private:
// The allocator uses superblocks. A superblock is divided into pages, and a
// page is divided into blocks. A block is the chunk of memory that is given
// out by the allocator. A page always has a number of blocks equal to the
// size of the word used by the bitset. Thus, the page size can vary between
// superblocks as it is based on the block size of the superblock. The
// allocator supports all powers of 2 from MIN_BLOCK_SIZE to the size of a
// superblock as block sizes.
// Superblocks are divided into 4 categories:
// 1. empty - is completely empty; there are no active allocations
// 2. partfull - partially full; there are some active allocations
// 3. full - full enough with active allocations that new allocations
// will likely fail
// 4. active - is currently the active superblock for a block size
//
// An inactive superblock is one that is empty, partfull, or full.
//
// New allocations occur only from an active superblock. If a superblock is
// made inactive after an allocation request is made to it but before the
// allocation request is fulfilled, the allocation will still be attempted
// from that superblock. Deallocations can occur to partfull, full, or
// active superblocks. Superblocks move between categories as allocations
// and deallocations happen. Superblocks all start empty.
//
// Here are the possible moves between categories:
// empty -> active During allocation, there is no active superblock
// or the active superblock is full.
// active -> full During allocation, the full threshold of the
// superblock is reached when increasing the fill
// level.
// full -> partfull During deallocation, the full threshold of the
// superblock is crossed when decreasing the fill
// level.
// partfull -> empty Deallocation of the last allocated block of an
// inactive superblock.
// partfull -> active During allocation, the active superblock is full.
//
// When a new active superblock is needed, partfull superblocks of the same
// block size are chosen over empty superblocks.
//
// The empty and partfull superblocks are tracked using bitsets that represent
// the superblocks in those respective categories. Empty superblocks use a
// single bitset, while partfull superblocks use a bitset per block size
// (contained sequentially in a single bitset). Active superblocks are
// tracked by the active superblocks array. Full superblocks aren't tracked
// at all.
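//
// As a concrete example (with the default 2^20 byte superblocks and the
// constants below): a superblock holding 256 byte blocks contains
// 2^20 / 256 = 4096 blocks, grouped into 4096 / 32 = 128 pages of
// 32 * 256 B = 8 KB each. With the full fractions set in the constructor,
// such a superblock is treated as full once 102 of its 128 pages are full,
// and a page is treated as full once 28 of its 32 blocks are allocated.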
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space backend_memory_space;
typedef Device device_type;
typedef MempoolImpl::Bitset< device_type > MempoolBitset;
// Define some constants.
enum {
MIN_BLOCK_SIZE = 64,
LG_MIN_BLOCK_SIZE = Kokkos::Impl::integral_power_of_two( MIN_BLOCK_SIZE ),
MAX_BLOCK_SIZES = 31 - LG_MIN_BLOCK_SIZE + 1,
// Size of bitset word.
BLOCKS_PER_PAGE = MempoolBitset::WORD_SIZE,
LG_BLOCKS_PER_PAGE = MempoolBitset::LG_WORD_SIZE,
INVALID_SUPERBLOCK = ~uint32_t(0),
SUPERBLOCK_LOCK = ~uint32_t(0) - 1,
MAX_TRIES = 32 // Cap on the number of pages searched
// before an allocation returns empty.
};
public:
// Stores information about each superblock.
struct SuperblockHeader {
uint32_t m_full_pages;
uint32_t m_empty_pages;
uint32_t m_lg_block_size;
uint32_t m_is_active;
KOKKOS_FUNCTION
SuperblockHeader() :
m_full_pages(0), m_empty_pages(0), m_lg_block_size(0), m_is_active(false) {}
};
// Stores information about each block size.
struct BlockSizeHeader {
uint32_t m_blocks_per_sb;
uint32_t m_pages_per_sb;
uint32_t m_sb_full_level;
uint32_t m_page_full_level;
KOKKOS_FUNCTION
BlockSizeHeader() :
m_blocks_per_sb(0), m_pages_per_sb(0), m_sb_full_level(0), m_page_full_level(0) {}
};
private:
- typedef Impl::SharedAllocationTracker Tracker;
+ typedef Kokkos::Impl::SharedAllocationTracker Tracker;
typedef View< uint32_t *, device_type > UInt32View;
typedef View< SuperblockHeader *, device_type > SBHeaderView;
// The letters 'sb' used in any variable name mean superblock.
size_t m_lg_sb_size; // Log2 of superblock size.
size_t m_sb_size; // Superblock size.
size_t m_lg_max_sb_blocks; // Log2 of the number of blocks of the
// minimum block size in a superblock.
size_t m_num_sb; // Number of superblocks.
size_t m_ceil_num_sb; // Number of superblocks rounded up to the smallest
// multiple of the bitset word size. Used by
// bitsets representing superblock categories to
// ensure different block sizes never share a word
// in the bitset.
size_t m_num_block_size; // Number of block sizes supported.
size_t m_data_size; // Amount of memory available to the allocator.
size_t m_sb_blocks_size; // Amount of memory for free / empty blocks bitset.
size_t m_empty_sb_size; // Amount of memory for empty superblocks bitset.
size_t m_partfull_sb_size; // Amount of memory for partfull superblocks bitset.
size_t m_total_size; // Total amount of memory allocated.
char * m_data; // Beginning device memory location used for
// superblocks.
UInt32View m_active; // Active superblocks IDs.
SBHeaderView m_sb_header; // Header info for superblocks.
MempoolBitset m_sb_blocks; // Bitsets representing free / allocated status
// of blocks in superblocks.
MempoolBitset m_empty_sb; // Bitset representing empty superblocks.
MempoolBitset m_partfull_sb; // Bitsets representing partially full superblocks.
Tracker m_track; // Tracker for superblock memory.
BlockSizeHeader m_blocksize_info[MAX_BLOCK_SIZES]; // Header info for block sizes.
// There were several methods tried for storing the block size header info: in a View,
// in a View of const data, and in a RandomAccess View. All of these were slower than
// storing it in a static array that is a member variable to the class. In the latter
// case, the block size info gets copied into the constant memory on the GPU along with
// the class when it is copied there for executing a parallel loop. Instead of storing
// the values, computing the values every time they were needed was also tried. This
// method was slightly slower than storing them in the static array.
public:
//! Tag this class as a kokkos memory space
typedef MemoryPool memory_space;
~MemoryPool() = default;
MemoryPool() = default;
MemoryPool( MemoryPool && ) = default;
MemoryPool( const MemoryPool & ) = default;
MemoryPool & operator = ( MemoryPool && ) = default;
MemoryPool & operator = ( const MemoryPool & ) = default;
/// \brief Initializes the memory pool.
/// \param memspace The memory space from which the memory pool will allocate memory.
/// \param total_size The requested memory amount controlled by the allocator. The
/// actual amount is rounded up to the smallest multiple of the
/// superblock size >= the requested size.
/// \param log2_superblock_size Log2 of the size of superblocks used by the allocator.
/// In most use cases, the default value should work.
inline
MemoryPool( const backend_memory_space & memspace,
size_t total_size, size_t log2_superblock_size = 20 )
: m_lg_sb_size( log2_superblock_size ),
m_sb_size( size_t(1) << m_lg_sb_size ),
m_lg_max_sb_blocks( m_lg_sb_size - LG_MIN_BLOCK_SIZE ),
m_num_sb( ( total_size + m_sb_size - 1 ) >> m_lg_sb_size ),
m_ceil_num_sb( ( ( m_num_sb + BLOCKS_PER_PAGE - 1 ) >> LG_BLOCKS_PER_PAGE ) <<
LG_BLOCKS_PER_PAGE ),
m_num_block_size( m_lg_sb_size - LG_MIN_BLOCK_SIZE + 1 ),
m_data_size( m_num_sb * m_sb_size ),
m_sb_blocks_size( ( m_num_sb << m_lg_max_sb_blocks ) / CHAR_BIT ),
m_empty_sb_size( m_ceil_num_sb / CHAR_BIT ),
m_partfull_sb_size( m_ceil_num_sb * m_num_block_size / CHAR_BIT ),
m_total_size( m_data_size + m_sb_blocks_size + m_empty_sb_size + m_partfull_sb_size ),
m_data(0),
m_active( "Active superblocks" ),
m_sb_header( "Superblock headers" ),
m_track()
{
// Assumption. The minimum block size must be a power of 2.
static_assert( Kokkos::Impl::is_integral_power_of_two( MIN_BLOCK_SIZE ), "" );
// Assumption. Require a superblock be large enough so it takes at least 1
// whole bitset word to represent it using the minimum blocksize.
if ( m_sb_size < MIN_BLOCK_SIZE * BLOCKS_PER_PAGE ) {
printf( "\n** MemoryPool::MemoryPool() Superblock size must be >= %u **\n",
MIN_BLOCK_SIZE * BLOCKS_PER_PAGE );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Assumption. A superblock's size can be at most 2^31. Verify this.
if ( m_lg_sb_size > 31 ) {
printf( "\n** MemoryPool::MemoryPool() Superblock size must be < %u **\n",
( uint32_t(1) << 31 ) );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Assumption. The Bitset only uses unsigned for size types which limits
// the amount of memory the allocator can manage. Verify the memory size
// is below this limit.
if ( m_data_size > size_t(MIN_BLOCK_SIZE) * std::numeric_limits<unsigned>::max() ) {
printf( "\n** MemoryPool::MemoryPool() Allocator can only manage %lu bytes of memory; requested %lu **\n",
size_t(MIN_BLOCK_SIZE) * std::numeric_limits<unsigned>::max(), total_size );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Allocate memory for Views. This is done here instead of at construction
// so that the runtime checks can be performed before allocating memory.
- resize(m_active, m_num_block_size );
- resize(m_sb_header, m_num_sb );
+ resize( m_active, m_num_block_size );
+ resize( m_sb_header, m_num_sb );
// Allocate superblock memory.
- typedef Impl::SharedAllocationRecord< backend_memory_space, void > SharedRecord;
+ typedef Kokkos::Impl::SharedAllocationRecord< backend_memory_space, void > SharedRecord;
SharedRecord * rec =
SharedRecord::allocate( memspace, "mempool", m_total_size );
m_track.assign_allocated_record_to_uninitialized( rec );
m_data = reinterpret_cast<char *>( rec->data() );
// Set and initialize the free / empty block bitset memory.
m_sb_blocks.init( m_data + m_data_size, m_num_sb << m_lg_max_sb_blocks );
// Set and initialize the empty superblock block bitset memory.
m_empty_sb.init( m_data + m_data_size + m_sb_blocks_size, m_num_sb );
// Start with all superblocks in the empty category.
m_empty_sb.set();
// Set and initialize the partfull superblock block bitset memory.
m_partfull_sb.init( m_data + m_data_size + m_sb_blocks_size + m_empty_sb_size,
m_ceil_num_sb * m_num_block_size );
// Initialize all active superblocks to be invalid.
- typename UInt32View::HostMirror host_active = create_mirror_view(m_active);
- for (size_t i = 0; i < m_num_block_size; ++i) host_active(i) = INVALID_SUPERBLOCK;
+ typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
+ for ( size_t i = 0; i < m_num_block_size; ++i ) host_active(i) = INVALID_SUPERBLOCK;
+ deep_copy( m_active, host_active );
+
+ // A superblock is considered full when this percentage of its pages are full.
+ const double superblock_full_fraction = .8;
- deep_copy(m_active, host_active);
+ // A page is considered full when this percentage of its blocks are full.
+ const double page_full_fraction = .875;
// Initialize the blocksize info.
for ( size_t i = 0; i < m_num_block_size; ++i ) {
uint32_t lg_block_size = i + LG_MIN_BLOCK_SIZE;
uint32_t blocks_per_sb = m_sb_size >> lg_block_size;
uint32_t pages_per_sb = ( blocks_per_sb + BLOCKS_PER_PAGE - 1 ) >> LG_BLOCKS_PER_PAGE;
m_blocksize_info[i].m_blocks_per_sb = blocks_per_sb;
m_blocksize_info[i].m_pages_per_sb = pages_per_sb;
// Set the full level for the superblock.
m_blocksize_info[i].m_sb_full_level =
- static_cast<uint32_t>( pages_per_sb * KOKKOS_MEMPOOL_SB_FULL_FRACTION );
+ static_cast<uint32_t>( pages_per_sb * superblock_full_fraction );
if ( m_blocksize_info[i].m_sb_full_level == 0 ) {
m_blocksize_info[i].m_sb_full_level = 1;
}
// Set the full level for the page.
uint32_t blocks_per_page =
blocks_per_sb < BLOCKS_PER_PAGE ? blocks_per_sb : BLOCKS_PER_PAGE;
m_blocksize_info[i].m_page_full_level =
- static_cast<uint32_t>( blocks_per_page * KOKKOS_MEMPOOL_PAGE_FULL_FRACTION );
+ static_cast<uint32_t>( blocks_per_page * page_full_fraction );
if ( m_blocksize_info[i].m_page_full_level == 0 ) {
m_blocksize_info[i].m_page_full_level = 1;
}
}
#ifdef KOKKOS_MEMPOOL_PRINT_CONSTRUCTOR_INFO
printf( "\n" );
printf( " m_lg_sb_size: %12lu\n", m_lg_sb_size );
printf( " m_sb_size: %12lu\n", m_sb_size );
printf( " m_max_sb_blocks: %12lu\n", size_t(1) << m_lg_max_sb_blocks );
printf( "m_lg_max_sb_blocks: %12lu\n", m_lg_max_sb_blocks );
printf( " m_num_sb: %12lu\n", m_num_sb );
printf( " m_ceil_num_sb: %12lu\n", m_ceil_num_sb );
printf( " m_num_block_size: %12lu\n", m_num_block_size );
printf( " data bytes: %12lu\n", m_data_size );
printf( " sb_blocks bytes: %12lu\n", m_sb_blocks_size );
printf( " empty_sb bytes: %12lu\n", m_empty_sb_size );
printf( " partfull_sb bytes: %12lu\n", m_partfull_sb_size );
printf( " total bytes: %12lu\n", m_total_size );
printf( " m_empty_sb size: %12u\n", m_empty_sb.size() );
printf( "m_partfull_sb size: %12u\n", m_partfull_sb.size() );
printf( "\n" );
fflush( stdout );
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
// Print the blocksize info for all the block sizes.
printf( "SIZE BLOCKS_PER_SB PAGES_PER_SB SB_FULL_LEVEL PAGE_FULL_LEVEL\n" );
for ( size_t i = 0; i < m_num_block_size; ++i ) {
printf( "%4zu %13u %12u %13u %15u\n", i + LG_MIN_BLOCK_SIZE,
m_blocksize_info[i].m_blocks_per_sb, m_blocksize_info[i].m_pages_per_sb,
m_blocksize_info[i].m_sb_full_level, m_blocksize_info[i].m_page_full_level );
}
printf( "\n" );
#endif
}
/// \brief The actual block size allocated given alloc_size.
KOKKOS_INLINE_FUNCTION
size_t allocate_block_size( const size_t alloc_size ) const
- { return size_t(1) << ( get_block_size_index( alloc_size ) + LG_MIN_BLOCK_SIZE); }
+ { return size_t(1) << ( get_block_size_index( alloc_size ) + LG_MIN_BLOCK_SIZE ); }
/// \brief Allocate a chunk of memory.
/// \param alloc_size Size of the requested allocation in number of bytes.
///
/// The function returns a void pointer to a memory location on success and
/// NULL on failure.
KOKKOS_FUNCTION
void * allocate( size_t alloc_size ) const
{
void * p = 0;
// Only support allocations up to the superblock size. Just return 0
// (failed allocation) for any size above this.
- if (alloc_size <= m_sb_size )
+ if ( alloc_size <= m_sb_size )
{
int block_size_id = get_block_size_index( alloc_size );
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
+
+#ifdef KOKKOS_CUDA_CLANG_WORKAROUND
+ // Without this test it looks like pages_per_sb might come back wrong.
+ if ( pages_per_sb == 0 ) return NULL;
+#endif
+
unsigned word_size = blocks_per_sb > 32 ? 32 : blocks_per_sb;
unsigned word_mask = ( uint64_t(1) << word_size ) - 1;
+ // Instead of forcing an atomic read to guarantee the updated value,
+ // reading the old value is actually beneficial because more threads will
+ // attempt allocations on the old active superblock instead of waiting on
+ // the new active superblock. This will help hide the latency of
+ // switching the active superblock.
uint32_t sb_id = volatile_load( &m_active(block_size_id) );
- // If the active is locked, keep reading it until the lock is released.
+ // If the active is locked, keep reading it atomically until the lock is
+ // released.
while ( sb_id == SUPERBLOCK_LOCK ) {
- sb_id = volatile_load( &m_active(block_size_id) );
+ sb_id = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
}
+ load_fence();
+
bool allocation_done = false;
- while (!allocation_done) {
+ while ( !allocation_done ) {
bool need_new_sb = false;
- if (sb_id != INVALID_SUPERBLOCK) {
+ if ( sb_id != INVALID_SUPERBLOCK ) {
// Use the value from the clock register as the hash value.
uint64_t hash_val = get_clock_register();
// Get the starting position for this superblock's bits in the bitset.
uint32_t pos_base = sb_id << m_lg_max_sb_blocks;
// Mod the hash value to choose a page in the superblock. The
// initial block searched is the first block of that page.
uint32_t pos_rel = uint32_t( hash_val & ( pages_per_sb - 1 ) ) << LG_BLOCKS_PER_PAGE;
// Get the absolute starting position for this superblock's bits in the bitset.
uint32_t pos = pos_base + pos_rel;
// Keep track of the number of pages searched. Pages in the superblock are
// searched linearly from the starting page. All pages in the superblock are
// searched until either a location is found, or it is proven empty.
uint32_t pages_searched = 0;
bool search_done = false;
- while (!search_done) {
- bool success;
- unsigned prev_val;
+ while ( !search_done ) {
+ bool success = false;
+ unsigned prev_val = 0;
- Kokkos::tie( success, pos ) =
- m_sb_blocks.set_any_in_word( pos, prev_val, word_mask );
+ Kokkos::tie( success, prev_val ) = m_sb_blocks.set_any_in_word( pos, word_mask );
if ( !success ) {
if ( ++pages_searched >= pages_per_sb ) {
// Searched all the pages in this superblock. Look for a new superblock.
//
// The previous method tried limiting the number of pages searched, but
// that caused a huge performance issue in CUDA where the outer loop
// executed massive numbers of times. Threads weren't able to find a
// free location when the superblock wasn't full and were able to execute
// the outer loop many times before the superblock was switched for a new
// one. Switching to an exhaustive search eliminated this possibility and
// didn't slow anything down for the tests.
need_new_sb = true;
search_done = true;
}
else {
// Move to the next page making sure the new search position
// doesn't go past this superblock's bits.
pos += BLOCKS_PER_PAGE;
pos = ( pos < pos_base + blocks_per_sb ) ? pos : pos_base;
}
}
else {
// Reserved a memory location to allocate.
+ memory_fence();
+
search_done = true;
allocation_done = true;
uint32_t lg_block_size = block_size_id + LG_MIN_BLOCK_SIZE;
p = m_data + ( size_t(sb_id) << m_lg_sb_size ) +
( ( pos - pos_base ) << lg_block_size );
uint32_t used_bits = Kokkos::Impl::bit_count( prev_val );
if ( used_bits == 0 ) {
// This page was empty. Decrement the number of empty pages for
// the superblock.
- atomic_fetch_sub( &m_sb_header(sb_id).m_empty_pages, 1 );
+ atomic_decrement( &m_sb_header(sb_id).m_empty_pages );
}
else if ( used_bits == m_blocksize_info[block_size_id].m_page_full_level - 1 )
{
// This page is full. Increment the number of full pages for
// the superblock.
uint32_t full_pages = atomic_fetch_add( &m_sb_header(sb_id).m_full_pages, 1 );
// This allocation made the superblock full, so a new one needs to be found.
if ( full_pages == m_blocksize_info[block_size_id].m_sb_full_level - 1 ) {
need_new_sb = true;
}
}
}
}
}
else {
// This is the first allocation for this block size. A superblock needs
// to be set as the active one. If this point is reached any other time,
// it is an error.
need_new_sb = true;
}
if ( need_new_sb ) {
uint32_t new_sb_id = find_superblock( block_size_id, sb_id );
if ( new_sb_id == sb_id ) {
allocation_done = true;
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
printf( "** No superblocks available. **\n" );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
#endif
}
else {
sb_id = new_sb_id;
}
}
}
}
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
else {
printf( "** Requested allocation size (%zu) larger than superblock size (%lu). **\n",
- alloc_size, m_sb_size);
+ alloc_size, m_sb_size );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
#endif
return p;
}
/// \brief Release allocated memory back to the pool.
/// \param alloc_ptr Pointer to chunk of memory previously allocated by
/// the allocator.
/// \param alloc_size Size of the allocated memory in number of bytes.
KOKKOS_FUNCTION
void deallocate( void * alloc_ptr, size_t alloc_size ) const
{
char * ap = static_cast<char *>( alloc_ptr );
// Only deallocate memory controlled by this pool.
if ( ap >= m_data && ap + alloc_size <= m_data + m_data_size ) {
// Get the superblock for the address. This can be calculated by math on
// the address since the superblocks are stored contiguously in one memory
// chunk.
uint32_t sb_id = ( ap - m_data ) >> m_lg_sb_size;
// Get the starting position for this superblock's bits in the bitset.
uint32_t pos_base = sb_id << m_lg_max_sb_blocks;
// Get the relative position for this memory location's bit in the bitset.
uint32_t offset = ( ap - m_data ) - ( size_t(sb_id) << m_lg_sb_size );
uint32_t lg_block_size = m_sb_header(sb_id).m_lg_block_size;
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t pos_rel = offset >> lg_block_size;
- bool success;
- unsigned prev_val;
+ bool success = false;
+ unsigned prev_val = 0;
+
+ memory_fence();
Kokkos::tie( success, prev_val ) = m_sb_blocks.fetch_word_reset( pos_base + pos_rel );
// If the memory location was previously deallocated, do nothing.
if ( success ) {
uint32_t page_fill_level = Kokkos::Impl::bit_count( prev_val );
if ( page_fill_level == 1 ) {
// This page is now empty. Increment the number of empty pages for the
// superblock.
uint32_t empty_pages = atomic_fetch_add( &m_sb_header(sb_id).m_empty_pages, 1 );
if ( !volatile_load( &m_sb_header(sb_id).m_is_active ) &&
empty_pages == m_blocksize_info[block_size_id].m_pages_per_sb - 1 )
{
// This deallocation caused the superblock to be empty. Change the
// superblock category from partially full to empty.
unsigned pos = block_size_id * m_ceil_num_sb + sb_id;
if ( m_partfull_sb.reset( pos ) ) {
// Reset the empty pages and block size for the superblock.
volatile_store( &m_sb_header(sb_id).m_empty_pages, uint32_t(0) );
volatile_store( &m_sb_header(sb_id).m_lg_block_size, uint32_t(0) );
- memory_fence();
+ store_fence();
m_empty_sb.set( sb_id );
}
}
}
else if ( page_fill_level == m_blocksize_info[block_size_id].m_page_full_level ) {
// This page is no longer full. Decrement the number of full pages for
// the superblock.
uint32_t full_pages = atomic_fetch_sub( &m_sb_header(sb_id).m_full_pages, 1 );
if ( !volatile_load( &m_sb_header(sb_id).m_is_active ) &&
full_pages == m_blocksize_info[block_size_id].m_sb_full_level )
{
// This deallocation caused the number of full pages to decrease below
// the full threshold. Change the superblock category from full to
// partially full.
unsigned pos = block_size_id * m_ceil_num_sb + sb_id;
m_partfull_sb.set( pos );
}
}
}
}
#ifdef KOKKOS_MEMPOOL_PRINTERR
else {
printf( "\n** MemoryPool::deallocate() ADDRESS_OUT_OF_RANGE(0x%llx) **\n",
reinterpret_cast<uint64_t>( alloc_ptr ) );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
#endif
}
/// \brief Tests if the memory pool has no more memory available to allocate.
KOKKOS_INLINE_FUNCTION
bool is_empty() const
{
// The allocator is empty if all superblocks are full. A superblock is
// full if it has >= 80% of its pages allocated.
// Look at all the superblocks. If one is not full, then the allocator
// isn't empty.
for ( size_t i = 0; i < m_num_sb; ++i ) {
uint32_t lg_block_size = m_sb_header(i).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size == 0 ) return false;
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t full_pages = volatile_load( &m_sb_header(i).m_full_pages );
if ( full_pages < m_blocksize_info[block_size_id].m_sb_full_level ) return false;
}
// All the superblocks were full. The allocator is empty.
return true;
}
// The following functions are used for debugging.
void print_status() const
{
printf( "\n" );
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
- typename SBHeaderView::HostMirror host_sb_header = create_mirror_view(m_sb_header);
+ typename SBHeaderView::HostMirror host_sb_header = create_mirror_view( m_sb_header );
deep_copy( host_sb_header, m_sb_header );
UInt32View num_allocated_blocks( "Allocated Blocks", m_num_sb );
// Count the number of allocated blocks per superblock.
{
MempoolImpl::count_allocated_blocks< UInt32View, SBHeaderView, MempoolBitset >
mch( m_num_sb, num_allocated_blocks, m_sb_header,
m_sb_blocks, m_sb_size, m_lg_max_sb_blocks );
}
typename UInt32View::HostMirror host_num_allocated_blocks =
- create_mirror_view(num_allocated_blocks);
+ create_mirror_view( num_allocated_blocks );
deep_copy( host_num_allocated_blocks, num_allocated_blocks );
// Print header info of all superblocks.
printf( "SB_ID SIZE ACTIVE EMPTY_PAGES FULL_PAGES USED_BLOCKS\n" );
for ( size_t i = 0; i < m_num_sb; ++i ) {
printf( "%5zu %4u %6d %11u %10u %10u\n", i,
host_sb_header(i).m_lg_block_size, host_sb_header(i).m_is_active,
host_sb_header(i).m_empty_pages, host_sb_header(i).m_full_pages,
host_num_allocated_blocks(i) );
}
printf( "\n" );
#endif
UInt32View page_histogram( "Page Histogram", 33 );
// Get a View version of the blocksize info.
typedef View< BlockSizeHeader *, device_type > BSHeaderView;
BSHeaderView blocksize_info( "BlockSize Headers", MAX_BLOCK_SIZES );
Kokkos::Impl::DeepCopy< backend_memory_space, Kokkos::HostSpace >
dc( blocksize_info.ptr_on_device(), m_blocksize_info,
sizeof(BlockSizeHeader) * m_num_block_size );
Kokkos::pair< double, uint32_t > result = Kokkos::pair< double, uint32_t >( 0.0, 0 );
// Create the page histogram.
{
MempoolImpl::create_histogram< UInt32View, BSHeaderView, SBHeaderView, MempoolBitset >
mch( 0, m_num_sb, page_histogram, blocksize_info, m_sb_header, m_sb_blocks,
m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
}
- typename UInt32View::HostMirror host_page_histogram = create_mirror_view(page_histogram);
+ typename UInt32View::HostMirror host_page_histogram = create_mirror_view( page_histogram );
deep_copy( host_page_histogram, page_histogram );
// Find the used and total pages and blocks.
uint32_t used_pages = 0;
uint32_t used_blocks = 0;
for ( uint32_t i = 1; i < 33; ++i ) {
used_pages += host_page_histogram(i);
used_blocks += i * host_page_histogram(i);
}
uint32_t total_pages = used_pages + host_page_histogram(0);
unsigned num_empty_sb = m_empty_sb.count();
unsigned num_non_empty_sb = m_num_sb - num_empty_sb;
unsigned num_partfull_sb = m_partfull_sb.count();
uint32_t total_blocks = result.second;
double ave_sb_full = num_non_empty_sb == 0 ? 0.0 : result.first / num_non_empty_sb;
double percent_used_sb = double( m_num_sb - num_empty_sb ) / m_num_sb;
double percent_used_pages = total_pages == 0 ? 0.0 : double(used_pages) / total_pages;
double percent_used_blocks = total_blocks == 0 ? 0.0 : double(used_blocks) / total_blocks;
// Count active superblocks.
- typename UInt32View::HostMirror host_active = create_mirror_view(m_active);
- deep_copy(host_active, m_active);
+ typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
+ deep_copy( host_active, m_active );
unsigned num_active_sb = 0;
for ( size_t i = 0; i < m_num_block_size; ++i ) {
num_active_sb += host_active(i) != INVALID_SUPERBLOCK;
}
#ifdef KOKKOS_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
// Print active superblocks.
printf( "BS_ID SB_ID\n" );
for ( size_t i = 0; i < m_num_block_size; ++i ) {
uint32_t sb_id = host_active(i);
if ( sb_id == INVALID_SUPERBLOCK ) {
printf( "%5zu I\n", i );
}
else if ( sb_id == SUPERBLOCK_LOCK ) {
printf( "%5zu L\n", i );
}
else {
printf( "%5zu %7u\n", i, sb_id );
}
}
printf( "\n" );
fflush( stdout );
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_PAGE_INFO
// Print the summary page histogram.
printf( "USED_BLOCKS PAGE_COUNT\n" );
for ( uint32_t i = 0; i < 33; ++i ) {
printf( "%10u %10u\n", i, host_page_histogram[i] );
}
printf( "\n" );
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
// Print the page histogram for a few individual superblocks.
// const uint32_t num_sb_id = 2;
// uint32_t sb_id[num_sb_id] = { 0, 10 };
const uint32_t num_sb_id = 1;
uint32_t sb_id[num_sb_id] = { 0 };
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
deep_copy( page_histogram, 0 );
{
MempoolImpl::create_histogram< UInt32View, BSHeaderView, SBHeaderView, MempoolBitset >
mch( sb_id[i], sb_id[i] + 1, page_histogram, blocksize_info, m_sb_header,
m_sb_blocks, m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
}
deep_copy( host_page_histogram, page_histogram );
printf( "SB_ID USED_BLOCKS PAGE_COUNT\n" );
for ( uint32_t j = 0; j < 33; ++j ) {
printf( "%5u %10u %10u\n", sb_id[i], j, host_page_histogram[j] );
}
printf( "\n" );
}
/*
// Print the blocks used for each page of a few individual superblocks.
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
uint32_t lg_block_size = host_sb_header(sb_id[i]).m_lg_block_size;
+
if ( lg_block_size != 0 ) {
printf( "SB_ID BLOCK ID USED_BLOCKS\n" );
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
for ( uint32_t j = 0; j < pages_per_sb; ++j ) {
unsigned start_pos = ( sb_id[i] << m_lg_max_sb_blocks ) + j * BLOCKS_PER_PAGE;
unsigned end_pos = start_pos + BLOCKS_PER_PAGE;
uint32_t num_allocated_blocks = 0;
for ( unsigned k = start_pos; k < end_pos; ++k ) {
num_allocated_blocks += m_sb_blocks.test( k );
}
printf( "%5u %8u %11u\n", sb_id[i], j, num_allocated_blocks );
}
printf( "\n" );
}
}
*/
#endif
printf( " Used blocks: %10u / %10u = %10.6lf\n", used_blocks, total_blocks,
- percent_used_blocks );
+ percent_used_blocks );
printf( " Used pages: %10u / %10u = %10.6lf\n", used_pages, total_pages,
- percent_used_pages );
+ percent_used_pages );
printf( " Used SB: %10zu / %10zu = %10.6lf\n", m_num_sb - num_empty_sb, m_num_sb,
- percent_used_sb );
+ percent_used_sb );
printf( " Active SB: %10u\n", num_active_sb );
printf( " Empty SB: %10u\n", num_empty_sb );
printf( " Partfull SB: %10u\n", num_partfull_sb );
printf( " Full SB: %10lu\n",
- m_num_sb - num_active_sb - num_empty_sb - num_partfull_sb );
+ m_num_sb - num_active_sb - num_empty_sb - num_partfull_sb );
printf( "Ave. SB Full %%: %10.6lf\n", ave_sb_full );
printf( "\n" );
fflush( stdout );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
KOKKOS_INLINE_FUNCTION
size_t get_min_block_size() const { return MIN_BLOCK_SIZE; }
size_t get_mem_size() const { return m_data_size; }
private:
/// \brief Returns the index into the active array for the given size.
///
/// Computes log2 of the smallest power of two >= the given size
/// ( i.e. ceil( log2(size) ) ), clamped below at LG_MIN_BLOCK_SIZE and
/// returned relative to LG_MIN_BLOCK_SIZE, so MIN_BLOCK_SIZE maps to index 0.
KOKKOS_FORCEINLINE_FUNCTION
int get_block_size_index( const size_t size ) const
{
// We know the size fits in a 32 bit unsigned because the size of a
// superblock is limited to 2^31, so casting to an unsigned is safe.
// Find the most significant nonzero bit.
uint32_t first_nonzero_bit =
Kokkos::Impl::bit_scan_reverse( static_cast<unsigned>( size ) );
// If size is an integral power of 2, ceil( log2(size) ) is equal to the
// most significant nonzero bit. Otherwise, you need to add 1. Since the
// minimum block size is MIN_BLOCK_SIZE, make sure ceil( log2(size) ) is at
// least LG_MIN_BLOCK_SIZE.
uint32_t lg2_size = first_nonzero_bit + !Kokkos::Impl::is_integral_power_of_two( size );
lg2_size = lg2_size > LG_MIN_BLOCK_SIZE ? lg2_size : LG_MIN_BLOCK_SIZE;
// Return ceil( log2(size) ) shifted so that the value for MIN_BLOCK_SIZE
// is 0.
return lg2_size - LG_MIN_BLOCK_SIZE;
}
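// Worked examples (illustrative): get_block_size_index( 100 ) finds the most
// significant bit of 100 at position 6; 100 is not a power of two, so
// ceil( log2(100) ) = 7 and the returned index is 7 - LG_MIN_BLOCK_SIZE = 1,
// i.e. a 128 byte block. get_block_size_index( 64 ) returns 0 (a 64 byte
// block), and any request smaller than MIN_BLOCK_SIZE also maps to index 0.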
/// \brief Finds a superblock with free space to become a new active superblock.
///
/// If this function is called, the current active superblock needs to be replaced
/// because it is full. Initially, only the thread that sets the active superblock
/// to full calls this function. Other threads can still allocate from the "full"
/// active superblock because a full superblock still has locations available. If
/// a thread tries to allocate from the active superblock when it has no free
/// locations, then that thread will call this function, too, and spin on a lock
/// waiting until the active superblock has been replaced.
KOKKOS_FUNCTION
uint32_t find_superblock( int block_size_id, uint32_t old_sb ) const
{
// Try to grab the lock on the head.
uint32_t lock_sb =
Kokkos::atomic_compare_exchange( &m_active(block_size_id), old_sb, SUPERBLOCK_LOCK );
+ load_fence();
+
// Initialize the new superblock to be the previous one so the previous
// superblock is returned if a new superblock can't be found.
uint32_t new_sb = lock_sb;
if ( lock_sb == old_sb ) {
// This thread has the lock.
// 1. Look for a partially filled superblock that is of the right block
// size.
size_t max_tries = m_ceil_num_sb >> LG_BLOCKS_PER_PAGE;
size_t tries = 0;
bool search_done = false;
// Set the starting search position to the beginning of this block
// size's bitset.
unsigned pos = block_size_id * m_ceil_num_sb;
- while (!search_done) {
+ while ( !search_done ) {
bool success = false;
- unsigned prev_val;
+ unsigned prev_val = 0;
- Kokkos::tie( success, pos ) = m_partfull_sb.reset_any_in_word( pos, prev_val );
+ Kokkos::tie( success, prev_val ) = m_partfull_sb.reset_any_in_word( pos );
if ( !success ) {
if ( ++tries >= max_tries ) {
// Exceeded number of words for this block size's bitset.
search_done = true;
}
else {
pos += BLOCKS_PER_PAGE;
}
}
else {
// Found a superblock.
+
+ // It is possible that the newly found superblock is the same as the
+ // old superblock. In this case putting the old value back in yields
+ // correct behavior. This could happen as follows. This thread
+ // grabs the lock and transitions the superblock to the full state.
+ // Before it searches for a new superblock, other threads perform
+ // enough deallocations to transition the superblock to the partially
+ // full state. This thread then searches for a partially full
+ // superblock and finds the one it removed. There's potential for
+ // this to cause a performance issue if the same superblock keeps
+ // being removed and added due to the right mix and ordering of
+ // allocations and deallocations.
search_done = true;
new_sb = pos - block_size_id * m_ceil_num_sb;
- // Assertions:
- // 1. A different superblock than the current should be found.
-#ifdef KOKKOS_MEMPOOL_PRINTERR
- if ( new_sb == lock_sb ) {
- printf( "\n** MemoryPool::find_superblock() FOUND_SAME_SUPERBLOCK: %u **\n",
- new_sb);
-#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
- fflush( stdout );
-#endif
- Kokkos::abort( "" );
- }
-#endif
-
// Set the head status for the superblock.
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
// If there was a previous active superblock, mark it as not active.
// It is now in the full category and as such isn't tracked.
if ( lock_sb != INVALID_SUPERBLOCK ) {
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
}
- memory_fence();
+ store_fence();
}
}
// 2. Look for an empty superblock.
if ( new_sb == lock_sb ) {
tries = 0;
search_done = false;
// Set the starting search position to the beginning of this block
// size's bitset.
pos = 0;
- while (!search_done) {
+ while ( !search_done ) {
bool success = false;
- unsigned prev_val;
+ unsigned prev_val = 0;
- Kokkos::tie( success, pos ) = m_empty_sb.reset_any_in_word( pos, prev_val );
+ Kokkos::tie( success, prev_val ) = m_empty_sb.reset_any_in_word( pos );
if ( !success ) {
if ( ++tries >= max_tries ) {
// Exceeded number of words for this block size's bitset.
search_done = true;
}
else {
pos += BLOCKS_PER_PAGE;
}
}
else {
// Found a superblock.
+
+ // It is possible that the newly found superblock is the same as
+ // the old superblock. In this case putting the old value back in
+ // yields correct behavior. This could happen as follows. This
+ // thread grabs the lock and transitions the superblock to the full
+ // state. Before it searches for a new superblock, other threads
+ // perform enough deallocations to transition the superblock to the
+ // partially full state and then the empty state. This thread then
+ // searches for a partially full superblock and none exist. This
+ // thread then searches for an empty superblock and finds the one
+ // it removed. The likelihood of this happening is so remote that
+ // the potential for this to cause a performance issue is
+ // infinitesimal.
search_done = true;
new_sb = pos;
- // Assertions:
- // 1. A different superblock than the current should be found.
-#ifdef KOKKOS_MEMPOOL_PRINTERR
- if ( new_sb == lock_sb ) {
- printf( "\n** MemoryPool::find_superblock() FOUND_SAME_SUPERBLOCK: %u **\n",
- new_sb);
-#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
- fflush( stdout );
-#endif
- Kokkos::abort( "" );
- }
-#endif
-
// Set the empty pages, block size, and head status for the
// superblock.
volatile_store( &m_sb_header(new_sb).m_empty_pages,
m_blocksize_info[block_size_id].m_pages_per_sb );
volatile_store( &m_sb_header(new_sb).m_lg_block_size,
block_size_id + LG_MIN_BLOCK_SIZE );
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
// If there was a previous active superblock, mark it as not active.
// It is now in the full category and as such isn't tracked.
if ( lock_sb != INVALID_SUPERBLOCK ) {
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
}
- memory_fence();
+ store_fence();
}
}
}
// Write the new active superblock to release the lock.
atomic_exchange( &m_active(block_size_id), new_sb );
}
else {
- // Either another thread has the lock and is switching the active superblock for
- // this block size or another thread has already changed the active superblock
- // since this thread read its value. Keep reading the active superblock until
- // it isn't locked to get the new active superblock.
+ // Either another thread has the lock and is switching the active
+ // superblock for this block size or another thread has already changed
+ // the active superblock since this thread read its value. Keep
+ // atomically reading the active superblock until it isn't locked to get
+ // the new active superblock.
do {
- new_sb = volatile_load( &m_active(block_size_id) );
+ new_sb = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
} while ( new_sb == SUPERBLOCK_LOCK );
+ load_fence();
+
// Assertions:
// 1. An invalid superblock should never be found here.
// 2. If the new superblock is the same as the previous superblock, the
// allocator is empty.
#ifdef KOKKOS_MEMPOOL_PRINTERR
if ( new_sb == INVALID_SUPERBLOCK ) {
printf( "\n** MemoryPool::find_superblock() FOUND_INACTIVE_SUPERBLOCK **\n" );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
#endif
}
return new_sb;
}
/// Returns 64 bits from a clock register.
KOKKOS_FORCEINLINE_FUNCTION
uint64_t get_clock_register(void) const
{
#if defined( __CUDA_ARCH__ )
// Return value of 64-bit hi-res clock register.
- return clock64();
+ return clock64();
#elif defined( __i386__ ) || defined( __x86_64 )
// Return value of 64-bit hi-res clock register.
- unsigned a, d;
- __asm__ volatile("rdtsc" : "=a" (a), "=d" (d));
- return ( (uint64_t) a) | ( ( (uint64_t) d ) << 32 );
+ unsigned a = 0, d = 0;
+
+ __asm__ volatile( "rdtsc" : "=a" (a), "=d" (d) );
+
+ return ( (uint64_t) a ) | ( ( (uint64_t) d ) << 32 );
+#elif defined( __powerpc ) || defined( __powerpc__ ) || defined( __powerpc64__ ) || \
+ defined( __POWERPC__ ) || defined( __ppc__ ) || defined( __ppc64__ )
+ unsigned int cycles = 0;
+
+ asm volatile( "mftb %0" : "=r" (cycles) );
+
+ return (uint64_t) cycles;
#else
- const uint64_t ticks = std::chrono::high_resolution_clock::now().time_since_epoch().count();
+ const uint64_t ticks =
+ std::chrono::high_resolution_clock::now().time_since_epoch().count();
+
return ticks;
#endif
}
};
} // namespace Experimental
} // namespace Kokkos
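// The function below is a minimal usage sketch for documentation purposes
// only; it is not part of the Kokkos API. The function name is invented for
// the example, and it assumes a host-accessible Device (e.g. a host execution
// space over HostSpace) so allocate() / deallocate() may be called directly
// from host code after Kokkos::initialize().
template < typename Device >
inline bool memory_pool_usage_sketch( const typename Device::memory_space & space )
{
  typedef Kokkos::Experimental::MemoryPool< Device > pool_type;
  // A 4 MB pool carved into the default 2^20 byte superblocks.
  pool_type pool( space, size_t(1) << 22 );
  // A 100 byte request is served from a 128 byte block (see
  // get_block_size_index() in the class above for the rounding rule).
  void * p = pool.allocate( 100 );
  const bool ok = ( p != 0 );
  if ( ok ) pool.deallocate( p, 100 );
  return ok;
}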
#ifdef KOKKOS_MEMPOOL_PRINTERR
#undef KOKKOS_MEMPOOL_PRINTERR
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
#undef KOKKOS_MEMPOOL_PRINT_INFO
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
#undef KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
#undef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_PAGE_INFO
#undef KOKKOS_MEMPOOL_PRINT_PAGE_INFO
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#undef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#endif
-#undef KOKKOS_MEMPOOL_SB_FULL_FRACTION
-#undef KOKKOS_MEMPOOL_PAGE_FULL_FRACTION
-
#endif // KOKKOS_MEMORYPOOL_HPP
diff --git a/lib/kokkos/core/src/Kokkos_MemoryTraits.hpp b/lib/kokkos/core/src/Kokkos_MemoryTraits.hpp
index 5ee1f16fe..94b58b8af 100644
--- a/lib/kokkos/core/src/Kokkos_MemoryTraits.hpp
+++ b/lib/kokkos/core/src/Kokkos_MemoryTraits.hpp
@@ -1,116 +1,120 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MEMORYTRAITS_HPP
#define KOKKOS_MEMORYTRAITS_HPP
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Tags.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
/** \brief Memory access traits for views, an extension point.
*
* These traits should be orthogonal. If there are dependencies then
* the MemoryTraits template must detect and enforce dependencies.
*
* A zero value is the default for a View, indicating that none of
* these traits are present.
*/
enum MemoryTraitsFlags
{ Unmanaged = 0x01
, RandomAccess = 0x02
, Atomic = 0x04
+ , Restrict = 0x08
+ , Aligned = 0x10
};
template < unsigned T >
struct MemoryTraits {
//! Tag this class as a kokkos memory traits:
typedef MemoryTraits memory_traits ;
enum { Unmanaged = T & unsigned(Kokkos::Unmanaged) };
enum { RandomAccess = T & unsigned(Kokkos::RandomAccess) };
enum { Atomic = T & unsigned(Kokkos::Atomic) };
+ enum { Restrict = T & unsigned(Kokkos::Restrict) };
+ enum { Aligned = T & unsigned(Kokkos::Aligned) };
};
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
typedef Kokkos::MemoryTraits<0> MemoryManaged ;
typedef Kokkos::MemoryTraits< Kokkos::Unmanaged > MemoryUnmanaged ;
typedef Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::RandomAccess > MemoryRandomAccess ;
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/** \brief Memory alignment settings
*
* Sets global value for memory alignment. Must be a power of two!
* Enables compatibility of views from different devices that have a static stride.
* The default can be overridden with the KOKKOS_MEMORY_ALIGNMENT compiler flag.
*/
enum { MEMORY_ALIGNMENT =
#if defined( KOKKOS_MEMORY_ALIGNMENT )
( 1 << Kokkos::Impl::integral_power_of_two( KOKKOS_MEMORY_ALIGNMENT ) )
#else
( 1 << Kokkos::Impl::integral_power_of_two( 128 ) )
#endif
, MEMORY_ALIGNMENT_THRESHOLD = 4
};
} //namespace Impl
} // namespace Kokkos
#endif /* #ifndef KOKKOS_MEMORYTRAITS_HPP */
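The Restrict and Aligned bits added above combine with the existing flags by bitwise OR. A minimal sketch, assuming the usual View<DataType, ..., MemoryTraits> template of this Kokkos version (the view names and the helper function are illustrative only):

#include <Kokkos_Core.hpp>

// Illustrative only: attaching the MemoryTraits flags above to Views.
void memory_traits_sketch()
{
  const int N = 1000 ;
  Kokkos::View<double*> a( "a" , N );   // managed, no special traits

  // Const + RandomAccess alias of the same data (cf. MemoryRandomAccess above).
  Kokkos::View< const double* , Kokkos::MemoryTraits< Kokkos::RandomAccess > > a_ra = a ;

  // The new Restrict and Aligned hints OR together like the existing flags.
  typedef Kokkos::MemoryTraits< Kokkos::Restrict | Kokkos::Aligned > RestrictAligned ;
  Kokkos::View< double* , RestrictAligned > b( "b" , N );

  (void) a_ra ; (void) b ;
}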
diff --git a/lib/kokkos/core/src/Kokkos_OpenMP.hpp b/lib/kokkos/core/src/Kokkos_OpenMP.hpp
index 7be4f8245..0e6c6d84f 100644
--- a/lib/kokkos/core/src/Kokkos_OpenMP.hpp
+++ b/lib/kokkos/core/src/Kokkos_OpenMP.hpp
@@ -1,189 +1,200 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMP_HPP
#define KOKKOS_OPENMP_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_HAVE_OPENMP ) && defined( _OPENMP )
#include <omp.h>
#include <cstddef>
#include <iosfwd>
#include <Kokkos_HostSpace.hpp>
#ifdef KOKKOS_HAVE_HBWSPACE
#include <Kokkos_HBWSpace.hpp>
#endif
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_Parallel.hpp>
-#include <Kokkos_TaskPolicy.hpp>
+#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <impl/Kokkos_Tags.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/// \class OpenMP
/// \brief Kokkos device for multicore processors in the host memory space.
class OpenMP {
public:
//------------------------------------
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as a kokkos execution space
typedef OpenMP execution_space ;
#ifdef KOKKOS_HAVE_HBWSPACE
typedef Experimental::HBWSpace memory_space ;
#else
typedef HostSpace memory_space ;
#endif
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< OpenMP > scratch_memory_space ;
//@}
//------------------------------------
//! \name Functions that all Kokkos execution spaces must implement.
//@{
inline static bool in_parallel() { return omp_in_parallel(); }
/** \brief Set the device in a "sleep" state. A noop for OpenMP. */
static bool sleep();
/** \brief Wake the device from the 'sleep' state. A noop for OpenMP. */
static bool wake();
/** \brief Wait until all dispatched functors complete. A noop for OpenMP. */
static void fence() {}
/// \brief Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
/// \brief Free any resources being consumed by the device.
static void finalize();
/** \brief Initialize the device.
*
* 1) If the hardware locality library is enabled and OpenMP has not
* already bound threads then bind OpenMP threads to maximize
* core utilization and group for memory hierarchy locality.
*
* 2) Allocate a HostThread for each OpenMP thread to hold its
* topology and fan in/out data.
*/
static void initialize( unsigned thread_count = 0 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 );
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
//@}
//------------------------------------
/** \brief This execution space has a topological thread pool which can be queried.
*
* All threads within a pool have a common memory space for which they are cache coherent.
* depth = 0 gives the number of threads in the whole pool.
* depth = 1 gives the number of threads in a NUMA region, typically sharing L3 cache.
* depth = 2 gives the number of threads at the finest granularity, typically sharing L1 cache.
*/
inline static int thread_pool_size( int depth = 0 );
/** \brief The rank of the executing thread in this thread pool */
KOKKOS_INLINE_FUNCTION static int thread_pool_rank();
//------------------------------------
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
KOKKOS_INLINE_FUNCTION static
unsigned hardware_thread_id() { return thread_pool_rank(); }
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
+template<>
+struct MemorySpaceAccess
+ < Kokkos::OpenMP::memory_space
+ , Kokkos::OpenMP::scratch_memory_space
+ >
+{
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = false };
+};
+
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::OpenMP::memory_space
, Kokkos::OpenMP::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <OpenMP/Kokkos_OpenMPexec.hpp>
#include <OpenMP/Kokkos_OpenMP_Parallel.hpp>
#include <OpenMP/Kokkos_OpenMP_Task.hpp>
/*--------------------------------------------------------------------------*/
#endif /* #if defined( KOKKOS_HAVE_OPENMP ) && defined( _OPENMP ) */
#endif /* #ifndef KOKKOS_OPENMP_HPP */
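A minimal sketch of driving this interface directly, assuming a build with KOKKOS_HAVE_OPENMP and host lambda support; in practice Kokkos::initialize() normally performs the setup calls below:

#include <Kokkos_Core.hpp>
#include <cstdio>

int main()
{
  Kokkos::OpenMP::initialize();    // default arguments: use all available threads

  std::printf( "pool size = %d  concurrency = %d\n"
             , Kokkos::OpenMP::thread_pool_size()
             , Kokkos::OpenMP::concurrency() );

  // Run a kernel explicitly on the OpenMP execution space.
  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0 , 100 )
                      , KOKKOS_LAMBDA( const int i ) { (void) i ; /* work on index i */ } );

  Kokkos::OpenMP::fence();
  Kokkos::OpenMP::finalize();
  return 0 ;
}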
diff --git a/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp b/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
index 695bc79a1..3a73e8a81 100644
--- a/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
+++ b/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
@@ -1,1240 +1,1356 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
namespace Kokkos {
template<class T, class Enable = void>
struct is_reducer_type {
enum { value = 0 };
};
template<class T>
struct is_reducer_type<T,typename std::enable_if<
- std::is_same<T,typename T::reducer_type>::value
+ std::is_same<typename std::remove_cv<T>::type,
+ typename std::remove_cv<typename T::reducer_type>::type>::value
>::type> {
enum { value = 1 };
};
namespace Experimental {
template<class Scalar,class Space = HostSpace>
struct Sum {
public:
//Required
typedef Sum reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return static_cast<value_type>(0);
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Sum(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Sum(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Sum(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Sum(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest += src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest += src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
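A hedged usage sketch for the Sum reducer above; in this version the built-in reducers live in Kokkos::Experimental, and lambda support is assumed (the function name is illustrative):

#include <Kokkos_Core.hpp>

// Illustrative only: reducing with the Sum reducer defined above.
double sum_of_indices( const int n )
{
  double total = 0.0 ;
  Kokkos::parallel_reduce( n
    , KOKKOS_LAMBDA( const int i , double & update ) { update += double(i); }
    , Kokkos::Experimental::Sum<double>( total ) );
  return total ;   // Sum wraps 'total' in an unmanaged result view
}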
template<class Scalar,class Space = HostSpace>
struct Prod {
public:
//Required
typedef Prod reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return static_cast<value_type>(1);
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Prod(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Prod(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Prod(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Prod(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest *= src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest *= src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct Min {
public:
//Required
typedef Min reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<value_type>::max();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Min(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Min(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Min(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Min(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src < dest )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src < dest )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct Max {
public:
//Required
typedef Max reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<value_type>::min();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Max(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Max(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Max(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Max(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src > dest )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src > dest )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LAnd {
public:
//Required
typedef LAnd reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LAnd(value_type& result_):result(&result_) {}
LAnd(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest && src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest && src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 1;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LOr {
public:
//Required
typedef LOr reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LOr(value_type& result_):result(&result_) {}
LOr(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest || src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest || src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 0;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LXor {
public:
//Required
typedef LXor reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LXor(value_type& result_):result(&result_) {}
LXor(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest? (!src) : src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest? (!src) : src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 0;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BAnd {
public:
//Required
typedef BAnd reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BAnd(value_type& result_):
init_value(value_type() | (~value_type())),result(&result_) {}
BAnd(const result_view_type& result_):
init_value(value_type() | (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest & src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest & src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BOr {
public:
//Required
typedef BOr reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BOr(value_type& result_):
init_value(value_type() & (~value_type())),result(&result_) {}
BOr(const result_view_type& result_):
init_value(value_type() & (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest | src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest | src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BXor {
public:
//Required
typedef BXor reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BXor(value_type& result_):
init_value(value_type() & (~value_type())),result(&result_) {}
BXor(const result_view_type& result_):
init_value(value_type() & (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest ^ src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest ^ src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Index>
struct ValLocScalar {
Scalar val;
Index loc;
KOKKOS_INLINE_FUNCTION
void operator = (const ValLocScalar& rhs) {
val = rhs.val;
loc = rhs.loc;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile ValLocScalar& rhs) volatile {
val = rhs.val;
loc = rhs.loc;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MinLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MinLoc reducer_type;
typedef ValLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinLoc(value_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(&result_) {}
MinLoc(const result_view_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(result_) {}
MinLoc(value_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(&result_) {}
MinLoc(const result_view_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.val < dest.val )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.val < dest.val )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MaxLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MaxLoc reducer_type;
typedef ValLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MaxLoc(value_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(&result_) {}
MaxLoc(const result_view_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(result_) {}
MaxLoc(value_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(&result_) {}
MaxLoc(const result_view_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.val > dest.val )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.val > dest.val )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.val = init_value;
}
result_view_type result_view() const {
return result;
}
};
+template<class Scalar>
+struct MinMaxScalar {
+ Scalar min_val,max_val;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator = (const MinMaxScalar& rhs) {
+ min_val = rhs.min_val;
+ max_val = rhs.max_val;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator = (const volatile MinMaxScalar& rhs) volatile {
+ min_val = rhs.min_val;
+ max_val = rhs.max_val;
+ }
+};
+
+template<class Scalar, class Space = HostSpace>
+struct MinMax {
+private:
+ typedef typename std::remove_cv<Scalar>::type scalar_type;
+
+public:
+ //Required
+ typedef MinMax reducer_type;
+ typedef MinMaxScalar<scalar_type> value_type;
+
+ typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
+
+ scalar_type min_init_value;
+ scalar_type max_init_value;
+
+private:
+ result_view_type result;
+
+ template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
+ struct MinInitWrapper;
+
+ template<class ValueType >
+ struct MinInitWrapper<ValueType,true> {
+ static ValueType value() {
+ return std::numeric_limits<scalar_type>::max();
+ }
+ };
+
+ template<class ValueType >
+ struct MinInitWrapper<ValueType,false> {
+ static ValueType value() {
+ return scalar_type();
+ }
+ };
+
+ template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
+ struct MaxInitWrapper;
+
+ template<class ValueType >
+ struct MaxInitWrapper<ValueType,true> {
+ static ValueType value() {
+ return std::numeric_limits<scalar_type>::min();
+ }
+ };
+
+ template<class ValueType >
+ struct MaxInitWrapper<ValueType,false> {
+ static ValueType value() {
+ return scalar_type();
+ }
+ };
+
+public:
+
+ MinMax(value_type& result_):
+ min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
+ MinMax(const result_view_type& result_):
+ min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
+ MinMax(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
+ min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
+ MinMax(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
+ min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
+
+ //Required
+ KOKKOS_INLINE_FUNCTION
+ void join(value_type& dest, const value_type& src) const {
+ if ( src.min_val < dest.min_val ) {
+ dest.min_val = src.min_val;
+ }
+ if ( src.max_val > dest.max_val ) {
+ dest.max_val = src.max_val;
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void join(volatile value_type& dest, const volatile value_type& src) const {
+ if ( src.min_val < dest.min_val ) {
+ dest.min_val = src.min_val;
+ }
+ if ( src.max_val > dest.max_val ) {
+ dest.max_val = src.max_val;
+ }
+ }
+
+ //Optional
+ KOKKOS_INLINE_FUNCTION
+ void init( value_type& val) const {
+ val.min_val = min_init_value;
+ val.max_val = max_init_value;
+ }
+
+ result_view_type result_view() const {
+ return result;
+ }
+};
+
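A hedged sketch of the new MinMax reducer in use; a single pass produces both extrema (the view argument and function name are illustrative, and a host execution space is assumed):

#include <Kokkos_Core.hpp>

// Illustrative only: min and max of a 1-d view in one reduction.
void value_range( const Kokkos::View<double*> & x )
{
  typedef Kokkos::Experimental::MinMax<double> reducer ;
  reducer::value_type result ;           // MinMaxScalar<double>

  Kokkos::parallel_reduce( x.dimension_0()
    , KOKKOS_LAMBDA( const int i , reducer::value_type & update ) {
        if ( x(i) < update.min_val ) update.min_val = x(i);
        if ( x(i) > update.max_val ) update.max_val = x(i);
      }
    , reducer( result ) );

  // result.min_val and result.max_val now hold the extrema.
  (void) result ;
}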
template<class Scalar, class Index>
struct MinMaxLocScalar {
Scalar min_val,max_val;
Index min_loc,max_loc;
KOKKOS_INLINE_FUNCTION
void operator = (const MinMaxLocScalar& rhs) {
min_val = rhs.min_val;
min_loc = rhs.min_loc;
max_val = rhs.max_val;
max_loc = rhs.max_loc;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile MinMaxLocScalar& rhs) volatile {
min_val = rhs.min_val;
min_loc = rhs.min_loc;
max_val = rhs.max_val;
max_loc = rhs.max_loc;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MinMaxLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MinMaxLoc reducer_type;
typedef MinMaxLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type min_init_value;
scalar_type max_init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MinInitWrapper;
template<class ValueType >
struct MinInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct MinInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MaxInitWrapper;
template<class ValueType >
struct MaxInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct MaxInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinMaxLoc(value_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
MinMaxLoc(const result_view_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
MinMaxLoc(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
MinMaxLoc(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
dest.min_loc = src.min_loc;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
dest.max_loc = src.max_loc;
}
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
dest.min_loc = src.min_loc;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
dest.max_loc = src.max_loc;
}
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.min_val = min_init_value;
val.max_val = max_init_value;
}
result_view_type result_view() const {
return result;
}
};
}
}
namespace Kokkos {
namespace Impl {
template< class T, class ReturnType , class ValueTraits>
struct ParallelReduceReturnValue;
template< class ReturnType , class FunctorType >
struct ParallelReduceReturnValue<typename std::enable_if<Kokkos::is_view<ReturnType>::value>::type, ReturnType, FunctorType> {
typedef ReturnType return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type_scalar;
typedef typename return_type::value_type value_type_array[];
typedef typename if_c<return_type::rank==0,value_type_scalar,value_type_array>::type value_type;
static return_type& return_value(ReturnType& return_val, const FunctorType&) {
return return_val;
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
!Kokkos::is_view<ReturnType>::value &&
(!std::is_array<ReturnType>::value && !std::is_pointer<ReturnType>::value) &&
!Kokkos::is_reducer_type<ReturnType>::value
>::type, ReturnType, FunctorType> {
typedef Kokkos::View< ReturnType
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type;
static return_type return_value(ReturnType& return_val, const FunctorType&) {
return return_type(&return_val);
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
(is_array<ReturnType>::value || std::is_pointer<ReturnType>::value)
>::type, ReturnType, FunctorType> {
typedef Kokkos::View< typename std::remove_const<ReturnType>::type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type[];
static return_type return_value(ReturnType& return_val,
const FunctorType& functor) {
return return_type(return_val,functor.value_count);
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
Kokkos::is_reducer_type<ReturnType>::value
>::type, ReturnType, FunctorType> {
typedef ReturnType return_type;
typedef ReturnType reducer_type;
typedef typename return_type::value_type value_type;
static return_type return_value(ReturnType& return_val,
const FunctorType& functor) {
return return_val;
}
};
}
namespace Impl {
template< class T, class ReturnType , class FunctorType>
struct ParallelReducePolicyType;
template< class PolicyType , class FunctorType >
struct ParallelReducePolicyType<typename std::enable_if<Kokkos::Impl::is_execution_policy<PolicyType>::value>::type, PolicyType,FunctorType> {
typedef PolicyType policy_type;
static PolicyType policy(const PolicyType& policy_) {
return policy_;
}
};
template< class PolicyType , class FunctorType >
struct ParallelReducePolicyType<typename std::enable_if<std::is_integral<PolicyType>::value>::type, PolicyType,FunctorType> {
typedef typename
Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef Kokkos::RangePolicy<execution_space> policy_type;
static policy_type policy(const PolicyType& policy_) {
return policy_type(0,policy_);
}
};
}
namespace Impl {
template< class FunctorType, class ExecPolicy, class ValueType, class ExecutionSpace>
struct ParallelReduceFunctorType {
typedef FunctorType functor_type;
static const functor_type& functor(const functor_type& functor) {
return functor;
}
};
}
namespace Impl {
template< class PolicyType, class FunctorType, class ReturnType >
struct ParallelReduceAdaptor {
typedef Impl::ParallelReduceReturnValue<void,ReturnType,FunctorType> return_value_adapter;
#ifdef KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
typedef Impl::ParallelReduceFunctorType<FunctorType,PolicyType,
typename return_value_adapter::value_type,
typename PolicyType::execution_space> functor_adaptor;
#endif
static inline
void execute(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value) {
#if (KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelReduce("" == label ? typeid(FunctorType).name() : label, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
#ifdef KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
Impl::ParallelReduce<typename functor_adaptor::functor_type, PolicyType, typename return_value_adapter::reducer_type >
closure(functor_adaptor::functor(functor),
policy,
return_value_adapter::return_value(return_value,functor));
#else
Impl::ParallelReduce<FunctorType, PolicyType, typename return_value_adapter::reducer_type >
closure(functor,
policy,
return_value_adapter::return_value(return_value,functor));
#endif
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
#if (KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelReduce(kpID);
}
#endif
}
};
}
/*! \fn void parallel_reduce(label,policy,functor,return_argument)
\brief Perform a parallel reduction.
\param label An optional Label giving the call name. Must be able to construct a std::string from the argument.
\param policy A Kokkos Execution Policy, such as an integer, a RangePolicy or a TeamPolicy.
\param functor A functor with a reduction operator, and optional init, join and final functions.
\param return_argument A return argument which can be a scalar, a View, or a ReducerStruct. This argument can be left out if the functor has a final function.
*/
/** \brief Parallel reduction
*
* parallel_reduce performs parallel reductions with arbitrary functions - i.e.
* it is not solely data based. The call expects up to 4 arguments:
* an optional label, an execution policy, the functor, and an optional
* return argument (a scalar, a View, or a reducer).
*
* Example of a parallel_reduce functor for a POD (plain old data) value type:
* \code
* class FunctorType { // For POD value type
* public:
* typedef ... execution_space ;
* typedef <podType> value_type ;
* void operator()( <intType> iwork , <podType> & update ) const ;
* void init( <podType> & update ) const ;
* void join( volatile <podType> & update ,
* volatile const <podType> & input ) const ;
*
* typedef true_type has_final ;
* void final( <podType> & update ) const ;
* };
* \endcode
*
* Example of a parallel_reduce functor for an array of POD (plain old data) values:
* \code
* class FunctorType { // For array of POD value
* public:
* typedef ... execution_space ;
* typedef <podType> value_type[] ;
* void operator()( <intType> , <podType> update[] ) const ;
* void init( <podType> update[] ) const ;
* void join( volatile <podType> update[] ,
* volatile const <podType> input[] ) const ;
*
* typedef true_type has_final ;
* void final( <podType> update[] ) const ;
* };
* \endcode
*/
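A concrete, hedged instance of the POD-value functor form sketched in the comment above; the struct name is illustrative and Kokkos_Core.hpp is assumed to be included:

// Illustrative only: a POD-value reduction functor matching the pattern above,
// summing the squares of the indices it is run over.
struct SquareSum {
  typedef Kokkos::DefaultExecutionSpace execution_space ;
  typedef double value_type ;

  KOKKOS_INLINE_FUNCTION
  void operator()( const int i , double & update ) const
    { update += double(i) * double(i); }

  KOKKOS_INLINE_FUNCTION
  void init( double & update ) const { update = 0.0 ; }

  KOKKOS_INLINE_FUNCTION
  void join( volatile double & update , volatile const double & input ) const
    { update += input ; }
};
// Typical call:  double r = 0 ;  Kokkos::parallel_reduce( N , SquareSum() , r );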
// ReturnValue is scalar or array: take by reference
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute(label,policy,functor,return_value);
}
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor,
ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor,
ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value);
}
// ReturnValue as View or Reducer: take by copy to allow for inline construction
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
const ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,const ReturnType>::execute(label,policy,functor,return_value);
}
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
const ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
- Impl::ParallelReduceAdaptor<PolicyType,FunctorType,const ReturnType>::execute("",policy,functor,return_value);
+ ReturnType return_value_impl = return_value;
+ Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value_impl);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor,
const ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
-
- Impl::ParallelReduceAdaptor<policy_type,FunctorType,const ReturnType>::execute("",policy_type(0,policy),functor,return_value);
+ ReturnType return_value_impl = return_value;
+ Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value_impl);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor,
const ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
- Impl::ParallelReduceAdaptor<policy_type,FunctorType,const ReturnType>::execute(label,policy_type(0,policy),functor,return_value);
+ ReturnType return_value_impl = return_value;
+ Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value_impl);
}
// No Return Argument
template< class PolicyType, class FunctorType>
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,result_view_type>::execute(label,policy,functor,result_view);
}
template< class PolicyType, class FunctorType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,result_view_type>::execute("",policy,functor,result_view);
}
template< class FunctorType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,result_view_type>::execute("",policy_type(0,policy),functor,result_view);
}
template< class FunctorType>
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,result_view_type>::execute(label,policy_type(0,policy),functor,result_view);
}
} //namespace Kokkos
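A hedged sketch of the calling conventions implemented by the overloads above: a scalar result is taken by reference, while a View or reducer result is taken by copy so it can be constructed inline. A host execution space and lambda support are assumed; the names are illustrative:

#include <Kokkos_Core.hpp>

void parallel_reduce_conventions( const int n )
{
  // 1) Scalar result, taken by reference.
  double sum = 0.0 ;
  Kokkos::parallel_reduce( "sum" , n
    , KOKKOS_LAMBDA( const int i , double & update ) { update += double(i); }
    , sum );

  // 2) View result, taken by copy (the copy aliases the same data).
  Kokkos::View< double , Kokkos::HostSpace > sum_view( "sum_view" );
  Kokkos::parallel_reduce( Kokkos::RangePolicy<>( 0 , n )
    , KOKKOS_LAMBDA( const int i , double & update ) { update += double(i); }
    , sum_view );

  // 3) Reducer result, also taken by copy; here the Min reducer defined earlier.
  double smallest = 0.0 ;
  Kokkos::parallel_reduce( n
    , KOKKOS_LAMBDA( const int i , double & update )
        { if ( double(i) < update ) update = double(i); }
    , Kokkos::Experimental::Min<double>( smallest ) );

  (void) sum ; (void) smallest ;
}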
diff --git a/lib/kokkos/core/src/Kokkos_Qthread.hpp b/lib/kokkos/core/src/Kokkos_Qthread.hpp
index d61f8d518..c58518b06 100644
--- a/lib/kokkos/core/src/Kokkos_Qthread.hpp
+++ b/lib/kokkos/core/src/Kokkos_Qthread.hpp
@@ -1,172 +1,183 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_QTHREAD_HPP
#define KOKKOS_QTHREAD_HPP
#include <cstddef>
#include <iosfwd>
#include <Kokkos_Core.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ExecPolicy.hpp>
#include <impl/Kokkos_Tags.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class QthreadExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Execution space supported by Qthread */
class Qthread {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as an execution space
typedef Qthread execution_space ;
typedef Kokkos::HostSpace memory_space ;
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef Kokkos::LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< Qthread > scratch_memory_space ;
//@}
/*------------------------------------------------------------------------*/
/** \brief Initialization will construct one or more instances */
static Qthread & instance( int = 0 );
/** \brief Set the execution space to a "sleep" state.
*
* This function puts the execution space into a "sleep" state in which it
* is not ready for work.  This may consume fewer resources than the "ready"
* state, but it may also take time to transition back to the "ready" state.
*
* \return True if it enters or is already in the "sleep" state.
* False if functions are currently executing.
*/
bool sleep();
/** \brief Wake from the sleep state.
*
* \return True if it enters or is already in the "ready" state.
* False if functions are currently executing.
*/
static bool wake();
/** \brief Wait until all dispatched functions complete.
*
* The parallel_for or parallel_reduce dispatch of a functor may
* return asynchronously, before the functor completes. This
* method does not return until all dispatched functors on this
* device have completed.
*/
static void fence();
/*------------------------------------------------------------------------*/
static int in_parallel();
static int is_initialized();
/** \brief Return maximum amount of concurrency */
static int concurrency();
static void initialize( int thread_count );
static void finalize();
/** \brief Print configuration information to the given output stream. */
static void print_configuration( std::ostream & , const bool detail = false );
int shepherd_size() const ;
int shepherd_worker_size() const ;
};
/*--------------------------------------------------------------------------*/
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
+template<>
+struct MemorySpaceAccess
+ < Kokkos::Qthread::memory_space
+ , Kokkos::Qthread::scratch_memory_space
+ >
+{
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = false };
+};
+
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Qthread::memory_space
, Kokkos::Qthread::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <Kokkos_Parallel.hpp>
#include <Qthread/Kokkos_QthreadExec.hpp>
#include <Qthread/Kokkos_Qthread_Parallel.hpp>
#endif /* #define KOKKOS_QTHREAD_HPP */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
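A hedged sketch using only the static interface declared above; it assumes a build with the Qthreads backend enabled, and the function name is illustrative:

#include <Kokkos_Qthread.hpp>
#include <iostream>

// Illustrative only: start, query, and shut down the Qthread execution space.
void qthread_demo()
{
  Kokkos::Qthread::initialize( 4 );                   // four worker threads
  Kokkos::Qthread::print_configuration( std::cout );
  std::cout << "concurrency = " << Kokkos::Qthread::concurrency() << std::endl;
  Kokkos::Qthread::fence();                           // wait for dispatched work
  Kokkos::Qthread::finalize();
}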
diff --git a/lib/kokkos/core/src/Kokkos_Serial.hpp b/lib/kokkos/core/src/Kokkos_Serial.hpp
index 233b56c93..914edbc7c 100644
--- a/lib/kokkos/core/src/Kokkos_Serial.hpp
+++ b/lib/kokkos/core/src/Kokkos_Serial.hpp
@@ -1,1116 +1,1123 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Serial.hpp
/// \brief Declaration and definition of Kokkos::Serial device.
#ifndef KOKKOS_SERIAL_HPP
#define KOKKOS_SERIAL_HPP
#include <cstddef>
#include <iosfwd>
#include <Kokkos_Parallel.hpp>
-#include <Kokkos_TaskPolicy.hpp>
+#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
-
#include <KokkosExp_MDRangePolicy.hpp>
#if defined( KOKKOS_HAVE_SERIAL )
namespace Kokkos {
/// \class Serial
/// \brief Kokkos device for non-parallel execution
///
/// A "device" represents a parallel execution model. It tells Kokkos
/// how to parallelize the execution of kernels in a parallel_for or
/// parallel_reduce. For example, the Threads device uses Pthreads or
/// C++11 threads on a CPU, the OpenMP device uses the OpenMP language
/// extensions, and the Cuda device uses NVIDIA's CUDA programming
/// model. The Serial device executes "parallel" kernels
/// sequentially. This is useful if you really do not want to use
/// threads, or if you want to explore different combinations of MPI
/// and shared-memory parallel programming models.
class Serial {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as an execution space:
typedef Serial execution_space ;
//! The size_type typedef best suited for this device.
typedef HostSpace::size_type size_type ;
//! This device's preferred memory space.
typedef HostSpace memory_space ;
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
//! This device's preferred array layout.
typedef LayoutRight array_layout ;
/// \brief Scratch memory space
typedef ScratchMemorySpace< Kokkos::Serial > scratch_memory_space ;
//@}
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
///
/// For the Serial device, this method <i>always</i> returns false,
/// because parallel_for or parallel_reduce with the Serial device
/// always execute sequentially.
inline static int in_parallel() { return false ; }
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume fewer resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence() {}
static void initialize( unsigned threads_count = 1 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 ,
bool allow_asynchronous_threadpool = false) {
(void) threads_count;
(void) use_numa_count;
(void) use_cores_per_numa;
(void) allow_asynchronous_threadpool;
// Init the array of locks used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
static int is_initialized() { return 1 ; }
/** \brief Return the maximum amount of concurrency. */
static int concurrency() {return 1;};
//! Free any resources being consumed by the device.
static void finalize() {
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//! Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool /* detail */ = false ) {}
//--------------------------------------------------------------------------
inline static int thread_pool_size( int = 0 ) { return 1 ; }
KOKKOS_INLINE_FUNCTION static int thread_pool_rank() { return 0 ; }
//--------------------------------------------------------------------------
KOKKOS_INLINE_FUNCTION static unsigned hardware_thread_id() { return thread_pool_rank(); }
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
//--------------------------------------------------------------------------
static void * scratch_memory_resize( unsigned reduce_size , unsigned shared_size );
//--------------------------------------------------------------------------
};
} // namespace Kokkos
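/** \brief Usage sketch for the Serial execution space (illustrative only).
 *
 * This example is not part of the header. It shows one way to dispatch a
 * range parallel_for to Kokkos::Serial, assuming a build with
 * KOKKOS_HAVE_SERIAL enabled; the functor, view, and problem size are
 * placeholders.
 *
 * \code
 * #include <Kokkos_Core.hpp>
 *
 * struct FillFunctor {
 *   Kokkos::View<double*, Kokkos::HostSpace> x;
 *   // Called once per index; on Serial the indices are visited sequentially.
 *   KOKKOS_INLINE_FUNCTION void operator()(const int i) const { x(i) = 2.0 * i; }
 * };
 *
 * int main(int argc, char** argv) {
 *   Kokkos::initialize(argc, argv);
 *   {
 *     Kokkos::View<double*, Kokkos::HostSpace> x("x", 100);
 *     Kokkos::parallel_for(Kokkos::RangePolicy<Kokkos::Serial>(0, 100),
 *                          FillFunctor{x});
 *     Kokkos::Serial::fence();  // no-op for Serial, kept for portability
 *   }
 *   Kokkos::finalize();
 *   return 0;
 * }
 * \endcode
 */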
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
+template<>
+struct MemorySpaceAccess
+ < Kokkos::Serial::memory_space
+ , Kokkos::Serial::scratch_memory_space
+ >
+{
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = false };
+};
+
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Serial::memory_space
, Kokkos::Serial::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
namespace SerialImpl {
struct Sentinel {
void * m_scratch ;
unsigned m_reduce_end ;
unsigned m_shared_end ;
Sentinel();
~Sentinel();
static Sentinel & singleton();
};
inline
unsigned align( unsigned n );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class SerialTeamMember {
private:
typedef Kokkos::ScratchMemorySpace< Kokkos::Serial > scratch_memory_space ;
const scratch_memory_space m_space ;
const int m_league_rank ;
const int m_league_size ;
SerialTeamMember & operator = ( const SerialTeamMember & );
public:
KOKKOS_INLINE_FUNCTION
const scratch_memory_space & team_shmem() const { return m_space ; }
KOKKOS_INLINE_FUNCTION
const scratch_memory_space & team_scratch(int) const
{ return m_space ; }
KOKKOS_INLINE_FUNCTION
const scratch_memory_space & thread_scratch(int) const
{ return m_space ; }
-
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
KOKKOS_INLINE_FUNCTION int team_size() const { return 1 ; }
KOKKOS_INLINE_FUNCTION void team_barrier() const {}
template<class ValueType>
KOKKOS_INLINE_FUNCTION
void team_broadcast(const ValueType& , const int& ) const {}
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION
ValueType team_reduce( const ValueType & value , const JoinOp & ) const
{
return value ;
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const
{
const Type tmp = global_accum ? *global_accum : Type(0) ;
if ( global_accum ) { *global_accum += value ; }
return tmp ;
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & ) const
{ return Type(0); }
//----------------------------------------
// Execution space specific:
SerialTeamMember( int arg_league_rank
, int arg_league_size
, int arg_shared_size
);
};
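/** \brief Usage sketch for team_scan (illustrative only).
 *
 * Not part of the header. For Serial a team has a single thread, but the
 * identity documented above, reduction_total = member.team_scan(value) + value,
 * holds on every backend. The fragment below is a placeholder functor body;
 * it assumes member_type is Kokkos::TeamPolicy<Kokkos::Serial>::member_type
 * and offsets is a host View with one entry per league member.
 *
 * \code
 * KOKKOS_INLINE_FUNCTION
 * void operator()(const member_type& member) const {
 *   const int value  = 1;                        // per-thread contribution
 *   const int offset = member.team_scan(value);  // exclusive prefix sum (0 on Serial)
 *   offsets(member.league_rank()) = offset;
 * }
 * \endcode
 */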
} // namespace Impl
-
/*
* < Kokkos::Serial , WorkArgTag >
- * < WorkArgTag , Impl::enable_if< Impl::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value >::type >
+ * < WorkArgTag , Impl::enable_if< std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value >::type >
*
*/
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Serial , Properties ... >:public PolicyTraits<Properties...>
{
private:
size_t m_team_scratch_size[2] ;
size_t m_thread_scratch_size[2] ;
int m_league_size ;
int m_chunk_size;
public:
//! Tag this class as a Kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
//! Execution space of this execution policy:
typedef Kokkos::Serial execution_space ;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
static
int team_size_max( const FunctorType & ) { return 1 ; }
template< class FunctorType >
static
int team_size_recommended( const FunctorType & ) { return 1 ; }
template< class FunctorType >
static
int team_size_recommended( const FunctorType & , const int& ) { return 1 ; }
//----------------------------------------
inline int team_size() const { return 1 ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int = 0) const { return m_team_scratch_size[level] + m_thread_scratch_size[level]; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_request
, int /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( int league_size_request
, int /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
-
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
typedef Impl::SerialTeamMember member_type ;
};
} /* namespace Impl */
} /* namespace Kokkos */
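/** \brief Usage sketch for TeamPolicyInternal above (illustrative only).
 *
 * Not part of the header. The public Kokkos::TeamPolicy is implemented in
 * terms of TeamPolicyInternal; for Serial the team size is always 1, but
 * league size, chunk size, and scratch requests are still honored. The
 * sketch assumes the public policy exposes the same set_scratch_size and
 * set_chunk_size setters shown above; the sizes are placeholders and
 * Kokkos is assumed to be initialized.
 *
 * \code
 * typedef Kokkos::TeamPolicy<Kokkos::Serial> policy_type;
 *
 * // 64 teams, automatic team size, 1 KiB of level-0 per-team scratch,
 * // and a scheduling chunk of 8 teams.
 * policy_type policy = policy_type(64, Kokkos::AUTO)
 *                        .set_scratch_size(0, Kokkos::PerTeam(1024))
 *                        .set_chunk_size(8);
 * \endcode
 */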
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Parallel patterns for Kokkos::Serial with RangePolicy */
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType ,
Kokkos::RangePolicy< Traits ... > ,
Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec() const
{
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i );
}
}
template< class TagType >
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec() const
{
const TagType t{} ;
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i );
}
}
public:
inline
void execute() const
{ this-> template exec< typename Policy::work_tag >(); }
inline
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
, Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
-
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec( pointer_type ptr ) const
{
reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i , update );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec( pointer_type ptr ) const
{
const TagType t{} ;
reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i , update );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
public:
inline
void execute() const
{
pointer_type ptr = (pointer_type) Kokkos::Serial::scratch_memory_resize
( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
this-> template exec< WorkTag >( m_result_ptr ? m_result_ptr : ptr );
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor ,
const Policy & arg_policy ,
const HostViewType & arg_result_view ,
typename std::enable_if<
Kokkos::is_view< HostViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result_view.ptr_on_device() )
{
static_assert( Kokkos::is_view< HostViewType >::value
, "Kokkos::Serial reduce result must be a View" );
static_assert( std::is_same< typename HostViewType::memory_space , HostSpace >::value
, "Kokkos::Serial reduce result must be a View in HostSpace" );
}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType , WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec( pointer_type ptr ) const
{
reference_type update = ValueInit::init( m_functor , ptr );
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i , update , true );
}
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec( pointer_type ptr ) const
{
const TagType t{} ;
reference_type update = ValueInit::init( m_functor , ptr );
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i , update , true );
}
}
public:
inline
void execute() const
{
pointer_type ptr = (pointer_type)
Kokkos::Serial::scratch_memory_resize( ValueTraits::value_size( m_functor ) , 0 );
this-> template exec< WorkTag >( ptr );
}
inline
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
} // namespace Impl
} // namespace Kokkos
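/** \brief Usage sketch for the RangePolicy patterns above (illustrative only).
 *
 * Not part of the header. All three patterns reduce to plain sequential
 * loops on Kokkos::Serial. The lambdas assume a build with C++11 lambda
 * dispatch enabled; the view a and the bound N are placeholders.
 *
 * \code
 * const int N = 1000;
 * Kokkos::View<double*, Kokkos::HostSpace> a("a", N);
 *
 * // parallel_for: one sequential pass over [0,N).
 * Kokkos::parallel_for(Kokkos::RangePolicy<Kokkos::Serial>(0, N),
 *                      KOKKOS_LAMBDA(const int i) { a(i) = 1.0; });
 *
 * // parallel_reduce: sum into a host scalar.
 * double sum = 0.0;
 * Kokkos::parallel_reduce(Kokkos::RangePolicy<Kokkos::Serial>(0, N),
 *                         KOKKOS_LAMBDA(const int i, double& update) { update += a(i); },
 *                         sum);
 *
 * // parallel_scan: exclusive prefix sum, results written when final == true.
 * Kokkos::parallel_scan(Kokkos::RangePolicy<Kokkos::Serial>(0, N),
 *                       KOKKOS_LAMBDA(const int i, double& update, const bool final) {
 *                         if (final) a(i) = update;
 *                         update += 1.0;
 *                       });
 * \endcode
 */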
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Parallel patterns for Kokkos::Serial with TeamPolicy */
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::Serial
>
{
private:
typedef TeamPolicyInternal< Kokkos::Serial , Properties ...> Policy ;
typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const int m_league ;
const int m_shared ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec() const
{
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
m_functor( Member(ileague,m_league,m_shared) );
}
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec() const
{
const TagType t{} ;
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
m_functor( t , Member(ileague,m_league,m_shared) );
}
}
public:
inline
void execute() const
{
Kokkos::Serial::scratch_memory_resize( 0 , m_shared );
this-> template exec< typename Policy::work_tag >();
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
, m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
{ }
};
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ReducerType , class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::Serial
>
{
private:
typedef TeamPolicyInternal< Kokkos::Serial, Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const int m_league ;
const ReducerType m_reducer ;
pointer_type m_result_ptr ;
const int m_shared ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec( pointer_type ptr ) const
{
reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
m_functor( Member(ileague,m_league,m_shared) , update );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec( pointer_type ptr ) const
{
const TagType t{} ;
reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
m_functor( t , Member(ileague,m_league,m_shared) , update );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
public:
inline
void execute() const
{
pointer_type ptr = (pointer_type) Kokkos::Serial::scratch_memory_resize
( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , m_shared );
this-> template exec< WorkTag >( m_result_ptr ? m_result_ptr : ptr );
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result ,
typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( m_functor , 1 ) )
{
static_assert( Kokkos::is_view< ViewType >::value
, "Reduction result on Kokkos::Serial must be a Kokkos::View" );
static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::Serial must be a Kokkos::View in HostSpace" );
}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
, m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
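/** \brief Usage sketch for the TeamPolicy patterns above (illustrative only).
 *
 * Not part of the header. On Kokkos::Serial each "team" is a single thread,
 * so the functor body runs once per league member in a sequential loop.
 * The league size and the reduced quantity are placeholders.
 *
 * \code
 * typedef Kokkos::TeamPolicy<Kokkos::Serial> policy_type;
 * typedef policy_type::member_type           member_type;
 *
 * long total = 0;
 * Kokkos::parallel_reduce(policy_type(32, 1),
 *                         KOKKOS_LAMBDA(const member_type& member, long& update) {
 *                           update += member.league_rank();
 *                         },
 *                         total);
 * // total == 0 + 1 + ... + 31 == 496
 * \endcode
 */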
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Nested parallel patterns for Kokkos::Serial with TeamPolicy */
namespace Kokkos {
namespace Impl {
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType,SerialTeamMember> {
typedef iType index_type;
const iType begin ;
const iType end ;
enum {increment = 1};
const SerialTeamMember& thread;
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const SerialTeamMember& arg_thread, const iType& arg_count)
: begin(0)
, end(arg_count)
, thread(arg_thread)
{}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const SerialTeamMember& arg_thread, const iType& arg_begin, const iType & arg_end )
: begin( arg_begin )
, end( arg_end)
, thread( arg_thread )
{}
};
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType,SerialTeamMember> {
typedef iType index_type;
enum {start = 0};
const iType end;
enum {increment = 1};
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const SerialTeamMember& thread, const iType& count):
end( count )
{}
};
} // namespace Impl
-template<typename iType>
+template< typename iType >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>(thread,count);
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, count );
}
-template<typename iType>
+template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
-TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & begin , const iType & end )
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::SerialTeamMember >
+TeamThreadRange( const Impl::SerialTeamMember& thread, const iType1 & begin, const iType2 & end )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>(thread,begin,end);
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >
ThreadVectorRange(const Impl::SerialTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::SerialTeamMember> PerTeam(const Impl::SerialTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::SerialTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::SerialTeamMember> PerThread(const Impl::SerialTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::SerialTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
 *
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
 * val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
/** \brief Inter-thread parallel_reduce with a join operation. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
 *
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
 * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
 * The input value of init_result is used as the initializer for temporary variables of ValueType. Therefore
 * the input value should be the neutral element with respect to the join operation (e.g. 0 for addition or
 * 1 for multiplication). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
} //namespace Kokkos
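/** \brief Usage sketch for the team-thread nested patterns above (illustrative only).
 *
 * Not part of the header. Inside a team functor, TeamThreadRange splits an
 * index range over the threads of the team (a single thread for Serial).
 * The fragment assumes member_type is
 * Kokkos::TeamPolicy<Kokkos::Serial>::member_type, x is a placeholder
 * member View, and M is a placeholder bound.
 *
 * \code
 * KOKKOS_INLINE_FUNCTION
 * void operator()(const member_type& member) const {
 *   const int M = 16;
 *
 *   // One lambda call per index owned by this thread.
 *   Kokkos::parallel_for(Kokkos::TeamThreadRange(member, M),
 *                        [&](const int j) {
 *                          x(j) = 1.0;  // per-index work
 *                        });
 *
 *   // Team-wide sum over the same range.
 *   double team_sum = 0.0;
 *   Kokkos::parallel_reduce(Kokkos::TeamThreadRange(member, M),
 *                           [&](const int j, double& update) { update += x(j); },
 *                           team_sum);
 * }
 * \endcode
 */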
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
 * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
 * The input value of init_result is used as the initializer for temporary variables of ValueType. Therefore
 * the input value should be the neutral element with respect to the join operation (e.g. 0 for addition or
 * 1 for multiplication). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
 * (i.e. team_size==1) the operator is only called once, with final==true. The variable scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
} // namespace Kokkos
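/** \brief Usage sketch for the thread-vector patterns above (illustrative only).
 *
 * Not part of the header. ThreadVectorRange maps an index range onto the
 * vector lanes of the calling thread (one lane for Serial). Below, a
 * vector-level reduction is nested inside a team-thread loop; member, A, y,
 * nrow, and ncol are placeholders assumed to be in scope.
 *
 * \code
 * Kokkos::parallel_for(Kokkos::TeamThreadRange(member, nrow), [&](const int i) {
 *   double row_total = 0.0;
 *   Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(member, ncol),
 *                           [&](const int j, double& update) { update += A(i, j); },
 *                           row_total);
 *   y(i) = row_total;  // each row i is handled by exactly one thread
 * });
 * \endcode
 */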
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda) {
lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda, ValueType& val) {
lambda(val);
}
}
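/** \brief Usage sketch for single() (illustrative only).
 *
 * Not part of the header. single(PerTeam(member), ...) executes the lambda
 * once per team and single(PerThread(member), ...) once per thread; the
 * overloads taking a value argument broadcast the result (to the team's
 * threads or to the thread's vector lanes, respectively). On Serial both
 * collapse to a single call. member, A, y, and k are placeholders assumed
 * to be in scope.
 *
 * \code
 * // Once per team, broadcasting the computed value to the whole team.
 * double pivot = 0.0;
 * Kokkos::single(Kokkos::PerTeam(member), [&](double& val) { val = A(k, k); }, pivot);
 *
 * // Once per thread, no broadcast value.
 * Kokkos::single(Kokkos::PerThread(member), [&]() { y(k) += pivot; });
 * \endcode
 */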
//----------------------------------------------------------------------------
#include <impl/Kokkos_Serial_Task.hpp>
#endif // defined( KOKKOS_HAVE_SERIAL )
#endif /* #define KOKKOS_SERIAL_HPP */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-
diff --git a/lib/kokkos/core/src/Kokkos_TaskPolicy.hpp b/lib/kokkos/core/src/Kokkos_TaskPolicy.hpp
index fc9113b75..05ed5103b 100644
--- a/lib/kokkos/core/src/Kokkos_TaskPolicy.hpp
+++ b/lib/kokkos/core/src/Kokkos_TaskPolicy.hpp
@@ -1,1109 +1,47 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-// Experimental unified task-data parallel manycore LDRD
+// For backward compatibility:
-#ifndef KOKKOS_TASKPOLICY_HPP
-#define KOKKOS_TASKPOLICY_HPP
-
-//----------------------------------------------------------------------------
-
-#include <Kokkos_Core_fwd.hpp>
-
-// If compiling with CUDA then must be using CUDA 8 or better
-// and use relocateable device code to enable the task policy.
-// nvcc relocatable device code option: --relocatable-device-code=true
-
-#if ( defined( KOKKOS_COMPILER_NVCC ) )
- #if ( 8000 <= CUDA_VERSION ) && \
- defined( KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE )
-
- #define KOKKOS_ENABLE_TASKPOLICY
-
- #endif
-#else
-
-#define KOKKOS_ENABLE_TASKPOLICY
-
-#endif
-
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-//----------------------------------------------------------------------------
-
-#include <Kokkos_MemoryPool.hpp>
-#include <impl/Kokkos_Tags.hpp>
-#include <impl/Kokkos_TaskQueue.hpp>
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-enum TaskType { TaskTeam = Impl::TaskBase<void,void,void>::TaskTeam
- , TaskSingle = Impl::TaskBase<void,void,void>::TaskSingle };
-
-enum TaskPriority { TaskHighPriority = 0
- , TaskRegularPriority = 1
- , TaskLowPriority = 2 };
-
-template< typename Space >
-class TaskPolicy ;
-
-template< typename Space >
-void wait( TaskPolicy< Space > const & );
-
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-/*\brief Implementation data for task data management, access, and execution.
- *
- * CRTP Inheritance structure to allow static_cast from the
- * task root type and a task's FunctorType.
- *
- * TaskBase< Space , ResultType , FunctorType >
- * : TaskBase< Space , ResultType , void >
- * , FunctorType
- * { ... };
- *
- * TaskBase< Space , ResultType , void >
- * : TaskBase< Space , void , void >
- * { ... };
- */
-template< typename Space , typename ResultType , typename FunctorType >
-class TaskBase ;
-
-template< typename Space >
-class TaskExec ;
-
-}} // namespace Kokkos::Impl
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-/**
- *
- * Future< space > // value_type == void
- * Future< value > // space == Default
- * Future< value , space >
- *
- */
-template< typename Arg1 /* = void */ , typename Arg2 /* = void */ >
-class Future {
-private:
-
- template< typename > friend class TaskPolicy ;
- template< typename , typename > friend class Future ;
- template< typename , typename , typename > friend class Impl::TaskBase ;
-
- enum { Arg1_is_space = Kokkos::Impl::is_space< Arg1 >::value };
- enum { Arg2_is_space = Kokkos::Impl::is_space< Arg2 >::value };
- enum { Arg1_is_value = ! Arg1_is_space &&
- ! std::is_same< Arg1 , void >::value };
- enum { Arg2_is_value = ! Arg2_is_space &&
- ! std::is_same< Arg2 , void >::value };
-
- static_assert( ! ( Arg1_is_space && Arg2_is_space )
- , "Future cannot be given two spaces" );
-
- static_assert( ! ( Arg1_is_value && Arg2_is_value )
- , "Future cannot be given two value types" );
-
- using ValueType =
- typename std::conditional< Arg1_is_value , Arg1 ,
- typename std::conditional< Arg2_is_value , Arg2 , void
- >::type >::type ;
-
- using Space =
- typename std::conditional< Arg1_is_space , Arg1 ,
- typename std::conditional< Arg2_is_space , Arg2 , void
- >::type >::type ;
-
- using task_base = Impl::TaskBase< Space , ValueType , void > ;
- using queue_type = Impl::TaskQueue< Space > ;
-
- task_base * m_task ;
-
- KOKKOS_INLINE_FUNCTION explicit
- Future( task_base * task ) : m_task(0)
- { if ( task ) queue_type::assign( & m_task , task ); }
-
- //----------------------------------------
-
-public:
-
- using execution_space = typename Space::execution_space ;
- using value_type = ValueType ;
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- bool is_null() const { return 0 == m_task ; }
-
- KOKKOS_INLINE_FUNCTION
- int reference_count() const
- { return 0 != m_task ? m_task->reference_count() : 0 ; }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- ~Future() { if ( m_task ) queue_type::assign( & m_task , (task_base*)0 ); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- constexpr Future() noexcept : m_task(0) {}
-
- KOKKOS_INLINE_FUNCTION
- Future( Future && rhs )
- : m_task( rhs.m_task ) { rhs.m_task = 0 ; }
-
- KOKKOS_INLINE_FUNCTION
- Future( const Future & rhs )
- : m_task(0)
- { if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task ); }
-
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( Future && rhs )
- {
- if ( m_task ) queue_type::assign( & m_task , (task_base*)0 );
- m_task = rhs.m_task ;
- rhs.m_task = 0 ;
- return *this ;
- }
-
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future & rhs )
- {
- if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
- return *this ;
- }
-
- //----------------------------------------
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future( Future<A1,A2> && rhs )
- : m_task( rhs.m_task )
- {
- static_assert
- ( std::is_same< Space , void >::value ||
- std::is_same< Space , typename Future<A1,A2>::Space >::value
- , "Assigned Futures must have the same space" );
-
- static_assert
- ( std::is_same< value_type , void >::value ||
- std::is_same< value_type , typename Future<A1,A2>::value_type >::value
- , "Assigned Futures must have the same value_type" );
-
- rhs.m_task = 0 ;
- }
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future( const Future<A1,A2> & rhs )
- : m_task(0)
- {
- static_assert
- ( std::is_same< Space , void >::value ||
- std::is_same< Space , typename Future<A1,A2>::Space >::value
- , "Assigned Futures must have the same space" );
-
- static_assert
- ( std::is_same< value_type , void >::value ||
- std::is_same< value_type , typename Future<A1,A2>::value_type >::value
- , "Assigned Futures must have the same value_type" );
-
- if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
- }
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future<A1,A2> & rhs )
- {
- static_assert
- ( std::is_same< Space , void >::value ||
- std::is_same< Space , typename Future<A1,A2>::Space >::value
- , "Assigned Futures must have the same space" );
-
- static_assert
- ( std::is_same< value_type , void >::value ||
- std::is_same< value_type , typename Future<A1,A2>::value_type >::value
- , "Assigned Futures must have the same value_type" );
-
- if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
- return *this ;
- }
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( Future<A1,A2> && rhs )
- {
- static_assert
- ( std::is_same< Space , void >::value ||
- std::is_same< Space , typename Future<A1,A2>::Space >::value
- , "Assigned Futures must have the same space" );
-
- static_assert
- ( std::is_same< value_type , void >::value ||
- std::is_same< value_type , typename Future<A1,A2>::value_type >::value
- , "Assigned Futures must have the same value_type" );
-
- if ( m_task ) queue_type::assign( & m_task , (task_base*) 0 );
- m_task = rhs.m_task ;
- rhs.m_task = 0 ;
- return *this ;
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- typename task_base::get_return_type
- get() const
- {
- if ( 0 == m_task ) {
- Kokkos::abort( "Kokkos::Future::get ERROR: is_null()");
- }
- return m_task->get();
- }
-};
-
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-template< typename ExecSpace >
-class TaskPolicy
-{
-private:
-
- using track_type = Kokkos::Experimental::Impl::SharedAllocationTracker ;
- using queue_type = Kokkos::Impl::TaskQueue< ExecSpace > ;
- using task_base = Impl::TaskBase< ExecSpace , void , void > ;
-
- track_type m_track ;
- queue_type * m_queue ;
-
- //----------------------------------------
- // Process optional arguments to spawn and respawn functions
-
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const ) {}
-
- // TaskTeam or TaskSingle
- template< typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , TaskType const & arg
- , Options const & ... opts )
- {
- task->m_task_type = arg ;
- assign( task , opts ... );
- }
-
- // TaskHighPriority or TaskRegularPriority or TaskLowPriority
- template< typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , TaskPriority const & arg
- , Options const & ... opts )
- {
- task->m_priority = arg ;
- assign( task , opts ... );
- }
-
- // Future for a dependence
- template< typename A1 , typename A2 , typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , Future< A1 , A2 > const & arg
- , Options const & ... opts )
- {
- // Assign dependence to task->m_next
- // which will be processed within subsequent call to schedule.
- // Error if the dependence is reset.
-
- if ( 0 != Kokkos::atomic_exchange(& task->m_next, arg.m_task) ) {
- Kokkos::abort("TaskPolicy ERROR: resetting task dependence");
- }
-
- if ( 0 != arg.m_task ) {
- // The future may be destroyed upon returning from this call
- // so increment reference count to track this assignment.
- Kokkos::atomic_fetch_add( &(arg.m_task->m_ref_count) , 1 );
- }
-
- assign( task , opts ... );
- }
-
- //----------------------------------------
-
-public:
-
- using execution_policy = TaskPolicy ;
- using execution_space = ExecSpace ;
- using memory_space = typename queue_type::memory_space ;
- using member_type = Kokkos::Impl::TaskExec< ExecSpace > ;
-
- KOKKOS_INLINE_FUNCTION
- TaskPolicy() : m_track(), m_queue(0) {}
-
- KOKKOS_INLINE_FUNCTION
- TaskPolicy( TaskPolicy && rhs ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- TaskPolicy( TaskPolicy const & rhs ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- TaskPolicy & operator = ( TaskPolicy const & rhs ) = default ;
-
- TaskPolicy( memory_space const & arg_memory_space
- , unsigned const arg_memory_pool_capacity
- , unsigned const arg_memory_pool_log2_superblock = 12 )
- : m_track()
- , m_queue(0)
- {
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord
- < memory_space , typename queue_type::Destroy >
- record_type ;
-
- record_type * record =
- record_type::allocate( arg_memory_space
- , "TaskQueue"
- , sizeof(queue_type)
- );
-
- m_queue = new( record->data() )
- queue_type( arg_memory_space
- , arg_memory_pool_capacity
- , arg_memory_pool_log2_superblock );
-
- record->m_destroy.m_queue = m_queue ;
-
- m_track.assign_allocated_record_to_uninitialized( record );
- }
-
- //----------------------------------------
- /**\brief Allocation size for a spawned task */
- template< typename FunctorType >
- KOKKOS_FUNCTION
- size_t spawn_allocation_size() const
- {
- using task_type = Impl::TaskBase< execution_space
- , typename FunctorType::value_type
- , FunctorType > ;
-
- return m_queue->allocate_block_size( sizeof(task_type) );
- }
-
- /**\brief Allocation size for a when_all aggregate */
- KOKKOS_FUNCTION
- size_t when_all_allocation_size( int narg ) const
- {
- using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
-
- return m_queue->allocate_block_size( sizeof(task_base) + narg * sizeof(task_base*) );
- }
-
- //----------------------------------------
-
- /**\brief A task spawns a task with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- * 3) Team or Serial
- */
- template< typename FunctorType , typename ... Options >
- KOKKOS_FUNCTION
- Future< typename FunctorType::value_type , ExecSpace >
- task_spawn( FunctorType const & arg_functor
- , Options const & ... arg_options
- ) const
- {
- using value_type = typename FunctorType::value_type ;
- using future_type = Future< value_type , execution_space > ;
- using task_type = Impl::TaskBase< execution_space
- , value_type
- , FunctorType > ;
-
- //----------------------------------------
- // Give single-thread back-ends an opportunity to clear
- // queue of ready tasks before allocating a new task
-
- m_queue->iff_single_thread_recursive_execute();
-
- //----------------------------------------
-
- future_type f ;
-
- // Allocate task from memory pool
- f.m_task =
- reinterpret_cast< task_type * >(m_queue->allocate(sizeof(task_type)));
-
- if ( f.m_task ) {
-
- // Placement new construction
- new ( f.m_task ) task_type( arg_functor );
-
- // Reference count starts at two
- // +1 for matching decrement when task is complete
- // +1 for future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = sizeof(task_type);
-
- assign( f.m_task , arg_options... );
-
- // Spawning from within the execution space so the
- // apply function pointer is guaranteed to be valid
- f.m_task->m_apply = task_type::apply ;
-
- m_queue->schedule( f.m_task );
- // this task may be updated or executed at any moment
- }
-
- return f ;
- }
-
- /**\brief The host process spawns a task with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- * 3) Team or Serial
- */
- template< typename FunctorType , typename ... Options >
- inline
- Future< typename FunctorType::value_type , ExecSpace >
- host_spawn( FunctorType const & arg_functor
- , Options const & ... arg_options
- ) const
- {
- using value_type = typename FunctorType::value_type ;
- using future_type = Future< value_type , execution_space > ;
- using task_type = Impl::TaskBase< execution_space
- , value_type
- , FunctorType > ;
-
- future_type f ;
-
- // Allocate task from memory pool
- f.m_task =
- reinterpret_cast<task_type*>( m_queue->allocate(sizeof(task_type)) );
-
- if ( f.m_task ) {
-
- // Placement new construction
- new( f.m_task ) task_type( arg_functor );
-
- // Reference count starts at two:
- // +1 to match decrement when task completes
- // +1 for the future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = sizeof(task_type);
-
- assign( f.m_task , arg_options... );
-
- // Potentially spawning outside execution space so the
- // apply function pointer must be obtained from execution space.
- // Required for Cuda execution space function pointer.
- queue_type::specialization::template
- proc_set_apply< FunctorType >( & f.m_task->m_apply );
-
- m_queue->schedule( f.m_task );
- }
- return f ;
- }
-
- /**\brief Return a future that is complete
- * when all input futures are complete.
- */
- template< typename A1 , typename A2 >
- KOKKOS_FUNCTION
- Future< ExecSpace >
- when_all( int narg , Future< A1 , A2 > const * const arg ) const
- {
- static_assert
- ( std::is_same< execution_space
- , typename Future< A1 , A2 >::execution_space
- >::value
- , "Future must have same execution space" );
-
- using future_type = Future< ExecSpace > ;
- using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
-
- future_type f ;
-
- size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
-
- f.m_task =
- reinterpret_cast< task_base * >( m_queue->allocate( size ) );
-
- if ( f.m_task ) {
-
- new( f.m_task ) task_base();
-
- // Reference count starts at two:
- // +1 to match decrement when task completes
- // +1 for the future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = size ;
- f.m_task->m_dep_count = narg ;
- f.m_task->m_task_type = task_base::Aggregate ;
-
- task_base ** const dep = f.m_task->aggregate_dependences();
-
- // Assign dependences to increment their reference count
- // The futures may be destroyed upon returning from this call
- // so increment reference count to track this assignment.
-
- for ( int i = 0 ; i < narg ; ++i ) {
- task_base * const t = dep[i] = arg[i].m_task ;
- if ( 0 != t ) {
- Kokkos::atomic_fetch_add( &(t->m_ref_count) , 1 );
- }
- }
-
- m_queue->schedule( f.m_task );
- // this when_all may be processed at any moment
- }
-
- return f ;
- }
-
- /**\brief An executing task respawns itself with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- */
- template< class FunctorType , typename ... Options >
- KOKKOS_FUNCTION
- void respawn( FunctorType * task_self
- , Options const & ... arg_options ) const
- {
- using value_type = typename FunctorType::value_type ;
- using task_type = Impl::TaskBase< execution_space
- , value_type
- , FunctorType > ;
-
- task_base * const zero = (task_base *) 0 ;
- task_base * const lock = (task_base *) task_base::LockTag ;
- task_type * const task = static_cast< task_type * >( task_self );
-
- // Precondition:
- // task is in Executing state
- // therefore m_next == LockTag
- //
- // Change to m_next == 0 for no dependence
-
- if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
- Kokkos::abort("TaskPolicy::respawn ERROR: already respawned");
- }
-
- assign( task , arg_options... );
-
- // Postcondition:
- // task is in Executing-Respawn state
- // therefore m_next == dependence or 0
- }
-
- //----------------------------------------
-
- template< typename S >
- friend
- void Kokkos::wait( Kokkos::TaskPolicy< S > const & );
-
- //----------------------------------------
-
- inline
- int allocation_capacity() const noexcept
- { return m_queue->m_memory.get_mem_size(); }
-
- KOKKOS_INLINE_FUNCTION
- int allocated_task_count() const noexcept
- { return m_queue->m_count_alloc ; }
-
- KOKKOS_INLINE_FUNCTION
- int allocated_task_count_max() const noexcept
- { return m_queue->m_max_alloc ; }
-
- KOKKOS_INLINE_FUNCTION
- long allocated_task_count_accum() const noexcept
- { return m_queue->m_accum_alloc ; }
-
-};
-
-template< typename ExecSpace >
-inline
-void wait( TaskPolicy< ExecSpace > const & policy )
-{ policy.m_queue->execute(); }
-
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-struct FutureValueTypeIsVoidError {};
-
-template < class ExecSpace , class ResultType , class FunctorType >
-class TaskMember ;
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-/**\brief States of a task */
-enum TaskState
- { TASK_STATE_NULL = 0 ///< Does not exist
- , TASK_STATE_CONSTRUCTING = 1 ///< Is under construction
- , TASK_STATE_WAITING = 2 ///< Is waiting for execution
- , TASK_STATE_EXECUTING = 4 ///< Is executing
- , TASK_STATE_COMPLETE = 8 ///< Execution is complete
- };
-
-/**\brief Tag for Future<Latch,Space>
- */
-struct Latch {};
-
-/**
- *
- * Future< space > // value_type == void
- * Future< value > // space == Default
- * Future< value , space >
- *
- */
-template< class Arg1 = void , class Arg2 = void >
-class Future {
-private:
-
- template< class , class , class > friend class Impl::TaskMember ;
- template< class > friend class TaskPolicy ;
- template< class , class > friend class Future ;
-
- // Argument #2, if not void, must be the space.
- enum { Arg1_is_space = Kokkos::Impl::is_execution_space< Arg1 >::value };
- enum { Arg2_is_space = Kokkos::Impl::is_execution_space< Arg2 >::value };
- enum { Arg2_is_void = std::is_same< Arg2 , void >::value };
-
- struct ErrorNoExecutionSpace {};
-
- enum { Opt1 = Arg1_is_space && Arg2_is_void
- , Opt2 = ! Arg1_is_space && Arg2_is_void
- , Opt3 = ! Arg1_is_space && Arg2_is_space
- , OptOK = Kokkos::Impl::StaticAssert< Opt1 || Opt2 || Opt3 , ErrorNoExecutionSpace >::value
- };
-
- typedef typename
- Kokkos::Impl::if_c< Opt2 || Opt3 , Arg1 , void >::type
- ValueType ;
-
- typedef typename
- Kokkos::Impl::if_c< Opt1 , Arg1 , typename
- Kokkos::Impl::if_c< Opt2 , Kokkos::DefaultExecutionSpace , typename
- Kokkos::Impl::if_c< Opt3 , Arg2 , void
- >::type >::type >::type
- ExecutionSpace ;
-
- typedef Impl::TaskMember< ExecutionSpace , void , void > TaskRoot ;
- typedef Impl::TaskMember< ExecutionSpace , ValueType , void > TaskValue ;
-
- TaskRoot * m_task ;
-
- KOKKOS_INLINE_FUNCTION explicit
- Future( TaskRoot * task )
- : m_task(0)
- { TaskRoot::assign( & m_task , TaskRoot::template verify_type< ValueType >( task ) ); }
-
- //----------------------------------------
-
-public:
-
- typedef ValueType value_type;
- typedef ExecutionSpace execution_space ;
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- TaskState get_task_state() const
- { return 0 != m_task ? m_task->get_state() : TASK_STATE_NULL ; }
-
- KOKKOS_INLINE_FUNCTION
- bool is_null() const { return 0 == m_task ; }
-
- KOKKOS_INLINE_FUNCTION
- int reference_count() const
- { return 0 != m_task ? m_task->reference_count() : 0 ; }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- ~Future() { TaskRoot::assign( & m_task , 0 ); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- Future() : m_task(0) {}
-
- KOKKOS_INLINE_FUNCTION
- Future( const Future & rhs )
- : m_task(0)
- { TaskRoot::assign( & m_task , rhs.m_task ); }
-
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future & rhs )
- { TaskRoot::assign( & m_task , rhs.m_task ); return *this ; }
-
- //----------------------------------------
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future( const Future<A1,A2> & rhs )
- : m_task(0)
- { TaskRoot::assign( & m_task , TaskRoot::template verify_type< value_type >( rhs.m_task ) ); }
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future<A1,A2> & rhs )
- { TaskRoot::assign( & m_task , TaskRoot::template verify_type< value_type >( rhs.m_task ) ); return *this ; }
-
- //----------------------------------------
-
- typedef typename TaskValue::get_result_type get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const
- {
- if ( 0 == m_task ) {
- Kokkos::abort( "Kokkos::Experimental::Future::get ERROR: is_null()");
- }
- return static_cast<TaskValue*>( m_task )->get();
- }
-
- //----------------------------------------
-};
-
-template< class Arg2 >
-class Future< Latch , Arg2 > {
-private:
-
- template< class , class , class > friend class Impl::TaskMember ;
- template< class > friend class TaskPolicy ;
- template< class , class > friend class Future ;
-
- // Argument #2, if not void, must be the space.
- enum { Arg2_is_space = Kokkos::Impl::is_execution_space< Arg2 >::value };
- enum { Arg2_is_void = std::is_same< Arg2 , void >::value };
-
- static_assert( Arg2_is_space || Arg2_is_void
- , "Future template argument #2 must be a space" );
-
- typedef typename
- std::conditional< Arg2_is_space , Arg2 , Kokkos::DefaultExecutionSpace >
- ::type ExecutionSpace ;
-
- typedef Impl::TaskMember< ExecutionSpace , void , void > TaskRoot ;
-
- TaskRoot * m_task ;
-
- KOKKOS_INLINE_FUNCTION explicit
- Future( TaskRoot * task )
- : m_task(0)
- { TaskRoot::assign( & m_task , task ); }
-
- //----------------------------------------
-
-public:
-
- typedef void value_type;
- typedef ExecutionSpace execution_space ;
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- void add( const int k ) const
- { if ( 0 != m_task ) m_task->latch_add(k); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- TaskState get_task_state() const
- { return 0 != m_task ? m_task->get_state() : TASK_STATE_NULL ; }
-
- KOKKOS_INLINE_FUNCTION
- bool is_null() const { return 0 == m_task ; }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- ~Future() { TaskRoot::assign( & m_task , 0 ); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- Future() : m_task(0) {}
-
- KOKKOS_INLINE_FUNCTION
- Future( const Future & rhs )
- : m_task(0)
- { TaskRoot::assign( & m_task , rhs.m_task ); }
-
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future & rhs )
- { TaskRoot::assign( & m_task , rhs.m_task ); return *this ; }
-
- //----------------------------------------
-
- typedef void get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- void get() const {}
-
- //----------------------------------------
-
-};
-
-namespace Impl {
-
-template< class T >
-struct is_future : public std::false_type {};
-
-template< class Arg0 , class Arg1 >
-struct is_future< Kokkos::Experimental::Future<Arg0,Arg1> >
- : public std::true_type {};
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-/** \brief If the argument is an execution space then a serial task in that space */
-template< class Arg0 = Kokkos::DefaultExecutionSpace >
-class TaskPolicy {
-public:
-
- typedef typename Arg0::execution_space execution_space ;
-
- //----------------------------------------
-
- TaskPolicy
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity = 4
- , const unsigned arg_task_team_size = 0 /* choose default */
- );
-
- TaskPolicy() = default ;
- TaskPolicy( TaskPolicy && rhs ) = default ;
- TaskPolicy( const TaskPolicy & rhs ) = default ;
- TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
- TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
-
- //----------------------------------------
- /** \brief Create a serial task with storage for dependences.
- *
- * Postcondition: Task is in the 'constructing' state.
- */
- template< class FunctorType >
- Future< typename FunctorType::value_type , execution_space >
- create( const FunctorType & functor
- , const unsigned dependence_capacity /* = default */ );
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- create_team( const FunctorType & functor
- , const unsigned dependence_capacity /* = default */ );
-
- /** \brief Set dependence that 'after' cannot start execution
- * until 'before' has completed.
- *
- * Precondition: The 'after' task must be in then 'Constructing' state.
- */
- template< class TA , class TB >
- void add_dependence( const Future<TA,execution_space> & after
- , const Future<TB,execution_space> & before ) const ;
-
- /** \brief Spawn a task in the 'Constructing' state
- *
- * Precondition: Task is in the 'constructing' state.
- * Postcondition: Task is waiting, executing, or complete.
- */
- template< class T >
- const Future<T,execution_space> &
- spawn( const Future<T,execution_space> & ) const ;
-
- //----------------------------------------
- /** \brief Query dependence of an executing task */
-
- template< class FunctorType >
- Future< execution_space >
- get_dependence( FunctorType * , const int ) const ;
-
- //----------------------------------------
- /** \brief Clear current dependences of an executing task
- * in preparation for setting new dependences and
- * respawning.
- *
- * Precondition: The functor must be a task in the executing state.
- */
- template< class FunctorType >
- void clear_dependence( FunctorType * ) const ;
-
- /** \brief Set dependence that 'after' cannot resume execution
- * until 'before' has completed.
- *
- * The 'after' functor must be in the executing state
- */
- template< class FunctorType , class TB >
- void add_dependence( FunctorType * after
- , const Future<TB,execution_space> & before ) const ;
-
- /** \brief Respawn (reschedule) an executing task to be called again
- * after all dependences have completed.
- */
- template< class FunctorType >
- void respawn( FunctorType * ) const ;
-};
-
-//----------------------------------------------------------------------------
-/** \brief Create and spawn a single-thread task */
-template< class ExecSpace , class FunctorType >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn( TaskPolicy<ExecSpace> & policy , const FunctorType & functor )
-{ return policy.spawn( policy.create( functor ) ); }
-
-/** \brief Create and spawn a single-thread task with dependences */
-template< class ExecSpace , class FunctorType , class Arg0 , class Arg1 >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn( TaskPolicy<ExecSpace> & policy
- , const FunctorType & functor
- , const Future<Arg0,Arg1> & before_0
- , const Future<Arg0,Arg1> & before_1 )
-{
- Future< typename FunctorType::value_type , ExecSpace > f ;
- f = policy.create( functor , 2 );
- policy.add_dependence( f , before_0 );
- policy.add_dependence( f , before_1 );
- policy.spawn( f );
- return f ;
-}
-
-//----------------------------------------------------------------------------
-/** \brief Create and spawn a parallel_for task */
-template< class ExecSpace , class ParallelPolicyType , class FunctorType >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn_foreach( TaskPolicy<ExecSpace> & task_policy
- , const ParallelPolicyType & parallel_policy
- , const FunctorType & functor )
-{ return task_policy.spawn( task_policy.create_foreach( parallel_policy , functor ) ); }
-
-/** \brief Create and spawn a parallel_reduce task */
-template< class ExecSpace , class ParallelPolicyType , class FunctorType >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn_reduce( TaskPolicy<ExecSpace> & task_policy
- , const ParallelPolicyType & parallel_policy
- , const FunctorType & functor )
-{ return task_policy.spawn( task_policy.create_reduce( parallel_policy , functor ) ); }
-
-//----------------------------------------------------------------------------
-/** \brief Respawn a task functor with dependences */
-template< class ExecSpace , class FunctorType , class Arg0 , class Arg1 >
-inline
-void respawn( TaskPolicy<ExecSpace> & policy
- , FunctorType * functor
- , const Future<Arg0,Arg1> & before_0
- , const Future<Arg0,Arg1> & before_1
- )
-{
- policy.clear_dependence( functor );
- policy.add_dependence( functor , before_0 );
- policy.add_dependence( functor , before_1 );
- policy.respawn( functor );
-}
-
-//----------------------------------------------------------------------------
-
-template< class ExecSpace >
-void wait( TaskPolicy< ExecSpace > & );
-
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-#endif /* #ifndef KOKKOS_TASKPOLICY_HPP */
+#include <Kokkos_TaskScheduler.hpp>
diff --git a/lib/kokkos/core/src/Kokkos_TaskPolicy.hpp b/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
similarity index 56%
copy from lib/kokkos/core/src/Kokkos_TaskPolicy.hpp
copy to lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
index fc9113b75..0de926aa1 100644
--- a/lib/kokkos/core/src/Kokkos_TaskPolicy.hpp
+++ b/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
@@ -1,1109 +1,700 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-// Experimental unified task-data parallel manycore LDRD
-
-#ifndef KOKKOS_TASKPOLICY_HPP
-#define KOKKOS_TASKPOLICY_HPP
+#ifndef KOKKOS_TASKSCHEDULER_HPP
+#define KOKKOS_TASKSCHEDULER_HPP
//----------------------------------------------------------------------------
#include <Kokkos_Core_fwd.hpp>
// If compiling with CUDA then must be using CUDA 8 or better
// and use relocatable device code to enable the task policy.
// nvcc relocatable device code option: --relocatable-device-code=true
-#if ( defined( KOKKOS_COMPILER_NVCC ) )
+#if ( defined( KOKKOS_HAVE_CUDA ) )
#if ( 8000 <= CUDA_VERSION ) && \
defined( KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE )
- #define KOKKOS_ENABLE_TASKPOLICY
+ #define KOKKOS_ENABLE_TASKDAG
#endif
#else
-
-#define KOKKOS_ENABLE_TASKPOLICY
-
+ #define KOKKOS_ENABLE_TASKDAG
#endif
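As the preprocessor guard above spells out, enabling the task DAG under CUDA requires CUDA 8 or newer plus relocatable device code. A hypothetical compile line reflecting those two requirements (only the two flags named here come from this header and the comment; every other option and the file name are placeholders):

  nvcc -std=c++11 --relocatable-device-code=true \
       -DKOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE -c my_tasks.cu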
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
#include <Kokkos_MemoryPool.hpp>
#include <impl/Kokkos_Tags.hpp>
-#include <impl/Kokkos_TaskQueue.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
-enum TaskType { TaskTeam = Impl::TaskBase<void,void,void>::TaskTeam
- , TaskSingle = Impl::TaskBase<void,void,void>::TaskSingle };
-
-enum TaskPriority { TaskHighPriority = 0
- , TaskRegularPriority = 1
- , TaskLowPriority = 2 };
+// Forward declarations used in Impl::TaskQueue
-template< typename Space >
-class TaskPolicy ;
+template< typename Arg1 = void , typename Arg2 = void >
+class Future ;
template< typename Space >
-void wait( TaskPolicy< Space > const & );
+class TaskScheduler ;
} // namespace Kokkos
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-/*\brief Implementation data for task data management, access, and execution.
- *
- * CRTP Inheritance structure to allow static_cast from the
- * task root type and a task's FunctorType.
- *
- * TaskBase< Space , ResultType , FunctorType >
- * : TaskBase< Space , ResultType , void >
- * , FunctorType
- * { ... };
- *
- * TaskBase< Space , ResultType , void >
- * : TaskBase< Space , void , void >
- * { ... };
- */
-template< typename Space , typename ResultType , typename FunctorType >
-class TaskBase ;
-
-template< typename Space >
-class TaskExec ;
-
-}} // namespace Kokkos::Impl
+#include <impl/Kokkos_TaskQueue.hpp>
+//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/**
*
* Future< space > // value_type == void
* Future< value > // space == Default
* Future< value , space >
*
*/
-template< typename Arg1 /* = void */ , typename Arg2 /* = void */ >
+template< typename Arg1 , typename Arg2 >
class Future {
private:
- template< typename > friend class TaskPolicy ;
+ template< typename > friend class TaskScheduler ;
template< typename , typename > friend class Future ;
template< typename , typename , typename > friend class Impl::TaskBase ;
- enum { Arg1_is_space = Kokkos::Impl::is_space< Arg1 >::value };
- enum { Arg2_is_space = Kokkos::Impl::is_space< Arg2 >::value };
+ enum { Arg1_is_space = Kokkos::is_space< Arg1 >::value };
+ enum { Arg2_is_space = Kokkos::is_space< Arg2 >::value };
enum { Arg1_is_value = ! Arg1_is_space &&
! std::is_same< Arg1 , void >::value };
enum { Arg2_is_value = ! Arg2_is_space &&
! std::is_same< Arg2 , void >::value };
static_assert( ! ( Arg1_is_space && Arg2_is_space )
, "Future cannot be given two spaces" );
static_assert( ! ( Arg1_is_value && Arg2_is_value )
, "Future cannot be given two value types" );
using ValueType =
typename std::conditional< Arg1_is_value , Arg1 ,
typename std::conditional< Arg2_is_value , Arg2 , void
>::type >::type ;
using Space =
typename std::conditional< Arg1_is_space , Arg1 ,
typename std::conditional< Arg2_is_space , Arg2 , void
>::type >::type ;
using task_base = Impl::TaskBase< Space , ValueType , void > ;
using queue_type = Impl::TaskQueue< Space > ;
task_base * m_task ;
KOKKOS_INLINE_FUNCTION explicit
Future( task_base * task ) : m_task(0)
{ if ( task ) queue_type::assign( & m_task , task ); }
//----------------------------------------
public:
using execution_space = typename Space::execution_space ;
using value_type = ValueType ;
//----------------------------------------
KOKKOS_INLINE_FUNCTION
bool is_null() const { return 0 == m_task ; }
KOKKOS_INLINE_FUNCTION
int reference_count() const
{ return 0 != m_task ? m_task->reference_count() : 0 ; }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
- ~Future() { if ( m_task ) queue_type::assign( & m_task , (task_base*)0 ); }
+ void clear()
+ { if ( m_task ) queue_type::assign( & m_task , (task_base*)0 ); }
+
+ //----------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ ~Future() { clear(); }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr Future() noexcept : m_task(0) {}
KOKKOS_INLINE_FUNCTION
Future( Future && rhs )
: m_task( rhs.m_task ) { rhs.m_task = 0 ; }
KOKKOS_INLINE_FUNCTION
Future( const Future & rhs )
: m_task(0)
{ if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task ); }
KOKKOS_INLINE_FUNCTION
Future & operator = ( Future && rhs )
{
- if ( m_task ) queue_type::assign( & m_task , (task_base*)0 );
+ clear();
m_task = rhs.m_task ;
rhs.m_task = 0 ;
return *this ;
}
KOKKOS_INLINE_FUNCTION
Future & operator = ( const Future & rhs )
{
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
return *this ;
}
//----------------------------------------
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future( Future<A1,A2> && rhs )
: m_task( rhs.m_task )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
rhs.m_task = 0 ;
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future( const Future<A1,A2> & rhs )
: m_task(0)
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future & operator = ( const Future<A1,A2> & rhs )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
return *this ;
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future & operator = ( Future<A1,A2> && rhs )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
- if ( m_task ) queue_type::assign( & m_task , (task_base*) 0 );
+ clear();
m_task = rhs.m_task ;
rhs.m_task = 0 ;
return *this ;
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
typename task_base::get_return_type
get() const
{
if ( 0 == m_task ) {
Kokkos::abort( "Kokkos:::Future::get ERROR: is_null()");
}
return m_task->get();
}
};
} // namespace Kokkos
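A minimal host-side sketch of the reference-counted Future semantics defined above; it only touches members visible in this header (default construction, is_null, reference_count, copy/move, clear), and the double / DefaultExecutionSpace template arguments are illustrative assumptions for a build in which KOKKOS_ENABLE_TASKDAG is defined:

#include <Kokkos_Core.hpp>
#include <cassert>
#include <utility>

void future_semantics_sketch()
{
  using future_type = Kokkos::Future< double , Kokkos::DefaultExecutionSpace > ;

  future_type a ;                   // default-constructed: owns no task
  assert( a.is_null() && 0 == a.reference_count() );

  future_type b( a );               // copying a non-null future bumps the task's count
  future_type c( std::move(b) );    // moving transfers the handle; the source becomes null

  c.clear();                        // drop this handle's reference without waiting
}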
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+enum TaskType { TaskTeam = Impl::TaskBase<void,void,void>::TaskTeam
+ , TaskSingle = Impl::TaskBase<void,void,void>::TaskSingle };
+
+enum TaskPriority { TaskHighPriority = 0
+ , TaskRegularPriority = 1
+ , TaskLowPriority = 2 };
+
+template< typename Space >
+void wait( TaskScheduler< Space > const & );
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
template< typename ExecSpace >
-class TaskPolicy
+class TaskScheduler
{
private:
- using track_type = Kokkos::Experimental::Impl::SharedAllocationTracker ;
+ using track_type = Kokkos::Impl::SharedAllocationTracker ;
using queue_type = Kokkos::Impl::TaskQueue< ExecSpace > ;
using task_base = Impl::TaskBase< ExecSpace , void , void > ;
track_type m_track ;
queue_type * m_queue ;
//----------------------------------------
// Process optional arguments to spawn and respawn functions
KOKKOS_INLINE_FUNCTION static
void assign( task_base * const ) {}
// TaskTeam or TaskSingle
template< typename ... Options >
KOKKOS_INLINE_FUNCTION static
void assign( task_base * const task
, TaskType const & arg
, Options const & ... opts )
{
task->m_task_type = arg ;
assign( task , opts ... );
}
// TaskHighPriority or TaskRegularPriority or TaskLowPriority
template< typename ... Options >
KOKKOS_INLINE_FUNCTION static
void assign( task_base * const task
, TaskPriority const & arg
, Options const & ... opts )
{
task->m_priority = arg ;
assign( task , opts ... );
}
// Future for a dependence
template< typename A1 , typename A2 , typename ... Options >
KOKKOS_INLINE_FUNCTION static
void assign( task_base * const task
, Future< A1 , A2 > const & arg
, Options const & ... opts )
{
// Assign dependence to task->m_next
// which will be processed within subsequent call to schedule.
// Error if the dependence is reset.
if ( 0 != Kokkos::atomic_exchange(& task->m_next, arg.m_task) ) {
- Kokkos::abort("TaskPolicy ERROR: resetting task dependence");
+ Kokkos::abort("TaskScheduler ERROR: resetting task dependence");
}
if ( 0 != arg.m_task ) {
// The future may be destroyed upon returning from this call
// so increment reference count to track this assignment.
- Kokkos::atomic_fetch_add( &(arg.m_task->m_ref_count) , 1 );
+ Kokkos::atomic_increment( &(arg.m_task->m_ref_count) );
}
assign( task , opts ... );
}
//----------------------------------------
public:
- using execution_policy = TaskPolicy ;
+ using execution_policy = TaskScheduler ;
using execution_space = ExecSpace ;
using memory_space = typename queue_type::memory_space ;
using member_type = Kokkos::Impl::TaskExec< ExecSpace > ;
KOKKOS_INLINE_FUNCTION
- TaskPolicy() : m_track(), m_queue(0) {}
+ TaskScheduler() : m_track(), m_queue(0) {}
KOKKOS_INLINE_FUNCTION
- TaskPolicy( TaskPolicy && rhs ) = default ;
+ TaskScheduler( TaskScheduler && rhs ) = default ;
KOKKOS_INLINE_FUNCTION
- TaskPolicy( TaskPolicy const & rhs ) = default ;
+ TaskScheduler( TaskScheduler const & rhs ) = default ;
KOKKOS_INLINE_FUNCTION
- TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
+ TaskScheduler & operator = ( TaskScheduler && rhs ) = default ;
KOKKOS_INLINE_FUNCTION
- TaskPolicy & operator = ( TaskPolicy const & rhs ) = default ;
+ TaskScheduler & operator = ( TaskScheduler const & rhs ) = default ;
- TaskPolicy( memory_space const & arg_memory_space
- , unsigned const arg_memory_pool_capacity
- , unsigned const arg_memory_pool_log2_superblock = 12 )
+ TaskScheduler( memory_space const & arg_memory_space
+ , unsigned const arg_memory_pool_capacity
+ , unsigned const arg_memory_pool_log2_superblock = 12 )
: m_track()
, m_queue(0)
{
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord
+ typedef Kokkos::Impl::SharedAllocationRecord
< memory_space , typename queue_type::Destroy >
record_type ;
record_type * record =
record_type::allocate( arg_memory_space
, "TaskQueue"
, sizeof(queue_type)
);
m_queue = new( record->data() )
queue_type( arg_memory_space
, arg_memory_pool_capacity
, arg_memory_pool_log2_superblock );
record->m_destroy.m_queue = m_queue ;
m_track.assign_allocated_record_to_uninitialized( record );
}
//----------------------------------------
/**\brief Allocation size for a spawned task */
template< typename FunctorType >
KOKKOS_FUNCTION
size_t spawn_allocation_size() const
{
using task_type = Impl::TaskBase< execution_space
, typename FunctorType::value_type
, FunctorType > ;
return m_queue->allocate_block_size( sizeof(task_type) );
}
/**\brief Allocation size for a when_all aggregate */
KOKKOS_FUNCTION
size_t when_all_allocation_size( int narg ) const
{
using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
return m_queue->allocate_block_size( sizeof(task_base) + narg * sizeof(task_base*) );
}
//----------------------------------------
/**\brief A task spawns a task with options
*
* 1) High, Normal, or Low priority
* 2) With or without dependence
* 3) Team or Serial
*/
template< typename FunctorType , typename ... Options >
KOKKOS_FUNCTION
Future< typename FunctorType::value_type , ExecSpace >
task_spawn( FunctorType const & arg_functor
, Options const & ... arg_options
) const
{
using value_type = typename FunctorType::value_type ;
using future_type = Future< value_type , execution_space > ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
//----------------------------------------
// Give single-thread back-ends an opportunity to clear
// queue of ready tasks before allocating a new task
m_queue->iff_single_thread_recursive_execute();
//----------------------------------------
future_type f ;
// Allocate task from memory pool
f.m_task =
reinterpret_cast< task_type * >(m_queue->allocate(sizeof(task_type)));
if ( f.m_task ) {
// Placement new construction
new ( f.m_task ) task_type( arg_functor );
// Reference count starts at two
// +1 for matching decrement when task is complete
// +1 for future
f.m_task->m_queue = m_queue ;
f.m_task->m_ref_count = 2 ;
f.m_task->m_alloc_size = sizeof(task_type);
assign( f.m_task , arg_options... );
// Spawning from within the execution space so the
// apply function pointer is guaranteed to be valid
f.m_task->m_apply = task_type::apply ;
m_queue->schedule( f.m_task );
// this task may be updated or executed at any moment
}
return f ;
}
/**\brief The host process spawns a task with options
*
* 1) High, Normal, or Low priority
* 2) With or without dependence
* 3) Team or Serial
*/
template< typename FunctorType , typename ... Options >
inline
Future< typename FunctorType::value_type , ExecSpace >
host_spawn( FunctorType const & arg_functor
, Options const & ... arg_options
) const
{
using value_type = typename FunctorType::value_type ;
using future_type = Future< value_type , execution_space > ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
+ if ( m_queue == 0 ) {
+ Kokkos::abort("Kokkos::TaskScheduler not initialized");
+ }
+
future_type f ;
// Allocate task from memory pool
f.m_task =
reinterpret_cast<task_type*>( m_queue->allocate(sizeof(task_type)) );
if ( f.m_task ) {
// Placement new construction
new( f.m_task ) task_type( arg_functor );
// Reference count starts at two:
// +1 to match decrement when task completes
// +1 for the future
f.m_task->m_queue = m_queue ;
f.m_task->m_ref_count = 2 ;
f.m_task->m_alloc_size = sizeof(task_type);
assign( f.m_task , arg_options... );
// Potentially spawning outside execution space so the
// apply function pointer must be obtained from execution space.
// Required for Cuda execution space function pointer.
queue_type::specialization::template
proc_set_apply< FunctorType >( & f.m_task->m_apply );
m_queue->schedule( f.m_task );
}
return f ;
}
/**\brief Return a future that is complete
* when all input futures are complete.
*/
template< typename A1 , typename A2 >
KOKKOS_FUNCTION
Future< ExecSpace >
when_all( int narg , Future< A1 , A2 > const * const arg ) const
{
static_assert
( std::is_same< execution_space
, typename Future< A1 , A2 >::execution_space
>::value
, "Future must have same execution space" );
using future_type = Future< ExecSpace > ;
using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
future_type f ;
size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
f.m_task =
reinterpret_cast< task_base * >( m_queue->allocate( size ) );
if ( f.m_task ) {
new( f.m_task ) task_base();
// Reference count starts at two:
// +1 to match decrement when task completes
// +1 for the future
f.m_task->m_queue = m_queue ;
f.m_task->m_ref_count = 2 ;
f.m_task->m_alloc_size = size ;
f.m_task->m_dep_count = narg ;
f.m_task->m_task_type = task_base::Aggregate ;
task_base ** const dep = f.m_task->aggregate_dependences();
// Assign dependences to increment their reference count
// The futures may be destroyed upon returning from this call
// so increment reference count to track this assignment.
for ( int i = 0 ; i < narg ; ++i ) {
task_base * const t = dep[i] = arg[i].m_task ;
if ( 0 != t ) {
- Kokkos::atomic_fetch_add( &(t->m_ref_count) , 1 );
+ Kokkos::atomic_increment( &(t->m_ref_count) );
}
}
m_queue->schedule( f.m_task );
// this when_all may be processed at any moment
}
return f ;
}
/**\brief An executing task respawns itself with options
*
* 1) High, Normal, or Low priority
* 2) With or without dependence
*/
template< class FunctorType , typename ... Options >
KOKKOS_FUNCTION
void respawn( FunctorType * task_self
, Options const & ... arg_options ) const
{
using value_type = typename FunctorType::value_type ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
task_base * const zero = (task_base *) 0 ;
task_base * const lock = (task_base *) task_base::LockTag ;
task_type * const task = static_cast< task_type * >( task_self );
// Precondition:
// task is in Executing state
// therefore m_next == LockTag
//
// Change to m_next == 0 for no dependence
if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
- Kokkos::abort("TaskPolicy::respawn ERROR: already respawned");
+ Kokkos::abort("TaskScheduler::respawn ERROR: already respawned");
}
assign( task , arg_options... );
// Postcondition:
// task is in Executing-Respawn state
// therefore m_next == dependence or 0
}
//----------------------------------------
template< typename S >
friend
- void Kokkos::wait( Kokkos::TaskPolicy< S > const & );
+ void Kokkos::wait( Kokkos::TaskScheduler< S > const & );
//----------------------------------------
inline
int allocation_capacity() const noexcept
{ return m_queue->m_memory.get_mem_size(); }
KOKKOS_INLINE_FUNCTION
int allocated_task_count() const noexcept
{ return m_queue->m_count_alloc ; }
KOKKOS_INLINE_FUNCTION
int allocated_task_count_max() const noexcept
{ return m_queue->m_max_alloc ; }
KOKKOS_INLINE_FUNCTION
long allocated_task_count_accum() const noexcept
{ return m_queue->m_accum_alloc ; }
};
template< typename ExecSpace >
inline
-void wait( TaskPolicy< ExecSpace > const & policy )
+void wait( TaskScheduler< ExecSpace > const & policy )
{ policy.m_queue->execute(); }
} // namespace Kokkos
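To make the spawn/wait flow of the new TaskScheduler concrete, here is a minimal host-side sketch. The constructor arguments, host_spawn, Kokkos::TaskSingle, get, and wait all mirror declarations in this header; the functor interface (a value_type plus an operator taking the team member and a result reference) and the pool size are assumptions of the sketch:

#include <Kokkos_Core.hpp>

struct AnswerTask {                 // hypothetical single-shot task functor
  using value_type = long ;

  template< typename MemberType >
  KOKKOS_INLINE_FUNCTION
  void operator()( MemberType & /*member*/ , value_type & result )
    { result = 42 ; }               // produce a value, never respawn
};

void scheduler_sketch()
{
  using scheduler_type = Kokkos::TaskScheduler< Kokkos::DefaultExecutionSpace > ;

  // Memory pool of 64 KB with the default 2^12-byte superblocks.
  scheduler_type sched( scheduler_type::memory_space() , 1 << 16 );

  // Spawn from the host; the returned future holds one reference to the task.
  auto f = sched.host_spawn( AnswerTask() , Kokkos::TaskSingle );

  Kokkos::wait( sched );            // drain the queue
  long answer = f.get();            // read the completed task's result
  (void) answer ;
}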
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-struct FutureValueTypeIsVoidError {};
-
-template < class ExecSpace , class ResultType , class FunctorType >
-class TaskMember ;
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-/**\brief States of a task */
-enum TaskState
- { TASK_STATE_NULL = 0 ///< Does not exist
- , TASK_STATE_CONSTRUCTING = 1 ///< Is under construction
- , TASK_STATE_WAITING = 2 ///< Is waiting for execution
- , TASK_STATE_EXECUTING = 4 ///< Is executing
- , TASK_STATE_COMPLETE = 8 ///< Execution is complete
- };
-
-/**\brief Tag for Future<Latch,Space>
- */
-struct Latch {};
-
-/**
- *
- * Future< space > // value_type == void
- * Future< value > // space == Default
- * Future< value , space >
- *
- */
-template< class Arg1 = void , class Arg2 = void >
-class Future {
-private:
-
- template< class , class , class > friend class Impl::TaskMember ;
- template< class > friend class TaskPolicy ;
- template< class , class > friend class Future ;
-
- // Argument #2, if not void, must be the space.
- enum { Arg1_is_space = Kokkos::Impl::is_execution_space< Arg1 >::value };
- enum { Arg2_is_space = Kokkos::Impl::is_execution_space< Arg2 >::value };
- enum { Arg2_is_void = std::is_same< Arg2 , void >::value };
-
- struct ErrorNoExecutionSpace {};
-
- enum { Opt1 = Arg1_is_space && Arg2_is_void
- , Opt2 = ! Arg1_is_space && Arg2_is_void
- , Opt3 = ! Arg1_is_space && Arg2_is_space
- , OptOK = Kokkos::Impl::StaticAssert< Opt1 || Opt2 || Opt3 , ErrorNoExecutionSpace >::value
- };
-
- typedef typename
- Kokkos::Impl::if_c< Opt2 || Opt3 , Arg1 , void >::type
- ValueType ;
-
- typedef typename
- Kokkos::Impl::if_c< Opt1 , Arg1 , typename
- Kokkos::Impl::if_c< Opt2 , Kokkos::DefaultExecutionSpace , typename
- Kokkos::Impl::if_c< Opt3 , Arg2 , void
- >::type >::type >::type
- ExecutionSpace ;
-
- typedef Impl::TaskMember< ExecutionSpace , void , void > TaskRoot ;
- typedef Impl::TaskMember< ExecutionSpace , ValueType , void > TaskValue ;
-
- TaskRoot * m_task ;
-
- KOKKOS_INLINE_FUNCTION explicit
- Future( TaskRoot * task )
- : m_task(0)
- { TaskRoot::assign( & m_task , TaskRoot::template verify_type< ValueType >( task ) ); }
-
- //----------------------------------------
-
-public:
-
- typedef ValueType value_type;
- typedef ExecutionSpace execution_space ;
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- TaskState get_task_state() const
- { return 0 != m_task ? m_task->get_state() : TASK_STATE_NULL ; }
-
- KOKKOS_INLINE_FUNCTION
- bool is_null() const { return 0 == m_task ; }
-
- KOKKOS_INLINE_FUNCTION
- int reference_count() const
- { return 0 != m_task ? m_task->reference_count() : 0 ; }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- ~Future() { TaskRoot::assign( & m_task , 0 ); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- Future() : m_task(0) {}
-
- KOKKOS_INLINE_FUNCTION
- Future( const Future & rhs )
- : m_task(0)
- { TaskRoot::assign( & m_task , rhs.m_task ); }
-
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future & rhs )
- { TaskRoot::assign( & m_task , rhs.m_task ); return *this ; }
-
- //----------------------------------------
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future( const Future<A1,A2> & rhs )
- : m_task(0)
- { TaskRoot::assign( & m_task , TaskRoot::template verify_type< value_type >( rhs.m_task ) ); }
-
- template< class A1 , class A2 >
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future<A1,A2> & rhs )
- { TaskRoot::assign( & m_task , TaskRoot::template verify_type< value_type >( rhs.m_task ) ); return *this ; }
-
- //----------------------------------------
-
- typedef typename TaskValue::get_result_type get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const
- {
- if ( 0 == m_task ) {
- Kokkos::abort( "Kokkos::Experimental::Future::get ERROR: is_null()");
- }
- return static_cast<TaskValue*>( m_task )->get();
- }
-
- //----------------------------------------
-};
-
-template< class Arg2 >
-class Future< Latch , Arg2 > {
-private:
-
- template< class , class , class > friend class Impl::TaskMember ;
- template< class > friend class TaskPolicy ;
- template< class , class > friend class Future ;
-
- // Argument #2, if not void, must be the space.
- enum { Arg2_is_space = Kokkos::Impl::is_execution_space< Arg2 >::value };
- enum { Arg2_is_void = std::is_same< Arg2 , void >::value };
-
- static_assert( Arg2_is_space || Arg2_is_void
- , "Future template argument #2 must be a space" );
-
- typedef typename
- std::conditional< Arg2_is_space , Arg2 , Kokkos::DefaultExecutionSpace >
- ::type ExecutionSpace ;
-
- typedef Impl::TaskMember< ExecutionSpace , void , void > TaskRoot ;
-
- TaskRoot * m_task ;
-
- KOKKOS_INLINE_FUNCTION explicit
- Future( TaskRoot * task )
- : m_task(0)
- { TaskRoot::assign( & m_task , task ); }
-
- //----------------------------------------
-
-public:
-
- typedef void value_type;
- typedef ExecutionSpace execution_space ;
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- void add( const int k ) const
- { if ( 0 != m_task ) m_task->latch_add(k); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- TaskState get_task_state() const
- { return 0 != m_task ? m_task->get_state() : TASK_STATE_NULL ; }
-
- KOKKOS_INLINE_FUNCTION
- bool is_null() const { return 0 == m_task ; }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- ~Future() { TaskRoot::assign( & m_task , 0 ); }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- Future() : m_task(0) {}
-
- KOKKOS_INLINE_FUNCTION
- Future( const Future & rhs )
- : m_task(0)
- { TaskRoot::assign( & m_task , rhs.m_task ); }
-
- KOKKOS_INLINE_FUNCTION
- Future & operator = ( const Future & rhs )
- { TaskRoot::assign( & m_task , rhs.m_task ); return *this ; }
-
- //----------------------------------------
-
- typedef void get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- void get() const {}
-
- //----------------------------------------
-
-};
-
-namespace Impl {
-
-template< class T >
-struct is_future : public std::false_type {};
-
-template< class Arg0 , class Arg1 >
-struct is_future< Kokkos::Experimental::Future<Arg0,Arg1> >
- : public std::true_type {};
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-/** \brief If the argument is an execution space then a serial task in that space */
-template< class Arg0 = Kokkos::DefaultExecutionSpace >
-class TaskPolicy {
-public:
-
- typedef typename Arg0::execution_space execution_space ;
-
- //----------------------------------------
-
- TaskPolicy
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity = 4
- , const unsigned arg_task_team_size = 0 /* choose default */
- );
-
- TaskPolicy() = default ;
- TaskPolicy( TaskPolicy && rhs ) = default ;
- TaskPolicy( const TaskPolicy & rhs ) = default ;
- TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
- TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
-
- //----------------------------------------
- /** \brief Create a serial task with storage for dependences.
- *
- * Postcondition: Task is in the 'constructing' state.
- */
- template< class FunctorType >
- Future< typename FunctorType::value_type , execution_space >
- create( const FunctorType & functor
- , const unsigned dependence_capacity /* = default */ );
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- create_team( const FunctorType & functor
- , const unsigned dependence_capacity /* = default */ );
-
- /** \brief Set dependence that 'after' cannot start execution
- * until 'before' has completed.
- *
- * Precondition: The 'after' task must be in then 'Constructing' state.
- */
- template< class TA , class TB >
- void add_dependence( const Future<TA,execution_space> & after
- , const Future<TB,execution_space> & before ) const ;
-
- /** \brief Spawn a task in the 'Constructing' state
- *
- * Precondition: Task is in the 'constructing' state.
- * Postcondition: Task is waiting, executing, or complete.
- */
- template< class T >
- const Future<T,execution_space> &
- spawn( const Future<T,execution_space> & ) const ;
-
- //----------------------------------------
- /** \brief Query dependence of an executing task */
-
- template< class FunctorType >
- Future< execution_space >
- get_dependence( FunctorType * , const int ) const ;
-
- //----------------------------------------
- /** \brief Clear current dependences of an executing task
- * in preparation for setting new dependences and
- * respawning.
- *
- * Precondition: The functor must be a task in the executing state.
- */
- template< class FunctorType >
- void clear_dependence( FunctorType * ) const ;
-
- /** \brief Set dependence that 'after' cannot resume execution
- * until 'before' has completed.
- *
- * The 'after' functor must be in the executing state
- */
- template< class FunctorType , class TB >
- void add_dependence( FunctorType * after
- , const Future<TB,execution_space> & before ) const ;
-
- /** \brief Respawn (reschedule) an executing task to be called again
- * after all dependences have completed.
- */
- template< class FunctorType >
- void respawn( FunctorType * ) const ;
-};
-
-//----------------------------------------------------------------------------
-/** \brief Create and spawn a single-thread task */
-template< class ExecSpace , class FunctorType >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn( TaskPolicy<ExecSpace> & policy , const FunctorType & functor )
-{ return policy.spawn( policy.create( functor ) ); }
-
-/** \brief Create and spawn a single-thread task with dependences */
-template< class ExecSpace , class FunctorType , class Arg0 , class Arg1 >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn( TaskPolicy<ExecSpace> & policy
- , const FunctorType & functor
- , const Future<Arg0,Arg1> & before_0
- , const Future<Arg0,Arg1> & before_1 )
-{
- Future< typename FunctorType::value_type , ExecSpace > f ;
- f = policy.create( functor , 2 );
- policy.add_dependence( f , before_0 );
- policy.add_dependence( f , before_1 );
- policy.spawn( f );
- return f ;
-}
-
-//----------------------------------------------------------------------------
-/** \brief Create and spawn a parallel_for task */
-template< class ExecSpace , class ParallelPolicyType , class FunctorType >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn_foreach( TaskPolicy<ExecSpace> & task_policy
- , const ParallelPolicyType & parallel_policy
- , const FunctorType & functor )
-{ return task_policy.spawn( task_policy.create_foreach( parallel_policy , functor ) ); }
-
-/** \brief Create and spawn a parallel_reduce task */
-template< class ExecSpace , class ParallelPolicyType , class FunctorType >
-inline
-Future< typename FunctorType::value_type , ExecSpace >
-spawn_reduce( TaskPolicy<ExecSpace> & task_policy
- , const ParallelPolicyType & parallel_policy
- , const FunctorType & functor )
-{ return task_policy.spawn( task_policy.create_reduce( parallel_policy , functor ) ); }
-
-//----------------------------------------------------------------------------
-/** \brief Respawn a task functor with dependences */
-template< class ExecSpace , class FunctorType , class Arg0 , class Arg1 >
-inline
-void respawn( TaskPolicy<ExecSpace> & policy
- , FunctorType * functor
- , const Future<Arg0,Arg1> & before_0
- , const Future<Arg0,Arg1> & before_1
- )
-{
- policy.clear_dependence( functor );
- policy.add_dependence( functor , before_0 );
- policy.add_dependence( functor , before_1 );
- policy.respawn( functor );
-}
-
-//----------------------------------------------------------------------------
-
-template< class ExecSpace >
-void wait( TaskPolicy< ExecSpace > & );
-
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-#endif /* #ifndef KOKKOS_TASKPOLICY_HPP */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
+#endif /* #ifndef KOKKOS_TASKSCHEDULER_HPP */
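The scheduler also lets the host express dependences, either by aggregating futures with when_all or by passing a future directly as a spawn option (the Future overload of assign above). A sketch under the same assumptions as the earlier one (the hypothetical AnswerTask functor, illustrative priority):

void dependence_sketch( Kokkos::TaskScheduler< Kokkos::DefaultExecutionSpace > & sched )
{
  // Two independent producers.
  auto fa = sched.host_spawn( AnswerTask() );
  auto fb = sched.host_spawn( AnswerTask() );

  // Aggregate: 'all' completes only when both fa and fb have completed.
  Kokkos::Future< long , Kokkos::DefaultExecutionSpace > deps[2] = { fa , fb };
  auto all = sched.when_all( 2 , deps );

  // A consumer that may not start until the aggregate is complete.
  auto fc = sched.host_spawn( AnswerTask() , all , Kokkos::TaskLowPriority );
  (void) fc ;

  Kokkos::wait( sched );
}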
diff --git a/lib/kokkos/core/src/Kokkos_Threads.hpp b/lib/kokkos/core/src/Kokkos_Threads.hpp
index c9ebbf926..f01b14724 100644
--- a/lib/kokkos/core/src/Kokkos_Threads.hpp
+++ b/lib/kokkos/core/src/Kokkos_Threads.hpp
@@ -1,222 +1,233 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADS_HPP
#define KOKKOS_THREADS_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_HAVE_PTHREAD )
#include <cstddef>
#include <iosfwd>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class ThreadsExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Execution space for a pool of Pthreads or C11 threads on a CPU. */
class Threads {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as a kokkos execution space
typedef Threads execution_space ;
typedef Kokkos::HostSpace memory_space ;
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef Kokkos::LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< Threads > scratch_memory_space ;
//@}
/*------------------------------------------------------------------------*/
//! \name Static functions that all Kokkos devices must implement.
//@{
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
static int in_parallel();
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume less resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence();
/// \brief Free any resources being consumed by the device.
///
/// For the Threads device, this terminates spawned worker threads.
static void finalize();
/// \brief Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
//@}
/*------------------------------------------------------------------------*/
/*------------------------------------------------------------------------*/
//! \name Space-specific functions
//@{
/** \brief Initialize the device in the "ready to work" state.
*
* The device is initialized in a "ready to work" or "awake" state.
* This state reduces latency and thus improves performance when
* dispatching work. However, the "awake" state consumes resources
* even when no work is being done. You may call sleep() to put
* the device in a "sleeping" state that does not consume as many
* resources, but it will take time (latency) to awaken the device
* again (via the wake() method) so that it is ready for work.
*
* Teams of threads are distributed as evenly as possible across
* the requested number of numa regions and cores per numa region.
* A team will not be split across a numa region.
*
* If the 'use_' arguments are not supplied, hwloc is queried
* to use all available cores.
*/
static void initialize( unsigned threads_count = 0 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 ,
bool allow_asynchronous_threadpool = false );
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
static Threads & instance( int = 0 );
//----------------------------------------
static int thread_pool_size( int depth = 0 );
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
static int thread_pool_rank();
#else
KOKKOS_INLINE_FUNCTION static int thread_pool_rank() { return 0 ; }
#endif
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
KOKKOS_INLINE_FUNCTION static unsigned hardware_thread_id() { return thread_pool_rank(); }
//@}
//----------------------------------------
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
+template<>
+struct MemorySpaceAccess
+ < Kokkos::Threads::memory_space
+ , Kokkos::Threads::scratch_memory_space
+ >
+{
+ enum { assignable = false };
+ enum { accessible = true };
+ enum { deepcopy = false };
+};
+
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Threads::memory_space
, Kokkos::Threads::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
#include <Kokkos_ExecPolicy.hpp>
#include <Kokkos_Parallel.hpp>
#include <Threads/Kokkos_ThreadsExec.hpp>
#include <Threads/Kokkos_ThreadsTeam.hpp>
#include <Threads/Kokkos_Threads_Parallel.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_HAVE_PTHREAD ) */
#endif /* #define KOKKOS_THREADS_HPP */
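Given the initialize/fence/finalize contract documented in the comments above, a minimal sketch of driving the Threads back-end directly (requires a KOKKOS_HAVE_PTHREAD build; the thread count is illustrative, and zeros for the NUMA arguments let hwloc choose, as the comment states):

#include <Kokkos_Core.hpp>
#include <iostream>

int main()
{
  // Four worker threads; hwloc picks the NUMA/core placement.
  Kokkos::Threads::initialize( 4 , 0 , 0 );
  Kokkos::Threads::print_configuration( std::cout , true );

  // Dispatch a parallel kernel on this execution space, then fence.
  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >( 0 , 100 ) ,
                        KOKKOS_LAMBDA( const int /*i*/ ) { /* work */ } );
  Kokkos::Threads::fence();

  Kokkos::Threads::finalize();   // terminates the spawned worker threads
  return 0 ;
}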
diff --git a/lib/kokkos/core/src/impl/Kokkos_Timer.hpp b/lib/kokkos/core/src/Kokkos_Timer.hpp
similarity index 92%
copy from lib/kokkos/core/src/impl/Kokkos_Timer.hpp
copy to lib/kokkos/core/src/Kokkos_Timer.hpp
index 1f14e4287..4eca5037e 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Timer.hpp
+++ b/lib/kokkos/core/src/Kokkos_Timer.hpp
@@ -1,118 +1,112 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_IMPLWALLTIME_HPP
-#define KOKKOS_IMPLWALLTIME_HPP
+#ifndef KOKKOS_TIMER_HPP
+#define KOKKOS_TIMER_HPP
#include <stddef.h>
#ifdef _MSC_VER
#undef KOKKOS_USE_LIBRT
#include <gettimeofday.c>
#else
#ifdef KOKKOS_USE_LIBRT
#include <ctime>
#else
#include <sys/time.h>
#endif
#endif
namespace Kokkos {
-namespace Impl {
/** \brief Time since construction */
class Timer {
private:
#ifdef KOKKOS_USE_LIBRT
struct timespec m_old;
#else
struct timeval m_old ;
#endif
Timer( const Timer & );
Timer & operator = ( const Timer & );
public:
inline
void reset() {
#ifdef KOKKOS_USE_LIBRT
clock_gettime(CLOCK_REALTIME, &m_old);
#else
gettimeofday( & m_old , ((struct timezone *) NULL ) );
#endif
}
inline
~Timer() {}
inline
Timer() { reset(); }
inline
double seconds() const
{
#ifdef KOKKOS_USE_LIBRT
struct timespec m_new;
clock_gettime(CLOCK_REALTIME, &m_new);
return ( (double) ( m_new.tv_sec - m_old.tv_sec ) ) +
( (double) ( m_new.tv_nsec - m_old.tv_nsec ) * 1.0e-9 );
#else
struct timeval m_new ;
- ::gettimeofday( & m_new , ((struct timezone *) NULL ) );
+ gettimeofday( & m_new , ((struct timezone *) NULL ) );
return ( (double) ( m_new.tv_sec - m_old.tv_sec ) ) +
( (double) ( m_new.tv_usec - m_old.tv_usec ) * 1.0e-6 );
#endif
}
};
-} // namespace Impl
-
- using Kokkos::Impl::Timer ;
-
} // namespace Kokkos
-#endif /* #ifndef KOKKOS_IMPLWALLTIME_HPP */
-
+#endif /* #ifndef KOKKOS_TIMER_HPP */
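Since the patch promotes the timer from Kokkos::Impl into the public Kokkos::Timer header, a small usage sketch of the interface shown above (the timed workload is a placeholder):

#include <Kokkos_Timer.hpp>
#include <cstdio>

void timing_sketch()
{
  Kokkos::Timer timer ;                 // starts timing at construction

  /* ... workload to be measured ... */

  double elapsed = timer.seconds() ;    // seconds since construction or last reset
  std::printf( "elapsed = %g s\n" , elapsed );

  timer.reset() ;                       // restart the clock for the next measurement
}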
diff --git a/lib/kokkos/core/src/Kokkos_View.hpp b/lib/kokkos/core/src/Kokkos_View.hpp
index 1cc8b0338..b728b3649 100644
--- a/lib/kokkos/core/src/Kokkos_View.hpp
+++ b/lib/kokkos/core/src/Kokkos_View.hpp
@@ -1,2384 +1,2510 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_VIEW_HPP
#define KOKKOS_VIEW_HPP
#include <type_traits>
#include <string>
#include <algorithm>
#include <initializer_list>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <Kokkos_ExecPolicy.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
-template< class DstMemorySpace , class SrcMemorySpace >
-struct DeepCopy ;
-
template< class DataType >
struct ViewArrayAnalysis ;
template< class DataType , class ArrayLayout
, typename ValueType =
typename ViewArrayAnalysis< DataType >::non_const_value_type
>
struct ViewDataAnalysis ;
template< class , class ... >
class ViewMapping { public: enum { is_assignable = false }; };
-template< class MemorySpace >
-struct ViewOperatorBoundsErrorAbort ;
+} /* namespace Impl */
+} /* namespace Experimental */
+} /* namespace Kokkos */
-template<>
-struct ViewOperatorBoundsErrorAbort< Kokkos::HostSpace > {
- static void apply( const size_t rank
- , const size_t n0 , const size_t n1
- , const size_t n2 , const size_t n3
- , const size_t n4 , const size_t n5
- , const size_t n6 , const size_t n7
- , const size_t i0 , const size_t i1
- , const size_t i2 , const size_t i3
- , const size_t i4 , const size_t i5
- , const size_t i6 , const size_t i7 );
-};
+namespace Kokkos {
+namespace Impl {
+
+using Kokkos::Experimental::Impl::ViewMapping ;
+using Kokkos::Experimental::Impl::ViewDataAnalysis ;
} /* namespace Impl */
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
/** \class ViewTraits
* \brief Traits class for accessing attributes of a View.
*
* This is an implementation detail of View. It is only of interest
* to developers implementing a new specialization of View.
*
* Template argument options:
* - View< DataType >
* - View< DataType , Space >
* - View< DataType , Space , MemoryTraits >
* - View< DataType , ArrayLayout >
* - View< DataType , ArrayLayout , Space >
* - View< DataType , ArrayLayout , MemoryTraits >
* - View< DataType , ArrayLayout , Space , MemoryTraits >
* - View< DataType , MemoryTraits >
*/
template< class DataType , class ... Properties >
struct ViewTraits ;
template<>
struct ViewTraits< void >
{
typedef void execution_space ;
typedef void memory_space ;
typedef void HostMirrorSpace ;
typedef void array_layout ;
typedef void memory_traits ;
};
template< class ... Prop >
struct ViewTraits< void , void , Prop ... >
{
// Ignore an extraneous 'void'
typedef typename ViewTraits<void,Prop...>::execution_space execution_space ;
typedef typename ViewTraits<void,Prop...>::memory_space memory_space ;
typedef typename ViewTraits<void,Prop...>::HostMirrorSpace HostMirrorSpace ;
typedef typename ViewTraits<void,Prop...>::array_layout array_layout ;
typedef typename ViewTraits<void,Prop...>::memory_traits memory_traits ;
};
template< class ArrayLayout , class ... Prop >
struct ViewTraits< typename std::enable_if< Kokkos::Impl::is_array_layout<ArrayLayout>::value >::type , ArrayLayout , Prop ... >
{
// Specify layout, keep subsequent space and memory traits arguments
typedef typename ViewTraits<void,Prop...>::execution_space execution_space ;
typedef typename ViewTraits<void,Prop...>::memory_space memory_space ;
typedef typename ViewTraits<void,Prop...>::HostMirrorSpace HostMirrorSpace ;
typedef ArrayLayout array_layout ;
typedef typename ViewTraits<void,Prop...>::memory_traits memory_traits ;
};
template< class Space , class ... Prop >
struct ViewTraits< typename std::enable_if< Kokkos::Impl::is_space<Space>::value >::type , Space , Prop ... >
{
  // Specify Space; memory traits are the only allowed subsequent argument.
static_assert( std::is_same< typename ViewTraits<void,Prop...>::execution_space , void >::value &&
std::is_same< typename ViewTraits<void,Prop...>::memory_space , void >::value &&
std::is_same< typename ViewTraits<void,Prop...>::HostMirrorSpace , void >::value &&
std::is_same< typename ViewTraits<void,Prop...>::array_layout , void >::value
, "Only one View Execution or Memory Space template argument" );
typedef typename Space::execution_space execution_space ;
typedef typename Space::memory_space memory_space ;
- typedef typename Kokkos::Impl::is_space< Space >::host_mirror_space
- HostMirrorSpace ;
+ typedef typename Kokkos::Impl::HostMirror< Space >::Space HostMirrorSpace ;
typedef typename execution_space::array_layout array_layout ;
typedef typename ViewTraits<void,Prop...>::memory_traits memory_traits ;
};
template< class MemoryTraits , class ... Prop >
struct ViewTraits< typename std::enable_if< Kokkos::Impl::is_memory_traits<MemoryTraits>::value >::type , MemoryTraits , Prop ... >
{
  // Specify memory traits; there should not be any subsequent arguments.
static_assert( std::is_same< typename ViewTraits<void,Prop...>::execution_space , void >::value &&
std::is_same< typename ViewTraits<void,Prop...>::memory_space , void >::value &&
std::is_same< typename ViewTraits<void,Prop...>::array_layout , void >::value &&
std::is_same< typename ViewTraits<void,Prop...>::memory_traits , void >::value
, "MemoryTrait is the final optional template argument for a View" );
typedef void execution_space ;
typedef void memory_space ;
typedef void HostMirrorSpace ;
typedef void array_layout ;
typedef MemoryTraits memory_traits ;
};
template< class DataType , class ... Properties >
struct ViewTraits {
private:
// Unpack the properties arguments
typedef ViewTraits< void , Properties ... > prop ;
typedef typename
std::conditional< ! std::is_same< typename prop::execution_space , void >::value
, typename prop::execution_space
, Kokkos::DefaultExecutionSpace
>::type
ExecutionSpace ;
typedef typename
std::conditional< ! std::is_same< typename prop::memory_space , void >::value
, typename prop::memory_space
, typename ExecutionSpace::memory_space
>::type
MemorySpace ;
typedef typename
std::conditional< ! std::is_same< typename prop::array_layout , void >::value
, typename prop::array_layout
, typename ExecutionSpace::array_layout
>::type
ArrayLayout ;
typedef typename
std::conditional
< ! std::is_same< typename prop::HostMirrorSpace , void >::value
, typename prop::HostMirrorSpace
- , typename Kokkos::Impl::is_space< ExecutionSpace >::host_mirror_space
+ , typename Kokkos::Impl::HostMirror< ExecutionSpace >::Space
>::type
HostMirrorSpace ;
typedef typename
std::conditional< ! std::is_same< typename prop::memory_traits , void >::value
, typename prop::memory_traits
, typename Kokkos::MemoryManaged
>::type
MemoryTraits ;
  // Analyze the data type's properties;
  // may be specialized based upon the layout and value type
- typedef Kokkos::Experimental::Impl::ViewDataAnalysis< DataType , ArrayLayout > data_analysis ;
+ typedef Kokkos::Impl::ViewDataAnalysis< DataType , ArrayLayout > data_analysis ;
public:
//------------------------------------
// Data type traits:
typedef typename data_analysis::type data_type ;
typedef typename data_analysis::const_type const_data_type ;
typedef typename data_analysis::non_const_type non_const_data_type ;
//------------------------------------
// Compatible array of trivial type traits:
typedef typename data_analysis::scalar_array_type scalar_array_type ;
typedef typename data_analysis::const_scalar_array_type const_scalar_array_type ;
typedef typename data_analysis::non_const_scalar_array_type non_const_scalar_array_type ;
//------------------------------------
// Value type traits:
typedef typename data_analysis::value_type value_type ;
typedef typename data_analysis::const_value_type const_value_type ;
typedef typename data_analysis::non_const_value_type non_const_value_type ;
//------------------------------------
// Mapping traits:
typedef ArrayLayout array_layout ;
typedef typename data_analysis::dimension dimension ;
typedef typename data_analysis::specialize specialize /* mapping specialization tag */ ;
enum { rank = dimension::rank };
enum { rank_dynamic = dimension::rank_dynamic };
//------------------------------------
// Execution space, memory space, memory access traits, and host mirror space.
typedef ExecutionSpace execution_space ;
typedef MemorySpace memory_space ;
typedef Kokkos::Device<ExecutionSpace,MemorySpace> device_type ;
typedef MemoryTraits memory_traits ;
typedef HostMirrorSpace host_mirror_space ;
typedef typename MemorySpace::size_type size_type ;
enum { is_hostspace = std::is_same< MemorySpace , HostSpace >::value };
enum { is_managed = MemoryTraits::Unmanaged == 0 };
enum { is_random_access = MemoryTraits::RandomAccess == 1 };
//------------------------------------
};
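/* Example (minimal sketch): querying compile-time traits through
 * ViewTraits; the data type and space below are illustrative only.
 * \code
 * typedef Kokkos::ViewTraits< double*[3] , Kokkos::HostSpace > traits ;
 * static_assert( traits::rank == 2 , "one run-time and one compile-time dimension" );
 * static_assert( traits::rank_dynamic == 1 , "only the first dimension is run-time" );
 * \endcode
 */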
/** \class View
* \brief View to an array of data.
*
* A View represents an array of one or more dimensions.
* For details, please refer to Kokkos' tutorial materials.
*
* \section Kokkos_View_TemplateParameters Template parameters
*
* This class has both required and optional template parameters. The
* \c DataType parameter must always be provided, and must always be
 * first. The remaining template parameters are optional and are
 * interpreted by category rather than by position. When explaining
 * the template parameters, we will refer to these valid categories
 * of template parameters (array layout, space, and memory traits),
 * in whatever order they may occur.
*
* Valid ways in which template arguments may be specified:
* - View< DataType >
* - View< DataType , Layout >
* - View< DataType , Layout , Space >
* - View< DataType , Layout , Space , MemoryTraits >
* - View< DataType , Space >
* - View< DataType , Space , MemoryTraits >
* - View< DataType , MemoryTraits >
*
* \tparam DataType (required) This indicates both the type of each
* entry of the array, and the combination of compile-time and
* run-time array dimension(s). For example, <tt>double*</tt>
* indicates a one-dimensional array of \c double with run-time
* dimension, and <tt>int*[3]</tt> a two-dimensional array of \c int
* with run-time first dimension and compile-time second dimension
* (of 3). In general, the run-time dimensions (if any) must go
* first, followed by zero or more compile-time dimensions. For
* more examples, please refer to the tutorial materials.
*
 * \tparam Space (optional) The execution or memory space. If not
 * specified, this defaults to Kokkos::DefaultExecutionSpace.
*
* \tparam Layout (optional) The array's layout in memory. For
* example, LayoutLeft indicates a column-major (Fortran style)
* layout, and LayoutRight a row-major (C style) layout. If not
* specified, this defaults to the preferred layout for the
* <tt>Space</tt>.
*
* \tparam MemoryTraits (optional) Assertion of the user's intended
* access behavior. For example, RandomAccess indicates read-only
* access with limited spatial locality, and Unmanaged lets users
* wrap externally allocated memory in a View without automatic
* deallocation.
*
* \section Kokkos_View_MT MemoryTraits discussion
*
* \subsection Kokkos_View_MT_Interp MemoryTraits interpretation depends on Space
*
* Some \c MemoryTraits options may have different interpretations for
* different \c Space types. For example, with the Cuda device,
* \c RandomAccess tells Kokkos to fetch the data through the texture
* cache, whereas the non-GPU devices have no such hardware construct.
*
* \subsection Kokkos_View_MT_PrefUse Preferred use of MemoryTraits
*
* Users should defer applying the optional \c MemoryTraits parameter
* until the point at which they actually plan to rely on it in a
* computational kernel. This minimizes the number of template
* parameters exposed in their code, which reduces the cost of
* compilation. Users may always assign a View without specified
* \c MemoryTraits to a compatible View with that specification.
* For example:
* \code
* // Pass in the simplest types of View possible.
* void
* doSomething (View<double*, Cuda> out,
* View<const double*, Cuda> in)
* {
* // Assign the "generic" View in to a RandomAccess View in_rr.
* // Note that RandomAccess View objects must have const data.
* View<const double*, Cuda, RandomAccess> in_rr = in;
* // ... do something with in_rr and out ...
* }
* \endcode
*/
template< class DataType , class ... Properties >
class View ;
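/* Example (minimal sketch): View declarations following the DataType,
 * Layout, and Space conventions documented above; labels and extents
 * are illustrative only.
 * \code
 * Kokkos::View<double*>   a( "a" , 100 );  // 1-D, run-time extent 100
 * Kokkos::View<int*[3]>   b( "b" , 50 );   // 2-D, run-time 50 x compile-time 3
 * Kokkos::View<const double*> c = a ;      // const view of the same data
 * \endcode
 */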
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#include <impl/KokkosExp_ViewMapping.hpp>
-#include <impl/KokkosExp_ViewArray.hpp>
+#include <impl/Kokkos_ViewMapping.hpp>
+#include <impl/Kokkos_ViewArray.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace {
-constexpr Kokkos::Experimental::Impl::ALL_t
- ALL = Kokkos::Experimental::Impl::ALL_t();
+constexpr Kokkos::Impl::ALL_t
+ ALL = Kokkos::Impl::ALL_t();
-constexpr Kokkos::Experimental::Impl::WithoutInitializing_t
- WithoutInitializing = Kokkos::Experimental::Impl::WithoutInitializing_t();
+constexpr Kokkos::Impl::WithoutInitializing_t
+ WithoutInitializing = Kokkos::Impl::WithoutInitializing_t();
-constexpr Kokkos::Experimental::Impl::AllowPadding_t
- AllowPadding = Kokkos::Experimental::Impl::AllowPadding_t();
+constexpr Kokkos::Impl::AllowPadding_t
+ AllowPadding = Kokkos::Impl::AllowPadding_t();
}
/** \brief Create View allocation parameter bundle from argument list.
*
* Valid argument list members are:
* 1) label as a "string" or std::string
* 2) memory space instance of the View::memory_space type
* 3) execution space instance compatible with the View::memory_space
* 4) Kokkos::WithoutInitializing to bypass initialization
 * 5) Kokkos::AllowPadding to allow the allocation to pad dimensions for memory alignment
*/
template< class ... Args >
inline
Impl::ViewCtorProp< typename Impl::ViewCtorProp< void , Args >::type ... >
view_alloc( Args const & ... args )
{
typedef
Impl::ViewCtorProp< typename Impl::ViewCtorProp< void , Args >::type ... >
return_type ;
static_assert( ! return_type::has_pointer
, "Cannot give pointer-to-memory for view allocation" );
return return_type( args... );
}
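/* Example (minimal sketch): bundling allocation properties with
 * view_alloc(); the label and extent are illustrative only.
 * \code
 * // Allocate without initializing the data (e.g. it will be overwritten anyway).
 * Kokkos::View<double*> a( Kokkos::view_alloc( "a" , Kokkos::WithoutInitializing ) , 100 );
 * \endcode
 */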
template< class ... Args >
inline
Impl::ViewCtorProp< typename Impl::ViewCtorProp< void , Args >::type ... >
view_wrap( Args const & ... args )
{
typedef
Impl::ViewCtorProp< typename Impl::ViewCtorProp< void , Args >::type ... >
return_type ;
static_assert( ! return_type::has_memory_space &&
! return_type::has_execution_space &&
! return_type::has_label &&
return_type::has_pointer
, "Must only give pointer-to-memory for view wrapping" );
return return_type( args... );
}
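/* Example (minimal sketch): wrapping user-provided memory with
 * view_wrap(); the resulting View is unmanaged, so the caller keeps
 * ownership of the buffer. The HostSpace buffer is illustrative only.
 * \code
 * double * const user_ptr = new double[100];
 * Kokkos::View< double* , Kokkos::HostSpace > w( Kokkos::view_wrap( user_ptr ) , 100 );
 * // ... use w ...
 * delete [] user_ptr ; // caller-owned; the View does not deallocate
 * \endcode
 */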
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
template< class DataType , class ... Properties >
class View ;
template< class > struct is_view : public std::false_type {};
template< class D, class ... P >
struct is_view< View<D,P...> > : public std::true_type {};
template< class D, class ... P >
struct is_view< const View<D,P...> > : public std::true_type {};
template< class DataType , class ... Properties >
class View : public ViewTraits< DataType , Properties ... > {
private:
template< class , class ... > friend class View ;
- template< class , class ... > friend class Impl::ViewMapping ;
+ template< class , class ... > friend class Kokkos::Impl::ViewMapping ;
public:
typedef ViewTraits< DataType , Properties ... > traits ;
private:
- typedef Kokkos::Experimental::Impl::ViewMapping< traits , void > map_type ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
+ typedef Kokkos::Impl::ViewMapping< traits , void > map_type ;
+ typedef Kokkos::Impl::SharedAllocationTracker track_type ;
track_type m_track ;
map_type m_map ;
public:
//----------------------------------------
/** \brief Compatible view of array of scalar types */
typedef View< typename traits::scalar_array_type ,
typename traits::array_layout ,
typename traits::device_type ,
typename traits::memory_traits >
array_type ;
/** \brief Compatible view of const data type */
typedef View< typename traits::const_data_type ,
typename traits::array_layout ,
typename traits::device_type ,
typename traits::memory_traits >
const_type ;
/** \brief Compatible view of non-const data type */
typedef View< typename traits::non_const_data_type ,
typename traits::array_layout ,
typename traits::device_type ,
typename traits::memory_traits >
non_const_type ;
/** \brief Compatible HostMirror view */
typedef View< typename traits::non_const_data_type ,
typename traits::array_layout ,
typename traits::host_mirror_space >
HostMirror ;
//----------------------------------------
// Domain rank and extents
enum { Rank = map_type::Rank };
/** \brief rank() to be implemented
*/
//KOKKOS_INLINE_FUNCTION
//static
//constexpr unsigned rank() { return map_type::Rank; }
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
typename std::enable_if< std::is_integral<iType>::value , size_t >::type
extent( const iType & r ) const
{ return m_map.extent(r); }
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
typename std::enable_if< std::is_integral<iType>::value , int >::type
extent_int( const iType & r ) const
{ return static_cast<int>(m_map.extent(r)); }
KOKKOS_INLINE_FUNCTION constexpr
typename traits::array_layout layout() const
{ return m_map.layout(); }
//----------------------------------------
/* Deprecate all 'dimension' functions in favor of
* ISO/C++ vocabulary 'extent'.
*/
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr
typename std::enable_if< std::is_integral<iType>::value , size_t >::type
dimension( const iType & r ) const { return extent( r ); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return m_map.dimension_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return m_map.dimension_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return m_map.dimension_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return m_map.dimension_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return m_map.dimension_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return m_map.dimension_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return m_map.dimension_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return m_map.dimension_7(); }
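  /* Example (minimal sketch): prefer the ISO/C++ vocabulary 'extent'
   * over the deprecated 'dimension' queries; the view below is
   * illustrative only.
   * \code
   * Kokkos::View<double**> v( "v" , 10 , 20 );
   * const size_t n0 = v.extent(0);      // 10, preferred spelling
   * const size_t n1 = v.dimension_1();  // 20, deprecated spelling
   * \endcode
   */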
//----------------------------------------
KOKKOS_INLINE_FUNCTION constexpr size_t size() const { return m_map.dimension_0() *
m_map.dimension_1() *
m_map.dimension_2() *
m_map.dimension_3() *
m_map.dimension_4() *
m_map.dimension_5() *
m_map.dimension_6() *
m_map.dimension_7(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return m_map.stride_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return m_map.stride_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return m_map.stride_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return m_map.stride_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return m_map.stride_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return m_map.stride_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return m_map.stride_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return m_map.stride_7(); }
template< typename iType >
KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { m_map.stride(s); }
//----------------------------------------
  // The span is the smallest range of offsets that contains all members.
typedef typename map_type::reference_type reference_type ;
typedef typename map_type::pointer_type pointer_type ;
enum { reference_type_is_lvalue_reference = std::is_lvalue_reference< reference_type >::value };
KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return m_map.span(); }
// Deprecated, use 'span()' instead
KOKKOS_INLINE_FUNCTION constexpr size_t capacity() const { return m_map.span(); }
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return m_map.span_is_contiguous(); }
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const { return m_map.data(); }
  // Deprecated, use 'span_is_contiguous()' instead
KOKKOS_INLINE_FUNCTION constexpr bool is_contiguous() const { return m_map.span_is_contiguous(); }
// Deprecated, use 'data()' instead
KOKKOS_INLINE_FUNCTION constexpr pointer_type ptr_on_device() const { return m_map.data(); }
//----------------------------------------
// Allow specializations to query their specialized map
KOKKOS_INLINE_FUNCTION
- const Kokkos::Experimental::Impl::ViewMapping< traits , void > &
+ const Kokkos::Impl::ViewMapping< traits , void > &
implementation_map() const { return m_map ; }
//----------------------------------------
private:
enum {
is_layout_left = std::is_same< typename traits::array_layout
, Kokkos::LayoutLeft >::value ,
is_layout_right = std::is_same< typename traits::array_layout
, Kokkos::LayoutRight >::value ,
is_layout_stride = std::is_same< typename traits::array_layout
, Kokkos::LayoutStride >::value ,
is_default_map =
std::is_same< typename traits::specialize , void >::value &&
( is_layout_left || is_layout_right || is_layout_stride )
};
+ template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
+ { KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
+
+ template< class Space > struct verify_space<Space,false>
+ { KOKKOS_FORCEINLINE_FUNCTION static void check()
+ { Kokkos::abort("Kokkos::View ERROR: attempt to access inaccessible memory space"); };
+ };
+
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
#define KOKKOS_VIEW_OPERATOR_VERIFY( ARG ) \
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace \
- < Kokkos::Impl::ActiveExecutionMemorySpace , typename traits::memory_space >::verify(); \
- Kokkos::Experimental::Impl::view_verify_operator_bounds ARG ;
+ View::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check(); \
+ Kokkos::Impl::view_verify_operator_bounds ARG ;
#else
#define KOKKOS_VIEW_OPERATOR_VERIFY( ARG ) \
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace \
- < Kokkos::Impl::ActiveExecutionMemorySpace , typename traits::memory_space >::verify();
+ View::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
#endif
public:
//------------------------------
// Rank 0 operator()
template< class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<( Kokkos::Impl::are_integral<Args...>::value
&& ( 0 == Rank )
), reference_type >::type
operator()( Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,args...) )
+ #endif
return m_map.reference();
}
//------------------------------
// Rank 1 operator()
template< typename I0
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,Args...>::value
&& ( 1 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,args...) )
+ #endif
return m_map.reference(i0);
}
template< typename I0
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,Args...>::value
&& ( 1 == Rank )
&& is_default_map
&& ! is_layout_stride
), reference_type >::type
operator()( const I0 & i0
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,args...) )
+
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,args...) )
+ #endif
return m_map.m_handle[ i0 ];
}
template< typename I0
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,Args...>::value
&& ( 1 == Rank )
&& is_default_map
&& is_layout_stride
), reference_type >::type
operator()( const I0 & i0
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset.m_stride.S0 * i0 ];
}
//------------------------------
// Rank 1 operator[]
template< typename I0 >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0>::value
&& ( 1 == Rank )
&& ! is_default_map
), reference_type >::type
operator[]( const I0 & i0 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
+ #endif
return m_map.reference(i0);
}
template< typename I0 >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0>::value
&& ( 1 == Rank )
&& is_default_map
&& ! is_layout_stride
), reference_type >::type
operator[]( const I0 & i0 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
+ #endif
return m_map.m_handle[ i0 ];
}
template< typename I0 >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0>::value
&& ( 1 == Rank )
&& is_default_map
&& is_layout_stride
), reference_type >::type
operator[]( const I0 & i0 ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
+ #endif
return m_map.m_handle[ m_map.m_offset.m_stride.S0 * i0 ];
}
//------------------------------
// Rank 2
template< typename I0 , typename I1
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,Args...>::value
&& ( 2 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,args...) )
+ #endif
return m_map.reference(i0,i1);
}
template< typename I0 , typename I1
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,Args...>::value
&& ( 2 == Rank )
&& is_default_map
&& is_layout_left && ( traits::rank_dynamic == 0 )
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,args...) )
+ #endif
return m_map.m_handle[ i0 + m_map.m_offset.m_dim.N0 * i1 ];
}
template< typename I0 , typename I1
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,Args...>::value
&& ( 2 == Rank )
&& is_default_map
&& is_layout_left && ( traits::rank_dynamic != 0 )
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,args...) )
+ #endif
return m_map.m_handle[ i0 + m_map.m_offset.m_stride * i1 ];
}
template< typename I0 , typename I1
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,Args...>::value
&& ( 2 == Rank )
&& is_default_map
&& is_layout_right && ( traits::rank_dynamic == 0 )
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,args...) )
+ #endif
return m_map.m_handle[ i1 + m_map.m_offset.m_dim.N1 * i0 ];
}
template< typename I0 , typename I1
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,Args...>::value
&& ( 2 == Rank )
&& is_default_map
&& is_layout_right && ( traits::rank_dynamic != 0 )
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,args...) )
+ #endif
return m_map.m_handle[ i1 + m_map.m_offset.m_stride * i0 ];
}
template< typename I0 , typename I1
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,Args...>::value
&& ( 2 == Rank )
&& is_default_map
&& is_layout_stride
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,args...) )
+ #endif
return m_map.m_handle[ i0 * m_map.m_offset.m_stride.S0 +
i1 * m_map.m_offset.m_stride.S1 ];
}
//------------------------------
// Rank 3
template< typename I0 , typename I1 , typename I2
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,Args...>::value
&& ( 3 == Rank )
&& is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset(i0,i1,i2) ];
}
template< typename I0 , typename I1 , typename I2
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,Args...>::value
&& ( 3 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,args...) )
+ #endif
return m_map.reference(i0,i1,i2);
}
//------------------------------
// Rank 4
template< typename I0 , typename I1 , typename I2 , typename I3
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,Args...>::value
&& ( 4 == Rank )
&& is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset(i0,i1,i2,i3) ];
}
template< typename I0 , typename I1 , typename I2 , typename I3
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,Args...>::value
&& ( 4 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,args...) )
+ #endif
return m_map.reference(i0,i1,i2,i3);
}
//------------------------------
// Rank 5
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,Args...>::value
&& ( 5 == Rank )
&& is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset(i0,i1,i2,i3,i4) ];
}
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,Args...>::value
&& ( 5 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,args...) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4);
}
//------------------------------
// Rank 6
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,I5,Args...>::value
&& ( 6 == Rank )
&& is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,i5,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,i5,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset(i0,i1,i2,i3,i4,i5) ];
}
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,I5,Args...>::value
&& ( 6 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,i5,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,i5,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,args...) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,i5);
}
//------------------------------
// Rank 7
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,I5,I6,Args...>::value
&& ( 7 == Rank )
&& is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,i5,i6,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,i5,i6,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset(i0,i1,i2,i3,i4,i5,i6) ];
}
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,I5,I6,Args...>::value
&& ( 7 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,i5,i6,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,i5,i6,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6,args...) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
}
//------------------------------
// Rank 8
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6 , typename I7
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,I5,I6,I7,Args...>::value
&& ( 8 == Rank )
&& is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6 , const I7 & i7
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,i5,i6,i7,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,i5,i6,i7,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6,i7,args...) )
+ #endif
return m_map.m_handle[ m_map.m_offset(i0,i1,i2,i3,i4,i5,i6,i7) ];
}
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6 , typename I7
, class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<
( Kokkos::Impl::are_integral<I0,I1,I2,I3,I4,I5,I6,I7,Args...>::value
&& ( 8 == Rank )
&& ! is_default_map
), reference_type >::type
operator()( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6 , const I7 & i7
, Args ... args ) const
{
- KOKKOS_VIEW_OPERATOR_VERIFY( (m_map,i0,i1,i2,i3,i4,i5,i6,i7,args...) )
+ #ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ KOKKOS_VIEW_OPERATOR_VERIFY( (NULL,m_map,i0,i1,i2,i3,i4,i5,i6,i7,args...) )
+ #else
+ KOKKOS_VIEW_OPERATOR_VERIFY( (m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6,i7,args...) )
+ #endif
return m_map.reference(i0,i1,i2,i3,i4,i5,i6,i7);
}
#undef KOKKOS_VIEW_OPERATOR_VERIFY
//----------------------------------------
// Standard destructor, constructors, and assignment operators
KOKKOS_INLINE_FUNCTION
~View() {}
KOKKOS_INLINE_FUNCTION
View() : m_track(), m_map() {}
KOKKOS_INLINE_FUNCTION
View( const View & rhs ) : m_track( rhs.m_track ), m_map( rhs.m_map ) {}
KOKKOS_INLINE_FUNCTION
View( View && rhs ) : m_track( rhs.m_track ), m_map( rhs.m_map ) {}
KOKKOS_INLINE_FUNCTION
View & operator = ( const View & rhs ) { m_track = rhs.m_track ; m_map = rhs.m_map ; return *this ; }
KOKKOS_INLINE_FUNCTION
View & operator = ( View && rhs ) { m_track = rhs.m_track ; m_map = rhs.m_map ; return *this ; }
//----------------------------------------
// Compatible view copy constructor and assignment
// may assign unmanaged from managed.
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
View( const View<RT,RP...> & rhs )
: m_track( rhs.m_track , traits::is_managed )
, m_map()
{
typedef typename View<RT,RP...>::traits SrcTraits ;
- typedef Kokkos::Experimental::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
+ typedef Kokkos::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
static_assert( Mapping::is_assignable , "Incompatible View copy construction" );
Mapping::assign( m_map , rhs.m_map , rhs.m_track );
}
template< class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
View & operator = ( const View<RT,RP...> & rhs )
{
typedef typename View<RT,RP...>::traits SrcTraits ;
- typedef Kokkos::Experimental::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
+ typedef Kokkos::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
static_assert( Mapping::is_assignable , "Incompatible View copy assignment" );
Mapping::assign( m_map , rhs.m_map , rhs.m_track );
m_track.assign( rhs.m_track , traits::is_managed );
return *this ;
}
//----------------------------------------
// Compatible subview constructor
// may assign unmanaged from managed.
template< class RT , class ... RP , class Arg0 , class ... Args >
KOKKOS_INLINE_FUNCTION
View( const View< RT , RP... > & src_view
, const Arg0 & arg0 , Args ... args )
: m_track( src_view.m_track , traits::is_managed )
, m_map()
{
typedef View< RT , RP... > SrcType ;
- typedef Kokkos::Experimental::Impl::ViewMapping
+ typedef Kokkos::Impl::ViewMapping
< void /* deduce destination view type from source view traits */
, typename SrcType::traits
, Arg0 , Args... > Mapping ;
typedef typename Mapping::type DstType ;
- static_assert( Kokkos::Experimental::Impl::ViewMapping< traits , typename DstType::traits , void >::is_assignable
+ static_assert( Kokkos::Impl::ViewMapping< traits , typename DstType::traits , void >::is_assignable
, "Subview construction requires compatible view and subview arguments" );
Mapping::assign( m_map, src_view.m_map, arg0 , args... );
}
//----------------------------------------
// Allocation tracking properties
KOKKOS_INLINE_FUNCTION
int use_count() const
{ return m_track.use_count(); }
inline
const std::string label() const
{ return m_track.template get_label< typename traits::memory_space >(); }
//----------------------------------------
// Allocation according to allocation properties and array layout
template< class ... P >
explicit inline
View( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< ! Impl::ViewCtorProp< P... >::has_pointer
, typename traits::array_layout
>::type const & arg_layout
)
: m_track()
, m_map()
{
    // Append layout and space properties if they were not supplied
typedef Impl::ViewCtorProp< P ... > alloc_prop_input ;
// use 'std::integral_constant<unsigned,I>' for non-types
// to avoid duplicate class error.
typedef Impl::ViewCtorProp
< P ...
, typename std::conditional
< alloc_prop_input::has_label
, std::integral_constant<unsigned,0>
, typename std::string
>::type
, typename std::conditional
< alloc_prop_input::has_memory_space
, std::integral_constant<unsigned,1>
, typename traits::device_type::memory_space
>::type
, typename std::conditional
< alloc_prop_input::has_execution_space
, std::integral_constant<unsigned,2>
, typename traits::device_type::execution_space
>::type
> alloc_prop ;
static_assert( traits::is_managed
, "View allocation constructor requires managed memory" );
if ( alloc_prop::initialize &&
! alloc_prop::execution_space::is_initialized() ) {
// If initializing view data then
// the execution space must be initialized.
Kokkos::Impl::throw_runtime_exception("Constructing View and initializing data with uninitialized execution space");
}
// Copy the input allocation properties with possibly defaulted properties
alloc_prop prop( arg_prop );
//------------------------------------------------------------
#if defined( KOKKOS_HAVE_CUDA )
    // If allocating in CudaUVMSpace, we must fence before and after
    // the allocation to protect against possible concurrent access
    // on the CPU and the GPU.
    // Fence using the trait's execution space (which will be Kokkos::Cuda)
    // to avoid incomplete type errors from using Kokkos::Cuda directly.
if ( std::is_same< Kokkos::CudaUVMSpace , typename traits::device_type::memory_space >::value ) {
traits::device_type::memory_space::execution_space::fence();
}
#endif
//------------------------------------------------------------
- Kokkos::Experimental::Impl::SharedAllocationRecord<> *
+ Kokkos::Impl::SharedAllocationRecord<> *
record = m_map.allocate_shared( prop , arg_layout );
//------------------------------------------------------------
#if defined( KOKKOS_HAVE_CUDA )
if ( std::is_same< Kokkos::CudaUVMSpace , typename traits::device_type::memory_space >::value ) {
traits::device_type::memory_space::execution_space::fence();
}
#endif
//------------------------------------------------------------
// Setup and initialization complete, start tracking
m_track.assign_allocated_record_to_uninitialized( record );
}
// Wrap memory according to properties and array layout
template< class ... P >
explicit KOKKOS_INLINE_FUNCTION
View( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< Impl::ViewCtorProp< P... >::has_pointer
, typename traits::array_layout
>::type const & arg_layout
)
: m_track() // No memory tracking
, m_map( arg_prop , arg_layout )
{
static_assert(
std::is_same< pointer_type
, typename Impl::ViewCtorProp< P... >::pointer_type
>::value ,
"Constructing View to wrap user memory must supply matching pointer type" );
}
// Simple dimension-only layout
template< class ... P >
explicit inline
View( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< ! Impl::ViewCtorProp< P... >::has_pointer
, size_t
>::type const arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
: View( arg_prop
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
template< class ... P >
explicit KOKKOS_INLINE_FUNCTION
View( const Impl::ViewCtorProp< P ... > & arg_prop
, typename std::enable_if< Impl::ViewCtorProp< P... >::has_pointer
, size_t
>::type const arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
: View( arg_prop
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
// Allocate with label and layout
template< typename Label >
explicit inline
View( const Label & arg_label
, typename std::enable_if<
- Kokkos::Experimental::Impl::is_view_label<Label>::value ,
+ Kokkos::Impl::is_view_label<Label>::value ,
typename traits::array_layout >::type const & arg_layout
)
: View( Impl::ViewCtorProp< std::string >( arg_label ) , arg_layout )
{}
  // Allocate with label and dimensions; must disambiguate from the subview constructor.
template< typename Label >
explicit inline
View( const Label & arg_label
, typename std::enable_if<
- Kokkos::Experimental::Impl::is_view_label<Label>::value ,
+ Kokkos::Impl::is_view_label<Label>::value ,
const size_t >::type arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
: View( Impl::ViewCtorProp< std::string >( arg_label )
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
// For backward compatibility
explicit inline
View( const ViewAllocateWithoutInitializing & arg_prop
, const typename traits::array_layout & arg_layout
)
- : View( Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing )
+ : View( Impl::ViewCtorProp< std::string , Kokkos::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::WithoutInitializing )
, arg_layout
)
{}
explicit inline
View( const ViewAllocateWithoutInitializing & arg_prop
, const size_t arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
- : View( Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing )
+ : View( Impl::ViewCtorProp< std::string , Kokkos::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::WithoutInitializing )
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
//----------------------------------------
// Memory span required to wrap these dimensions.
static constexpr size_t required_allocation_size(
const size_t arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
{
return map_type::memory_span(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) );
}
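  /* Example (minimal sketch): sizing a caller-owned buffer with
   * required_allocation_size() before wrapping it; the extents and the
   * HostSpace view type are illustrative only.
   * \code
   * typedef Kokkos::View< double** , Kokkos::HostSpace > view_type ;
   * const size_t bytes = view_type::required_allocation_size( 10 , 20 );
   * double * const buffer = static_cast<double*>( std::malloc( bytes ) );
   * view_type w( buffer , 10 , 20 ); // unmanaged wrap of 'buffer'
   * \endcode
   */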
explicit KOKKOS_INLINE_FUNCTION
View( pointer_type arg_ptr
, const size_t arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0
)
: View( Impl::ViewCtorProp<pointer_type>(arg_ptr)
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
explicit KOKKOS_INLINE_FUNCTION
View( pointer_type arg_ptr
, const typename traits::array_layout & arg_layout
)
: View( Impl::ViewCtorProp<pointer_type>(arg_ptr) , arg_layout )
{}
//----------------------------------------
// Shared scratch memory constructor
static inline
size_t shmem_size( const size_t arg_N0 = ~size_t(0) ,
const size_t arg_N1 = ~size_t(0) ,
const size_t arg_N2 = ~size_t(0) ,
const size_t arg_N3 = ~size_t(0) ,
const size_t arg_N4 = ~size_t(0) ,
const size_t arg_N5 = ~size_t(0) ,
const size_t arg_N6 = ~size_t(0) ,
const size_t arg_N7 = ~size_t(0) )
{
const size_t num_passed_args =
( arg_N0 != ~size_t(0) ) + ( arg_N1 != ~size_t(0) ) + ( arg_N2 != ~size_t(0) ) +
( arg_N3 != ~size_t(0) ) + ( arg_N4 != ~size_t(0) ) + ( arg_N5 != ~size_t(0) ) +
( arg_N6 != ~size_t(0) ) + ( arg_N7 != ~size_t(0) );
if ( std::is_same<typename traits::specialize,void>::value && num_passed_args != traits::rank_dynamic ) {
Kokkos::abort( "Kokkos::View::shmem_size() rank_dynamic != number of arguments.\n" );
}
return map_type::memory_span(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) );
}
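  /* Example (minimal sketch): shmem_size() reports the scratch bytes
   * a team-shared View of the given extents requires, assuming it is
   * later constructed from a team's scratch memory; 'ExecSpace' and
   * 'team' below are placeholders.
   * \code
   * typedef Kokkos::View< double* , ExecSpace::scratch_memory_space ,
   *                       Kokkos::MemoryTraits< Kokkos::Unmanaged > > shared_view ;
   * const size_t bytes = shared_view::shmem_size( 128 );
   * // request 'bytes' of per-team scratch from the execution policy, then:
   * // shared_view s( team.team_scratch(0) , 128 );
   * \endcode
   */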
explicit KOKKOS_INLINE_FUNCTION
View( const typename traits::execution_space::scratch_memory_space & arg_space
, const typename traits::array_layout & arg_layout )
: View( Impl::ViewCtorProp<pointer_type>(
reinterpret_cast<pointer_type>(
arg_space.get_shmem( map_type::memory_span( arg_layout ) ) ) )
, arg_layout )
{}
explicit KOKKOS_INLINE_FUNCTION
View( const typename traits::execution_space::scratch_memory_space & arg_space
, const size_t arg_N0 = 0
, const size_t arg_N1 = 0
, const size_t arg_N2 = 0
, const size_t arg_N3 = 0
, const size_t arg_N4 = 0
, const size_t arg_N5 = 0
, const size_t arg_N6 = 0
, const size_t arg_N7 = 0 )
: View( Impl::ViewCtorProp<pointer_type>(
reinterpret_cast<pointer_type>(
arg_space.get_shmem(
map_type::memory_span(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) ) ) ) )
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 )
)
{}
};
/** \brief Temporary free function rank()
* until rank() is implemented
* in the View
*/
template < typename D , class ... P >
KOKKOS_INLINE_FUNCTION
constexpr unsigned rank( const View<D , P...> & V ) { return V.Rank; } //Temporary until added to view
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
template< class V , class ... Args >
using Subview =
- typename Kokkos::Experimental::Impl::ViewMapping
+ typename Kokkos::Impl::ViewMapping
< void /* deduce subview type from source view traits */
, typename V::traits
, Args ...
>::type ;
template< class D, class ... P , class ... Args >
KOKKOS_INLINE_FUNCTION
-typename Kokkos::Experimental::Impl::ViewMapping
+typename Kokkos::Impl::ViewMapping
< void /* deduce subview type from source view traits */
, ViewTraits< D , P... >
, Args ...
>::type
subview( const View< D, P... > & src , Args ... args )
{
static_assert( View< D , P... >::Rank == sizeof...(Args) ,
"subview requires one argument for each source View rank" );
return typename
- Kokkos::Experimental::Impl::ViewMapping
+ Kokkos::Impl::ViewMapping
< void /* deduce subview type from source view traits */
, ViewTraits< D , P ... >
, Args ... >::type( src , args ... );
}
template< class MemoryTraits , class D, class ... P , class ... Args >
KOKKOS_INLINE_FUNCTION
-typename Kokkos::Experimental::Impl::ViewMapping
+typename Kokkos::Impl::ViewMapping
< void /* deduce subview type from source view traits */
, ViewTraits< D , P... >
, Args ...
>::template apply< MemoryTraits >::type
subview( const View< D, P... > & src , Args ... args )
{
static_assert( View< D , P... >::Rank == sizeof...(Args) ,
"subview requires one argument for each source View rank" );
return typename
- Kokkos::Experimental::Impl::ViewMapping
+ Kokkos::Impl::ViewMapping
< void /* deduce subview type from source view traits */
, ViewTraits< D , P ... >
, Args ... >
::template apply< MemoryTraits >
::type( src , args ... );
}
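/* Example (minimal sketch): taking subviews with an index and
 * Kokkos::ALL; the view and extents are illustrative only.
 * \code
 * Kokkos::View<double**> m( "m" , 10 , 20 );
 * auto row5 = Kokkos::subview( m , 5 , Kokkos::ALL );  // rank-1, extent 20
 * auto col3 = Kokkos::subview( m , Kokkos::ALL , 3 );  // rank-1, extent 10
 * \endcode
 */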
-
-
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
template< class LT , class ... LP , class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
bool operator == ( const View<LT,LP...> & lhs ,
const View<RT,RP...> & rhs )
{
// Same data, layout, dimensions
typedef ViewTraits<LT,LP...> lhs_traits ;
typedef ViewTraits<RT,RP...> rhs_traits ;
return
std::is_same< typename lhs_traits::const_value_type ,
typename rhs_traits::const_value_type >::value &&
std::is_same< typename lhs_traits::array_layout ,
typename rhs_traits::array_layout >::value &&
std::is_same< typename lhs_traits::memory_space ,
typename rhs_traits::memory_space >::value &&
unsigned(lhs_traits::rank) == unsigned(rhs_traits::rank) &&
lhs.data() == rhs.data() &&
lhs.span() == rhs.span() &&
lhs.dimension_0() == rhs.dimension_0() &&
lhs.dimension_1() == rhs.dimension_1() &&
lhs.dimension_2() == rhs.dimension_2() &&
lhs.dimension_3() == rhs.dimension_3() &&
lhs.dimension_4() == rhs.dimension_4() &&
lhs.dimension_5() == rhs.dimension_5() &&
lhs.dimension_6() == rhs.dimension_6() &&
lhs.dimension_7() == rhs.dimension_7();
}
template< class LT , class ... LP , class RT , class ... RP >
KOKKOS_INLINE_FUNCTION
bool operator != ( const View<LT,LP...> & lhs ,
const View<RT,RP...> & rhs )
{
return ! ( operator==(lhs,rhs) );
}
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
inline
void shared_allocation_tracking_claim_and_disable()
-{ Kokkos::Experimental::Impl::SharedAllocationRecord<void,void>::tracking_claim_and_disable(); }
+{ Kokkos::Impl::SharedAllocationRecord<void,void>::tracking_claim_and_disable(); }
inline
void shared_allocation_tracking_release_and_enable()
-{ Kokkos::Experimental::Impl::SharedAllocationRecord<void,void>::tracking_release_and_enable(); }
+{ Kokkos::Impl::SharedAllocationRecord<void,void>::tracking_release_and_enable(); }
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
template< class OutputView , typename Enable = void >
struct ViewFill {
typedef typename OutputView::const_value_type const_value_type ;
const OutputView output ;
const_value_type input ;
KOKKOS_INLINE_FUNCTION
void operator()( const size_t i0 ) const
{
const size_t n1 = output.dimension_1();
const size_t n2 = output.dimension_2();
const size_t n3 = output.dimension_3();
const size_t n4 = output.dimension_4();
const size_t n5 = output.dimension_5();
const size_t n6 = output.dimension_6();
const size_t n7 = output.dimension_7();
for ( size_t i1 = 0 ; i1 < n1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < n2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < n3 ; ++i3 ) {
for ( size_t i4 = 0 ; i4 < n4 ; ++i4 ) {
for ( size_t i5 = 0 ; i5 < n5 ; ++i5 ) {
for ( size_t i6 = 0 ; i6 < n6 ; ++i6 ) {
for ( size_t i7 = 0 ; i7 < n7 ; ++i7 ) {
output(i0,i1,i2,i3,i4,i5,i6,i7) = input ;
}}}}}}}
}
ViewFill( const OutputView & arg_out , const_value_type & arg_in )
: output( arg_out ), input( arg_in )
{
typedef typename OutputView::execution_space execution_space ;
typedef Kokkos::RangePolicy< execution_space > Policy ;
const Kokkos::Impl::ParallelFor< ViewFill , Policy > closure( *this , Policy( 0 , output.dimension_0() ) );
closure.execute();
execution_space::fence();
}
};
template< class OutputView >
struct ViewFill< OutputView , typename std::enable_if< OutputView::Rank == 0 >::type > {
ViewFill( const OutputView & dst , const typename OutputView::const_value_type & src )
{
Kokkos::Impl::DeepCopy< typename OutputView::memory_space , Kokkos::HostSpace >
( dst.data() , & src , sizeof(typename OutputView::const_value_type) );
}
};
template< class OutputView , class InputView , class ExecSpace = typename OutputView::execution_space >
struct ViewRemap {
const OutputView output ;
const InputView input ;
const size_t n0 ;
const size_t n1 ;
const size_t n2 ;
const size_t n3 ;
const size_t n4 ;
const size_t n5 ;
const size_t n6 ;
const size_t n7 ;
ViewRemap( const OutputView & arg_out , const InputView & arg_in )
: output( arg_out ), input( arg_in )
, n0( std::min( (size_t)arg_out.dimension_0() , (size_t)arg_in.dimension_0() ) )
, n1( std::min( (size_t)arg_out.dimension_1() , (size_t)arg_in.dimension_1() ) )
, n2( std::min( (size_t)arg_out.dimension_2() , (size_t)arg_in.dimension_2() ) )
, n3( std::min( (size_t)arg_out.dimension_3() , (size_t)arg_in.dimension_3() ) )
, n4( std::min( (size_t)arg_out.dimension_4() , (size_t)arg_in.dimension_4() ) )
, n5( std::min( (size_t)arg_out.dimension_5() , (size_t)arg_in.dimension_5() ) )
, n6( std::min( (size_t)arg_out.dimension_6() , (size_t)arg_in.dimension_6() ) )
, n7( std::min( (size_t)arg_out.dimension_7() , (size_t)arg_in.dimension_7() ) )
{
typedef Kokkos::RangePolicy< ExecSpace > Policy ;
const Kokkos::Impl::ParallelFor< ViewRemap , Policy > closure( *this , Policy( 0 , n0 ) );
closure.execute();
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_t i0 ) const
{
for ( size_t i1 = 0 ; i1 < n1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < n2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < n3 ; ++i3 ) {
for ( size_t i4 = 0 ; i4 < n4 ; ++i4 ) {
for ( size_t i5 = 0 ; i5 < n5 ; ++i5 ) {
for ( size_t i6 = 0 ; i6 < n6 ; ++i6 ) {
for ( size_t i7 = 0 ; i7 < n7 ; ++i7 ) {
output(i0,i1,i2,i3,i4,i5,i6,i7) = input(i0,i1,i2,i3,i4,i5,i6,i7);
}}}}}}}
}
};
} /* namespace Impl */
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
/** \brief Deep copy a value from Host memory into a view. */
template< class DT , class ... DP >
inline
void deep_copy
( const View<DT,DP...> & dst
, typename ViewTraits<DT,DP...>::const_value_type & value
, typename std::enable_if<
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value
>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::non_const_value_type ,
typename ViewTraits<DT,DP...>::value_type >::value
, "deep_copy requires non-const type" );
- Kokkos::Experimental::Impl::ViewFill< View<DT,DP...> >( dst , value );
+ Kokkos::Impl::ViewFill< View<DT,DP...> >( dst , value );
}
/** \brief Deep copy into a value in Host memory from a view. */
template< class ST , class ... SP >
inline
void deep_copy
( typename ViewTraits<ST,SP...>::non_const_value_type & dst
, const View<ST,SP...> & src
, typename std::enable_if<
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value
>::type * = 0 )
{
static_assert( ViewTraits<ST,SP...>::rank == 0
, "ERROR: Non-rank-zero view in deep_copy( value , View )" );
typedef ViewTraits<ST,SP...> src_traits ;
typedef typename src_traits::memory_space src_memory_space ;
Kokkos::Impl::DeepCopy< HostSpace , src_memory_space >( & dst , src.data() , sizeof(ST) );
}
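// A sketch of the two scalar overloads above (placeholder names; assumes an
// initialized runtime): the first fills every element of a view through
// ViewFill, the second reads a rank-zero view back into a host scalar.
inline void example_deep_copy_scalar()
{
  Kokkos::View<double*> a("a", 100);
  Kokkos::deep_copy( a , 3.0 );          // every a(i) becomes 3.0
  Kokkos::View<double> s("s");           // rank-zero view
  Kokkos::deep_copy( s , 42.0 );         // fill its single element
  double host_value = 0.0;
  Kokkos::deep_copy( host_value , s );   // host_value is now 42.0
  (void) host_value ;
}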
//----------------------------------------------------------------------------
/** \brief A deep copy between views of compatible type, and rank zero. */
template< class DT , class ... DP , class ST , class ... SP >
inline
void deep_copy
( const View<DT,DP...> & dst
, const View<ST,SP...> & src
, typename std::enable_if<(
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value &&
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value &&
( unsigned(ViewTraits<DT,DP...>::rank) == unsigned(0) &&
unsigned(ViewTraits<ST,SP...>::rank) == unsigned(0) )
)>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<ST,SP...>::non_const_value_type >::value
, "deep_copy requires matching non-const destination type" );
typedef View<DT,DP...> dst_type ;
typedef View<ST,SP...> src_type ;
typedef typename dst_type::value_type value_type ;
typedef typename dst_type::memory_space dst_memory_space ;
typedef typename src_type::memory_space src_memory_space ;
if ( dst.data() != src.data() ) {
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , sizeof(value_type) );
}
}
//----------------------------------------------------------------------------
/** \brief A deep copy between views of the default specialization, compatible type,
* same non-zero rank, same contiguous layout.
*/
template< class DT , class ... DP , class ST , class ... SP >
inline
void deep_copy
( const View<DT,DP...> & dst
, const View<ST,SP...> & src
, typename std::enable_if<(
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value &&
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value &&
( unsigned(ViewTraits<DT,DP...>::rank) != 0 ||
unsigned(ViewTraits<ST,SP...>::rank) != 0 )
)>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<DT,DP...>::non_const_value_type >::value
, "deep_copy requires non-const destination type" );
static_assert(
( unsigned(ViewTraits<DT,DP...>::rank) ==
unsigned(ViewTraits<ST,SP...>::rank) )
, "deep_copy requires Views of equal rank" );
typedef View<DT,DP...> dst_type ;
typedef View<ST,SP...> src_type ;
typedef typename dst_type::execution_space dst_execution_space ;
typedef typename src_type::execution_space src_execution_space ;
typedef typename dst_type::memory_space dst_memory_space ;
typedef typename src_type::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
enum { SrcExecCanAccessDst =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename src_execution_space::memory_space , dst_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< src_execution_space , dst_memory_space >::accessible };
if ( (void *) dst.data() != (void*) src.data() ) {
// Concern: if the views overlap, then a parallel copy will be erroneous.
// ...
// If same type, equal layout, equal dimensions, equal span, and contiguous memory then can byte-wise copy
if ( std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<ST,SP...>::non_const_value_type >::value &&
(
( std::is_same< typename ViewTraits<DT,DP...>::array_layout ,
typename ViewTraits<ST,SP...>::array_layout >::value
&&
( std::is_same< typename ViewTraits<DT,DP...>::array_layout ,
typename Kokkos::LayoutLeft>::value
||
std::is_same< typename ViewTraits<DT,DP...>::array_layout ,
typename Kokkos::LayoutRight>::value
)
)
||
( ViewTraits<DT,DP...>::rank == 1 &&
ViewTraits<ST,SP...>::rank == 1 )
) &&
dst.span_is_contiguous() &&
src.span_is_contiguous() &&
dst.span() == src.span() &&
dst.dimension_0() == src.dimension_0() &&
dst.dimension_1() == src.dimension_1() &&
dst.dimension_2() == src.dimension_2() &&
dst.dimension_3() == src.dimension_3() &&
dst.dimension_4() == src.dimension_4() &&
dst.dimension_5() == src.dimension_5() &&
dst.dimension_6() == src.dimension_6() &&
dst.dimension_7() == src.dimension_7() ) {
const size_t nbytes = sizeof(typename dst_type::value_type) * dst.span();
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , nbytes );
}
else if ( std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<ST,SP...>::non_const_value_type >::value &&
(
( std::is_same< typename ViewTraits<DT,DP...>::array_layout ,
typename ViewTraits<ST,SP...>::array_layout >::value
&&
std::is_same< typename ViewTraits<DT,DP...>::array_layout ,
typename Kokkos::LayoutStride>::value
)
||
( ViewTraits<DT,DP...>::rank == 1 &&
ViewTraits<ST,SP...>::rank == 1 )
) &&
dst.span_is_contiguous() &&
src.span_is_contiguous() &&
dst.span() == src.span() &&
dst.dimension_0() == src.dimension_0() &&
dst.dimension_1() == src.dimension_1() &&
dst.dimension_2() == src.dimension_2() &&
dst.dimension_3() == src.dimension_3() &&
dst.dimension_4() == src.dimension_4() &&
dst.dimension_5() == src.dimension_5() &&
dst.dimension_6() == src.dimension_6() &&
dst.dimension_7() == src.dimension_7() &&
dst.stride_0() == src.stride_0() &&
dst.stride_1() == src.stride_1() &&
dst.stride_2() == src.stride_2() &&
dst.stride_3() == src.stride_3() &&
dst.stride_4() == src.stride_4() &&
dst.stride_5() == src.stride_5() &&
dst.stride_6() == src.stride_6() &&
dst.stride_7() == src.stride_7()
) {
const size_t nbytes = sizeof(typename dst_type::value_type) * dst.span();
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , nbytes );
}
else if ( DstExecCanAccessSrc ) {
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
- Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
+ Kokkos::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else if ( SrcExecCanAccessDst ) {
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
- Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type , src_execution_space >( dst , src );
+ Kokkos::Impl::ViewRemap< dst_type , src_type , src_execution_space >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
}
-} /* namespace Experimental */
} /* namespace Kokkos */
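// A sketch of how the rank > 0 overload above selects its path (placeholder
// names). Matching value type, layout, extents and contiguous spans use the
// byte-wise DeepCopy branch; mismatched extents fall back to ViewRemap,
// provided one execution space can access both memory spaces.
inline void example_deep_copy_views()
{
  Kokkos::View<double**> dev("dev", 100, 8);
  Kokkos::View<double**>::HostMirror host("host", 100, 8);
  Kokkos::deep_copy( host , dev );    // identical shape and layout: byte-wise copy
  Kokkos::View<double**> small("small", 50, 8);
  Kokkos::deep_copy( small , dev );   // differing extents: ViewRemap copies the overlap
}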
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
/** \brief Deep copy a value from Host memory into a view. */
template< class ExecSpace ,class DT , class ... DP >
inline
void deep_copy
( const ExecSpace &
, const View<DT,DP...> & dst
, typename ViewTraits<DT,DP...>::const_value_type & value
, typename std::enable_if<
Kokkos::Impl::is_execution_space< ExecSpace >::value &&
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value
>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::non_const_value_type ,
typename ViewTraits<DT,DP...>::value_type >::value
, "deep_copy requires non-const type" );
- Kokkos::Experimental::Impl::ViewFill< View<DT,DP...> >( dst , value );
+ Kokkos::Impl::ViewFill< View<DT,DP...> >( dst , value );
}
/** \brief Deep copy into a value in Host memory from a view. */
template< class ExecSpace , class ST , class ... SP >
inline
void deep_copy
( const ExecSpace & exec_space
, typename ViewTraits<ST,SP...>::non_const_value_type & dst
, const View<ST,SP...> & src
, typename std::enable_if<
Kokkos::Impl::is_execution_space< ExecSpace >::value &&
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value
>::type * = 0 )
{
static_assert( ViewTraits<ST,SP...>::rank == 0
, "ERROR: Non-rank-zero view in deep_copy( value , View )" );
typedef ViewTraits<ST,SP...> src_traits ;
typedef typename src_traits::memory_space src_memory_space ;
Kokkos::Impl::DeepCopy< HostSpace , src_memory_space , ExecSpace >
( exec_space , & dst , src.data() , sizeof(ST) );
}
//----------------------------------------------------------------------------
/** \brief A deep copy between views of compatible type, and rank zero. */
template< class ExecSpace , class DT , class ... DP , class ST , class ... SP >
inline
void deep_copy
( const ExecSpace & exec_space
, const View<DT,DP...> & dst
, const View<ST,SP...> & src
, typename std::enable_if<(
Kokkos::Impl::is_execution_space< ExecSpace >::value &&
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value &&
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value &&
( unsigned(ViewTraits<DT,DP...>::rank) == unsigned(0) &&
unsigned(ViewTraits<ST,SP...>::rank) == unsigned(0) )
)>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<ST,SP...>::non_const_value_type >::value
, "deep_copy requires matching non-const destination type" );
typedef View<DT,DP...> dst_type ;
typedef View<ST,SP...> src_type ;
typedef typename dst_type::value_type value_type ;
typedef typename dst_type::memory_space dst_memory_space ;
typedef typename src_type::memory_space src_memory_space ;
if ( dst.data() != src.data() ) {
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space , ExecSpace >
( exec_space , dst.data() , src.data() , sizeof(value_type) );
}
}
//----------------------------------------------------------------------------
/** \brief A deep copy between views of the default specialization, compatible type,
* same non-zero rank, same contiguous layout.
*/
template< class ExecSpace , class DT, class ... DP, class ST, class ... SP >
inline
void deep_copy
( const ExecSpace & exec_space
, const View<DT,DP...> & dst
, const View<ST,SP...> & src
, typename std::enable_if<(
Kokkos::Impl::is_execution_space< ExecSpace >::value &&
std::is_same< typename ViewTraits<DT,DP...>::specialize , void >::value &&
std::is_same< typename ViewTraits<ST,SP...>::specialize , void >::value &&
( unsigned(ViewTraits<DT,DP...>::rank) != 0 ||
unsigned(ViewTraits<ST,SP...>::rank) != 0 )
)>::type * = 0 )
{
static_assert(
std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<DT,DP...>::non_const_value_type >::value
, "deep_copy requires non-const destination type" );
static_assert(
( unsigned(ViewTraits<DT,DP...>::rank) ==
unsigned(ViewTraits<ST,SP...>::rank) )
, "deep_copy requires Views of equal rank" );
typedef View<DT,DP...> dst_type ;
typedef View<ST,SP...> src_type ;
typedef typename dst_type::execution_space dst_execution_space ;
typedef typename src_type::execution_space src_execution_space ;
typedef typename dst_type::memory_space dst_memory_space ;
typedef typename src_type::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
enum { SrcExecCanAccessDst =
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename src_execution_space::memory_space , dst_memory_space >::value };
+ Kokkos::Impl::SpaceAccessibility< src_execution_space , dst_memory_space >::accessible };
if ( (void *) dst.data() != (void*) src.data() ) {
// Concern: if the views overlap, then a parallel copy will be erroneous.
// ...
// If same type, equal layout, equal dimensions, equal span, and contiguous memory then can byte-wise copy
if ( std::is_same< typename ViewTraits<DT,DP...>::value_type ,
typename ViewTraits<ST,SP...>::non_const_value_type >::value &&
(
std::is_same< typename ViewTraits<DT,DP...>::array_layout ,
typename ViewTraits<ST,SP...>::array_layout >::value
||
( ViewTraits<DT,DP...>::rank == 1 &&
ViewTraits<ST,SP...>::rank == 1 )
) &&
dst.span_is_contiguous() &&
src.span_is_contiguous() &&
dst.span() == src.span() &&
dst.dimension_0() == src.dimension_0() &&
dst.dimension_1() == src.dimension_1() &&
dst.dimension_2() == src.dimension_2() &&
dst.dimension_3() == src.dimension_3() &&
dst.dimension_4() == src.dimension_4() &&
dst.dimension_5() == src.dimension_5() &&
dst.dimension_6() == src.dimension_6() &&
dst.dimension_7() == src.dimension_7() ) {
const size_t nbytes = sizeof(typename dst_type::value_type) * dst.span();
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space , ExecSpace >
( exec_space , dst.data() , src.data() , nbytes );
}
else if ( DstExecCanAccessSrc ) {
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
- Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
+ Kokkos::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else if ( SrcExecCanAccessDst ) {
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
- Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type , src_execution_space >( dst , src );
+ Kokkos::Impl::ViewRemap< dst_type , src_type , src_execution_space >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
}
-} /* namespace Experimental */
} /* namespace Kokkos */
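// A sketch of the execution-space-instance overloads above (placeholder
// names): for view-to-view copies the space instance is forwarded to
// DeepCopy, associating the copy with that space rather than the default.
inline void example_deep_copy_with_space()
{
  Kokkos::DefaultExecutionSpace exec ;
  Kokkos::View<int*> a("a", 1000);
  Kokkos::View<int*> b("b", 1000);
  Kokkos::deep_copy( exec , a , 7 );   // fill via ViewFill
  Kokkos::deep_copy( exec , b , a );   // contiguous view-to-view copy
  exec.fence();                        // wait for the copies to complete
}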
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
// Deduce Mirror Types
template<class Space, class T, class ... P>
struct MirrorViewType {
// The incoming view_type
- typedef typename Kokkos::Experimental::View<T,P...> src_view_type;
+ typedef typename Kokkos::View<T,P...> src_view_type;
// The memory space for the mirror view
typedef typename Space::memory_space memory_space;
// Check whether it is the same memory space
enum { is_same_memspace = std::is_same<memory_space,typename src_view_type::memory_space>::value };
// The array_layout
typedef typename src_view_type::array_layout array_layout;
// The data type (we probably want it non-const since otherwise we can't even deep_copy to it).
typedef typename src_view_type::non_const_data_type data_type;
// The destination view type if it is not the same memory space
- typedef Kokkos::Experimental::View<data_type,array_layout,Space> dest_view_type;
+ typedef Kokkos::View<data_type,array_layout,Space> dest_view_type;
// If it is the same memory_space, return the existing view_type
// This will also keep the unmanaged trait if necessary
typedef typename std::conditional<is_same_memspace,src_view_type,dest_view_type>::type view_type;
};
template<class Space, class T, class ... P>
struct MirrorType {
// The incoming view_type
- typedef typename Kokkos::Experimental::View<T,P...> src_view_type;
+ typedef typename Kokkos::View<T,P...> src_view_type;
// The memory space for the mirror view
typedef typename Space::memory_space memory_space;
// Check whether it is the same memory space
enum { is_same_memspace = std::is_same<memory_space,typename src_view_type::memory_space>::value };
// The array_layout
typedef typename src_view_type::array_layout array_layout;
// The data type (we probably want it non-const since otherwise we can't even deep_copy to it).
typedef typename src_view_type::non_const_data_type data_type;
// The destination view type if it is not the same memory space
- typedef Kokkos::Experimental::View<data_type,array_layout,Space> view_type;
+ typedef Kokkos::View<data_type,array_layout,Space> view_type;
};
}
template< class T , class ... P >
inline
-typename Kokkos::Experimental::View<T,P...>::HostMirror
-create_mirror( const Kokkos::Experimental::View<T,P...> & src
+typename Kokkos::View<T,P...>::HostMirror
+create_mirror( const Kokkos::View<T,P...> & src
, typename std::enable_if<
- ! std::is_same< typename Kokkos::Experimental::ViewTraits<T,P...>::array_layout
+ ! std::is_same< typename Kokkos::ViewTraits<T,P...>::array_layout
, Kokkos::LayoutStride >::value
>::type * = 0
)
{
typedef View<T,P...> src_type ;
typedef typename src_type::HostMirror dst_type ;
return dst_type( std::string( src.label() ).append("_mirror")
, src.dimension_0()
, src.dimension_1()
, src.dimension_2()
, src.dimension_3()
, src.dimension_4()
, src.dimension_5()
, src.dimension_6()
, src.dimension_7() );
}
template< class T , class ... P >
inline
-typename Kokkos::Experimental::View<T,P...>::HostMirror
-create_mirror( const Kokkos::Experimental::View<T,P...> & src
+typename Kokkos::View<T,P...>::HostMirror
+create_mirror( const Kokkos::View<T,P...> & src
, typename std::enable_if<
- std::is_same< typename Kokkos::Experimental::ViewTraits<T,P...>::array_layout
+ std::is_same< typename Kokkos::ViewTraits<T,P...>::array_layout
, Kokkos::LayoutStride >::value
>::type * = 0
)
{
typedef View<T,P...> src_type ;
typedef typename src_type::HostMirror dst_type ;
Kokkos::LayoutStride layout ;
layout.dimension[0] = src.dimension_0();
layout.dimension[1] = src.dimension_1();
layout.dimension[2] = src.dimension_2();
layout.dimension[3] = src.dimension_3();
layout.dimension[4] = src.dimension_4();
layout.dimension[5] = src.dimension_5();
layout.dimension[6] = src.dimension_6();
layout.dimension[7] = src.dimension_7();
layout.stride[0] = src.stride_0();
layout.stride[1] = src.stride_1();
layout.stride[2] = src.stride_2();
layout.stride[3] = src.stride_3();
layout.stride[4] = src.stride_4();
layout.stride[5] = src.stride_5();
layout.stride[6] = src.stride_6();
layout.stride[7] = src.stride_7();
return dst_type( std::string( src.label() ).append("_mirror") , layout );
}
// Create a mirror in a new space (specialization for different space)
template<class Space, class T, class ... P>
-typename Impl::MirrorType<Space,T,P ...>::view_type create_mirror(const Space& , const Kokkos::Experimental::View<T,P...> & src) {
+typename Impl::MirrorType<Space,T,P ...>::view_type create_mirror(const Space& , const Kokkos::View<T,P...> & src) {
return typename Impl::MirrorType<Space,T,P ...>::view_type(src.label(),src.layout());
}
template< class T , class ... P >
inline
-typename Kokkos::Experimental::View<T,P...>::HostMirror
-create_mirror_view( const Kokkos::Experimental::View<T,P...> & src
+typename Kokkos::View<T,P...>::HostMirror
+create_mirror_view( const Kokkos::View<T,P...> & src
, typename std::enable_if<(
- std::is_same< typename Kokkos::Experimental::View<T,P...>::memory_space
- , typename Kokkos::Experimental::View<T,P...>::HostMirror::memory_space
+ std::is_same< typename Kokkos::View<T,P...>::memory_space
+ , typename Kokkos::View<T,P...>::HostMirror::memory_space
>::value
&&
- std::is_same< typename Kokkos::Experimental::View<T,P...>::data_type
- , typename Kokkos::Experimental::View<T,P...>::HostMirror::data_type
+ std::is_same< typename Kokkos::View<T,P...>::data_type
+ , typename Kokkos::View<T,P...>::HostMirror::data_type
>::value
)>::type * = 0
)
{
return src ;
}
template< class T , class ... P >
inline
-typename Kokkos::Experimental::View<T,P...>::HostMirror
-create_mirror_view( const Kokkos::Experimental::View<T,P...> & src
+typename Kokkos::View<T,P...>::HostMirror
+create_mirror_view( const Kokkos::View<T,P...> & src
, typename std::enable_if< ! (
- std::is_same< typename Kokkos::Experimental::View<T,P...>::memory_space
- , typename Kokkos::Experimental::View<T,P...>::HostMirror::memory_space
+ std::is_same< typename Kokkos::View<T,P...>::memory_space
+ , typename Kokkos::View<T,P...>::HostMirror::memory_space
>::value
&&
- std::is_same< typename Kokkos::Experimental::View<T,P...>::data_type
- , typename Kokkos::Experimental::View<T,P...>::HostMirror::data_type
+ std::is_same< typename Kokkos::View<T,P...>::data_type
+ , typename Kokkos::View<T,P...>::HostMirror::data_type
>::value
)>::type * = 0
)
{
- return Kokkos::Experimental::create_mirror( src );
+ return Kokkos::create_mirror( src );
}
// Create a mirror view in a new space (specialization for same space)
template<class Space, class T, class ... P>
typename Impl::MirrorViewType<Space,T,P ...>::view_type
-create_mirror_view(const Space& , const Kokkos::Experimental::View<T,P...> & src
+create_mirror_view(const Space& , const Kokkos::View<T,P...> & src
, typename std::enable_if<Impl::MirrorViewType<Space,T,P ...>::is_same_memspace>::type* = 0 ) {
return src;
}
// Create a mirror view in a new space (specialization for different space)
template<class Space, class T, class ... P>
typename Impl::MirrorViewType<Space,T,P ...>::view_type
-create_mirror_view(const Space& , const Kokkos::Experimental::View<T,P...> & src
+create_mirror_view(const Space& , const Kokkos::View<T,P...> & src
, typename std::enable_if<!Impl::MirrorViewType<Space,T,P ...>::is_same_memspace>::type* = 0 ) {
return typename Impl::MirrorViewType<Space,T,P ...>::view_type(src.label(),src.layout());
}
-} /* namespace Experimental */
} /* namespace Kokkos */
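// The usual mirror pattern as a sketch (placeholder names): allocate a
// host-accessible companion of a possibly device-resident view, fill it on
// the host, and deep_copy it across. When the memory spaces already match,
// create_mirror_view returns the original view and the copy is skipped.
inline void example_mirror_pattern()
{
  Kokkos::View<double*> data("data", 16);
  Kokkos::View<double*>::HostMirror h_data = Kokkos::create_mirror_view( data );
  for ( size_t i = 0 ; i < h_data.dimension_0() ; ++i ) { h_data(i) = double(i); }
  Kokkos::deep_copy( data , h_data );
}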
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
/** \brief Resize a view, copying the old data to the new allocation at the corresponding indices. */
template< class T , class ... P >
inline
-void resize( Kokkos::Experimental::View<T,P...> & v ,
+typename std::enable_if<
+ std::is_same<typename Kokkos::View<T,P...>::array_layout,Kokkos::LayoutLeft>::value ||
+ std::is_same<typename Kokkos::View<T,P...>::array_layout,Kokkos::LayoutRight>::value
+>::type
+resize( Kokkos::View<T,P...> & v ,
const size_t n0 = 0 ,
const size_t n1 = 0 ,
const size_t n2 = 0 ,
const size_t n3 = 0 ,
const size_t n4 = 0 ,
const size_t n5 = 0 ,
const size_t n6 = 0 ,
const size_t n7 = 0 )
{
- typedef Kokkos::Experimental::View<T,P...> view_type ;
+ typedef Kokkos::View<T,P...> view_type ;
- static_assert( Kokkos::Experimental::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
+ static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
view_type v_resized( v.label(), n0, n1, n2, n3, n4, n5, n6, n7 );
- Kokkos::Experimental::Impl::ViewRemap< view_type , view_type >( v_resized , v );
+ Kokkos::Impl::ViewRemap< view_type , view_type >( v_resized , v );
v = v_resized ;
}
/** \brief Resize a view, copying the old data to the new allocation at the corresponding indices. */
template< class T , class ... P >
inline
-void realloc( Kokkos::Experimental::View<T,P...> & v ,
+void resize( Kokkos::View<T,P...> & v ,
+ const typename Kokkos::View<T,P...>::array_layout & layout)
+{
+ typedef Kokkos::View<T,P...> view_type ;
+
+ static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
+
+ view_type v_resized( v.label(), layout );
+
+ Kokkos::Impl::ViewRemap< view_type , view_type >( v_resized , v );
+
+ v = v_resized ;
+}
+
+/** \brief Resize a view, discarding the old data. */
+template< class T , class ... P >
+inline
+typename std::enable_if<
+ std::is_same<typename Kokkos::View<T,P...>::array_layout,Kokkos::LayoutLeft>::value ||
+ std::is_same<typename Kokkos::View<T,P...>::array_layout,Kokkos::LayoutRight>::value
+>::type
+realloc( Kokkos::View<T,P...> & v ,
const size_t n0 = 0 ,
const size_t n1 = 0 ,
const size_t n2 = 0 ,
const size_t n3 = 0 ,
const size_t n4 = 0 ,
const size_t n5 = 0 ,
const size_t n6 = 0 ,
const size_t n7 = 0 )
{
- typedef Kokkos::Experimental::View<T,P...> view_type ;
+ typedef Kokkos::View<T,P...> view_type ;
- static_assert( Kokkos::Experimental::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
+ static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
const std::string label = v.label();
v = view_type(); // Deallocate first, if this is the only view to the allocation
v = view_type( label, n0, n1, n2, n3, n4, n5, n6, n7 );
}
-} /* namespace Experimental */
+/** \brief Resize a view, discarding the old data. */
+template< class T , class ... P >
+inline
+void realloc( Kokkos::View<T,P...> & v ,
+ const typename Kokkos::View<T,P...>::array_layout & layout)
+{
+ typedef Kokkos::View<T,P...> view_type ;
+
+ static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
+
+ const std::string label = v.label();
+
+ v = view_type(); // Deallocate first, if this is the only view to the allocation
+ v = view_type( label, layout );
+}
} /* namespace Kokkos */
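// A sketch contrasting the two helpers above (placeholder names): resize
// allocates a new view and runs ViewRemap so existing entries survive at
// their old indices, while realloc simply discards the old contents.
inline void example_resize_realloc()
{
  Kokkos::View<int*> v("v", 8);
  Kokkos::deep_copy( v , 1 );
  Kokkos::resize( v , 16 );    // v(0..7) keep their value of 1, the tail is freshly allocated
  Kokkos::realloc( v , 32 );   // all 32 entries come from a brand new allocation
}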
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
+// For backward compatibility:
namespace Kokkos {
+namespace Experimental {
-template< class D , class ... P >
-using ViewTraits = Kokkos::Experimental::ViewTraits<D,P...> ;
+using Kokkos::ViewTraits ;
+using Kokkos::View ;
+using Kokkos::Subview ;
+using Kokkos::is_view ;
+using Kokkos::subview ;
+using Kokkos::ALL ;
+using Kokkos::WithoutInitializing ;
+using Kokkos::AllowPadding ;
+using Kokkos::view_alloc ;
+using Kokkos::view_wrap ;
+using Kokkos::deep_copy ;
+using Kokkos::create_mirror ;
+using Kokkos::create_mirror_view ;
+using Kokkos::resize ;
+using Kokkos::realloc ;
-using Experimental::View ; //modified due to gcc parser bug
-//template< class D , class ... P >
-//using View = Kokkos::Experimental::View<D,P...> ;
+namespace Impl {
-using Kokkos::Experimental::ALL ;
-using Kokkos::Experimental::WithoutInitializing ;
-using Kokkos::Experimental::AllowPadding ;
-using Kokkos::Experimental::view_alloc ;
-using Kokkos::Experimental::view_wrap ;
+using Kokkos::Impl::ViewFill ;
+using Kokkos::Impl::ViewRemap ;
+using Kokkos::Impl::ViewCtorProp ;
+using Kokkos::Impl::is_view_label ;
+using Kokkos::Impl::WithoutInitializing_t ;
+using Kokkos::Impl::AllowPadding_t ;
+using Kokkos::Impl::SharedAllocationRecord ;
+using Kokkos::Impl::SharedAllocationTracker ;
-using Kokkos::Experimental::deep_copy ;
-using Kokkos::Experimental::create_mirror ;
-using Kokkos::Experimental::create_mirror_view ;
-using Kokkos::Experimental::subview ;
-using Kokkos::Experimental::resize ;
-using Kokkos::Experimental::realloc ;
-using Kokkos::Experimental::is_view ;
+} /* namespace Impl */
+} /* namespace Experimental */
+} /* namespace Kokkos */
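// With the using-declarations above, code written against the older
// Kokkos::Experimental spellings keeps compiling unchanged; a sketch with
// placeholder names:
inline void example_experimental_alias()
{
  Kokkos::Experimental::View<float*> old_style("old_style", 4);
  Kokkos::View<float*>               new_style = old_style ;   // same type
  Kokkos::Experimental::deep_copy( old_style , 0.5f );
  (void) new_style ;
}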
+namespace Kokkos {
namespace Impl {
-using Kokkos::Experimental::is_view ;
-
-class ViewDefault {};
+using Kokkos::is_view ;
template< class SrcViewType
, class Arg0Type
, class Arg1Type
, class Arg2Type
, class Arg3Type
, class Arg4Type
, class Arg5Type
, class Arg6Type
, class Arg7Type
>
struct ViewSubview /* { typedef ... type ; } */ ;
-}
-
+} /* namespace Impl */
} /* namespace Kokkos */
#include <impl/Kokkos_Atomic_View.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_VIEW_HPP */
diff --git a/lib/kokkos/core/src/Makefile b/lib/kokkos/core/src/Makefile
index dc27d341a..316f61fd4 100644
--- a/lib/kokkos/core/src/Makefile
+++ b/lib/kokkos/core/src/Makefile
@@ -1,124 +1,144 @@
-KOKKOS_PATH = ../..
+ifndef KOKKOS_PATH
+ MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
+ KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../..
+endif
PREFIX ?= /usr/local/lib/kokkos
default: messages build-lib
echo "End Build"
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(NVCC_WRAPPER)
- LINKFLAGS ?=
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= g++
- LINKFLAGS ?=
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
PWD = $(shell pwd)
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL = $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
CONDITIONAL_COPIES =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
CONDITIONAL_COPIES += copy-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
CONDITIONAL_COPIES += copy-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
KOKKOS_HEADERS_QTHREAD += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.hpp)
CONDITIONAL_COPIES += copy-qthread
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
CONDITIONAL_COPIES += copy-openmp
endif
+ifeq ($(KOKKOS_OS),CYGWIN)
+ COPY_FLAG = -u
+endif
+ifeq ($(KOKKOS_OS),Linux)
+ COPY_FLAG = -u
+endif
+ifeq ($(KOKKOS_OS),Darwin)
+ COPY_FLAG =
+endif
+
messages:
echo "Start Build"
build-makefile-kokkos:
rm -f Makefile.kokkos
echo "#Global Settings used to generate this library" >> Makefile.kokkos
echo "KOKKOS_PATH = $(PREFIX)" >> Makefile.kokkos
echo "KOKKOS_DEVICES = $(KOKKOS_DEVICES)" >> Makefile.kokkos
echo "KOKKOS_ARCH = $(KOKKOS_ARCH)" >> Makefile.kokkos
echo "KOKKOS_DEBUG = $(KOKKOS_DEBUG)" >> Makefile.kokkos
echo "KOKKOS_USE_TPLS = $(KOKKOS_USE_TPLS)" >> Makefile.kokkos
echo "KOKKOS_CXX_STANDARD = $(KOKKOS_CXX_STANDARD)" >> Makefile.kokkos
echo "KOKKOS_OPTIONS = $(KOKKOS_OPTIONS)" >> Makefile.kokkos
echo "KOKKOS_CUDA_OPTIONS = $(KOKKOS_CUDA_OPTIONS)" >> Makefile.kokkos
echo "CXX ?= $(CXX)" >> Makefile.kokkos
echo "NVCC_WRAPPER ?= $(PREFIX)/bin/nvcc_wrapper" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> Makefile.kokkos
echo "KOKKOS_HEADERS = $(KOKKOS_HEADERS)" >> Makefile.kokkos
echo "KOKKOS_SRC = $(KOKKOS_SRC)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Variables used in application Makefiles" >> Makefile.kokkos
echo "KOKKOS_CPP_DEPENDS = $(KOKKOS_CPP_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_CXXFLAGS = $(KOKKOS_CXXFLAGS)" >> Makefile.kokkos
echo "KOKKOS_CPPFLAGS = $(KOKKOS_CPPFLAGS)" >> Makefile.kokkos
echo "KOKKOS_LINK_DEPENDS = $(KOKKOS_LINK_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_LIBS = $(KOKKOS_LIBS)" >> Makefile.kokkos
echo "KOKKOS_LDFLAGS = $(KOKKOS_LDFLAGS)" >> Makefile.kokkos
+ echo "" >> Makefile.kokkos
+ echo "#Internal settings which need to be propagated for Kokkos examples" >> Makefile.kokkos
+ echo "KOKKOS_INTERNAL_USE_CUDA = ${KOKKOS_INTERNAL_USE_CUDA}" >> Makefile.kokkos
+ echo "KOKKOS_INTERNAL_USE_OPENMP = ${KOKKOS_INTERNAL_USE_OPENMP}" >> Makefile.kokkos
+ echo "KOKKOS_INTERNAL_USE_PTHREADS = ${KOKKOS_INTERNAL_USE_PTHREADS}" >> Makefile.kokkos
+ echo "" >> Makefile.kokkos
+ echo "#Fake kokkos-clean target" >> Makefile.kokkos
+ echo "kokkos-clean:" >> Makefile.kokkos
+ echo "" >> Makefile.kokkos
sed \
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
-e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
-e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
-e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' Makefile.kokkos \
> Makefile.kokkos.tmp
mv -f Makefile.kokkos.tmp Makefile.kokkos
build-lib: build-makefile-kokkos $(KOKKOS_LINK_DEPENDS)
mkdir:
mkdir -p $(PREFIX)
mkdir -p $(PREFIX)/bin
mkdir -p $(PREFIX)/include
mkdir -p $(PREFIX)/lib
mkdir -p $(PREFIX)/include/impl
copy-cuda: mkdir
mkdir -p $(PREFIX)/include/Cuda
- cp $(KOKKOS_HEADERS_CUDA) $(PREFIX)/include/Cuda
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_CUDA) $(PREFIX)/include/Cuda
copy-threads: mkdir
mkdir -p $(PREFIX)/include/Threads
- cp $(KOKKOS_HEADERS_THREADS) $(PREFIX)/include/Threads
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_THREADS) $(PREFIX)/include/Threads
copy-qthread: mkdir
mkdir -p $(PREFIX)/include/Qthread
- cp $(KOKKOS_HEADERS_QTHREAD) $(PREFIX)/include/Qthread
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_QTHREAD) $(PREFIX)/include/Qthread
copy-openmp: mkdir
mkdir -p $(PREFIX)/include/OpenMP
- cp $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
install: mkdir $(CONDITIONAL_COPIES) build-lib
- cp $(NVCC_WRAPPER) $(PREFIX)/bin
- cp $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
- cp $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
- cp Makefile.kokkos $(PREFIX)
- cp libkokkos.a $(PREFIX)/lib
- cp KokkosCore_config.h $(PREFIX)/include
+ cp $(COPY_FLAG) $(NVCC_WRAPPER) $(PREFIX)/bin
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
+ cp $(COPY_FLAG) Makefile.kokkos $(PREFIX)
+ cp $(COPY_FLAG) libkokkos.a $(PREFIX)/lib
+ cp $(COPY_FLAG) KokkosCore_config.h $(PREFIX)/include
clean: kokkos-clean
rm -f Makefile.kokkos
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
index 3e22033f7..00a9957ee 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
@@ -1,329 +1,329 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
-#if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::OpenMP > ;
//----------------------------------------------------------------------------
TaskExec< Kokkos::OpenMP >::
TaskExec()
: m_self_exec( 0 )
, m_team_exec( 0 )
, m_sync_mask( 0 )
, m_sync_value( 0 )
, m_sync_step( 0 )
, m_group_rank( 0 )
, m_team_rank( 0 )
, m_team_size( 1 )
{
}
TaskExec< Kokkos::OpenMP >::
TaskExec( Kokkos::Impl::OpenMPexec & arg_exec , int const arg_team_size )
: m_self_exec( & arg_exec )
, m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) )
, m_sync_mask( 0 )
, m_sync_value( 0 )
, m_sync_step( 0 )
, m_group_rank( arg_exec.pool_rank_rev() / arg_team_size )
, m_team_rank( arg_exec.pool_rank_rev() % arg_team_size )
, m_team_size( arg_team_size )
{
// This team spans
// m_self_exec->pool_rev( team_size * group_rank )
// m_self_exec->pool_rev( team_size * ( group_rank + 1 ) - 1 )
int64_t volatile * const sync = (int64_t *) m_self_exec->scratch_reduce();
sync[0] = int64_t(0) ;
sync[1] = int64_t(0) ;
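// The 64-bit sync word reserves one byte per team member (hence the
// team_size <= 8 requirement enforced in execute()); each member signals
// arrival by writing the low two bits of its own byte, and the expected
// pattern is toggled through m_sync_mask between consecutive barriers so
// that one barrier cannot be mistaken for the next (see team_barrier_impl).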
for ( int i = 0 ; i < m_team_size ; ++i ) {
m_sync_value |= int64_t(1) << (8*i);
m_sync_mask |= int64_t(3) << (8*i);
}
Kokkos::memory_fence();
}
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
void TaskExec< Kokkos::OpenMP >::team_barrier_impl() const
{
if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
Kokkos::abort("TaskQueue<OpenMP> scratch_reduce memory too small");
}
// Use team shared memory to synchronize.
// Alternate memory locations between barriers to avoid a sequence
// of barriers overtaking one another.
int64_t volatile * const sync =
((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
// This team member sets one byte within the sync variable
int8_t volatile * const sync_self =
((int8_t *) sync) + m_team_rank ;
#if 0
fprintf( stdout
, "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n"
, m_group_rank
, m_team_rank
, m_sync_step
, m_sync_value
, *sync
);
fflush(stdout);
#endif
*sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
while ( m_sync_value != *sync ); // wait for team to arrive
#if 0
fprintf( stdout
, "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n"
, m_group_rank
, m_team_rank
, m_sync_step
, m_sync_value
, *sync
);
fflush(stdout);
#endif
++m_sync_step ;
if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
m_sync_value ^= m_sync_mask ;
if ( 1000 < m_sync_step ) m_sync_step = 0 ;
}
}
#endif
//----------------------------------------------------------------------------
void TaskQueueSpecialization< Kokkos::OpenMP >::execute
( TaskQueue< Kokkos::OpenMP > * const queue )
{
using execution_space = Kokkos::OpenMP ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
using PoolExec = Kokkos::Impl::OpenMPexec ;
using Member = TaskExec< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
// Required: team_size <= 8
const int team_size = PoolExec::pool_size(2); // Threads per core
// const int team_size = PoolExec::pool_size(1); // Threads per NUMA
if ( 8 < team_size ) {
Kokkos::abort("TaskQueue<OpenMP> unsupported team size");
}
#pragma omp parallel
{
PoolExec & self = *PoolExec::get_thread_omp();
Member single_exec ;
Member team_exec( self , team_size );
// Team shared memory
task_root_type * volatile * const task_shared =
(task_root_type **) team_exec.m_team_exec->scratch_thread();
// Barrier across entire OpenMP thread pool to ensure initialization
#pragma omp barrier
// Loop until all queues are empty and no tasks in flight
do {
task_root_type * task = 0 ;
// Each team lead attempts to acquire either a thread team task
// or a single thread task for the team.
if ( 0 == team_exec.team_rank() ) {
task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
// Loop by priority and then type
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
task = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
}
// Team lead broadcast acquired task to team members:
if ( 1 < team_exec.team_size() ) {
if ( 0 == team_exec.team_rank() ) *task_shared = task ;
// Fence to be sure task_shared is stored before the barrier
Kokkos::memory_fence();
// Whole team waits for every team member to reach this statement
team_exec.team_barrier();
// Fence to be sure task_shared is stored
Kokkos::memory_fence();
task = *task_shared ;
}
#if 0
fprintf( stdout
, "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n"
, team_exec.m_group_rank
, team_exec.m_team_rank
, uintptr_t(task_shared)
, uintptr_t(task)
);
fflush(stdout);
#endif
if ( 0 == task ) break ; // 0 == m_ready_count
if ( end == task ) {
// All team members wait for whole team to reach this statement.
// Is necessary to prevent task_shared from being updated
// before it is read by all threads.
team_exec.team_barrier();
}
else if ( task_root_type::TaskTeam == task->m_task_type ) {
// Thread Team Task
(*task->m_apply)( task , & team_exec );
// The m_apply function performs a barrier
if ( 0 == team_exec.team_rank() ) {
// team member #0 completes the task, which may delete the task
queue->complete( task );
}
}
else {
// Single Thread Task
if ( 0 == team_exec.team_rank() ) {
(*task->m_apply)( task , & single_exec );
queue->complete( task );
}
// All team members wait for whole team to reach this statement.
// Not necessary to complete the task.
// Is necessary to prevent task_shared from being updated
// before it is read by all threads.
team_exec.team_barrier();
}
} while(1);
}
// END #pragma omp parallel
}
void TaskQueueSpecialization< Kokkos::OpenMP >::
iff_single_thread_recursive_execute
( TaskQueue< Kokkos::OpenMP > * const queue )
{
using execution_space = Kokkos::OpenMP ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
using Member = TaskExec< execution_space > ;
if ( 1 == omp_get_num_threads() ) {
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member single_exec ;
task_root_type * task = end ;
do {
task = end ;
// Loop by priority and then type
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
task = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
(*task->m_apply)( task , & single_exec );
queue->complete( task );
} while(1);
}
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG ) */
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
index 2761247c4..15dbb77c2 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
@@ -1,356 +1,365 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_OPENMP_TASK_HPP
#define KOKKOS_IMPL_OPENMP_TASK_HPP
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<>
class TaskQueueSpecialization< Kokkos::OpenMP >
{
public:
using execution_space = Kokkos::OpenMP ;
using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
using task_base_type = Kokkos::Impl::TaskBase< execution_space , void , void > ;
// Must specify memory space
using memory_space = Kokkos::HostSpace ;
static
void iff_single_thread_recursive_execute( queue_type * const );
// Must provide task queue execution function
static void execute( queue_type * const );
// Must provide mechanism to set function pointer in
// execution space from the host process.
template< typename FunctorType >
static
void proc_set_apply( task_base_type::function_type * ptr )
{
using TaskType = TaskBase< Kokkos::OpenMP
, typename FunctorType::value_type
, FunctorType
> ;
*ptr = TaskType::apply ;
}
};
extern template class TaskQueue< Kokkos::OpenMP > ;
//----------------------------------------------------------------------------
template<>
class TaskExec< Kokkos::OpenMP >
{
private:
TaskExec( TaskExec && ) = delete ;
TaskExec( TaskExec const & ) = delete ;
TaskExec & operator = ( TaskExec && ) = delete ;
TaskExec & operator = ( TaskExec const & ) = delete ;
using PoolExec = Kokkos::Impl::OpenMPexec ;
friend class Kokkos::Impl::TaskQueue< Kokkos::OpenMP > ;
friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::OpenMP > ;
PoolExec * const m_self_exec ; ///< This thread's thread pool data structure
PoolExec * const m_team_exec ; ///< Team thread's thread pool data structure
int64_t m_sync_mask ;
int64_t mutable m_sync_value ;
int mutable m_sync_step ;
int m_group_rank ; ///< Which "team" subset of thread pool
int m_team_rank ; ///< Which thread within a team
int m_team_size ;
TaskExec();
TaskExec( PoolExec & arg_exec , int arg_team_size );
void team_barrier_impl() const ;
public:
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
void * team_shared() const
{ return m_team_exec ? m_team_exec->scratch_thread() : (void*) 0 ; }
int team_shared_size() const
{ return m_team_exec ? m_team_exec->scratch_thread_size() : 0 ; }
/**\brief Whole team enters this function call
* before any team member returns from
* this function call.
*/
void team_barrier() const { if ( 1 < m_team_size ) team_barrier_impl(); }
#else
KOKKOS_INLINE_FUNCTION void team_barrier() const {}
KOKKOS_INLINE_FUNCTION void * team_shared() const { return 0 ; }
KOKKOS_INLINE_FUNCTION int team_shared_size() const { return 0 ; }
#endif
KOKKOS_INLINE_FUNCTION
int team_rank() const { return m_team_rank ; }
KOKKOS_INLINE_FUNCTION
int team_size() const { return m_team_size ; }
};
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
TeamThreadRange
- ( Impl::TaskExec< Kokkos::OpenMP > & thread
- , const iType & count )
+ ( Impl::TaskExec< Kokkos::OpenMP > & thread, const iType & count )
{
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
}
-template<typename iType>
+template<typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::TaskExec< Kokkos::OpenMP > >
TeamThreadRange
- ( Impl:: TaskExec< Kokkos::OpenMP > & thread
- , const iType & start
- , const iType & end )
+ ( Impl:: TaskExec< Kokkos::OpenMP > & thread, const iType1 & begin, const iType2 & end )
+{
+ typedef typename std::common_type<iType1, iType2>::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct<iType, Impl::TaskExec< Kokkos::OpenMP > >(thread, begin, end);
+}
+
+template<typename iType>
+KOKKOS_INLINE_FUNCTION
+Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
+ThreadVectorRange
+ ( Impl::TaskExec< Kokkos::OpenMP > & thread
+ , const iType & count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >(thread,start,end);
+ return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.
*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >& loop_boundaries
, const Lambda& lambda
)
{
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i);
}
}
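// A sketch of using the range above from inside a task body (the member
// reference is normally provided by the task scheduler; "data" is a
// placeholder view assumed to live in host-accessible memory):
inline void example_task_team_for( Impl::TaskExec< Kokkos::OpenMP > & member
                                 , const View<double*> & data )
{
  const int n = (int) data.dimension_0();
  parallel_for( TeamThreadRange( member , n )
              , [&]( const int i ) { data(i) *= 2.0 ; } );
}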
template<typename iType, class Lambda, typename ValueType>
KOKKOS_INLINE_FUNCTION
void parallel_reduce
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >& loop_boundaries
, const Lambda& lambda
, ValueType& initialized_result)
{
int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i, result);
}
if ( 1 < loop_boundaries.thread.team_size() ) {
ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
loop_boundaries.thread.team_barrier();
shared[team_rank] = result;
loop_boundaries.thread.team_barrier();
// reduce across threads to thread 0
if (team_rank == 0) {
for (int i = 1; i < loop_boundaries.thread.team_size(); i++) {
shared[0] += shared[i];
}
}
loop_boundaries.thread.team_barrier();
// broadcast result
initialized_result = shared[0];
}
else {
initialized_result = result ;
}
}
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
const Lambda & lambda,
const JoinType & join,
ValueType& initialized_result)
{
int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i, result);
}
if ( 1 < loop_boundaries.thread.team_size() ) {
ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
loop_boundaries.thread.team_barrier();
shared[team_rank] = result;
loop_boundaries.thread.team_barrier();
// reduce across threads to thread 0
if (team_rank == 0) {
for (int i = 1; i < loop_boundaries.thread.team_size(); i++) {
join(shared[0], shared[i]);
}
}
loop_boundaries.thread.team_barrier();
// broadcast result
initialized_result = shared[0];
}
else {
initialized_result = result ;
}
}
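// A sketch of a team-level sum over a task's range using the reduction
// above (placeholder names; the caller owns the result variable, which is
// broadcast to every team member on completion):
inline void example_task_team_sum( Impl::TaskExec< Kokkos::OpenMP > & member
                                 , const View<const double*> & data
                                 , double & sum )
{
  sum = 0.0 ;
  parallel_reduce( TeamThreadRange( member , (int) data.dimension_0() )
                 , [&]( const int i , double & lsum ) { lsum += data(i); }
                 , sum );
}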
// placeholder for future function
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result)
{
}
// placeholder for future function
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
const Lambda & lambda,
const JoinType & join,
ValueType& initialized_result)
{
}
template< typename ValueType, typename iType, class Lambda >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
const Lambda & lambda)
{
ValueType accum = 0 ;
ValueType val, local_total;
ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
int team_size = loop_boundaries.thread.team_size();
int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
// Intra-member scan
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
local_total = 0;
lambda(i,local_total,false);
val = accum;
lambda(i,val,true);
accum += local_total;
}
shared[team_rank] = accum;
loop_boundaries.thread.team_barrier();
// Member 0 does a scan on the accumulated totals
if (team_rank == 0) {
for( iType i = 1; i < team_size; i+=1) {
shared[i] += shared[i-1];
}
accum = 0; // Member 0 sets accum to 0 in preparation for the inter-member scan
}
loop_boundaries.thread.team_barrier();
// Inter-member scan adding in accumulated totals
if (team_rank != 0) { accum = shared[team_rank-1]; }
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
local_total = 0;
lambda(i,local_total,false);
val = accum;
lambda(i,val,true);
accum += local_total;
}
}
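// Sketch of the algorithm above (illustrative): each thread first performs an
// exclusive scan over the iterations it owns and records its own total in
// 'shared'; thread 0 then serially scans those per-thread totals; finally each
// thread re-runs its iterations with final == true, using the sum of all
// lower-ranked threads' totals as its base offset. The resulting ordering
// follows thread ownership of the iterations, not necessarily the global
// index order.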
// placeholder for future function
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
const Lambda & lambda)
{
}
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_OPENMP_TASK_HPP */
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
index 7d06a2f66..25e7d8927 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
@@ -1,408 +1,408 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <limits>
#include <iostream>
#include <vector>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <iostream>
#include <impl/Kokkos_CPUDiscovery.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
#ifdef KOKKOS_HAVE_OPENMP
namespace Kokkos {
namespace Impl {
namespace {
KOKKOS_INLINE_FUNCTION
int kokkos_omp_in_parallel();
int kokkos_omp_in_critical_region = ( Kokkos::HostSpace::register_in_parallel( kokkos_omp_in_parallel ) , 0 );
KOKKOS_INLINE_FUNCTION
int kokkos_omp_in_parallel()
{
#ifndef __CUDA_ARCH__
return omp_in_parallel() && ! kokkos_omp_in_critical_region ;
#else
return 0;
#endif
}
bool s_using_hwloc = false;
} // namespace
} // namespace Impl
} // namespace Kokkos
namespace Kokkos {
namespace Impl {
int OpenMPexec::m_map_rank[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
int OpenMPexec::m_pool_topo[ 4 ] = { 0 };
OpenMPexec * OpenMPexec::m_pool[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
void OpenMPexec::verify_is_process( const char * const label )
{
if ( omp_in_parallel() ) {
std::string msg( label );
msg.append( " ERROR: in parallel" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void OpenMPexec::verify_initialized( const char * const label )
{
if ( 0 == m_pool[0] ) {
std::string msg( label );
msg.append( " ERROR: not initialized" );
Kokkos::Impl::throw_runtime_exception( msg );
}
if ( omp_get_max_threads() != Kokkos::OpenMP::thread_pool_size(0) ) {
std::string msg( label );
msg.append( " ERROR: Initialized but threads modified inappropriately" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void OpenMPexec::clear_scratch()
{
#pragma omp parallel
{
const int rank_rev = m_map_rank[ omp_get_thread_num() ];
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( m_pool[ rank_rev ] ) {
Record * const r = Record::get_record( m_pool[ rank_rev ] );
m_pool[ rank_rev ] = 0 ;
Record::decrement( r );
}
}
/* END #pragma omp parallel */
}
void OpenMPexec::resize_scratch( size_t reduce_size , size_t thread_size )
{
enum { ALIGN_MASK = Kokkos::Impl::MEMORY_ALIGNMENT - 1 };
enum { ALLOC_EXEC = ( sizeof(OpenMPexec) + ALIGN_MASK ) & ~ALIGN_MASK };
const size_t old_reduce_size = m_pool[0] ? m_pool[0]->m_scratch_reduce_end : 0 ;
const size_t old_thread_size = m_pool[0] ? m_pool[0]->m_scratch_thread_end - m_pool[0]->m_scratch_reduce_end : 0 ;
reduce_size = ( reduce_size + ALIGN_MASK ) & ~ALIGN_MASK ;
thread_size = ( thread_size + ALIGN_MASK ) & ~ALIGN_MASK ;
// Allocate only if a requested size exceeds the old allocation:
const bool allocate = ( old_reduce_size < reduce_size ) ||
( old_thread_size < thread_size );
if ( allocate ) {
if ( reduce_size < old_reduce_size ) { reduce_size = old_reduce_size ; }
if ( thread_size < old_thread_size ) { thread_size = old_thread_size ; }
}
const size_t alloc_size = allocate ? ALLOC_EXEC + reduce_size + thread_size : 0 ;
const int pool_size = m_pool_topo[0] ;
if ( allocate ) {
clear_scratch();
#pragma omp parallel
{
const int rank_rev = m_map_rank[ omp_get_thread_num() ];
const int rank = pool_size - ( rank_rev + 1 );
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::HostSpace()
, "openmp_scratch"
, alloc_size );
Record::increment( r );
m_pool[ rank_rev ] = reinterpret_cast<OpenMPexec*>( r->data() );
new ( m_pool[ rank_rev ] ) OpenMPexec( rank , ALLOC_EXEC , reduce_size , thread_size );
}
/* END #pragma omp parallel */
}
}
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
//----------------------------------------------------------------------------
int OpenMP::is_initialized()
{ return 0 != Impl::OpenMPexec::m_pool[0]; }
void OpenMP::initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa )
{
// Before any other call to OMP, query the maximum number of threads
// and save the value for re-initialization unit testing.
// Using omp_get_max_threads() is problematic in conjunction with
// hwloc on Intel: an initial call into the OpenMP runtime without a
// preceding parallel region sets a process mask for a single core.
// Upon entering the first parallel region the runtime then binds
// threads for that region to other cores and makes the process mask
// the aggregate of the thread masks. The intent appears to be to make
// serial code run fast when compiled with OpenMP enabled but no
// parallel regions are actually used.
//static int omp_max_threads = omp_get_max_threads();
int nthreads = 0;
#pragma omp parallel
{
#pragma omp atomic
nthreads++;
}
static int omp_max_threads = nthreads;
const bool is_initialized = 0 != Impl::OpenMPexec::m_pool[0] ;
bool thread_spawn_failed = false ;
if ( ! is_initialized ) {
// Use hwloc thread pinning if concerned with locality:
// if spreading threads across multiple NUMA regions,
// or if hyperthreading is enabled.
Impl::s_using_hwloc = hwloc::available() && (
( 1 < Kokkos::hwloc::get_available_numa_count() ) ||
( 1 < Kokkos::hwloc::get_available_threads_per_core() ) );
std::pair<unsigned,unsigned> threads_coord[ Impl::OpenMPexec::MAX_THREAD_COUNT ];
// If hwloc is available then use its maximum value.
if ( thread_count == 0 ) {
thread_count = Impl::s_using_hwloc
? Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa() *
Kokkos::hwloc::get_available_threads_per_core()
: omp_max_threads ;
}
if(Impl::s_using_hwloc)
hwloc::thread_mapping( "Kokkos::OpenMP::initialize" ,
false /* do not allow asynchronous */ ,
thread_count ,
use_numa_count ,
use_cores_per_numa ,
threads_coord );
// Spawn threads:
omp_set_num_threads( thread_count );
// Verify OMP interaction:
if ( int(thread_count) != omp_get_max_threads() ) {
thread_spawn_failed = true ;
}
// Verify spawning and bind threads:
#pragma omp parallel
{
#pragma omp critical
{
if ( int(thread_count) != omp_get_num_threads() ) {
thread_spawn_failed = true ;
}
// Call to 'bind_this_thread' is not thread safe so place this whole block in a critical region.
// Call to 'new' may not be thread safe either.
// Reverse the rank for threads so that the scan operation reduces to the highest rank thread.
const unsigned omp_rank = omp_get_thread_num();
const unsigned thread_r = Impl::s_using_hwloc && Kokkos::hwloc::can_bind_threads()
? Kokkos::hwloc::bind_this_thread( thread_count , threads_coord )
: omp_rank ;
Impl::OpenMPexec::m_map_rank[ omp_rank ] = thread_r ;
}
/* END #pragma omp critical */
}
/* END #pragma omp parallel */
if ( ! thread_spawn_failed ) {
Impl::OpenMPexec::m_pool_topo[0] = thread_count ;
Impl::OpenMPexec::m_pool_topo[1] = Impl::s_using_hwloc ? thread_count / use_numa_count : thread_count;
Impl::OpenMPexec::m_pool_topo[2] = Impl::s_using_hwloc ? thread_count / ( use_numa_count * use_cores_per_numa ) : 1;
Impl::OpenMPexec::resize_scratch( 1024 , 1024 );
}
}
if ( is_initialized || thread_spawn_failed ) {
std::string msg("Kokkos::OpenMP::initialize ERROR");
if ( is_initialized ) { msg.append(" : already initialized"); }
if ( thread_spawn_failed ) { msg.append(" : failed spawning threads"); }
Kokkos::Impl::throw_runtime_exception(msg);
}
// Check for over-subscription
- if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
- std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
- std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
- std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
- std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
- }
+ //if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
+ // std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
+ // std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
+ // std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
+ // std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
+ //}
// Init the array used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
//----------------------------------------------------------------------------
void OpenMP::finalize()
{
Impl::OpenMPexec::verify_initialized( "OpenMP::finalize" );
Impl::OpenMPexec::verify_is_process( "OpenMP::finalize" );
Impl::OpenMPexec::clear_scratch();
Impl::OpenMPexec::m_pool_topo[0] = 0 ;
Impl::OpenMPexec::m_pool_topo[1] = 0 ;
Impl::OpenMPexec::m_pool_topo[2] = 0 ;
omp_set_num_threads(1);
if ( Impl::s_using_hwloc && Kokkos::hwloc::can_bind_threads() ) {
hwloc::unbind_this_thread();
}
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//----------------------------------------------------------------------------
void OpenMP::print_configuration( std::ostream & s , const bool detail )
{
Impl::OpenMPexec::verify_is_process( "OpenMP::print_configuration" );
s << "Kokkos::OpenMP" ;
#if defined( KOKKOS_HAVE_OPENMP )
s << " KOKKOS_HAVE_OPENMP" ;
#endif
#if defined( KOKKOS_HAVE_HWLOC )
const unsigned numa_count_ = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
s << " hwloc[" << numa_count_ << "x" << cores_per_numa << "x" << threads_per_core << "]"
<< " hwloc_binding_" << ( Impl::s_using_hwloc ? "enabled" : "disabled" )
;
#endif
const bool is_initialized = 0 != Impl::OpenMPexec::m_pool[0] ;
if ( is_initialized ) {
const int numa_count = Kokkos::Impl::OpenMPexec::m_pool_topo[0] / Kokkos::Impl::OpenMPexec::m_pool_topo[1] ;
const int core_per_numa = Kokkos::Impl::OpenMPexec::m_pool_topo[1] / Kokkos::Impl::OpenMPexec::m_pool_topo[2] ;
const int thread_per_core = Kokkos::Impl::OpenMPexec::m_pool_topo[2] ;
s << " thread_pool_topology[ " << numa_count
<< " x " << core_per_numa
<< " x " << thread_per_core
<< " ]"
<< std::endl ;
if ( detail ) {
std::vector< std::pair<unsigned,unsigned> > coord( Kokkos::Impl::OpenMPexec::m_pool_topo[0] );
#pragma omp parallel
{
#pragma omp critical
{
coord[ omp_get_thread_num() ] = hwloc::get_this_thread_coordinate();
}
/* END #pragma omp critical */
}
/* END #pragma omp parallel */
for ( unsigned i = 0 ; i < coord.size() ; ++i ) {
s << " thread omp_rank[" << i << "]"
<< " kokkos_rank[" << Impl::OpenMPexec::m_map_rank[ i ] << "]"
<< " hwloc_coord[" << coord[i].first << "." << coord[i].second << "]"
<< std::endl ;
}
}
}
else {
s << " not initialized" << std::endl ;
}
}
int OpenMP::concurrency() {
return thread_pool_size(0);
}
} // namespace Kokkos
#endif //KOKKOS_HAVE_OPENMP
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
index a01c9cb64..a2bfa742d 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
@@ -1,1083 +1,1078 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMPEXEC_HPP
#define KOKKOS_OPENMPEXEC_HPP
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_spinwait.hpp>
#include <Kokkos_Atomic.hpp>
#include <iostream>
#include <sstream>
#include <fstream>
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
/** \brief Data for OpenMP thread execution */
class OpenMPexec {
public:
enum { MAX_THREAD_COUNT = 4096 };
private:
static OpenMPexec * m_pool[ MAX_THREAD_COUNT ]; // Indexed by: m_pool_rank_rev
static int m_pool_topo[ 4 ];
static int m_map_rank[ MAX_THREAD_COUNT ];
friend class Kokkos::OpenMP ;
int const m_pool_rank ;
int const m_pool_rank_rev ;
int const m_scratch_exec_end ;
int const m_scratch_reduce_end ;
int const m_scratch_thread_end ;
int volatile m_barrier_state ;
// Members for dynamic scheduling
// Which thread am I stealing from currently
int m_current_steal_target;
// This thread's owned work_range
Kokkos::pair<long,long> m_work_range KOKKOS_ALIGN_16;
// Team Offset if one thread determines work_range for others
long m_team_work_index;
// Is this thread stealing (i.e. its owned work_range is exhausted)
bool m_stealing;
OpenMPexec();
OpenMPexec( const OpenMPexec & );
OpenMPexec & operator = ( const OpenMPexec & );
static void clear_scratch();
public:
// Topology of a cache coherent thread pool:
// TOTAL = NUMA x GRAIN
// pool_size( depth = 0 )
// pool_size(0) = total number of threads
// pool_size(1) = number of threads per NUMA
// pool_size(2) = number of threads sharing finest grain memory hierarchy
inline static
int pool_size( int depth = 0 ) { return m_pool_topo[ depth ]; }
inline static
OpenMPexec * pool_rev( int pool_rank_rev ) { return m_pool[ pool_rank_rev ]; }
inline int pool_rank() const { return m_pool_rank ; }
inline int pool_rank_rev() const { return m_pool_rank_rev ; }
inline long team_work_index() const { return m_team_work_index ; }
inline int scratch_reduce_size() const
{ return m_scratch_reduce_end - m_scratch_exec_end ; }
inline int scratch_thread_size() const
{ return m_scratch_thread_end - m_scratch_reduce_end ; }
inline void * scratch_reduce() const { return ((char *) this) + m_scratch_exec_end ; }
inline void * scratch_thread() const { return ((char *) this) + m_scratch_reduce_end ; }
inline
void state_wait( int state )
{ Impl::spinwait( m_barrier_state , state ); }
inline
void state_set( int state ) { m_barrier_state = state ; }
~OpenMPexec() {}
OpenMPexec( const int arg_poolRank
, const int arg_scratch_exec_size
, const int arg_scratch_reduce_size
, const int arg_scratch_thread_size )
: m_pool_rank( arg_poolRank )
, m_pool_rank_rev( pool_size() - ( arg_poolRank + 1 ) )
, m_scratch_exec_end( arg_scratch_exec_size )
, m_scratch_reduce_end( m_scratch_exec_end + arg_scratch_reduce_size )
, m_scratch_thread_end( m_scratch_reduce_end + arg_scratch_thread_size )
, m_barrier_state(0)
{}
static void finalize();
static void initialize( const unsigned team_count ,
const unsigned threads_per_team ,
const unsigned numa_count ,
const unsigned cores_per_numa );
static void verify_is_process( const char * const );
static void verify_initialized( const char * const );
static void resize_scratch( size_t reduce_size , size_t thread_size );
inline static
OpenMPexec * get_thread_omp() { return m_pool[ m_map_rank[ omp_get_thread_num() ] ]; }
/* Dynamic Scheduling related functionality */
// Initialize the work range for this thread
inline void set_work_range(const long& begin, const long& end, const long& chunk_size) {
m_work_range.first = (begin+chunk_size-1)/chunk_size;
m_work_range.second = end>0?(end+chunk_size-1)/chunk_size:m_work_range.first;
}
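// Example (illustrative): begin = 0, end = 100, chunk_size = 8 gives
// m_work_range = (0,13), i.e. thirteen chunks covering indices 0..99;
// get_work_index_begin()/get_work_index_end() then hand out the chunk
// indices 0..12 until the range is exhausted.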
// Claim an index from this thread's range from the beginning
inline long get_work_index_begin () {
Kokkos::pair<long,long> work_range_new = m_work_range;
Kokkos::pair<long,long> work_range_old = work_range_new;
if(work_range_old.first>=work_range_old.second)
return -1;
work_range_new.first+=1;
bool success = false;
while(!success) {
work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
- success = ( (work_range_new == work_range_old) ||
+ success = ( (work_range_new == work_range_old) ||
(work_range_new.first>=work_range_new.second));
work_range_old = work_range_new;
work_range_new.first+=1;
}
if(work_range_old.first<work_range_old.second)
return work_range_old.first;
else
return -1;
}
// Claim an index from this thread's range from the end
inline long get_work_index_end () {
Kokkos::pair<long,long> work_range_new = m_work_range;
Kokkos::pair<long,long> work_range_old = work_range_new;
if(work_range_old.first>=work_range_old.second)
return -1;
work_range_new.second-=1;
bool success = false;
while(!success) {
work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
success = ( (work_range_new == work_range_old) ||
(work_range_new.first>=work_range_new.second) );
work_range_old = work_range_new;
work_range_new.second-=1;
}
if(work_range_old.first<work_range_old.second)
return work_range_old.second-1;
else
return -1;
}
// Reset the steal target
inline void reset_steal_target() {
m_current_steal_target = (m_pool_rank+1)%m_pool_topo[0];
m_stealing = false;
}
// Reset the steal target
inline void reset_steal_target(int team_size) {
m_current_steal_target = (m_pool_rank_rev+team_size);
if(m_current_steal_target>=m_pool_topo[0])
m_current_steal_target = 0;//m_pool_topo[0]-1;
m_stealing = false;
}
// Get a steal target; start with my rank + 1 and go round robin until arriving back at this thread's rank.
// Returns -1 if no active steal target is available.
inline int get_steal_target() {
while(( m_pool[m_current_steal_target]->m_work_range.second <=
m_pool[m_current_steal_target]->m_work_range.first ) &&
(m_current_steal_target!=m_pool_rank) ) {
m_current_steal_target = (m_current_steal_target+1)%m_pool_topo[0];
}
if(m_current_steal_target == m_pool_rank)
return -1;
else
return m_current_steal_target;
}
inline int get_steal_target(int team_size) {
while(( m_pool[m_current_steal_target]->m_work_range.second <=
m_pool[m_current_steal_target]->m_work_range.first ) &&
(m_current_steal_target!=m_pool_rank_rev) ) {
if(m_current_steal_target + team_size < m_pool_topo[0])
m_current_steal_target = (m_current_steal_target+team_size);
else
m_current_steal_target = 0;
}
if(m_current_steal_target == m_pool_rank_rev)
return -1;
else
return m_current_steal_target;
}
inline long steal_work_index (int team_size = 0) {
long index = -1;
int steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
while ( (steal_target != -1) && (index == -1)) {
index = m_pool[steal_target]->get_work_index_end();
if(index == -1)
steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
}
return index;
}
// Get a work index. Claim from the owned range until it is exhausted, then steal from another thread
inline long get_work_index (int team_size = 0) {
long work_index = -1;
if(!m_stealing) work_index = get_work_index_begin();
if( work_index == -1) {
memory_fence();
m_stealing = true;
work_index = steal_work_index(team_size);
}
m_team_work_index = work_index;
memory_fence();
return work_index;
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
class OpenMPexecTeamMember {
public:
enum { TEAM_REDUCE_SIZE = 512 };
/** \brief Thread states for team synchronization */
enum { Active = 0 , Rendezvous = 1 };
typedef Kokkos::OpenMP execution_space ;
typedef execution_space::scratch_memory_space scratch_memory_space ;
Impl::OpenMPexec & m_exec ;
scratch_memory_space m_team_shared ;
int m_team_scratch_size[2] ;
int m_team_base_rev ;
int m_team_rank_rev ;
int m_team_rank ;
int m_team_size ;
int m_league_rank ;
int m_league_end ;
int m_league_size ;
int m_chunk_size;
int m_league_chunk_end;
Impl::OpenMPexec & m_team_lead_exec ;
int m_invalid_thread;
int m_team_alloc;
// Fan-in team threads; the root of the fan-in, which does not block, returns true
inline
bool team_fan_in() const
{
memory_fence();
for ( int n = 1 , j ; ( ( j = m_team_rank_rev + n ) < m_team_size ) && ! ( m_team_rank_rev & n ) ; n <<= 1 ) {
m_exec.pool_rev( m_team_base_rev + j )->state_wait( Active );
}
if ( m_team_rank_rev ) {
m_exec.state_set( Rendezvous );
memory_fence();
m_exec.state_wait( Rendezvous );
}
return 0 == m_team_rank_rev ;
}
inline
void team_fan_out() const
{
memory_fence();
for ( int n = 1 , j ; ( ( j = m_team_rank_rev + n ) < m_team_size ) && ! ( m_team_rank_rev & n ) ; n <<= 1 ) {
m_exec.pool_rev( m_team_base_rev + j )->state_set( Active );
memory_fence();
}
}
public:
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space& team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space& team_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space& thread_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
KOKKOS_INLINE_FUNCTION void team_barrier() const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{}
#else
{
if ( 1 < m_team_size && !m_invalid_thread) {
team_fan_in();
team_fan_out();
}
}
#endif
template<class ValueType>
KOKKOS_INLINE_FUNCTION
void team_broadcast(ValueType& value, const int& thread_id) const
{
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ }
#else
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
, ValueType , void >::type type ;
- type * const local_value = ((type*) m_exec.scratch_thread());
- if(team_rank() == thread_id)
- *local_value = value;
+ type volatile * const shared_value =
+ ((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
+
+ if ( team_rank() == thread_id ) *shared_value = value;
memory_fence();
- team_barrier();
- value = *local_value;
+ team_barrier(); // Wait for 'thread_id' to write
+ value = *shared_value ;
+ team_barrier(); // Wait for team members to read
#endif
}
#ifdef KOKKOS_HAVE_CXX11
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION ValueType
team_reduce( const ValueType & value
, const JoinOp & op_in ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ValueType(); }
#else
{
memory_fence();
typedef ValueType value_type;
const JoinLambdaAdapter<value_type,JoinOp> op(op_in);
#endif
#else // KOKKOS_HAVE_CXX11
template< class JoinOp >
KOKKOS_INLINE_FUNCTION typename JoinOp::value_type
team_reduce( const typename JoinOp::value_type & value
, const JoinOp & op ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return typename JoinOp::value_type(); }
#else
{
typedef typename JoinOp::value_type value_type;
#endif
#endif // KOKKOS_HAVE_CXX11
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(value_type) < TEAM_REDUCE_SIZE
, value_type , void >::type type ;
type * const local_value = ((type*) m_exec.scratch_thread());
// Set this thread's contribution
*local_value = value ;
// Fence to make sure the base team member has access:
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true, all other threads wait for team_fan_out()
type * const team_value = ((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
// Join to the team value:
for ( int i = 1 ; i < m_team_size ; ++i ) {
op.join( *team_value , *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread()) );
}
memory_fence();
// The base team member may "lap" the other team members,
// copy to their local value before proceeding.
for ( int i = 1 ; i < m_team_size ; ++i ) {
*((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread()) = *team_value ;
}
// Fence to make sure all team members have access
memory_fence();
}
team_fan_out();
return *((type volatile const *)local_value);
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value , ArgType * const global_accum ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ArgType(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ArgType) < TEAM_REDUCE_SIZE , ArgType , void >::type type ;
volatile type * const work_value = ((type*) m_exec.scratch_thread());
*work_value = value ;
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true, all other threads wait for team_fan_out()
// m_team_base[0] == highest ranking team member
// m_team_base[ m_team_size - 1 ] == lowest ranking team member
//
// 1) copy from lower to higher rank, initialize lowest rank to zero
// 2) prefix sum from lowest to highest rank, skipping lowest rank
type accum = 0 ;
if ( global_accum ) {
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread());
accum += val ;
}
accum = atomic_fetch_add( global_accum , accum );
}
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread());
const type offset = accum ;
accum += val ;
val = offset ;
}
memory_fence();
}
team_fan_out();
return *work_value ;
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
{ return this-> template team_scan<Type>( value , 0 ); }
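/* Example (illustrative sketch; 'member' is a team handle inside a
 * TeamPolicy functor and 'count' a hypothetical per-thread value):
 *
 *   const int offset = member.team_scan( count );
 *   // 'offset' is the sum of 'count' over all lower-ranked team threads;
 *   // the highest-ranked thread can recover the team total as offset + count.
 */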
//----------------------------------------
// Private for the driver
private:
typedef execution_space::scratch_memory_space space ;
public:
template< class ... Properties >
inline
OpenMPexecTeamMember( Impl::OpenMPexec & exec
, const TeamPolicyInternal< OpenMP, Properties ...> & team
, const int shmem_size_L1
, const int shmem_size_L2
)
: m_exec( exec )
, m_team_shared(0,0)
, m_team_scratch_size{ shmem_size_L1 , shmem_size_L2 }
, m_team_base_rev(0)
, m_team_rank_rev(0)
, m_team_rank(0)
, m_team_size( team.team_size() )
, m_league_rank(0)
, m_league_end(0)
, m_league_size( team.league_size() )
, m_chunk_size( team.chunk_size()>0?team.chunk_size():team.team_iter() )
, m_league_chunk_end(0)
, m_team_lead_exec( *exec.pool_rev( team.team_alloc() * (m_exec.pool_rank_rev()/team.team_alloc()) ))
, m_team_alloc( team.team_alloc())
{
const int pool_rank_rev = m_exec.pool_rank_rev();
const int pool_team_rank_rev = pool_rank_rev % team.team_alloc();
const int pool_league_rank_rev = pool_rank_rev / team.team_alloc();
const int pool_num_teams = OpenMP::thread_pool_size(0)/team.team_alloc();
const int chunks_per_team = ( team.league_size() + m_chunk_size*pool_num_teams-1 ) / (m_chunk_size*pool_num_teams);
int league_iter_end = team.league_size() - pool_league_rank_rev * chunks_per_team * m_chunk_size;
int league_iter_begin = league_iter_end - chunks_per_team * m_chunk_size;
if (league_iter_begin < 0) league_iter_begin = 0;
if (league_iter_end>team.league_size()) league_iter_end = team.league_size();
if ((team.team_alloc()>m_team_size)?
(pool_team_rank_rev >= m_team_size):
(m_exec.pool_size() - pool_num_teams*m_team_size > m_exec.pool_rank())
)
m_invalid_thread = 1;
else
m_invalid_thread = 0;
m_team_rank_rev = pool_team_rank_rev ;
if ( pool_team_rank_rev < m_team_size && !m_invalid_thread ) {
m_team_base_rev = team.team_alloc() * pool_league_rank_rev ;
m_team_rank_rev = pool_team_rank_rev ;
m_team_rank = m_team_size - ( m_team_rank_rev + 1 );
m_league_end = league_iter_end ;
m_league_rank = league_iter_begin ;
new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
0 );
}
if ( (m_team_rank_rev == 0) && (m_invalid_thread == 0) ) {
m_exec.set_work_range(m_league_rank,m_league_end,m_chunk_size);
m_exec.reset_steal_target(m_team_size);
}
}
bool valid_static() const
{
return m_league_rank < m_league_end ;
}
void next_static()
{
if ( m_league_rank < m_league_end ) {
team_barrier();
new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
0);
}
m_league_rank++;
}
bool valid_dynamic() {
if(m_invalid_thread)
return false;
if ((m_league_rank < m_league_chunk_end) && (m_league_rank < m_league_size)) {
return true;
}
if ( m_team_rank_rev == 0 ) {
m_team_lead_exec.get_work_index(m_team_alloc);
}
team_barrier();
long work_index = m_team_lead_exec.team_work_index();
m_league_rank = work_index * m_chunk_size;
m_league_chunk_end = (work_index +1 ) * m_chunk_size;
if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
if(m_league_rank>=0)
return true;
return false;
}
void next_dynamic() {
if(m_invalid_thread)
return;
if ( m_league_rank < m_league_chunk_end ) {
team_barrier();
new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
0);
}
m_league_rank++;
}
static inline int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
};
-
-
template< class ... Properties >
class TeamPolicyInternal< Kokkos::OpenMP, Properties ... >: public PolicyTraits<Properties ...>
{
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_team_alloc = p.m_team_alloc;
m_team_iter = p.m_team_iter;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
int team_size_max( const FunctorType & )
{ return traits::execution_space::thread_pool_size(1); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType & )
{ return traits::execution_space::thread_pool_size(2); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType &, const int& )
{ return traits::execution_space::thread_pool_size(2); }
//----------------------------------------
private:
int m_league_size ;
int m_team_size ;
int m_team_alloc ;
int m_team_iter ;
size_t m_team_scratch_size[2];
size_t m_thread_scratch_size[2];
int m_chunk_size;
inline void init( const int league_size_request
, const int team_size_request )
{
const int pool_size = traits::execution_space::thread_pool_size(0);
const int team_max = traits::execution_space::thread_pool_size(1);
const int team_grain = traits::execution_space::thread_pool_size(2);
m_league_size = league_size_request ;
m_team_size = team_size_request < team_max ?
team_size_request : team_max ;
// Round team size up to a multiple of 'team_grain'
const int team_size_grain = team_grain * ( ( m_team_size + team_grain - 1 ) / team_grain );
const int team_count = pool_size / team_size_grain ;
// Constraint : pool_size = m_team_alloc * team_count
m_team_alloc = pool_size / team_count ;
// Maximum number of iterations each team will take:
m_team_iter = ( m_league_size + team_count - 1 ) / team_count ;
set_auto_chunk_size();
}
public:
inline int team_size() const { return m_team_size ; }
inline int league_size() const { return m_league_size ; }
+
inline size_t scratch_size(const int& level, int team_size_ = -1) const {
- if(team_size_ < 0)
- team_size_ = m_team_size;
+ if(team_size_ < 0) team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
}
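// Example (illustrative): with a per-team request of 1024 bytes, a per-thread
// request of 64 bytes and team_size_ = 8, scratch_size(0) returns
// 1024 + 8*64 = 1536 bytes for scratch level 0.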
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , team_size_request ); }
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1)
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , traits::execution_space::thread_pool_size(2) ); }
TeamPolicyInternal( int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , team_size_request ); }
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , traits::execution_space::thread_pool_size(2) ); }
inline int team_alloc() const { return m_team_alloc ; }
inline int team_iter() const { return m_team_iter ; }
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
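/* Example (illustrative sketch; 'league' and 'team' are hypothetical sizes):
 *
 *   typedef Kokkos::TeamPolicy< Kokkos::OpenMP > policy_type ;
 *   policy_type policy = policy_type( league , team )
 *     .set_chunk_size( 4 )
 *     .set_scratch_size( 0 , Kokkos::PerTeam(1024) , Kokkos::PerThread(64) );
 */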
private:
/** \brief finalize chunk_size if it was set to AUTO*/
inline void set_auto_chunk_size() {
int concurrency = traits::execution_space::thread_pool_size(0)/m_team_alloc;
if( concurrency==0 ) concurrency=1;
if(m_chunk_size > 0) {
if(!Impl::is_integral_power_of_two( m_chunk_size ))
Kokkos::abort("TeamPolicy blocking granularity must be power of two" );
}
int new_chunk_size = 1;
while(new_chunk_size*100*concurrency < m_league_size)
new_chunk_size *= 2;
if(new_chunk_size < 128) {
new_chunk_size = 1;
while( (new_chunk_size*40*concurrency < m_league_size ) && (new_chunk_size<128) )
new_chunk_size*=2;
}
m_chunk_size = new_chunk_size;
}
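// Worked example (illustrative): with 16 pool threads, m_team_alloc = 4
// (concurrency = 4) and m_league_size = 10000, the first loop stops at
// new_chunk_size = 32 (32*100*4 >= 10000); since 32 < 128 the second loop
// re-runs with the *40 factor and stops at 64 (64*40*4 >= 10000), so
// m_chunk_size becomes 64.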
public:
typedef Impl::OpenMPexecTeamMember member_type ;
};
} // namespace Impl
-
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
inline
int OpenMP::thread_pool_size( int depth )
{
return Impl::OpenMPexec::pool_size(depth);
}
KOKKOS_INLINE_FUNCTION
int OpenMP::thread_pool_rank()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
return Impl::OpenMPexec::m_map_rank[ omp_get_thread_num() ];
#else
return -1 ;
#endif
}
-} // namespace Kokkos
-
-
-namespace Kokkos {
-
-template<typename iType>
+template< typename iType >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>
- TeamThreadRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>(thread,count);
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >
+TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType& count ) {
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, count );
}
-template<typename iType>
+template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>
- TeamThreadRange(const Impl::OpenMPexecTeamMember& thread, const iType& begin, const iType& end) {
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>(thread,begin,end);
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::OpenMPexecTeamMember >
+TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType1& begin, const iType2& end ) {
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >
- ThreadVectorRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
+ThreadVectorRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember> PerTeam(const Impl::OpenMPexecTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember> PerThread(const Impl::OpenMPexecTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>(thread);
}
+
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
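/* Example (illustrative sketch; 'league', 'team', 'N' and the view 'a' are
 * hypothetical):
 *
 *   typedef Kokkos::TeamPolicy< Kokkos::OpenMP >::member_type member_type ;
 *   Kokkos::parallel_for( Kokkos::TeamPolicy< Kokkos::OpenMP >( league , team ),
 *     KOKKOS_LAMBDA( const member_type & m ) {
 *       Kokkos::parallel_for( Kokkos::TeamThreadRange( m , N ),
 *         [&]( const int i ) { a( m.league_rank() , i ) = i ; } );
 *     });
 */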
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
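/* Example (illustrative sketch; 'a' and 'row_sums' are hypothetical views and
 * 'm' a team member as in the parallel_for example above):
 *
 *   double sum = 0 ;
 *   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( m , N ),
 *     [&]( const int i , double & val ) { val += a( m.league_rank() , i ); },
 *     sum );
 *   if ( m.team_rank() == 0 ) row_sums( m.league_rank() ) = sum ;
 */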
/** \brief Inter-thread parallel_reduce with a join operation. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,join);
}
} //namespace Kokkos
-
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
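/* Example (illustrative sketch): a ThreadVectorRange loop is normally nested
 * inside a TeamThreadRange loop, mapping the inner M iterations onto the
 * vector lanes of each thread ('b', 'N' and 'M' are hypothetical):
 *
 *   Kokkos::parallel_for( Kokkos::TeamThreadRange( m , N ), [&]( const int i ) {
 *     Kokkos::parallel_for( Kokkos::ThreadVectorRange( m , M ),
 *       [&]( const int j ) { b( i , j ) = 0 ; } );
 *   });
 */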
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
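/* Example (illustrative sketch; 'counts' and 'offsets' are hypothetical views):
 * compute exclusive offsets from per-entry counts.
 *
 *   Kokkos::parallel_scan( Kokkos::ThreadVectorRange( m , M ),
 *     [&]( const int j , int & partial , const bool final ) {
 *       const int c = counts( j );
 *       if ( final ) offsets( j ) = partial ;   // exclusive prefix sum
 *       partial += c ;   // contribute whether final is true or not
 *     });
 */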
} // namespace Kokkos
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
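/* Example (illustrative sketch; 'expensive_update' is a hypothetical helper):
 * have one team thread compute a value and broadcast it to the whole team.
 *
 *   double new_value = 0 ;
 *   Kokkos::single( Kokkos::PerTeam( m ), [&]( double & v ) {
 *     v = expensive_update();
 *   }, new_value );
 *   // every member of the team now holds the same 'new_value'
 */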
}
#endif /* #ifndef KOKKOS_OPENMPEXEC_HPP */
-
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp b/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
index 5b6419289..8ee70b9ef 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
+++ b/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
@@ -1,745 +1,730 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_QTHREAD_PARALLEL_HPP
#define KOKKOS_QTHREAD_PARALLEL_HPP
#include <vector>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_StaticAssert.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <Qthread/Kokkos_QthreadExec.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Qthread
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef typename Policy::WorkRange WorkRange ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor , const Member ibeg , const Member iend )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor , const Member ibeg , const Member iend )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i );
}
}
// Function is called once by every concurrent thread.
static void exec( QthreadExec & exec , const void * arg )
{
const ParallelFor & self = * ((const ParallelFor *) arg );
const WorkRange range( self.m_policy, exec.worker_rank(), exec.worker_size() );
ParallelFor::template exec_range< WorkTag > ( self.m_functor , range.begin() , range.end() );
// All threads wait for completion.
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelFor::exec , this );
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
, Kokkos::Qthread
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i , update );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i , update );
}
}
static void exec( QthreadExec & exec , const void * arg )
{
const ParallelReduce & self = * ((const ParallelReduce *) arg );
const WorkRange range( self.m_policy, exec.worker_rank(), exec.worker_size() );
ParallelReduce::template exec_range< WorkTag >(
self.m_functor, range.begin(), range.end(),
ValueInit::init( ReducerConditional::select(self.m_functor , self.m_reducer)
, exec.exec_all_reduce_value() ) );
exec.template exec_all_reduce< FunctorType, ReducerType, WorkTag >( self.m_functor, self.m_reducer );
}
public:
inline
void execute() const
{
QthreadExec::resize_worker_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelReduce::exec , this );
const pointer_type data = (pointer_type) QthreadExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( unsigned i = 0 ; i < n ; ++i ) { m_result_ptr[i] = data[i]; }
}
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result_view
, typename std::enable_if<Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type< ReducerType >::value
, void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result_view.data() )
{ }
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{ }
};
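// Usage sketch (assuming a Qthread-enabled build and a view 'x' of length N):
//
//   double sum = 0 ;
//   Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Qthread >( 0 , N ) ,
//     KOKKOS_LAMBDA( const int i , double & update ) { update += x( i ); } ,
//     sum );
//
// Each worker accumulates into its ValueInit-initialized partial value,
// exec_all_reduce() combines the partials, and the final result is copied
// into 'sum' through m_result_ptr.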
//----------------------------------------------------------------------------
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, TeamPolicy< Properties ... >
, Kokkos::Qthread >
{
private:
typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthread , Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member )
{
while ( member ) {
functor( member );
member.team_barrier();
member.next_team();
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member )
{
const TagType t{} ;
while ( member ) {
functor( t , member );
member.team_barrier();
member.next_team();
}
}
static void exec( QthreadExec & exec , const void * arg )
{
const ParallelFor & self = * ((const ParallelFor *) arg );
ParallelFor::template exec_team< WorkTag >
( self.m_functor , Member( exec , self.m_policy ) );
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
QthreadExec::resize_worker_scratch
( /* reduction memory */ 0
, /* team shared memory */ FunctorTeamShmemSize< FunctorType >::value( m_functor , m_policy.team_size() ) );
Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelFor::exec , this );
}
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
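// Usage sketch (assuming 'league_size' and 'team_size' are chosen by the
// caller and the team size is one the Qthread backend supports):
//
//   typedef Kokkos::TeamPolicy< Kokkos::Qthread > team_policy ;
//   Kokkos::parallel_for( team_policy( league_size , team_size ) ,
//     KOKKOS_LAMBDA( const team_policy::member_type & member ) {
//       // per-team work: member.league_rank(), member.team_rank(), ...
//     } );
//
// exec_team() above advances the member over successive teams via
// next_team() until the league is exhausted.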
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType , class ... Properties >
class ParallelReduce< FunctorType
, TeamPolicy< Properties... >
, ReducerType
, Kokkos::Qthread
>
{
private:
typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthread , Properties ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member , reference_type update )
{
while ( member ) {
functor( member , update );
member.team_barrier();
member.next_team();
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member , reference_type update )
{
const TagType t{} ;
while ( member ) {
functor( t , member , update );
member.team_barrier();
member.next_team();
}
}
static void exec( QthreadExec & exec , const void * arg )
{
const ParallelReduce & self = * ((const ParallelReduce *) arg );
ParallelReduce::template exec_team< WorkTag >
( self.m_functor
, Member( exec , self.m_policy )
, ValueInit::init( ReducerConditional::select( self.m_functor , self.m_reducer )
, exec.exec_all_reduce_value() ) );
exec.template exec_all_reduce< FunctorType, ReducerType, WorkTag >( self.m_functor, self.m_reducer );
}
public:
inline
void execute() const
{
QthreadExec::resize_worker_scratch
( /* reduction memory */ ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) )
, /* team shared memory */ FunctorTeamShmemSize< FunctorType >::value( m_functor , m_policy.team_size() ) );
Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelReduce::exec , this );
const pointer_type data = (pointer_type) QthreadExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer), data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( unsigned i = 0 ; i < n ; ++i ) { m_result_ptr[i] = data[i]; }
}
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result
, typename std::enable_if<Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type< ReducerType >::value
, void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
{ }
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{ }
};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Qthread
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i , update , final );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i , update , final );
}
}
static void exec( QthreadExec & exec , const void * arg )
{
const ParallelScan & self = * ((const ParallelScan *) arg );
const WorkRange range( self.m_policy , exec.worker_rank() , exec.worker_size() );
// Initialize thread-local value
reference_type update = ValueInit::init( self.m_functor , exec.exec_all_reduce_value() );
ParallelScan::template exec_range< WorkTag >( self.m_functor, range.begin() , range.end() , update , false );
exec.template exec_all_scan< FunctorType , typename Policy::work_tag >( self.m_functor );
ParallelScan::template exec_range< WorkTag >( self.m_functor , range.begin() , range.end() , update , true );
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
QthreadExec::resize_worker_scratch( ValueTraits::value_size( m_functor ) , 0 );
Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelScan::exec , this );
}
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{
}
};
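// Usage sketch of the exclusive prefix sum (assuming View<long*> 'counts'
// and 'offsets' of length N supplied by the caller):
//
//   Kokkos::parallel_scan( Kokkos::RangePolicy< Kokkos::Qthread >( 0 , N ) ,
//     KOKKOS_LAMBDA( const int i , long & update , const bool final ) {
//       if ( final ) { offsets( i ) = update ; }
//       update += counts( i );
//     } );
//
// exec() runs the functor twice per worker: once with final == false to
// accumulate partial sums, then again with final == true after
// exec_all_scan() has combined the per-worker partials.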
} // namespace Impl
+
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-template<typename iType>
+template< typename iType >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>
-TeamThreadRange(const Impl::QthreadTeamPolicyMember& thread, const iType& count)
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >
+TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType& count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>(thread,count);
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, count );
}
-template<typename iType>
+template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>
-TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread
- , const iType & begin
- , const iType & end
- )
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::QthreadTeamPolicyMember >
+TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType1 & begin, const iType2 & end )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>(thread,begin,end);
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, iType(begin), iType(end) );
}
-
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >
ThreadVectorRange(const Impl::QthreadTeamPolicyMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >(thread,count);
}
-
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember> PerTeam(const Impl::QthreadTeamPolicyMember& thread) {
return Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember> PerThread(const Impl::QthreadTeamPolicyMember& thread) {
return Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
}
-} // namespace Kokkos
-
-namespace Kokkos {
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team.
- * This functionality requires C++11 support.*/
+/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
+ *
+ * The range i=0..N-1 is mapped to all threads of the calling thread team.
+ * This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
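// Usage sketch inside a team functor (assuming 'member' is the
// QthreadTeamPolicyMember passed to the enclosing team lambda and 'n' is a
// per-team work count):
//
//   Kokkos::parallel_for( Kokkos::TeamThreadRange( member , n ) ,
//     [&] ( const int i ) {
//       // index i is distributed across the threads of the team
//     } );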
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
 *
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
 * val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
#if defined( KOKKOS_HAVE_CXX11 )
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
 *
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
 * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
 * The input value of init_result is used as the initializer for temporary variables of ValueType. Therefore
 * the input value should be the neutral element with respect to the join operation (e.g. '0' for '+' or
 * '1' for '*'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
#endif /* #if defined( KOKKOS_HAVE_CXX11 ) */
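// Usage sketch of the join-based team reduction above (assuming 'member',
// 'n', and a view 'x' with non-negative entries, so 0 is the neutral
// element for the max-join):
//
//   double team_max = 0 ;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , n ) ,
//     [&] ( const int i , double & val ) { if ( x( i ) > val ) val = x( i ); } ,
//     [] ( double & dst , const double & src ) { if ( src > dst ) dst = src ; } ,
//     team_max );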
-} // namespace Kokkos
-
-namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
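// Usage sketch nested inside a per-thread region (assuming 'member' and a
// vector length 'n' provided by the surrounding team code):
//
//   Kokkos::parallel_for( Kokkos::ThreadVectorRange( member , n ) ,
//     [&] ( const int j ) {
//       // index j maps to the vector lanes of the calling thread
//     } );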
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
-} // namespace Kokkos
-
-namespace Kokkos {
-
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
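// Usage sketch (assuming 'member' and an int broadcast variable 'lead_rank'):
//
//   // Executed once per team, by team rank 0 only; the result is broadcast:
//   Kokkos::single( Kokkos::PerTeam( member ) ,
//     [&] ( int & v ) { v = member.team_rank(); } , lead_rank );
//
//   // Executed once per thread (all vector lanes collapse to one call):
//   Kokkos::single( Kokkos::PerThread( member ) , [&] () { /* per-thread work */ } );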
} // namespace Kokkos
-
#endif /* #define KOKKOS_QTHREAD_PARALLEL_HPP */
-
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp b/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
index 8cc39d277..e651b9fdb 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
+++ b/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
@@ -1,491 +1,491 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_HAVE_QTHREAD )
#include <stdio.h>
#include <stdlib.h>
#include <stdexcept>
#include <iostream>
#include <sstream>
#include <string>
#include <Kokkos_Atomic.hpp>
#include <Qthread/Kokkos_Qthread_TaskPolicy.hpp>
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
typedef TaskMember< Kokkos::Qthread , void , void > Task ;
namespace {
inline
unsigned padded_sizeof_derived( unsigned sizeof_derived )
{
return sizeof_derived +
( sizeof_derived % sizeof(Task*) ? sizeof(Task*) - sizeof_derived % sizeof(Task*) : 0 );
}
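// For example, with sizeof(Task*) == 8 a derived size of 20 pads to 24,
// while a derived size of 24 is already a multiple of 8 and is returned
// unchanged; the padding keeps the trailing Task*[] dependence array aligned.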
// int lock_alloc_dealloc = 0 ;
} // namespace
void Task::deallocate( void * ptr )
{
// Counting on 'free' thread safety so lock/unlock not required.
// However, isolate calls here to mitigate future need to introduce lock/unlock.
// lock
// while ( ! Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 0 , 1 ) );
free( ptr );
// unlock
// Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 1 , 0 );
}
void * Task::allocate( const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity )
{
// Counting on 'malloc' thread safety so lock/unlock not required.
// However, isolate calls here to mitigate future need to introduce lock/unlock.
// lock
// while ( ! Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 0 , 1 ) );
void * const ptr = malloc( padded_sizeof_derived( arg_sizeof_derived ) + arg_dependence_capacity * sizeof(Task*) );
// unlock
// Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 1 , 0 );
return ptr ;
}
Task::~TaskMember()
{
}
Task::TaskMember( const function_verify_type arg_verify
, const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: m_dealloc( arg_dealloc )
, m_verify( arg_verify )
, m_apply_single( arg_apply_single )
, m_apply_team( arg_apply_team )
, m_active_count( & arg_active_count )
, m_qfeb(0)
, m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
, m_dep_capacity( arg_dependence_capacity )
, m_dep_size( 0 )
, m_ref_count( 0 )
, m_state( Kokkos::Experimental::TASK_STATE_CONSTRUCTING )
{
qthread_empty( & m_qfeb ); // Set to full when complete
for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
}
Task::TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: m_dealloc( arg_dealloc )
, m_verify( & Task::verify_type<void> )
, m_apply_single( arg_apply_single )
, m_apply_team( arg_apply_team )
, m_active_count( & arg_active_count )
, m_qfeb(0)
, m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
, m_dep_capacity( arg_dependence_capacity )
, m_dep_size( 0 )
, m_ref_count( 0 )
, m_state( Kokkos::Experimental::TASK_STATE_CONSTRUCTING )
{
qthread_empty( & m_qfeb ); // Set to full when complete
for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
}
//----------------------------------------------------------------------------
void Task::throw_error_add_dependence() const
{
std::cerr << "TaskMember< Qthread >::add_dependence ERROR"
<< " state(" << m_state << ")"
<< " dep_size(" << m_dep_size << ")"
<< std::endl ;
throw std::runtime_error("TaskMember< Qthread >::add_dependence ERROR");
}
void Task::throw_error_verify_type()
{
throw std::runtime_error("TaskMember< Qthread >::verify_type ERROR");
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
void Task::assign( Task ** const lhs , Task * rhs , const bool no_throw )
{
static const char msg_error_header[] = "Kokkos::Impl::TaskManager<Kokkos::Qthread>::assign ERROR" ;
static const char msg_error_count[] = ": negative reference count" ;
static const char msg_error_complete[] = ": destroy task that is not complete" ;
static const char msg_error_dependences[] = ": destroy task that has dependences" ;
static const char msg_error_exception[] = ": caught internal exception" ;
- if ( rhs ) { Kokkos::atomic_fetch_add( & (*rhs).m_ref_count , 1 ); }
+ if ( rhs ) { Kokkos::atomic_increment( &(*rhs).m_ref_count ); }
Task * const lhs_val = Kokkos::atomic_exchange( lhs , rhs );
if ( lhs_val ) {
const int count = Kokkos::atomic_fetch_add( & (*lhs_val).m_ref_count , -1 );
const char * msg_error = 0 ;
try {
if ( 1 == count ) {
// Reference count at zero, delete it
// Should only be deallocating a completed task
if ( (*lhs_val).m_state == Kokkos::Experimental::TASK_STATE_COMPLETE ) {
// A completed task should not have dependences...
for ( int i = 0 ; i < (*lhs_val).m_dep_size && 0 == msg_error ; ++i ) {
if ( (*lhs_val).m_dep[i] ) msg_error = msg_error_dependences ;
}
}
else {
msg_error = msg_error_complete ;
}
if ( 0 == msg_error ) {
// Get deletion function and apply it
const Task::function_dealloc_type d = (*lhs_val).m_dealloc ;
(*d)( lhs_val );
}
}
else if ( count <= 0 ) {
msg_error = msg_error_count ;
}
}
catch( ... ) {
if ( 0 == msg_error ) msg_error = msg_error_exception ;
}
if ( 0 != msg_error ) {
if ( no_throw ) {
std::cerr << msg_error_header << msg_error << std::endl ;
std::cerr.flush();
}
else {
std::string msg(msg_error_header);
msg.append(msg_error);
throw std::runtime_error( msg );
}
}
}
}
#endif
//----------------------------------------------------------------------------
void Task::closeout()
{
enum { RESPAWN = int( Kokkos::Experimental::TASK_STATE_WAITING ) |
int( Kokkos::Experimental::TASK_STATE_EXECUTING ) };
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx %s\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(this)
, ( m_state == RESPAWN ? "respawn" : "complete" )
);
fflush(stdout);
#endif
// When dependent tasks run there would be a race
// condition between destroying this task and
// querying the active count pointer from this task.
int volatile * const active_count = m_active_count ;
if ( m_state == RESPAWN ) {
// Task requests respawn, set state to waiting and reschedule the task
m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
schedule();
}
else {
// Task did not respawn, is complete
m_state = Kokkos::Experimental::TASK_STATE_COMPLETE ;
// Release dependences before allowing dependent tasks to run.
// Otherwise there is a thread race condition for removing dependences.
for ( int i = 0 ; i < m_dep_size ; ++i ) {
assign( & m_dep[i] , 0 );
}
// Set qthread FEB to full so that dependent tasks are allowed to execute.
// This 'task' may be deleted immediately following this function call.
qthread_fill( & m_qfeb );
// The dependent task could now complete and destroy 'this' task
// before the call to 'qthread_fill' returns. Therefore, for
// thread safety assume that 'this' task has now been destroyed.
}
// Decrement active task count before returning.
Kokkos::atomic_decrement( active_count );
}
aligned_t Task::qthread_func( void * arg )
{
Task * const task = reinterpret_cast< Task * >(arg);
// First member of the team change state to executing.
// Use compare-exchange to avoid race condition with a respawn.
Kokkos::atomic_compare_exchange_strong( & task->m_state
, int(Kokkos::Experimental::TASK_STATE_WAITING)
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
);
if ( task->m_apply_team && ! task->m_apply_single ) {
Kokkos::Impl::QthreadTeamPolicyMember::TaskTeam task_team_tag ;
// Initialize team size and rank with shepherd info
Kokkos::Impl::QthreadTeamPolicyMember member( task_team_tag );
(*task->m_apply_team)( task , member );
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx executed by member(%d:%d)\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(task)
, member.team_rank()
, member.team_size()
);
fflush(stdout);
#endif
member.team_barrier();
if ( member.team_rank() == 0 ) task->closeout();
member.team_barrier();
}
else if ( task->m_apply_team && task->m_apply_single == reinterpret_cast<function_single_type>(1) ) {
// Team hard-wired to one, no cloning
Kokkos::Impl::QthreadTeamPolicyMember member ;
(*task->m_apply_team)( task , member );
task->closeout();
}
else {
(*task->m_apply_single)( task );
task->closeout();
}
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx return\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(task)
);
fflush(stdout);
#endif
return 0 ;
}
void Task::respawn()
{
// Change state from pure executing to ( waiting | executing )
// to avoid confusion with simply waiting.
Kokkos::atomic_compare_exchange_strong( & m_state
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
, int(Kokkos::Experimental::TASK_STATE_WAITING |
Kokkos::Experimental::TASK_STATE_EXECUTING)
);
}
void Task::schedule()
{
// Is waiting for execution
// Increment active task count before spawning.
Kokkos::atomic_increment( m_active_count );
// Spawn in qthread. Must malloc the precondition array and hand it to qthread;
// qthread will eventually free this allocation, so the memory is not leaked.
// Open concern: is malloc thread safe here, or does this need to be guarded?
aligned_t ** qprecon = (aligned_t **) malloc( ( m_dep_size + 1 ) * sizeof(aligned_t *) );
qprecon[0] = reinterpret_cast<aligned_t *>( uintptr_t(m_dep_size) );
for ( int i = 0 ; i < m_dep_size ; ++i ) {
qprecon[i+1] = & m_dep[i]->m_qfeb ; // Qthread precondition flag
}
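// Layout handed to qthread_spawn: qprecon[0] carries the dependence count
// encoded as a pointer-sized integer, and qprecon[1 .. m_dep_size] point at
// the full/empty bits that must be filled before this task may run.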
if ( m_apply_team && ! m_apply_single ) {
// If more than one shepherd spawn on a shepherd other than this shepherd
const int num_shepherd = qthread_num_shepherds();
const int num_worker_per_shepherd = qthread_num_workers_local(NO_SHEPHERD);
const int this_shepherd = qthread_shep();
int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(this)
, spawn_shepherd
, num_worker_per_shepherd - 1
);
fflush(stdout);
#endif
qthread_spawn_cloneable
( & Task::qthread_func
, this
, 0
, NULL
, m_dep_size , qprecon /* dependences */
, spawn_shepherd
, unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY )
, num_worker_per_shepherd - 1
);
}
else {
qthread_spawn( & Task::qthread_func /* function */
, this /* function argument */
, 0
, NULL
, m_dep_size , qprecon /* dependences */
, NO_SHEPHERD
, QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
);
}
}
} // namespace Impl
} // namespace Experimental
} // namespace Kokkos
namespace Kokkos {
namespace Experimental {
TaskPolicy< Kokkos::Qthread >::
TaskPolicy
( const unsigned /* arg_task_max_count */
, const unsigned /* arg_task_max_size */
, const unsigned arg_task_default_dependence_capacity
, const unsigned arg_task_team_size
)
: m_default_dependence_capacity( arg_task_default_dependence_capacity )
, m_team_size( arg_task_team_size != 0 ? arg_task_team_size : unsigned(qthread_num_workers_local(NO_SHEPHERD)) )
, m_active_count_root(0)
, m_active_count( m_active_count_root )
{
const unsigned num_worker_per_shepherd = unsigned( qthread_num_workers_local(NO_SHEPHERD) );
if ( m_team_size != 1 && m_team_size != num_worker_per_shepherd ) {
std::ostringstream msg ;
msg << "Kokkos::Experimental::TaskPolicy< Kokkos::Qthread >( "
<< "default_depedence = " << arg_task_default_dependence_capacity
<< " , team_size = " << arg_task_team_size
<< " ) ERROR, valid team_size arguments are { (omitted) , 1 , " << num_worker_per_shepherd << " }" ;
Kokkos::Impl::throw_runtime_exception(msg.str());
}
}
TaskPolicy< Kokkos::Qthread >::member_type &
TaskPolicy< Kokkos::Qthread >::member_single()
{
static member_type s ;
return s ;
}
void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Qthread > & policy )
{
volatile int * const active_task_count = & policy.m_active_count ;
while ( *active_task_count ) qthread_yield();
}
} // namespace Experimental
} // namespace Kokkos
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #if defined( KOKKOS_HAVE_QTHREAD ) */
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp b/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
index 22a565503..565dbf7e6 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
+++ b/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
@@ -1,664 +1,664 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
-#ifndef KOKKOS_QTHREAD_TASKPOLICY_HPP
-#define KOKKOS_QTHREAD_TASKPOLICY_HPP
+#ifndef KOKKOS_QTHREAD_TASKSCHEDULER_HPP
+#define KOKKOS_QTHREAD_TASKSCHEDULER_HPP
#include <string>
#include <typeinfo>
#include <stdexcept>
//----------------------------------------------------------------------------
// Defines to enable experimental Qthread functionality
#define QTHREAD_LOCAL_PRIORITY
#define CLONED_TASKS
#include <qthread.h>
#undef QTHREAD_LOCAL_PRIORITY
#undef CLONED_TASKS
//----------------------------------------------------------------------------
#include <Kokkos_Qthread.hpp>
-#include <Kokkos_TaskPolicy.hpp>
+#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_View.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
template<>
class TaskMember< Kokkos::Qthread , void , void >
{
public:
typedef TaskMember * (* function_verify_type) ( TaskMember * );
typedef void (* function_single_type) ( TaskMember * );
typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::QthreadTeamPolicyMember & );
typedef void (* function_dealloc_type)( TaskMember * );
private:
const function_dealloc_type m_dealloc ; ///< Deallocation
const function_verify_type m_verify ; ///< Result type verification
const function_single_type m_apply_single ; ///< Apply function
const function_team_type m_apply_team ; ///< Apply function
int volatile * const m_active_count ; ///< Count of active tasks on this policy
aligned_t m_qfeb ; ///< Qthread full/empty bit
TaskMember ** const m_dep ; ///< Dependences
const int m_dep_capacity ; ///< Capacity of dependences
int m_dep_size ; ///< Actual count of dependences
int m_ref_count ; ///< Reference count
int m_state ; ///< State of the task
TaskMember() /* = delete */ ;
TaskMember( const TaskMember & ) /* = delete */ ;
TaskMember & operator = ( const TaskMember & ) /* = delete */ ;
static aligned_t qthread_func( void * arg );
static void * allocate( const unsigned arg_sizeof_derived , const unsigned arg_dependence_capacity );
static void deallocate( void * );
void throw_error_add_dependence() const ;
static void throw_error_verify_type();
template < class DerivedTaskType >
static
void deallocate( TaskMember * t )
{
DerivedTaskType * ptr = static_cast< DerivedTaskType * >(t);
ptr->~DerivedTaskType();
deallocate( (void *) ptr );
}
void schedule();
void closeout();
protected :
~TaskMember();
// Used by TaskMember< Qthread , ResultType , void >
TaskMember( const function_verify_type arg_verify
, const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
);
// Used for TaskMember< Qthread , void , void >
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
);
public:
template< typename ResultType >
KOKKOS_FUNCTION static
TaskMember * verify_type( TaskMember * t )
{
- enum { check_type = ! Kokkos::Impl::is_same< ResultType , void >::value };
+ enum { check_type = ! std::is_same< ResultType , void >::value };
if ( check_type && t != 0 ) {
// Verify that t->m_verify is this function
const function_verify_type self = & TaskMember::template verify_type< ResultType > ;
if ( t->m_verify != self ) {
t = 0 ;
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
throw_error_verify_type();
#endif
}
}
return t ;
}
//----------------------------------------
/* Inheritance Requirements on task types:
* typedef FunctorType::value_type value_type ;
* class DerivedTaskType
* : public TaskMember< Qthread , value_type , FunctorType >
* { ... };
* class TaskMember< Qthread , value_type , FunctorType >
* : public TaskMember< Qthread , value_type , void >
* , public Functor
* { ... };
* If value_type != void
* class TaskMember< Qthread , value_type , void >
* : public TaskMember< Qthread , void , void >
*
* Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
*
*/
/** \brief Allocate and construct a single-thread task */
template< class DerivedTaskType >
static
TaskMember * create_single( const typename DerivedTaskType::functor_type & arg_functor
, volatile int & arg_active_count
, const unsigned arg_dependence_capacity )
{
typedef typename DerivedTaskType::functor_type functor_type ;
typedef typename functor_type::value_type value_type ;
DerivedTaskType * const task =
new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
, & TaskMember::template apply_single< functor_type , value_type >
, 0
, arg_active_count
, sizeof(DerivedTaskType)
, arg_dependence_capacity
, arg_functor );
return static_cast< TaskMember * >( task );
}
/** \brief Allocate and construct a team-thread task */
template< class DerivedTaskType >
static
TaskMember * create_team( const typename DerivedTaskType::functor_type & arg_functor
, volatile int & arg_active_count
, const unsigned arg_dependence_capacity
, const bool arg_is_team )
{
typedef typename DerivedTaskType::functor_type functor_type ;
typedef typename functor_type::value_type value_type ;
const function_single_type flag = reinterpret_cast<function_single_type>( arg_is_team ? 0 : 1 );
DerivedTaskType * const task =
new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
, flag
, & TaskMember::template apply_team< functor_type , value_type >
, arg_active_count
, sizeof(DerivedTaskType)
, arg_dependence_capacity
, arg_functor );
return static_cast< TaskMember * >( task );
}
void respawn();
void spawn()
{
m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
schedule();
}
//----------------------------------------
typedef FutureValueTypeIsVoidError get_result_type ;
KOKKOS_INLINE_FUNCTION
get_result_type get() const { return get_result_type() ; }
KOKKOS_INLINE_FUNCTION
Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
//----------------------------------------
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
static
void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false );
#else
KOKKOS_INLINE_FUNCTION static
void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false ) {}
#endif
KOKKOS_INLINE_FUNCTION
TaskMember * get_dependence( int i ) const
{ return ( Kokkos::Experimental::TASK_STATE_EXECUTING == m_state && 0 <= i && i < m_dep_size ) ? m_dep[i] : (TaskMember*) 0 ; }
KOKKOS_INLINE_FUNCTION
int get_dependence() const
{ return m_dep_size ; }
KOKKOS_INLINE_FUNCTION
void clear_dependence()
{
for ( int i = 0 ; i < m_dep_size ; ++i ) assign( m_dep + i , 0 );
m_dep_size = 0 ;
}
KOKKOS_INLINE_FUNCTION
void add_dependence( TaskMember * before )
{
if ( ( Kokkos::Experimental::TASK_STATE_CONSTRUCTING == m_state ||
Kokkos::Experimental::TASK_STATE_EXECUTING == m_state ) &&
m_dep_size < m_dep_capacity ) {
assign( m_dep + m_dep_size , before );
++m_dep_size ;
}
else {
throw_error_add_dependence();
}
}
//----------------------------------------
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
- void apply_single( typename Kokkos::Impl::enable_if< ! Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
+ void apply_single( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t )
{
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
// TaskMember< Kokkos::Qthread , ResultType , FunctorType >
// : public TaskMember< Kokkos::Qthread , ResultType , void >
// , public FunctorType
// { ... };
derived_type & m = * static_cast< derived_type * >( t );
Kokkos::Impl::FunctorApply< FunctorType , void , ResultType & >::apply( (FunctorType &) m , & m.m_result );
}
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
- void apply_single( typename Kokkos::Impl::enable_if< Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
+ void apply_single( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t )
{
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
// TaskMember< Kokkos::Qthread , ResultType , FunctorType >
// : public TaskMember< Kokkos::Qthread , ResultType , void >
// , public FunctorType
// { ... };
derived_type & m = * static_cast< derived_type * >( t );
Kokkos::Impl::FunctorApply< FunctorType , void , void >::apply( (FunctorType &) m );
}
//----------------------------------------
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
- void apply_team( typename Kokkos::Impl::enable_if< ! Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t
+ void apply_team( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t
, Kokkos::Impl::QthreadTeamPolicyMember & member )
{
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
derived_type & m = * static_cast< derived_type * >( t );
m.FunctorType::apply( member , m.m_result );
}
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
- void apply_team( typename Kokkos::Impl::enable_if< Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t
+ void apply_team( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t
, Kokkos::Impl::QthreadTeamPolicyMember & member )
{
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
derived_type & m = * static_cast< derived_type * >( t );
m.FunctorType::apply( member );
}
};
//----------------------------------------------------------------------------
/** \brief Base class for tasks with a result value in the Qthread execution space.
*
* The FunctorType must be void because this class is accessed by the
* Future class for the task and result value.
*
* Must be derived from TaskMember<S,void,void> 'root class' so the Future class
* can correctly static_cast from the 'root class' to this class.
*/
template < class ResultType >
class TaskMember< Kokkos::Qthread , ResultType , void >
: public TaskMember< Kokkos::Qthread , void , void >
{
public:
ResultType m_result ;
typedef const ResultType & get_result_type ;
KOKKOS_INLINE_FUNCTION
get_result_type get() const { return m_result ; }
protected:
typedef TaskMember< Kokkos::Qthread , void , void > task_root_type ;
typedef task_root_type::function_dealloc_type function_dealloc_type ;
typedef task_root_type::function_single_type function_single_type ;
typedef task_root_type::function_team_type function_team_type ;
inline
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: task_root_type( & task_root_type::template verify_type< ResultType >
, arg_dealloc
, arg_apply_single
, arg_apply_team
, arg_active_count
, arg_sizeof_derived
, arg_dependence_capacity )
, m_result()
{}
};
template< class ResultType , class FunctorType >
class TaskMember< Kokkos::Qthread , ResultType , FunctorType >
: public TaskMember< Kokkos::Qthread , ResultType , void >
, public FunctorType
{
public:
typedef FunctorType functor_type ;
typedef TaskMember< Kokkos::Qthread , void , void > task_root_type ;
typedef TaskMember< Kokkos::Qthread , ResultType , void > task_base_type ;
typedef task_root_type::function_dealloc_type function_dealloc_type ;
typedef task_root_type::function_single_type function_single_type ;
typedef task_root_type::function_team_type function_team_type ;
inline
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
, const functor_type & arg_functor
)
: task_base_type( arg_dealloc
, arg_apply_single
, arg_apply_team
, arg_active_count
, arg_sizeof_derived
, arg_dependence_capacity )
, functor_type( arg_functor )
{}
};
} /* namespace Impl */
} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
void wait( TaskPolicy< Kokkos::Qthread > & );
template<>
class TaskPolicy< Kokkos::Qthread >
{
public:
typedef Kokkos::Qthread execution_space ;
typedef TaskPolicy execution_policy ;
typedef Kokkos::Impl::QthreadTeamPolicyMember member_type ;
private:
typedef Impl::TaskMember< execution_space , void , void > task_root_type ;
template< class FunctorType >
static inline
const task_root_type * get_task_root( const FunctorType * f )
{
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
}
template< class FunctorType >
static inline
task_root_type * get_task_root( FunctorType * f )
{
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
return static_cast< task_root_type * >( static_cast< task_type * >(f) );
}
unsigned m_default_dependence_capacity ;
unsigned m_team_size ;
volatile int m_active_count_root ;
volatile int & m_active_count ;
public:
TaskPolicy
( const unsigned arg_task_max_count
, const unsigned arg_task_max_size
, const unsigned arg_task_default_dependence_capacity = 4
, const unsigned arg_task_team_size = 0 /* choose default */
);
KOKKOS_FUNCTION TaskPolicy() = default ;
KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
//----------------------------------------
KOKKOS_INLINE_FUNCTION
int allocated_task_count() const { return m_active_count ; }
template< class ValueType >
const Future< ValueType , execution_space > &
spawn( const Future< ValueType , execution_space > & f
, const bool priority = false ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
f.m_task->spawn();
#endif
return f ;
}
// Create single-thread task
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
task_create( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{
typedef typename FunctorType::value_type value_type ;
typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
return Future< value_type , execution_space >(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
task_root_type::create_single< task_type >
( functor
, m_active_count
, ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity )
)
#endif
);
}
template< class FunctorType >
Future< typename FunctorType::value_type , execution_space >
proc_create( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{ return task_create( functor , dependence_capacity ); }
// Create thread-team task
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
task_create_team( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{
typedef typename FunctorType::value_type value_type ;
typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
return Future< value_type , execution_space >(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
task_root_type::create_team< task_type >
( functor
, m_active_count
, ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity )
, 1 < m_team_size
)
#endif
);
}
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
proc_create_team( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{ return task_create_team( functor , dependence_capacity ); }
// Add dependence
template< class A1 , class A2 , class A3 , class A4 >
void add_dependence( const Future<A1,A2> & after
, const Future<A3,A4> & before
- , typename Kokkos::Impl::enable_if
- < Kokkos::Impl::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
+ , typename std::enable_if
+ < std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
&&
- Kokkos::Impl::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
+ std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
>::type * = 0
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
after.m_task->add_dependence( before.m_task );
#endif
}
//----------------------------------------
// Functions for an executing task functor to query dependences,
// set new dependences, and respawn itself.
template< class FunctorType >
Future< void , execution_space >
get_dependence( const FunctorType * task_functor , int i ) const
{
return Future<void,execution_space>(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->get_dependence(i)
#endif
);
}
template< class FunctorType >
int get_dependence( const FunctorType * task_functor ) const
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return get_task_root(task_functor)->get_dependence(); }
#else
{ return 0 ; }
#endif
template< class FunctorType >
void clear_dependence( FunctorType * task_functor ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->clear_dependence();
#endif
}
template< class FunctorType , class A3 , class A4 >
void add_dependence( FunctorType * task_functor
, const Future<A3,A4> & before
- , typename Kokkos::Impl::enable_if
- < Kokkos::Impl::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
+ , typename std::enable_if
+ < std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
>::type * = 0
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->add_dependence( before.m_task );
#endif
}
template< class FunctorType >
void respawn( FunctorType * task_functor
, const bool priority = false ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->respawn();
#endif
}
template< class FunctorType >
void respawn_needing_memory( FunctorType * task_functor ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->respawn();
#endif
}
static member_type & member_single();
friend void wait( TaskPolicy< Kokkos::Qthread > & );
};
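// Usage sketch (assuming a user task functor 'MyTask' that provides the
// 'value_type' typedef and the apply() member this policy expects, and
// caller-chosen 'task_max_count' / 'task_max_size'):
//
//   TaskPolicy< Kokkos::Qthread > policy( task_max_count , task_max_size );
//   Future< MyTask::value_type , Kokkos::Qthread > f =
//     policy.spawn( policy.task_create( MyTask() ) );
//   wait( policy );   // block until all spawned tasks have completed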
} /* namespace Experimental */
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #define KOKKOS_QTHREAD_TASK_HPP */
+#endif /* #ifndef KOKKOS_QTHREAD_TASKSCHEDULER_HPP */
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
index 5f0b8f70c..9f6e3d37b 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
@@ -1,826 +1,826 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_HAVE_PTHREAD ) || defined( KOKKOS_HAVE_WINTHREAD )
#include <stdint.h>
#include <limits>
#include <utility>
#include <iostream>
#include <sstream>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_CPUDiscovery.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
ThreadsExec s_threads_process ;
ThreadsExec * s_threads_exec[ ThreadsExec::MAX_THREAD_COUNT ] = { 0 };
pthread_t s_threads_pid[ ThreadsExec::MAX_THREAD_COUNT ] = { 0 };
std::pair<unsigned,unsigned> s_threads_coord[ ThreadsExec::MAX_THREAD_COUNT ];
int s_thread_pool_size[3] = { 0 , 0 , 0 };
unsigned s_current_reduce_size = 0 ;
unsigned s_current_shared_size = 0 ;
void (* volatile s_current_function)( ThreadsExec & , const void * );
const void * volatile s_current_function_arg = 0 ;
struct Sentinel {
Sentinel()
{
HostSpace::register_in_parallel( ThreadsExec::in_parallel );
}
~Sentinel()
{
if ( s_thread_pool_size[0] ||
s_thread_pool_size[1] ||
s_thread_pool_size[2] ||
s_current_reduce_size ||
s_current_shared_size ||
s_current_function ||
s_current_function_arg ||
s_threads_exec[0] ) {
std::cerr << "ERROR : Process exiting without calling Kokkos::Threads::terminate()" << std::endl ;
}
}
};
inline
unsigned fan_size( const unsigned rank , const unsigned size )
{
const unsigned rank_rev = size - ( rank + 1 );
unsigned count = 0 ;
for ( unsigned n = 1 ; ( rank_rev + n < size ) && ! ( rank_rev & n ) ; n <<= 1 ) { ++count ; }
return count ;
}
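// Illustrative note (not part of the patch): fan_size() counts how many
// fan-in children a thread waits on in the binary reduction tree used by
// the fan-in / fan-out synchronization below.  For a pool of size 8 the
// counts by pool rank 0..7 are { 0, 1, 0, 2, 0, 1, 0, 3 }; the highest
// rank (rank_rev == 0) is the root and waits on log2(8) = 3 children:
//
//   fan_size( 7 , 8 ) == 3   // root of the fan-in tree
//   fan_size( 5 , 8 ) == 1
//   fan_size( 6 , 8 ) == 0   // leaf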
} // namespace
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
void execute_function_noop( ThreadsExec & , const void * ) {}
void ThreadsExec::driver(void)
{
ThreadsExec this_thread ;
while ( ThreadsExec::Active == this_thread.m_pool_state ) {
(*s_current_function)( this_thread , s_current_function_arg );
// Deactivate thread and wait for reactivation
this_thread.m_pool_state = ThreadsExec::Inactive ;
wait_yield( this_thread.m_pool_state , ThreadsExec::Inactive );
}
}
ThreadsExec::ThreadsExec()
: m_pool_base(0)
, m_scratch(0)
, m_scratch_reduce_end(0)
, m_scratch_thread_end(0)
, m_numa_rank(0)
, m_numa_core_rank(0)
, m_pool_rank(0)
, m_pool_size(0)
, m_pool_fan_size(0)
, m_pool_state( ThreadsExec::Terminating )
{
if ( & s_threads_process != this ) {
// A spawned thread
ThreadsExec * const nil = 0 ;
// Which entry in 's_threads_exec', possibly determined from hwloc binding
const int entry = ((size_t)s_current_function_arg) < size_t(s_thread_pool_size[0])
? ((size_t)s_current_function_arg)
: size_t(Kokkos::hwloc::bind_this_thread( s_thread_pool_size[0] , s_threads_coord ));
// Given a good entry, set this thread in the 's_threads_exec' array
if ( entry < s_thread_pool_size[0] &&
nil == atomic_compare_exchange( s_threads_exec + entry , nil , this ) ) {
const std::pair<unsigned,unsigned> coord = Kokkos::hwloc::get_this_thread_coordinate();
m_numa_rank = coord.first ;
m_numa_core_rank = coord.second ;
m_pool_base = s_threads_exec ;
m_pool_rank = s_thread_pool_size[0] - ( entry + 1 );
m_pool_rank_rev = s_thread_pool_size[0] - ( pool_rank() + 1 );
m_pool_size = s_thread_pool_size[0] ;
m_pool_fan_size = fan_size( m_pool_rank , m_pool_size );
m_pool_state = ThreadsExec::Active ;
s_threads_pid[ m_pool_rank ] = pthread_self();
// Inform spawning process that the threads_exec entry has been set.
s_threads_process.m_pool_state = ThreadsExec::Active ;
}
else {
// Inform spawning process that the threads_exec entry could not be set.
s_threads_process.m_pool_state = ThreadsExec::Terminating ;
}
}
else {
// Enables 'parallel_for' to execute on uninitialized Threads device
m_pool_rank = 0 ;
m_pool_size = 1 ;
m_pool_state = ThreadsExec::Inactive ;
s_threads_pid[ m_pool_rank ] = pthread_self();
}
}
ThreadsExec::~ThreadsExec()
{
const unsigned entry = m_pool_size - ( m_pool_rank + 1 );
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( m_scratch ) {
Record * const r = Record::get_record( m_scratch );
m_scratch = 0 ;
Record::decrement( r );
}
m_pool_base = 0 ;
m_scratch_reduce_end = 0 ;
m_scratch_thread_end = 0 ;
m_numa_rank = 0 ;
m_numa_core_rank = 0 ;
m_pool_rank = 0 ;
m_pool_size = 0 ;
m_pool_fan_size = 0 ;
m_pool_state = ThreadsExec::Terminating ;
if ( & s_threads_process != this && entry < MAX_THREAD_COUNT ) {
ThreadsExec * const nil = 0 ;
atomic_compare_exchange( s_threads_exec + entry , this , nil );
s_threads_process.m_pool_state = ThreadsExec::Terminating ;
}
}
int ThreadsExec::get_thread_count()
{
return s_thread_pool_size[0] ;
}
ThreadsExec * ThreadsExec::get_thread( const int init_thread_rank )
{
ThreadsExec * const th =
init_thread_rank < s_thread_pool_size[0]
? s_threads_exec[ s_thread_pool_size[0] - ( init_thread_rank + 1 ) ] : 0 ;
if ( 0 == th || th->m_pool_rank != init_thread_rank ) {
std::ostringstream msg ;
msg << "Kokkos::Impl::ThreadsExec::get_thread ERROR : "
<< "thread " << init_thread_rank << " of " << s_thread_pool_size[0] ;
if ( 0 == th ) {
msg << " does not exist" ;
}
else {
msg << " has wrong thread_rank " << th->m_pool_rank ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return th ;
}
//----------------------------------------------------------------------------
void ThreadsExec::execute_sleep( ThreadsExec & exec , const void * )
{
ThreadsExec::global_lock();
ThreadsExec::global_unlock();
const int n = exec.m_pool_fan_size ;
const int rank_rev = exec.m_pool_size - ( exec.m_pool_rank + 1 );
for ( int i = 0 ; i < n ; ++i ) {
Impl::spinwait( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
exec.m_pool_state = ThreadsExec::Inactive ;
}
}
}
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
void ThreadsExec::verify_is_process( const std::string & name , const bool initialized )
{
if ( ! is_process() ) {
std::string msg( name );
msg.append( " FAILED : Called by a worker thread, can only be called by the master process." );
Kokkos::Impl::throw_runtime_exception( msg );
}
if ( initialized && 0 == s_thread_pool_size[0] ) {
std::string msg( name );
msg.append( " FAILED : Threads not initialized." );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
int ThreadsExec::in_parallel()
{
// A thread function is executing and
// the function argument is not the special threads process argument and
// either the master process is a member of the worker pool or the caller is not the master process.
return s_current_function &&
( & s_threads_process != s_current_function_arg ) &&
( s_threads_process.m_pool_base || ! is_process() );
}
// Wait for root thread to become inactive
void ThreadsExec::fence()
{
if ( s_thread_pool_size[0] ) {
// Wait for the root thread to complete:
Impl::spinwait( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
}
s_current_function = 0 ;
s_current_function_arg = 0 ;
// Make sure function and arguments are cleared before
// potentially re-activating threads with a subsequent launch.
memory_fence();
}
/** \brief Begin execution of the asynchronous functor */
void ThreadsExec::start( void (*func)( ThreadsExec & , const void * ) , const void * arg )
{
verify_is_process("ThreadsExec::start" , true );
if ( s_current_function || s_current_function_arg ) {
Kokkos::Impl::throw_runtime_exception( std::string( "ThreadsExec::start() FAILED : already executing" ) );
}
s_current_function = func ;
s_current_function_arg = arg ;
// Make sure function and arguments are written before activating threads.
memory_fence();
// Activate threads:
for ( int i = s_thread_pool_size[0] ; 0 < i-- ; ) {
s_threads_exec[i]->m_pool_state = ThreadsExec::Active ;
}
if ( s_threads_process.m_pool_size ) {
// Master process is the root thread, run it:
(*func)( s_threads_process , arg );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
}
//----------------------------------------------------------------------------
bool ThreadsExec::sleep()
{
verify_is_process("ThreadsExec::sleep", true );
if ( & execute_sleep == s_current_function ) return false ;
fence();
ThreadsExec::global_lock();
s_current_function = & execute_sleep ;
// Activate threads:
for ( unsigned i = s_thread_pool_size[0] ; 0 < i ; ) {
s_threads_exec[--i]->m_pool_state = ThreadsExec::Active ;
}
return true ;
}
bool ThreadsExec::wake()
{
verify_is_process("ThreadsExec::wake", true );
if ( & execute_sleep != s_current_function ) return false ;
ThreadsExec::global_unlock();
if ( s_threads_process.m_pool_base ) {
execute_sleep( s_threads_process , 0 );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
fence();
return true ;
}
//----------------------------------------------------------------------------
void ThreadsExec::execute_serial( void (*func)( ThreadsExec & , const void * ) )
{
s_current_function = func ;
s_current_function_arg = & s_threads_process ;
// Make sure function and arguments are written before activating threads.
memory_fence();
const unsigned begin = s_threads_process.m_pool_base ? 1 : 0 ;
for ( unsigned i = s_thread_pool_size[0] ; begin < i ; ) {
ThreadsExec & th = * s_threads_exec[ --i ];
th.m_pool_state = ThreadsExec::Active ;
wait_yield( th.m_pool_state , ThreadsExec::Active );
}
if ( s_threads_process.m_pool_base ) {
s_threads_process.m_pool_state = ThreadsExec::Active ;
(*func)( s_threads_process , 0 );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
s_current_function_arg = 0 ;
s_current_function = 0 ;
// Make sure function and arguments are cleared before proceeding.
memory_fence();
}
//----------------------------------------------------------------------------
void * ThreadsExec::root_reduce_scratch()
{
return s_threads_process.reduce_memory();
}
void ThreadsExec::execute_resize_scratch( ThreadsExec & exec , const void * )
{
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( exec.m_scratch ) {
Record * const r = Record::get_record( exec.m_scratch );
exec.m_scratch = 0 ;
Record::decrement( r );
}
exec.m_scratch_reduce_end = s_threads_process.m_scratch_reduce_end ;
exec.m_scratch_thread_end = s_threads_process.m_scratch_thread_end ;
if ( s_threads_process.m_scratch_thread_end ) {
// Allocate tracked memory:
{
Record * const r = Record::allocate( Kokkos::HostSpace() , "thread_scratch" , s_threads_process.m_scratch_thread_end );
Record::increment( r );
exec.m_scratch = r->data();
}
unsigned * ptr = reinterpret_cast<unsigned *>( exec.m_scratch );
unsigned * const end = ptr + s_threads_process.m_scratch_thread_end / sizeof(unsigned);
// touch on this thread
while ( ptr < end ) *ptr++ = 0 ;
}
}
void * ThreadsExec::resize_scratch( size_t reduce_size , size_t thread_size )
{
enum { ALIGN_MASK = Kokkos::Impl::MEMORY_ALIGNMENT - 1 };
fence();
const size_t old_reduce_size = s_threads_process.m_scratch_reduce_end ;
const size_t old_thread_size = s_threads_process.m_scratch_thread_end - s_threads_process.m_scratch_reduce_end ;
reduce_size = ( reduce_size + ALIGN_MASK ) & ~ALIGN_MASK ;
thread_size = ( thread_size + ALIGN_MASK ) & ~ALIGN_MASK ;
// Increase size or deallocate completely.
if ( ( old_reduce_size < reduce_size ) ||
( old_thread_size < thread_size ) ||
( ( reduce_size == 0 && thread_size == 0 ) &&
( old_reduce_size != 0 || old_thread_size != 0 ) ) ) {
verify_is_process( "ThreadsExec::resize_scratch" , true );
s_threads_process.m_scratch_reduce_end = reduce_size ;
s_threads_process.m_scratch_thread_end = reduce_size + thread_size ;
execute_serial( & execute_resize_scratch );
s_threads_process.m_scratch = s_threads_exec[0]->m_scratch ;
}
return s_threads_process.m_scratch ;
}
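// Illustrative note (not part of the patch): the rounding in resize_scratch()
// is the usual power-of-two alignment trick.  With, say, MEMORY_ALIGNMENT = 64
// (so ALIGN_MASK = 63), a request of reduce_size = 100 becomes
// ( 100 + 63 ) & ~63 = 128, i.e. the next multiple of 64.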
//----------------------------------------------------------------------------
void ThreadsExec::print_configuration( std::ostream & s , const bool detail )
{
verify_is_process("ThreadsExec::print_configuration",false);
fence();
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
// Forestall compiler warnings for unused variables.
(void) numa_count;
(void) cores_per_numa;
(void) threads_per_core;
s << "Kokkos::Threads" ;
#if defined( KOKKOS_HAVE_PTHREAD )
s << " KOKKOS_HAVE_PTHREAD" ;
#endif
#if defined( KOKKOS_HAVE_HWLOC )
s << " hwloc[" << numa_count << "x" << cores_per_numa << "x" << threads_per_core << "]" ;
#endif
if ( s_thread_pool_size[0] ) {
s << " threads[" << s_thread_pool_size[0] << "]"
<< " threads_per_numa[" << s_thread_pool_size[1] << "]"
<< " threads_per_core[" << s_thread_pool_size[2] << "]"
;
if ( 0 == s_threads_process.m_pool_base ) { s << " Asynchronous" ; }
s << " ReduceScratch[" << s_current_reduce_size << "]"
<< " SharedScratch[" << s_current_shared_size << "]" ;
s << std::endl ;
if ( detail ) {
for ( int i = 0 ; i < s_thread_pool_size[0] ; ++i ) {
ThreadsExec * const th = s_threads_exec[i] ;
if ( th ) {
const int rank_rev = th->m_pool_size - ( th->m_pool_rank + 1 );
s << " Thread[ " << th->m_pool_rank << " : "
<< th->m_numa_rank << "." << th->m_numa_core_rank << " ]" ;
s << " Fan{" ;
for ( int j = 0 ; j < th->m_pool_fan_size ; ++j ) {
ThreadsExec * const thfan = th->m_pool_base[rank_rev+(1<<j)] ;
s << " [ " << thfan->m_pool_rank << " : "
<< thfan->m_numa_rank << "." << thfan->m_numa_core_rank << " ]" ;
}
s << " }" ;
if ( th == & s_threads_process ) {
s << " is_process" ;
}
}
s << std::endl ;
}
}
}
else {
s << " not initialized" << std::endl ;
}
}
//----------------------------------------------------------------------------
int ThreadsExec::is_initialized()
{ return 0 != s_threads_exec[0] ; }
void ThreadsExec::initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool )
{
static const Sentinel sentinel ;
const bool is_initialized = 0 != s_thread_pool_size[0] ;
unsigned thread_spawn_failed = 0 ;
for ( int i = 0; i < ThreadsExec::MAX_THREAD_COUNT ; i++)
s_threads_exec[i] = NULL;
if ( ! is_initialized ) {
// If thread_count, use_numa_count, or use_cores_per_numa is zero
// then it will be given a default value based upon hwloc detection
// and whether asynchronous execution is allowed.
const bool hwloc_avail = Kokkos::hwloc::available();
const bool hwloc_can_bind = hwloc_avail && Kokkos::hwloc::can_bind_threads();
if ( thread_count == 0 ) {
thread_count = hwloc_avail
? Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa() *
Kokkos::hwloc::get_available_threads_per_core()
: 1 ;
}
const unsigned thread_spawn_begin =
hwloc::thread_mapping( "Kokkos::Threads::initialize" ,
allow_asynchronous_threadpool ,
thread_count ,
use_numa_count ,
use_cores_per_numa ,
s_threads_coord );
const std::pair<unsigned,unsigned> proc_coord = s_threads_coord[0] ;
if ( thread_spawn_begin ) {
// Synchronous with s_threads_coord[0] as the process core
// Claim entry #0 for binding the process core.
s_threads_coord[0] = std::pair<unsigned,unsigned>(~0u,~0u);
}
s_thread_pool_size[0] = thread_count ;
s_thread_pool_size[1] = s_thread_pool_size[0] / use_numa_count ;
s_thread_pool_size[2] = s_thread_pool_size[1] / use_cores_per_numa ;
s_current_function = & execute_function_noop ; // Initialization work function
for ( unsigned ith = thread_spawn_begin ; ith < thread_count ; ++ith ) {
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
// If hwloc available then spawned thread will
// choose its own entry in 's_threads_coord'
// otherwise specify the entry.
s_current_function_arg = (void*)static_cast<uintptr_t>( hwloc_can_bind ? ~0u : ith );
// Make sure all outstanding memory writes are complete
// before spawning the new thread.
memory_fence();
// Spawn thread executing the 'driver()' function.
// Wait until spawned thread has attempted to initialize.
// If spawning and initialization are successful then
// an entry in 's_threads_exec' will be assigned.
if ( ThreadsExec::spawn() ) {
wait_yield( s_threads_process.m_pool_state , ThreadsExec::Inactive );
}
if ( s_threads_process.m_pool_state == ThreadsExec::Terminating ) break ;
}
// Wait for all spawned threads to deactivate before zeroing the function.
for ( unsigned ith = thread_spawn_begin ; ith < thread_count ; ++ith ) {
// Try to protect against cache coherency failure by casting to volatile.
ThreadsExec * const th = ((ThreadsExec * volatile *)s_threads_exec)[ith] ;
if ( th ) {
wait_yield( th->m_pool_state , ThreadsExec::Active );
}
else {
++thread_spawn_failed ;
}
}
s_current_function = 0 ;
s_current_function_arg = 0 ;
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
memory_fence();
if ( ! thread_spawn_failed ) {
// Bind process to the core on which it was located before spawning occurred
if (hwloc_can_bind) {
Kokkos::hwloc::bind_this_thread( proc_coord );
}
if ( thread_spawn_begin ) { // Include process in pool.
const std::pair<unsigned,unsigned> coord = Kokkos::hwloc::get_this_thread_coordinate();
s_threads_exec[0] = & s_threads_process ;
s_threads_process.m_numa_rank = coord.first ;
s_threads_process.m_numa_core_rank = coord.second ;
s_threads_process.m_pool_base = s_threads_exec ;
s_threads_process.m_pool_rank = thread_count - 1 ; // Reversed for scan-compatible reductions
s_threads_process.m_pool_size = thread_count ;
s_threads_process.m_pool_fan_size = fan_size( s_threads_process.m_pool_rank , s_threads_process.m_pool_size );
s_threads_pid[ s_threads_process.m_pool_rank ] = pthread_self();
}
else {
s_threads_process.m_pool_base = 0 ;
s_threads_process.m_pool_rank = 0 ;
s_threads_process.m_pool_size = 0 ;
s_threads_process.m_pool_fan_size = 0 ;
}
// Initial allocations:
ThreadsExec::resize_scratch( 1024 , 1024 );
}
else {
s_thread_pool_size[0] = 0 ;
s_thread_pool_size[1] = 0 ;
s_thread_pool_size[2] = 0 ;
}
}
if ( is_initialized || thread_spawn_failed ) {
std::ostringstream msg ;
msg << "Kokkos::Threads::initialize ERROR" ;
if ( is_initialized ) {
msg << " : already initialized" ;
}
if ( thread_spawn_failed ) {
msg << " : failed to spawn " << thread_spawn_failed << " threads" ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
// Check for over-subscription
- if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
- std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
- std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
- std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
- std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
- }
+ //if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
+ // std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
+ // std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
+ // std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
+ // std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
+ //}
// Init the array used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
//----------------------------------------------------------------------------
void ThreadsExec::finalize()
{
verify_is_process("ThreadsExec::finalize",false);
fence();
resize_scratch(0,0);
const unsigned begin = s_threads_process.m_pool_base ? 1 : 0 ;
for ( unsigned i = s_thread_pool_size[0] ; begin < i-- ; ) {
if ( s_threads_exec[i] ) {
s_threads_exec[i]->m_pool_state = ThreadsExec::Terminating ;
wait_yield( s_threads_process.m_pool_state , ThreadsExec::Inactive );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
s_threads_pid[i] = 0 ;
}
if ( s_threads_process.m_pool_base ) {
( & s_threads_process )->~ThreadsExec();
s_threads_exec[0] = 0 ;
}
if (Kokkos::hwloc::can_bind_threads() ) {
Kokkos::hwloc::unbind_this_thread();
}
s_thread_pool_size[0] = 0 ;
s_thread_pool_size[1] = 0 ;
s_thread_pool_size[2] = 0 ;
// Reset master thread to run solo.
s_threads_process.m_numa_rank = 0 ;
s_threads_process.m_numa_core_rank = 0 ;
s_threads_process.m_pool_base = 0 ;
s_threads_process.m_pool_rank = 0 ;
s_threads_process.m_pool_size = 1 ;
s_threads_process.m_pool_fan_size = 0 ;
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//----------------------------------------------------------------------------
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
int Threads::concurrency() {
return thread_pool_size(0);
}
Threads & Threads::instance(int)
{
static Threads t ;
return t ;
}
int Threads::thread_pool_size( int depth )
{
return Impl::s_thread_pool_size[depth];
}
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
int Threads::thread_pool_rank()
{
const pthread_t pid = pthread_self();
int i = 0;
while ( ( i < Impl::s_thread_pool_size[0] ) && ( pid != Impl::s_threads_pid[i] ) ) { ++i ; }
return i ;
}
#endif
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_HAVE_PTHREAD ) || defined( KOKKOS_HAVE_WINTHREAD ) */
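The ThreadsExec machinery patched above is driven through the normal Kokkos front end rather than called directly. A minimal usage sketch, assuming a pthread-enabled Kokkos build (the kernel and N are hypothetical, not part of the patch):

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );        // e.g. run with --kokkos-threads=4
  {
    const int N = 1000 ;
    double sum = 0 ;
    // Dispatches to the Threads backend initialized above.
    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Threads >( 0 , N )
                           , KOKKOS_LAMBDA( const int i , double & local )
                             { local += double(i) ; }
                           , sum );
    std::printf( "sum = %g\n" , sum );
  }
  Kokkos::finalize();
  return 0 ;
}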
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
index 3407ffaa5..4256b0aa6 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
@@ -1,932 +1,933 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADSTEAM_HPP
#define KOKKOS_THREADSTEAM_HPP
#include <stdio.h>
#include <utility>
#include <impl/Kokkos_spinwait.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< class > struct ThreadsExecAdapter ;
//----------------------------------------------------------------------------
class ThreadsExecTeamMember {
private:
enum { TEAM_REDUCE_SIZE = 512 };
typedef Kokkos::Threads execution_space ;
typedef execution_space::scratch_memory_space space ;
ThreadsExec * const m_exec ;
ThreadsExec * const * m_team_base ; ///< Base for team fan-in
space m_team_shared ;
int m_team_shared_size ;
int m_team_size ;
int m_team_rank ;
int m_team_rank_rev ;
int m_league_size ;
int m_league_end ;
int m_league_rank ;
int m_chunk_size;
int m_league_chunk_end;
int m_invalid_thread;
int m_team_alloc;
inline
void set_team_shared()
{ new( & m_team_shared ) space( ((char *) (*m_team_base)->scratch_memory()) + TEAM_REDUCE_SIZE , m_team_shared_size ); }
public:
// Fan-in and wait until the matching fan-out is called.
// The root thread, which does not wait, will return true.
// All other threads will return false during the fan-out.
KOKKOS_INLINE_FUNCTION bool team_fan_in() const
{
int n , j ;
// Wait for fan-in threads
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
Impl::spinwait( m_team_base[j]->state() , ThreadsExec::Active );
}
// If not root then wait for release
if ( m_team_rank_rev ) {
m_exec->state() = ThreadsExec::Rendezvous ;
Impl::spinwait( m_exec->state() , ThreadsExec::Rendezvous );
}
return ! m_team_rank_rev ;
}
KOKKOS_INLINE_FUNCTION void team_fan_out() const
{
int n , j ;
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
m_team_base[j]->state() = ThreadsExec::Active ;
}
}
public:
KOKKOS_INLINE_FUNCTION static int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & thread_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
KOKKOS_INLINE_FUNCTION void team_barrier() const
{
team_fan_in();
team_fan_out();
}
template<class ValueType>
KOKKOS_INLINE_FUNCTION
void team_broadcast(ValueType& value, const int& thread_id) const
{
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ }
#else
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
, ValueType , void >::type type ;
if ( m_team_base ) {
type * const local_value = ((type*) m_team_base[0]->scratch_memory());
if(team_rank() == thread_id) *local_value = value;
memory_fence();
team_barrier();
value = *local_value;
}
#endif
}
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return Type(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(Type) < TEAM_REDUCE_SIZE , Type , void >::type type ;
if ( 0 == m_exec ) return value ;
*((volatile type*) m_exec->scratch_memory() ) = value ;
memory_fence();
type & accum = *((type *) m_team_base[0]->scratch_memory() );
if ( team_fan_in() ) {
for ( int i = 1 ; i < m_team_size ; ++i ) {
accum += *((type *) m_team_base[i]->scratch_memory() );
}
memory_fence();
}
team_fan_out();
return accum ;
}
#endif
#ifdef KOKKOS_HAVE_CXX11
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION ValueType
team_reduce( const ValueType & value
, const JoinOp & op_in ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ValueType(); }
#else
{
typedef ValueType value_type;
const JoinLambdaAdapter<value_type,JoinOp> op(op_in);
#endif
#else // KOKKOS_HAVE_CXX11
template< class JoinOp >
KOKKOS_INLINE_FUNCTION typename JoinOp::value_type
team_reduce( const typename JoinOp::value_type & value
, const JoinOp & op ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return typename JoinOp::value_type(); }
#else
{
typedef typename JoinOp::value_type value_type;
#endif
#endif // KOKKOS_HAVE_CXX11
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(value_type) < TEAM_REDUCE_SIZE
, value_type , void >::type type ;
if ( 0 == m_exec ) return value ;
type * const local_value = ((type*) m_exec->scratch_memory());
// Set this thread's contribution
*local_value = value ;
// Fence to make sure the base team member has access:
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true; all other threads wait for team_fan_out()
type * const team_value = ((type*) m_team_base[0]->scratch_memory());
// Join to the team value:
for ( int i = 1 ; i < m_team_size ; ++i ) {
op.join( *team_value , *((type*) m_team_base[i]->scratch_memory()) );
}
// Team base thread may "lap" member threads so copy out to their local value.
for ( int i = 1 ; i < m_team_size ; ++i ) {
*((type*) m_team_base[i]->scratch_memory()) = *team_value ;
}
// Fence to make sure all team members have access
memory_fence();
}
team_fan_out();
// Value was changed by the team base
return *((type volatile const *) local_value);
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value , ArgType * const global_accum ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ArgType(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ArgType) < TEAM_REDUCE_SIZE , ArgType , void >::type type ;
if ( 0 == m_exec ) return type(0);
volatile type * const work_value = ((type*) m_exec->scratch_memory());
*work_value = value ;
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true; all other threads wait for team_fan_out()
// m_team_base[0] == highest ranking team member
// m_team_base[ m_team_size - 1 ] == lowest ranking team member
//
// 1) copy from lower to higher rank, initialize lowest rank to zero
// 2) prefix sum from lowest to highest rank, skipping lowest rank
type accum = 0 ;
if ( global_accum ) {
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_team_base[i]->scratch_memory());
accum += val ;
}
accum = atomic_fetch_add( global_accum , accum );
}
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_team_base[i]->scratch_memory());
const type offset = accum ;
accum += val ;
val = offset ;
}
memory_fence();
}
team_fan_out();
return *work_value ;
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value ) const
{ return this-> template team_scan<ArgType>( value , 0 ); }
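// Illustrative note (not part of the patch): inside a team-parallel kernel
// the exclusive prefix sum above is typically used to reserve per-thread
// slots in a shared output, e.g.
//
//   const int offset = member.team_scan( my_count , & global_count );
//   // this thread may now fill entries [ offset , offset + my_count )
//
// where 'member', 'my_count', and 'global_count' are hypothetical names.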
//----------------------------------------
// Private for the driver
template< class ... Properties >
ThreadsExecTeamMember( Impl::ThreadsExec * exec
, const TeamPolicyInternal< Kokkos::Threads , Properties ... > & team
, const int shared_size )
: m_exec( exec )
, m_team_base(0)
, m_team_shared(0,0)
, m_team_shared_size( shared_size )
, m_team_size(team.team_size())
, m_team_rank(0)
, m_team_rank_rev(0)
, m_league_size(0)
, m_league_end(0)
, m_league_rank(0)
, m_chunk_size( team.chunk_size() )
, m_league_chunk_end(0)
, m_team_alloc( team.team_alloc())
{
if ( team.league_size() ) {
// Execution is using device-team interface:
const int pool_rank_rev = m_exec->pool_size() - ( m_exec->pool_rank() + 1 );
const int team_rank_rev = pool_rank_rev % team.team_alloc();
const size_t pool_league_size = m_exec->pool_size() / team.team_alloc() ;
const size_t pool_league_rank_rev = pool_rank_rev / team.team_alloc() ;
const size_t pool_league_rank = pool_league_size - ( pool_league_rank_rev + 1 );
const int pool_num_teams = m_exec->pool_size()/team.team_alloc();
const int chunk_size = team.chunk_size()>0?team.chunk_size():team.team_iter();
const int chunks_per_team = ( team.league_size() + chunk_size*pool_num_teams-1 ) / (chunk_size*pool_num_teams);
int league_iter_end = team.league_size() - pool_league_rank_rev * chunks_per_team * chunk_size;
int league_iter_begin = league_iter_end - chunks_per_team * chunk_size;
if (league_iter_begin < 0) league_iter_begin = 0;
if (league_iter_end>team.league_size()) league_iter_end = team.league_size();
if ((team.team_alloc()>m_team_size)?
(team_rank_rev >= m_team_size):
(m_exec->pool_size() - pool_num_teams*m_team_size > m_exec->pool_rank())
)
m_invalid_thread = 1;
else
m_invalid_thread = 0;
// May be using fewer threads per team than a multiple of threads per core,
// so some threads will idle.
if ( team_rank_rev < team.team_size() && !m_invalid_thread) {
m_team_base = m_exec->pool_base() + team.team_alloc() * pool_league_rank_rev ;
m_team_size = team.team_size() ;
m_team_rank = team.team_size() - ( team_rank_rev + 1 );
m_team_rank_rev = team_rank_rev ;
m_league_size = team.league_size();
m_league_rank = ( team.league_size() * pool_league_rank ) / pool_league_size ;
m_league_end = ( team.league_size() * (pool_league_rank+1) ) / pool_league_size ;
set_team_shared();
}
if ( (m_team_rank_rev == 0) && (m_invalid_thread == 0) ) {
m_exec->set_work_range(m_league_rank,m_league_end,m_chunk_size);
m_exec->reset_steal_target(m_team_size);
}
if(std::is_same<typename TeamPolicyInternal<Kokkos::Threads, Properties ...>::schedule_type::type,Kokkos::Dynamic>::value) {
m_exec->barrier();
}
}
+ else
+ { m_invalid_thread = 1; }
}
ThreadsExecTeamMember()
: m_exec(0)
, m_team_base(0)
, m_team_shared(0,0)
, m_team_shared_size(0)
, m_team_size(1)
, m_team_rank(0)
, m_team_rank_rev(0)
, m_league_size(1)
, m_league_end(0)
, m_league_rank(0)
, m_chunk_size(0)
, m_league_chunk_end(0)
, m_invalid_thread(0)
, m_team_alloc(0)
{}
inline
ThreadsExec & threads_exec_team_base() const { return m_team_base ? **m_team_base : *m_exec ; }
bool valid_static() const
{ return m_league_rank < m_league_end ; }
void next_static()
{
if ( m_league_rank < m_league_end ) {
team_barrier();
set_team_shared();
}
m_league_rank++;
}
bool valid_dynamic() {
if(m_invalid_thread)
return false;
if ((m_league_rank < m_league_chunk_end) && (m_league_rank < m_league_size)) {
return true;
}
if ( m_team_rank_rev == 0 ) {
m_team_base[0]->get_work_index(m_team_alloc);
}
team_barrier();
long work_index = m_team_base[0]->team_work_index();
m_league_rank = work_index * m_chunk_size;
m_league_chunk_end = (work_index +1 ) * m_chunk_size;
if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
- if(m_league_rank>=0)
+ if((m_league_rank>=0) && (m_league_rank < m_league_chunk_end))
return true;
return false;
}
void next_dynamic() {
if(m_invalid_thread)
return;
if ( m_league_rank < m_league_chunk_end ) {
team_barrier();
set_team_shared();
}
m_league_rank++;
}
void set_league_shmem( const int arg_league_rank
, const int arg_league_size
, const int arg_shmem_size
)
{
m_league_rank = arg_league_rank ;
m_league_size = arg_league_size ;
m_team_shared_size = arg_shmem_size ;
set_team_shared();
}
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Threads , Properties ... >: public PolicyTraits<Properties ...>
{
private:
int m_league_size ;
int m_team_size ;
int m_team_alloc ;
int m_team_iter ;
size_t m_team_scratch_size[2];
size_t m_thread_scratch_size[2];
int m_chunk_size;
inline
void init( const int league_size_request
, const int team_size_request )
{
const int pool_size = traits::execution_space::thread_pool_size(0);
const int team_max = traits::execution_space::thread_pool_size(1);
const int team_grain = traits::execution_space::thread_pool_size(2);
m_league_size = league_size_request ;
m_team_size = team_size_request < team_max ?
team_size_request : team_max ;
// Round team size up to a multiple of 'team_grain'
const int team_size_grain = team_grain * ( ( m_team_size + team_grain - 1 ) / team_grain );
const int team_count = pool_size / team_size_grain ;
// Constraint : pool_size = m_team_alloc * team_count
m_team_alloc = pool_size / team_count ;
// Maximum number of iterations each team will take:
m_team_iter = ( m_league_size + team_count - 1 ) / team_count ;
set_auto_chunk_size();
}
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_team_alloc = p.m_team_alloc;
m_team_iter = p.m_team_iter;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
int team_size_max( const FunctorType & )
{ return traits::execution_space::thread_pool_size(1); }
template< class FunctorType >
static int team_size_recommended( const FunctorType & )
{ return traits::execution_space::thread_pool_size(2); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType &, const int& )
{ return traits::execution_space::thread_pool_size(2); }
//----------------------------------------
inline int team_size() const { return m_team_size ; }
inline int team_alloc() const { return m_team_alloc ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int team_size_ = -1 ) const {
if(team_size_ < 0)
team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
}
inline int team_iter() const { return m_team_iter ; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, int team_size_request
, int vector_length_request = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,team_size_request); (void) vector_length_request; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,traits::execution_space::thread_pool_size(2)); }
TeamPolicyInternal( int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,team_size_request); }
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,traits::execution_space::thread_pool_size(2)); }
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
private:
/** \brief finalize chunk_size if it was set to AUTO*/
inline void set_auto_chunk_size() {
int concurrency = traits::execution_space::thread_pool_size(0)/m_team_alloc;
if( concurrency==0 ) concurrency=1;
if(m_chunk_size > 0) {
if(!Impl::is_integral_power_of_two( m_chunk_size ))
Kokkos::abort("TeamPolicy blocking granularity must be power of two" );
}
int new_chunk_size = 1;
while(new_chunk_size*100*concurrency < m_league_size)
new_chunk_size *= 2;
if(new_chunk_size < 128) {
new_chunk_size = 1;
while( (new_chunk_size*40*concurrency < m_league_size ) && (new_chunk_size<128) )
new_chunk_size*=2;
}
m_chunk_size = new_chunk_size;
}
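// Illustrative note (not part of the patch): with, e.g., 8 concurrent team
// slots (concurrency == 8) and m_league_size == 10000 the heuristic above
// first reaches new_chunk_size == 16 with the factor-100 test; since that
// is below 128 it recomputes with the factor-40 test and settles on
// m_chunk_size == 32.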
public:
typedef Impl::ThreadsExecTeamMember member_type ;
friend class Impl::ThreadsExecTeamMember ;
};
} /*namespace Impl */
} /* namespace Kokkos */
namespace Kokkos {
-template<typename iType>
+template< typename iType >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>
-TeamThreadRange(const Impl::ThreadsExecTeamMember& thread, const iType& count)
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >
+TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType& count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>(thread,count);
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, count );
}
-template<typename iType>
+template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>
-TeamThreadRange( const Impl::ThreadsExecTeamMember& thread
- , const iType & begin
- , const iType & end
- )
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::ThreadsExecTeamMember>
+TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType1 & begin, const iType2 & end )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>(thread,begin,end);
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >
ThreadVectorRange(const Impl::ThreadsExecTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember> PerTeam(const Impl::ThreadsExecTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember> PerThread(const Impl::ThreadsExecTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
#if defined( KOKKOS_HAVE_CXX11 )
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
#endif /* #if defined( KOKKOS_HAVE_CXX11 ) */
} //namespace Kokkos
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
} // namespace Kokkos
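// Illustrative sketch (not part of this header or the patch): typical nested
// use of the TeamThreadRange, parallel_reduce, and single() helpers defined
// in this file.  Hypothetical names: example_row_sums, A, row_sum, n; assumes
// <Kokkos_Core.hpp> has been included.
#if 0
inline void example_row_sums( const Kokkos::View< double ** , Kokkos::Threads > & A
                            , const Kokkos::View< double *  , Kokkos::Threads > & row_sum
                            , const int n )
{
  typedef Kokkos::TeamPolicy< Kokkos::Threads >  policy_type ;
  typedef policy_type::member_type               member_type ;

  Kokkos::parallel_for( policy_type( n , Kokkos::AUTO )
                      , KOKKOS_LAMBDA( const member_type & member ) {
    const int row = member.league_rank();
    double sum = 0 ;
    // Threads of the team cooperatively reduce over the columns of this row.
    Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , n )
                           , [&]( const int col , double & local )
                             { local += A( row , col ); }
                           , sum );
    // The reduced value is broadcast to all team members; one thread records it.
    Kokkos::single( Kokkos::PerTeam( member )
                  , [&]() { row_sum( row ) = sum ; } );
  } );
}
#endif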
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #define KOKKOS_THREADSTEAM_HPP */
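The reworked TeamThreadRange overload above takes independent begin/end types and maps them onto std::common_type, so mixed-type bounds no longer need an explicit cast. A minimal, hypothetical sketch (scale_tail, member, and data are illustrative names, not Kokkos API):

#include <Kokkos_Core.hpp>
#include <cstddef>

// 'member' is a Threads team member, 'data' a 1-D view, 'end' an unsigned extent.
inline void scale_tail( const Kokkos::TeamPolicy< Kokkos::Threads >::member_type & member
                      , const Kokkos::View< double * , Kokkos::Threads > & data
                      , const std::size_t end )
{
  // int begin (1) and size_t end: the new overload deduces their common type.
  Kokkos::parallel_for( Kokkos::TeamThreadRange( member , 1 , end )
                      , [&]( const std::size_t i ) { data( i ) *= 0.5 ; } );
}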
diff --git a/lib/kokkos/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp b/lib/kokkos/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp
deleted file mode 100644
index e1599284b..000000000
--- a/lib/kokkos/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp
+++ /dev/null
@@ -1,930 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-// Experimental unified task-data parallel manycore LDRD
-
-#include <stdio.h>
-#include <iostream>
-#include <sstream>
-#include <Kokkos_Core.hpp>
-#include <Threads/Kokkos_Threads_TaskPolicy.hpp>
-
-#if defined( KOKKOS_HAVE_PTHREAD ) && defined( KOKKOS_ENABLE_TASKPOLICY )
-
-#define QLOCK (reinterpret_cast<void*>( ~((uintptr_t)0) ))
-#define QDENIED (reinterpret_cast<void*>( ~((uintptr_t)0) - 1 ))
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-void ThreadsTaskPolicyQueue::Destroy::destroy_shared_allocation()
-{
- // Verify the queue is empty
-
- if ( m_policy->m_count_ready ||
- m_policy->m_team[0] ||
- m_policy->m_team[1] ||
- m_policy->m_team[2] ||
- m_policy->m_serial[0] ||
- m_policy->m_serial[1] ||
- m_policy->m_serial[2] ) {
- Kokkos::abort("ThreadsTaskPolicyQueue ERROR : Attempt to destroy non-empty queue" );
- }
-
- m_policy->~ThreadsTaskPolicyQueue();
-}
-
-//----------------------------------------------------------------------------
-
-ThreadsTaskPolicyQueue::~ThreadsTaskPolicyQueue()
-{
-}
-
-ThreadsTaskPolicyQueue::ThreadsTaskPolicyQueue
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity
- , const unsigned arg_task_team_size
- )
- : m_space( Kokkos::Threads::memory_space()
- , arg_task_max_size * arg_task_max_count * 1.2
- , 16 /* log2(superblock size) */
- )
- , m_team { 0 , 0 , 0 }
- , m_serial { 0 , 0 , 0 }
- , m_team_size( arg_task_team_size )
- , m_default_dependence_capacity( arg_task_default_dependence_capacity )
- , m_count_ready(0)
- , m_count_alloc(0)
-{
- const int threads_total = Threads::thread_pool_size(0);
- const int threads_per_numa = Threads::thread_pool_size(1);
- const int threads_per_core = Threads::thread_pool_size(2);
-
- if ( 0 == m_team_size ) {
- // If a team task then claim for execution until count is zero
- // Issue: team collectives cannot assume which pool members are in the team.
- // Issue: team must only span a single NUMA region.
-
- // If more than one thread per core then map cores to work team,
- // else map numa to work team.
-
- if ( 1 < threads_per_core ) m_team_size = threads_per_core ;
- else if ( 1 < threads_per_numa ) m_team_size = threads_per_numa ;
- else m_team_size = 1 ;
- }
-
- // Verify a valid team size
- const bool valid_team_size =
- ( 0 < m_team_size && m_team_size <= threads_total ) &&
- (
- ( 1 == m_team_size ) ||
- ( threads_per_core == m_team_size ) ||
- ( threads_per_numa == m_team_size )
- );
-
- if ( ! valid_team_size ) {
- std::ostringstream msg ;
-
- msg << "Kokkos::Experimental::TaskPolicy< Kokkos::Threads > ERROR"
- << " invalid team_size(" << m_team_size << ")"
- << " threads_per_core(" << threads_per_core << ")"
- << " threads_per_numa(" << threads_per_numa << ")"
- << " threads_total(" << threads_total << ")"
- ;
-
- Kokkos::Impl::throw_runtime_exception( msg.str() );
- }
-
- Kokkos::memory_fence();
-}
-
-//----------------------------------------------------------------------------
-
-void ThreadsTaskPolicyQueue::driver( Kokkos::Impl::ThreadsExec & exec
- , const void * arg )
-{
- // Whole thread pool is calling this function
-
- typedef Kokkos::Impl::ThreadsExecTeamMember member_type ;
-
- ThreadsTaskPolicyQueue & self =
- * reinterpret_cast< ThreadsTaskPolicyQueue * >( const_cast<void*>(arg) );
-
- // Create the thread team member with shared memory for the given task.
-
- const TeamPolicy< Kokkos::Threads > team_policy( 1 , self.m_team_size );
-
- member_type team_member( & exec , team_policy , 0 );
-
- Kokkos::Impl::ThreadsExec & exec_team_base =
- team_member.threads_exec_team_base();
-
- task_root_type * volatile * const task_team_ptr =
- reinterpret_cast<task_root_type**>( exec_team_base.reduce_memory() );
-
- volatile int * const work_team_ptr =
- reinterpret_cast<volatile int*>( task_team_ptr + 1 );
-
- // Each team must iterate this loop synchronously
- // to ensure team-execution of team-task.
-
- const bool team_lead = team_member.team_fan_in();
-
- bool work_team = true ;
-
- while ( work_team ) {
-
- task_root_type * task = 0 ;
-
- // Start here with members in a fan_in state
-
- if ( team_lead ) {
- // Team lead queries the ready count for a team-consistent view.
- *work_team_ptr = 0 != self.m_count_ready ;
-
- // Only the team lead attempts to pop a team task from the queues
- for ( int i = 0 ; i < int(NPRIORITY) && 0 == task ; ++i ) {
- if ( ( i < 2 /* regular queue */ )
- || ( ! self.m_space.is_empty() /* waiting for memory queue */ ) ) {
- task = pop_ready_task( & self.m_team[i] );
- }
- }
-
- *task_team_ptr = task ;
- }
-
- Kokkos::memory_fence();
-
- team_member.team_fan_out();
-
- work_team = *work_team_ptr ;
-
- // Query if team acquired a team task
-
- if ( 0 != ( task = *task_team_ptr ) ) {
- // Set shared memory
- team_member.set_league_shmem( 0 , 1 , task->m_shmem_size );
-
- (*task->m_team)( task , team_member );
-
- // The team task has called the functor and team_fan_in();
- // if the task completed, the team lead destroyed the task functor.
-
- if ( team_lead ) {
- self.complete_executed_task( task );
- }
- }
- else {
- // No team task acquired; each thread tries a serial task.
- // Try the priority queue, then the regular queue.
- for ( int i = 0 ; i < int(NPRIORITY) && 0 == task ; ++i ) {
- if ( ( i < 2 /* regular queue */ )
- || ( ! self.m_space.is_empty() /* waiting for memory queue */ ) ) {
- task = pop_ready_task( & self.m_serial[i] );
- }
- }
-
- if ( 0 != task ) {
-
- (*task->m_serial)( task );
-
- self.complete_executed_task( task );
- }
-
- team_member.team_fan_in();
- }
- }
-
- team_member.team_fan_out();
-
- exec.fan_in();
-}
-
-//----------------------------------------------------------------------------
-
-ThreadsTaskPolicyQueue::task_root_type *
-ThreadsTaskPolicyQueue::pop_ready_task(
- ThreadsTaskPolicyQueue::task_root_type * volatile * const queue )
-{
- task_root_type * const q_lock = reinterpret_cast<task_root_type*>(QLOCK);
- task_root_type * task = 0 ;
- task_root_type * const task_claim = *queue ;
-
- if ( ( q_lock != task_claim ) && ( 0 != task_claim ) ) {
-
- // Queue is not locked and not null, try to claim head of queue.
- // This is a race among threads to claim the queue.
-
- if ( task_claim == atomic_compare_exchange(queue,task_claim,q_lock) ) {
-
- // Acquired the task, which must be in the waiting state.
-
- const int claim_state =
- atomic_compare_exchange( & task_claim->m_state
- , int(TASK_STATE_WAITING)
- , int(TASK_STATE_EXECUTING) );
-
- task_root_type * lock_verify = 0 ;
-
- if ( claim_state == int(TASK_STATE_WAITING) ) {
-
- // Transitioned this task from waiting to executing
- // Update the queue to the next entry and release the lock
-
- task_root_type * const next =
- *((task_root_type * volatile *) & task_claim->m_next );
-
- *((task_root_type * volatile *) & task_claim->m_next ) = 0 ;
-
- lock_verify = atomic_compare_exchange( queue , q_lock , next );
- }
-
- if ( ( claim_state != int(TASK_STATE_WAITING) ) |
- ( q_lock != lock_verify ) ) {
-
- fprintf(stderr,"ThreadsTaskPolicyQueue::pop_ready_task(0x%lx) task(0x%lx) state(%d) ERROR %s\n"
- , (unsigned long) queue
- , (unsigned long) task
- , claim_state
- , ( claim_state != int(TASK_STATE_WAITING)
- ? "NOT WAITING"
- : "UNLOCK" ) );
- fflush(stderr);
- Kokkos::abort("ThreadsTaskPolicyQueue::pop_ready_task");
- }
-
- task = task_claim ;
- }
- }
-
- return task ;
-}
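// pop_ready_task() above claims the queue head with a compare-and-swap against a
// sentinel "lock" pointer before publishing the next entry. A much smaller,
// standalone sketch of the same claim-by-CAS idea is a Treiber-style stack pop
// with std::atomic (Node and pop are illustrative names, not Kokkos code, and it
// assumes nodes are never reclaimed while a racing pop may still touch them):

#include <atomic>

struct Node { int value; Node* next; };

// Pop the current head, retrying while other threads win the CAS race.
inline Node* pop(std::atomic<Node*>& head) {
  Node* old_head = head.load(std::memory_order_acquire);
  while (old_head != nullptr &&
         !head.compare_exchange_weak(old_head, old_head->next,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
    // compare_exchange_weak refreshed old_head on failure; just retry.
  }
  return old_head;  // nullptr when the stack was empty
}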
-
-//----------------------------------------------------------------------------
-
-void ThreadsTaskPolicyQueue::complete_executed_task(
- ThreadsTaskPolicyQueue::task_root_type * task )
-{
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
- // State is either executing or if respawned then waiting,
- // try to transition from executing to complete.
- // Reads the current value.
-
- const int state_old =
- atomic_compare_exchange( & task->m_state
- , int(Kokkos::Experimental::TASK_STATE_EXECUTING)
- , int(Kokkos::Experimental::TASK_STATE_COMPLETE) );
-
- if ( int(Kokkos::Experimental::TASK_STATE_WAITING) == state_old ) {
- // Task requested a respawn so reschedule it.
- // The reference count will be incremented if placed in a queue.
- schedule_task( task , false /* not the initial spawn */ );
- }
- else if ( int(Kokkos::Experimental::TASK_STATE_EXECUTING) == state_old ) {
- /* Task is complete */
-
- // Clear dependences of this task before locking wait queue
-
- task->clear_dependence();
-
- // Stop other tasks from adding themselves to this task's wait queue.
- // The wait queue is updated concurrently so guard with an atomic.
-
- task_root_type * wait_queue = *((task_root_type * volatile *) & task->m_wait );
- task_root_type * wait_queue_old = 0 ;
-
- do {
- wait_queue_old = wait_queue ;
- wait_queue = atomic_compare_exchange( & task->m_wait , wait_queue_old , q_denied );
- } while ( wait_queue_old != wait_queue );
-
- // The task has been removed from ready queue and
- // execution is complete so decrement the reference count.
- // The reference count was incremented by the initial spawning.
- // The task may be deleted if this was the last reference.
- task_root_type::assign( & task , 0 );
-
- // Pop waiting tasks and schedule them
- while ( wait_queue ) {
- task_root_type * const x = wait_queue ; wait_queue = x->m_next ; x->m_next = 0 ;
- schedule_task( x , false /* not the initial spawn */ );
- }
- }
- else {
- fprintf( stderr
- , "ThreadsTaskPolicyQueue::complete_executed_task(0x%lx) ERROR state_old(%d) dep_size(%d)\n"
- , (unsigned long)( task )
- , int(state_old)
- , task->m_dep_size
- );
- fflush( stderr );
- Kokkos::abort("ThreadsTaskPolicyQueue::complete_executed_task" );
- }
-
- // If the task was respawned it may have already been
- // put in a ready queue and the count incremented.
- // By decrementing the count last it will never go to zero
- // with a ready or executing task.
-
- atomic_fetch_add( & m_count_ready , -1 );
-}
-
-//----------------------------------------------------------------------------
-
-void ThreadsTaskPolicyQueue::reschedule_task(
- ThreadsTaskPolicyQueue::task_root_type * const task )
-{
- // Reschedule transitions from executing back to waiting.
- const int old_state =
- atomic_compare_exchange( & task->m_state
- , int(TASK_STATE_EXECUTING)
- , int(TASK_STATE_WAITING) );
-
- if ( old_state != int(TASK_STATE_EXECUTING) ) {
-
- fprintf( stderr
- , "ThreadsTaskPolicyQueue::reschedule_task(0x%lx) ERROR state(%d)\n"
- , (unsigned long) task
- , old_state
- );
- fflush(stderr);
- Kokkos::abort("ThreadsTaskPolicyQueue::reschedule" );
- }
-}
-
-void ThreadsTaskPolicyQueue::schedule_task
- ( ThreadsTaskPolicyQueue::task_root_type * const task
- , const bool initial_spawn )
-{
- task_root_type * const q_lock = reinterpret_cast<task_root_type*>(QLOCK);
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
- //----------------------------------------
- // State is either constructing or already waiting.
- // If constructing then transition to waiting.
-
- {
- const int old_state = atomic_compare_exchange( & task->m_state
- , int(TASK_STATE_CONSTRUCTING)
- , int(TASK_STATE_WAITING) );
-
- // Head of linked list of tasks waiting on this task
- task_root_type * const waitTask =
- *((task_root_type * volatile const *) & task->m_wait );
-
- // Member of linked list of tasks waiting on some other task
- task_root_type * const next =
- *((task_root_type * volatile const *) & task->m_next );
-
- // An incomplete and non-executing task has:
- // task->m_state == TASK_STATE_CONSTRUCTING or TASK_STATE_WAITING
- // task->m_wait != q_denied
- // task->m_next == 0
- //
- if ( ( q_denied == waitTask ) ||
- ( 0 != next ) ||
- ( old_state != int(TASK_STATE_CONSTRUCTING) &&
- old_state != int(TASK_STATE_WAITING) ) ) {
- fprintf(stderr,"ThreadsTaskPolicyQueue::schedule_task(0x%lx) STATE ERROR: state(%d) wait(0x%lx) next(0x%lx)\n"
- , (unsigned long) task
- , old_state
- , (unsigned long) waitTask
- , (unsigned long) next );
- fflush(stderr);
- Kokkos::abort("ThreadsTaskPolicyQueue::schedule" );
- }
- }
-
- //----------------------------------------
-
- if ( initial_spawn ) {
- // The initial spawn of a task increments the reference count
- // for the task's existence in either a waiting or ready queue
- // until the task has completed.
- // Completing the task's execution is the matching
- // decrement of the reference count.
-
- task_root_type::assign( 0 , task );
- }
-
- //----------------------------------------
- // Insert this task into a dependence task that is not complete.
- // Push on to that task's wait queue.
-
- bool attempt_insert_in_queue = true ;
-
- task_root_type * volatile * queue =
- task->m_dep_size ? & task->m_dep[0]->m_wait : (task_root_type **) 0 ;
-
- for ( int i = 0 ; attempt_insert_in_queue && ( 0 != queue ) ; ) {
-
- task_root_type * const head_value_old = *queue ;
-
- if ( q_denied == head_value_old ) {
- // Wait queue is closed because task is complete,
- // try again with the next dependence wait queue.
- ++i ;
- queue = i < task->m_dep_size ? & task->m_dep[i]->m_wait
- : (task_root_type **) 0 ;
- }
- else {
-
- // Wait queue is open and not denied.
- // Have exclusive access to this task.
- // Assign m_next assuming a successful insertion into the queue.
- // Fence the memory assignment before attempting the CAS.
-
- *((task_root_type * volatile *) & task->m_next ) = head_value_old ;
-
- memory_fence();
-
- // Attempt to insert this task into the queue.
- // If fails then continue the attempt.
-
- attempt_insert_in_queue =
- head_value_old != atomic_compare_exchange(queue,head_value_old,task);
- }
- }
-
- //----------------------------------------
- // All dependences are complete, insert into the ready list
-
- if ( attempt_insert_in_queue ) {
-
- // Increment the count of ready tasks.
- // Count will be decremented when task is complete.
-
- atomic_fetch_add( & m_count_ready , 1 );
-
- queue = task->m_queue ;
-
- while ( attempt_insert_in_queue ) {
-
- // A locked queue is being popped.
-
- task_root_type * const head_value_old = *queue ;
-
- if ( q_lock != head_value_old ) {
- // Read the head of ready queue,
- // if same as previous value then CAS locks the ready queue
-
- // Have exclusive access to this task,
- // assign to head of queue, assuming successful insert
- // Fence assignment before attempting insert.
- *((task_root_type * volatile *) & task->m_next ) = head_value_old ;
-
- memory_fence();
-
- attempt_insert_in_queue =
- head_value_old != atomic_compare_exchange(queue,head_value_old,task);
- }
- }
- }
-}
-
-
-void TaskMember< Kokkos::Threads , void , void >::latch_add( const int k )
-{
- typedef TaskMember< Kokkos::Threads , void , void > task_root_type ;
-
- task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
-
- const bool ok_input = 0 < k ;
-
- const int count = ok_input ? atomic_fetch_add( & m_dep_size , -k ) - k
- : k ;
-
- const bool ok_count = 0 <= count ;
-
- const int state = 0 != count ? TASK_STATE_WAITING :
- atomic_compare_exchange( & m_state
- , TASK_STATE_WAITING
- , TASK_STATE_COMPLETE );
-
- const bool ok_state = state == TASK_STATE_WAITING ;
-
- if ( ! ok_count || ! ok_state ) {
- printf( "ThreadsTaskPolicyQueue::latch_add[0x%lx](%d) ERROR %s %d\n"
- , (unsigned long) this
- , k
- , ( ! ok_input ? "Non-positive input" :
- ( ! ok_count ? "Negative count" : "Bad State" ) )
- , ( ! ok_input ? k :
- ( ! ok_count ? count : state ) )
- );
- Kokkos::abort( "ThreadsTaskPolicyQueue::latch_add ERROR" );
- }
- else if ( 0 == count ) {
- // Stop other tasks from adding themselves to this latch's wait queue.
- // The wait queue is updated concurrently so guard with an atomic.
-
- ThreadsTaskPolicyQueue & policy = *m_policy ;
- task_root_type * wait_queue = *((task_root_type * volatile *) &m_wait);
- task_root_type * wait_queue_old = 0 ;
-
- do {
- wait_queue_old = wait_queue ;
- wait_queue = atomic_compare_exchange( & m_wait , wait_queue_old , q_denied );
- } while ( wait_queue_old != wait_queue );
-
- // Pop waiting tasks and schedule them
- while ( wait_queue ) {
- task_root_type * const x = wait_queue ; wait_queue = x->m_next ; x->m_next = 0 ;
- policy.schedule_task( x , false /* not initial spawn */ );
- }
- }
-}
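// latch_add() above counts a latch down by k and, once the count reaches zero,
// closes the wait queue and schedules every task parked on it. The standard
// library ships the same primitive; a minimal usage sketch with std::latch
// (C++20; the worker count and names are illustrative, not Kokkos code):

#include <cstdio>
#include <latch>
#include <thread>
#include <vector>

int main() {
  constexpr int n_workers = 4;
  std::latch done(n_workers);          // initial count, like create_latch(N)

  std::vector<std::thread> workers;
  for (int i = 0; i < n_workers; ++i)
    workers.emplace_back([&done, i] {
      std::printf("worker %d finished\n", i);
      done.count_down();               // analogous to latch_add(1)
    });

  done.wait();                         // released once the count reaches zero
  std::printf("all dependences satisfied\n");
  for (auto& t : workers) t.join();
  return 0;
}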
-
-//----------------------------------------------------------------------------
-
-void ThreadsTaskPolicyQueue::deallocate_task( void * ptr , unsigned size_alloc )
-{
-/*
- const int n = atomic_fetch_add( & alloc_count , -1 ) - 1 ;
-
- fprintf( stderr
- , "ThreadsTaskPolicyQueue::deallocate_task(0x%lx,%d) count(%d)\n"
- , (unsigned long) ptr
- , size_alloc
- , n
- );
- fflush( stderr );
-*/
-
- m_space.deallocate( ptr , size_alloc );
-
- Kokkos::atomic_decrement( & m_count_alloc );
-}
-
-ThreadsTaskPolicyQueue::task_root_type *
-ThreadsTaskPolicyQueue::allocate_task
- ( const unsigned arg_sizeof_task
- , const unsigned arg_dep_capacity
- , const unsigned arg_team_shmem
- )
-{
- const unsigned base_size = arg_sizeof_task +
- ( arg_sizeof_task % sizeof(task_root_type*)
- ? sizeof(task_root_type*) - arg_sizeof_task % sizeof(task_root_type*)
- : 0 );
-
- const unsigned dep_capacity
- = ~0u == arg_dep_capacity
- ? m_default_dependence_capacity
- : arg_dep_capacity ;
-
- const unsigned size_alloc =
- base_size + sizeof(task_root_type*) * dep_capacity ;
-
-#if 0
- // The user created the task memory pool with an estimate;
- // if the estimate is too low then report and throw an exception.
-
- if ( m_space.get_min_block_size() < size_alloc ) {
- fprintf(stderr,"TaskPolicy<Threads> task allocation requires %d bytes on memory pool with %d byte chunk size\n"
- , int(size_alloc)
- , int(m_space.get_min_block_size())
- );
- fflush(stderr);
- Kokkos::Impl::throw_runtime_exception("TaskMember< Threads >::task_allocate");
- }
-#endif
-
- task_root_type * const task =
- reinterpret_cast<task_root_type*>( m_space.allocate( size_alloc ) );
-
- if ( task != 0 ) {
-
- // Initialize task's root and value data structure
- // Calling function must copy construct the functor.
-
- new( (void*) task ) task_root_type();
-
- task->m_policy = this ;
- task->m_size_alloc = size_alloc ;
- task->m_dep_capacity = dep_capacity ;
- task->m_shmem_size = arg_team_shmem ;
-
- if ( dep_capacity ) {
- task->m_dep =
- reinterpret_cast<task_root_type**>(
- reinterpret_cast<unsigned char*>(task) + base_size );
-
- for ( unsigned i = 0 ; i < dep_capacity ; ++i )
- task->task_root_type::m_dep[i] = 0 ;
- }
-
- Kokkos::atomic_increment( & m_count_alloc );
- }
- return task ;
-}
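// allocate_task() above pads the requested task size up to a multiple of
// sizeof(task_root_type*) so the dependence-pointer array appended after the
// task stays pointer-aligned. The padding expression is plain
// round-up-to-a-multiple arithmetic; a self-contained restatement
// (round_up is an illustrative name, not Kokkos code):

#include <cassert>
#include <cstddef>

// Round n up to the next multiple of alignment (alignment must be nonzero).
constexpr std::size_t round_up(std::size_t n, std::size_t alignment) {
  return n % alignment ? n + (alignment - n % alignment) : n;
}

int main() {
  static_assert(round_up(20, 8) == 24, "20 bytes pad up to 24");
  static_assert(round_up(24, 8) == 24, "aligned sizes are unchanged");
  assert(round_up(1, 8) == 8);
  return 0;
}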
-
-
-//----------------------------------------------------------------------------
-
-void ThreadsTaskPolicyQueue::add_dependence
- ( ThreadsTaskPolicyQueue::task_root_type * const after
- , ThreadsTaskPolicyQueue::task_root_type * const before
- )
-{
- if ( ( after != 0 ) && ( before != 0 ) ) {
-
- int const state = *((volatile const int *) & after->m_state );
-
- // Only add dependence during construction or during execution.
- // Both tasks must have the same policy.
- // Dependence on non-full memory cannot be mixed with any other dependence.
-
- const bool ok_state =
- Kokkos::Experimental::TASK_STATE_CONSTRUCTING == state ||
- Kokkos::Experimental::TASK_STATE_EXECUTING == state ;
-
- const bool ok_capacity =
- after->m_dep_size < after->m_dep_capacity ;
-
- const bool ok_policy =
- after->m_policy == this && before->m_policy == this ;
-
- if ( ok_state && ok_capacity && ok_policy ) {
-
- ++after->m_dep_size ;
-
- task_root_type::assign( after->m_dep + (after->m_dep_size-1) , before );
-
- memory_fence();
- }
- else {
-
-fprintf( stderr
- , "ThreadsTaskPolicyQueue::add_dependence( 0x%lx , 0x%lx ) ERROR %s\n"
- , (unsigned long) after
- , (unsigned long) before
- , ( ! ok_state ? "Task not constructing or executing" :
- ( ! ok_capacity ? "Task Exceeded dependence capacity"
- : "Tasks from different policies"
- )) );
-
-fflush( stderr );
-
- Kokkos::abort("ThreadsTaskPolicyQueue::add_dependence ERROR");
- }
- }
-}
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-TaskPolicy< Kokkos::Threads >::TaskPolicy
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size // Application's task size
- , const unsigned arg_task_default_dependence_capacity
- , const unsigned arg_task_team_size
- )
- : m_track()
- , m_policy(0)
-{
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord
- < Kokkos::HostSpace , Impl::ThreadsTaskPolicyQueue::Destroy > record_type ;
-
- record_type * record =
- record_type::allocate( Kokkos::HostSpace()
- , "Threads task queue"
- , sizeof(Impl::ThreadsTaskPolicyQueue)
- );
-
- m_policy =
- reinterpret_cast< Impl::ThreadsTaskPolicyQueue * >( record->data() );
-
- // Tasks are allocated with application's task size + sizeof(task_root_type)
-
- const size_t full_task_size_estimate =
- arg_task_max_size +
- sizeof(task_root_type) +
- sizeof(task_root_type*) * arg_task_default_dependence_capacity ;
-
- new( m_policy )
- Impl::ThreadsTaskPolicyQueue( arg_task_max_count
- , full_task_size_estimate
- , arg_task_default_dependence_capacity
- , arg_task_team_size );
-
- record->m_destroy.m_policy = m_policy ;
-
- m_track.assign_allocated_record_to_uninitialized( record );
-}
-
-
-TaskPolicy< Kokkos::Threads >::member_type &
-TaskPolicy< Kokkos::Threads >::member_single()
-{
- static member_type s ;
- return s ;
-}
-
-void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Threads > & policy )
-{
- typedef Kokkos::Impl::ThreadsExecTeamMember member_type ;
-
- enum { BASE_SHMEM = 1024 };
-
- Kokkos::Impl::ThreadsExec::resize_scratch( 0 , member_type::team_reduce_size() + BASE_SHMEM );
-
- Kokkos::Impl::ThreadsExec::start( & Impl::ThreadsTaskPolicyQueue::driver
- , policy.m_policy );
-
- Kokkos::Impl::ThreadsExec::fence();
-}
-
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-typedef TaskMember< Kokkos::Threads , void , void > Task ;
-
-//----------------------------------------------------------------------------
-
-Task::~TaskMember()
-{
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-
-void Task::assign( Task ** const lhs_ptr , Task * rhs )
-{
- Task * const q_denied = reinterpret_cast<Task*>(QDENIED);
-
- // Increment rhs reference count.
- if ( rhs ) { atomic_fetch_add( & rhs->m_ref_count , 1 ); }
-
- if ( 0 == lhs_ptr ) return ;
-
- // Must have exclusive access to *lhs_ptr.
- // Assign the pointer and retrieve the previous value.
-
-#if 1
-
- Task * const old_lhs = *lhs_ptr ;
-
- *lhs_ptr = rhs ;
-
-#elif 0
-
- Task * const old_lhs = *((Task*volatile*)lhs_ptr);
-
- *((Task*volatile*)lhs_ptr) = rhs ;
-
- Kokkos::memory_fence();
-
-#else
-
- Task * const old_lhs = atomic_exchange( lhs_ptr , rhs );
-
-#endif
-
- if ( old_lhs && rhs && old_lhs->m_policy != rhs->m_policy ) {
- Kokkos::abort( "Kokkos::Impl::TaskMember<Kokkos::Threads>::assign ERROR different queues");
- }
-
- if ( old_lhs ) {
-
- // Decrement former lhs reference count.
- // If reference count is zero task must be complete, then delete task.
- // Task is ready for deletion when wait == q_denied
- int const count = atomic_fetch_add( & (old_lhs->m_ref_count) , -1 ) - 1 ;
- int const state = old_lhs->m_state ;
- Task * const wait = *((Task * const volatile *) & old_lhs->m_wait );
-
- const bool ok_count = 0 <= count ;
-
- // If count == 0 then the task is about to be deleted
- // and must be either constructing or complete.
- const bool ok_state = 0 < count ? true :
- ( ( state == int(TASK_STATE_CONSTRUCTING) && wait == 0 ) ||
- ( state == int(TASK_STATE_COMPLETE) && wait == q_denied ) )
- &&
- old_lhs->m_next == 0 &&
- old_lhs->m_dep_size == 0 ;
-
- if ( ! ok_count || ! ok_state ) {
-
- fprintf( stderr , "Kokkos::Impl::TaskManager<Kokkos::Threads>::assign ERROR deleting task(0x%lx) m_ref_count(%d) , m_wait(0x%ld)\n"
- , (unsigned long) old_lhs
- , count
- , (unsigned long) wait );
- fflush(stderr);
- Kokkos::abort( "Kokkos::Impl::TaskMember<Kokkos::Threads>::assign ERROR deleting");
- }
-
- if ( count == 0 ) {
- // When 'count == 0' this thread has exclusive access to 'old_lhs'
-
- ThreadsTaskPolicyQueue & queue = *( old_lhs->m_policy );
-
- queue.deallocate_task( old_lhs , old_lhs->m_size_alloc );
- }
- }
-}
-
-#endif
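// Task::assign() above is an intrusive reference-count update: increment the
// new target, swap the pointer, decrement the old target, and reclaim it when
// its count reaches zero. A stripped-down sketch of that pattern with
// std::atomic (RefCounted and assign are illustrative names; the state checks
// and the Kokkos memory pool are omitted):

#include <atomic>

struct RefCounted {
  std::atomic<int> ref_count{0};
  virtual ~RefCounted() = default;
};

// Make *slot point at rhs while keeping both reference counts consistent.
// The caller must have exclusive access to *slot, as in the original.
inline void assign(RefCounted** slot, RefCounted* rhs) {
  if (rhs) rhs->ref_count.fetch_add(1, std::memory_order_relaxed);
  if (!slot) return;                   // increment-only use (the initial spawn)
  RefCounted* old = *slot;
  *slot = rhs;
  if (old && old->ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
    delete old;                        // last reference dropped: reclaim
}

int main() {
  RefCounted* p = nullptr;
  assign(&p, new RefCounted);          // count 0 -> 1
  assign(&p, nullptr);                 // count 1 -> 0, object deleted
  return 0;
}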
-
-//----------------------------------------------------------------------------
-
-Task * Task::get_dependence( int i ) const
-{
- Task * const t = m_dep[i] ;
-
- if ( Kokkos::Experimental::TASK_STATE_EXECUTING != m_state || i < 0 || m_dep_size <= i || 0 == t ) {
-
-fprintf( stderr
- , "TaskMember< Threads >::get_dependence ERROR : task[%lx]{ state(%d) dep_size(%d) dep[%d] = %lx }\n"
- , (unsigned long) this
- , m_state
- , m_dep_size
- , i
- , (unsigned long) t
- );
-fflush( stderr );
-
- Kokkos::Impl::throw_runtime_exception("TaskMember< Threads >::get_dependence ERROR");
- }
-
- return t ;
-}
-
-//----------------------------------------------------------------------------
-
-void Task::clear_dependence()
-{
- for ( int i = m_dep_size - 1 ; 0 <= i ; --i ) {
- assign( m_dep + i , 0 );
- }
-
- *((volatile int *) & m_dep_size ) = 0 ;
-
- memory_fence();
-}
-
-//----------------------------------------------------------------------------
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-#endif /* #if defined( KOKKOS_HAVE_PTHREAD ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
-
diff --git a/lib/kokkos/core/src/Threads/Kokkos_Threads_TaskPolicy.hpp b/lib/kokkos/core/src/Threads/Kokkos_Threads_TaskPolicy.hpp
deleted file mode 100644
index 116d32e4f..000000000
--- a/lib/kokkos/core/src/Threads/Kokkos_Threads_TaskPolicy.hpp
+++ /dev/null
@@ -1,745 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-// Experimental unified task-data parallel manycore LDRD
-
-#ifndef KOKKOS_THREADS_TASKPOLICY_HPP
-#define KOKKOS_THREADS_TASKPOLICY_HPP
-
-
-#include <Kokkos_Threads.hpp>
-#include <Kokkos_TaskPolicy.hpp>
-
-#if defined( KOKKOS_HAVE_PTHREAD ) && defined( KOKKOS_ENABLE_TASKPOLICY )
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-struct ThreadsTaskPolicyQueue ;
-
-/** \brief Base class for all Kokkos::Threads tasks */
-template<>
-class TaskMember< Kokkos::Threads , void , void > {
-public:
-
- template < class > friend class Kokkos::Experimental::TaskPolicy ;
- friend struct ThreadsTaskPolicyQueue ;
-
- typedef TaskMember * (* function_verify_type) ( TaskMember * );
- typedef void (* function_single_type) ( TaskMember * );
- typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::ThreadsExecTeamMember & );
-
-private:
-
-
- ThreadsTaskPolicyQueue * m_policy ;
- TaskMember * volatile * m_queue ;
- function_verify_type m_verify ;
- function_team_type m_team ; ///< Apply function
- function_single_type m_serial ; ///< Apply function
- TaskMember ** m_dep ; ///< Dependences
- TaskMember * m_wait ; ///< Head of linked list of tasks waiting on this task
- TaskMember * m_next ; ///< Member of linked list of tasks
- int m_dep_capacity ; ///< Capacity of dependences
- int m_dep_size ; ///< Actual count of dependences
- int m_size_alloc ;
- int m_shmem_size ;
- int m_ref_count ; ///< Reference count
- int m_state ; ///< State of the task
-
-
- TaskMember( TaskMember && ) = delete ;
- TaskMember( const TaskMember & ) = delete ;
- TaskMember & operator = ( TaskMember && ) = delete ;
- TaskMember & operator = ( const TaskMember & ) = delete ;
-
-protected:
-
- TaskMember()
- : m_policy(0)
- , m_verify(0)
- , m_team(0)
- , m_serial(0)
- , m_dep(0)
- , m_wait(0)
- , m_next(0)
- , m_dep_capacity(0)
- , m_dep_size(0)
- , m_size_alloc(0)
- , m_shmem_size(0)
- , m_ref_count(0)
- , m_state( TASK_STATE_CONSTRUCTING )
- {}
-
-public:
-
- ~TaskMember();
-
- KOKKOS_INLINE_FUNCTION
- int reference_count() const
- { return *((volatile int *) & m_ref_count ); }
-
- template< typename ResultType >
- KOKKOS_FUNCTION static
- TaskMember * verify_type( TaskMember * t )
- {
- enum { check_type = ! std::is_same< ResultType , void >::value };
-
- if ( check_type && t != 0 ) {
-
- // Verify that t->m_verify is this function
- const function_verify_type self = & TaskMember::template verify_type< ResultType > ;
-
- if ( t->m_verify != self ) {
- t = 0 ;
- Kokkos::abort("TaskPolicy< Threads > verify_result_type" );
- }
- }
- return t ;
- }
-
- //----------------------------------------
- /* Inheritance Requirements on task types:
- *
- * class TaskMember< Threads , DerivedType::value_type , FunctorType >
- * : public TaskMember< Threads , DerivedType::value_type , void >
- * , public Functor
- * { ... };
- *
- * If value_type != void
- * class TaskMember< Threads , value_type , void >
- * : public TaskMember< Threads , void , void >
- *
- */
- //----------------------------------------
-
- template< class DerivedTaskType , class Tag >
- KOKKOS_FUNCTION static
- void apply_single(
- typename std::enable_if
- <( std::is_same<Tag,void>::value &&
- std::is_same< typename DerivedTaskType::result_type , void >::value
- ), TaskMember * >::type t )
- {
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- functor_type * const f =
- static_cast< functor_type * >( static_cast< DerivedTaskType * >(t) );
-
- f->apply();
-
- if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
- }
-
- template< class DerivedTaskType , class Tag >
- KOKKOS_FUNCTION static
- void apply_single(
- typename std::enable_if
- <( std::is_same< Tag , void >::value &&
- ! std::is_same< typename DerivedTaskType::result_type , void >::value
- ), TaskMember * >::type t )
- {
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- DerivedTaskType * const self = static_cast< DerivedTaskType * >(t);
- functor_type * const f = static_cast< functor_type * >( self );
-
- f->apply( self->m_result );
-
- if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
- }
-
- //----------------------------------------
-
- template< class DerivedTaskType , class Tag >
- KOKKOS_FUNCTION static
- void apply_team(
- typename std::enable_if
- <( std::is_same<Tag,void>::value &&
- std::is_same<typename DerivedTaskType::result_type,void>::value
- ), TaskMember * >::type t
- , Kokkos::Impl::ThreadsExecTeamMember & member
- )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- functor_type * const f =
- static_cast< functor_type * >( static_cast< DerivedTaskType * >(t) );
-
- f->apply( member );
-
- // Synchronize for possible functor destruction and
- // completion of team task.
- if ( member.team_fan_in() ) {
- if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
- }
-
- template< class DerivedTaskType , class Tag >
- KOKKOS_FUNCTION static
- void apply_team(
- typename std::enable_if
- <( std::is_same<Tag,void>::value &&
- ! std::is_same<typename DerivedTaskType::result_type,void>::value
- ), TaskMember * >::type t
- , Kokkos::Impl::ThreadsExecTeamMember & member
- )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
-
- DerivedTaskType * const self = static_cast< DerivedTaskType * >(t);
- functor_type * const f = static_cast< functor_type * >( self );
-
- f->apply( member , self->m_result );
-
- // Synchronize for possible functor destruction and
- // completion of team task.
- if ( member.team_fan_in() ) {
- if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
- f->~functor_type();
- }
- }
- }
-
- //----------------------------------------
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- static
- void assign( TaskMember ** const lhs , TaskMember * const rhs );
-#else
- KOKKOS_INLINE_FUNCTION static
- void assign( TaskMember ** const lhs , TaskMember * const rhs ) {}
-#endif
-
- TaskMember * get_dependence( int i ) const ;
-
- KOKKOS_INLINE_FUNCTION
- int get_dependence() const { return m_dep_size ; }
-
- void clear_dependence();
-
- void latch_add( const int k );
-
- //----------------------------------------
-
- typedef FutureValueTypeIsVoidError get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const { return get_result_type() ; }
-
- inline static
- void construct_result( TaskMember * const ) {}
-
- KOKKOS_INLINE_FUNCTION
- Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
-
-};
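// The apply_single()/apply_team() overloads above are selected at compile time
// with std::enable_if on whether the task's result_type is void. A standalone
// illustration of that dispatch (the Task types and run() are illustrative
// names, not Kokkos code):

#include <cstdio>
#include <type_traits>

template <class Task>
typename std::enable_if<std::is_same<typename Task::result_type, void>::value>::type
run(Task& t) { t.apply(); }                 // chosen when result_type is void

template <class Task>
typename std::enable_if<!std::is_same<typename Task::result_type, void>::value>::type
run(Task& t) { t.apply(t.result); }         // chosen otherwise; result passed out by reference

struct VoidTask  { using result_type = void;   void apply() { std::puts("void task ran"); } };
struct ValueTask { using result_type = double; double result = 0;
                   void apply(double& r) { r = 2.5; } };

int main() {
  VoidTask  a; run(a);
  ValueTask b; run(b);
  std::printf("value task produced %g\n", b.result);
  return 0;
}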
-
-/** \brief A Future< Kokkos::Threads , ResultType > will cast
- * from TaskMember< Kokkos::Threads , void , void >
- * to TaskMember< Kokkos::Threads , ResultType , void >
- * to query the result.
- */
-template< class ResultType >
-class TaskMember< Kokkos::Threads , ResultType , void >
- : public TaskMember< Kokkos::Threads , void , void >
-{
-public:
-
- typedef ResultType result_type ;
-
- result_type m_result ;
-
- typedef const result_type & get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const { return m_result ; }
-
- inline static
- void construct_result( TaskMember * const ptr )
- {
- new((void*)(& ptr->m_result)) result_type();
- }
-
- inline
- TaskMember() : TaskMember< Kokkos::Threads , void , void >(), m_result() {}
-
- TaskMember( TaskMember && ) = delete ;
- TaskMember( const TaskMember & ) = delete ;
- TaskMember & operator = ( TaskMember && ) = delete ;
- TaskMember & operator = ( const TaskMember & ) = delete ;
-};
-
-/** \brief Callback functions will cast
- * from TaskMember< Kokkos::Threads , void , void >
- * to TaskMember< Kokkos::Threads , ResultType , FunctorType >
- * to execute work functions.
- */
-template< class ResultType , class FunctorType >
-class TaskMember< Kokkos::Threads , ResultType , FunctorType >
- : public TaskMember< Kokkos::Threads , ResultType , void >
- , public FunctorType
-{
-public:
- typedef ResultType result_type ;
- typedef FunctorType functor_type ;
-
- inline
- TaskMember( const functor_type & arg_functor )
- : TaskMember< Kokkos::Threads , ResultType , void >()
- , functor_type( arg_functor )
- {}
-
- inline static
- void copy_construct( TaskMember * const ptr
- , const functor_type & arg_functor )
- {
- typedef TaskMember< Kokkos::Threads , ResultType , void > base_type ;
-
- new((void*)static_cast<FunctorType*>(ptr)) functor_type( arg_functor );
-
- base_type::construct_result( static_cast<base_type*>( ptr ) );
- }
-
- TaskMember() = delete ;
- TaskMember( TaskMember && ) = delete ;
- TaskMember( const TaskMember & ) = delete ;
- TaskMember & operator = ( TaskMember && ) = delete ;
- TaskMember & operator = ( const TaskMember & ) = delete ;
-};
-
-//----------------------------------------------------------------------------
-
-struct ThreadsTaskPolicyQueue {
-
- enum { NPRIORITY = 3 };
-
- typedef Kokkos::Experimental::MemoryPool< Kokkos::Threads >
- memory_space ;
-
- typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Threads, void, void >
- task_root_type ;
-
- memory_space m_space ;
- task_root_type * m_team[ NPRIORITY ];
- task_root_type * m_serial[ NPRIORITY ];
- int m_team_size ; ///< Fixed size of a task-team
- int m_default_dependence_capacity ;
- int volatile m_count_ready ; ///< Ready plus executing tasks
- int volatile m_count_alloc ; ///< Total allocated tasks
-
- // Execute tasks until all non-waiting tasks are complete.
- static void driver( Kokkos::Impl::ThreadsExec & exec
- , const void * arg );
-
- task_root_type * allocate_task
- ( const unsigned arg_sizeof_task
- , const unsigned arg_dep_capacity
- , const unsigned arg_team_shmem
- );
-
- void deallocate_task( void * , unsigned );
- void schedule_task( task_root_type * const
- , const bool initial_spawn = true );
- void reschedule_task( task_root_type * const );
- void add_dependence( task_root_type * const after
- , task_root_type * const before );
-
- // When a task finishes executing, update its dependences
- // and either deallocate the task if complete
- // or reschedule the task if respawned.
- void complete_executed_task( task_root_type * );
-
- // Pop a task from a ready queue
- static task_root_type *
- pop_ready_task( task_root_type * volatile * const queue );
-
- ThreadsTaskPolicyQueue() = delete ;
- ThreadsTaskPolicyQueue( ThreadsTaskPolicyQueue && ) = delete ;
- ThreadsTaskPolicyQueue( const ThreadsTaskPolicyQueue & ) = delete ;
- ThreadsTaskPolicyQueue & operator = ( ThreadsTaskPolicyQueue && ) = delete ;
- ThreadsTaskPolicyQueue & operator = ( const ThreadsTaskPolicyQueue & ) = delete ;
-
- ~ThreadsTaskPolicyQueue();
-
- ThreadsTaskPolicyQueue
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity
- , const unsigned arg_task_team_size
- );
-
- // Callback to destroy the shared memory tracked queue.
- struct Destroy {
- ThreadsTaskPolicyQueue * m_policy ;
- void destroy_shared_allocation();
- };
-};
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-void wait( TaskPolicy< Kokkos::Threads > & );
-
-template<>
-class TaskPolicy< Kokkos::Threads >
-{
-public:
-
- typedef Kokkos::Threads execution_space ;
- typedef TaskPolicy execution_policy ;
- typedef Kokkos::Impl::ThreadsExecTeamMember member_type ;
-
-private:
-
- typedef Impl::TaskMember< Kokkos::Threads , void , void > task_root_type ;
- typedef Kokkos::Experimental::MemoryPool< Kokkos::Threads > memory_space ;
-
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
-
- track_type m_track ;
- Impl::ThreadsTaskPolicyQueue * m_policy ;
-
- template< class FunctorType >
- static inline
- const task_root_type * get_task_root( const FunctorType * f )
- {
- typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
- return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
- }
-
- template< class FunctorType >
- static inline
- task_root_type * get_task_root( FunctorType * f )
- {
- typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
- return static_cast< task_root_type * >( static_cast< task_type * >(f) );
- }
-
- /** \brief Allocate and construct a task.
- *
- * Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
- */
- template< class DerivedTaskType , class Tag >
- task_root_type *
- create( const typename DerivedTaskType::functor_type & arg_functor
- , const task_root_type::function_single_type arg_apply_single
- , const task_root_type::function_team_type arg_apply_team
- , const unsigned arg_team_shmem
- , const unsigned arg_dependence_capacity
- )
- {
- task_root_type * const t =
- m_policy->allocate_task( sizeof(DerivedTaskType)
- , arg_dependence_capacity
- , arg_team_shmem
- );
- if ( t != 0 ) {
-
- DerivedTaskType * const task = static_cast<DerivedTaskType*>(t);
-
- DerivedTaskType::copy_construct( task , arg_functor );
-
- task->task_root_type::m_verify = & task_root_type::template verify_type< typename DerivedTaskType::value_type > ;
- task->task_root_type::m_team = arg_apply_team ;
- task->task_root_type::m_serial = arg_apply_single ;
-
- // Do not proceed until initialization is written to memory
- Kokkos::memory_fence();
- }
- return t ;
- }
-
-public:
-
- // Valid team sizes are 1,
- // Threads::pool_size(1) == threads per numa, or
- // Threads::pool_size(2) == threads per core
-
- TaskPolicy
- ( const unsigned arg_task_max_count
- , const unsigned arg_task_max_size
- , const unsigned arg_task_default_dependence_capacity = 4
- , const unsigned arg_task_team_size = 0 /* choose default */
- );
-
- KOKKOS_FUNCTION TaskPolicy() = default ;
- KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- int allocated_task_count() const { return m_policy->m_count_alloc ; }
-
- //----------------------------------------
- // Create serial-thread task
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- task_create( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u )
- {
- typedef typename FunctorType::value_type value_type ;
- typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
-
- return Future< value_type , execution_space >(
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- TaskPolicy::create< task_type , void >
- ( functor
- , & task_root_type::template apply_single< task_type , void >
- , task_root_type::function_team_type(0)
- , 0
- , dependence_capacity
- )
-#endif
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- proc_create( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u )
- { return task_create( functor , dependence_capacity ); }
-
- // Create thread-team task
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- task_create_team( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u )
- {
- typedef typename FunctorType::value_type value_type ;
- typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
-
- return Future< value_type , execution_space >(
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- TaskPolicy::create< task_type , void >
- ( functor
- , task_root_type::function_single_type(0)
- , & task_root_type::template apply_team< task_type , void >
- , Kokkos::Impl::FunctorTeamShmemSize< FunctorType >::
- value( functor , m_policy->m_team_size )
- , dependence_capacity
- )
-#endif
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- proc_create_team( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u )
- { return task_create_team( functor , dependence_capacity ); }
-
- template< class A1 , class A2 , class A3 , class A4 >
- KOKKOS_INLINE_FUNCTION
- void add_dependence( const Future<A1,A2> & after
- , const Future<A3,A4> & before
- , typename std::enable_if
- < std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
- &&
- std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
- >::type * = 0
- ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- m_policy->add_dependence( after.m_task , before.m_task );
-#endif
- }
-
- //----------------------------------------
-
- Future< Latch , execution_space >
- KOKKOS_INLINE_FUNCTION
- create_latch( const int N ) const
- {
- task_root_type * const task =
- m_policy->allocate_task( sizeof(task_root_type) , 0 , 0 );
- task->m_dep_size = N ; // Using m_dep_size for latch counter
- task->m_state = TASK_STATE_WAITING ;
- return Future< Latch , execution_space >( task );
- }
-
- //----------------------------------------
-
- template< class FunctorType , class A3 , class A4 >
- KOKKOS_INLINE_FUNCTION
- void add_dependence( FunctorType * task_functor
- , const Future<A3,A4> & before
- , typename std::enable_if
- < std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
- >::type * = 0
- ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- m_policy->add_dependence( get_task_root(task_functor) , before.m_task );
-#endif
- }
-
- template< class ValueType >
- const Future< ValueType , execution_space > &
- spawn( const Future< ValueType , execution_space > & f
- , const bool priority = false ) const
- {
- if ( f.m_task ) {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- f.m_task->m_queue =
- ( f.m_task->m_team != 0
- ? & ( m_policy->m_team[ priority ? 0 : 1 ] )
- : & ( m_policy->m_serial[ priority ? 0 : 1 ] ) );
- m_policy->schedule_task( f.m_task );
-#endif
- }
- return f ;
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void respawn( FunctorType * task_functor
- , const bool priority = false ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- task_root_type * const t = get_task_root(task_functor);
- t->m_queue =
- ( t->m_team != 0 ? & ( m_policy->m_team[ priority ? 0 : 1 ] )
- : & ( m_policy->m_serial[ priority ? 0 : 1 ] ) );
- m_policy->reschedule_task( t );
-#endif
- }
-
- // When a create method fails by returning a null Future
- // the task that called the create method may respawn
- // with a dependence on memory becoming available.
- // This is a race as more than one task may be respawned
- // with this need.
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void respawn_needing_memory( FunctorType * task_functor ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- task_root_type * const t = get_task_root(task_functor);
- t->m_queue =
- ( t->m_team != 0 ? & ( m_policy->m_team[ 2 ] )
- : & ( m_policy->m_serial[ 2 ] ) );
- m_policy->reschedule_task( t );
-#endif
- }
-
- //----------------------------------------
- // Functions for an executing task functor to query dependences,
- // set new dependences, and respawn itself.
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< void , execution_space >
- get_dependence( const FunctorType * task_functor , int i ) const
- {
- return Future<void,execution_space>(
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- get_task_root(task_functor)->get_dependence(i)
-#endif
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- int get_dependence( const FunctorType * task_functor ) const
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return get_task_root(task_functor)->get_dependence(); }
-#else
- { return 0 ; }
-#endif
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void clear_dependence( FunctorType * task_functor ) const
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { get_task_root(task_functor)->clear_dependence(); }
-#else
- {}
-#endif
-
- //----------------------------------------
-
- static member_type & member_single();
-
- friend void wait( TaskPolicy< Kokkos::Threads > & );
-};
-
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_HAVE_PTHREAD ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
-#endif /* #ifndef KOKKOS_THREADS_TASKPOLICY_HPP */
-
-
diff --git a/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp b/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp
index ed56536cd..d5d27cc83 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp
+++ b/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp
@@ -1,2932 +1,46 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_EXPERIMENTAL_VIEW_MAPPING_HPP
-#define KOKKOS_EXPERIMENTAL_VIEW_MAPPING_HPP
-
-#include <type_traits>
-#include <initializer_list>
-
-#include <Kokkos_Core_fwd.hpp>
-#include <Kokkos_Pair.hpp>
-#include <Kokkos_Layout.hpp>
-#include <impl/Kokkos_Error.hpp>
-#include <impl/Kokkos_Traits.hpp>
-#include <impl/KokkosExp_ViewCtor.hpp>
-#include <impl/Kokkos_Atomic_View.hpp>
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-template< unsigned I , size_t ... Args >
-struct variadic_size_t
- { enum { value = ~size_t(0) }; };
-
-template< size_t Val , size_t ... Args >
-struct variadic_size_t< 0 , Val , Args ... >
- { enum { value = Val }; };
-
-template< unsigned I , size_t Val , size_t ... Args >
-struct variadic_size_t< I , Val , Args ... >
- { enum { value = variadic_size_t< I - 1 , Args ... >::value }; };
-
-template< size_t ... Args >
-struct rank_dynamic ;
-
-template<>
-struct rank_dynamic<> { enum { value = 0 }; };
-
-template< size_t Val , size_t ... Args >
-struct rank_dynamic< Val , Args... >
-{
- enum { value = ( Val == 0 ? 1 : 0 ) + rank_dynamic< Args... >::value };
-};
-
-#define KOKKOS_IMPL_VIEW_DIMENSION( R ) \
- template< size_t V , unsigned > struct ViewDimension ## R \
- { \
- enum { ArgN ## R = ( V != ~size_t(0) ? V : 1 ) }; \
- enum { N ## R = ( V != ~size_t(0) ? V : 1 ) }; \
- KOKKOS_INLINE_FUNCTION explicit ViewDimension ## R ( size_t ) {} \
- ViewDimension ## R () = default ; \
- ViewDimension ## R ( const ViewDimension ## R & ) = default ; \
- ViewDimension ## R & operator = ( const ViewDimension ## R & ) = default ; \
- }; \
- template< unsigned RD > struct ViewDimension ## R < 0 , RD > \
- { \
- enum { ArgN ## R = 0 }; \
- typename std::conditional<( RD < 3 ), size_t , unsigned >::type N ## R ; \
- ViewDimension ## R () = default ; \
- ViewDimension ## R ( const ViewDimension ## R & ) = default ; \
- ViewDimension ## R & operator = ( const ViewDimension ## R & ) = default ; \
- KOKKOS_INLINE_FUNCTION explicit ViewDimension ## R ( size_t V ) : N ## R ( V ) {} \
- };
-
-KOKKOS_IMPL_VIEW_DIMENSION( 0 )
-KOKKOS_IMPL_VIEW_DIMENSION( 1 )
-KOKKOS_IMPL_VIEW_DIMENSION( 2 )
-KOKKOS_IMPL_VIEW_DIMENSION( 3 )
-KOKKOS_IMPL_VIEW_DIMENSION( 4 )
-KOKKOS_IMPL_VIEW_DIMENSION( 5 )
-KOKKOS_IMPL_VIEW_DIMENSION( 6 )
-KOKKOS_IMPL_VIEW_DIMENSION( 7 )
-
-#undef KOKKOS_IMPL_VIEW_DIMENSION
-
-template< size_t ... Vals >
-struct ViewDimension
- : public ViewDimension0< variadic_size_t<0,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension1< variadic_size_t<1,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension2< variadic_size_t<2,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension3< variadic_size_t<3,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension4< variadic_size_t<4,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension5< variadic_size_t<5,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension6< variadic_size_t<6,Vals...>::value
- , rank_dynamic< Vals... >::value >
- , public ViewDimension7< variadic_size_t<7,Vals...>::value
- , rank_dynamic< Vals... >::value >
-{
- typedef ViewDimension0< variadic_size_t<0,Vals...>::value
- , rank_dynamic< Vals... >::value > D0 ;
- typedef ViewDimension1< variadic_size_t<1,Vals...>::value
- , rank_dynamic< Vals... >::value > D1 ;
- typedef ViewDimension2< variadic_size_t<2,Vals...>::value
- , rank_dynamic< Vals... >::value > D2 ;
- typedef ViewDimension3< variadic_size_t<3,Vals...>::value
- , rank_dynamic< Vals... >::value > D3 ;
- typedef ViewDimension4< variadic_size_t<4,Vals...>::value
- , rank_dynamic< Vals... >::value > D4 ;
- typedef ViewDimension5< variadic_size_t<5,Vals...>::value
- , rank_dynamic< Vals... >::value > D5 ;
- typedef ViewDimension6< variadic_size_t<6,Vals...>::value
- , rank_dynamic< Vals... >::value > D6 ;
- typedef ViewDimension7< variadic_size_t<7,Vals...>::value
- , rank_dynamic< Vals... >::value > D7 ;
-
- using D0::ArgN0 ;
- using D1::ArgN1 ;
- using D2::ArgN2 ;
- using D3::ArgN3 ;
- using D4::ArgN4 ;
- using D5::ArgN5 ;
- using D6::ArgN6 ;
- using D7::ArgN7 ;
-
- using D0::N0 ;
- using D1::N1 ;
- using D2::N2 ;
- using D3::N3 ;
- using D4::N4 ;
- using D5::N5 ;
- using D6::N6 ;
- using D7::N7 ;
-
- enum { rank = sizeof...(Vals) };
- enum { rank_dynamic = Impl::rank_dynamic< Vals... >::value };
-
- ViewDimension() = default ;
- ViewDimension( const ViewDimension & ) = default ;
- ViewDimension & operator = ( const ViewDimension & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr
- ViewDimension( size_t n0 , size_t n1 , size_t n2 , size_t n3
- , size_t n4 , size_t n5 , size_t n6 , size_t n7 )
- : D0( n0 )
- , D1( n1 )
- , D2( n2 )
- , D3( n3 )
- , D4( n4 )
- , D5( n5 )
- , D6( n6 )
- , D7( n7 )
- {}
-
- KOKKOS_INLINE_FUNCTION
- constexpr size_t extent( const unsigned r ) const
- {
- return r == 0 ? N0 : (
- r == 1 ? N1 : (
- r == 2 ? N2 : (
- r == 3 ? N3 : (
- r == 4 ? N4 : (
- r == 5 ? N5 : (
- r == 6 ? N6 : (
- r == 7 ? N7 : 0 )))))));
- }
-
- template< size_t N >
- struct prepend { typedef ViewDimension< N , Vals... > type ; };
-
- template< size_t N >
- struct append { typedef ViewDimension< Vals... , N > type ; };
-};
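// ViewDimension above encodes every extent as a template parameter: a nonzero
// value is a compile-time extent and 0 marks a runtime extent, with
// rank_dynamic<...> counting the zeros. A compact standalone restatement of
// that counting trick (count_dynamic is an illustrative name, not the Kokkos
// trait itself):

#include <cstddef>

template <std::size_t... Vals>
struct count_dynamic;

template <>
struct count_dynamic<> { static constexpr unsigned value = 0; };

template <std::size_t Val, std::size_t... Rest>
struct count_dynamic<Val, Rest...> {
  static constexpr unsigned value = (Val == 0 ? 1u : 0u) + count_dynamic<Rest...>::value;
};

// A double**[3][4] view has two runtime extents followed by two static ones.
static_assert(count_dynamic<0, 0, 3, 4>::value == 2, "two runtime extents");
static_assert(count_dynamic<5, 7>::value == 0, "fully static dimensions");

int main() { return 0; }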
-
-template< class A , class B >
-struct ViewDimensionJoin ;
-
-template< size_t ... A , size_t ... B >
-struct ViewDimensionJoin< ViewDimension< A... > , ViewDimension< B... > > {
- typedef ViewDimension< A... , B... > type ;
-};
-
-//----------------------------------------------------------------------------
-
-template< class DstDim , class SrcDim >
-struct ViewDimensionAssignable ;
-
-template< size_t ... DstArgs , size_t ... SrcArgs >
-struct ViewDimensionAssignable< ViewDimension< DstArgs ... >
- , ViewDimension< SrcArgs ... > >
-{
- typedef ViewDimension< DstArgs... > dst ;
- typedef ViewDimension< SrcArgs... > src ;
-
- enum { value =
- unsigned(dst::rank) == unsigned(src::rank) && (
- //Compile time check that potential static dimensions match
- ( ( 1 > dst::rank_dynamic && 1 > src::rank_dynamic ) ? (size_t(dst::ArgN0) == size_t(src::ArgN0)) : true ) &&
- ( ( 2 > dst::rank_dynamic && 2 > src::rank_dynamic ) ? (size_t(dst::ArgN1) == size_t(src::ArgN1)) : true ) &&
- ( ( 3 > dst::rank_dynamic && 3 > src::rank_dynamic ) ? (size_t(dst::ArgN2) == size_t(src::ArgN2)) : true ) &&
- ( ( 4 > dst::rank_dynamic && 4 > src::rank_dynamic ) ? (size_t(dst::ArgN3) == size_t(src::ArgN3)) : true ) &&
- ( ( 5 > dst::rank_dynamic && 5 > src::rank_dynamic ) ? (size_t(dst::ArgN4) == size_t(src::ArgN4)) : true ) &&
- ( ( 6 > dst::rank_dynamic && 6 > src::rank_dynamic ) ? (size_t(dst::ArgN5) == size_t(src::ArgN5)) : true ) &&
- ( ( 7 > dst::rank_dynamic && 7 > src::rank_dynamic ) ? (size_t(dst::ArgN6) == size_t(src::ArgN6)) : true ) &&
- ( ( 8 > dst::rank_dynamic && 8 > src::rank_dynamic ) ? (size_t(dst::ArgN7) == size_t(src::ArgN7)) : true )
- )};
-
-};
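-
-// Illustrative sketch (not from the original source; assumes the convention,
-// used by rank_dynamic<> above, that a leading 0 encodes a dynamic extent).
-// The check only compares extents that are static on *both* sides:
-//
-//   ViewDimensionAssignable< ViewDimension<0,3> , ViewDimension<0,3> >::value  // 1 : static 3 == 3
-//   ViewDimensionAssignable< ViewDimension<0,3> , ViewDimension<0,4> >::value  // 0 : static 3 != 4
-//   ViewDimensionAssignable< ViewDimension<0,3> , ViewDimension<5,3> >::value  // 1 : 0 vs. 5 is dynamic-vs-static, not compared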
-
-}}} // namespace Kokkos::Experimental::Impl
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-struct ALL_t {
- KOKKOS_INLINE_FUNCTION
- constexpr const ALL_t & operator()() const { return *this ; }
-};
-
-template< class T >
-struct is_integral_extent_type
-{ enum { value = std::is_same<T,Kokkos::Experimental::Impl::ALL_t>::value ? 1 : 0 }; };
-
-template< class iType >
-struct is_integral_extent_type< std::pair<iType,iType> >
-{ enum { value = std::is_integral<iType>::value ? 1 : 0 }; };
-
-template< class iType >
-struct is_integral_extent_type< Kokkos::pair<iType,iType> >
-{ enum { value = std::is_integral<iType>::value ? 1 : 0 }; };
-
-// Assuming '2 == initializer_list<iType>::size()'
-template< class iType >
-struct is_integral_extent_type< std::initializer_list<iType> >
-{ enum { value = std::is_integral<iType>::value ? 1 : 0 }; };
-
-template < unsigned I , class ... Args >
-struct is_integral_extent
-{
- // get_type is void when sizeof...(Args) <= I
- typedef typename std::remove_cv<
- typename std::remove_reference<
- typename Kokkos::Impl::get_type<I,Args...
- >::type >::type >::type type ;
-
- enum { value = is_integral_extent_type<type>::value };
-
- static_assert( value ||
- std::is_integral<type>::value ||
- std::is_same<type,void>::value
- , "subview argument must be either integral or integral extent" );
-};
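-
-// Illustrative sketch (hedged, not from the original source): for a call such
-// as subview( a , 3 , ALL_t() , std::pair<int,int>(1,4) ) the classification
-// above gives
-//   is_integral_extent< 0 , int , ALL_t , std::pair<int,int> >::value == 0  // plain index, dimension is dropped
-//   is_integral_extent< 1 , int , ALL_t , std::pair<int,int> >::value == 1  // ALL_t keeps the full extent
-//   is_integral_extent< 2 , int , ALL_t , std::pair<int,int> >::value == 1  // pair keeps a sub-range
-// so the resulting subview has range rank 2.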
-
-template< unsigned DomainRank , unsigned RangeRank >
-struct SubviewExtents {
-private:
-
- // Cannot declare zero-length arrays
- enum { InternalRangeRank = RangeRank ? RangeRank : 1u };
-
- size_t m_begin[ DomainRank ];
- size_t m_length[ InternalRangeRank ];
- unsigned m_index[ InternalRangeRank ];
-
- template< size_t ... DimArgs >
- KOKKOS_FORCEINLINE_FUNCTION
- bool set( unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim )
- { return true ; }
-
- template< class T , size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- bool set( unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const T & val
- , Args ... args )
- {
- const size_t v = static_cast<size_t>(val);
-
- m_begin[ domain_rank ] = v ;
-
- return set( domain_rank + 1 , range_rank , dim , args... )
-#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
- && ( v < dim.extent( domain_rank ) )
-#endif
- ;
- }
-
- // ALL_t
- template< size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- bool set( unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const Kokkos::Experimental::Impl::ALL_t
- , Args ... args )
- {
- m_begin[ domain_rank ] = 0 ;
- m_length[ range_rank ] = dim.extent( domain_rank );
- m_index[ range_rank ] = domain_rank ;
-
- return set( domain_rank + 1 , range_rank + 1 , dim , args... );
- }
-
- // std::pair range
- template< class T , size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- bool set( unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const std::pair<T,T> & val
- , Args ... args )
- {
- const size_t b = static_cast<size_t>( val.first );
- const size_t e = static_cast<size_t>( val.second );
-
- m_begin[ domain_rank ] = b ;
- m_length[ range_rank ] = e - b ;
- m_index[ range_rank ] = domain_rank ;
-
- return set( domain_rank + 1 , range_rank + 1 , dim , args... )
-#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
- && ( e <= b + dim.extent( domain_rank ) )
-#endif
- ;
- }
-
- // Kokkos::pair range
- template< class T , size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- bool set( unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const Kokkos::pair<T,T> & val
- , Args ... args )
- {
- const size_t b = static_cast<size_t>( val.first );
- const size_t e = static_cast<size_t>( val.second );
-
- m_begin[ domain_rank ] = b ;
- m_length[ range_rank ] = e - b ;
- m_index[ range_rank ] = domain_rank ;
-
- return set( domain_rank + 1 , range_rank + 1 , dim , args... )
-#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
- && ( e <= b + dim.extent( domain_rank ) )
-#endif
- ;
- }
-
- // { begin , end } range
- template< class T , size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- bool set( unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const std::initializer_list< T > & val
- , Args ... args )
- {
- const size_t b = static_cast<size_t>( val.begin()[0] );
- const size_t e = static_cast<size_t>( val.begin()[1] );
-
- m_begin[ domain_rank ] = b ;
- m_length[ range_rank ] = e - b ;
- m_index[ range_rank ] = domain_rank ;
-
- return set( domain_rank + 1 , range_rank + 1 , dim , args... )
-#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
- && ( val.size() == 2 )
- && ( e <= b + dim.extent( domain_rank ) )
-#endif
- ;
- }
-
- //------------------------------
-
-#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
-
- template< size_t ... DimArgs >
- void error( char *
- , int
- , unsigned
- , unsigned
- , const ViewDimension< DimArgs ... > & ) const
- {}
-
- template< class T , size_t ... DimArgs , class ... Args >
- void error( char * buf , int buf_len
- , unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const T & val
- , Args ... args ) const
- {
- const int n = std::min( buf_len ,
- snprintf( buf , buf_len
- , " %lu < %lu %c"
- , static_cast<unsigned long>(val)
- , static_cast<unsigned long>( dim.extent( domain_rank ) )
- , int( sizeof...(Args) ? ',' : ')' ) ) );
-
- error( buf+n, buf_len-n, domain_rank + 1 , range_rank , dim , args... );
- }
-
-  // ALL_t
- template< size_t ... DimArgs , class ... Args >
- void error( char * buf , int buf_len
- , unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const Kokkos::Experimental::Impl::ALL_t
- , Args ... args ) const
- {
- const int n = std::min( buf_len ,
- snprintf( buf , buf_len
- , " Kokkos::ALL %c"
- , int( sizeof...(Args) ? ',' : ')' ) ) );
-
- error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
- }
-
- // std::pair range
- template< class T , size_t ... DimArgs , class ... Args >
- void error( char * buf , int buf_len
- , unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const std::pair<T,T> & val
- , Args ... args ) const
- {
- // d <= e - b
- const int n = std::min( buf_len ,
- snprintf( buf , buf_len
- , " %lu <= %lu - %lu %c"
- , static_cast<unsigned long>( dim.extent( domain_rank ) )
- , static_cast<unsigned long>( val.second )
-                , static_cast<unsigned long>( val.first )
- , int( sizeof...(Args) ? ',' : ')' ) ) );
-
- error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
- }
-
- // Kokkos::pair range
- template< class T , size_t ... DimArgs , class ... Args >
- void error( char * buf , int buf_len
- , unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const Kokkos::pair<T,T> & val
- , Args ... args ) const
- {
- // d <= e - b
- const int n = std::min( buf_len ,
- snprintf( buf , buf_len
- , " %lu <= %lu - %lu %c"
- , static_cast<unsigned long>( dim.extent( domain_rank ) )
- , static_cast<unsigned long>( val.second )
-                , static_cast<unsigned long>( val.first )
- , int( sizeof...(Args) ? ',' : ')' ) ) );
-
- error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
- }
-
- // { begin , end } range
- template< class T , size_t ... DimArgs , class ... Args >
- void error( char * buf , int buf_len
- , unsigned domain_rank
- , unsigned range_rank
- , const ViewDimension< DimArgs ... > & dim
- , const std::initializer_list< T > & val
- , Args ... args ) const
- {
- // d <= e - b
- int n = 0 ;
- if ( val.size() == 2 ) {
- n = std::min( buf_len ,
- snprintf( buf , buf_len
- , " %lu <= %lu - %lu %c"
- , static_cast<unsigned long>( dim.extent( domain_rank ) )
-                  , static_cast<unsigned long>( val.begin()[1] )
-                  , static_cast<unsigned long>( val.begin()[0] )
- , int( sizeof...(Args) ? ',' : ')' ) ) );
- }
- else {
- n = std::min( buf_len ,
- snprintf( buf , buf_len
- , " { ... }.size() == %u %c"
- , unsigned(val.size())
- , int( sizeof...(Args) ? ',' : ')' ) ) );
- }
-
- error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
- }
-
- template< size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- void error( const ViewDimension< DimArgs ... > & dim , Args ... args ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_SPACE_HOST )
- enum { LEN = 1024 };
- char buffer[ LEN ];
-
- const int n = snprintf(buffer,LEN,"Kokkos::subview bounds error (");
- error( buffer+n , LEN-n , 0 , 0 , dim , args... );
-
- Kokkos::Impl::throw_runtime_exception(std::string(buffer));
-#else
- Kokkos::abort("Kokkos::subview bounds error");
-#endif
- }
-
-#else
-
- template< size_t ... DimArgs , class ... Args >
- KOKKOS_FORCEINLINE_FUNCTION
- void error( const ViewDimension< DimArgs ... > & , Args ... ) const {}
-
-#endif
-
-public:
-
- template< size_t ... DimArgs , class ... Args >
- KOKKOS_INLINE_FUNCTION
- SubviewExtents( const ViewDimension< DimArgs ... > & dim , Args ... args )
- {
- static_assert( DomainRank == sizeof...(DimArgs) , "" );
- static_assert( DomainRank == sizeof...(Args) , "" );
-
-    // Verifies that each argument (up to 8 of them) is an integral index,
-    // an integral extent ( ALL_t , pair , or { begin , end } ), or does not
-    // exist, and that the number of integral extents equals RangeRank.
- static_assert( RangeRank ==
- unsigned( is_integral_extent<0,Args...>::value ) +
- unsigned( is_integral_extent<1,Args...>::value ) +
- unsigned( is_integral_extent<2,Args...>::value ) +
- unsigned( is_integral_extent<3,Args...>::value ) +
- unsigned( is_integral_extent<4,Args...>::value ) +
- unsigned( is_integral_extent<5,Args...>::value ) +
- unsigned( is_integral_extent<6,Args...>::value ) +
- unsigned( is_integral_extent<7,Args...>::value ) , "" );
-
- if ( RangeRank == 0 ) { m_length[0] = 0 ; m_index[0] = ~0u ; }
-
- if ( ! set( 0 , 0 , dim , args... ) ) error( dim , args... );
- }
-
- template < typename iType >
- KOKKOS_FORCEINLINE_FUNCTION
- constexpr size_t domain_offset( const iType i ) const
- { return unsigned(i) < DomainRank ? m_begin[i] : 0 ; }
-
- template < typename iType >
- KOKKOS_FORCEINLINE_FUNCTION
- constexpr size_t range_extent( const iType i ) const
- { return unsigned(i) < InternalRangeRank ? m_length[i] : 0 ; }
-
- template < typename iType >
- KOKKOS_FORCEINLINE_FUNCTION
- constexpr unsigned range_index( const iType i ) const
- { return unsigned(i) < InternalRangeRank ? m_index[i] : ~0u ; }
-};
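-
-// Illustrative sketch (hedged): continuing the example above, for a rank-3
-// source dimension 'dim'
-//   SubviewExtents<3,2>( dim , 3 , ALL_t() , std::pair<int,int>(1,4) )
-// records
-//   domain_offset : { 3 , 0 , 1 }     // begin index in each source dimension
-//   range_extent  : { dim.N1 , 3 }    // extents of the rank-2 result
-//   range_index   : { 1 , 2 }         // source dimension backing each result dimension
-// With KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK defined, out-of-range arguments are
-// reported through the error() chain above instead of being silently accepted.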
-
-}}} // namespace Kokkos::Experimental::Impl
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-/** \brief Given a value type and dimension generate the View data type */
-template< class T , class Dim >
-struct ViewDataType ;
-
-template< class T >
-struct ViewDataType< T , ViewDimension<> >
-{
- typedef T type ;
-};
-
-template< class T , size_t ... Args >
-struct ViewDataType< T , ViewDimension< 0 , Args... > >
-{
- typedef typename ViewDataType<T*,ViewDimension<Args...> >::type type ;
-};
-
-template< class T , size_t N , size_t ... Args >
-struct ViewDataType< T , ViewDimension< N , Args... > >
-{
- typedef typename ViewDataType<T,ViewDimension<Args...> >::type type[N] ;
-};
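-
-// Illustrative sketch (hedged): the recursion above rebuilds a View data type
-// from a value type plus a ViewDimension, e.g.
-//   ViewDataType< double , ViewDimension<0,0,3> >::type  is  double**[3]
-// i.e. each leading 0 (dynamic extent) becomes a '*' and each static extent N
-// becomes a trailing '[N]'.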
-
-/**\brief Analysis of View data type.
- *
- * Data type conforms to one of the following patterns :
- * {const} value_type [][#][#][#]
- * {const} value_type ***[#][#][#]
- * Where the sum of counts of '*' and '[#]' is at most ten.
- *
- * Provides typedefs for the ViewDimension<...> and value_type.
- */
-template< class T >
-struct ViewArrayAnalysis
-{
- typedef T value_type ;
- typedef typename std::add_const< T >::type const_value_type ;
- typedef typename std::remove_const< T >::type non_const_value_type ;
- typedef ViewDimension<> static_dimension ;
- typedef ViewDimension<> dynamic_dimension ;
- typedef ViewDimension<> dimension ;
-};
-
-template< class T , size_t N >
-struct ViewArrayAnalysis< T[N] >
-{
-private:
- typedef ViewArrayAnalysis< T > nested ;
-public:
- typedef typename nested::value_type value_type ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::non_const_value_type non_const_value_type ;
-
- typedef typename nested::static_dimension::template prepend<N>::type
- static_dimension ;
-
- typedef typename nested::dynamic_dimension dynamic_dimension ;
-
- typedef typename
- ViewDimensionJoin< dynamic_dimension , static_dimension >::type
- dimension ;
-};
-
-template< class T >
-struct ViewArrayAnalysis< T[] >
-{
-private:
- typedef ViewArrayAnalysis< T > nested ;
- typedef typename nested::dimension nested_dimension ;
-public:
- typedef typename nested::value_type value_type ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::non_const_value_type non_const_value_type ;
-
- typedef typename nested::dynamic_dimension::template prepend<0>::type
- dynamic_dimension ;
-
- typedef typename nested::static_dimension static_dimension ;
-
- typedef typename
- ViewDimensionJoin< dynamic_dimension , static_dimension >::type
- dimension ;
-};
-
-template< class T >
-struct ViewArrayAnalysis< T* >
-{
-private:
- typedef ViewArrayAnalysis< T > nested ;
-public:
- typedef typename nested::value_type value_type ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::non_const_value_type non_const_value_type ;
-
- typedef typename nested::dynamic_dimension::template prepend<0>::type
- dynamic_dimension ;
-
- typedef typename nested::static_dimension static_dimension ;
-
- typedef typename
- ViewDimensionJoin< dynamic_dimension , static_dimension >::type
- dimension ;
-};
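-
-// Illustrative sketch (hedged): analysing a typical declaration such as
-//   ViewArrayAnalysis< double*[3][4] >
-// yields
-//   value_type        = double
-//   dynamic_dimension = ViewDimension<0>
-//   static_dimension  = ViewDimension<3,4>
-//   dimension         = ViewDimension<0,3,4>   // dynamic extents first, then static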
-
-
-template< class DataType , class ArrayLayout , class ValueType >
-struct ViewDataAnalysis
-{
-private:
-
- typedef ViewArrayAnalysis< DataType > array_analysis ;
-
-  // ValueType is an opportunity for partial specialization.
- // Must match array analysis when this default template is used.
- static_assert( std::is_same< ValueType , typename array_analysis::non_const_value_type >::value , "" );
-
-public:
-
- typedef void specialize ; // No specialization
-
- typedef typename array_analysis::dimension dimension ;
- typedef typename array_analysis::value_type value_type ;
- typedef typename array_analysis::const_value_type const_value_type ;
- typedef typename array_analysis::non_const_value_type non_const_value_type ;
-
- // Generate analogous multidimensional array specification type.
- typedef typename ViewDataType< value_type , dimension >::type type ;
- typedef typename ViewDataType< const_value_type , dimension >::type const_type ;
- typedef typename ViewDataType< non_const_value_type , dimension >::type non_const_type ;
-
- // Generate "flattened" multidimensional array specification type.
- typedef type scalar_array_type ;
- typedef const_type const_scalar_array_type ;
- typedef non_const_type non_const_scalar_array_type ;
-};
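-
-// Illustrative sketch (hedged): for the double*[3][4] example above, this
-// default (unspecialized) analysis produces
-//   specialize        = void
-//   type              = double*[3][4]          // same as the declaration
-//   const_type        = const double*[3][4]
-//   scalar_array_type = type                   // nothing to flatten
-// Partial specializations on ValueType may override 'specialize' and supply a
-// genuinely flattened scalar_array_type.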
-
-}}} // namespace Kokkos::Experimental::Impl
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-template < class Dimension , class Layout , typename Enable = void >
-struct ViewOffset {
- using is_mapping_plugin = std::false_type ;
-};
-
-//----------------------------------------------------------------------------
-// LayoutLeft AND ( 1 >= rank OR 0 == rank_dynamic ) : no padding / striding
-template < class Dimension >
-struct ViewOffset< Dimension , Kokkos::LayoutLeft
- , typename std::enable_if<( 1 >= Dimension::rank
- ||
- 0 == Dimension::rank_dynamic
- )>::type >
-{
- using is_mapping_plugin = std::true_type ;
- using is_regular = std::true_type ;
-
- typedef size_t size_type ;
- typedef Dimension dimension_type ;
- typedef Kokkos::LayoutLeft array_layout ;
-
- dimension_type m_dim ;
-
- //----------------------------------------
-
- // rank 1
- template< typename I0 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 ) const { return i0 ; }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 , I1 const & i1 ) const
- { return i0 + m_dim.N0 * i1 ; }
-
- //rank 3
- template < typename I0, typename I1, typename I2 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
- {
- return i0 + m_dim.N0 * ( i1 + m_dim.N1 * i2 );
- }
-
- //rank 4
- template < typename I0, typename I1, typename I2, typename I3 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
- {
- return i0 + m_dim.N0 * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * i3 ));
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4 ) const
- {
- return i0 + m_dim.N0 * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * i4 )));
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5 ) const
- {
- return i0 + m_dim.N0 * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * (
- i4 + m_dim.N4 * i5 ))));
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6 ) const
- {
- return i0 + m_dim.N0 * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * (
- i4 + m_dim.N4 * (
- i5 + m_dim.N5 * i6 )))));
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
- {
- return i0 + m_dim.N0 * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * (
- i4 + m_dim.N4 * (
- i5 + m_dim.N5 * (
- i6 + m_dim.N6 * i7 ))))));
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- constexpr array_layout layout() const
- {
- return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
- , m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
- }
-
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
-
- /* Cardinality of the domain index space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type size() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- /* Span of the range space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type span() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return true ; }
-
- /* Strides of dimensions */
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return 1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_dim.N0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_dim.N0 * m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 ; }
-
-  // In addition to the strides, s[ rank ] is set to the total span
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- s[0] = 1 ;
- if ( 0 < dimension_type::rank ) { s[1] = m_dim.N0 ; }
- if ( 1 < dimension_type::rank ) { s[2] = s[1] * m_dim.N1 ; }
- if ( 2 < dimension_type::rank ) { s[3] = s[2] * m_dim.N2 ; }
- if ( 3 < dimension_type::rank ) { s[4] = s[3] * m_dim.N3 ; }
- if ( 4 < dimension_type::rank ) { s[5] = s[4] * m_dim.N4 ; }
- if ( 5 < dimension_type::rank ) { s[6] = s[5] * m_dim.N5 ; }
- if ( 6 < dimension_type::rank ) { s[7] = s[6] * m_dim.N6 ; }
- if ( 7 < dimension_type::rank ) { s[8] = s[7] * m_dim.N7 ; }
- }
-
- //----------------------------------------
-
- ViewOffset() = default ;
- ViewOffset( const ViewOffset & ) = default ;
- ViewOffset & operator = ( const ViewOffset & ) = default ;
-
- template< unsigned TrivialScalarSize >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( std::integral_constant<unsigned,TrivialScalarSize> const &
- , Kokkos::LayoutLeft const & arg_layout
- )
- : m_dim( arg_layout.dimension[0], 0, 0, 0, 0, 0, 0, 0 )
- {}
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs )
- : m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
- , rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
- {
- static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
- // Also requires equal static dimensions ...
- }
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs )
- : m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
- {
- static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
- , "ViewOffset LayoutLeft and LayoutRight are only compatible when rank == 1" );
- }
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutStride , void > & rhs )
- : m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
- {
- static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
- , "ViewOffset LayoutLeft and LayoutStride are only compatible when rank == 1" );
- if ( rhs.m_stride.S0 != 1 ) {
- Kokkos::abort("Kokkos::Experimental::ViewOffset assignment of LayoutLeft from LayoutStride requires stride == 1" );
- }
- }
-
- //----------------------------------------
- // Subview construction
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset(
- const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs ,
- const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub )
- : m_dim( sub.range_extent(0), 0, 0, 0, 0, 0, 0, 0 )
- {
- static_assert( ( 0 == dimension_type::rank ) ||
- ( 1 == dimension_type::rank && 1 == dimension_type::rank_dynamic && 1 <= DimRHS::rank )
- , "ViewOffset subview construction requires compatible rank" );
- }
-};
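-
-// Illustrative sketch (hedged): in this unpadded LayoutLeft case the leftmost
-// index is stride-one.  For a statically sized rank-3 extent of 2 x 3 x 4
-//   offset(i0,i1,i2) = i0 + 2 * ( i1 + 3 * i2 )
-// so offset(1,2,3) = 1 + 2 * ( 2 + 3 * 3 ) = 23 , and span() == size() == 24.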
-
-//----------------------------------------------------------------------------
-// LayoutLeft AND ( 1 < rank AND 0 < rank_dynamic ) : has padding / striding
-template < class Dimension >
-struct ViewOffset< Dimension , Kokkos::LayoutLeft
- , typename std::enable_if<( 1 < Dimension::rank
- &&
- 0 < Dimension::rank_dynamic
- )>::type >
-{
- using is_mapping_plugin = std::true_type ;
- using is_regular = std::true_type ;
-
- typedef size_t size_type ;
- typedef Dimension dimension_type ;
- typedef Kokkos::LayoutLeft array_layout ;
-
- dimension_type m_dim ;
- size_type m_stride ;
-
- //----------------------------------------
-
- // rank 1
- template< typename I0 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 ) const { return i0 ; }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 , I1 const & i1 ) const
- { return i0 + m_stride * i1 ; }
-
- //rank 3
- template < typename I0, typename I1, typename I2 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
- {
- return i0 + m_stride * ( i1 + m_dim.N1 * i2 );
- }
-
- //rank 4
- template < typename I0, typename I1, typename I2, typename I3 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
- {
- return i0 + m_stride * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * i3 ));
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4 ) const
- {
- return i0 + m_stride * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * i4 )));
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5 ) const
- {
- return i0 + m_stride * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * (
- i4 + m_dim.N4 * i5 ))));
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6 ) const
- {
- return i0 + m_stride * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * (
- i4 + m_dim.N4 * (
- i5 + m_dim.N5 * i6 )))));
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
- {
- return i0 + m_stride * (
- i1 + m_dim.N1 * (
- i2 + m_dim.N2 * (
- i3 + m_dim.N3 * (
- i4 + m_dim.N4 * (
- i5 + m_dim.N5 * (
- i6 + m_dim.N6 * i7 ))))));
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- constexpr array_layout layout() const
- {
- return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
- , m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
- }
-
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
-
- /* Cardinality of the domain index space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type size() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- /* Span of the range space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type span() const
- { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return m_stride == m_dim.N0 ; }
-
- /* Strides of dimensions */
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return 1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_stride ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_stride * m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_stride * m_dim.N1 * m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 ; }
-
-  // In addition to the strides, s[ rank ] is set to the total span
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- s[0] = 1 ;
- if ( 0 < dimension_type::rank ) { s[1] = m_stride ; }
- if ( 1 < dimension_type::rank ) { s[2] = s[1] * m_dim.N1 ; }
- if ( 2 < dimension_type::rank ) { s[3] = s[2] * m_dim.N2 ; }
- if ( 3 < dimension_type::rank ) { s[4] = s[3] * m_dim.N3 ; }
- if ( 4 < dimension_type::rank ) { s[5] = s[4] * m_dim.N4 ; }
- if ( 5 < dimension_type::rank ) { s[6] = s[5] * m_dim.N5 ; }
- if ( 6 < dimension_type::rank ) { s[7] = s[6] * m_dim.N6 ; }
- if ( 7 < dimension_type::rank ) { s[8] = s[7] * m_dim.N7 ; }
- }
-
- //----------------------------------------
-
-private:
-
- template< unsigned TrivialScalarSize >
- struct Padding {
- enum { div = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT / ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
- enum { mod = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT % ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
-
- // If memory alignment is a multiple of the trivial scalar size then attempt to align.
- enum { align = 0 != TrivialScalarSize && 0 == mod ? div : 0 };
-    enum { div_ok = div ? div : 1 }; // Avoid modulo by zero in the constexpr stride() below
-
- KOKKOS_INLINE_FUNCTION
- static constexpr size_t stride( size_t const N )
- {
- return ( align && ( Kokkos::Impl::MEMORY_ALIGNMENT_THRESHOLD * align < N ) && ( N % div_ok ) )
- ? N + align - ( N % div_ok ) : N ;
- }
- };
-
-public:
-
- ViewOffset() = default ;
- ViewOffset( const ViewOffset & ) = default ;
- ViewOffset & operator = ( const ViewOffset & ) = default ;
-
- /* Enable padding for trivial scalar types with non-zero trivial scalar size */
- template< unsigned TrivialScalarSize >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( std::integral_constant<unsigned,TrivialScalarSize> const & padding_type_size
- , Kokkos::LayoutLeft const & arg_layout
- )
- : m_dim( arg_layout.dimension[0] , arg_layout.dimension[1]
- , arg_layout.dimension[2] , arg_layout.dimension[3]
- , arg_layout.dimension[4] , arg_layout.dimension[5]
- , arg_layout.dimension[6] , arg_layout.dimension[7]
- )
- , m_stride( Padding<TrivialScalarSize>::stride( arg_layout.dimension[0] ) )
- {}
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs )
- : m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
- , rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
- , m_stride( rhs.stride_1() )
- {
- static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
- // Also requires equal static dimensions ...
- }
-
- //----------------------------------------
- // Subview construction
-  // This subview must have 2 == rank and 2 == rank_dynamic
-  // because only a single stride is stored.
-  // The source dimension #0 must be non-zero for a stride-one leading dimension.
-  // At most one subsequent dimension can be non-zero.
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs ,
- const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub )
- : m_dim( sub.range_extent(0)
- , sub.range_extent(1)
- , 0, 0, 0, 0, 0, 0 )
- , m_stride( ( 1 == sub.range_index(1) ? rhs.stride_1() :
- ( 2 == sub.range_index(1) ? rhs.stride_2() :
- ( 3 == sub.range_index(1) ? rhs.stride_3() :
- ( 4 == sub.range_index(1) ? rhs.stride_4() :
- ( 5 == sub.range_index(1) ? rhs.stride_5() :
- ( 6 == sub.range_index(1) ? rhs.stride_6() :
- ( 7 == sub.range_index(1) ? rhs.stride_7() : 0 ))))))))
- {
- static_assert( ( 2 == dimension_type::rank ) &&
- ( 2 == dimension_type::rank_dynamic ) &&
- ( 2 <= DimRHS::rank )
- , "ViewOffset subview construction requires compatible rank" );
- }
-};
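-
-// Illustrative sketch (hedged; assumes a 64-byte Kokkos::Impl::MEMORY_ALIGNMENT):
-// with padding enabled for an 8-byte scalar, Padding<8>::div == 8, so a leading
-// extent of N0 == 13 is rounded up to a stride of 13 + 8 - ( 13 % 8 ) == 16 once
-// N0 exceeds MEMORY_ALIGNMENT_THRESHOLD * align; smaller extents stay unpadded.
-// The extents themselves are unchanged, only span() grows and
-// span_is_contiguous() becomes false.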
-
-//----------------------------------------------------------------------------
-// LayoutRight AND ( 1 >= rank OR 0 == rank_dynamic ) : no padding / striding
-template < class Dimension >
-struct ViewOffset< Dimension , Kokkos::LayoutRight
- , typename std::enable_if<( 1 >= Dimension::rank
- ||
- 0 == Dimension::rank_dynamic
- )>::type >
-{
- using is_mapping_plugin = std::true_type ;
- using is_regular = std::true_type ;
-
- typedef size_t size_type ;
- typedef Dimension dimension_type ;
- typedef Kokkos::LayoutRight array_layout ;
-
- dimension_type m_dim ;
-
- //----------------------------------------
-
- // rank 1
- template< typename I0 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 ) const { return i0 ; }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 , I1 const & i1 ) const
- { return i1 + m_dim.N1 * i0 ; }
-
- //rank 3
- template < typename I0, typename I1, typename I2 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
- {
- return i2 + m_dim.N2 * ( i1 + m_dim.N1 * ( i0 ));
- }
-
- //rank 4
- template < typename I0, typename I1, typename I2, typename I3 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
- {
- return i3 + m_dim.N3 * (
- i2 + m_dim.N2 * (
- i1 + m_dim.N1 * ( i0 )));
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4 ) const
- {
- return i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * (
- i1 + m_dim.N1 * ( i0 ))));
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5 ) const
- {
- return i5 + m_dim.N5 * (
- i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * (
- i1 + m_dim.N1 * ( i0 )))));
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6 ) const
- {
- return i6 + m_dim.N6 * (
- i5 + m_dim.N5 * (
- i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * (
- i1 + m_dim.N1 * ( i0 ))))));
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
- {
- return i7 + m_dim.N7 * (
- i6 + m_dim.N6 * (
- i5 + m_dim.N5 * (
- i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * (
- i1 + m_dim.N1 * ( i0 )))))));
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- constexpr array_layout layout() const
- {
- return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
- , m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
- }
-
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
-
- /* Cardinality of the domain index space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type size() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- /* Span of the range space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type span() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return true ; }
-
- /* Strides of dimensions */
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return 1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_dim.N7 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_dim.N7 * m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 * m_dim.N1 ; }
-
-  // In addition to the strides, s[ rank ] is set to the total span
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- size_type n = 1 ;
- if ( 7 < dimension_type::rank ) { s[7] = n ; n *= m_dim.N7 ; }
- if ( 6 < dimension_type::rank ) { s[6] = n ; n *= m_dim.N6 ; }
- if ( 5 < dimension_type::rank ) { s[5] = n ; n *= m_dim.N5 ; }
- if ( 4 < dimension_type::rank ) { s[4] = n ; n *= m_dim.N4 ; }
- if ( 3 < dimension_type::rank ) { s[3] = n ; n *= m_dim.N3 ; }
- if ( 2 < dimension_type::rank ) { s[2] = n ; n *= m_dim.N2 ; }
- if ( 1 < dimension_type::rank ) { s[1] = n ; n *= m_dim.N1 ; }
- if ( 0 < dimension_type::rank ) { s[0] = n ; }
- s[dimension_type::rank] = n * m_dim.N0 ;
- }
-
- //----------------------------------------
-
- ViewOffset() = default ;
- ViewOffset( const ViewOffset & ) = default ;
- ViewOffset & operator = ( const ViewOffset & ) = default ;
-
- template< unsigned TrivialScalarSize >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( std::integral_constant<unsigned,TrivialScalarSize> const &
- , Kokkos::LayoutRight const & arg_layout
- )
- : m_dim( arg_layout.dimension[0], 0, 0, 0, 0, 0, 0, 0 )
- {}
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs )
- : m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
- , rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
- {
- static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
- // Also requires equal static dimensions ...
- }
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs )
- : m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
- {
- static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
- , "ViewOffset LayoutRight and LayoutLeft are only compatible when rank == 1" );
- }
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutStride , void > & rhs )
- : m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
- {
- static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
- , "ViewOffset LayoutLeft/Right and LayoutStride are only compatible when rank == 1" );
- if ( rhs.m_stride.S0 != 1 ) {
- Kokkos::abort("Kokkos::Experimental::ViewOffset assignment of LayoutLeft/Right from LayoutStride requires stride == 1" );
- }
- }
-
- //----------------------------------------
- // Subview construction
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs
- , const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub
- )
- : m_dim( sub.range_extent(0) , 0, 0, 0, 0, 0, 0, 0 )
- {
- static_assert( ( 0 == dimension_type::rank_dynamic ) ||
- ( 1 == dimension_type::rank && 1 == dimension_type::rank_dynamic && 1 <= DimRHS::rank )
- , "ViewOffset subview construction requires compatible rank" );
- }
-};
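-
-// Illustrative sketch (hedged): this is the row-major mirror of the unpadded
-// LayoutLeft case above; the rightmost index is stride-one.  For a statically
-// sized rank-3 extent of 2 x 3 x 4
-//   offset(i0,i1,i2) = i2 + 4 * ( i1 + 3 * i0 )
-// so offset(1,2,3) = 3 + 4 * ( 2 + 3 * 1 ) = 23.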
-
-//----------------------------------------------------------------------------
-// LayoutRight AND ( 1 < rank AND 0 < rank_dynamic ) : has padding / striding
-template < class Dimension >
-struct ViewOffset< Dimension , Kokkos::LayoutRight
- , typename std::enable_if<( 1 < Dimension::rank
- &&
- 0 < Dimension::rank_dynamic
- )>::type >
-{
- using is_mapping_plugin = std::true_type ;
- using is_regular = std::true_type ;
-
- typedef size_t size_type ;
- typedef Dimension dimension_type ;
- typedef Kokkos::LayoutRight array_layout ;
-
- dimension_type m_dim ;
- size_type m_stride ;
-
- //----------------------------------------
-
- // rank 1
- template< typename I0 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 ) const { return i0 ; }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 , I1 const & i1 ) const
- { return i1 + i0 * m_stride ; }
-
- //rank 3
- template < typename I0, typename I1, typename I2 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
- { return i2 + m_dim.N2 * ( i1 ) + i0 * m_stride ; }
-
- //rank 4
- template < typename I0, typename I1, typename I2, typename I3 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
- {
- return i3 + m_dim.N3 * (
- i2 + m_dim.N2 * ( i1 )) +
- i0 * m_stride ;
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4 ) const
- {
- return i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * ( i1 ))) +
- i0 * m_stride ;
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5 ) const
- {
- return i5 + m_dim.N5 * (
- i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * ( i1 )))) +
- i0 * m_stride ;
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6 ) const
- {
- return i6 + m_dim.N6 * (
- i5 + m_dim.N5 * (
- i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * ( i1 ))))) +
- i0 * m_stride ;
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
- {
- return i7 + m_dim.N7 * (
- i6 + m_dim.N6 * (
- i5 + m_dim.N5 * (
- i4 + m_dim.N4 * (
- i3 + m_dim.N3 * (
- i2 + m_dim.N2 * ( i1 )))))) +
- i0 * m_stride ;
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- constexpr array_layout layout() const
- {
- return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
- , m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
- }
-
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
-
- /* Cardinality of the domain index space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type size() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
- /* Span of the range space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type span() const
- { return m_dim.N0 * m_stride ; }
-
- KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const
- { return m_stride == m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 * m_dim.N1 ; }
-
- /* Strides of dimensions */
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return 1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_dim.N7 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_dim.N7 * m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return m_stride ; }
-
-  // In addition to the strides, s[ rank ] is set to the total span
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- size_type n = 1 ;
- if ( 7 < dimension_type::rank ) { s[7] = n ; n *= m_dim.N7 ; }
- if ( 6 < dimension_type::rank ) { s[6] = n ; n *= m_dim.N6 ; }
- if ( 5 < dimension_type::rank ) { s[5] = n ; n *= m_dim.N5 ; }
- if ( 4 < dimension_type::rank ) { s[4] = n ; n *= m_dim.N4 ; }
- if ( 3 < dimension_type::rank ) { s[3] = n ; n *= m_dim.N3 ; }
- if ( 2 < dimension_type::rank ) { s[2] = n ; n *= m_dim.N2 ; }
- if ( 1 < dimension_type::rank ) { s[1] = n ; }
- if ( 0 < dimension_type::rank ) { s[0] = m_stride ; }
- s[dimension_type::rank] = m_stride * m_dim.N0 ;
- }
-
- //----------------------------------------
-
-private:
-
- template< unsigned TrivialScalarSize >
- struct Padding {
- enum { div = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT / ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
- enum { mod = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT % ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
-
- // If memory alignment is a multiple of the trivial scalar size then attempt to align.
- enum { align = 0 != TrivialScalarSize && 0 == mod ? div : 0 };
-    enum { div_ok = div ? div : 1 }; // Avoid modulo by zero in the constexpr stride() below
-
- KOKKOS_INLINE_FUNCTION
- static constexpr size_t stride( size_t const N )
- {
- return ( align && ( Kokkos::Impl::MEMORY_ALIGNMENT_THRESHOLD * align < N ) && ( N % div_ok ) )
- ? N + align - ( N % div_ok ) : N ;
- }
- };
-
-public:
-
- ViewOffset() = default ;
- ViewOffset( const ViewOffset & ) = default ;
- ViewOffset & operator = ( const ViewOffset & ) = default ;
-
- /* Enable padding for trivial scalar types with non-zero trivial scalar size. */
- template< unsigned TrivialScalarSize >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( std::integral_constant<unsigned,TrivialScalarSize> const & padding_type_size
- , Kokkos::LayoutRight const & arg_layout
- )
- : m_dim( arg_layout.dimension[0] , arg_layout.dimension[1]
- , arg_layout.dimension[2] , arg_layout.dimension[3]
- , arg_layout.dimension[4] , arg_layout.dimension[5]
- , arg_layout.dimension[6] , arg_layout.dimension[7]
- )
- , m_stride( Padding<TrivialScalarSize>::
- stride( /* 2 <= rank */
- m_dim.N1 * ( dimension_type::rank == 2 ? 1 :
- m_dim.N2 * ( dimension_type::rank == 3 ? 1 :
- m_dim.N3 * ( dimension_type::rank == 4 ? 1 :
- m_dim.N4 * ( dimension_type::rank == 5 ? 1 :
- m_dim.N5 * ( dimension_type::rank == 6 ? 1 :
- m_dim.N6 * ( dimension_type::rank == 7 ? 1 : m_dim.N7 )))))) ))
- {}
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs )
- : m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
- , rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
- , m_stride( rhs.stride_0() )
- {
- static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
- // Also requires equal static dimensions ...
- }
-
- //----------------------------------------
- // Subview construction
- // Last dimension must be non-zero
-
- template< class DimRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs
- , const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub
- )
- : m_dim( sub.range_extent(0)
- , sub.range_extent(1)
- , 0, 0, 0, 0, 0, 0 )
- , m_stride( 0 == sub.range_index(0) ? rhs.stride_0() : (
- 1 == sub.range_index(0) ? rhs.stride_1() : (
- 2 == sub.range_index(0) ? rhs.stride_2() : (
- 3 == sub.range_index(0) ? rhs.stride_3() : (
- 4 == sub.range_index(0) ? rhs.stride_4() : (
- 5 == sub.range_index(0) ? rhs.stride_5() : (
- 6 == sub.range_index(0) ? rhs.stride_6() : 0 )))))))
- {
-    // This subview must have 2 == rank and 2 == rank_dynamic
-    // because only stride #0 is stored.
-    // The source dimension #0 must be non-zero for a stride-one leading dimension.
-    // At most one subsequent dimension can be non-zero.
-
- static_assert( ( 2 == dimension_type::rank ) &&
- ( 2 <= DimRHS::rank )
- , "ViewOffset subview construction requires compatible rank" );
- }
-};
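-
-// Illustrative sketch (hedged): here only the stride of dimension #0 is stored.
-// For extents 2 x 3 x 4 the unpadded row stride would be 3 * 4 == 12 elements;
-// with an 8-byte scalar and 64-byte alignment (assumed) Padding rounds that up
-// to 16 once it exceeds the threshold, and the map becomes
-//   offset(i0,i1,i2) = i2 + 4 * i1 + i0 * m_stride .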
-
-//----------------------------------------------------------------------------
-/* Strided array layout only makes sense for 0 < rank */
-/* rank = 0 included for DynRankView case */
-
-template< unsigned Rank >
-struct ViewStride ;
-
-template<>
-struct ViewStride<0> {
- enum { S0 = 0 , S1 = 0 , S2 = 0 , S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t , size_t , size_t , size_t
- , size_t , size_t , size_t , size_t )
- {}
-};
-
-template<>
-struct ViewStride<1> {
- size_t S0 ;
- enum { S1 = 0 , S2 = 0 , S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t , size_t , size_t
- , size_t , size_t , size_t , size_t )
- : S0( aS0 )
- {}
-};
-
-template<>
-struct ViewStride<2> {
- size_t S0 , S1 ;
- enum { S2 = 0 , S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t , size_t
- , size_t , size_t , size_t , size_t )
- : S0( aS0 ) , S1( aS1 )
- {}
-};
-
-template<>
-struct ViewStride<3> {
- size_t S0 , S1 , S2 ;
- enum { S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t
- , size_t , size_t , size_t , size_t )
- : S0( aS0 ) , S1( aS1 ) , S2( aS2 )
- {}
-};
-
-template<>
-struct ViewStride<4> {
- size_t S0 , S1 , S2 , S3 ;
- enum { S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
- , size_t , size_t , size_t , size_t )
- : S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
- {}
-};
-
-template<>
-struct ViewStride<5> {
- size_t S0 , S1 , S2 , S3 , S4 ;
- enum { S5 = 0 , S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
- , size_t aS4 , size_t , size_t , size_t )
- : S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
- , S4( aS4 )
- {}
-};
-
-template<>
-struct ViewStride<6> {
- size_t S0 , S1 , S2 , S3 , S4 , S5 ;
- enum { S6 = 0 , S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
- , size_t aS4 , size_t aS5 , size_t , size_t )
- : S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
- , S4( aS4 ) , S5( aS5 )
- {}
-};
-
-template<>
-struct ViewStride<7> {
- size_t S0 , S1 , S2 , S3 , S4 , S5 , S6 ;
- enum { S7 = 0 };
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
- , size_t aS4 , size_t aS5 , size_t aS6 , size_t )
- : S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
- , S4( aS4 ) , S5( aS5 ) , S6( aS6 )
- {}
-};
-
-template<>
-struct ViewStride<8> {
- size_t S0 , S1 , S2 , S3 , S4 , S5 , S6 , S7 ;
-
- ViewStride() = default ;
- ViewStride( const ViewStride & ) = default ;
- ViewStride & operator = ( const ViewStride & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
- , size_t aS4 , size_t aS5 , size_t aS6 , size_t aS7 )
- : S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
- , S4( aS4 ) , S5( aS5 ) , S6( aS6 ) , S7( aS7 )
- {}
-};
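-
-// Illustrative sketch (hedged): ViewStride< Rank > stores one explicit stride
-// per dimension (unused slots collapse to the enum value 0), so the
-// LayoutStride offset below is just the dot product of indices and strides,
-// e.g. for rank 2 with S0 == 1 and S1 == 8 :  offset(i0,i1) = i0 * 1 + i1 * 8 .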
-
-template < class Dimension >
-struct ViewOffset< Dimension , Kokkos::LayoutStride
- , void >
-{
-private:
- typedef ViewStride< Dimension::rank > stride_type ;
-public:
-
- using is_mapping_plugin = std::true_type ;
- using is_regular = std::true_type ;
-
- typedef size_t size_type ;
- typedef Dimension dimension_type ;
- typedef Kokkos::LayoutStride array_layout ;
-
- dimension_type m_dim ;
- stride_type m_stride ;
-
- //----------------------------------------
-
- // rank 1
- template< typename I0 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 ) const
- {
- return i0 * m_stride.S0 ;
- }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0 , I1 const & i1 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 ;
- }
-
- //rank 3
- template < typename I0, typename I1, typename I2 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 +
- i2 * m_stride.S2 ;
- }
-
- //rank 4
- template < typename I0, typename I1, typename I2, typename I3 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 +
- i2 * m_stride.S2 +
- i3 * m_stride.S3 ;
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 +
- i2 * m_stride.S2 +
- i3 * m_stride.S3 +
- i4 * m_stride.S4 ;
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 +
- i2 * m_stride.S2 +
- i3 * m_stride.S3 +
- i4 * m_stride.S4 +
- i5 * m_stride.S5 ;
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 +
- i2 * m_stride.S2 +
- i3 * m_stride.S3 +
- i4 * m_stride.S4 +
- i5 * m_stride.S5 +
- i6 * m_stride.S6 ;
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- , typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_INLINE_FUNCTION constexpr
- size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
- , I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
- {
- return i0 * m_stride.S0 +
- i1 * m_stride.S1 +
- i2 * m_stride.S2 +
- i3 * m_stride.S3 +
- i4 * m_stride.S4 +
- i5 * m_stride.S5 +
- i6 * m_stride.S6 +
- i7 * m_stride.S7 ;
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION
- constexpr array_layout layout() const
- {
- return array_layout( m_dim.N0 , m_stride.S0
- , m_dim.N1 , m_stride.S1
- , m_dim.N2 , m_stride.S2
- , m_dim.N3 , m_stride.S3
- , m_dim.N4 , m_stride.S4
- , m_dim.N5 , m_stride.S5
- , m_dim.N6 , m_stride.S6
- , m_dim.N7 , m_stride.S7
- );
- }
-
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
-
- /* Cardinality of the domain index space */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type size() const
- { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
-
-private:
-
- KOKKOS_INLINE_FUNCTION
- static constexpr size_type Max( size_type lhs , size_type rhs )
- { return lhs < rhs ? rhs : lhs ; }
-
-public:
-
- /* Span of the range space, largest stride * dimension */
- KOKKOS_INLINE_FUNCTION
- constexpr size_type span() const
- {
- return Max( m_dim.N0 * m_stride.S0 ,
- Max( m_dim.N1 * m_stride.S1 ,
- Max( m_dim.N2 * m_stride.S2 ,
- Max( m_dim.N3 * m_stride.S3 ,
- Max( m_dim.N4 * m_stride.S4 ,
- Max( m_dim.N5 * m_stride.S5 ,
- Max( m_dim.N6 * m_stride.S6 ,
- m_dim.N7 * m_stride.S7 )))))));
- }
-
- KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return span() == size(); }
-
- /* Strides of dimensions */
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return m_stride.S0 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_stride.S1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_stride.S2 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_stride.S3 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_stride.S4 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_stride.S5 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_stride.S6 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return m_stride.S7 ; }
-
-  // The stride output entry s[ rank ] is set to the total span
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- if ( 0 < dimension_type::rank ) { s[0] = m_stride.S0 ; }
- if ( 1 < dimension_type::rank ) { s[1] = m_stride.S1 ; }
- if ( 2 < dimension_type::rank ) { s[2] = m_stride.S2 ; }
- if ( 3 < dimension_type::rank ) { s[3] = m_stride.S3 ; }
- if ( 4 < dimension_type::rank ) { s[4] = m_stride.S4 ; }
- if ( 5 < dimension_type::rank ) { s[5] = m_stride.S5 ; }
- if ( 6 < dimension_type::rank ) { s[6] = m_stride.S6 ; }
- if ( 7 < dimension_type::rank ) { s[7] = m_stride.S7 ; }
- s[dimension_type::rank] = span();
- }
-
- //----------------------------------------
-
- ViewOffset() = default ;
- ViewOffset( const ViewOffset & ) = default ;
- ViewOffset & operator = ( const ViewOffset & ) = default ;
-
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( std::integral_constant<unsigned,0> const &
- , Kokkos::LayoutStride const & rhs )
- : m_dim( rhs.dimension[0] , rhs.dimension[1] , rhs.dimension[2] , rhs.dimension[3]
- , rhs.dimension[4] , rhs.dimension[5] , rhs.dimension[6] , rhs.dimension[7] )
- , m_stride( rhs.stride[0] , rhs.stride[1] , rhs.stride[2] , rhs.stride[3]
- , rhs.stride[4] , rhs.stride[5] , rhs.stride[6] , rhs.stride[7] )
- {}
-
- template< class DimRHS , class LayoutRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset( const ViewOffset< DimRHS , LayoutRHS , void > & rhs )
- : m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
- , rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
- , m_stride( rhs.stride_0() , rhs.stride_1() , rhs.stride_2() , rhs.stride_3()
- , rhs.stride_4() , rhs.stride_5() , rhs.stride_6() , rhs.stride_7() )
- {
- static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
- // Also requires equal static dimensions ...
- }
-
- //----------------------------------------
- // Subview construction
-
-private:
-
- template< class DimRHS , class LayoutRHS >
- KOKKOS_INLINE_FUNCTION static
- constexpr size_t stride
- ( unsigned r , const ViewOffset< DimRHS , LayoutRHS , void > & rhs )
- {
- return r > 7 ? 0 : (
- r == 0 ? rhs.stride_0() : (
- r == 1 ? rhs.stride_1() : (
- r == 2 ? rhs.stride_2() : (
- r == 3 ? rhs.stride_3() : (
- r == 4 ? rhs.stride_4() : (
- r == 5 ? rhs.stride_5() : (
- r == 6 ? rhs.stride_6() : rhs.stride_7() )))))));
- }
-
-public:
-
- template< class DimRHS , class LayoutRHS >
- KOKKOS_INLINE_FUNCTION
- constexpr ViewOffset
- ( const ViewOffset< DimRHS , LayoutRHS , void > & rhs
- , const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub
- )
- // range_extent(r) returns 0 when dimension_type::rank <= r
- : m_dim( sub.range_extent(0)
- , sub.range_extent(1)
- , sub.range_extent(2)
- , sub.range_extent(3)
- , sub.range_extent(4)
- , sub.range_extent(5)
- , sub.range_extent(6)
- , sub.range_extent(7)
- )
- // range_index(r) returns ~0u when dimension_type::rank <= r
- , m_stride( stride( sub.range_index(0), rhs )
- , stride( sub.range_index(1), rhs )
- , stride( sub.range_index(2), rhs )
- , stride( sub.range_index(3), rhs )
- , stride( sub.range_index(4), rhs )
- , stride( sub.range_index(5), rhs )
- , stride( sub.range_index(6), rhs )
- , stride( sub.range_index(7), rhs )
- )
- {}
-};
-
-}}} // namespace Kokkos::Experimental::Impl
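// Illustrative sketch, not from the patched sources: the arithmetic the
// LayoutStride ViewOffset above performs. A rank-2 element (i0,i1) maps to
// i0*S0 + i1*S1, and span() is the largest extent*stride product. The helper
// names (strided_offset, strided_span) are hypothetical.

#include <cstddef>

constexpr std::size_t strided_offset( std::size_t i0 , std::size_t S0 ,
                                      std::size_t i1 , std::size_t S1 )
{ return i0 * S0 + i1 * S1 ; }

constexpr std::size_t strided_span( std::size_t N0 , std::size_t S0 ,
                                    std::size_t N1 , std::size_t S1 )
{ return ( N0 * S0 < N1 * S1 ) ? N1 * S1 : N0 * S0 ; }

// A 3x4 extent with column-major-style strides S0 = 1, S1 = 3:
static_assert( strided_offset( 2 , 1 , 3 , 3 ) == 11 , "offset of the last element" );
static_assert( strided_span( 3 , 1 , 4 , 3 ) == 12 , "span equals size, so the mapping is contiguous" );

int main() { return 0 ; }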
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-/** \brief ViewDataHandle provides the type of the 'data handle' which the view
- * uses to access data with the [] operator. It also provides
- * an allocate function and a function to extract a raw ptr from the
- * data handle. ViewDataHandle also defines an enum ReferenceAble which
- * specifies whether references/pointers to elements can be taken and a
- * 'return_type' which is what the view operators will give back.
- * Specialisation of this object allows three things depending
- * on ViewTraits and compiler options:
- * (i) Use special allocator (e.g. huge pages/small pages and pinned memory)
- * (ii) Use special data handle type (e.g. add Cuda Texture Object)
- * (iii) Use special access intrinsics (e.g. texture fetch and non-caching loads)
- */
-template< class Traits , class Enable = void >
-struct ViewDataHandle {
-
- typedef typename Traits::value_type value_type ;
- typedef typename Traits::value_type * handle_type ;
- typedef typename Traits::value_type & return_type ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
-
- KOKKOS_INLINE_FUNCTION
- static handle_type assign( value_type * arg_data_ptr
- , track_type const & /*arg_tracker*/ )
- {
- return handle_type( arg_data_ptr );
- }
-};
-
-template< class Traits >
-struct ViewDataHandle< Traits ,
- typename std::enable_if<( std::is_same< typename Traits::non_const_value_type
- , typename Traits::value_type >::value
- &&
- std::is_same< typename Traits::specialize , void >::value
- &&
- Traits::memory_traits::Atomic
- )>::type >
-{
- typedef typename Traits::value_type value_type ;
- typedef typename Kokkos::Impl::AtomicViewDataHandle< Traits > handle_type ;
- typedef typename Kokkos::Impl::AtomicDataElement< Traits > return_type ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
-
- KOKKOS_INLINE_FUNCTION
- static handle_type assign( value_type * arg_data_ptr
- , track_type const & /*arg_tracker*/ )
- {
- return handle_type( arg_data_ptr );
- }
-};
-
-}}} // namespace Kokkos::Experimental::Impl
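// Illustrative sketch, not from the patched sources: the "return_type" idea
// from the ViewDataHandle documentation above, reduced to standalone C++. The
// default handle hands back a plain reference; a specialized handle can hand
// back a proxy object instead (here a trivial counting proxy stands in for
// AtomicDataElement or a texture-fetch wrapper). All names are hypothetical.

#include <cstddef>

struct CountingProxy {
  double * p ;
  static int writes ;
  void operator = ( double v ) const { ++writes ; *p = v ; }   // proxy intercepts the store
};
int CountingProxy::writes = 0 ;

struct ProxyHandle {
  double * ptr ;
  typedef CountingProxy return_type ;
  return_type operator[]( std::size_t i ) const { return CountingProxy{ ptr + i } ; }
};

int main() {
  double data[4] = { 0 , 0 , 0 , 0 };
  ProxyHandle h{ data };
  h[2] = 3.5 ;   // goes through the proxy, not a raw double&
  return ( data[2] == 3.5 && CountingProxy::writes == 1 ) ? 0 : 1 ;
}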
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-//----------------------------------------------------------------------------
-
-/*
- * The construction, assignment to default, and destruction
- * are merged into a single functor.
- * Primarily to work around an unresolved CUDA back-end bug
- * that would lose the destruction cuda device function when
- * called from the shared memory tracking destruction.
- * Secondarily to have two fewer partial specializations.
- */
-template< class ExecSpace
- , class ValueType
- , bool IsScalar = std::is_scalar< ValueType >::value
- >
-struct ViewValueFunctor ;
-
-template< class ExecSpace , class ValueType >
-struct ViewValueFunctor< ExecSpace , ValueType , false /* is_scalar */ >
-{
- typedef Kokkos::RangePolicy< ExecSpace > PolicyType ;
-
- ExecSpace space ;
- ValueType * ptr ;
- size_t n ;
- bool destroy ;
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const size_t i ) const
- {
- if ( destroy ) { (ptr+i)->~ValueType(); }
- else { new (ptr+i) ValueType(); }
- }
-
- ViewValueFunctor() = default ;
- ViewValueFunctor( const ViewValueFunctor & ) = default ;
- ViewValueFunctor & operator = ( const ViewValueFunctor & ) = default ;
-
- ViewValueFunctor( ExecSpace const & arg_space
- , ValueType * const arg_ptr
- , size_t const arg_n )
- : space( arg_space )
- , ptr( arg_ptr )
- , n( arg_n )
- , destroy( false )
- {}
-
- void execute( bool arg )
- {
- destroy = arg ;
- if ( ! space.in_parallel() ) {
- const Kokkos::Impl::ParallelFor< ViewValueFunctor , PolicyType >
- closure( *this , PolicyType( 0 , n ) );
- closure.execute();
- space.fence();
- }
- else {
- for ( size_t i = 0 ; i < n ; ++i ) operator()(i);
- }
- }
-
- void construct_shared_allocation()
- { execute( false ); }
-
- void destroy_shared_allocation()
- { execute( true ); }
-};
-
-
-template< class ExecSpace , class ValueType >
-struct ViewValueFunctor< ExecSpace , ValueType , true /* is_scalar */ >
-{
- typedef Kokkos::RangePolicy< ExecSpace > PolicyType ;
-
- ExecSpace space ;
- ValueType * ptr ;
- size_t n ;
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const size_t i ) const
- { ptr[i] = ValueType(); }
-
- ViewValueFunctor() = default ;
- ViewValueFunctor( const ViewValueFunctor & ) = default ;
- ViewValueFunctor & operator = ( const ViewValueFunctor & ) = default ;
-
- ViewValueFunctor( ExecSpace const & arg_space
- , ValueType * const arg_ptr
- , size_t const arg_n )
- : space( arg_space )
- , ptr( arg_ptr )
- , n( arg_n )
- {}
-
- void construct_shared_allocation()
- {
- if ( ! space.in_parallel() ) {
- const Kokkos::Impl::ParallelFor< ViewValueFunctor , PolicyType >
- closure( *this , PolicyType( 0 , n ) );
- closure.execute();
- space.fence();
- }
- else {
- for ( size_t i = 0 ; i < n ; ++i ) operator()(i);
- }
- }
-
- void destroy_shared_allocation() {}
-};
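// Illustrative sketch, not from the patched sources: the construct/destroy
// pattern the non-scalar ViewValueFunctor above applies to each element,
// written as a standalone serial loop. Construction uses placement new into
// raw storage; destruction calls the destructor explicitly. The Tracked type
// is hypothetical.

#include <new>
#include <cstdlib>
#include <cstddef>

struct Tracked { static int live ; Tracked() { ++live; } ~Tracked() { --live; } };
int Tracked::live = 0 ;

int main() {
  const std::size_t n = 8 ;
  void * raw = std::malloc( n * sizeof(Tracked) );
  Tracked * ptr = static_cast<Tracked*>( raw );

  for ( std::size_t i = 0 ; i < n ; ++i ) new (ptr+i) Tracked();   // construct in place
  const bool ok = ( Tracked::live == int(n) );
  for ( std::size_t i = 0 ; i < n ; ++i ) (ptr+i)->~Tracked();     // explicit destroy

  std::free( raw );
  return ( ok && Tracked::live == 0 ) ? 0 : 1 ;
}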
-
-//----------------------------------------------------------------------------
-/** \brief View mapping for non-specialized data type and standard layout */
-template< class Traits >
-class ViewMapping< Traits ,
- typename std::enable_if<(
- std::is_same< typename Traits::specialize , void >::value
- &&
- ViewOffset< typename Traits::dimension
- , typename Traits::array_layout
- , void >::is_mapping_plugin::value
- )>::type >
-{
-private:
-
- template< class , class ... > friend class ViewMapping ;
- template< class , class ... > friend class Kokkos::Experimental::View ;
-
- typedef ViewOffset< typename Traits::dimension
- , typename Traits::array_layout
- , void
- > offset_type ;
-
- typedef typename ViewDataHandle< Traits >::handle_type handle_type ;
-
- handle_type m_handle ;
- offset_type m_offset ;
-
- KOKKOS_INLINE_FUNCTION
- ViewMapping( const handle_type & arg_handle , const offset_type & arg_offset )
- : m_handle( arg_handle )
- , m_offset( arg_offset )
- {}
-
-public:
-
- //----------------------------------------
- // Domain dimensions
-
- enum { Rank = Traits::dimension::rank };
-
- template< typename iType >
- KOKKOS_INLINE_FUNCTION constexpr size_t extent( const iType & r ) const
- { return m_offset.m_dim.extent(r); }
-
- KOKKOS_INLINE_FUNCTION constexpr
- typename Traits::array_layout layout() const
- { return m_offset.layout(); }
-
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return m_offset.dimension_0(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return m_offset.dimension_1(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return m_offset.dimension_2(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return m_offset.dimension_3(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return m_offset.dimension_4(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return m_offset.dimension_5(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return m_offset.dimension_6(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return m_offset.dimension_7(); }
-
- // Is a regular layout with uniform striding for each index.
- using is_regular = typename offset_type::is_regular ;
-
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return m_offset.stride_0(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return m_offset.stride_1(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return m_offset.stride_2(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return m_offset.stride_3(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return m_offset.stride_4(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return m_offset.stride_5(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return m_offset.stride_6(); }
- KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return m_offset.stride_7(); }
-
- template< typename iType >
- KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { m_offset.stride(s); }
-
- //----------------------------------------
- // Range span
-
- /** \brief Span of the mapped range */
- KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return m_offset.span(); }
-
- /** \brief Is the mapped range span contiguous */
- KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return m_offset.span_is_contiguous(); }
-
- typedef typename ViewDataHandle< Traits >::return_type reference_type ;
- typedef typename Traits::value_type * pointer_type ;
-
-  /** \brief If data references are lvalue_reference then we can query a pointer to the memory */
- KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const
- {
- return std::is_lvalue_reference< reference_type >::value
- ? (pointer_type) m_handle
- : (pointer_type) 0 ;
- }
-
- //----------------------------------------
- // The View class performs all rank and bounds checking before
- // calling these element reference methods.
-
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference() const { return m_handle[0]; }
-
- template< typename I0 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename
- std::enable_if< std::is_integral<I0>::value &&
- ! std::is_same< typename Traits::array_layout , Kokkos::LayoutStride >::value
- , reference_type >::type
- reference( const I0 & i0 ) const { return m_handle[i0]; }
-
- template< typename I0 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename
- std::enable_if< std::is_integral<I0>::value &&
- std::is_same< typename Traits::array_layout , Kokkos::LayoutStride >::value
- , reference_type >::type
- reference( const I0 & i0 ) const { return m_handle[ m_offset(i0) ]; }
-
- template< typename I0 , typename I1 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 ) const
- { return m_handle[ m_offset(i0,i1) ]; }
-
- template< typename I0 , typename I1 , typename I2 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 ) const
- { return m_handle[ m_offset(i0,i1,i2) ]; }
-
- template< typename I0 , typename I1 , typename I2 , typename I3 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3 ) const
- { return m_handle[ m_offset(i0,i1,i2,i3) ]; }
-
- template< typename I0 , typename I1 , typename I2 , typename I3
- , typename I4 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
- , const I4 & i4 ) const
- { return m_handle[ m_offset(i0,i1,i2,i3,i4) ]; }
-
- template< typename I0 , typename I1 , typename I2 , typename I3
- , typename I4 , typename I5 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
- , const I4 & i4 , const I5 & i5 ) const
- { return m_handle[ m_offset(i0,i1,i2,i3,i4,i5) ]; }
-
- template< typename I0 , typename I1 , typename I2 , typename I3
- , typename I4 , typename I5 , typename I6 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
- , const I4 & i4 , const I5 & i5 , const I6 & i6 ) const
- { return m_handle[ m_offset(i0,i1,i2,i3,i4,i5,i6) ]; }
-
- template< typename I0 , typename I1 , typename I2 , typename I3
- , typename I4 , typename I5 , typename I6 , typename I7 >
- KOKKOS_FORCEINLINE_FUNCTION
- reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
- , const I4 & i4 , const I5 & i5 , const I6 & i6 , const I7 & i7 ) const
- { return m_handle[ m_offset(i0,i1,i2,i3,i4,i5,i6,i7) ]; }
-
- //----------------------------------------
-
-private:
-
- enum { MemorySpanMask = 8 - 1 /* Force alignment on 8 byte boundary */ };
- enum { MemorySpanSize = sizeof(typename Traits::value_type) };
-
-public:
-
- /** \brief Span, in bytes, of the referenced memory */
- KOKKOS_INLINE_FUNCTION constexpr size_t memory_span() const
- {
- return ( m_offset.span() * sizeof(typename Traits::value_type) + MemorySpanMask ) & ~size_t(MemorySpanMask);
- }
-
- //----------------------------------------
-
- KOKKOS_INLINE_FUNCTION ~ViewMapping() {}
- KOKKOS_INLINE_FUNCTION ViewMapping() : m_handle(), m_offset() {}
- KOKKOS_INLINE_FUNCTION ViewMapping( const ViewMapping & rhs )
- : m_handle( rhs.m_handle ), m_offset( rhs.m_offset ) {}
- KOKKOS_INLINE_FUNCTION ViewMapping & operator = ( const ViewMapping & rhs )
- { m_handle = rhs.m_handle ; m_offset = rhs.m_offset ; return *this ; }
-
- KOKKOS_INLINE_FUNCTION ViewMapping( ViewMapping && rhs )
- : m_handle( rhs.m_handle ), m_offset( rhs.m_offset ) {}
- KOKKOS_INLINE_FUNCTION ViewMapping & operator = ( ViewMapping && rhs )
- { m_handle = rhs.m_handle ; m_offset = rhs.m_offset ; return *this ; }
-
- //----------------------------------------
-
- /**\brief Span, in bytes, of the required memory */
- KOKKOS_INLINE_FUNCTION
- static constexpr size_t memory_span( typename Traits::array_layout const & arg_layout )
- {
- typedef std::integral_constant< unsigned , 0 > padding ;
- return ( offset_type( padding(), arg_layout ).span() * MemorySpanSize + MemorySpanMask ) & ~size_t(MemorySpanMask);
- }
-
- /**\brief Wrap a span of memory */
- template< class ... P >
- KOKKOS_INLINE_FUNCTION
- ViewMapping( ViewCtorProp< P ... > const & arg_prop
- , typename Traits::array_layout const & arg_layout
- )
- : m_handle( ( (ViewCtorProp<void,pointer_type> const &) arg_prop ).value )
- , m_offset( std::integral_constant< unsigned , 0 >() , arg_layout )
- {}
-
- //----------------------------------------
- /* Allocate and construct mapped array.
- * Allocate via shared allocation record and
- * return that record for allocation tracking.
- */
- template< class ... P >
- SharedAllocationRecord<> *
- allocate_shared( ViewCtorProp< P... > const & arg_prop
- , typename Traits::array_layout const & arg_layout )
- {
- typedef ViewCtorProp< P... > alloc_prop ;
-
- typedef typename alloc_prop::execution_space execution_space ;
- typedef typename Traits::memory_space memory_space ;
- typedef typename Traits::value_type value_type ;
- typedef ViewValueFunctor< execution_space , value_type > functor_type ;
- typedef SharedAllocationRecord< memory_space , functor_type > record_type ;
-
- // Query the mapping for byte-size of allocation.
- // If padding is allowed then pass in sizeof value type
- // for padding computation.
- typedef std::integral_constant
- < unsigned
- , alloc_prop::allow_padding ? sizeof(value_type) : 0
- > padding ;
-
- m_offset = offset_type( padding(), arg_layout );
-
- const size_t alloc_size =
- ( m_offset.span() * MemorySpanSize + MemorySpanMask ) & ~size_t(MemorySpanMask);
-
-    // Create the shared memory tracking record and allocate memory from the memory space
- record_type * const record =
- record_type::allocate( ( (ViewCtorProp<void,memory_space> const &) arg_prop ).value
- , ( (ViewCtorProp<void,std::string> const &) arg_prop ).value
- , alloc_size );
-
-    // Only set the pointer and initialize if the allocation is non-zero.
- // May be zero if one of the dimensions is zero.
- if ( alloc_size ) {
-
- m_handle = handle_type( reinterpret_cast< pointer_type >( record->data() ) );
-
- if ( alloc_prop::initialize ) {
- // Assume destruction is only required when construction is requested.
- // The ViewValueFunctor has both value construction and destruction operators.
- record->m_destroy = functor_type( ( (ViewCtorProp<void,execution_space> const &) arg_prop).value
- , (value_type *) m_handle
- , m_offset.span()
- );
-
- // Construct values
- record->m_destroy.construct_shared_allocation();
- }
- }
-
- return record ;
- }
-};
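// Illustrative sketch, not from the patched sources: the 8-byte rounding used
// by memory_span() and allocate_shared() above. (bytes + 7) & ~7 rounds a byte
// count up to the next multiple of 8. The helper name round_up_to_8 is
// hypothetical.

#include <cstddef>

constexpr std::size_t round_up_to_8( std::size_t bytes )
{ return ( bytes + std::size_t(7) ) & ~std::size_t(7) ; }

static_assert( round_up_to_8( 10 * sizeof(float) ) == 40 , "40 bytes is already a multiple of 8" );
static_assert( round_up_to_8( 10 * sizeof(char)  ) == 16 , "10 bytes pads up to 16" );

int main() { return 0 ; }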
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-/** \brief Assign compatible default mappings */
-
-template< class DstTraits , class SrcTraits >
-class ViewMapping< DstTraits , SrcTraits ,
- typename std::enable_if<(
- std::is_same< typename DstTraits::memory_space , typename SrcTraits::memory_space >::value
- &&
- std::is_same< typename DstTraits::specialize , void >::value
- &&
- std::is_same< typename SrcTraits::specialize , void >::value
- &&
- (
- std::is_same< typename DstTraits::array_layout , typename SrcTraits::array_layout >::value
- ||
- (
- (
- std::is_same< typename DstTraits::array_layout , Kokkos::LayoutLeft >::value ||
- std::is_same< typename DstTraits::array_layout , Kokkos::LayoutRight >::value ||
- std::is_same< typename DstTraits::array_layout , Kokkos::LayoutStride >::value
- )
- &&
- (
- std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ||
- std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ||
- std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutStride >::value
- )
- )
- )
- )>::type >
-{
-private:
-
- enum { is_assignable_value_type =
- std::is_same< typename DstTraits::value_type
- , typename SrcTraits::value_type >::value ||
- std::is_same< typename DstTraits::value_type
- , typename SrcTraits::const_value_type >::value };
-
- enum { is_assignable_dimension =
- ViewDimensionAssignable< typename DstTraits::dimension
- , typename SrcTraits::dimension >::value };
-
- enum { is_assignable_layout =
- std::is_same< typename DstTraits::array_layout
- , typename SrcTraits::array_layout >::value ||
- std::is_same< typename DstTraits::array_layout
- , Kokkos::LayoutStride >::value ||
- ( DstTraits::dimension::rank == 0 ) ||
- ( DstTraits::dimension::rank == 1 &&
- DstTraits::dimension::rank_dynamic == 1 )
- };
-
-public:
-
- enum { is_assignable = is_assignable_value_type &&
- is_assignable_dimension &&
- is_assignable_layout };
-
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker TrackType ;
- typedef ViewMapping< DstTraits , void > DstType ;
- typedef ViewMapping< SrcTraits , void > SrcType ;
-
- KOKKOS_INLINE_FUNCTION
- static void assign( DstType & dst , const SrcType & src , const TrackType & src_track )
- {
- static_assert( is_assignable_value_type
- , "View assignment must have same value type or const = non-const" );
-
- static_assert( is_assignable_dimension
- , "View assignment must have compatible dimensions" );
-
- static_assert( is_assignable_layout
- , "View assignment must have compatible layout or have rank <= 1" );
-
- typedef typename DstType::offset_type dst_offset_type ;
-
- if ( size_t(DstTraits::dimension::rank_dynamic) < size_t(SrcTraits::dimension::rank_dynamic) ) {
- typedef typename DstTraits::dimension dst_dim;
- bool assignable =
- ( ( 1 > DstTraits::dimension::rank_dynamic && 1 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN0 == src.dimension_0() : true ) &&
- ( ( 2 > DstTraits::dimension::rank_dynamic && 2 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN1 == src.dimension_1() : true ) &&
- ( ( 3 > DstTraits::dimension::rank_dynamic && 3 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN2 == src.dimension_2() : true ) &&
- ( ( 4 > DstTraits::dimension::rank_dynamic && 4 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN3 == src.dimension_3() : true ) &&
- ( ( 5 > DstTraits::dimension::rank_dynamic && 5 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN4 == src.dimension_4() : true ) &&
- ( ( 6 > DstTraits::dimension::rank_dynamic && 6 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN5 == src.dimension_5() : true ) &&
- ( ( 7 > DstTraits::dimension::rank_dynamic && 7 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN6 == src.dimension_6() : true ) &&
- ( ( 8 > DstTraits::dimension::rank_dynamic && 8 <= SrcTraits::dimension::rank_dynamic ) ?
- dst_dim::ArgN7 == src.dimension_7() : true )
- ;
- if(!assignable)
- Kokkos::abort("View Assignment: trying to assign runtime dimension to non matching compile time dimension.");
- }
- dst.m_offset = dst_offset_type( src.m_offset );
- dst.m_handle = Kokkos::Experimental::Impl::ViewDataHandle< DstTraits >::assign( src.m_handle , src_track );
- }
-};
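// Illustrative sketch, not from the patched sources: assignments this mapping
// accepts, written against the public Kokkos::View interface (assumed per this
// release). Const-from-non-const and any-layout-to-LayoutStride are the two
// compatibility rules exercised here.

#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] ) {
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<double**, Kokkos::LayoutRight> a( "a" , 10 , 10 );

    // Same layout, const value type assigned from non-const: allowed.
    Kokkos::View<const double**, Kokkos::LayoutRight> a_const = a ;

    // A LayoutStride destination accepts Left, Right, or Stride sources.
    Kokkos::View<double**, Kokkos::LayoutStride> a_strided = a ;

    (void) a_const ; (void) a_strided ;
  }
  Kokkos::finalize();
  return 0 ;
}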
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-// Subview mapping.
-// Deduce destination view type from source view traits and subview arguments
-
-template< class SrcTraits , class ... Args >
-struct ViewMapping
- < typename std::enable_if<(
- std::is_same< typename SrcTraits::specialize , void >::value
- &&
- (
- std::is_same< typename SrcTraits::array_layout
- , Kokkos::LayoutLeft >::value ||
- std::is_same< typename SrcTraits::array_layout
- , Kokkos::LayoutRight >::value ||
- std::is_same< typename SrcTraits::array_layout
- , Kokkos::LayoutStride >::value
- )
- )>::type
- , SrcTraits
- , Args ... >
-{
-private:
-
- static_assert( SrcTraits::rank == sizeof...(Args) ,
- "Subview mapping requires one argument for each dimension of source View" );
-
- enum
- { RZ = false
- , R0 = bool(is_integral_extent<0,Args...>::value)
- , R1 = bool(is_integral_extent<1,Args...>::value)
- , R2 = bool(is_integral_extent<2,Args...>::value)
- , R3 = bool(is_integral_extent<3,Args...>::value)
- , R4 = bool(is_integral_extent<4,Args...>::value)
- , R5 = bool(is_integral_extent<5,Args...>::value)
- , R6 = bool(is_integral_extent<6,Args...>::value)
- , R7 = bool(is_integral_extent<7,Args...>::value)
- };
-
- enum { rank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
- + unsigned(R4) + unsigned(R5) + unsigned(R6) + unsigned(R7) };
-
- // Whether right-most rank is a range.
- enum { R0_rev = ( 0 == SrcTraits::rank ? RZ : (
- 1 == SrcTraits::rank ? R0 : (
- 2 == SrcTraits::rank ? R1 : (
- 3 == SrcTraits::rank ? R2 : (
- 4 == SrcTraits::rank ? R3 : (
- 5 == SrcTraits::rank ? R4 : (
- 6 == SrcTraits::rank ? R5 : (
- 7 == SrcTraits::rank ? R6 : R7 )))))))) };
-
- // Subview's layout
- typedef typename std::conditional<
- ( /* Same array layout IF */
- ( rank == 0 ) /* output rank zero */
- ||
- // OutputRank 1 or 2, InputLayout Left, Interval 0
- // because single stride one or second index has a stride.
- ( rank <= 2 && R0 && std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ) //replace with input rank
- ||
- // OutputRank 1 or 2, InputLayout Right, Interval [InputRank-1]
- // because single stride one or second index has a stride.
- ( rank <= 2 && R0_rev && std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ) //replace input rank
- ), typename SrcTraits::array_layout , Kokkos::LayoutStride
- >::type array_layout ;
-
- typedef typename SrcTraits::value_type value_type ;
-
- typedef typename std::conditional< rank == 0 , value_type ,
- typename std::conditional< rank == 1 , value_type * ,
- typename std::conditional< rank == 2 , value_type ** ,
- typename std::conditional< rank == 3 , value_type *** ,
- typename std::conditional< rank == 4 , value_type **** ,
- typename std::conditional< rank == 5 , value_type ***** ,
- typename std::conditional< rank == 6 , value_type ****** ,
- typename std::conditional< rank == 7 , value_type ******* ,
- value_type ********
- >::type >::type >::type >::type >::type >::type >::type >::type
- data_type ;
-
-public:
-
- typedef Kokkos::Experimental::ViewTraits
- < data_type
- , array_layout
- , typename SrcTraits::device_type
- , typename SrcTraits::memory_traits > traits_type ;
-
- typedef Kokkos::Experimental::View
- < data_type
- , array_layout
- , typename SrcTraits::device_type
- , typename SrcTraits::memory_traits > type ;
-
- template< class MemoryTraits >
- struct apply {
-
- static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
-
- typedef Kokkos::Experimental::ViewTraits
- < data_type
- , array_layout
- , typename SrcTraits::device_type
- , MemoryTraits > traits_type ;
-
- typedef Kokkos::Experimental::View
- < data_type
- , array_layout
- , typename SrcTraits::device_type
- , MemoryTraits > type ;
- };
-
- // The presumed type is 'ViewMapping< traits_type , void >'
- // However, a compatible ViewMapping is acceptable.
- template< class DstTraits >
- KOKKOS_INLINE_FUNCTION
- static void assign( ViewMapping< DstTraits , void > & dst
- , ViewMapping< SrcTraits , void > const & src
- , Args ... args )
- {
- static_assert(
- ViewMapping< DstTraits , traits_type , void >::is_assignable ,
- "Subview destination type must be compatible with subview derived type" );
-
- typedef ViewMapping< DstTraits , void > DstType ;
-
- typedef typename DstType::offset_type dst_offset_type ;
- typedef typename DstType::handle_type dst_handle_type ;
-
- const SubviewExtents< SrcTraits::rank , rank >
- extents( src.m_offset.m_dim , args... );
-
- dst.m_offset = dst_offset_type( src.m_offset , extents );
- dst.m_handle = dst_handle_type( src.m_handle +
- src.m_offset( extents.domain_offset(0)
- , extents.domain_offset(1)
- , extents.domain_offset(2)
- , extents.domain_offset(3)
- , extents.domain_offset(4)
- , extents.domain_offset(5)
- , extents.domain_offset(6)
- , extents.domain_offset(7)
- ) );
- }
-};
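// Illustrative sketch, not from the patched sources: the subview layout
// deduction above, exercised through the public Kokkos::subview interface
// (assumed per this release). A column of a LayoutLeft view is contiguous and
// keeps LayoutLeft; a row of the same view is strided and falls back to
// LayoutStride.

#include <Kokkos_Core.hpp>
#include <type_traits>

int main( int argc , char * argv[] ) {
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<double**, Kokkos::LayoutLeft> a( "a" , 10 , 10 );

    auto col = Kokkos::subview( a , Kokkos::ALL() , 3 );   // rank 1, unit stride
    auto row = Kokkos::subview( a , 3 , Kokkos::ALL() );   // rank 1, stride 10

    static_assert( std::is_same< decltype(col)::array_layout , Kokkos::LayoutLeft   >::value , "" );
    static_assert( std::is_same< decltype(row)::array_layout , Kokkos::LayoutStride >::value , "" );
  }
  Kokkos::finalize();
  return 0 ;
}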
-
-
-
-//----------------------------------------------------------------------------
-
-}}} // namespace Kokkos::Experimental::Impl
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-template< unsigned , class MapType >
-KOKKOS_INLINE_FUNCTION
-bool view_verify_operator_bounds( const MapType & )
-{ return true ; }
-
-template< unsigned R , class MapType , class iType , class ... Args >
-KOKKOS_INLINE_FUNCTION
-bool view_verify_operator_bounds
- ( const MapType & map
- , const iType & i
- , Args ... args
- )
-{
- return ( size_t(i) < map.extent(R) )
- && view_verify_operator_bounds<R+1>( map , args ... );
-}
-
-template< unsigned , class MapType >
-inline
-void view_error_operator_bounds( char * , int , const MapType & )
-{}
-
-template< unsigned R , class MapType , class iType , class ... Args >
-inline
-void view_error_operator_bounds
- ( char * buf
- , int len
- , const MapType & map
- , const iType & i
- , Args ... args
- )
-{
- const int n =
-    snprintf(buf,len," %lu < %lu %c"
- , static_cast<unsigned long>(i)
- , static_cast<unsigned long>( map.extent(R) )
- , ( sizeof...(Args) ? ',' : ')' )
- );
- view_error_operator_bounds<R+1>(buf+n,len-n,map,args...);
-}
-
-template< class MapType , class ... Args >
-KOKKOS_INLINE_FUNCTION
-void view_verify_operator_bounds
- ( const MapType & map , Args ... args )
-{
- if ( ! view_verify_operator_bounds<0>( map , args ... ) ) {
-#if defined( KOKKOS_ACTIVE_EXECUTION_SPACE_HOST )
- enum { LEN = 1024 };
- char buffer[ LEN ];
-    int n = snprintf(buffer,LEN,"View bounds error(" );
- view_error_operator_bounds<0>( buffer + n , LEN - n , map , args ... );
- Kokkos::Impl::throw_runtime_exception(std::string(buffer));
-#else
- Kokkos::abort("View bounds error");
-#endif
- }
-}
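// Illustrative sketch, not from the patched sources: the recursive variadic
// bounds check above, reduced to a standalone constexpr form. Each level
// compares one index against one extent and recurses on the remaining indices;
// the empty pack terminates with true. The in_bounds name is hypothetical.

#include <cstddef>

template< unsigned R >
constexpr bool in_bounds( const std::size_t * )
{ return true ; }

template< unsigned R , class iType , class ... Args >
constexpr bool in_bounds( const std::size_t * extents , const iType & i , Args ... args )
{ return std::size_t(i) < extents[R] && in_bounds<R+1>( extents , args... ); }

int main() {
  constexpr std::size_t ext[3] = { 4 , 5 , 6 };
  static_assert(  in_bounds<0>( ext , 3 , 4 , 5 ) , "all indices inside their extents" );
  static_assert( !in_bounds<0>( ext , 3 , 5 , 5 ) , "second index out of range" );
  return 0 ;
}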
-
-
-class Error_view_scalar_reference_to_non_scalar_view ;
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #ifndef KOKKOS_EXPERIMENTAL_VIEW_MAPPING_HPP */
+// Deprecated file for backward compatibility
+#include <impl/Kokkos_ViewMapping.hpp>
diff --git a/lib/kokkos/core/src/impl/Kokkos_AnalyzeShape.hpp b/lib/kokkos/core/src/impl/Kokkos_AnalyzeShape.hpp
deleted file mode 100644
index 2de9df008..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_AnalyzeShape.hpp
+++ /dev/null
@@ -1,260 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_ANALYZESHAPE_HPP
-#define KOKKOS_ANALYZESHAPE_HPP
-
-#include <impl/Kokkos_Shape.hpp>
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-//----------------------------------------------------------------------------
-
-/** \brief Analyze the array shape defined by a Kokkos::View data type.
- *
- * It is presumed that the data type can be mapped down to a multidimensional
- * array of an intrinsic scalar numerical type (double, float, int, ... ).
- * The 'value_type' of an array may be an embedded aggregate type such
- * as a fixed length array 'Array<T,N>'.
- * In this case the 'array_intrinsic_type' represents the
- * underlying array of intrinsic scalar numerical type.
- *
- * The embedded aggregate type must have an AnalyzeShape specialization
- * to map it down to a shape and intrinsic scalar numerical type.
- */
-template< class T >
-struct AnalyzeShape : public Shape< sizeof(T) , 0 >
-{
- typedef void specialize ;
-
- typedef Shape< sizeof(T), 0 > shape ;
-
- typedef T array_intrinsic_type ;
- typedef T value_type ;
- typedef T type ;
-
- typedef const T const_array_intrinsic_type ;
- typedef const T const_value_type ;
- typedef const T const_type ;
-
- typedef T non_const_array_intrinsic_type ;
- typedef T non_const_value_type ;
- typedef T non_const_type ;
-};
-
-template<>
-struct AnalyzeShape<void> : public Shape< 0 , 0 >
-{
- typedef void specialize ;
-
- typedef Shape< 0 , 0 > shape ;
-
- typedef void array_intrinsic_type ;
- typedef void value_type ;
- typedef void type ;
- typedef const void const_array_intrinsic_type ;
- typedef const void const_value_type ;
- typedef const void const_type ;
- typedef void non_const_array_intrinsic_type ;
- typedef void non_const_value_type ;
- typedef void non_const_type ;
-};
-
-template< class T >
-struct AnalyzeShape< const T > : public AnalyzeShape<T>::shape
-{
-private:
- typedef AnalyzeShape<T> nested ;
-public:
-
- typedef typename nested::specialize specialize ;
-
- typedef typename nested::shape shape ;
-
- typedef typename nested::const_array_intrinsic_type array_intrinsic_type ;
- typedef typename nested::const_value_type value_type ;
- typedef typename nested::const_type type ;
-
- typedef typename nested::const_array_intrinsic_type const_array_intrinsic_type ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::const_type const_type ;
-
- typedef typename nested::non_const_array_intrinsic_type non_const_array_intrinsic_type ;
- typedef typename nested::non_const_value_type non_const_value_type ;
- typedef typename nested::non_const_type non_const_type ;
-};
-
-template< class T >
-struct AnalyzeShape< T * >
- : public ShapeInsert< typename AnalyzeShape<T>::shape , 0 >::type
-{
-private:
- typedef AnalyzeShape<T> nested ;
-public:
-
- typedef typename nested::specialize specialize ;
-
- typedef typename ShapeInsert< typename nested::shape , 0 >::type shape ;
-
- typedef typename nested::array_intrinsic_type * array_intrinsic_type ;
- typedef typename nested::value_type value_type ;
- typedef typename nested::type * type ;
-
- typedef typename nested::const_array_intrinsic_type * const_array_intrinsic_type ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::const_type * const_type ;
-
- typedef typename nested::non_const_array_intrinsic_type * non_const_array_intrinsic_type ;
- typedef typename nested::non_const_value_type non_const_value_type ;
- typedef typename nested::non_const_type * non_const_type ;
-};
-
-template< class T >
-struct AnalyzeShape< T[] >
- : public ShapeInsert< typename AnalyzeShape<T>::shape , 0 >::type
-{
-private:
- typedef AnalyzeShape<T> nested ;
-public:
-
- typedef typename nested::specialize specialize ;
-
- typedef typename ShapeInsert< typename nested::shape , 0 >::type shape ;
-
- typedef typename nested::array_intrinsic_type array_intrinsic_type [] ;
- typedef typename nested::value_type value_type ;
- typedef typename nested::type type [] ;
-
- typedef typename nested::const_array_intrinsic_type const_array_intrinsic_type [] ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::const_type const_type [] ;
-
- typedef typename nested::non_const_array_intrinsic_type non_const_array_intrinsic_type [] ;
- typedef typename nested::non_const_value_type non_const_value_type ;
- typedef typename nested::non_const_type non_const_type [] ;
-};
-
-template< class T >
-struct AnalyzeShape< const T[] >
- : public ShapeInsert< typename AnalyzeShape< const T >::shape , 0 >::type
-{
-private:
- typedef AnalyzeShape< const T > nested ;
-public:
-
- typedef typename nested::specialize specialize ;
-
- typedef typename ShapeInsert< typename nested::shape , 0 >::type shape ;
-
- typedef typename nested::array_intrinsic_type array_intrinsic_type [] ;
- typedef typename nested::value_type value_type ;
- typedef typename nested::type type [] ;
-
- typedef typename nested::const_array_intrinsic_type const_array_intrinsic_type [] ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::const_type const_type [] ;
-
- typedef typename nested::non_const_array_intrinsic_type non_const_array_intrinsic_type [] ;
- typedef typename nested::non_const_value_type non_const_value_type ;
- typedef typename nested::non_const_type non_const_type [] ;
-};
-
-template< class T , unsigned N >
-struct AnalyzeShape< T[N] >
- : public ShapeInsert< typename AnalyzeShape<T>::shape , N >::type
-{
-private:
- typedef AnalyzeShape<T> nested ;
-public:
-
- typedef typename nested::specialize specialize ;
-
- typedef typename ShapeInsert< typename nested::shape , N >::type shape ;
-
- typedef typename nested::array_intrinsic_type array_intrinsic_type [N] ;
- typedef typename nested::value_type value_type ;
- typedef typename nested::type type [N] ;
-
- typedef typename nested::const_array_intrinsic_type const_array_intrinsic_type [N] ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::const_type const_type [N] ;
-
- typedef typename nested::non_const_array_intrinsic_type non_const_array_intrinsic_type [N] ;
- typedef typename nested::non_const_value_type non_const_value_type ;
- typedef typename nested::non_const_type non_const_type [N] ;
-};
-
-template< class T , unsigned N >
-struct AnalyzeShape< const T[N] >
- : public ShapeInsert< typename AnalyzeShape< const T >::shape , N >::type
-{
-private:
- typedef AnalyzeShape< const T > nested ;
-public:
-
- typedef typename nested::specialize specialize ;
-
- typedef typename ShapeInsert< typename nested::shape , N >::type shape ;
-
- typedef typename nested::array_intrinsic_type array_intrinsic_type [N] ;
- typedef typename nested::value_type value_type ;
- typedef typename nested::type type [N] ;
-
- typedef typename nested::const_array_intrinsic_type const_array_intrinsic_type [N] ;
- typedef typename nested::const_value_type const_value_type ;
- typedef typename nested::const_type const_type [N] ;
-
- typedef typename nested::non_const_array_intrinsic_type non_const_array_intrinsic_type [N] ;
- typedef typename nested::non_const_value_type non_const_value_type ;
- typedef typename nested::non_const_type non_const_type [N] ;
-};
-
-} // namespace Impl
-} // namespace Kokkos
-
-#endif /* #ifndef KOKKOS_ANALYZESHAPE_HPP */
-
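The deleted header above analyzed a View data type (e.g. double**[3]) into its rank and its split between runtime and compile-time dimensions. A minimal standalone sketch of that counting idea, with a hypothetical CountDims trait standing in for AnalyzeShape/ShapeInsert:

#include <type_traits>

template< class T > struct CountDims
{ enum { rank = 0 , dynamic = 0 }; };

template< class T > struct CountDims< T * >                 // '*' adds one runtime dimension
{ enum { rank = CountDims<T>::rank + 1 , dynamic = CountDims<T>::dynamic + 1 }; };

template< class T , unsigned N > struct CountDims< T[N] >   // '[N]' adds one compile-time dimension
{ enum { rank = CountDims<T>::rank + 1 , dynamic = CountDims<T>::dynamic }; };

static_assert( CountDims< double**[3] >::rank    == 3 , "two runtime plus one compile-time dimension" );
static_assert( CountDims< double**[3] >::dynamic == 2 , "two runtime dimensions" );

int main() { return 0 ; }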
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp
index fd7ea845e..beafeaa5b 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp
@@ -1,271 +1,278 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_COMPARE_EXCHANGE_STRONG_HPP )
#define KOKKOS_ATOMIC_COMPARE_EXCHANGE_STRONG_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
// Cuda native CAS supports int, unsigned int, and unsigned long long int (non-standard type).
// Must cast-away 'volatile' for the CAS call.
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
+#if defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
__inline__ __device__
int atomic_compare_exchange( volatile int * const dest, const int compare, const int val)
{ return atomicCAS((int*)dest,compare,val); }
__inline__ __device__
unsigned int atomic_compare_exchange( volatile unsigned int * const dest, const unsigned int compare, const unsigned int val)
{ return atomicCAS((unsigned int*)dest,compare,val); }
__inline__ __device__
unsigned long long int atomic_compare_exchange( volatile unsigned long long int * const dest ,
const unsigned long long int compare ,
const unsigned long long int val )
{ return atomicCAS((unsigned long long int*)dest,compare,val); }
template < typename T >
__inline__ __device__
T atomic_compare_exchange( volatile T * const dest , const T & compare ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T & >::type val )
{
const int tmp = atomicCAS( (int*) dest , *((int*)&compare) , *((int*)&val) );
return *((T*)&tmp);
}
template < typename T >
__inline__ __device__
T atomic_compare_exchange( volatile T * const dest , const T & compare ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T & >::type val )
{
typedef unsigned long long int type ;
const type tmp = atomicCAS( (type*) dest , *((type*)&compare) , *((type*)&val) );
return *((T*)&tmp);
}
template < typename T >
__inline__ __device__
T atomic_compare_exchange( volatile T * const dest , const T & compare ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
, const T >::type& val )
{
T return_val;
  // This is a way to (hopefully) avoid deadlock in a warp
- int done = 1;
- while ( done>0 ) {
- done++;
- if( Impl::lock_address_cuda_space( (void*) dest ) ) {
- return_val = *dest;
- if( return_val == compare )
- *dest = val;
- Impl::unlock_address_cuda_space( (void*) dest );
- done = 0;
+ int done = 0;
+ unsigned int active = __ballot(1);
+ unsigned int done_active = 0;
+ while (active!=done_active) {
+ if(!done) {
+ if( Impl::lock_address_cuda_space( (void*) dest ) ) {
+ return_val = *dest;
+ if( return_val == compare )
+ *dest = val;
+ Impl::unlock_address_cuda_space( (void*) dest );
+ done = 1;
+ }
}
+ done_active = __ballot(done);
}
return return_val;
}
+#endif
+#endif
//----------------------------------------------------------------------------
// GCC native CAS supports int, long, unsigned int, unsigned long.
// Intel native CAS support int and long with the same interface as GCC.
+#if !defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
+#if defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-#elif defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-
-KOKKOS_INLINE_FUNCTION
+inline
int atomic_compare_exchange( volatile int * const dest, const int compare, const int val)
{ return __sync_val_compare_and_swap(dest,compare,val); }
-KOKKOS_INLINE_FUNCTION
+inline
long atomic_compare_exchange( volatile long * const dest, const long compare, const long val )
{ return __sync_val_compare_and_swap(dest,compare,val); }
#if defined( KOKKOS_ATOMICS_USE_GCC )
// GCC supports unsigned
-KOKKOS_INLINE_FUNCTION
+inline
unsigned int atomic_compare_exchange( volatile unsigned int * const dest, const unsigned int compare, const unsigned int val )
{ return __sync_val_compare_and_swap(dest,compare,val); }
-KOKKOS_INLINE_FUNCTION
+inline
unsigned long atomic_compare_exchange( volatile unsigned long * const dest ,
const unsigned long compare ,
const unsigned long val )
{ return __sync_val_compare_and_swap(dest,compare,val); }
#endif
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_compare_exchange( volatile T * const dest, const T & compare,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T & >::type val )
{
#ifdef KOKKOS_HAVE_CXX11
union U {
int i ;
T t ;
KOKKOS_INLINE_FUNCTION U() {};
} tmp ;
#else
union U {
int i ;
T t ;
} tmp ;
#endif
tmp.i = __sync_val_compare_and_swap( (int*) dest , *((int*)&compare) , *((int*)&val) );
return tmp.t ;
}
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_compare_exchange( volatile T * const dest, const T & compare,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(long) , const T & >::type val )
{
#ifdef KOKKOS_HAVE_CXX11
union U {
long i ;
T t ;
KOKKOS_INLINE_FUNCTION U() {};
} tmp ;
#else
union U {
long i ;
T t ;
} tmp ;
#endif
tmp.i = __sync_val_compare_and_swap( (long*) dest , *((long*)&compare) , *((long*)&val) );
return tmp.t ;
}
#if defined( KOKKOS_ENABLE_ASM) && defined ( KOKKOS_USE_ISA_X86_64 )
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_compare_exchange( volatile T * const dest, const T & compare,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) != sizeof(long) &&
sizeof(T) == sizeof(Impl::cas128_t), const T & >::type val )
{
union U {
Impl::cas128_t i ;
T t ;
KOKKOS_INLINE_FUNCTION U() {};
} tmp ;
tmp.i = Impl::cas128( (Impl::cas128_t*) dest , *((Impl::cas128_t*)&compare) , *((Impl::cas128_t*)&val) );
return tmp.t ;
}
#endif
template < typename T >
inline
T atomic_compare_exchange( volatile T * const dest , const T compare ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
#if defined(KOKKOS_ENABLE_ASM) && defined ( KOKKOS_USE_ISA_X86_64 )
&& ( sizeof(T) != 16 )
#endif
, const T >::type& val )
{
while( !Impl::lock_address_host_space( (void*) dest ) );
T return_val = *dest;
if( return_val == compare ) {
// Don't use the following line of code here:
//
//const T tmp = *dest = val;
//
// Instead, put each assignment in its own statement. This is
// because the overload of T::operator= for volatile *this should
// return void, not volatile T&. See Kokkos #177:
//
// https://github.com/kokkos/kokkos/issues/177
*dest = val;
const T tmp = *dest;
#ifndef KOKKOS_COMPILER_CLANG
(void) tmp;
#endif
}
Impl::unlock_address_host_space( (void*) dest );
return return_val;
}
//----------------------------------------------------------------------------
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
template< typename T >
KOKKOS_INLINE_FUNCTION
T atomic_compare_exchange( volatile T * const dest, const T compare, const T val )
{
T retval;
#pragma omp critical
{
retval = dest[0];
if ( retval == compare )
dest[0] = val;
}
return retval;
}
+#endif
#endif
template <typename T>
KOKKOS_INLINE_FUNCTION
bool atomic_compare_exchange_strong(volatile T* const dest, const T compare, const T val)
{
return compare == atomic_compare_exchange(dest, compare, val);
}
-
//----------------------------------------------------------------------------
} // namespace Kokkos
#endif
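For reference, a brief sketch of how the compare-and-swap routines patched above are typically used from application code, assuming only the public Kokkos API; atomic_compare_exchange_strong returns whether the swap succeeded, and the helper name atomic_max_via_cas is hypothetical. KOKKOS_LAMBDA requires lambda support on the active backend.

#include <Kokkos_Core.hpp>

// Update *dest to the maximum of its current value and val using the
// compare-and-swap loop idiom.
KOKKOS_INLINE_FUNCTION
void atomic_max_via_cas( volatile int * const dest , const int val )
{
  int current = *dest ;
  while ( current < val &&
          ! Kokkos::atomic_compare_exchange_strong( dest , current , val ) ) {
    current = *dest ;   // CAS failed: another thread updated *dest, retry
  }
}

int main( int argc , char * argv[] ) {
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View<int*> m( "max" , 1 );
    Kokkos::parallel_for( 1000 , KOKKOS_LAMBDA( const int i ) {
      atomic_max_via_cas( &m(0) , i );
    });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0 ;
}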
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Decrement.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Decrement.hpp
index 1438a37e4..7fc0e6984 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Decrement.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Decrement.hpp
@@ -1,117 +1,119 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP) && ! defined( KOKKOS_ATOMIC_DECREMENT )
#define KOKKOS_ATOMIC_DECREMENT
+#include "impl/Kokkos_Atomic_Fetch_Sub.hpp"
+
namespace Kokkos {
// Atomic decrement
template<>
KOKKOS_INLINE_FUNCTION
void atomic_decrement<char>(volatile char* a) {
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_USE_ISA_X86_64 ) && ! defined(_WIN32) && ! defined(__CUDA_ARCH__)
__asm__ __volatile__(
"lock decb %0"
: /* no output registers */
: "m" (a[0])
: "memory"
);
#else
- Kokkos::atomic_fetch_add(a,-1);
+ Kokkos::atomic_fetch_sub(a, 1);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void atomic_decrement<short>(volatile short* a) {
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_USE_ISA_X86_64 ) && ! defined(_WIN32) && ! defined(__CUDA_ARCH__)
__asm__ __volatile__(
"lock decw %0"
: /* no output registers */
: "m" (a[0])
: "memory"
);
#else
- Kokkos::atomic_fetch_add(a,-1);
+ Kokkos::atomic_fetch_sub(a, 1);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void atomic_decrement<int>(volatile int* a) {
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_USE_ISA_X86_64 ) && ! defined(_WIN32) && ! defined(__CUDA_ARCH__)
__asm__ __volatile__(
"lock decl %0"
: /* no output registers */
: "m" (a[0])
: "memory"
);
#else
- Kokkos::atomic_fetch_add(a,-1);
+ Kokkos::atomic_fetch_sub(a, 1);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void atomic_decrement<long long int>(volatile long long int* a) {
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_USE_ISA_X86_64 ) && ! defined(_WIN32) && ! defined(__CUDA_ARCH__)
__asm__ __volatile__(
"lock decq %0"
: /* no output registers */
: "m" (a[0])
: "memory"
);
#else
- Kokkos::atomic_fetch_add(a,-1);
+ Kokkos::atomic_fetch_sub(a, 1);
#endif
}
template<typename T>
KOKKOS_INLINE_FUNCTION
void atomic_decrement(volatile T* a) {
- Kokkos::atomic_fetch_add(a,-1);
+ Kokkos::atomic_fetch_sub(a, 1);
}
} // End of namespace Kokkos
#endif
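The decrement specializations above now forward to `atomic_fetch_sub(a, 1)` (hence the new include of `impl/Kokkos_Atomic_Fetch_Sub.hpp`) instead of `atomic_fetch_add(a, -1)`. The following host-only comparison uses the raw GCC builtins, not the Kokkos wrappers, just to show why subtracting 1 is the cleaner spelling when the element type is unsigned.

// decrement_sketch.cpp -- illustrative host-only comparison, not Kokkos code.
#include <cstdio>

int main() {
  unsigned int a = 10, b = 10;

  // Passing -1 to an unsigned operand forces an implicit conversion to the
  // all-ones bit pattern; the result is the same, but compilers commonly
  // warn about the sign conversion.
  __sync_fetch_and_add(&a, -1);   // a becomes 9 via unsigned wraparound

  // Subtracting 1 says what is meant and needs no sign conversion at all.
  __sync_fetch_and_sub(&b, 1);    // b becomes 9 directly

  std::printf("a = %u, b = %u\n", a, b);  // prints: a = 9, b = 9
  return 0;
}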
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Exchange.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Exchange.hpp
index e8cac4ba3..ae53b8177 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Exchange.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Exchange.hpp
@@ -1,359 +1,368 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_EXCHANGE_HPP )
#define KOKKOS_ATOMIC_EXCHANGE_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
+#if defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
__inline__ __device__
int atomic_exchange( volatile int * const dest , const int val )
{
// return __iAtomicExch( (int*) dest , val );
return atomicExch( (int*) dest , val );
}
__inline__ __device__
unsigned int atomic_exchange( volatile unsigned int * const dest , const unsigned int val )
{
// return __uAtomicExch( (unsigned int*) dest , val );
return atomicExch( (unsigned int*) dest , val );
}
__inline__ __device__
unsigned long long int atomic_exchange( volatile unsigned long long int * const dest , const unsigned long long int val )
{
// return __ullAtomicExch( (unsigned long long*) dest , val );
return atomicExch( (unsigned long long*) dest , val );
}
/** \brief Atomic exchange for any type with compatible size */
template< typename T >
__inline__ __device__
T atomic_exchange(
volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T & >::type val )
{
// int tmp = __ullAtomicExch( (int*) dest , *((int*)&val) );
int tmp = atomicExch( ((int*)dest) , *((int*)&val) );
return *((T*)&tmp);
}
template< typename T >
__inline__ __device__
T atomic_exchange(
volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T & >::type val )
{
typedef unsigned long long int type ;
// type tmp = __ullAtomicExch( (type*) dest , *((type*)&val) );
type tmp = atomicExch( ((type*)dest) , *((type*)&val) );
return *((T*)&tmp);
}
template < typename T >
__inline__ __device__
T atomic_exchange( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
, const T >::type& val )
{
T return_val;
// This is a way to (hopefully) avoid deadlock in a warp
- int done = 1;
- while ( done > 0 ) {
- done++;
- if( Impl::lock_address_cuda_space( (void*) dest ) ) {
- return_val = *dest;
- *dest = val;
- Impl::unlock_address_cuda_space( (void*) dest );
- done = 0;
+ int done = 0;
+ unsigned int active = __ballot(1);
+ unsigned int done_active = 0;
+ while (active!=done_active) {
+ if(!done) {
+ if( Impl::lock_address_cuda_space( (void*) dest ) ) {
+ return_val = *dest;
+ *dest = val;
+ Impl::unlock_address_cuda_space( (void*) dest );
+ done = 1;
+ }
}
+ done_active = __ballot(done);
}
return return_val;
}
/** \brief Atomic exchange for any type with compatible size */
template< typename T >
__inline__ __device__
void atomic_assign(
volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T & >::type val )
{
// (void) __ullAtomicExch( (int*) dest , *((int*)&val) );
(void) atomicExch( ((int*)dest) , *((int*)&val) );
}
template< typename T >
__inline__ __device__
void atomic_assign(
volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T & >::type val )
{
typedef unsigned long long int type ;
// (void) __ullAtomicExch( (type*) dest , *((type*)&val) );
(void) atomicExch( ((type*)dest) , *((type*)&val) );
}
template< typename T >
__inline__ __device__
void atomic_assign(
volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) != sizeof(unsigned long long int)
, const T & >::type val )
{
(void) atomic_exchange(dest,val);
}
+#endif
+#endif
+
//----------------------------------------------------------------------------
-#elif defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
+#if !defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
+#if defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
template< typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_exchange( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) || sizeof(T) == sizeof(long)
, const T & >::type val )
{
typedef typename Kokkos::Impl::if_c< sizeof(T) == sizeof(int) , int , long >::type type ;
const type v = *((type*)&val); // Extract to be sure the value doesn't change
type assumed ;
#ifdef KOKKOS_HAVE_CXX11
union U {
T val_T ;
type val_type ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} old ;
#else
union { T val_T ; type val_type ; } old ;
#endif
old.val_T = *dest ;
do {
assumed = old.val_type ;
old.val_type = __sync_val_compare_and_swap( (volatile type *) dest , assumed , v );
} while ( assumed != old.val_type );
return old.val_T ;
}
#if defined(KOKKOS_ENABLE_ASM) && defined ( KOKKOS_USE_ISA_X86_64 )
template< typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_exchange( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(Impl::cas128_t)
, const T & >::type val )
{
union U {
Impl::cas128_t i ;
T t ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} assume , oldval , newval ;
oldval.t = *dest ;
newval.t = val;
do {
assume.i = oldval.i ;
oldval.i = Impl::cas128( (volatile Impl::cas128_t*) dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
#endif
//----------------------------------------------------------------------------
template < typename T >
inline
T atomic_exchange( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
#if defined(KOKKOS_ENABLE_ASM) && defined ( KOKKOS_USE_ISA_X86_64 )
&& ( sizeof(T) != 16 )
#endif
, const T >::type& val )
{
while( !Impl::lock_address_host_space( (void*) dest ) );
T return_val = *dest;
// Don't use the following line of code here:
//
//const T tmp = *dest = val;
//
// Instead, put each assignment in its own statement. This is
// because the overload of T::operator= for volatile *this should
// return void, not volatile T&. See Kokkos #177:
//
// https://github.com/kokkos/kokkos/issues/177
*dest = val;
const T tmp = *dest;
#ifndef KOKKOS_COMPILER_CLANG
(void) tmp;
#endif
Impl::unlock_address_host_space( (void*) dest );
return return_val;
}
template< typename T >
-KOKKOS_INLINE_FUNCTION
+inline
void atomic_assign( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) || sizeof(T) == sizeof(long)
, const T & >::type val )
{
typedef typename Kokkos::Impl::if_c< sizeof(T) == sizeof(int) , int , long >::type type ;
const type v = *((type*)&val); // Extract to be sure the value doesn't change
type assumed ;
#ifdef KOKKOS_HAVE_CXX11
union U {
T val_T ;
type val_type ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} old ;
#else
union { T val_T ; type val_type ; } old ;
#endif
old.val_T = *dest ;
do {
assumed = old.val_type ;
old.val_type = __sync_val_compare_and_swap( (volatile type *) dest , assumed , v );
} while ( assumed != old.val_type );
}
#if defined( KOKKOS_ENABLE_ASM ) && defined ( KOKKOS_USE_ISA_X86_64 )
template< typename T >
-KOKKOS_INLINE_FUNCTION
+inline
void atomic_assign( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(Impl::cas128_t)
, const T & >::type val )
{
union U {
Impl::cas128_t i ;
T t ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} assume , oldval , newval ;
oldval.t = *dest ;
newval.t = val;
do {
assume.i = oldval.i ;
oldval.i = Impl::cas128( (volatile Impl::cas128_t*) dest , assume.i , newval.i);
} while ( assume.i != oldval.i );
}
#endif
template < typename T >
inline
void atomic_assign( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
#if defined(KOKKOS_ENABLE_ASM) && defined ( KOKKOS_USE_ISA_X86_64 )
&& ( sizeof(T) != 16 )
#endif
, const T >::type& val )
{
while( !Impl::lock_address_host_space( (void*) dest ) );
// This is likely an aggregate type with a defined
// 'volatile T & operator = ( const T & ) volatile'
// member. The volatile return value implicitly defines a
// dereference that some compilers (gcc 4.7.2) warn is being ignored.
// Suppress warning by casting return to void.
//(void)( *dest = val );
*dest = val;
Impl::unlock_address_host_space( (void*) dest );
}
//----------------------------------------------------------------------------
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_exchange( volatile T * const dest , const T val )
{
T retval;
//#pragma omp atomic capture
#pragma omp critical
{
retval = dest[0];
dest[0] = val;
}
return retval;
}
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
void atomic_assign( volatile T * const dest , const T val )
{
//#pragma omp atomic
#pragma omp critical
{
dest[0] = val;
}
}
#endif
-
+#endif
} // namespace Kokkos
#endif
//----------------------------------------------------------------------------
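For element types that are not 4, 8 (or, on the x86-64 assembly path, 16) bytes wide, the host branch above falls back to locking the destination address, copying, and unlocking. The sketch below is a host-only approximation of that fallback in which a single `std::atomic_flag` spinlock stands in for Kokkos' internal per-address lock table (`Impl::lock_address_host_space`); on the CUDA side the same idea is additionally wrapped in a `__ballot` vote loop so that the lanes of a warp take turns.

// locked_exchange_sketch.cpp -- host-only sketch; the global flag is a stand-in
// for the per-address lock table, not how Kokkos actually stores its locks.
#include <atomic>
#include <cstdio>

static std::atomic_flag g_lock = ATOMIC_FLAG_INIT;

struct Big { double x[3]; };   // 24 bytes: no native atomic exchange exists

Big locked_exchange(volatile Big* dest, const Big& val) {
  while (g_lock.test_and_set(std::memory_order_acquire)) { /* spin until we own the lock */ }
  Big* p = const_cast<Big*>(dest);
  Big old = *p;    // remember the previous value, as the header does
  *p = val;        // then store the new value in a separate statement
  g_lock.clear(std::memory_order_release);
  return old;
}

int main() {
  Big b = {{1.0, 2.0, 3.0}};
  Big n = {{4.0, 5.0, 6.0}};
  Big prev = locked_exchange(&b, n);
  std::printf("prev.x[0] = %g, new.x[0] = %g\n", prev.x[0], b.x[0]);  // prints 1 and 4
  return 0;
}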
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Add.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Add.hpp
index 62dfcdd2f..08d2867ab 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Add.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Add.hpp
@@ -1,340 +1,354 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_FETCH_ADD_HPP )
#define KOKKOS_ATOMIC_FETCH_ADD_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
+#if defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
// Support for int, unsigned int, unsigned long long int, and float
__inline__ __device__
int atomic_fetch_add( volatile int * const dest , const int val )
{ return atomicAdd((int*)dest,val); }
__inline__ __device__
unsigned int atomic_fetch_add( volatile unsigned int * const dest , const unsigned int val )
{ return atomicAdd((unsigned int*)dest,val); }
__inline__ __device__
unsigned long long int atomic_fetch_add( volatile unsigned long long int * const dest ,
const unsigned long long int val )
{ return atomicAdd((unsigned long long int*)dest,val); }
__inline__ __device__
float atomic_fetch_add( volatile float * const dest , const float val )
{ return atomicAdd((float*)dest,val); }
+#if ( 600 <= __CUDA_ARCH__ )
+__inline__ __device__
+double atomic_fetch_add( volatile double * const dest , const double val )
+{ return atomicAdd((double*)dest,val); }
+#endif
+
template < typename T >
__inline__ __device__
T atomic_fetch_add( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T >::type val )
{
#ifdef KOKKOS_HAVE_CXX11
union U {
int i ;
T t ;
KOKKOS_INLINE_FUNCTION U() {};
} assume , oldval , newval ;
#else
union U {
int i ;
T t ;
} assume , oldval , newval ;
#endif
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t + val ;
oldval.i = atomicCAS( (int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
template < typename T >
__inline__ __device__
T atomic_fetch_add( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T >::type val )
{
#ifdef KOKKOS_HAVE_CXX11
union U {
unsigned long long int i ;
T t ;
KOKKOS_INLINE_FUNCTION U() {};
} assume , oldval , newval ;
#else
union U {
unsigned long long int i ;
T t ;
} assume , oldval , newval ;
#endif
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t + val ;
oldval.i = atomicCAS( (unsigned long long int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
//----------------------------------------------------------------------------
template < typename T >
__inline__ __device__
T atomic_fetch_add( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
, const T >::type& val )
{
T return_val;
// This is a way to (hopefully) avoid deadlock in a warp
- int done = 1;
- while ( done>0 ) {
- done++;
- if( Impl::lock_address_cuda_space( (void*) dest ) ) {
- return_val = *dest;
- *dest = return_val + val;
- Impl::unlock_address_cuda_space( (void*) dest );
- done = 0;
+ int done = 0;
+ unsigned int active = __ballot(1);
+ unsigned int done_active = 0;
+ while (active!=done_active) {
+ if(!done) {
+ bool locked = Impl::lock_address_cuda_space( (void*) dest );
+ if( locked ) {
+ return_val = *dest;
+ *dest = return_val + val;
+ Impl::unlock_address_cuda_space( (void*) dest );
+ done = 1;
+ }
}
+ done_active = __ballot(done);
}
return return_val;
}
+#endif
+#endif
//----------------------------------------------------------------------------
-
-#elif defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
+#if !defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
+#if defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
#if defined( KOKKOS_ENABLE_ASM ) && defined ( KOKKOS_USE_ISA_X86_64 )
-KOKKOS_INLINE_FUNCTION
+inline
int atomic_fetch_add( volatile int * dest , const int val )
{
int original = val;
__asm__ __volatile__(
"lock xadd %1, %0"
: "+m" (*dest), "+r" (original)
: "m" (*dest), "r" (original)
: "memory"
);
return original;
}
#else
-KOKKOS_INLINE_FUNCTION
+inline
int atomic_fetch_add( volatile int * const dest , const int val )
{ return __sync_fetch_and_add(dest, val); }
#endif
-KOKKOS_INLINE_FUNCTION
+inline
long int atomic_fetch_add( volatile long int * const dest , const long int val )
{ return __sync_fetch_and_add(dest,val); }
#if defined( KOKKOS_ATOMICS_USE_GCC )
-KOKKOS_INLINE_FUNCTION
+inline
unsigned int atomic_fetch_add( volatile unsigned int * const dest , const unsigned int val )
{ return __sync_fetch_and_add(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
unsigned long int atomic_fetch_add( volatile unsigned long int * const dest , const unsigned long int val )
{ return __sync_fetch_and_add(dest,val); }
#endif
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_fetch_add( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T >::type val )
{
#ifdef KOKKOS_HAVE_CXX11
union U {
int i ;
T t ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} assume , oldval , newval ;
#else
union U {
int i ;
T t ;
} assume , oldval , newval ;
#endif
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t + val ;
oldval.i = __sync_val_compare_and_swap( (int*) dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_fetch_add( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(long) , const T >::type val )
{
#ifdef KOKKOS_HAVE_CXX11
union U {
long i ;
T t ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} assume , oldval , newval ;
#else
union U {
long i ;
T t ;
} assume , oldval , newval ;
#endif
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t + val ;
oldval.i = __sync_val_compare_and_swap( (long*) dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
#if defined( KOKKOS_ENABLE_ASM ) && defined ( KOKKOS_USE_ISA_X86_64 )
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_fetch_add( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) != sizeof(long) &&
sizeof(T) == sizeof(Impl::cas128_t) , const T >::type val )
{
union U {
Impl::cas128_t i ;
T t ;
- KOKKOS_INLINE_FUNCTION U() {};
+ inline U() {};
} assume , oldval , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t + val ;
oldval.i = Impl::cas128( (volatile Impl::cas128_t*) dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
#endif
//----------------------------------------------------------------------------
template < typename T >
inline
T atomic_fetch_add( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
#if defined(KOKKOS_ENABLE_ASM) && defined ( KOKKOS_USE_ISA_X86_64 )
&& ( sizeof(T) != 16 )
#endif
, const T >::type& val )
{
while( !Impl::lock_address_host_space( (void*) dest ) );
T return_val = *dest;
// Don't use the following line of code here:
//
//const T tmp = *dest = return_val + val;
//
// Instead, put each assignment in its own statement. This is
// because the overload of T::operator= for volatile *this should
// return void, not volatile T&. See Kokkos #177:
//
// https://github.com/kokkos/kokkos/issues/177
*dest = return_val + val;
const T tmp = *dest;
(void) tmp;
Impl::unlock_address_host_space( (void*) dest );
return return_val;
}
//----------------------------------------------------------------------------
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
template< typename T >
T atomic_fetch_add( volatile T * const dest , const T val )
{
T retval;
#pragma omp atomic capture
{
retval = dest[0];
dest[0] += val;
}
return retval;
}
#endif
-
+#endif
//----------------------------------------------------------------------------
// Simpler version of atomic_fetch_add without the fetch
template <typename T>
KOKKOS_INLINE_FUNCTION
void atomic_add(volatile T * const dest, const T src) {
atomic_fetch_add(dest,src);
}
}
#endif
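For types that only match the width of a native CAS, the header above builds `atomic_fetch_add` from a union and a compare-and-swap retry loop. Below is a host-only sketch of that exact pattern for `float` through the 4-byte `__sync` builtin; `fetch_add_float` is an illustrative name, not Kokkos API.

// fetch_add_float_sketch.cpp -- host-only sketch of the union/CAS pattern above.
#include <cstdio>

float fetch_add_float(volatile float* dest, float val) {
  union U { int i; float t; } assume, oldval, newval;
  oldval.t = *dest;
  do {
    assume.i = oldval.i;             // bit pattern we believe is stored
    newval.t = assume.t + val;       // compute the update in float
    oldval.i = __sync_val_compare_and_swap((int*) dest, assume.i, newval.i);
  } while (assume.i != oldval.i);    // another thread got in first: retry
  return oldval.t;                   // value observed before our addition
}

int main() {
  volatile float x = 1.5f;
  float before = fetch_add_float(&x, 2.25f);
  std::printf("before = %g, after = %g\n", before, (float) x);  // prints 1.5 and 3.75
  return 0;
}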
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_And.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_And.hpp
index 9b7ebae4a..121a5d519 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_And.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_And.hpp
@@ -1,125 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_FETCH_AND_HPP )
#define KOKKOS_ATOMIC_FETCH_AND_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
+#if defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
// Support for int, unsigned int, unsigned long long int, and float
__inline__ __device__
int atomic_fetch_and( volatile int * const dest , const int val )
{ return atomicAnd((int*)dest,val); }
__inline__ __device__
unsigned int atomic_fetch_and( volatile unsigned int * const dest , const unsigned int val )
{ return atomicAnd((unsigned int*)dest,val); }
#if defined( __CUDA_ARCH__ ) && ( 350 <= __CUDA_ARCH__ )
__inline__ __device__
unsigned long long int atomic_fetch_and( volatile unsigned long long int * const dest ,
const unsigned long long int val )
{ return atomicAnd((unsigned long long int*)dest,val); }
#endif
-
+#endif
+#endif
//----------------------------------------------------------------------------
+#if !defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
+#if defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-#elif defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-
-KOKKOS_INLINE_FUNCTION
+inline
int atomic_fetch_and( volatile int * const dest , const int val )
{ return __sync_fetch_and_and(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
long int atomic_fetch_and( volatile long int * const dest , const long int val )
{ return __sync_fetch_and_and(dest,val); }
#if defined( KOKKOS_ATOMICS_USE_GCC )
-KOKKOS_INLINE_FUNCTION
+inline
unsigned int atomic_fetch_and( volatile unsigned int * const dest , const unsigned int val )
{ return __sync_fetch_and_and(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
unsigned long int atomic_fetch_and( volatile unsigned long int * const dest , const unsigned long int val )
{ return __sync_fetch_and_and(dest,val); }
#endif
//----------------------------------------------------------------------------
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
template< typename T >
T atomic_fetch_and( volatile T * const dest , const T val )
{
T retval;
#pragma omp atomic capture
{
retval = dest[0];
dest[0] &= val;
}
return retval;
}
#endif
-
+#endif
//----------------------------------------------------------------------------
// Simpler version of atomic_fetch_and without the fetch
template <typename T>
KOKKOS_INLINE_FUNCTION
void atomic_and(volatile T * const dest, const T src) {
(void)atomic_fetch_and(dest,src);
}
}
#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Or.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Or.hpp
index f15e61a3a..2c89f5670 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Or.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Or.hpp
@@ -1,125 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_FETCH_OR_HPP )
#define KOKKOS_ATOMIC_FETCH_OR_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
+#if defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
// Support for int, unsigned int, unsigned long long int, and float
__inline__ __device__
int atomic_fetch_or( volatile int * const dest , const int val )
{ return atomicOr((int*)dest,val); }
__inline__ __device__
unsigned int atomic_fetch_or( volatile unsigned int * const dest , const unsigned int val )
{ return atomicOr((unsigned int*)dest,val); }
#if defined( __CUDA_ARCH__ ) && ( 350 <= __CUDA_ARCH__ )
__inline__ __device__
unsigned long long int atomic_fetch_or( volatile unsigned long long int * const dest ,
const unsigned long long int val )
{ return atomicOr((unsigned long long int*)dest,val); }
#endif
-
+#endif
+#endif
//----------------------------------------------------------------------------
+#if !defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
+#if defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-#elif defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-
-KOKKOS_INLINE_FUNCTION
+inline
int atomic_fetch_or( volatile int * const dest , const int val )
{ return __sync_fetch_and_or(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
long int atomic_fetch_or( volatile long int * const dest , const long int val )
{ return __sync_fetch_and_or(dest,val); }
#if defined( KOKKOS_ATOMICS_USE_GCC )
-KOKKOS_INLINE_FUNCTION
+inline
unsigned int atomic_fetch_or( volatile unsigned int * const dest , const unsigned int val )
{ return __sync_fetch_and_or(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
unsigned long int atomic_fetch_or( volatile unsigned long int * const dest , const unsigned long int val )
{ return __sync_fetch_and_or(dest,val); }
#endif
//----------------------------------------------------------------------------
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
template< typename T >
T atomic_fetch_or( volatile T * const dest , const T val )
{
T retval;
#pragma omp atomic capture
{
retval = dest[0];
dest[0] |= val;
}
return retval;
}
#endif
-
+#endif
//----------------------------------------------------------------------------
// Simpler version of atomic_fetch_or without the fetch
template <typename T>
KOKKOS_INLINE_FUNCTION
void atomic_or(volatile T * const dest, const T src) {
(void)atomic_fetch_or(dest,src);
}
}
#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Sub.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Sub.hpp
index a3a57aa81..b51d2fe78 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Sub.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Fetch_Sub.hpp
@@ -1,235 +1,241 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_FETCH_SUB_HPP )
#define KOKKOS_ATOMIC_FETCH_SUB_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( KOKKOS_HAVE_CUDA )
+#if defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
// Support for int, unsigned int, unsigned long long int, and float
__inline__ __device__
int atomic_fetch_sub( volatile int * const dest , const int val )
{ return atomicSub((int*)dest,val); }
__inline__ __device__
unsigned int atomic_fetch_sub( volatile unsigned int * const dest , const unsigned int val )
{ return atomicSub((unsigned int*)dest,val); }
template < typename T >
__inline__ __device__
T atomic_fetch_sub( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T >::type val )
{
union { int i ; T t ; } oldval , assume , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t - val ;
oldval.i = atomicCAS( (int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
template < typename T >
__inline__ __device__
T atomic_fetch_sub( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T >::type val )
{
union { unsigned long long int i ; T t ; } oldval , assume , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t - val ;
oldval.i = atomicCAS( (unsigned long long int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
//----------------------------------------------------------------------------
template < typename T >
__inline__ __device__
T atomic_fetch_sub( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
, const T >::type& val )
{
T return_val;
// This is a way to (hopefully) avoid deadlock in a warp
int done = 0;
- while ( done>0 ) {
- done++;
- if( Impl::lock_address_cuda_space( (void*) dest ) ) {
- return_val = *dest;
- *dest = return_val - val;
- Impl::unlock_address_cuda_space( (void*) dest );
- done = 0;
+ unsigned int active = __ballot(1);
+ unsigned int done_active = 0;
+ while (active!=done_active) {
+ if(!done) {
+ if( Impl::lock_address_cuda_space( (void*) dest ) ) {
+ return_val = *dest;
+ *dest = return_val - val;
+ Impl::unlock_address_cuda_space( (void*) dest );
+ done = 1;
+ }
}
+ done_active = __ballot(done);
}
return return_val;
}
-
+#endif
+#endif
//----------------------------------------------------------------------------
+#if !defined(__CUDA_ARCH__) || defined(KOKKOS_CUDA_CLANG_WORKAROUND)
+#if defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-#elif defined(KOKKOS_ATOMICS_USE_GCC) || defined(KOKKOS_ATOMICS_USE_INTEL)
-
-KOKKOS_INLINE_FUNCTION
+inline
int atomic_fetch_sub( volatile int * const dest , const int val )
{ return __sync_fetch_and_sub(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
long int atomic_fetch_sub( volatile long int * const dest , const long int val )
{ return __sync_fetch_and_sub(dest,val); }
#if defined( KOKKOS_ATOMICS_USE_GCC )
-KOKKOS_INLINE_FUNCTION
+inline
unsigned int atomic_fetch_sub( volatile unsigned int * const dest , const unsigned int val )
{ return __sync_fetch_and_sub(dest,val); }
-KOKKOS_INLINE_FUNCTION
+inline
unsigned long int atomic_fetch_sub( volatile unsigned long int * const dest , const unsigned long int val )
{ return __sync_fetch_and_sub(dest,val); }
#endif
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_fetch_sub( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T >::type val )
{
union { int i ; T t ; } assume , oldval , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t - val ;
oldval.i = __sync_val_compare_and_swap( (int*) dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
template < typename T >
-KOKKOS_INLINE_FUNCTION
+inline
T atomic_fetch_sub( volatile T * const dest ,
typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(long) , const T >::type val )
{
union { long i ; T t ; } assume , oldval , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = assume.t - val ;
oldval.i = __sync_val_compare_and_swap( (long*) dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
//----------------------------------------------------------------------------
template < typename T >
inline
T atomic_fetch_sub( volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
, const T >::type& val )
{
while( !Impl::lock_address_host_space( (void*) dest ) );
T return_val = *dest;
*dest = return_val - val;
Impl::unlock_address_host_space( (void*) dest );
return return_val;
}
//----------------------------------------------------------------------------
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
template< typename T >
T atomic_fetch_sub( volatile T * const dest , const T val )
{
T retval;
#pragma omp atomic capture
{
retval = dest[0];
dest[0] -= val;
}
return retval;
}
#endif
-
+#endif
// Simpler version of atomic_fetch_sub without the fetch
template <typename T>
KOKKOS_INLINE_FUNCTION
void atomic_sub(volatile T * const dest, const T src) {
atomic_fetch_sub(dest,src);
}
}
#include<impl/Kokkos_Atomic_Assembly.hpp>
#endif
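The recurring change in the CUDA branches of these headers replaces the old `done` counter loop with a warp-vote loop: each lane keeps iterating until `__ballot` reports that every initially active lane has completed its locked update, which avoids the divergence deadlock the old loop could hit. The fragment below is a single-threaded, host-only stand-in meant only to show that control flow; `ballot()`, `try_lock()` and `unlock()` are local placeholders for CUDA's `__ballot` and for `Impl::lock_address_cuda_space` / `unlock_address_cuda_space`, not real APIs.

// ballot_loop_sketch.cpp -- single-threaded stand-in for the warp-vote loop;
// ballot(), try_lock() and unlock() are placeholders, NOT CUDA or Kokkos calls.
#include <cstdio>

static bool g_locked = false;
static bool try_lock() { if (g_locked) return false; g_locked = true; return true; }
static void unlock()   { g_locked = false; }

// With a single "lane", the vote mask is simply 1 when the predicate holds.
static unsigned ballot(int predicate) { return predicate ? 1u : 0u; }

double locked_fetch_sub(double* dest, double val) {
  double return_val = 0.0;
  int done = 0;
  unsigned active      = ballot(1);   // lanes participating in this call
  unsigned done_active = 0;           // lanes that have finished their update
  while (active != done_active) {     // keep voting until every lane is done
    if (!done) {
      if (try_lock()) {               // only the lock holder updates *dest
        return_val = *dest;
        *dest = return_val - val;
        unlock();
        done = 1;
      }
    }
    done_active = ballot(done);       // re-vote after each attempt
  }
  return return_val;
}

int main() {
  double d = 10.0;
  double before = locked_fetch_sub(&d, 3.0);
  std::printf("before = %g, after = %g\n", before, d);  // prints 10 and 7
  return 0;
}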
diff --git a/lib/kokkos/core/src/impl/Kokkos_Atomic_Generic.hpp b/lib/kokkos/core/src/impl/Kokkos_Atomic_Generic.hpp
index 343e9bf4c..527e1bb4e 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Atomic_Generic.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Atomic_Generic.hpp
@@ -1,419 +1,429 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_ATOMIC_GENERIC_HPP )
#define KOKKOS_ATOMIC_GENERIC_HPP
#include <Kokkos_Macros.hpp>
// Combination operators to be used in a compare-and-exchange based atomic operation
namespace Kokkos {
namespace Impl {
template<class Scalar1, class Scalar2>
struct MaxOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return (val1 > val2 ? val1 : val2);
}
};
template<class Scalar1, class Scalar2>
struct MinOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return (val1 < val2 ? val1 : val2);
}
};
template<class Scalar1, class Scalar2>
struct AddOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1+val2;
}
};
template<class Scalar1, class Scalar2>
struct SubOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1-val2;
}
};
template<class Scalar1, class Scalar2>
struct MulOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1*val2;
}
};
template<class Scalar1, class Scalar2>
struct DivOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1/val2;
}
};
template<class Scalar1, class Scalar2>
struct ModOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1%val2;
}
};
template<class Scalar1, class Scalar2>
struct AndOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1&val2;
}
};
template<class Scalar1, class Scalar2>
struct OrOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1|val2;
}
};
template<class Scalar1, class Scalar2>
struct XorOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1^val2;
}
};
template<class Scalar1, class Scalar2>
struct LShiftOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1<<val2;
}
};
template<class Scalar1, class Scalar2>
struct RShiftOper {
KOKKOS_FORCEINLINE_FUNCTION
static Scalar1 apply(const Scalar1& val1, const Scalar2& val2) {
return val1>>val2;
}
};
template < class Oper, typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_oper( const Oper& op, volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
+ typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T >::type val )
{
union { unsigned long long int i ; T t ; } oldval , assume , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = Oper::apply(assume.t, val) ;
- oldval.i = ::Kokkos::atomic_compare_exchange( (unsigned long long int*)dest , assume.i , newval.i );
+ oldval.i = Kokkos::atomic_compare_exchange( (unsigned long long int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
template < class Oper, typename T >
KOKKOS_INLINE_FUNCTION
T atomic_oper_fetch( const Oper& op, volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
+ typename Kokkos::Impl::enable_if< sizeof(T) != sizeof(int) &&
sizeof(T) == sizeof(unsigned long long int) , const T >::type val )
{
union { unsigned long long int i ; T t ; } oldval , assume , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = Oper::apply(assume.t, val) ;
- oldval.i = ::Kokkos::atomic_compare_exchange( (unsigned long long int*)dest , assume.i , newval.i );
+ oldval.i = Kokkos::atomic_compare_exchange( (unsigned long long int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return newval.t ;
}
template < class Oper, typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_oper( const Oper& op, volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T >::type val )
+ typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int) , const T >::type val )
{
union { int i ; T t ; } oldval , assume , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = Oper::apply(assume.t, val) ;
- oldval.i = ::Kokkos::atomic_compare_exchange( (int*)dest , assume.i , newval.i );
+ oldval.i = Kokkos::atomic_compare_exchange( (int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return oldval.t ;
}
template < class Oper, typename T >
KOKKOS_INLINE_FUNCTION
T atomic_oper_fetch( const Oper& op, volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if< sizeof(T) == sizeof(int), const T >::type val )
+ typename Kokkos::Impl::enable_if< sizeof(T) == sizeof(int), const T >::type val )
{
union { int i ; T t ; } oldval , assume , newval ;
oldval.t = *dest ;
do {
assume.i = oldval.i ;
newval.t = Oper::apply(assume.t, val) ;
- oldval.i = ::Kokkos::atomic_compare_exchange( (int*)dest , assume.i , newval.i );
+ oldval.i = Kokkos::atomic_compare_exchange( (int*)dest , assume.i , newval.i );
} while ( assume.i != oldval.i );
return newval.t ;
}
template < class Oper, typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_oper( const Oper& op, volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
#if defined(KOKKOS_ENABLE_ASM) && defined(KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST)
&& ( sizeof(T) != 16 )
#endif
, const T >::type val )
{
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
while( !Impl::lock_address_host_space( (void*) dest ) );
T return_val = *dest;
*dest = Oper::apply(return_val, val);
Impl::unlock_address_host_space( (void*) dest );
return return_val;
#else
// This is a way to (hopefully) avoid deadlock in a warp
- int done = 1;
- while ( done>0 ) {
- done++;
- if( Impl::lock_address_cuda_space( (void*) dest ) ) {
- T return_val = *dest;
- *dest = Oper::apply(return_val, val);;
- Impl::unlock_address_cuda_space( (void*) dest );
- done=0;
+ T return_val;
+ int done = 0;
+ unsigned int active = __ballot(1);
+ unsigned int done_active = 0;
+ while (active!=done_active) {
+ if(!done) {
+ if( Impl::lock_address_cuda_space( (void*) dest ) ) {
+ return_val = *dest;
+ *dest = Oper::apply(return_val, val);;
+ Impl::unlock_address_cuda_space( (void*) dest );
+ done=1;
+ }
}
+ done_active = __ballot(done);
}
return return_val;
#endif
}
template < class Oper, typename T >
KOKKOS_INLINE_FUNCTION
T atomic_oper_fetch( const Oper& op, volatile T * const dest ,
- typename ::Kokkos::Impl::enable_if<
+ typename Kokkos::Impl::enable_if<
( sizeof(T) != 4 )
&& ( sizeof(T) != 8 )
#if defined(KOKKOS_ENABLE_ASM) && defined(KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST)
&& ( sizeof(T) != 16 )
#endif
, const T >::type& val )
{
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
while( !Impl::lock_address_host_space( (void*) dest ) );
T return_val = Oper::apply(*dest, val);
*dest = return_val;
Impl::unlock_address_host_space( (void*) dest );
return return_val;
#else
+ T return_val;
// This is a way to (hopefully) avoid deadlock in a warp
- int done = 1;
- while ( done>0 ) {
- done++;
- if( Impl::lock_address_cuda_space( (void*) dest ) ) {
- T return_val = Oper::apply(*dest, val);
- *dest = return_val;
- Impl::unlock_address_cuda_space( (void*) dest );
- done=0;
+ int done = 0;
+ unsigned int active = __ballot(1);
+ unsigned int done_active = 0;
+ while (active!=done_active) {
+ if(!done) {
+ if( Impl::lock_address_cuda_space( (void*) dest ) ) {
+ return_val = Oper::apply(*dest, val);
+ *dest = return_val;
+ Impl::unlock_address_cuda_space( (void*) dest );
+ done=1;
+ }
}
+ done_active = __ballot(done);
}
return return_val;
#endif
}
}
}
namespace Kokkos {
// Fetch_Oper atomics: return value before operation
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_max(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::MaxOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_min(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::MinOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_mul(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::MulOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_div(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::DivOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_mod(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::ModOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_and(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::AndOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_or(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::OrOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_xor(volatile T * const dest, const T val) {
return Impl::atomic_fetch_oper(Impl::XorOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_lshift(volatile T * const dest, const unsigned int val) {
return Impl::atomic_fetch_oper(Impl::LShiftOper<T,const unsigned int>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_fetch_rshift(volatile T * const dest, const unsigned int val) {
return Impl::atomic_fetch_oper(Impl::RShiftOper<T,const unsigned int>(),dest,val);
}
// Oper Fetch atomics: return value after operation
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_max_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::MaxOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_min_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::MinOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_mul_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::MulOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_div_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::DivOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_mod_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::ModOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_and_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::AndOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_or_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::OrOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_xor_fetch(volatile T * const dest, const T val) {
return Impl::atomic_oper_fetch(Impl::XorOper<T,const T>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_lshift_fetch(volatile T * const dest, const unsigned int val) {
return Impl::atomic_oper_fetch(Impl::LShiftOper<T,const unsigned int>(),dest,val);
}
template < typename T >
KOKKOS_INLINE_FUNCTION
T atomic_rshift_fetch(volatile T * const dest, const unsigned int val) {
return Impl::atomic_oper_fetch(Impl::RShiftOper<T,const unsigned int>(),dest,val);
}
}
#endif
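Kokkos_Atomic_Generic.hpp pairs one generic compare-and-swap loop with a family of tiny operator functors (`MaxOper`, `AddOper`, ...), so each `atomic_fetch_*` / `atomic_*_fetch` entry point is a one-line instantiation. Below is a host-only sketch of that design for the int-sized path; the names are illustrative and only the `Oper::apply` shape mirrors the header.

// oper_functor_sketch.cpp -- host-only sketch of the functor-plus-CAS design.
#include <cstdio>

struct MaxOper {
  static int apply(int a, int b) { return a > b ? a : b; }
};
struct AddOper {
  static int apply(int a, int b) { return a + b; }
};

template <class Oper>
int fetch_oper(volatile int* dest, int val) {
  int oldval = *dest, assume;
  do {
    assume = oldval;
    const int newval = Oper::apply(assume, val);   // combine old value and operand
    oldval = __sync_val_compare_and_swap((int*) dest, assume, newval);
  } while (assume != oldval);                      // lost the race: retry
  return oldval;                                   // value before the operation
}

int main() {
  volatile int x = 7;
  int old1 = fetch_oper<MaxOper>(&x, 5);   std::printf("fetch_max(5):  old=%d new=%d\n", old1, (int) x);  // 7, 7
  int old2 = fetch_oper<MaxOper>(&x, 11);  std::printf("fetch_max(11): old=%d new=%d\n", old2, (int) x);  // 7, 11
  int old3 = fetch_oper<AddOper>(&x, 4);   std::printf("fetch_add(4):  old=%d new=%d\n", old3, (int) x);  // 11, 15
  return 0;
}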
diff --git a/lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp b/lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
index b9d23bd81..8ee094675 100644
--- a/lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
@@ -1,124 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#else
#include <unistd.h>
#endif
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cerrno>
namespace Kokkos {
namespace Impl {
//The following function (processors_per_node) is copied from here:
// https://lists.gnu.org/archive/html/autoconf/2002-08/msg00126.html
// Philip Willoughby
int processors_per_node() {
int nprocs = -1;
int nprocs_max = -1;
#ifdef _WIN32
#ifndef _SC_NPROCESSORS_ONLN
SYSTEM_INFO info;
GetSystemInfo(&info);
#define sysconf(a) info.dwNumberOfProcessors
#define _SC_NPROCESSORS_ONLN
#endif
#endif
#ifdef _SC_NPROCESSORS_ONLN
nprocs = sysconf(_SC_NPROCESSORS_ONLN);
if (nprocs < 1)
{
return -1;
}
nprocs_max = sysconf(_SC_NPROCESSORS_CONF);
if (nprocs_max < 1)
{
return -1;
}
return nprocs;
#else
return -1;
#endif
}
int mpi_ranks_per_node() {
char *str;
int ppn = 1;
- if ((str = getenv("SLURM_TASKS_PER_NODE"))) {
- ppn = atoi(str);
- if(ppn<=0) ppn = 1;
- }
+ //if ((str = getenv("SLURM_TASKS_PER_NODE"))) {
+ // ppn = atoi(str);
+ // if(ppn<=0) ppn = 1;
+ //}
if ((str = getenv("MV2_COMM_WORLD_LOCAL_SIZE"))) {
ppn = atoi(str);
if(ppn<=0) ppn = 1;
}
if ((str = getenv("OMPI_COMM_WORLD_LOCAL_SIZE"))) {
ppn = atoi(str);
if(ppn<=0) ppn = 1;
}
return ppn;
}
int mpi_local_rank_on_node() {
char *str;
int local_rank=0;
- if ((str = getenv("SLURM_LOCALID"))) {
- local_rank = atoi(str);
- }
+ //if ((str = getenv("SLURM_LOCALID"))) {
+ // local_rank = atoi(str);
+ //}
if ((str = getenv("MV2_COMM_WORLD_LOCAL_RANK"))) {
local_rank = atoi(str);
}
if ((str = getenv("OMPI_COMM_WORLD_LOCAL_RANK"))) {
local_rank = atoi(str);
}
return local_rank;
}
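// Illustrative note (not part of this patch): the helpers above depend on
// launcher-provided environment variables. With Open MPI, for example,
// 'mpirun -np 4 ./app' on a single node exports OMPI_COMM_WORLD_LOCAL_SIZE=4
// and OMPI_COMM_WORLD_LOCAL_RANK=0..3, so mpi_ranks_per_node() returns 4 and
// mpi_local_rank_on_node() returns the per-node rank. With the SLURM branches
// commented out above, a SLURM-only launch falls back to the defaults
// (ppn == 1, local rank == 0).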
}
}
diff --git a/lib/kokkos/core/src/impl/Kokkos_Core.cpp b/lib/kokkos/core/src/impl/Kokkos_Core.cpp
index 567a21414..de1085986 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Core.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Core.cpp
@@ -1,454 +1,453 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <cctype>
#include <cstring>
#include <iostream>
#include <cstdlib>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
bool is_unsigned_int(const char* str)
{
const size_t len = strlen (str);
for (size_t i = 0; i < len; ++i) {
if (! isdigit (str[i])) {
return false;
}
}
return true;
}
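// For example, is_unsigned_int("16") is true, while is_unsigned_int("-1") and
// is_unsigned_int("1.5") are false because '-' and '.' are not digits; the
// option parsing below combines this with a strlen() check to reject
// malformed '=INT' values.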
void initialize_internal(const InitArguments& args)
{
// This is an experimental setting
// For KNL in Flat mode this variable should be set, so that
// memkind allocates high bandwidth memory correctly.
#ifdef KOKKOS_HAVE_HBWSPACE
setenv("MEMKIND_HBW_NODES", "1", 0);
#endif
// Protect declarations, to prevent "unused variable" warnings.
#if defined( KOKKOS_HAVE_OPENMP ) || defined( KOKKOS_HAVE_PTHREAD )
const int num_threads = args.num_threads;
const int use_numa = args.num_numa;
#endif // defined( KOKKOS_HAVE_OPENMP ) || defined( KOKKOS_HAVE_PTHREAD )
#if defined( KOKKOS_HAVE_CUDA )
const int use_gpu = args.device_id;
#endif // defined( KOKKOS_HAVE_CUDA )
#if defined( KOKKOS_HAVE_OPENMP )
- if( Impl::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
+ if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
if(num_threads>0) {
if(use_numa>0) {
Kokkos::OpenMP::initialize(num_threads,use_numa);
}
else {
Kokkos::OpenMP::initialize(num_threads);
}
} else {
Kokkos::OpenMP::initialize();
}
//std::cout << "Kokkos::initialize() fyi: OpenMP enabled and initialized" << std::endl ;
}
else {
//std::cout << "Kokkos::initialize() fyi: OpenMP enabled but not initialized" << std::endl ;
}
#endif
#if defined( KOKKOS_HAVE_PTHREAD )
- if( Impl::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
+ if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
if(num_threads>0) {
if(use_numa>0) {
Kokkos::Threads::initialize(num_threads,use_numa);
}
else {
Kokkos::Threads::initialize(num_threads);
}
} else {
Kokkos::Threads::initialize();
}
//std::cout << "Kokkos::initialize() fyi: Pthread enabled and initialized" << std::endl ;
}
else {
//std::cout << "Kokkos::initialize() fyi: Pthread enabled but not initialized" << std::endl ;
}
#endif
#if defined( KOKKOS_HAVE_SERIAL )
// Prevent "unused variable" warning for 'args' input struct. If
// Serial::initialize() ever needs to take arguments from the input
// struct, you may remove this line of code.
(void) args;
- if( Impl::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
+ if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Serial::initialize();
}
#endif
#if defined( KOKKOS_HAVE_CUDA )
- if( Impl::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || 0 < use_gpu ) {
+ if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || 0 < use_gpu ) {
if (use_gpu > -1) {
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( use_gpu ) );
}
else {
Kokkos::Cuda::initialize();
}
//std::cout << "Kokkos::initialize() fyi: Cuda enabled and initialized" << std::endl ;
}
#endif
#if (KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
void finalize_internal( const bool all_spaces = false )
{
+#if (KOKKOS_ENABLE_PROFILING)
+ Kokkos::Profiling::finalize();
+#endif
+
#if defined( KOKKOS_HAVE_CUDA )
- if( Impl::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || all_spaces ) {
+ if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || all_spaces ) {
if(Kokkos::Cuda::is_initialized())
Kokkos::Cuda::finalize();
}
#endif
#if defined( KOKKOS_HAVE_OPENMP )
- if( Impl::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ||
+ if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::OpenMP::is_initialized())
Kokkos::OpenMP::finalize();
}
#endif
#if defined( KOKKOS_HAVE_PTHREAD )
- if( Impl::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ||
+ if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::Threads::is_initialized())
Kokkos::Threads::finalize();
}
#endif
#if defined( KOKKOS_HAVE_SERIAL )
- if( Impl::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ||
+ if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::Serial::is_initialized())
Kokkos::Serial::finalize();
}
#endif
-
-#if (KOKKOS_ENABLE_PROFILING)
- Kokkos::Profiling::finalize();
-#endif
-
}
void fence_internal()
{
#if defined( KOKKOS_HAVE_CUDA )
- if( Impl::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value ) {
+ if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value ) {
Kokkos::Cuda::fence();
}
#endif
#if defined( KOKKOS_HAVE_OPENMP )
- if( Impl::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
+ if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::OpenMP::fence();
}
#endif
#if defined( KOKKOS_HAVE_PTHREAD )
- if( Impl::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
+ if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Threads::fence();
}
#endif
#if defined( KOKKOS_HAVE_SERIAL )
- if( Impl::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
- Impl::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
+ if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Serial::fence();
}
#endif
}
} // namespace
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
void initialize(int& narg, char* arg[])
{
int num_threads = -1;
int numa = -1;
int device = -1;
int kokkos_threads_found = 0;
int kokkos_numa_found = 0;
int kokkos_device_found = 0;
int kokkos_ndevices_found = 0;
int iarg = 0;
while (iarg < narg) {
if ((strncmp(arg[iarg],"--kokkos-threads",16) == 0) || (strncmp(arg[iarg],"--threads",9) == 0)) {
//Find the number of threads (expecting --threads=XX)
if (!((strncmp(arg[iarg],"--kokkos-threads=",17) == 0) || (strncmp(arg[iarg],"--threads=",10) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--threads/--kokkos-threads'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--threads/--kokkos-threads'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-threads",16) == 0) || !kokkos_threads_found)
num_threads = atoi(number);
//Remove the --kokkos-threads argument from the list but leave --threads
if(strncmp(arg[iarg],"--kokkos-threads",16) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_threads_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-numa",13) == 0) || (strncmp(arg[iarg],"--numa",6) == 0)) {
//Find the number of NUMA regions (expecting --numa=XX)
if (!((strncmp(arg[iarg],"--kokkos-numa=",14) == 0) || (strncmp(arg[iarg],"--numa=",7) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--numa/--kokkos-numa'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--numa/--kokkos-numa'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-numa",13) == 0) || !kokkos_numa_found)
numa = atoi(number);
//Remove the --kokkos-numa argument from the list but leave --numa
if(strncmp(arg[iarg],"--kokkos-numa",13) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_numa_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-device",15) == 0) || (strncmp(arg[iarg],"--device",8) == 0)) {
//Find the device id (expecting --device=XX)
if (!((strncmp(arg[iarg],"--kokkos-device=",16) == 0) || (strncmp(arg[iarg],"--device=",9) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--device/--kokkos-device'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--device/--kokkos-device'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-device",15) == 0) || !kokkos_device_found)
device = atoi(number);
//Remove the --kokkos-device argument from the list but leave --device
if(strncmp(arg[iarg],"--kokkos-device",15) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_device_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || (strncmp(arg[iarg],"--ndevices",10) == 0)) {
//Find the number of devices (expecting --ndevices=XX[,XX])
if (!((strncmp(arg[iarg],"--kokkos-ndevices=",18) == 0) || (strncmp(arg[iarg],"--ndevices=",11) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT[,INT]' after command line argument '--ndevices/--kokkos-ndevices'. Raised by Kokkos::initialize(int narg, char* argc[]).");
int ndevices=-1;
int skip_device = 9999;
char* num1 = strchr(arg[iarg],'=')+1;
char* num2 = strpbrk(num1,",");
int num1_len = num2==NULL?strlen(num1):num2-num1;
char* num1_only = new char[num1_len+1];
strncpy(num1_only,num1,num1_len);
num1_only[num1_len]=0;
if(!Impl::is_unsigned_int(num1_only) || (strlen(num1_only)==0)) {
Impl::throw_runtime_exception("Error: expecting an integer number after command line argument '--kokkos-ndevices'. Raised by Kokkos::initialize(int narg, char* argc[]).");
}
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found)
ndevices = atoi(num1_only);
if( num2 != NULL ) {
if(( !Impl::is_unsigned_int(num2+1) ) || (strlen(num2)==1) )
Impl::throw_runtime_exception("Error: expecting an integer number after command line argument '--kokkos-ndevices=XX,'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found)
skip_device = atoi(num2+1);
}
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found) {
char *str;
- if ((str = getenv("SLURM_LOCALID"))) {
- int local_rank = atoi(str);
- device = local_rank % ndevices;
- if (device >= skip_device) device++;
- }
+ //if ((str = getenv("SLURM_LOCALID"))) {
+ // int local_rank = atoi(str);
+ // device = local_rank % ndevices;
+ // if (device >= skip_device) device++;
+ //}
if ((str = getenv("MV2_COMM_WORLD_LOCAL_RANK"))) {
int local_rank = atoi(str);
device = local_rank % ndevices;
if (device >= skip_device) device++;
}
if ((str = getenv("OMPI_COMM_WORLD_LOCAL_RANK"))) {
int local_rank = atoi(str);
device = local_rank % ndevices;
if (device >= skip_device) device++;
}
if(device==-1) {
device = 0;
if (device >= skip_device) device++;
}
}
//Remove the --kokkos-ndevices argument from the list but leave --ndevices
if(strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_ndevices_found=1;
narg--;
} else {
iarg++;
}
} else if ((strcmp(arg[iarg],"--kokkos-help") == 0) || (strcmp(arg[iarg],"--help") == 0)) {
std::cout << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << "-------------Kokkos command line arguments--------------------------------------" << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << "The following arguments exist also without prefix 'kokkos' (e.g. --help)." << std::endl;
std::cout << "The prefixed arguments will be removed from the list by Kokkos::initialize()," << std::endl;
std::cout << "the non-prefixed ones are not removed. Prefixed versions take precedence over " << std::endl;
std::cout << "non prefixed ones, and the last occurence of an argument overwrites prior" << std::endl;
std::cout << "settings." << std::endl;
std::cout << std::endl;
std::cout << "--kokkos-help : print this message" << std::endl;
std::cout << "--kokkos-threads=INT : specify total number of threads or" << std::endl;
std::cout << " number of threads per NUMA region if " << std::endl;
std::cout << " used in conjunction with '--numa' option. " << std::endl;
std::cout << "--kokkos-numa=INT : specify number of NUMA regions used by process." << std::endl;
std::cout << "--kokkos-device=INT : specify device id to be used by Kokkos. " << std::endl;
std::cout << "--kokkos-ndevices=INT[,INT] : used when running MPI jobs. Specify number of" << std::endl;
std::cout << " devices per node to be used. Process to device" << std::endl;
std::cout << " mapping happens by obtaining the local MPI rank" << std::endl;
std::cout << " and assigning devices round-robin. The optional" << std::endl;
std::cout << " second argument allows for an existing device" << std::endl;
std::cout << " to be ignored. This is most useful on workstations" << std::endl;
std::cout << " with multiple GPUs of which one is used to drive" << std::endl;
std::cout << " screen output." << std::endl;
std::cout << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << std::endl;
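//Worked example for the help text above (illustrative, not part of this patch):
//on a node with three GPUs where GPU 0 drives the display, passing
//'--kokkos-ndevices=2,0' uses two devices per node and skips device 0, so
//local MPI ranks 0 and 1 are mapped round-robin to devices 1 and 2
//(device = local_rank % ndevices, then incremented past the skipped id).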
//Remove the --kokkos-help argument from the list but leave --help
if(strcmp(arg[iarg],"--kokkos-help") == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
narg--;
} else {
iarg++;
}
} else
iarg++;
}
InitArguments arguments;
arguments.num_threads = num_threads;
arguments.num_numa = numa;
arguments.device_id = device;
Impl::initialize_internal(arguments);
}
void initialize(const InitArguments& arguments) {
Impl::initialize_internal(arguments);
}
void finalize()
{
Impl::finalize_internal();
}
void finalize_all()
{
enum { all_spaces = true };
Impl::finalize_internal( all_spaces );
}
void fence()
{
Impl::fence_internal();
}
} // namespace Kokkos
diff --git a/lib/kokkos/core/src/impl/Kokkos_Error.hpp b/lib/kokkos/core/src/impl/Kokkos_Error.hpp
index 5f88d6620..5fab5eb9a 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Error.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Error.hpp
@@ -1,82 +1,88 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_ERROR_HPP
#define KOKKOS_IMPL_ERROR_HPP
#include <string>
#include <iosfwd>
-#include <KokkosCore_config.h>
+#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_HAVE_CUDA
#include <Cuda/Kokkos_Cuda_abort.hpp>
#endif
namespace Kokkos {
namespace Impl {
void host_abort( const char * const );
void throw_runtime_exception( const std::string & );
void traceback_callstack( std::ostream & );
std::string human_memory_size(size_t arg_bytes);
}
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+
namespace Kokkos {
-inline
-void abort( const char * const message ) { Kokkos::Impl::host_abort(message); }
+KOKKOS_INLINE_FUNCTION
+void abort( const char * const message ) {
+#ifdef __CUDA_ARCH__
+ Kokkos::Impl::cuda_abort(message);
+#else
+ Kokkos::Impl::host_abort(message);
+#endif
+}
+
}
-#endif /* defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_IMPL_ERROR_HPP */
diff --git a/lib/kokkos/core/src/impl/Kokkos_FunctorAdapter.hpp b/lib/kokkos/core/src/impl/Kokkos_FunctorAdapter.hpp
index 78b679449..66c3157c3 100644
--- a/lib/kokkos/core/src/impl/Kokkos_FunctorAdapter.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_FunctorAdapter.hpp
@@ -1,1131 +1,1131 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_FUNCTORADAPTER_HPP
#define KOKKOS_FUNCTORADAPTER_HPP
#include <cstddef>
#include <Kokkos_Core_fwd.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Tags.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ArgTag , class Enable = void >
struct FunctorDeclaresValueType : public Impl::false_type {};
template< class FunctorType , class ArgTag >
struct FunctorDeclaresValueType< FunctorType , ArgTag
, typename Impl::enable_if_type< typename FunctorType::value_type >::type >
: public Impl::true_type {};
/** \brief Query Functor and execution policy argument tag for value type.
*
* If C++11 is enabled and 'value_type' is not explicitly declared, then attempt
* to deduce the type from FunctorType::operator().
*/
template< class FunctorType , class ArgTag , bool Dec = FunctorDeclaresValueType<FunctorType,ArgTag>::value >
struct FunctorValueTraits
{
typedef void value_type ;
typedef void pointer_type ;
typedef void reference_type ;
typedef void functor_type ;
enum { StaticValueSize = 0 };
KOKKOS_FORCEINLINE_FUNCTION static
unsigned value_count( const FunctorType & ) { return 0 ; }
KOKKOS_FORCEINLINE_FUNCTION static
unsigned value_size( const FunctorType & ) { return 0 ; }
};
template<class ArgTag>
struct FunctorValueTraits<void, ArgTag,false>
{
typedef void value_type ;
typedef void pointer_type ;
typedef void reference_type ;
typedef void functor_type ;
};
/** \brief FunctorType::value_type is explicitly declared so use it.
*
* Two options for declaration
*
* 1) A plain-old-data (POD) type
* typedef {pod_type} value_type ;
*
* 2) An array of POD of a runtime specified count.
* typedef {pod_type} value_type[] ;
* const unsigned value_count ;
*/
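/* Illustrative sketches of the two declaration styles above (not part of this
 * patch; the functor names are hypothetical):
 *
 *   struct SumFunctor {                       // (1) POD value_type
 *     typedef double value_type ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i , double & update ) const { update += i ; }
 *   };
 *
 *   struct HistogramFunctor {                 // (2) runtime-sized array
 *     typedef long value_type[] ;
 *     const unsigned value_count ;            // number of array entries
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i , long * bins ) const { bins[ i % value_count ] += 1 ; }
 *   };
 */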
template< class FunctorType , class ArgTag >
struct FunctorValueTraits< FunctorType , ArgTag , true /* == exists FunctorType::value_type */ >
{
typedef typename Impl::remove_extent< typename FunctorType::value_type >::type value_type ;
typedef FunctorType functor_type;
static_assert( 0 == ( sizeof(value_type) % sizeof(int) ) ,
"Reduction functor's declared value_type requires: 0 == sizeof(value_type) % sizeof(int)" );
// If not an array, the static value size is sizeof(value_type); for an array it is 0 (size is determined at runtime)
enum { StaticValueSize = Impl::is_array< typename FunctorType::value_type >::value ? 0 : sizeof(value_type) };
typedef value_type * pointer_type ;
// The reference_type for an array is 'value_type *'
// The reference_type for a single value is 'value_type &'
typedef typename Impl::if_c< ! StaticValueSize , value_type *
, value_type & >::type reference_type ;
// Number of values if single value
template< class F >
KOKKOS_FORCEINLINE_FUNCTION static
- typename Impl::enable_if< Impl::is_same<F,FunctorType>::value && StaticValueSize , unsigned >::type
+ typename Impl::enable_if< std::is_same<F,FunctorType>::value && StaticValueSize , unsigned >::type
value_count( const F & ) { return 1 ; }
// Number of values if an array, protect via templating because 'f.value_count'
// will only exist when the functor declares the value_type to be an array.
template< class F >
KOKKOS_FORCEINLINE_FUNCTION static
- typename Impl::enable_if< Impl::is_same<F,FunctorType>::value && ! StaticValueSize , unsigned >::type
+ typename Impl::enable_if< std::is_same<F,FunctorType>::value && ! StaticValueSize , unsigned >::type
value_count( const F & f ) { return f.value_count ; }
// Total size of the value
KOKKOS_INLINE_FUNCTION static
unsigned value_size( const FunctorType & f ) { return value_count( f ) * sizeof(value_type) ; }
};
template< class FunctorType , class ArgTag >
struct FunctorValueTraits< FunctorType
, ArgTag
, false /* == exists FunctorType::value_type */
>
{
private:
struct VOIDTAG {}; // Allow declaration of non-matching operator() with void argument tag.
struct REJECTTAG {}; // Reject tagged operator() when using non-tagged execution policy.
typedef typename
- Impl::if_c< Impl::is_same< ArgTag , void >::value , VOIDTAG , ArgTag >::type tag_type ;
+ Impl::if_c< std::is_same< ArgTag , void >::value , VOIDTAG , ArgTag >::type tag_type ;
//----------------------------------------
// parallel_for operator without a tag:
template< class ArgMember >
KOKKOS_INLINE_FUNCTION
static VOIDTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( ArgMember ) const ) {}
template< class ArgMember >
KOKKOS_INLINE_FUNCTION
static VOIDTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const ArgMember & ) const ) {}
template< class TagType , class ArgMember >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , ArgMember ) const ) {}
template< class TagType , class ArgMember >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , const ArgMember & ) const ) {}
template< class TagType , class ArgMember >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , ArgMember ) const ) {}
template< class TagType , class ArgMember >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , const ArgMember & ) const ) {}
//----------------------------------------
// parallel_for operator with a tag:
template< class ArgMember >
KOKKOS_INLINE_FUNCTION
static VOIDTAG deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , ArgMember ) const ) {}
template< class ArgMember >
KOKKOS_INLINE_FUNCTION
static VOIDTAG deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , ArgMember ) const ) {}
template< class ArgMember >
KOKKOS_INLINE_FUNCTION
static VOIDTAG deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , const ArgMember & ) const ) {}
template< class ArgMember >
KOKKOS_INLINE_FUNCTION
static VOIDTAG deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , const ArgMember & ) const ) {}
//----------------------------------------
// parallel_reduce operator without a tag:
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( VOIDTAG , void (FunctorType::*)( ArgMember , T & ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const ArgMember & , T & ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , ArgMember , T & ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , const ArgMember & , T & ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , ArgMember , T & ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , const ArgMember & , T & ) const ) {}
//----------------------------------------
// parallel_reduce operator with a tag:
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , ArgMember , T & ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , ArgMember , T & ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , const ArgMember & , T & ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , const ArgMember & , T & ) const ) {}
//----------------------------------------
// parallel_scan operator without a tag:
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( VOIDTAG , void (FunctorType::*)( ArgMember , T & , bool ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const ArgMember & , T & , bool ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , ArgMember , T & , bool ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , const ArgMember & , T & , bool ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , ArgMember , T & , bool ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , const ArgMember & , T & , bool ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( VOIDTAG , void (FunctorType::*)( ArgMember , T & , const bool& ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const ArgMember & , T & , const bool& ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , ArgMember , T & , const bool& ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( TagType , const ArgMember & , T & , const bool& ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , ArgMember , T & , const bool& ) const ) {}
template< class TagType , class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static REJECTTAG deduce_reduce_type( VOIDTAG , void (FunctorType::*)( const TagType & , const ArgMember & , T & , const bool& ) const ) {}
//----------------------------------------
// parallel_scan operator with a tag:
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , ArgMember , T & , bool ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , ArgMember , T & , bool ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , const ArgMember& , T & , bool ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , const ArgMember& , T & , bool ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , ArgMember , T & , const bool& ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , ArgMember , T & , const bool& ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( tag_type , const ArgMember& , T & , const bool& ) const ) {}
template< class ArgMember , class T >
KOKKOS_INLINE_FUNCTION
static T deduce_reduce_type( tag_type , void (FunctorType::*)( const tag_type & , const ArgMember& , T & , const bool& ) const ) {}
//----------------------------------------
typedef decltype( deduce_reduce_type( tag_type() , & FunctorType::operator() ) ) ValueType ;
- enum { IS_VOID = Impl::is_same<VOIDTAG ,ValueType>::value };
- enum { IS_REJECT = Impl::is_same<REJECTTAG,ValueType>::value };
+ enum { IS_VOID = std::is_same<VOIDTAG ,ValueType>::value };
+ enum { IS_REJECT = std::is_same<REJECTTAG,ValueType>::value };
public:
typedef typename Impl::if_c< IS_VOID || IS_REJECT , void , ValueType >::type value_type ;
typedef typename Impl::if_c< IS_VOID || IS_REJECT , void , ValueType * >::type pointer_type ;
typedef typename Impl::if_c< IS_VOID || IS_REJECT , void , ValueType & >::type reference_type ;
typedef FunctorType functor_type;
static_assert( IS_VOID || IS_REJECT || 0 == ( sizeof(ValueType) % sizeof(int) ) ,
"Reduction functor's value_type deduced from functor::operator() requires: 0 == sizeof(value_type) % sizeof(int)" );
enum { StaticValueSize = IS_VOID || IS_REJECT ? 0 : sizeof(ValueType) };
KOKKOS_FORCEINLINE_FUNCTION static
unsigned value_size( const FunctorType & ) { return StaticValueSize ; }
KOKKOS_FORCEINLINE_FUNCTION static
unsigned value_count( const FunctorType & ) { return IS_VOID || IS_REJECT ? 0 : 1 ; }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/** Function signatures for FunctorType::init function with a tag.
* reference_type is 'value_type &' for scalar and 'value_type *' for array.
*/
template< class FunctorType , class ArgTag >
struct FunctorValueInitFunction {
typedef typename FunctorValueTraits<FunctorType,ArgTag>::reference_type
reference_type ;
KOKKOS_INLINE_FUNCTION static void
enable_if( void (FunctorType::*)( ArgTag , reference_type ) const );
KOKKOS_INLINE_FUNCTION static void
enable_if( void (FunctorType::*)( ArgTag const & , reference_type ) const );
KOKKOS_INLINE_FUNCTION static void
enable_if( void ( *)( ArgTag , reference_type ) );
KOKKOS_INLINE_FUNCTION static void
enable_if( void ( *)( ArgTag const & , reference_type ) );
};
/** Function signatures for FunctorType::init function without a tag.
* reference_type is 'value_type &' for scalar and 'value_type *' for array.
*/
template< class FunctorType >
struct FunctorValueInitFunction< FunctorType , void > {
typedef typename FunctorValueTraits<FunctorType,void>::reference_type
reference_type ;
KOKKOS_INLINE_FUNCTION static void
enable_if( void (FunctorType::*)( reference_type ) const );
KOKKOS_INLINE_FUNCTION static void
enable_if( void ( *)( reference_type ) );
};
// Adapter for value initialization function.
// If a proper FunctorType::init is declared then use it,
// otherwise use default constructor.
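// Illustrative sketch (not part of this patch): a functor supplying its own
// 'init' member, which the specializations below detect through the
// FunctorValueInitFunction signatures above; here the identity element for a
// min-reduction:
//
//   KOKKOS_INLINE_FUNCTION
//   void init( double & val ) const { val = DBL_MAX ; }
//
// Functors without such a member use the default-constructing
// specializations below.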
template< class FunctorType , class ArgTag
, class T = typename FunctorValueTraits<FunctorType,ArgTag>::reference_type
, class Enable = void >
struct FunctorValueInit ;
/* No 'init' function provided for single value */
template< class FunctorType , class ArgTag , class T , class Enable >
struct FunctorValueInit< FunctorType , ArgTag , T & , Enable >
{
KOKKOS_FORCEINLINE_FUNCTION static
T & init( const FunctorType & f , void * p )
{ return *( new(p) T() ); };
};
/* No 'init' function provided for array value */
template< class FunctorType , class ArgTag , class T , class Enable >
struct FunctorValueInit< FunctorType , ArgTag , T * , Enable >
{
KOKKOS_FORCEINLINE_FUNCTION static
T * init( const FunctorType & f , void * p )
{
const int n = FunctorValueTraits< FunctorType , ArgTag >::value_count(f);
for ( int i = 0 ; i < n ; ++i ) { new( ((T*)p) + i ) T(); }
return (T*)p ;
}
};
/* 'init' function provided for single value */
template< class FunctorType , class T >
struct FunctorValueInit
< FunctorType
, void
, T &
// First substitution failure when FunctorType::init does not exist.
// Second substitution failure when FunctorType::init is not compatible.
, decltype( FunctorValueInitFunction< FunctorType , void >::enable_if( & FunctorType::init ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
T & init( const FunctorType & f , void * p )
{ f.init( *((T*)p) ); return *((T*)p) ; }
};
/* 'init' function provided for array value */
template< class FunctorType , class T >
struct FunctorValueInit
< FunctorType
, void
, T *
// First substitution failure when FunctorType::init does not exist.
// Second substitution failure when FunctorType::init is not compatible
, decltype( FunctorValueInitFunction< FunctorType , void >::enable_if( & FunctorType::init ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
T * init( const FunctorType & f , void * p )
{ f.init( (T*)p ); return (T*)p ; }
};
/* 'init' function provided for single value */
template< class FunctorType , class ArgTag , class T >
struct FunctorValueInit
< FunctorType
, ArgTag
, T &
// First substitution failure when FunctorType::init does not exist.
// Second substitution failure when FunctorType::init is not compatible.
, decltype( FunctorValueInitFunction< FunctorType , ArgTag >::enable_if( & FunctorType::init ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
T & init( const FunctorType & f , void * p )
{ f.init( ArgTag() , *((T*)p) ); return *((T*)p) ; }
};
/* 'init' function provided for array value */
template< class FunctorType , class ArgTag , class T >
struct FunctorValueInit
< FunctorType
, ArgTag
, T *
// First substitution failure when FunctorType::init does not exist.
// Second substitution failure when FunctorType::init is not compatible
, decltype( FunctorValueInitFunction< FunctorType , ArgTag >::enable_if( & FunctorType::init ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
T * init( const FunctorType & f , void * p )
{ f.init( ArgTag() , (T*)p ); return (T*)p ; }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
// Signatures for compatible FunctorType::join with tag and not an array
template< class FunctorType , class ArgTag , bool IsArray = 0 == FunctorValueTraits<FunctorType,ArgTag>::StaticValueSize >
struct FunctorValueJoinFunction {
typedef typename FunctorValueTraits<FunctorType,ArgTag>::value_type value_type ;
typedef volatile value_type & vref_type ;
typedef const volatile value_type & cvref_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , vref_type , cvref_type ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , vref_type , cvref_type ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , vref_type , cvref_type ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , vref_type , cvref_type ) );
};
// Signatures for compatible FunctorType::join with tag and is an array
template< class FunctorType , class ArgTag >
struct FunctorValueJoinFunction< FunctorType , ArgTag , true > {
typedef typename FunctorValueTraits<FunctorType,ArgTag>::value_type value_type ;
typedef volatile value_type * vptr_type ;
typedef const volatile value_type * cvptr_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , vptr_type , cvptr_type ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , vptr_type , cvptr_type ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , vptr_type , cvptr_type ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , vptr_type , cvptr_type ) );
};
// Signatures for compatible FunctorType::join without tag and not an array
template< class FunctorType >
struct FunctorValueJoinFunction< FunctorType , void , false > {
typedef typename FunctorValueTraits<FunctorType,void>::value_type value_type ;
typedef volatile value_type & vref_type ;
typedef const volatile value_type & cvref_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( vref_type , cvref_type ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( vref_type , cvref_type ) );
};
// Signatures for compatible FunctorType::join without tag and is an array
template< class FunctorType >
struct FunctorValueJoinFunction< FunctorType , void , true > {
typedef typename FunctorValueTraits<FunctorType,void>::value_type value_type ;
typedef volatile value_type * vptr_type ;
typedef const volatile value_type * cvptr_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( vptr_type , cvptr_type ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( vptr_type , cvptr_type ) );
};
template< class FunctorType , class ArgTag
, class T = typename FunctorValueTraits<FunctorType,ArgTag>::reference_type
, class Enable = void >
struct FunctorValueJoin ;
/* No 'join' function provided, single value */
template< class FunctorType , class ArgTag , class T , class Enable >
struct FunctorValueJoin< FunctorType , ArgTag , T & , Enable >
{
KOKKOS_FORCEINLINE_FUNCTION
FunctorValueJoin(const FunctorType& ){}
KOKKOS_FORCEINLINE_FUNCTION static
void join( const FunctorType & f , volatile void * const lhs , const volatile void * const rhs )
{
*((volatile T*)lhs) += *((const volatile T*)rhs);
}
KOKKOS_FORCEINLINE_FUNCTION
void operator()( volatile T& lhs , const volatile T& rhs ) const
{
lhs += rhs;
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( T& lhs , const T& rhs ) const
{
lhs += rhs;
}
};
/* No 'join' function provided, array of values */
template< class FunctorType , class ArgTag , class T , class Enable >
struct FunctorValueJoin< FunctorType , ArgTag , T * , Enable >
{
const FunctorType& f;
KOKKOS_FORCEINLINE_FUNCTION
FunctorValueJoin(const FunctorType& f_):f(f_){}
KOKKOS_FORCEINLINE_FUNCTION static
void join( const FunctorType & f_ , volatile void * const lhs , const volatile void * const rhs )
{
const int n = FunctorValueTraits<FunctorType,ArgTag>::value_count(f_);
for ( int i = 0 ; i < n ; ++i ) { ((volatile T*)lhs)[i] += ((const volatile T*)rhs)[i]; }
}
KOKKOS_FORCEINLINE_FUNCTION
void operator()( volatile T* const lhs , const volatile T* const rhs ) const
{
const int n = FunctorValueTraits<FunctorType,ArgTag>::value_count(f);
for ( int i = 0 ; i < n ; ++i ) { lhs[i] += rhs[i]; }
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( T* lhs , const T* rhs ) const
{
const int n = FunctorValueTraits<FunctorType,ArgTag>::value_count(f);
for ( int i = 0 ; i < n ; ++i ) { lhs[i] += rhs[i]; }
}
};
/* 'join' function provided, single value */
template< class FunctorType , class ArgTag , class T >
struct FunctorValueJoin
< FunctorType
, ArgTag
, T &
// First substitution failure when FunctorType::join does not exist.
// Second substitution failure when enable_if( & Functor::join ) does not exist
, decltype( FunctorValueJoinFunction< FunctorType , ArgTag >::enable_if( & FunctorType::join ) )
>
{
const FunctorType& f;
KOKKOS_FORCEINLINE_FUNCTION
FunctorValueJoin(const FunctorType& f_):f(f_){}
KOKKOS_FORCEINLINE_FUNCTION static
void join( const FunctorType & f_ , volatile void * const lhs , const volatile void * const rhs )
{
f_.join( ArgTag() , *((volatile T *)lhs) , *((const volatile T *)rhs) );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator()( volatile T& lhs , const volatile T& rhs ) const
{
f.join( ArgTag() , lhs , rhs );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( T& lhs , const T& rhs ) const
{
f.join( ArgTag(), lhs , rhs );
}
};
/* 'join' function provided, no tag, single value */
template< class FunctorType , class T >
struct FunctorValueJoin
< FunctorType
, void
, T &
// First substitution failure when FunctorType::join does not exist.
// Second substitution failure when enable_if( & Functor::join ) does not exist
, decltype( FunctorValueJoinFunction< FunctorType , void >::enable_if( & FunctorType::join ) )
>
{
const FunctorType& f;
KOKKOS_FORCEINLINE_FUNCTION
FunctorValueJoin(const FunctorType& f_):f(f_){}
KOKKOS_FORCEINLINE_FUNCTION static
void join( const FunctorType & f_ , volatile void * const lhs , const volatile void * const rhs )
{
f_.join( *((volatile T *)lhs) , *((const volatile T *)rhs) );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator()( volatile T& lhs , const volatile T& rhs ) const
{
f.join( lhs , rhs );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( T& lhs , const T& rhs ) const
{
f.join( lhs , rhs );
}
};
/* 'join' function provided for array value */
template< class FunctorType , class ArgTag , class T >
struct FunctorValueJoin
< FunctorType
, ArgTag
, T *
// First substitution failure when FunctorType::join does not exist.
// Second substitution failure when enable_if( & Functor::join ) does not exist
, decltype( FunctorValueJoinFunction< FunctorType , ArgTag >::enable_if( & FunctorType::join ) )
>
{
const FunctorType& f;
KOKKOS_FORCEINLINE_FUNCTION
FunctorValueJoin(const FunctorType& f_):f(f_){}
KOKKOS_FORCEINLINE_FUNCTION static
void join( const FunctorType & f_ , volatile void * const lhs , const volatile void * const rhs )
{
f_.join( ArgTag() , (volatile T *)lhs , (const volatile T *)rhs );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator()( volatile T* const lhs , const volatile T* const rhs ) const
{
f.join( ArgTag() , lhs , rhs );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( T* lhs , const T* rhs ) const
{
f.join( ArgTag(), lhs , rhs );
}
};
/* 'join' function provided, no tag, array value */
template< class FunctorType , class T >
struct FunctorValueJoin
< FunctorType
, void
, T *
// First substitution failure when FunctorType::join does not exist.
// Second substitution failure when enable_if( & Functor::join ) does not exist
, decltype( FunctorValueJoinFunction< FunctorType , void >::enable_if( & FunctorType::join ) )
>
{
const FunctorType& f;
KOKKOS_FORCEINLINE_FUNCTION
FunctorValueJoin(const FunctorType& f_):f(f_){}
KOKKOS_FORCEINLINE_FUNCTION static
void join( const FunctorType & f_ , volatile void * const lhs , const volatile void * const rhs )
{
f_.join( (volatile T *)lhs , (const volatile T *)rhs );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( volatile T* const lhs , const volatile T* const rhs ) const
{
f.join( lhs , rhs );
}
KOKKOS_FORCEINLINE_FUNCTION
void operator() ( T* lhs , const T* rhs ) const
{
f.join( lhs , rhs );
}
};
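// Illustrative sketch (not part of this patch): a functor 'join' member that
// these adapters detect via FunctorValueJoinFunction, here a max-reduction;
// functors without a 'join' fall back to the '+=' specializations above:
//
//   KOKKOS_INLINE_FUNCTION
//   void join( volatile double & dst , const volatile double & src ) const
//     { if ( src > dst ) dst = src ; }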
} // namespace Impl
} // namespace Kokkos
namespace Kokkos {
namespace Impl {
template<typename ValueType, class JoinOp, class Enable = void>
struct JoinLambdaAdapter {
typedef ValueType value_type;
const JoinOp& lambda;
KOKKOS_INLINE_FUNCTION
JoinLambdaAdapter(const JoinOp& lambda_):lambda(lambda_) {}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dst, const volatile value_type& src) const {
lambda(dst,src);
}
KOKKOS_INLINE_FUNCTION
void join(value_type& dst, const value_type& src) const {
lambda(dst,src);
}
KOKKOS_INLINE_FUNCTION
void operator() (volatile value_type& dst, const volatile value_type& src) const {
lambda(dst,src);
}
KOKKOS_INLINE_FUNCTION
void operator() (value_type& dst, const value_type& src) const {
lambda(dst,src);
}
};
template<typename ValueType, class JoinOp>
struct JoinLambdaAdapter<ValueType, JoinOp, decltype( FunctorValueJoinFunction< JoinOp , void >::enable_if( & JoinOp::join ) )> {
typedef ValueType value_type;
typedef StaticAssertSame<ValueType,typename JoinOp::value_type> assert_value_types_match;
const JoinOp& lambda;
KOKKOS_INLINE_FUNCTION
JoinLambdaAdapter(const JoinOp& lambda_):lambda(lambda_) {}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dst, const volatile value_type& src) const {
lambda.join(dst,src);
}
KOKKOS_INLINE_FUNCTION
void join(value_type& dst, const value_type& src) const {
lambda.join(dst,src);
}
KOKKOS_INLINE_FUNCTION
void operator() (volatile value_type& dst, const volatile value_type& src) const {
lambda.join(dst,src);
}
KOKKOS_INLINE_FUNCTION
void operator() (value_type& dst, const value_type& src) const {
lambda.join(dst,src);
}
};
template<typename ValueType>
struct JoinAdd {
typedef ValueType value_type;
KOKKOS_INLINE_FUNCTION
JoinAdd() {}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dst, const volatile value_type& src) const {
dst+=src;
}
KOKKOS_INLINE_FUNCTION
void operator() (value_type& dst, const value_type& src) const {
dst+=src;
}
KOKKOS_INLINE_FUNCTION
void operator() (volatile value_type& dst, const volatile value_type& src) const {
dst+=src;
}
};
}
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ArgTag
, class T = typename FunctorValueTraits<FunctorType,ArgTag>::reference_type >
struct FunctorValueOps ;
template< class FunctorType , class ArgTag , class T >
struct FunctorValueOps< FunctorType , ArgTag , T & >
{
KOKKOS_FORCEINLINE_FUNCTION static
T * pointer( T & r ) { return & r ; }
KOKKOS_FORCEINLINE_FUNCTION static
T & reference( void * p ) { return *((T*)p); }
KOKKOS_FORCEINLINE_FUNCTION static
void copy( const FunctorType & , void * const lhs , const void * const rhs )
{ *((T*)lhs) = *((const T*)rhs); }
};
/* Value operations (pointer/reference/copy) for an array of values */
template< class FunctorType , class ArgTag , class T >
struct FunctorValueOps< FunctorType , ArgTag , T * >
{
KOKKOS_FORCEINLINE_FUNCTION static
T * pointer( T * p ) { return p ; }
KOKKOS_FORCEINLINE_FUNCTION static
T * reference( void * p ) { return ((T*)p); }
KOKKOS_FORCEINLINE_FUNCTION static
void copy( const FunctorType & f , void * const lhs , const void * const rhs )
{
const int n = FunctorValueTraits<FunctorType,ArgTag>::value_count(f);
for ( int i = 0 ; i < n ; ++i ) { ((T*)lhs)[i] = ((const T*)rhs)[i]; }
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
// Compatible functions for 'final' function and value_type not an array
template< class FunctorType , class ArgTag , bool IsArray = 0 == FunctorValueTraits<FunctorType,ArgTag>::StaticValueSize >
struct FunctorFinalFunction {
typedef typename FunctorValueTraits<FunctorType,ArgTag>::value_type value_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type & ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type & ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type volatile & ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type volatile & ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type volatile & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type volatile & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type volatile & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type volatile & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const & ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const & ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type const & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type const & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const volatile & ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const volatile & ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const volatile & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const volatile & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type const volatile & ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type const volatile & ) );
};
// Compatible functions for 'final' function and value_type is an array
template< class FunctorType , class ArgTag >
struct FunctorFinalFunction< FunctorType , ArgTag , true > {
typedef typename FunctorValueTraits<FunctorType,ArgTag>::value_type value_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type * ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type * ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type volatile * ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type volatile * ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type volatile * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type volatile * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type volatile * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type volatile * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const * ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const * ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type const * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type const * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const volatile * ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const volatile * ) const );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , value_type const volatile * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , value_type const volatile * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , value_type const volatile * ) );
// KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , value_type const volatile * ) );
};
template< class FunctorType >
struct FunctorFinalFunction< FunctorType , void , false > {
typedef typename FunctorValueTraits<FunctorType,void>::value_type value_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( value_type & ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( value_type & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( value_type & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( const value_type & ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( const value_type & ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( const value_type & ) );
};
template< class FunctorType >
struct FunctorFinalFunction< FunctorType , void , true > {
typedef typename FunctorValueTraits<FunctorType,void>::value_type value_type ;
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( value_type * ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( value_type * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( value_type * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( const value_type * ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( const value_type * ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( const value_type * ) );
};
/* No 'final' function provided */
template< class FunctorType , class ArgTag
, class ResultType = typename FunctorValueTraits<FunctorType,ArgTag>::reference_type
, class Enable = void >
struct FunctorFinal
{
KOKKOS_FORCEINLINE_FUNCTION static
void final( const FunctorType & , void * ) {}
};
/* 'final' function provided */
template< class FunctorType , class ArgTag , class T >
struct FunctorFinal
< FunctorType
, ArgTag
, T &
// First substitution failure when FunctorType::final does not exist.
// Second substitution failure when enable_if( & FunctorType::final ) does not exist.
, decltype( FunctorFinalFunction< FunctorType , ArgTag >::enable_if( & FunctorType::final ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
void final( const FunctorType & f , void * p ) { f.final( *((T*)p) ); }
KOKKOS_FORCEINLINE_FUNCTION static
void final( FunctorType & f , void * p ) { f.final( *((T*)p) ); }
};
/* 'final' function provided for array value */
template< class FunctorType , class ArgTag , class T >
struct FunctorFinal
< FunctorType
, ArgTag
, T *
// First substitution failure when FunctorType::final does not exist.
// Second substitution failure when enable_if( & FunctorType::final ) does not exist.
, decltype( FunctorFinalFunction< FunctorType , ArgTag >::enable_if( & FunctorType::final ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
void final( const FunctorType & f , void * p ) { f.final( (T*)p ); }
KOKKOS_FORCEINLINE_FUNCTION static
void final( FunctorType & f , void * p ) { f.final( (T*)p ); }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ArgTag
, class ReferenceType = typename FunctorValueTraits<FunctorType,ArgTag>::reference_type >
struct FunctorApplyFunction {
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , ReferenceType ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , ReferenceType ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag , ReferenceType ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ArgTag const & , ReferenceType ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag , ReferenceType ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ArgTag const & , ReferenceType ) );
};
template< class FunctorType , class ReferenceType >
struct FunctorApplyFunction< FunctorType , void , ReferenceType > {
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ReferenceType ) const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)( ReferenceType ) );
KOKKOS_INLINE_FUNCTION static void enable_if( void ( *)( ReferenceType ) );
};
template< class FunctorType >
struct FunctorApplyFunction< FunctorType , void , void > {
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)() const );
KOKKOS_INLINE_FUNCTION static void enable_if( void (FunctorType::*)() );
};
template< class FunctorType , class ArgTag , class ReferenceType
, class Enable = void >
struct FunctorApply
{
KOKKOS_FORCEINLINE_FUNCTION static
void apply( const FunctorType & , void * ) {}
};
/* 'apply' function provided for void value */
template< class FunctorType , class ArgTag >
struct FunctorApply
< FunctorType
, ArgTag
, void
// First substitution failure when FunctorType::apply does not exist.
// Second substitution failure when enable_if( & FunctorType::apply ) does not exist.
, decltype( FunctorApplyFunction< FunctorType , ArgTag , void >::enable_if( & FunctorType::apply ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
void apply( FunctorType & f ) { f.apply(); }
KOKKOS_FORCEINLINE_FUNCTION static
void apply( const FunctorType & f ) { f.apply(); }
};
/* 'apply' function provided for single value */
template< class FunctorType , class ArgTag , class T >
struct FunctorApply
< FunctorType
, ArgTag
, T &
// First substitution failure when FunctorType::apply does not exist.
// Second substitution failure when enable_if( & FunctorType::apply ) does not exist.
, decltype( FunctorApplyFunction< FunctorType , ArgTag >::enable_if( & FunctorType::apply ) )
>
{
KOKKOS_FORCEINLINE_FUNCTION static
void apply( const FunctorType & f , void * p ) { f.apply( *((T*)p) ); }
KOKKOS_FORCEINLINE_FUNCTION static
void apply( FunctorType & f , void * p ) { f.apply( *((T*)p) ); }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* KOKKOS_FUNCTORADAPTER_HPP */
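// --- Illustrative sketch (not part of the patch): the SFINAE detection idiom ---
// Hypothetical, stand-alone reduction of the pattern used by FunctorFinal and
// FunctorApply above: an enable_if overload referenced inside decltype compiles
// only when the functor really declares the member function, which steers the
// choice between the primary template (no-op) and the partial specialization.
// DetectFinal, CallFinal, WithFinal and WithoutFinal are invented names.
#include <cstdio>

template< class FunctorType , class ValueType >
struct DetectFinal {
  // Declared only; it is used inside decltype, so no definition is needed.
  static void enable_if( void (FunctorType::*)( ValueType & ) const );
};

template< class FunctorType , class ValueType , class Enable = void >
struct CallFinal {                              // primary: functor has no 'final'
  static void final( const FunctorType & , ValueType & ) {}
};

template< class FunctorType , class ValueType >
struct CallFinal< FunctorType , ValueType
                , decltype( DetectFinal< FunctorType , ValueType >::enable_if( & FunctorType::final ) ) >
{                                               // chosen when 'final' exists
  static void final( const FunctorType & f , ValueType & v ) { f.final( v ); }
};

struct WithFinal    { void final( int & v ) const { v *= 2 ; } };
struct WithoutFinal {};

int main() {
  int v = 21 ;
  CallFinal< WithFinal    , int >::final( WithFinal()    , v );   // user 'final' runs: v == 42
  CallFinal< WithoutFinal , int >::final( WithoutFinal() , v );   // silent no-op fallback
  std::printf( "%d\n" , v );
  return 0 ;
}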
diff --git a/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp b/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
index 11cc12021..953402611 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
@@ -1,379 +1,399 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <memory.h>
#include <iostream>
#include <sstream>
#include <cstring>
#include <algorithm>
#include <Kokkos_HBWSpace.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Kokkos_Atomic.hpp>
#ifdef KOKKOS_HAVE_HBWSPACE
#include <memkind.h>
#endif
+#if (KOKKOS_ENABLE_PROFILING)
+#include <impl/Kokkos_Profiling_Interface.hpp>
+#endif
+
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#ifdef KOKKOS_HAVE_HBWSPACE
#define MEMKIND_TYPE MEMKIND_HBW //hbw_get_kind(HBW_PAGESIZE_4KB)
namespace Kokkos {
namespace Experimental {
namespace {
static const int QUERY_SPACE_IN_PARALLEL_MAX = 16 ;
typedef int (* QuerySpaceInParallelPtr )();
QuerySpaceInParallelPtr s_in_parallel_query[ QUERY_SPACE_IN_PARALLEL_MAX ] ;
int s_in_parallel_query_count = 0 ;
} // namespace <empty>
void HBWSpace::register_in_parallel( int (*device_in_parallel)() )
{
if ( 0 == device_in_parallel ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel ERROR : given NULL" ) );
}
int i = -1 ;
if ( ! (device_in_parallel)() ) {
for ( i = 0 ; i < s_in_parallel_query_count && ! (*(s_in_parallel_query[i]))() ; ++i );
}
if ( i < s_in_parallel_query_count ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel_query ERROR : called in_parallel" ) );
}
if ( QUERY_SPACE_IN_PARALLEL_MAX <= i ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel_query ERROR : exceeded maximum" ) );
}
for ( i = 0 ; i < s_in_parallel_query_count && s_in_parallel_query[i] != device_in_parallel ; ++i );
if ( i == s_in_parallel_query_count ) {
s_in_parallel_query[s_in_parallel_query_count++] = device_in_parallel ;
}
}
int HBWSpace::in_parallel()
{
const int n = s_in_parallel_query_count ;
int i = 0 ;
while ( i < n && ! (*(s_in_parallel_query[i]))() ) { ++i ; }
return i < n ;
}
} // namespace Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
/* Default allocation mechanism */
HBWSpace::HBWSpace()
: m_alloc_mech(
HBWSpace::STD_MALLOC
)
{
printf("Init\n");
setenv("MEMKIND_HBW_NODES", "1", 0);
}
/* Default allocation mechanism */
HBWSpace::HBWSpace( const HBWSpace::AllocationMechanism & arg_alloc_mech )
: m_alloc_mech( HBWSpace::STD_MALLOC )
{
printf("Init2\n");
setenv("MEMKIND_HBW_NODES", "1", 0);
if ( arg_alloc_mech == STD_MALLOC ) {
m_alloc_mech = HBWSpace::STD_MALLOC ;
}
}
void * HBWSpace::allocate( const size_t arg_alloc_size ) const
{
static_assert( sizeof(void*) == sizeof(uintptr_t)
, "Error sizeof(void*) != sizeof(uintptr_t)" );
static_assert( Kokkos::Impl::power_of_two< Kokkos::Impl::MEMORY_ALIGNMENT >::value
, "Memory alignment must be power of two" );
constexpr uintptr_t alignment = Kokkos::Impl::MEMORY_ALIGNMENT ;
constexpr uintptr_t alignment_mask = alignment - 1 ;
void * ptr = 0 ;
if ( arg_alloc_size ) {
if ( m_alloc_mech == STD_MALLOC ) {
// Over-allocate and round up to guarantee proper alignment.
size_t size_padded = arg_alloc_size + sizeof(void*) + alignment ;
void * alloc_ptr = memkind_malloc(MEMKIND_TYPE, size_padded );
if (alloc_ptr) {
uintptr_t address = reinterpret_cast<uintptr_t>(alloc_ptr);
// offset enough to record the alloc_ptr
address += sizeof(void *);
uintptr_t rem = address % alignment;
uintptr_t offset = rem ? (alignment - rem) : 0u;
address += offset;
ptr = reinterpret_cast<void *>(address);
// record the alloc'd pointer
address -= sizeof(void *);
*reinterpret_cast<void **>(address) = alloc_ptr;
}
}
}
if ( ( ptr == 0 ) || ( reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0) )
|| ( reinterpret_cast<uintptr_t>(ptr) & alignment_mask ) ) {
std::ostringstream msg ;
msg << "Kokkos::Experimental::HBWSpace::allocate[ " ;
switch( m_alloc_mech ) {
case STD_MALLOC: msg << "STD_MALLOC" ; break ;
}
msg << " ]( " << arg_alloc_size << " ) FAILED" ;
if ( ptr == NULL ) { msg << " NULL" ; }
else { msg << " NOT ALIGNED " << ptr ; }
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return ptr;
}
void HBWSpace::deallocate( void * const arg_alloc_ptr , const size_t arg_alloc_size ) const
{
if ( arg_alloc_ptr ) {
if ( m_alloc_mech == STD_MALLOC ) {
void * alloc_ptr = *(reinterpret_cast<void **>(arg_alloc_ptr) -1);
memkind_free(MEMKIND_TYPE, alloc_ptr );
}
}
}
+constexpr const char* HBWSpace::name() {
+ return m_name;
+}
+
} // namespace Experimental
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::s_root_record ;
void
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
~SharedAllocationRecord()
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::deallocateData(
+ Kokkos::Profiling::SpaceHandle(Kokkos::Experimental::HBWSpace::name()),RecordBase::m_alloc_ptr->m_label,
+ data(),size());
+ }
+ #endif
+
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
+ }
+ #endif
+
// Fill in the Header information
RecordBase::m_alloc_ptr->m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
allocate_tracked( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
- Kokkos::Impl::DeepCopy<HBWSpace,HBWSpace>( r_new->data() , r_old->data()
+ Kokkos::Impl::DeepCopy<Kokkos::Experimental::HBWSpace,Kokkos::Experimental::HBWSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > *
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record( void * alloc_ptr )
{
typedef SharedAllocationHeader Header ;
typedef SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > RecordHost ;
SharedAllocationHeader const * const head = alloc_ptr ? Header::get_header( alloc_ptr ) : (SharedAllocationHeader *)0 ;
RecordHost * const record = head ? static_cast< RecordHost * >( head->m_record ) : (RecordHost *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head ) {
- Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record ERROR" ) );
+ Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record ERROR" ) );
}
return record ;
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
print_records( std::ostream & s , const Kokkos::Experimental::HBWSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "HBWSpace" , & s_root_record , detail );
}
} // namespace Impl
-} // namespace Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
namespace {
const unsigned HBW_SPACE_ATOMIC_MASK = 0xFFFF;
const unsigned HBW_SPACE_ATOMIC_XOR_MASK = 0x5A39;
static int HBW_SPACE_ATOMIC_LOCKS[HBW_SPACE_ATOMIC_MASK+1];
}
namespace Impl {
void init_lock_array_hbw_space() {
static int is_initialized = 0;
if(! is_initialized)
for(int i = 0; i < static_cast<int> (HBW_SPACE_ATOMIC_MASK+1); i++)
HBW_SPACE_ATOMIC_LOCKS[i] = 0;
}
bool lock_address_hbw_space(void* ptr) {
return 0 == atomic_compare_exchange( &HBW_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HBW_SPACE_ATOMIC_MASK) ^ HBW_SPACE_ATOMIC_XOR_MASK] ,
0 , 1);
}
void unlock_address_hbw_space(void* ptr) {
atomic_exchange( &HBW_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HBW_SPACE_ATOMIC_MASK) ^ HBW_SPACE_ATOMIC_XOR_MASK] ,
0);
}
}
}
}
#endif
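// --- Illustrative sketch (not part of the patch): the alignment scheme ---
// Hypothetical, stand-alone version of what HBWSpace::allocate/deallocate above
// do: over-allocate, round the address up to the alignment boundary, and store
// the original pointer in the word just before the returned address so the
// deallocation path can recover it. Plain malloc/free stand in for
// memkind_malloc/memkind_free; aligned_allocate/aligned_deallocate are invented names.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static void * aligned_allocate( std::size_t size , std::uintptr_t alignment )
{
  const std::size_t size_padded = size + sizeof(void*) + alignment ;
  void * const alloc_ptr = std::malloc( size_padded );
  if ( ! alloc_ptr ) return NULL ;
  std::uintptr_t address = reinterpret_cast<std::uintptr_t>( alloc_ptr );
  address += sizeof(void*);                                   // room to record alloc_ptr
  const std::uintptr_t rem = address % alignment ;
  address += rem ? ( alignment - rem ) : 0u ;                 // round up to the boundary
  *( reinterpret_cast<void**>( address ) - 1 ) = alloc_ptr ;  // record the alloc'd pointer
  return reinterpret_cast<void*>( address );
}

static void aligned_deallocate( void * ptr )
{
  if ( ptr ) std::free( *( reinterpret_cast<void**>( ptr ) - 1 ) );
}

int main()
{
  void * p = aligned_allocate( 1024 , 64 );
  std::printf( "aligned to 64 bytes: %d\n" ,
               int( 0 == ( reinterpret_cast<std::uintptr_t>( p ) % 64 ) ) );
  aligned_deallocate( p );
  return 0 ;
}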
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp b/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
index b52f4591e..bfd13572b 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
@@ -1,537 +1,505 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <algorithm>
#include <Kokkos_Macros.hpp>
-
+#if (KOKKOS_ENABLE_PROFILING)
+#include <impl/Kokkos_Profiling_Interface.hpp>
+#endif
/*--------------------------------------------------------------------------*/
#if defined( __INTEL_COMPILER ) && ! defined ( KOKKOS_HAVE_CUDA )
// Intel specialized allocator does not interoperate with CUDA memory allocation
#define KOKKOS_INTEL_MM_ALLOC_AVAILABLE
#endif
/*--------------------------------------------------------------------------*/
#if defined(KOKKOS_POSIX_MEMALIGN_AVAILABLE)
#include <unistd.h>
#include <sys/mman.h>
/* mmap flags for private anonymous memory allocation */
#if defined( MAP_ANONYMOUS ) && defined( MAP_PRIVATE )
#define KOKKOS_POSIX_MMAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS)
#elif defined( MAP_ANON ) && defined( MAP_PRIVATE )
#define KOKKOS_POSIX_MMAP_FLAGS (MAP_PRIVATE | MAP_ANON)
#endif
// mmap flags for huge page tables
// the Cuda driver does not interoperate with MAP_HUGETLB
#if defined( KOKKOS_POSIX_MMAP_FLAGS )
#if defined( MAP_HUGETLB ) && ! defined( KOKKOS_HAVE_CUDA )
#define KOKKOS_POSIX_MMAP_FLAGS_HUGE (KOKKOS_POSIX_MMAP_FLAGS | MAP_HUGETLB )
#else
#define KOKKOS_POSIX_MMAP_FLAGS_HUGE KOKKOS_POSIX_MMAP_FLAGS
#endif
#endif
#endif
/*--------------------------------------------------------------------------*/
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <memory.h>
#include <iostream>
#include <sstream>
#include <cstring>
#include <Kokkos_HostSpace.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace {
static const int QUERY_SPACE_IN_PARALLEL_MAX = 16 ;
typedef int (* QuerySpaceInParallelPtr )();
QuerySpaceInParallelPtr s_in_parallel_query[ QUERY_SPACE_IN_PARALLEL_MAX ] ;
int s_in_parallel_query_count = 0 ;
} // namespace <empty>
void HostSpace::register_in_parallel( int (*device_in_parallel)() )
{
if ( 0 == device_in_parallel ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel ERROR : given NULL" ) );
}
int i = -1 ;
if ( ! (device_in_parallel)() ) {
for ( i = 0 ; i < s_in_parallel_query_count && ! (*(s_in_parallel_query[i]))() ; ++i );
}
if ( i < s_in_parallel_query_count ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel_query ERROR : called in_parallel" ) );
}
if ( QUERY_SPACE_IN_PARALLEL_MAX <= i ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel_query ERROR : exceeded maximum" ) );
}
for ( i = 0 ; i < s_in_parallel_query_count && s_in_parallel_query[i] != device_in_parallel ; ++i );
if ( i == s_in_parallel_query_count ) {
s_in_parallel_query[s_in_parallel_query_count++] = device_in_parallel ;
}
}
int HostSpace::in_parallel()
{
const int n = s_in_parallel_query_count ;
int i = 0 ;
while ( i < n && ! (*(s_in_parallel_query[i]))() ) { ++i ; }
return i < n ;
}
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/* Default allocation mechanism */
HostSpace::HostSpace()
: m_alloc_mech(
#if defined( KOKKOS_INTEL_MM_ALLOC_AVAILABLE )
HostSpace::INTEL_MM_ALLOC
#elif defined( KOKKOS_POSIX_MMAP_FLAGS )
HostSpace::POSIX_MMAP
#elif defined( KOKKOS_POSIX_MEMALIGN_AVAILABLE )
HostSpace::POSIX_MEMALIGN
#else
HostSpace::STD_MALLOC
#endif
)
{}
/* Default allocation mechanism */
HostSpace::HostSpace( const HostSpace::AllocationMechanism & arg_alloc_mech )
: m_alloc_mech( HostSpace::STD_MALLOC )
{
if ( arg_alloc_mech == STD_MALLOC ) {
m_alloc_mech = HostSpace::STD_MALLOC ;
}
#if defined( KOKKOS_INTEL_MM_ALLOC_AVAILABLE )
else if ( arg_alloc_mech == HostSpace::INTEL_MM_ALLOC ) {
m_alloc_mech = HostSpace::INTEL_MM_ALLOC ;
}
#elif defined( KOKKOS_POSIX_MEMALIGN_AVAILABLE )
else if ( arg_alloc_mech == HostSpace::POSIX_MEMALIGN ) {
m_alloc_mech = HostSpace::POSIX_MEMALIGN ;
}
#elif defined( KOKKOS_POSIX_MMAP_FLAGS )
else if ( arg_alloc_mech == HostSpace::POSIX_MMAP ) {
m_alloc_mech = HostSpace::POSIX_MMAP ;
}
#endif
else {
const char * const mech =
( arg_alloc_mech == HostSpace::INTEL_MM_ALLOC ) ? "INTEL_MM_ALLOC" : (
( arg_alloc_mech == HostSpace::POSIX_MEMALIGN ) ? "POSIX_MEMALIGN" : (
( arg_alloc_mech == HostSpace::POSIX_MMAP ) ? "POSIX_MMAP" : "" ));
std::string msg ;
msg.append("Kokkos::HostSpace ");
msg.append(mech);
msg.append(" is not available" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void * HostSpace::allocate( const size_t arg_alloc_size ) const
{
static_assert( sizeof(void*) == sizeof(uintptr_t)
, "Error sizeof(void*) != sizeof(uintptr_t)" );
static_assert( Kokkos::Impl::is_integral_power_of_two( Kokkos::Impl::MEMORY_ALIGNMENT )
, "Memory alignment must be power of two" );
constexpr uintptr_t alignment = Kokkos::Impl::MEMORY_ALIGNMENT ;
constexpr uintptr_t alignment_mask = alignment - 1 ;
void * ptr = 0 ;
if ( arg_alloc_size ) {
if ( m_alloc_mech == STD_MALLOC ) {
// Over-allocate and round up to guarantee proper alignment.
size_t size_padded = arg_alloc_size + sizeof(void*) + alignment ;
void * alloc_ptr = malloc( size_padded );
if (alloc_ptr) {
uintptr_t address = reinterpret_cast<uintptr_t>(alloc_ptr);
// offset enough to record the alloc_ptr
address += sizeof(void *);
uintptr_t rem = address % alignment;
uintptr_t offset = rem ? (alignment - rem) : 0u;
address += offset;
ptr = reinterpret_cast<void *>(address);
// record the alloc'd pointer
address -= sizeof(void *);
*reinterpret_cast<void **>(address) = alloc_ptr;
}
}
#if defined( KOKKOS_INTEL_MM_ALLOC_AVAILABLE )
else if ( m_alloc_mech == INTEL_MM_ALLOC ) {
ptr = _mm_malloc( arg_alloc_size , alignment );
}
#endif
#if defined( KOKKOS_POSIX_MEMALIGN_AVAILABLE )
else if ( m_alloc_mech == POSIX_MEMALIGN ) {
posix_memalign( & ptr, alignment , arg_alloc_size );
}
#endif
#if defined( KOKKOS_POSIX_MMAP_FLAGS )
else if ( m_alloc_mech == POSIX_MMAP ) {
constexpr size_t use_huge_pages = (1u << 27);
constexpr int prot = PROT_READ | PROT_WRITE ;
const int flags = arg_alloc_size < use_huge_pages
? KOKKOS_POSIX_MMAP_FLAGS
: KOKKOS_POSIX_MMAP_FLAGS_HUGE ;
// read write access to private memory
ptr = mmap( NULL /* address hint, if NULL OS kernel chooses address */
, arg_alloc_size /* size in bytes */
, prot /* memory protection */
, flags /* visibility of updates */
, -1 /* file descriptor */
, 0 /* offset */
);
/* Associated reallocation:
ptr = mremap( old_ptr , old_size , new_size , MREMAP_MAYMOVE );
*/
}
#endif
}
if ( ( ptr == 0 ) || ( reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0) )
|| ( reinterpret_cast<uintptr_t>(ptr) & alignment_mask ) ) {
std::ostringstream msg ;
msg << "Kokkos::HostSpace::allocate[ " ;
switch( m_alloc_mech ) {
case STD_MALLOC: msg << "STD_MALLOC" ; break ;
case POSIX_MEMALIGN: msg << "POSIX_MEMALIGN" ; break ;
case POSIX_MMAP: msg << "POSIX_MMAP" ; break ;
case INTEL_MM_ALLOC: msg << "INTEL_MM_ALLOC" ; break ;
}
msg << " ]( " << arg_alloc_size << " ) FAILED" ;
if ( ptr == NULL ) { msg << " NULL" ; }
else { msg << " NOT ALIGNED " << ptr ; }
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return ptr;
}
void HostSpace::deallocate( void * const arg_alloc_ptr , const size_t arg_alloc_size ) const
{
if ( arg_alloc_ptr ) {
if ( m_alloc_mech == STD_MALLOC ) {
void * alloc_ptr = *(reinterpret_cast<void **>(arg_alloc_ptr) -1);
free( alloc_ptr );
}
#if defined( KOKKOS_INTEL_MM_ALLOC_AVAILABLE )
else if ( m_alloc_mech == INTEL_MM_ALLOC ) {
_mm_free( arg_alloc_ptr );
}
#endif
#if defined( KOKKOS_POSIX_MEMALIGN_AVAILABLE )
else if ( m_alloc_mech == POSIX_MEMALIGN ) {
free( arg_alloc_ptr );
}
#endif
#if defined( KOKKOS_POSIX_MMAP_FLAGS )
else if ( m_alloc_mech == POSIX_MMAP ) {
munmap( arg_alloc_ptr , arg_alloc_size );
}
#endif
}
}
+constexpr const char* HostSpace::name() {
+ return m_name;
+}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record ;
void
SharedAllocationRecord< Kokkos::HostSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::HostSpace , void >::
~SharedAllocationRecord()
{
+ #if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::deallocateData(
+ Kokkos::Profiling::SpaceHandle(Kokkos::HostSpace::name()),RecordBase::m_alloc_ptr->m_label,
+ data(),size());
+ }
+ #endif
+
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::HostSpace , void >::
SharedAllocationRecord( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
+#if (KOKKOS_ENABLE_PROFILING)
+ if(Kokkos::Profiling::profileLibraryLoaded()) {
+ Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
+ }
+#endif
// Fill in the Header information
RecordBase::m_alloc_ptr->m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::HostSpace , void >::
allocate_tracked( const Kokkos::HostSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::HostSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::HostSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<HostSpace,HostSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
SharedAllocationRecord< Kokkos::HostSpace , void > *
SharedAllocationRecord< Kokkos::HostSpace , void >::get_record( void * alloc_ptr )
{
typedef SharedAllocationHeader Header ;
typedef SharedAllocationRecord< Kokkos::HostSpace , void > RecordHost ;
SharedAllocationHeader const * const head = alloc_ptr ? Header::get_header( alloc_ptr ) : (SharedAllocationHeader *)0 ;
RecordHost * const record = head ? static_cast< RecordHost * >( head->m_record ) : (RecordHost *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head ) {
- Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void >::get_record ERROR" ) );
+ Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void >::get_record ERROR" ) );
}
return record ;
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord< Kokkos::HostSpace , void >::
print_records( std::ostream & s , const Kokkos::HostSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "HostSpace" , & s_root_record , detail );
}
} // namespace Impl
-} // namespace Experimental
-} // namespace Kokkos
-
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-template< class >
-struct ViewOperatorBoundsErrorAbort ;
-
-template<>
-struct ViewOperatorBoundsErrorAbort< Kokkos::HostSpace > {
- static void apply( const size_t rank
- , const size_t n0 , const size_t n1
- , const size_t n2 , const size_t n3
- , const size_t n4 , const size_t n5
- , const size_t n6 , const size_t n7
- , const size_t i0 , const size_t i1
- , const size_t i2 , const size_t i3
- , const size_t i4 , const size_t i5
- , const size_t i6 , const size_t i7 );
-};
-
-void ViewOperatorBoundsErrorAbort< Kokkos::HostSpace >::
-apply( const size_t rank
- , const size_t n0 , const size_t n1
- , const size_t n2 , const size_t n3
- , const size_t n4 , const size_t n5
- , const size_t n6 , const size_t n7
- , const size_t i0 , const size_t i1
- , const size_t i2 , const size_t i3
- , const size_t i4 , const size_t i5
- , const size_t i6 , const size_t i7 )
-{
- char buffer[512];
-
- snprintf( buffer , sizeof(buffer)
- , "View operator bounds error : rank(%lu) dim(%lu,%lu,%lu,%lu,%lu,%lu,%lu,%lu) index(%lu,%lu,%lu,%lu,%lu,%lu,%lu,%lu)"
- , rank , n0 , n1 , n2 , n3 , n4 , n5 , n6 , n7
- , i0 , i1 , i2 , i3 , i4 , i5 , i6 , i7 );
-
- Kokkos::Impl::throw_runtime_exception( buffer );
-}
-
-} // namespace Impl
-} // namespace Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace {
const unsigned HOST_SPACE_ATOMIC_MASK = 0xFFFF;
const unsigned HOST_SPACE_ATOMIC_XOR_MASK = 0x5A39;
static int HOST_SPACE_ATOMIC_LOCKS[HOST_SPACE_ATOMIC_MASK+1];
}
namespace Impl {
void init_lock_array_host_space() {
static int is_initialized = 0;
if(! is_initialized)
for(int i = 0; i < static_cast<int> (HOST_SPACE_ATOMIC_MASK+1); i++)
HOST_SPACE_ATOMIC_LOCKS[i] = 0;
}
bool lock_address_host_space(void* ptr) {
return 0 == atomic_compare_exchange( &HOST_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HOST_SPACE_ATOMIC_MASK) ^ HOST_SPACE_ATOMIC_XOR_MASK] ,
0 , 1);
}
void unlock_address_host_space(void* ptr) {
atomic_exchange( &HOST_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HOST_SPACE_ATOMIC_MASK) ^ HOST_SPACE_ATOMIC_XOR_MASK] ,
0);
}
}
}
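// --- Illustrative sketch (not part of the patch): the pointer-hashed lock table ---
// Hypothetical version of the lock_address_host_space/unlock_address_host_space
// pattern above, written with std::atomic instead of Kokkos' atomic_compare_exchange:
// an address is hashed (shift, mask, XOR) onto one of 65536 locks, acquired with a
// compare-and-swap of 0 -> 1 and released by storing 0. Names prefixed 'example_' or
// 'EXAMPLE_' are invented for the example.
#include <atomic>
#include <cstdint>
#include <cstdio>

namespace {
  const unsigned EXAMPLE_ATOMIC_MASK     = 0xFFFF;
  const unsigned EXAMPLE_ATOMIC_XOR_MASK = 0x5A39;
  std::atomic<int> EXAMPLE_ATOMIC_LOCKS[ EXAMPLE_ATOMIC_MASK + 1 ] = {};
}

static std::atomic<int> & example_lock_for( void * ptr ) {
  // Drop the low 2 bits, mask down to the table size, then XOR to decorrelate
  // addresses that share low-order bit patterns.
  const std::uintptr_t a = reinterpret_cast<std::uintptr_t>( ptr );
  return EXAMPLE_ATOMIC_LOCKS[ (( a >> 2 ) & EXAMPLE_ATOMIC_MASK) ^ EXAMPLE_ATOMIC_XOR_MASK ];
}

static bool example_lock_address( void * ptr ) {
  int expected = 0;
  return example_lock_for( ptr ).compare_exchange_strong( expected , 1 );
}

static void example_unlock_address( void * ptr ) {
  example_lock_for( ptr ).store( 0 );
}

int main() {
  int x = 0;
  std::printf( "first lock:  %d\n" , int( example_lock_address( &x ) ) );  // 1 (acquired)
  std::printf( "second lock: %d\n" , int( example_lock_address( &x ) ) );  // 0 (already held)
  example_unlock_address( &x );
  return 0;
}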
diff --git a/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp b/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
index eb3da7501..5155c66df 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
@@ -1,107 +1,107 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_MEMORY_FENCE )
#define KOKKOS_MEMORY_FENCE
namespace Kokkos {
//----------------------------------------------------------------------------
KOKKOS_FORCEINLINE_FUNCTION
void memory_fence()
{
-#if defined( KOKKOS_ATOMICS_USE_CUDA )
+#if defined( __CUDA_ARCH__ )
__threadfence();
#elif defined( KOKKOS_ATOMICS_USE_GCC ) || \
( defined( KOKKOS_COMPILER_NVCC ) && defined( KOKKOS_ATOMICS_USE_INTEL ) )
__sync_synchronize();
#elif defined( KOKKOS_ATOMICS_USE_INTEL )
_mm_mfence();
#elif defined( KOKKOS_ATOMICS_USE_OMP31 )
#pragma omp flush
#elif defined( KOKKOS_ATOMICS_USE_WINDOWS )
MemoryBarrier();
#else
#error "Error: memory_fence() not defined"
#endif
}
//////////////////////////////////////////////////////
// store_fence()
//
// If possible, use a store fence on the architecture; otherwise fall back to a full memory fence
KOKKOS_FORCEINLINE_FUNCTION
void store_fence()
{
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_USE_ISA_X86_64 )
asm volatile (
"sfence" ::: "memory"
);
#else
memory_fence();
#endif
}
//////////////////////////////////////////////////////
// load_fence()
//
// If possible, use a load fence on the architecture; otherwise fall back to a full memory fence
KOKKOS_FORCEINLINE_FUNCTION
void load_fence()
{
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_USE_ISA_X86_64 )
asm volatile (
"lfence" ::: "memory"
);
#else
memory_fence();
#endif
}
} // namespace Kokkos
#endif
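// --- Illustrative sketch (not part of the patch): why the fences matter ---
// Hypothetical release/acquire handshake showing the role memory_fence(),
// store_fence() and load_fence() above play, written with std::atomic_thread_fence
// as a portable analogue of the compiler- and architecture-specific fences chosen
// by the #if ladder (sfence/lfence give weaker guarantees than a full fence, hence
// the fallback to memory_fence()). Names prefixed 'example_' are invented.
#include <atomic>
#include <cstdio>
#include <thread>

static int              example_payload = 0;
static std::atomic<int> example_flag(0);

static void example_producer() {
  example_payload = 42;                                         // plain store
  std::atomic_thread_fence(std::memory_order_release);          // ~ store_fence()
  example_flag.store(1, std::memory_order_relaxed);
}

static void example_consumer() {
  while (example_flag.load(std::memory_order_relaxed) == 0) {}  // spin on the flag
  std::atomic_thread_fence(std::memory_order_acquire);          // ~ load_fence()
  std::printf("%d\n", example_payload);                         // guaranteed to see 42
}

int main() {
  std::thread t1(example_producer), t2(example_consumer);
  t1.join();
  t2.join();
  return 0;
}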
diff --git a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
index 91faed170..99c5df4db 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
@@ -1,186 +1,237 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <impl/Kokkos_Profiling_Interface.hpp>
#if (KOKKOS_ENABLE_PROFILING)
#include <string.h>
namespace Kokkos {
namespace Profiling {
+
+ SpaceHandle::SpaceHandle(const char* space_name) {
+ strncpy(name,space_name,64);
+ }
+
bool profileLibraryLoaded() {
return (NULL != initProfileLibrary);
}
void beginParallelFor(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginForCallee) {
Kokkos::fence();
(*beginForCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelFor(const uint64_t kernelID) {
if(NULL != endForCallee) {
Kokkos::fence();
(*endForCallee)(kernelID);
}
}
void beginParallelScan(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginScanCallee) {
Kokkos::fence();
(*beginScanCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelScan(const uint64_t kernelID) {
if(NULL != endScanCallee) {
Kokkos::fence();
(*endScanCallee)(kernelID);
}
}
void beginParallelReduce(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginReduceCallee) {
Kokkos::fence();
(*beginReduceCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelReduce(const uint64_t kernelID) {
if(NULL != endReduceCallee) {
Kokkos::fence();
(*endReduceCallee)(kernelID);
}
}
+
+ void pushRegion(const std::string& kName) {
+ if( NULL != pushRegionCallee ) {
+ Kokkos::fence();
+ (*pushRegionCallee)(kName.c_str());
+ }
+ }
+
+ void popRegion() {
+ if( NULL != popRegionCallee ) {
+ Kokkos::fence();
+ (*popRegionCallee)();
+ }
+ }
+
+ void allocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size) {
+ if(NULL != allocateDataCallee) {
+ (*allocateDataCallee)(space,label.c_str(),ptr,size);
+ }
+ }
+
+ void deallocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size) {
+ if(NULL != deallocateDataCallee) {
+ (*deallocateDataCallee)(space,label.c_str(),ptr,size);
+ }
+ }
+
void initialize() {
// Make sure the initialize call happens only once
static int is_initialized = 0;
if(is_initialized) return;
is_initialized = 1;
void* firstProfileLibrary;
char* envProfileLibrary = getenv("KOKKOS_PROFILE_LIBRARY");
// If we do not find a profiling library in the environment then exit
// early.
if( NULL == envProfileLibrary ) {
return ;
}
char* envProfileCopy = (char*) malloc(sizeof(char) * (strlen(envProfileLibrary) + 1));
sprintf(envProfileCopy, "%s", envProfileLibrary);
char* profileLibraryName = strtok(envProfileCopy, ";");
if( (NULL != profileLibraryName) && (strcmp(profileLibraryName, "") != 0) ) {
firstProfileLibrary = dlopen(profileLibraryName, RTLD_NOW | RTLD_GLOBAL);
if(NULL == firstProfileLibrary) {
std::cerr << "Error: Unable to load KokkosP library: " <<
profileLibraryName << std::endl;
} else {
std::cout << "KokkosP: Library Loaded: " << profileLibraryName << std::endl;
// dlsym returns a pointer to an object, while we want to assign it to a pointer to function.
// A direct cast would give warnings; hence we work around the issue by casting pointer to pointers.
auto p1 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_for");
beginForCallee = *((beginFunction*) &p1);
auto p2 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_scan");
beginScanCallee = *((beginFunction*) &p2);
auto p3 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_reduce");
beginReduceCallee = *((beginFunction*) &p3);
auto p4 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_scan");
endScanCallee = *((endFunction*) &p4);
auto p5 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_for");
endForCallee = *((endFunction*) &p5);
auto p6 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_reduce");
endReduceCallee = *((endFunction*) &p6);
auto p7 = dlsym(firstProfileLibrary, "kokkosp_init_library");
initProfileLibrary = *((initFunction*) &p7);
auto p8 = dlsym(firstProfileLibrary, "kokkosp_finalize_library");
finalizeProfileLibrary = *((finalizeFunction*) &p8);
+
+ auto p9 = dlsym(firstProfileLibrary, "kokkosp_push_profile_region");
+ pushRegionCallee = *((pushFunction*) &p9);
+ auto p10 = dlsym(firstProfileLibrary, "kokkosp_pop_profile_region");
+ popRegionCallee = *((popFunction*) &p10);
+
+ auto p11 = dlsym(firstProfileLibrary, "kokkosp_allocate_data");
+ allocateDataCallee = *((allocateDataFunction*) &p11);
+ auto p12 = dlsym(firstProfileLibrary, "kokkosp_deallocate_data");
+ deallocateDataCallee = *((deallocateDataFunction*) &p12);
+
}
}
if(NULL != initProfileLibrary) {
(*initProfileLibrary)(0,
(uint64_t) KOKKOSP_INTERFACE_VERSION,
(uint32_t) 0,
NULL);
}
free(envProfileCopy);
}
void finalize() {
// Make sure the finalize call happens only once
static int is_finalized = 0;
if(is_finalized) return;
is_finalized = 1;
if(NULL != finalizeProfileLibrary) {
(*finalizeProfileLibrary)();
// Set all profile hooks to NULL to prevent
// any additional calls. Once we are told to
// finalize, we mean it
+ initProfileLibrary = NULL;
+ finalizeProfileLibrary = NULL;
+
beginForCallee = NULL;
beginScanCallee = NULL;
beginReduceCallee = NULL;
endScanCallee = NULL;
endForCallee = NULL;
endReduceCallee = NULL;
- initProfileLibrary = NULL;
- finalizeProfileLibrary = NULL;
+
+ pushRegionCallee = NULL;
+ popRegionCallee = NULL;
+
+ allocateDataCallee = NULL;
+ deallocateDataCallee = NULL;
+
}
}
}
}
#endif
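// --- Illustrative sketch (not part of the patch): loading one hook with dlsym ---
// Hypothetical stand-alone version of the hook-loading done in initialize() above:
// dlopen the library named by KOKKOS_PROFILE_LIBRARY, look the symbol up with dlsym,
// and convert the returned void* to a function pointer by casting pointer-to-pointer,
// as the comment in the patch explains. The symbol name and the beginFunction
// signature come from the interface above; build with -ldl on Linux.
#include <dlfcn.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

typedef void (*beginFunction)(const char*, const uint32_t, uint64_t*);

int main() {
  const char * lib = std::getenv("KOKKOS_PROFILE_LIBRARY");
  if (NULL == lib) return 0;                                  // no tool requested

  void * handle = dlopen(lib, RTLD_NOW | RTLD_GLOBAL);
  if (NULL == handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  // dlsym returns an object pointer; cast through a pointer-to-pointer to get a
  // function pointer without a direct-cast warning.
  void * sym = dlsym(handle, "kokkosp_begin_parallel_for");
  beginFunction beginForCallee = *((beginFunction*) &sym);

  if (NULL != beginForCallee) {
    uint64_t kernelID = 0;
    (*beginForCallee)("example_kernel", 0, &kernelID);
  }
  dlclose(handle);
  return 0;
}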
diff --git a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
index 4f0125633..3d6a38925 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
@@ -1,118 +1,151 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOSP_INTERFACE_HPP
#define KOKKOSP_INTERFACE_HPP
#include <cstddef>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Macros.hpp>
#include <string>
+#include <cinttypes>
#if (KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_DeviceInfo.hpp>
#include <dlfcn.h>
#include <iostream>
#include <stdlib.h>
#endif
#define KOKKOSP_INTERFACE_VERSION 20150628
#if (KOKKOS_ENABLE_PROFILING)
namespace Kokkos {
namespace Profiling {
+ struct SpaceHandle {
+ SpaceHandle(const char* space_name);
+ char name[64];
+ };
+
typedef void (*initFunction)(const int,
const uint64_t,
const uint32_t,
KokkosPDeviceInfo*);
typedef void (*finalizeFunction)();
typedef void (*beginFunction)(const char*, const uint32_t, uint64_t*);
typedef void (*endFunction)(uint64_t);
+ typedef void (*pushFunction)(const char*);
+ typedef void (*popFunction)();
+
+ typedef void (*allocateDataFunction)(const SpaceHandle, const char*, const void*, const uint64_t);
+ typedef void (*deallocateDataFunction)(const SpaceHandle, const char*, const void*, const uint64_t);
+
+
static initFunction initProfileLibrary = NULL;
static finalizeFunction finalizeProfileLibrary = NULL;
+
static beginFunction beginForCallee = NULL;
static beginFunction beginScanCallee = NULL;
static beginFunction beginReduceCallee = NULL;
static endFunction endForCallee = NULL;
static endFunction endScanCallee = NULL;
static endFunction endReduceCallee = NULL;
+ static pushFunction pushRegionCallee = NULL;
+ static popFunction popRegionCallee = NULL;
+
+ static allocateDataFunction allocateDataCallee = NULL;
+ static deallocateDataFunction deallocateDataCallee = NULL;
+
+
bool profileLibraryLoaded();
void beginParallelFor(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelFor(const uint64_t kernelID);
void beginParallelScan(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelScan(const uint64_t kernelID);
void beginParallelReduce(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelReduce(const uint64_t kernelID);
+ void pushRegion(const std::string& kName);
+ void popRegion();
+
+ void allocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size);
+ void deallocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size);
+
void initialize();
void finalize();
// Define finalize_fake inline to get rid of warnings about unused static variables
inline void finalize_fake() {
if(NULL != finalizeProfileLibrary) {
(*finalizeProfileLibrary)();
// Set all profile hooks to NULL to prevent
// any additional calls. Once we are told to
// finalize, we mean it
beginForCallee = NULL;
beginScanCallee = NULL;
beginReduceCallee = NULL;
endScanCallee = NULL;
endForCallee = NULL;
endReduceCallee = NULL;
+
+ allocateDataCallee = NULL;
+ deallocateDataCallee = NULL;
+
initProfileLibrary = NULL;
finalizeProfileLibrary = NULL;
+ pushRegionCallee = NULL;
+ popRegionCallee = NULL;
}
}
}
}
#endif
#endif
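// --- Illustrative sketch (not part of the patch): a minimal KokkosP tool ---
// Hypothetical shared library exporting the entry points that initialize() looks up
// with dlsym; the signatures mirror the typedefs in this header. KokkosPDeviceInfo is
// treated as opaque and the local SpaceHandle only mirrors the char[64] layout above.
// Build (assumption, e.g. on Linux): g++ -shared -fPIC -o libexample_tool.so ... and
// point KOKKOS_PROFILE_LIBRARY at the resulting file.
#include <cstdint>
#include <cstdio>

struct KokkosPDeviceInfo;                       // opaque to this sketch
struct SpaceHandle { char name[64]; };          // layout mirrors Kokkos::Profiling::SpaceHandle

extern "C" {

void kokkosp_init_library(const int, const uint64_t interfaceVer,
                          const uint32_t, KokkosPDeviceInfo*) {
  std::printf("tool: init (interface %llu)\n", (unsigned long long) interfaceVer);
}

void kokkosp_finalize_library() { std::printf("tool: finalize\n"); }

void kokkosp_begin_parallel_for(const char* name, const uint32_t devID, uint64_t* kID) {
  *kID = 0;                                     // a real tool would hand out unique ids
  std::printf("tool: begin parallel_for '%s' on device %u\n", name, (unsigned) devID);
}

void kokkosp_end_parallel_for(const uint64_t kID) {
  std::printf("tool: end parallel_for %llu\n", (unsigned long long) kID);
}

void kokkosp_push_profile_region(const char* name) { std::printf("tool: push '%s'\n", name); }
void kokkosp_pop_profile_region()                  { std::printf("tool: pop\n"); }

void kokkosp_allocate_data(const SpaceHandle space, const char* label,
                           const void*, const uint64_t size) {
  std::printf("tool: allocate %llu bytes '%s' in %s\n",
              (unsigned long long) size, label, space.name);
}

void kokkosp_deallocate_data(const SpaceHandle space, const char* label,
                             const void*, const uint64_t size) {
  std::printf("tool: deallocate %llu bytes '%s' in %s\n",
              (unsigned long long) size, label, space.name);
}

}  // extern "C"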
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
index e8bdbde6c..eb881545d 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
@@ -1,147 +1,148 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
-#if defined( KOKKOS_HAVE_SERIAL ) && defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_HAVE_SERIAL ) && defined( KOKKOS_ENABLE_TASKDAG )
+#include <impl/Kokkos_Serial_Task.hpp>
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::Serial > ;
void TaskQueueSpecialization< Kokkos::Serial >::execute
( TaskQueue< Kokkos::Serial > * const queue )
{
using execution_space = Kokkos::Serial ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
using Member = TaskExec< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member exec ;
// Loop until all queues are empty
while ( 0 < queue->m_ready_count ) {
task_root_type * task = end ;
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
task = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
if ( end != task ) {
// pop_task resulted in lock == task->m_next
// In the executing state
(*task->m_apply)( task , & exec );
#if 0
printf( "TaskQueue<Serial>::executed: 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
// If it was a respawn then re-enqueue; otherwise the task is complete
// and all tasks waiting on this task are updated.
queue->complete( task );
}
else if ( 0 != queue->m_ready_count ) {
Kokkos::abort("TaskQueue<Serial>::execute ERROR: ready_count");
}
}
}
void TaskQueueSpecialization< Kokkos::Serial > ::
iff_single_thread_recursive_execute(
TaskQueue< Kokkos::Serial > * const queue )
{
using execution_space = Kokkos::Serial ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
using Member = TaskExec< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member exec ;
// Loop until no runnable task
task_root_type * task = end ;
do {
task = end ;
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
task = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
(*task->m_apply)( task , & exec );
queue->complete( task );
} while(1);
}
}} /* namespace Kokkos::Impl */
-#endif /* #if defined( KOKKOS_HAVE_SERIAL ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_HAVE_SERIAL ) && defined( KOKKOS_ENABLE_TASKDAG ) */
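
For readers unfamiliar with the queue layout used by execute() above: tasks are drained from NumQueue priority levels, each with two sub-queues, until the ready count reaches zero. A simplified standalone sketch of that drain pattern follows (plain C++, no Kokkos types; the Task alias and drain() name are placeholders, and dependency release via queue->complete is omitted):

#include <deque>
#include <functional>
#include <vector>

// Simplified model of the drain loop in TaskQueue<Serial>::execute():
// scan the ready queues in priority order, pop the first available task,
// run it, and repeat until every queue is empty.
using Task = std::function<void()>;

void drain(std::vector<std::deque<Task>>& ready) {  // one deque per priority
  for (;;) {
    bool ran = false;
    for (auto& q : ready) {          // highest-priority queue first
      if (!q.empty()) {
        Task t = std::move(q.front());
        q.pop_front();
        t();                         // corresponds to (*task->m_apply)(task, &exec)
        ran = true;
        break;
      }
    }
    if (!ran) break;                 // no runnable task left
  }
}
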
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
index 48a110c5f..473b7aadb 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
@@ -1,271 +1,308 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_SERIAL_TASK_HPP
#define KOKKOS_IMPL_SERIAL_TASK_HPP
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+#include <impl/Kokkos_TaskQueue.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template<>
class TaskQueueSpecialization< Kokkos::Serial >
{
public:
using execution_space = Kokkos::Serial ;
using memory_space = Kokkos::HostSpace ;
using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
using task_base_type = Kokkos::Impl::TaskBase< execution_space , void , void > ;
static
void iff_single_thread_recursive_execute( queue_type * const );
static
void execute( queue_type * const );
template< typename FunctorType >
static
void proc_set_apply( task_base_type::function_type * ptr )
{
using TaskType = TaskBase< Kokkos::Serial
, typename FunctorType::value_type
, FunctorType
> ;
*ptr = TaskType::apply ;
}
};
extern template class TaskQueue< Kokkos::Serial > ;
//----------------------------------------------------------------------------
template<>
class TaskExec< Kokkos::Serial >
{
public:
KOKKOS_INLINE_FUNCTION void team_barrier() const {}
KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
KOKKOS_INLINE_FUNCTION int team_size() const { return 1 ; }
};
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType, TaskExec< Kokkos::Serial > >
{
typedef iType index_type;
const iType start ;
const iType end ;
enum {increment = 1};
//const TaskExec< Kokkos::Serial > & thread;
TaskExec< Kokkos::Serial > & thread;
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct
//( const TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
( TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
: start(0)
, end(arg_count)
, thread(arg_thread)
{}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct
//( const TaskExec< Kokkos::Serial > & arg_thread
( TaskExec< Kokkos::Serial > & arg_thread
, const iType& arg_start
, const iType & arg_end
)
: start( arg_start )
, end( arg_end)
, thread( arg_thread )
{}
};
+//----------------------------------------------------------------------------
+
+template<typename iType>
+struct ThreadVectorRangeBoundariesStruct<iType, TaskExec< Kokkos::Serial > >
+{
+ typedef iType index_type;
+ const iType start ;
+ const iType end ;
+ enum {increment = 1};
+ TaskExec< Kokkos::Serial > & thread;
+
+ KOKKOS_INLINE_FUNCTION
+ ThreadVectorRangeBoundariesStruct
+ ( TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
+ : start( 0 )
+ , end(arg_count)
+ , thread(arg_thread)
+ {}
+};
+
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-/*
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >
-TeamThreadRange( const Impl::TaskExec< Kokkos::Serial > & thread
- , const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >(thread,count);
-}
-*/
-//TODO const issue omp
-template<typename iType>
+
+// OMP version needs non-const TaskExec
+template< typename iType >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >
-TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread
- , const iType & count )
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >
+TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread, const iType & count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >(thread,count);
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >( thread, count );
}
-/*
-template<typename iType>
+
+// OMP version needs non-const TaskExec
+template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Serial > >
-TeamThreadRange( const Impl:: TaskExec< Kokkos::Serial > & thread, const iType & start , const iType & end )
+Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
+ Impl::TaskExec< Kokkos::Serial > >
+TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread, const iType1 & start, const iType2 & end )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Serial > >(thread,start,end);
+ typedef typename std::common_type< iType1, iType2 >::type iType;
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >(
+ thread, iType(start), iType(end) );
}
-*/
-//TODO const issue omp
+
+// OMP version needs non-const TaskExec
template<typename iType>
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Serial > >
-TeamThreadRange( Impl:: TaskExec< Kokkos::Serial > & thread, const iType & start , const iType & end )
+Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >
+ThreadVectorRange
+ ( Impl::TaskExec< Kokkos::Serial > & thread
+ , const iType & count )
{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Serial > >(thread,start,end);
+ return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >(thread,count);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
 * This functionality requires C++11 support. */
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Serial > >& loop_boundaries, const Lambda& lambda) {
+void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result)
{
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i, result);
initialized_result = result;
}
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
const Lambda & lambda,
const JoinType & join,
ValueType& initialized_result)
{
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i, result);
initialized_result = result;
}
-// placeholder for future function
+
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result)
{
+ initialized_result = ValueType();
+#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
+ ValueType tmp = ValueType();
+ lambda(i,tmp);
+ initialized_result+=tmp;
+ }
}
-// placeholder for future function
+
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
const Lambda & lambda,
const JoinType & join,
ValueType& initialized_result)
{
+ ValueType result = initialized_result;
+#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
+ ValueType tmp = ValueType();
+ lambda(i,tmp);
+ join(result,tmp);
+ }
+ initialized_result = result;
}
template< typename ValueType, typename iType, class Lambda >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
const Lambda & lambda)
{
ValueType accum = 0 ;
ValueType val, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
local_total = 0;
lambda(i,local_total,false);
val = accum;
lambda(i,val,true);
accum += local_total;
}
}
// placeholder for future function
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
const Lambda & lambda)
{
}
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_SERIAL_TASK_HPP */
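
The header above adds ThreadVectorRange support and nested parallel_for/parallel_reduce overloads for code running inside a serial task. A minimal sketch of how those overloads could be exercised, assuming KOKKOS_ENABLE_TASKDAG is enabled, the task headers are pulled in, and the function is called from within an executing task whose TaskExec<Kokkos::Serial> handle is passed as member; the function name and data arguments are illustrative:

#include <Kokkos_Core.hpp>

// Nested parallelism inside a serial task: both loops run sequentially on
// the Serial backend, but use the portable TeamThreadRange/ThreadVectorRange
// interface declared in this header.
double nested_sum(Kokkos::Impl::TaskExec<Kokkos::Serial>& member,
                  const double* x, const int n) {
  double team_total = 0.0;
  Kokkos::parallel_reduce(Kokkos::TeamThreadRange(member, n),
    [&](const int i, double& sum) { sum += x[i]; },
    team_total);

  double vector_total = 0.0;  // reset to ValueType() inside the overload
  Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(member, n),
    [&](const int i, double& sum) { sum += x[i]; },
    vector_total);

  return team_total + vector_total;
}
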
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_TaskPolicy.cpp b/lib/kokkos/core/src/impl/Kokkos_Serial_TaskPolicy.cpp
deleted file mode 100644
index 1577df07c..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_TaskPolicy.cpp
+++ /dev/null
@@ -1,348 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-// Experimental unified task-data parallel manycore LDRD
-
-#include <impl/Kokkos_Serial_TaskPolicy.hpp>
-
-#if defined( KOKKOS_HAVE_SERIAL ) && defined( KOKKOS_ENABLE_TASKPOLICY )
-
-#include <stdlib.h>
-#include <stdexcept>
-#include <iostream>
-#include <sstream>
-#include <string>
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-TaskPolicy< Kokkos::Serial >::member_type &
-TaskPolicy< Kokkos::Serial >::member_single()
-{
- static member_type s(0,1,0);
- return s ;
-}
-
-} // namespace Experimental
-} // namespace Kokkos
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-typedef TaskMember< Kokkos::Serial , void , void > Task ;
-
-//----------------------------------------------------------------------------
-
-namespace {
-
-inline
-unsigned padded_sizeof_derived( unsigned sizeof_derived )
-{
- return sizeof_derived +
- ( sizeof_derived % sizeof(Task*) ? sizeof(Task*) - sizeof_derived % sizeof(Task*) : 0 );
-}
-
-} // namespace
-
-void Task::deallocate( void * ptr )
-{
- free( ptr );
-}
-
-void * Task::allocate( const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity )
-{
- return malloc( padded_sizeof_derived( arg_sizeof_derived ) + arg_dependence_capacity * sizeof(Task*) );
-}
-
-Task::~TaskMember()
-{
-
-}
-
-Task::TaskMember( const Task::function_verify_type arg_verify
- , const Task::function_dealloc_type arg_dealloc
- , const Task::function_apply_type arg_apply
- , const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity
- )
- : m_dealloc( arg_dealloc )
- , m_verify( arg_verify )
- , m_apply( arg_apply )
- , m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
- , m_wait( 0 )
- , m_next( 0 )
- , m_dep_capacity( arg_dependence_capacity )
- , m_dep_size( 0 )
- , m_ref_count( 0 )
- , m_state( TASK_STATE_CONSTRUCTING )
-{
- for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
-}
-
-Task::TaskMember( const Task::function_dealloc_type arg_dealloc
- , const Task::function_apply_type arg_apply
- , const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity
- )
- : m_dealloc( arg_dealloc )
- , m_verify( & Task::verify_type<void> )
- , m_apply( arg_apply )
- , m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
- , m_wait( 0 )
- , m_next( 0 )
- , m_dep_capacity( arg_dependence_capacity )
- , m_dep_size( 0 )
- , m_ref_count( 0 )
- , m_state( TASK_STATE_CONSTRUCTING )
-{
- for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
-}
-
-//----------------------------------------------------------------------------
-
-void Task::throw_error_add_dependence() const
-{
- std::cerr << "TaskMember< Serial >::add_dependence ERROR"
- << " state(" << m_state << ")"
- << " dep_size(" << m_dep_size << ")"
- << std::endl ;
- throw std::runtime_error("TaskMember< Serial >::add_dependence ERROR");
-}
-
-void Task::throw_error_verify_type()
-{
- throw std::runtime_error("TaskMember< Serial >::verify_type ERROR");
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-
-void Task::assign( Task ** const lhs , Task * rhs , const bool no_throw )
-{
- static const char msg_error_header[] = "Kokkos::Experimental::Impl::TaskManager<Kokkos::Serial>::assign ERROR" ;
- static const char msg_error_count[] = ": negative reference count" ;
- static const char msg_error_complete[] = ": destroy task that is not complete" ;
- static const char msg_error_dependences[] = ": destroy task that has dependences" ;
- static const char msg_error_exception[] = ": caught internal exception" ;
-
- const char * msg_error = 0 ;
-
- try {
-
- if ( *lhs ) {
-
- const int count = --((**lhs).m_ref_count);
-
- if ( 0 == count ) {
-
- // Reference count at zero, delete it
-
- // Should only be deallocating a completed task
- if ( (**lhs).m_state == Kokkos::Experimental::TASK_STATE_COMPLETE ) {
-
- // A completed task should not have dependences...
- for ( int i = 0 ; i < (**lhs).m_dep_size && 0 == msg_error ; ++i ) {
- if ( (**lhs).m_dep[i] ) msg_error = msg_error_dependences ;
- }
- }
- else {
- msg_error = msg_error_complete ;
- }
-
- if ( 0 == msg_error ) {
- // Get deletion function and apply it
- const Task::function_dealloc_type d = (**lhs).m_dealloc ;
-
- (*d)( *lhs );
- }
- }
- else if ( count <= 0 ) {
- msg_error = msg_error_count ;
- }
- }
-
- if ( 0 == msg_error && rhs ) { ++( rhs->m_ref_count ); }
-
- *lhs = rhs ;
- }
- catch( ... ) {
- if ( 0 == msg_error ) msg_error = msg_error_exception ;
- }
-
- if ( 0 != msg_error ) {
- if ( no_throw ) {
- std::cerr << msg_error_header << msg_error << std::endl ;
- std::cerr.flush();
- }
- else {
- std::string msg(msg_error_header);
- msg.append(msg_error);
- throw std::runtime_error( msg );
- }
- }
-}
-#endif
-
-namespace {
-
-Task * s_ready = 0 ;
-Task * s_denied = reinterpret_cast<Task*>( ~((uintptr_t)0) );
-
-}
-
-void Task::schedule()
-{
- // Execute ready tasks in case the task being scheduled
- // is dependent upon a waiting and ready task.
-
- Task::execute_ready_tasks();
-
- // spawning : Constructing -> Waiting
- // respawning : Executing -> Waiting
- // updating : Waiting -> Waiting
-
- // Must not be in a dependence linked list: 0 == t->m_next
-
- const bool ok_state = TASK_STATE_COMPLETE != m_state ;
- const bool ok_list = 0 == m_next ;
-
- if ( ok_state && ok_list ) {
-
- if ( TASK_STATE_CONSTRUCTING == m_state ) {
- // Initial scheduling increment,
- // matched by decrement when task is complete.
- ++m_ref_count ;
- }
-
- // Will be waiting for execution upon return from this function
-
- m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
-
- // Insert this task into another dependence that is not complete
-
- int i = 0 ;
- for ( ; i < m_dep_size ; ++i ) {
- Task * const y = m_dep[i] ;
- if ( y && s_denied != ( m_next = y->m_wait ) ) {
- y->m_wait = this ; // CAS( & y->m_wait , m_next , this );
- break ;
- }
- }
- if ( i == m_dep_size ) {
- // All dependences are complete, insert into the ready list
- m_next = s_ready ;
- s_ready = this ; // CAS( & s_ready , m_next = s_ready , this );
- }
- }
- else {
- throw std::runtime_error(std::string("Kokkos::Experimental::Impl::Task spawn or respawn state error"));
- }
-}
-
-void Task::execute_ready_tasks()
-{
- while ( s_ready ) {
-
- // Remove this task from the ready list
-
- // Task * task ;
- // while ( ! CAS( & s_ready , task = s_ready , s_ready->m_next ) );
-
- Task * task = s_ready ;
-
- s_ready = task->m_next ;
-
- task->m_next = 0 ;
-
- // precondition: task->m_state = TASK_STATE_WAITING
- // precondition: task->m_dep[i]->m_state == TASK_STATE_COMPLETE for all i
- // precondition: does not exist T such that T->m_wait = task
- // precondition: does not exist T such that T->m_next = task
-
- task->m_state = Kokkos::Experimental::TASK_STATE_EXECUTING ;
-
- (*task->m_apply)( task );
-
- if ( task->m_state == Kokkos::Experimental::TASK_STATE_EXECUTING ) {
- // task did not respawn itself
- task->m_state = Kokkos::Experimental::TASK_STATE_COMPLETE ;
-
- // release dependences:
- for ( int i = 0 ; i < task->m_dep_size ; ++i ) {
- assign( task->m_dep + i , 0 );
- }
-
- // Stop other tasks from adding themselves to 'task->m_wait' ;
-
- Task * x ;
- // CAS( & task->m_wait , x = task->m_wait , s_denied );
- x = task->m_wait ; task->m_wait = s_denied ;
-
- // update tasks waiting on this task
- while ( x ) {
- Task * const next = x->m_next ;
-
- x->m_next = 0 ;
-
- x->schedule(); // could happen concurrently
-
- x = next ;
- }
-
- // Decrement to match the initial scheduling increment
- assign( & task , 0 );
- }
- }
-}
-
-} // namespace Impl
-} // namespace Experimental
-} // namespace Kokkos
-
-#endif /* #if defined( KOKKOS_HAVE_SERIAL ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
-
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_TaskPolicy.hpp b/lib/kokkos/core/src/impl/Kokkos_Serial_TaskPolicy.hpp
deleted file mode 100644
index a333f948a..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_TaskPolicy.hpp
+++ /dev/null
@@ -1,677 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-// Experimental unified task-data parallel manycore LDRD
-
-#ifndef KOKKOS_EXPERIMENTAL_SERIAL_TASKPOLICY_HPP
-#define KOKKOS_EXPERIMENTAL_SERIAL_TASKPOLICY_HPP
-
-#include <Kokkos_Macros.hpp>
-
-#if defined( KOKKOS_HAVE_SERIAL )
-
-#include <string>
-#include <typeinfo>
-#include <stdexcept>
-
-#include <Kokkos_Serial.hpp>
-#include <Kokkos_TaskPolicy.hpp>
-#include <Kokkos_View.hpp>
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-#include <impl/Kokkos_FunctorAdapter.hpp>
-
-//----------------------------------------------------------------------------
-/* Inheritance structure to allow static_cast from the task root type
- * and a task's FunctorType.
- *
- * task_root_type == TaskMember< Space , void , void >
- *
- * TaskMember< PolicyType , ResultType , FunctorType >
- * : TaskMember< PolicyType::Space , ResultType , FunctorType >
- * { ... };
- *
- * TaskMember< Space , ResultType , FunctorType >
- * : TaskMember< Space , ResultType , void >
- * , FunctorType
- * { ... };
- *
- * when ResultType != void
- *
- * TaskMember< Space , ResultType , void >
- * : TaskMember< Space , void , void >
- * { ... };
- *
- */
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-
-/** \brief Base class for all tasks in the Serial execution space */
-template<>
-class TaskMember< Kokkos::Serial , void , void >
-{
-public:
-
- typedef void (* function_apply_type) ( TaskMember * );
- typedef void (* function_dealloc_type)( TaskMember * );
- typedef TaskMember * (* function_verify_type) ( TaskMember * );
-
-private:
-
- const function_dealloc_type m_dealloc ; ///< Deallocation
- const function_verify_type m_verify ; ///< Result type verification
- const function_apply_type m_apply ; ///< Apply function
- TaskMember ** const m_dep ; ///< Dependences
- TaskMember * m_wait ; ///< Linked list of tasks waiting on this task
- TaskMember * m_next ; ///< Linked list of tasks waiting on a different task
- const int m_dep_capacity ; ///< Capacity of dependences
- int m_dep_size ; ///< Actual count of dependences
- int m_ref_count ; ///< Reference count
- int m_state ; ///< State of the task
-
- // size = 6 Pointers + 4 ints
-
- TaskMember() /* = delete */ ;
- TaskMember( const TaskMember & ) /* = delete */ ;
- TaskMember & operator = ( const TaskMember & ) /* = delete */ ;
-
- static void * allocate( const unsigned arg_sizeof_derived , const unsigned arg_dependence_capacity );
- static void deallocate( void * );
-
- void throw_error_add_dependence() const ;
- static void throw_error_verify_type();
-
- template < class DerivedTaskType >
- static
- void deallocate( TaskMember * t )
- {
- DerivedTaskType * ptr = static_cast< DerivedTaskType * >(t);
- ptr->~DerivedTaskType();
- deallocate( (void *) ptr );
- }
-
-protected :
-
- ~TaskMember();
-
- // Used by TaskMember< Serial , ResultType , void >
- TaskMember( const function_verify_type arg_verify
- , const function_dealloc_type arg_dealloc
- , const function_apply_type arg_apply
- , const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity
- );
-
- // Used for TaskMember< Serial , void , void >
- TaskMember( const function_dealloc_type arg_dealloc
- , const function_apply_type arg_apply
- , const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity
- );
-
-public:
-
- template< typename ResultType >
- KOKKOS_FUNCTION static
- TaskMember * verify_type( TaskMember * t )
- {
- enum { check_type = ! Kokkos::Impl::is_same< ResultType , void >::value };
-
- if ( check_type && t != 0 ) {
-
- // Verify that t->m_verify is this function
- const function_verify_type self = & TaskMember::template verify_type< ResultType > ;
-
- if ( t->m_verify != self ) {
- t = 0 ;
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- throw_error_verify_type();
-#endif
- }
- }
- return t ;
- }
-
- //----------------------------------------
- /* Inheritence Requirements on task types:
- * typedef FunctorType::value_type value_type ;
- * class DerivedTaskType
- * : public TaskMember< Serial , value_type , FunctorType >
- * { ... };
- * class TaskMember< Serial , value_type , FunctorType >
- * : public TaskMember< Serial , value_type , void >
- * , public Functor
- * { ... };
- * If value_type != void
- * class TaskMember< Serial , value_type , void >
- * : public TaskMember< Serial , void , void >
- *
- * Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
- *
- */
-
- /** \brief Allocate and construct a single-thread task */
- template< class DerivedTaskType >
- static
- TaskMember * create( const typename DerivedTaskType::functor_type & arg_functor
- , const unsigned arg_dependence_capacity
- )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
- typedef typename functor_type::value_type value_type ;
-
- DerivedTaskType * const task =
- new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
- DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
- , & TaskMember::template apply_single< functor_type , value_type >
- , sizeof(DerivedTaskType)
- , arg_dependence_capacity
- , arg_functor );
-
- return static_cast< TaskMember * >( task );
- }
-
- /** \brief Allocate and construct a data parallel task */
- template< class DerivedTaskType >
- static
- TaskMember * create( const typename DerivedTaskType::policy_type & arg_policy
- , const typename DerivedTaskType::functor_type & arg_functor
- , const unsigned arg_dependence_capacity
- )
- {
- DerivedTaskType * const task =
- new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
- DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
- , sizeof(DerivedTaskType)
- , arg_dependence_capacity
- , arg_policy
- , arg_functor
- );
-
- return static_cast< TaskMember * >( task );
- }
-
- /** \brief Allocate and construct a thread-team task */
- template< class DerivedTaskType >
- static
- TaskMember * create_team( const typename DerivedTaskType::functor_type & arg_functor
- , const unsigned arg_dependence_capacity
- )
- {
- typedef typename DerivedTaskType::functor_type functor_type ;
- typedef typename functor_type::value_type value_type ;
-
- DerivedTaskType * const task =
- new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
- DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
- , & TaskMember::template apply_team< functor_type , value_type >
- , sizeof(DerivedTaskType)
- , arg_dependence_capacity
- , arg_functor );
-
- return static_cast< TaskMember * >( task );
- }
-
- void schedule();
- static void execute_ready_tasks();
-
- //----------------------------------------
-
- typedef FutureValueTypeIsVoidError get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const { return get_result_type() ; }
-
- KOKKOS_INLINE_FUNCTION
- Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
-
- //----------------------------------------
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- static
- void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false );
-#else
- KOKKOS_INLINE_FUNCTION static
- void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false ) {}
-#endif
-
- KOKKOS_INLINE_FUNCTION
- TaskMember * get_dependence( int i ) const
- { return ( Kokkos::Experimental::TASK_STATE_EXECUTING == m_state && 0 <= i && i < m_dep_size ) ? m_dep[i] : (TaskMember*) 0 ; }
-
- KOKKOS_INLINE_FUNCTION
- int get_dependence() const
- { return m_dep_size ; }
-
- KOKKOS_INLINE_FUNCTION
- void clear_dependence()
- {
- for ( int i = 0 ; i < m_dep_size ; ++i ) assign( m_dep + i , 0 );
- m_dep_size = 0 ;
- }
-
- KOKKOS_INLINE_FUNCTION
- void add_dependence( TaskMember * before )
- {
- if ( ( Kokkos::Experimental::TASK_STATE_CONSTRUCTING == m_state ||
- Kokkos::Experimental::TASK_STATE_EXECUTING == m_state ) &&
- m_dep_size < m_dep_capacity ) {
- assign( m_dep + m_dep_size , before );
- ++m_dep_size ;
- }
- else {
- throw_error_add_dependence();
- }
- }
-
- //----------------------------------------
-
- template< class FunctorType , class ResultType >
- KOKKOS_INLINE_FUNCTION static
- void apply_single( typename Kokkos::Impl::enable_if< ! Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
- {
- typedef TaskMember< Kokkos::Serial , ResultType , FunctorType > derived_type ;
-
- // TaskMember< Kokkos::Serial , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Serial , ResultType , void >
- // , public FunctorType
- // { ... };
-
- derived_type & m = * static_cast< derived_type * >( t );
-
- Kokkos::Impl::FunctorApply< FunctorType , void , ResultType & >::apply( (FunctorType &) m , & m.m_result );
- }
-
- template< class FunctorType , class ResultType >
- KOKKOS_INLINE_FUNCTION static
- void apply_single( typename Kokkos::Impl::enable_if< Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
- {
- typedef TaskMember< Kokkos::Serial , ResultType , FunctorType > derived_type ;
-
- // TaskMember< Kokkos::Serial , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Serial , ResultType , void >
- // , public FunctorType
- // { ... };
-
- derived_type & m = * static_cast< derived_type * >( t );
-
- Kokkos::Impl::FunctorApply< FunctorType , void , void >::apply( (FunctorType &) m );
- }
-
- //----------------------------------------
-
- template< class FunctorType , class ResultType >
- static
- void apply_team( typename Kokkos::Impl::enable_if< ! Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
- {
- typedef TaskMember< Kokkos::Serial , ResultType , FunctorType > derived_type ;
- typedef Kokkos::Impl::SerialTeamMember member_type ;
-
- // TaskMember< Kokkos::Serial , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Serial , ResultType , void >
- // , public FunctorType
- // { ... };
-
- derived_type & m = * static_cast< derived_type * >( t );
-
- m.FunctorType::apply( member_type(0,1,0) , m.m_result );
- }
-
- template< class FunctorType , class ResultType >
- static
- void apply_team( typename Kokkos::Impl::enable_if< Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
- {
- typedef TaskMember< Kokkos::Serial , ResultType , FunctorType > derived_type ;
- typedef Kokkos::Impl::SerialTeamMember member_type ;
-
- // TaskMember< Kokkos::Serial , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Serial , ResultType , void >
- // , public FunctorType
- // { ... };
-
- derived_type & m = * static_cast< derived_type * >( t );
-
- m.FunctorType::apply( member_type(0,1,0) );
- }
-};
-
-//----------------------------------------------------------------------------
-/** \brief Base class for tasks with a result value in the Serial execution space.
- *
- * The FunctorType must be void because this class is accessed by the
- * Future class for the task and result value.
- *
- * Must be derived from TaskMember<S,void,void> 'root class' so the Future class
- * can correctly static_cast from the 'root class' to this class.
- */
-template < class ResultType >
-class TaskMember< Kokkos::Serial , ResultType , void >
- : public TaskMember< Kokkos::Serial , void , void >
-{
-public:
-
- ResultType m_result ;
-
- typedef const ResultType & get_result_type ;
-
- KOKKOS_INLINE_FUNCTION
- get_result_type get() const { return m_result ; }
-
-protected:
-
- typedef TaskMember< Kokkos::Serial , void , void > task_root_type ;
- typedef task_root_type::function_dealloc_type function_dealloc_type ;
- typedef task_root_type::function_apply_type function_apply_type ;
-
- inline
- TaskMember( const function_dealloc_type arg_dealloc
- , const function_apply_type arg_apply
- , const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity
- )
- : task_root_type( & task_root_type::template verify_type< ResultType >
- , arg_dealloc
- , arg_apply
- , arg_sizeof_derived
- , arg_dependence_capacity )
- , m_result()
- {}
-};
-
-template< class ResultType , class FunctorType >
-class TaskMember< Kokkos::Serial , ResultType , FunctorType >
- : public TaskMember< Kokkos::Serial , ResultType , void >
- , public FunctorType
-{
-public:
-
- typedef FunctorType functor_type ;
-
- typedef TaskMember< Kokkos::Serial , void , void > task_root_type ;
- typedef TaskMember< Kokkos::Serial , ResultType , void > task_base_type ;
- typedef task_root_type::function_dealloc_type function_dealloc_type ;
- typedef task_root_type::function_apply_type function_apply_type ;
-
- inline
- TaskMember( const function_dealloc_type arg_dealloc
- , const function_apply_type arg_apply
- , const unsigned arg_sizeof_derived
- , const unsigned arg_dependence_capacity
- , const functor_type & arg_functor
- )
- : task_base_type( arg_dealloc , arg_apply , arg_sizeof_derived , arg_dependence_capacity )
- , functor_type( arg_functor )
- {}
-};
-
-} /* namespace Impl */
-} /* namespace Experimental */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Experimental {
-
-template<>
-class TaskPolicy< Kokkos::Serial >
-{
-public:
-
- typedef Kokkos::Serial execution_space ;
- typedef Kokkos::Impl::SerialTeamMember member_type ;
-
-private:
-
- typedef Impl::TaskMember< execution_space , void , void > task_root_type ;
-
- template< class FunctorType >
- static inline
- const task_root_type * get_task_root( const FunctorType * f )
- {
- typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
- return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
- }
-
- template< class FunctorType >
- static inline
- task_root_type * get_task_root( FunctorType * f )
- {
- typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
- return static_cast< task_root_type * >( static_cast< task_type * >(f) );
- }
-
- unsigned m_default_dependence_capacity ;
-
-public:
-
- // Stubbed out for now.
- KOKKOS_INLINE_FUNCTION
- int allocated_task_count() const { return 0 ; }
-
- TaskPolicy
- ( const unsigned /* arg_task_max_count */
- , const unsigned /* arg_task_max_size */
- , const unsigned arg_task_default_dependence_capacity = 4
- , const unsigned /* arg_task_team_size */ = 0
- )
- : m_default_dependence_capacity( arg_task_default_dependence_capacity )
- {}
-
- KOKKOS_FUNCTION TaskPolicy() = default ;
- KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
- KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
-
- //----------------------------------------
-
- template< class ValueType >
- KOKKOS_INLINE_FUNCTION
- const Future< ValueType , execution_space > &
- spawn( const Future< ValueType , execution_space > & f
- , const bool priority = false ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- f.m_task->schedule();
-#endif
- return f ;
- }
-
- //----------------------------------------
- // Create single-thread task
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- task_create( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u ) const
- {
- typedef typename FunctorType::value_type value_type ;
- typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
- return Future< value_type , execution_space >(
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- task_root_type::create< task_type >(
- functor , ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity ) )
-#endif
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- proc_create( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u ) const
- { return task_create( functor , dependence_capacity ); }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- task_create_team( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u ) const
- {
- typedef typename FunctorType::value_type value_type ;
- typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
- return Future< value_type , execution_space >(
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- task_root_type::create_team< task_type >(
- functor , ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity ) )
-#endif
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< typename FunctorType::value_type , execution_space >
- proc_create_team( const FunctorType & functor
- , const unsigned dependence_capacity = ~0u ) const
- { return task_create_team( functor , dependence_capacity ); }
-
- //----------------------------------------
- // Add dependence
- template< class A1 , class A2 , class A3 , class A4 >
- KOKKOS_INLINE_FUNCTION
- void add_dependence( const Future<A1,A2> & after
- , const Future<A3,A4> & before
- , typename Kokkos::Impl::enable_if
- < Kokkos::Impl::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
- &&
- Kokkos::Impl::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
- >::type * = 0
- ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- after.m_task->add_dependence( before.m_task );
-#endif
- }
-
- //----------------------------------------
- // Functions for an executing task functor to query dependences,
- // set new dependences, and respawn itself.
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- Future< void , execution_space >
- get_dependence( const FunctorType * task_functor , int i ) const
- {
- return Future<void,execution_space>(
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- get_task_root(task_functor)->get_dependence(i)
-#endif
- );
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- int get_dependence( const FunctorType * task_functor ) const
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return get_task_root(task_functor)->get_dependence(); }
-#else
- { return 0 ; }
-#endif
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void clear_dependence( FunctorType * task_functor ) const
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { get_task_root(task_functor)->clear_dependence(); }
-#else
- {}
-#endif
-
- template< class FunctorType , class A3 , class A4 >
- KOKKOS_INLINE_FUNCTION
- void add_dependence( FunctorType * task_functor
- , const Future<A3,A4> & before
- , typename Kokkos::Impl::enable_if
- < Kokkos::Impl::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
- >::type * = 0
- ) const
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { get_task_root(task_functor)->add_dependence( before.m_task ); }
-#else
- {}
-#endif
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void respawn( FunctorType * task_functor
- , const bool priority = false ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- get_task_root(task_functor)->schedule();
-#endif
- }
-
- template< class FunctorType >
- KOKKOS_INLINE_FUNCTION
- void respawn_needing_memory( FunctorType * task_functor ) const
- {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- get_task_root(task_functor)->schedule();
-#endif
- }
-
- //----------------------------------------
-
- static member_type & member_single();
-};
-
-inline
-void wait( TaskPolicy< Kokkos::Serial > & )
-{ Impl::TaskMember< Kokkos::Serial , void , void >::execute_ready_tasks(); }
-
-} /* namespace Experimental */
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-#endif /* defined( KOKKOS_HAVE_SERIAL ) */
-#endif /* #define KOKKOS_EXPERIMENTAL_SERIAL_TASK_HPP */
-
diff --git a/lib/kokkos/core/src/impl/Kokkos_Shape.cpp b/lib/kokkos/core/src/impl/Kokkos_Shape.cpp
deleted file mode 100644
index da12db1f3..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Shape.cpp
+++ /dev/null
@@ -1,178 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-
-#include <sstream>
-#include <impl/Kokkos_Error.hpp>
-#include <impl/Kokkos_Shape.hpp>
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-void assert_counts_are_equal_throw(
- const size_t x_count ,
- const size_t y_count )
-{
- std::ostringstream msg ;
-
- msg << "Kokkos::Impl::assert_counts_are_equal_throw( "
- << x_count << " != " << y_count << " )" ;
-
- throw_runtime_exception( msg.str() );
-}
-
-void assert_shapes_are_equal_throw(
- const unsigned x_scalar_size ,
- const unsigned x_rank ,
- const size_t x_N0 , const unsigned x_N1 ,
- const unsigned x_N2 , const unsigned x_N3 ,
- const unsigned x_N4 , const unsigned x_N5 ,
- const unsigned x_N6 , const unsigned x_N7 ,
-
- const unsigned y_scalar_size ,
- const unsigned y_rank ,
- const size_t y_N0 , const unsigned y_N1 ,
- const unsigned y_N2 , const unsigned y_N3 ,
- const unsigned y_N4 , const unsigned y_N5 ,
- const unsigned y_N6 , const unsigned y_N7 )
-{
- std::ostringstream msg ;
-
- msg << "Kokkos::Impl::assert_shape_are_equal_throw( {"
- << " scalar_size(" << x_scalar_size
- << ") rank(" << x_rank
- << ") dimension(" ;
- if ( 0 < x_rank ) { msg << " " << x_N0 ; }
- if ( 1 < x_rank ) { msg << " " << x_N1 ; }
- if ( 2 < x_rank ) { msg << " " << x_N2 ; }
- if ( 3 < x_rank ) { msg << " " << x_N3 ; }
- if ( 4 < x_rank ) { msg << " " << x_N4 ; }
- if ( 5 < x_rank ) { msg << " " << x_N5 ; }
- if ( 6 < x_rank ) { msg << " " << x_N6 ; }
- if ( 7 < x_rank ) { msg << " " << x_N7 ; }
- msg << " ) } != { "
- << " scalar_size(" << y_scalar_size
- << ") rank(" << y_rank
- << ") dimension(" ;
- if ( 0 < y_rank ) { msg << " " << y_N0 ; }
- if ( 1 < y_rank ) { msg << " " << y_N1 ; }
- if ( 2 < y_rank ) { msg << " " << y_N2 ; }
- if ( 3 < y_rank ) { msg << " " << y_N3 ; }
- if ( 4 < y_rank ) { msg << " " << y_N4 ; }
- if ( 5 < y_rank ) { msg << " " << y_N5 ; }
- if ( 6 < y_rank ) { msg << " " << y_N6 ; }
- if ( 7 < y_rank ) { msg << " " << y_N7 ; }
- msg << " ) } )" ;
-
- throw_runtime_exception( msg.str() );
-}
-
-void AssertShapeBoundsAbort< Kokkos::HostSpace >::apply(
- const size_t rank ,
- const size_t n0 , const size_t n1 ,
- const size_t n2 , const size_t n3 ,
- const size_t n4 , const size_t n5 ,
- const size_t n6 , const size_t n7 ,
-
- const size_t arg_rank ,
- const size_t i0 , const size_t i1 ,
- const size_t i2 , const size_t i3 ,
- const size_t i4 , const size_t i5 ,
- const size_t i6 , const size_t i7 )
-{
- std::ostringstream msg ;
- msg << "Kokkos::Impl::AssertShapeBoundsAbort( shape = {" ;
- if ( 0 < rank ) { msg << " " << n0 ; }
- if ( 1 < rank ) { msg << " " << n1 ; }
- if ( 2 < rank ) { msg << " " << n2 ; }
- if ( 3 < rank ) { msg << " " << n3 ; }
- if ( 4 < rank ) { msg << " " << n4 ; }
- if ( 5 < rank ) { msg << " " << n5 ; }
- if ( 6 < rank ) { msg << " " << n6 ; }
- if ( 7 < rank ) { msg << " " << n7 ; }
- msg << " } index = {" ;
- if ( 0 < arg_rank ) { msg << " " << i0 ; }
- if ( 1 < arg_rank ) { msg << " " << i1 ; }
- if ( 2 < arg_rank ) { msg << " " << i2 ; }
- if ( 3 < arg_rank ) { msg << " " << i3 ; }
- if ( 4 < arg_rank ) { msg << " " << i4 ; }
- if ( 5 < arg_rank ) { msg << " " << i5 ; }
- if ( 6 < arg_rank ) { msg << " " << i6 ; }
- if ( 7 < arg_rank ) { msg << " " << i7 ; }
- msg << " } )" ;
-
- throw_runtime_exception( msg.str() );
-}
-
-void assert_shape_effective_rank1_at_leastN_throw(
- const size_t x_rank , const size_t x_N0 ,
- const size_t x_N1 , const size_t x_N2 ,
- const size_t x_N3 , const size_t x_N4 ,
- const size_t x_N5 , const size_t x_N6 ,
- const size_t x_N7 ,
- const size_t N0 )
-{
- std::ostringstream msg ;
-
- msg << "Kokkos::Impl::assert_shape_effective_rank1_at_leastN_throw( shape = {" ;
- if ( 0 < x_rank ) { msg << " " << x_N0 ; }
- if ( 1 < x_rank ) { msg << " " << x_N1 ; }
- if ( 2 < x_rank ) { msg << " " << x_N2 ; }
- if ( 3 < x_rank ) { msg << " " << x_N3 ; }
- if ( 4 < x_rank ) { msg << " " << x_N4 ; }
- if ( 5 < x_rank ) { msg << " " << x_N5 ; }
- if ( 6 < x_rank ) { msg << " " << x_N6 ; }
- if ( 7 < x_rank ) { msg << " " << x_N7 ; }
- msg << " } N = " << N0 << " )" ;
-
- throw_runtime_exception( msg.str() );
-}
-
-
-
-}
-}
-
diff --git a/lib/kokkos/core/src/impl/Kokkos_Shape.hpp b/lib/kokkos/core/src/impl/Kokkos_Shape.hpp
deleted file mode 100644
index 9749e0a1f..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Shape.hpp
+++ /dev/null
@@ -1,917 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_SHAPE_HPP
-#define KOKKOS_SHAPE_HPP
-
-#include <typeinfo>
-#include <utility>
-#include <Kokkos_Core_fwd.hpp>
-#include <impl/Kokkos_Traits.hpp>
-#include <impl/Kokkos_StaticAssert.hpp>
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-//----------------------------------------------------------------------------
-/** \brief The shape of a Kokkos with dynamic and static dimensions.
- * Dynamic dimensions are member values and static dimensions are
- * 'static const' values.
- *
- * The upper bound on the array rank is eight.
- */
-template< unsigned ScalarSize ,
- unsigned Rank ,
- unsigned s0 = 1 ,
- unsigned s1 = 1 ,
- unsigned s2 = 1 ,
- unsigned s3 = 1 ,
- unsigned s4 = 1 ,
- unsigned s5 = 1 ,
- unsigned s6 = 1 ,
- unsigned s7 = 1 >
-struct Shape ;
-
-//----------------------------------------------------------------------------
-/** \brief Shape equality if the value type, layout, and dimensions
- * are equal.
- */
-template< unsigned xSize , unsigned xRank ,
- unsigned xN0 , unsigned xN1 , unsigned xN2 , unsigned xN3 ,
- unsigned xN4 , unsigned xN5 , unsigned xN6 , unsigned xN7 ,
-
- unsigned ySize , unsigned yRank ,
- unsigned yN0 , unsigned yN1 , unsigned yN2 , unsigned yN3 ,
- unsigned yN4 , unsigned yN5 , unsigned yN6 , unsigned yN7 >
-KOKKOS_INLINE_FUNCTION
-bool operator == ( const Shape<xSize,xRank,xN0,xN1,xN2,xN3,xN4,xN5,xN6,xN7> & x ,
- const Shape<ySize,yRank,yN0,yN1,yN2,yN3,yN4,yN5,yN6,yN7> & y )
-{
- enum { same_size = xSize == ySize };
- enum { same_rank = xRank == yRank };
-
- return same_size && same_rank &&
- size_t( x.N0 ) == size_t( y.N0 ) &&
- unsigned( x.N1 ) == unsigned( y.N1 ) &&
- unsigned( x.N2 ) == unsigned( y.N2 ) &&
- unsigned( x.N3 ) == unsigned( y.N3 ) &&
- unsigned( x.N4 ) == unsigned( y.N4 ) &&
- unsigned( x.N5 ) == unsigned( y.N5 ) &&
- unsigned( x.N6 ) == unsigned( y.N6 ) &&
- unsigned( x.N7 ) == unsigned( y.N7 ) ;
-}
-
-template< unsigned xSize , unsigned xRank ,
- unsigned xN0 , unsigned xN1 , unsigned xN2 , unsigned xN3 ,
- unsigned xN4 , unsigned xN5 , unsigned xN6 , unsigned xN7 ,
-
- unsigned ySize ,unsigned yRank ,
- unsigned yN0 , unsigned yN1 , unsigned yN2 , unsigned yN3 ,
- unsigned yN4 , unsigned yN5 , unsigned yN6 , unsigned yN7 >
-KOKKOS_INLINE_FUNCTION
-bool operator != ( const Shape<xSize,xRank,xN0,xN1,xN2,xN3,xN4,xN5,xN6,xN7> & x ,
- const Shape<ySize,yRank,yN0,yN1,yN2,yN3,yN4,yN5,yN6,yN7> & y )
-{ return ! operator == ( x , y ); }
-
-//----------------------------------------------------------------------------
-
-void assert_counts_are_equal_throw(
- const size_t x_count ,
- const size_t y_count );
-
-inline
-void assert_counts_are_equal(
- const size_t x_count ,
- const size_t y_count )
-{
- if ( x_count != y_count ) {
- assert_counts_are_equal_throw( x_count , y_count );
- }
-}
-
-void assert_shapes_are_equal_throw(
- const unsigned x_scalar_size ,
- const unsigned x_rank ,
- const size_t x_N0 , const unsigned x_N1 ,
- const unsigned x_N2 , const unsigned x_N3 ,
- const unsigned x_N4 , const unsigned x_N5 ,
- const unsigned x_N6 , const unsigned x_N7 ,
-
- const unsigned y_scalar_size ,
- const unsigned y_rank ,
- const size_t y_N0 , const unsigned y_N1 ,
- const unsigned y_N2 , const unsigned y_N3 ,
- const unsigned y_N4 , const unsigned y_N5 ,
- const unsigned y_N6 , const unsigned y_N7 );
-
-template< unsigned xSize , unsigned xRank ,
- unsigned xN0 , unsigned xN1 , unsigned xN2 , unsigned xN3 ,
- unsigned xN4 , unsigned xN5 , unsigned xN6 , unsigned xN7 ,
-
- unsigned ySize , unsigned yRank ,
- unsigned yN0 , unsigned yN1 , unsigned yN2 , unsigned yN3 ,
- unsigned yN4 , unsigned yN5 , unsigned yN6 , unsigned yN7 >
-inline
-void assert_shapes_are_equal(
- const Shape<xSize,xRank,xN0,xN1,xN2,xN3,xN4,xN5,xN6,xN7> & x ,
- const Shape<ySize,yRank,yN0,yN1,yN2,yN3,yN4,yN5,yN6,yN7> & y )
-{
- typedef Shape<xSize,xRank,xN0,xN1,xN2,xN3,xN4,xN5,xN6,xN7> x_type ;
- typedef Shape<ySize,yRank,yN0,yN1,yN2,yN3,yN4,yN5,yN6,yN7> y_type ;
-
- if ( x != y ) {
- assert_shapes_are_equal_throw(
- x_type::scalar_size, x_type::rank, x.N0, x.N1, x.N2, x.N3, x.N4, x.N5, x.N6, x.N7,
- y_type::scalar_size, y_type::rank, y.N0, y.N1, y.N2, y.N3, y.N4, y.N5, y.N6, y.N7 );
- }
-}
-
-template< unsigned xSize , unsigned xRank ,
- unsigned xN0 , unsigned xN1 , unsigned xN2 , unsigned xN3 ,
- unsigned xN4 , unsigned xN5 , unsigned xN6 , unsigned xN7 ,
-
- unsigned ySize , unsigned yRank ,
- unsigned yN0 , unsigned yN1 , unsigned yN2 , unsigned yN3 ,
- unsigned yN4 , unsigned yN5 , unsigned yN6 , unsigned yN7 >
-void assert_shapes_equal_dimension(
- const Shape<xSize,xRank,xN0,xN1,xN2,xN3,xN4,xN5,xN6,xN7> & x ,
- const Shape<ySize,yRank,yN0,yN1,yN2,yN3,yN4,yN5,yN6,yN7> & y )
-{
- typedef Shape<xSize,xRank,xN0,xN1,xN2,xN3,xN4,xN5,xN6,xN7> x_type ;
- typedef Shape<ySize,yRank,yN0,yN1,yN2,yN3,yN4,yN5,yN6,yN7> y_type ;
-
- // Omit comparison of scalar_size.
- if ( unsigned( x.rank ) != unsigned( y.rank ) ||
- size_t( x.N0 ) != size_t( y.N0 ) ||
- unsigned( x.N1 ) != unsigned( y.N1 ) ||
- unsigned( x.N2 ) != unsigned( y.N2 ) ||
- unsigned( x.N3 ) != unsigned( y.N3 ) ||
- unsigned( x.N4 ) != unsigned( y.N4 ) ||
- unsigned( x.N5 ) != unsigned( y.N5 ) ||
- unsigned( x.N6 ) != unsigned( y.N6 ) ||
- unsigned( x.N7 ) != unsigned( y.N7 ) ) {
- assert_shapes_are_equal_throw(
- x_type::scalar_size, x_type::rank, x.N0, x.N1, x.N2, x.N3, x.N4, x.N5, x.N6, x.N7,
- y_type::scalar_size, y_type::rank, y.N0, y.N1, y.N2, y.N3, y.N4, y.N5, y.N6, y.N7 );
- }
-}
-
-//----------------------------------------------------------------------------
-
-template< class ShapeType > struct assert_shape_is_rank_zero ;
-template< class ShapeType > struct assert_shape_is_rank_one ;
-
-template< unsigned Size >
-struct assert_shape_is_rank_zero< Shape<Size,0> >
- : public true_type {};
-
-template< unsigned Size , unsigned s0 >
-struct assert_shape_is_rank_one< Shape<Size,1,s0> >
- : public true_type {};
-
-//----------------------------------------------------------------------------
-
-/** \brief Array bounds assertion templated on the execution space
- * to allow device-specific abort code.
- */
-template< class Space >
-struct AssertShapeBoundsAbort ;
-
-template<>
-struct AssertShapeBoundsAbort< Kokkos::HostSpace >
-{
- static void apply( const size_t rank ,
- const size_t n0 , const size_t n1 ,
- const size_t n2 , const size_t n3 ,
- const size_t n4 , const size_t n5 ,
- const size_t n6 , const size_t n7 ,
- const size_t arg_rank ,
- const size_t i0 , const size_t i1 ,
- const size_t i2 , const size_t i3 ,
- const size_t i4 , const size_t i5 ,
- const size_t i6 , const size_t i7 );
-};
-
-template< class ExecutionSpace >
-struct AssertShapeBoundsAbort
-{
- KOKKOS_INLINE_FUNCTION
- static void apply( const size_t rank ,
- const size_t n0 , const size_t n1 ,
- const size_t n2 , const size_t n3 ,
- const size_t n4 , const size_t n5 ,
- const size_t n6 , const size_t n7 ,
- const size_t arg_rank ,
- const size_t i0 , const size_t i1 ,
- const size_t i2 , const size_t i3 ,
- const size_t i4 , const size_t i5 ,
- const size_t i6 , const size_t i7 )
- {
- AssertShapeBoundsAbort< Kokkos::HostSpace >
- ::apply( rank , n0 , n1 , n2 , n3 , n4 , n5 , n6 , n7 ,
- arg_rank, i0 , i1 , i2 , i3 , i4 , i5 , i6 , i7 );
- }
-};
-
-template< class ShapeType >
-KOKKOS_INLINE_FUNCTION
-void assert_shape_bounds( const ShapeType & shape ,
- const size_t arg_rank ,
- const size_t i0 ,
- const size_t i1 = 0 ,
- const size_t i2 = 0 ,
- const size_t i3 = 0 ,
- const size_t i4 = 0 ,
- const size_t i5 = 0 ,
- const size_t i6 = 0 ,
- const size_t i7 = 0 )
-{
- // Must supply at least as many indices as ranks.
- // Every index must be within bounds.
- const bool ok = ShapeType::rank <= arg_rank &&
- i0 < size_t(shape.N0) &&
- i1 < size_t(shape.N1) &&
- i2 < size_t(shape.N2) &&
- i3 < size_t(shape.N3) &&
- i4 < size_t(shape.N4) &&
- i5 < size_t(shape.N5) &&
- i6 < size_t(shape.N6) &&
- i7 < size_t(shape.N7) ;
-
- if ( ! ok ) {
- AssertShapeBoundsAbort< Kokkos::Impl::ActiveExecutionMemorySpace >
- ::apply( ShapeType::rank ,
- shape.N0 , shape.N1 , shape.N2 , shape.N3 ,
- shape.N4 , shape.N5 , shape.N6 , shape.N7 ,
- arg_rank , i0 , i1 , i2 , i3 , i4 , i5 , i6 , i7 );
- }
-}
-
-#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_1( S , I0 ) assert_shape_bounds(S,1,I0);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_2( S , I0 , I1 ) assert_shape_bounds(S,2,I0,I1);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_3( S , I0 , I1 , I2 ) assert_shape_bounds(S,3,I0,I1,I2);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_4( S , I0 , I1 , I2 , I3 ) assert_shape_bounds(S,4,I0,I1,I2,I3);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_5( S , I0 , I1 , I2 , I3 , I4 ) assert_shape_bounds(S,5,I0,I1,I2,I3,I4);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_6( S , I0 , I1 , I2 , I3 , I4 , I5 ) assert_shape_bounds(S,6,I0,I1,I2,I3,I4,I5);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_7( S , I0 , I1 , I2 , I3 , I4 , I5 , I6 ) assert_shape_bounds(S,7,I0,I1,I2,I3,I4,I5,I6);
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_8( S , I0 , I1 , I2 , I3 , I4 , I5 , I6 , I7 ) assert_shape_bounds(S,8,I0,I1,I2,I3,I4,I5,I6,I7);
-#else
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_1( S , I0 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_2( S , I0 , I1 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_3( S , I0 , I1 , I2 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_4( S , I0 , I1 , I2 , I3 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_5( S , I0 , I1 , I2 , I3 , I4 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_6( S , I0 , I1 , I2 , I3 , I4 , I5 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_7( S , I0 , I1 , I2 , I3 , I4 , I5 , I6 ) /* */
-#define KOKKOS_ASSERT_SHAPE_BOUNDS_8( S , I0 , I1 , I2 , I3 , I4 , I5 , I6 , I7 ) /* */
-#endif
-
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-// Specialization and optimization for the Rank 0 shape.
-
-template < unsigned ScalarSize >
-struct Shape< ScalarSize , 0, 1,1,1,1, 1,1,1,1 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 0 };
- enum { rank = 0 };
-
- enum { N0 = 1 };
- enum { N1 = 1 };
- enum { N2 = 1 };
- enum { N3 = 1 };
- enum { N4 = 1 };
- enum { N5 = 1 };
- enum { N6 = 1 };
- enum { N7 = 1 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- {}
-};
-
-//----------------------------------------------------------------------------
-
-template< unsigned R > struct assign_shape_dimension ;
-
-#define KOKKOS_ASSIGN_SHAPE_DIMENSION( R ) \
-template<> \
-struct assign_shape_dimension< R > \
-{ \
- template< class ShapeType > \
- KOKKOS_INLINE_FUNCTION \
- assign_shape_dimension( ShapeType & shape \
- , typename Impl::enable_if<( R < ShapeType::rank_dynamic ), size_t >::type n \
- ) { shape.N ## R = n ; } \
-};
-
-KOKKOS_ASSIGN_SHAPE_DIMENSION(0)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(1)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(2)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(3)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(4)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(5)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(6)
-KOKKOS_ASSIGN_SHAPE_DIMENSION(7)
-
-#undef KOKKOS_ASSIGN_SHAPE_DIMENSION
-
-//----------------------------------------------------------------------------
-// All-static dimension array
-
-template < unsigned ScalarSize ,
- unsigned Rank ,
- unsigned s0 ,
- unsigned s1 ,
- unsigned s2 ,
- unsigned s3 ,
- unsigned s4 ,
- unsigned s5 ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape {
-
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 0 };
- enum { rank = Rank };
-
- enum { N0 = s0 };
- enum { N1 = s1 };
- enum { N2 = s2 };
- enum { N3 = s3 };
- enum { N4 = s4 };
- enum { N5 = s5 };
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- {}
-};
-
-// 1 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize ,
- unsigned Rank ,
- unsigned s1 ,
- unsigned s2 ,
- unsigned s3 ,
- unsigned s4 ,
- unsigned s5 ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank , 0,s1,s2,s3, s4,s5,s6,s7 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 1 };
- enum { rank = Rank };
-
- size_t N0 ; // For 1 == dynamic_rank allow N0 > 2^32
-
- enum { N1 = s1 };
- enum { N2 = s2 };
- enum { N3 = s3 };
- enum { N4 = s4 };
- enum { N5 = s5 };
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- size_t n0 , unsigned = 0 , unsigned = 0 , unsigned = 0 ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- { s.N0 = n0 ; }
-};
-
-// 2 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize , unsigned Rank ,
- unsigned s2 ,
- unsigned s3 ,
- unsigned s4 ,
- unsigned s5 ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank , 0,0,s2,s3, s4,s5,s6,s7 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 2 };
- enum { rank = Rank };
-
- unsigned N0 ;
- unsigned N1 ;
-
- enum { N2 = s2 };
- enum { N3 = s3 };
- enum { N4 = s4 };
- enum { N5 = s5 };
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned = 0 , unsigned = 0 ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- { s.N0 = n0 ; s.N1 = n1 ; }
-};
-
-// 3 == dynamic_rank <= rank <= 8
-template < unsigned Rank , unsigned ScalarSize ,
- unsigned s3 ,
- unsigned s4 ,
- unsigned s5 ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank , 0,0,0,s3, s4,s5,s6,s7>
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 3 };
- enum { rank = Rank };
-
- unsigned N0 ;
- unsigned N1 ;
- unsigned N2 ;
-
- enum { N3 = s3 };
- enum { N4 = s4 };
- enum { N5 = s5 };
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned n2 , unsigned = 0 ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- { s.N0 = n0 ; s.N1 = n1 ; s.N2 = n2 ; }
-};
-
-// 4 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize , unsigned Rank ,
- unsigned s4 ,
- unsigned s5 ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank, 0,0,0,0, s4,s5,s6,s7 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 4 };
- enum { rank = Rank };
-
- unsigned N0 ;
- unsigned N1 ;
- unsigned N2 ;
- unsigned N3 ;
-
- enum { N4 = s4 };
- enum { N5 = s5 };
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned n2 , unsigned n3 ,
- unsigned = 0 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- { s.N0 = n0 ; s.N1 = n1 ; s.N2 = n2 ; s.N3 = n3 ; }
-};
-
-// 5 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize , unsigned Rank ,
- unsigned s5 ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank , 0,0,0,0, 0,s5,s6,s7 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 5 };
- enum { rank = Rank };
-
- unsigned N0 ;
- unsigned N1 ;
- unsigned N2 ;
- unsigned N3 ;
- unsigned N4 ;
-
- enum { N5 = s5 };
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned n2 , unsigned n3 ,
- unsigned n4 , unsigned = 0 , unsigned = 0 , unsigned = 0 )
- { s.N0 = n0 ; s.N1 = n1 ; s.N2 = n2 ; s.N3 = n3 ; s.N4 = n4 ; }
-};
-
-// 6 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize , unsigned Rank ,
- unsigned s6 ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank , 0,0,0,0, 0,0,s6,s7 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 6 };
- enum { rank = Rank };
-
- unsigned N0 ;
- unsigned N1 ;
- unsigned N2 ;
- unsigned N3 ;
- unsigned N4 ;
- unsigned N5 ;
-
- enum { N6 = s6 };
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned n2 , unsigned n3 ,
- unsigned n4 , unsigned n5 = 0 , unsigned = 0 , unsigned = 0 )
- {
- s.N0 = n0 ; s.N1 = n1 ; s.N2 = n2 ; s.N3 = n3 ;
- s.N4 = n4 ; s.N5 = n5 ;
- }
-};
-
-// 7 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize , unsigned Rank ,
- unsigned s7 >
-struct Shape< ScalarSize , Rank , 0,0,0,0, 0,0,0,s7 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 7 };
- enum { rank = Rank };
-
- unsigned N0 ;
- unsigned N1 ;
- unsigned N2 ;
- unsigned N3 ;
- unsigned N4 ;
- unsigned N5 ;
- unsigned N6 ;
-
- enum { N7 = s7 };
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned n2 , unsigned n3 ,
- unsigned n4 , unsigned n5 , unsigned n6 , unsigned = 0 )
- {
- s.N0 = n0 ; s.N1 = n1 ; s.N2 = n2 ; s.N3 = n3 ;
- s.N4 = n4 ; s.N5 = n5 ; s.N6 = n6 ;
- }
-};
-
-// 8 == dynamic_rank <= rank <= 8
-template < unsigned ScalarSize >
-struct Shape< ScalarSize , 8 , 0,0,0,0, 0,0,0,0 >
-{
- enum { scalar_size = ScalarSize };
- enum { rank_dynamic = 8 };
- enum { rank = 8 };
-
- unsigned N0 ;
- unsigned N1 ;
- unsigned N2 ;
- unsigned N3 ;
- unsigned N4 ;
- unsigned N5 ;
- unsigned N6 ;
- unsigned N7 ;
-
- KOKKOS_INLINE_FUNCTION
- static
- void assign( Shape & s ,
- unsigned n0 , unsigned n1 , unsigned n2 , unsigned n3 ,
- unsigned n4 , unsigned n5 , unsigned n6 , unsigned n7 )
- {
- s.N0 = n0 ; s.N1 = n1 ; s.N2 = n2 ; s.N3 = n3 ;
- s.N4 = n4 ; s.N5 = n5 ; s.N6 = n6 ; s.N7 = n7 ;
- }
-};
-
-//----------------------------------------------------------------------------
-
-template< class ShapeType , unsigned N ,
- unsigned R = ShapeType::rank_dynamic >
-struct ShapeInsert ;
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 0 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- N ,
- ShapeType::N0 ,
- ShapeType::N1 ,
- ShapeType::N2 ,
- ShapeType::N3 ,
- ShapeType::N4 ,
- ShapeType::N5 ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 1 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- N ,
- ShapeType::N1 ,
- ShapeType::N2 ,
- ShapeType::N3 ,
- ShapeType::N4 ,
- ShapeType::N5 ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 2 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- 0 ,
- N ,
- ShapeType::N2 ,
- ShapeType::N3 ,
- ShapeType::N4 ,
- ShapeType::N5 ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 3 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- 0 ,
- 0 ,
- N ,
- ShapeType::N3 ,
- ShapeType::N4 ,
- ShapeType::N5 ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 4 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- N ,
- ShapeType::N4 ,
- ShapeType::N5 ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 5 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- N ,
- ShapeType::N5 ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 6 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- N ,
- ShapeType::N6 > type ;
-};
-
-template< class ShapeType , unsigned N >
-struct ShapeInsert< ShapeType , N , 7 >
-{
- typedef Shape< ShapeType::scalar_size ,
- ShapeType::rank + 1 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- 0 ,
- N > type ;
-};
-
-//----------------------------------------------------------------------------
-
-template< class DstShape , class SrcShape ,
- unsigned DstRankDynamic = DstShape::rank_dynamic ,
- bool DstRankDynamicOK = unsigned(DstShape::rank_dynamic) >= unsigned(SrcShape::rank_dynamic) >
-struct ShapeCompatible { enum { value = false }; };
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 8 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 7 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 6 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 5 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N5) == unsigned(SrcShape::N5) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 4 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N4) == unsigned(SrcShape::N4) &&
- unsigned(DstShape::N5) == unsigned(SrcShape::N5) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 3 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N3) == unsigned(SrcShape::N3) &&
- unsigned(DstShape::N4) == unsigned(SrcShape::N4) &&
- unsigned(DstShape::N5) == unsigned(SrcShape::N5) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 2 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N2) == unsigned(SrcShape::N2) &&
- unsigned(DstShape::N3) == unsigned(SrcShape::N3) &&
- unsigned(DstShape::N4) == unsigned(SrcShape::N4) &&
- unsigned(DstShape::N5) == unsigned(SrcShape::N5) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 1 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N1) == unsigned(SrcShape::N1) &&
- unsigned(DstShape::N2) == unsigned(SrcShape::N2) &&
- unsigned(DstShape::N3) == unsigned(SrcShape::N3) &&
- unsigned(DstShape::N4) == unsigned(SrcShape::N4) &&
- unsigned(DstShape::N5) == unsigned(SrcShape::N5) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-template< class DstShape , class SrcShape >
-struct ShapeCompatible< DstShape , SrcShape , 0 , true >
-{
- enum { value = unsigned(DstShape::scalar_size) == unsigned(SrcShape::scalar_size) &&
- unsigned(DstShape::N0) == unsigned(SrcShape::N0) &&
- unsigned(DstShape::N1) == unsigned(SrcShape::N1) &&
- unsigned(DstShape::N2) == unsigned(SrcShape::N2) &&
- unsigned(DstShape::N3) == unsigned(SrcShape::N3) &&
- unsigned(DstShape::N4) == unsigned(SrcShape::N4) &&
- unsigned(DstShape::N5) == unsigned(SrcShape::N5) &&
- unsigned(DstShape::N6) == unsigned(SrcShape::N6) &&
- unsigned(DstShape::N7) == unsigned(SrcShape::N7) };
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template< unsigned ScalarSize , unsigned Rank ,
- unsigned s0 , unsigned s1 , unsigned s2 , unsigned s3 ,
- unsigned s4 , unsigned s5 , unsigned s6 , unsigned s7 ,
- typename iType >
-KOKKOS_INLINE_FUNCTION
-size_t dimension(
- const Shape<ScalarSize,Rank,s0,s1,s2,s3,s4,s5,s6,s7> & shape ,
- const iType & r )
-{
- return 0 == r ? shape.N0 : (
- 1 == r ? shape.N1 : (
- 2 == r ? shape.N2 : (
- 3 == r ? shape.N3 : (
- 4 == r ? shape.N4 : (
- 5 == r ? shape.N5 : (
- 6 == r ? shape.N6 : (
- 7 == r ? shape.N7 : 1 )))))));
-}
-
-template< unsigned ScalarSize , unsigned Rank ,
- unsigned s0 , unsigned s1 , unsigned s2 , unsigned s3 ,
- unsigned s4 , unsigned s5 , unsigned s6 , unsigned s7 >
-KOKKOS_INLINE_FUNCTION
-size_t cardinality_count(
- const Shape<ScalarSize,Rank,s0,s1,s2,s3,s4,s5,s6,s7> & shape )
-{
- return size_t(shape.N0) * shape.N1 * shape.N2 * shape.N3 *
- shape.N4 * shape.N5 * shape.N6 * shape.N7 ;
-}
-
-//----------------------------------------------------------------------------
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-#endif /* #ifndef KOKKOS_CORESHAPE_HPP */
-
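The deleted Kokkos_Shape.hpp above encodes array extents by keeping static dimensions as compile-time enum constants and dynamic dimensions as data members, with bounds checks compiled in only under KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK. A minimal stand-alone sketch of that static-versus-dynamic-extent idea, illustrative only and not code from this patch (all names are invented):

#include <cstddef>
#include <cassert>

// Rank-2 "shape": one dynamic extent (N0) and one compile-time extent (N1).
template< std::size_t S1 >
struct MiniShape {
  std::size_t N0 ;                          // dynamic: set at run time
  static constexpr std::size_t N1 = S1 ;    // static: known at compile time
  std::size_t span() const { return N0 * N1 ; }
};

int main() {
  MiniShape<4> s{ 10 };                     // extents 10 x 4
  assert( s.span() == 40 );
  static_assert( MiniShape<4>::N1 == 4 , "static extent is a compile-time constant" );
  return 0;
}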
diff --git a/lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.cpp b/lib/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp
similarity index 85%
rename from lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.cpp
rename to lib/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp
index 96b370434..1ae51742e 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp
@@ -1,346 +1,344 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace Kokkos {
-namespace Experimental {
namespace Impl {
int SharedAllocationRecord< void , void >::s_tracking_enabled = 1 ;
void SharedAllocationRecord< void , void >::tracking_claim_and_disable()
{
// A host thread claims and disables the tracking flag
while ( ! Kokkos::atomic_compare_exchange_strong( & s_tracking_enabled, 1, 0 ) );
}
void SharedAllocationRecord< void , void >::tracking_release_and_enable()
{
// The host thread that claimed and disabled the tracking flag
// now releases and enables tracking.
if ( ! Kokkos::atomic_compare_exchange_strong( & s_tracking_enabled, 0, 1 ) ){
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord<>::tracking_release_and_enable FAILED, this host process thread did not hold the lock" );
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord<>::tracking_release_and_enable FAILED, this host process thread did not hold the lock" );
}
}
//----------------------------------------------------------------------------
bool
SharedAllocationRecord< void , void >::
is_sane( SharedAllocationRecord< void , void > * arg_record )
{
constexpr static SharedAllocationRecord * zero = 0 ;
SharedAllocationRecord * const root = arg_record ? arg_record->m_root : 0 ;
bool ok = root != 0 && root->use_count() == 0 ;
if ( ok ) {
SharedAllocationRecord * root_next = 0 ;
// Lock the list:
while ( ( root_next = Kokkos::atomic_exchange( & root->m_next , zero ) ) == zero );
for ( SharedAllocationRecord * rec = root_next ; ok && rec != root ; rec = rec->m_next ) {
const bool ok_non_null = rec && rec->m_prev && ( rec == root || rec->m_next );
const bool ok_root = ok_non_null && rec->m_root == root ;
const bool ok_prev_next = ok_non_null && ( rec->m_prev != root ? rec->m_prev->m_next == rec : root_next == rec );
const bool ok_next_prev = ok_non_null && rec->m_next->m_prev == rec ;
const bool ok_count = ok_non_null && 0 <= rec->use_count() ;
ok = ok_root && ok_prev_next && ok_next_prev && ok_count ;
if ( ! ok ) {
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
- format_string = "Kokkos::Experimental::Impl::SharedAllocationRecord failed is_sane: rec(0x%.12lx){ m_count(%d) m_root(0x%.12lx) m_next(0x%.12lx) m_prev(0x%.12lx) m_next->m_prev(0x%.12lx) m_prev->m_next(0x%.12lx) }\n";
+ format_string = "Kokkos::Impl::SharedAllocationRecord failed is_sane: rec(0x%.12lx){ m_count(%d) m_root(0x%.12lx) m_next(0x%.12lx) m_prev(0x%.12lx) m_next->m_prev(0x%.12lx) m_prev->m_next(0x%.12lx) }\n";
}
else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
- format_string = "Kokkos::Experimental::Impl::SharedAllocationRecord failed is_sane: rec(0x%.12llx){ m_count(%d) m_root(0x%.12llx) m_next(0x%.12llx) m_prev(0x%.12llx) m_next->m_prev(0x%.12llx) m_prev->m_next(0x%.12llx) }\n";
+ format_string = "Kokkos::Impl::SharedAllocationRecord failed is_sane: rec(0x%.12llx){ m_count(%d) m_root(0x%.12llx) m_next(0x%.12llx) m_prev(0x%.12llx) m_next->m_prev(0x%.12llx) m_prev->m_next(0x%.12llx) }\n";
}
fprintf(stderr
, format_string
, reinterpret_cast< uintptr_t >( rec )
, rec->use_count()
, reinterpret_cast< uintptr_t >( rec->m_root )
, reinterpret_cast< uintptr_t >( rec->m_next )
, reinterpret_cast< uintptr_t >( rec->m_prev )
, reinterpret_cast< uintptr_t >( rec->m_next != NULL ? rec->m_next->m_prev : NULL )
, reinterpret_cast< uintptr_t >( rec->m_prev != rec->m_root ? rec->m_prev->m_next : root_next )
);
}
}
if ( zero != Kokkos::atomic_exchange( & root->m_next , root_next ) ) {
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord failed is_sane unlocking");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord failed is_sane unlocking");
}
}
return ok ;
}
SharedAllocationRecord<void,void> *
SharedAllocationRecord<void,void>::find( SharedAllocationRecord<void,void> * const arg_root , void * const arg_data_ptr )
{
constexpr static SharedAllocationRecord * zero = 0 ;
SharedAllocationRecord * root_next = 0 ;
// Lock the list:
while ( ( root_next = Kokkos::atomic_exchange( & arg_root->m_next , zero ) ) == zero );
// Iterate searching for the record with this data pointer
SharedAllocationRecord * r = root_next ;
while ( ( r != arg_root ) && ( r->data() != arg_data_ptr ) ) { r = r->m_next ; }
if ( r == arg_root ) { r = 0 ; }
if ( zero != Kokkos::atomic_exchange( & arg_root->m_next , root_next ) ) {
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord failed locking/unlocking");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord failed locking/unlocking");
}
return r ;
}
/**\brief Construct and insert into 'arg_root' tracking set.
* use_count is zero.
*/
SharedAllocationRecord< void , void >::
SharedAllocationRecord( SharedAllocationRecord<void,void> * arg_root
, SharedAllocationHeader * arg_alloc_ptr
, size_t arg_alloc_size
, SharedAllocationRecord< void , void >::function_type arg_dealloc
)
: m_alloc_ptr( arg_alloc_ptr )
, m_alloc_size( arg_alloc_size )
, m_dealloc( arg_dealloc )
, m_root( arg_root )
, m_prev( 0 )
, m_next( 0 )
, m_count( 0 )
{
constexpr static SharedAllocationRecord * zero = 0 ;
if ( 0 != arg_alloc_ptr ) {
// Insert into the root double-linked list for tracking
//
// before: arg_root->m_next == next ; next->m_prev == arg_root
// after: arg_root->m_next == this ; this->m_prev == arg_root ;
// this->m_next == next ; next->m_prev == this
m_prev = m_root ;
// Read root->m_next and lock by setting to zero
while ( ( m_next = Kokkos::atomic_exchange( & m_root->m_next , zero ) ) == zero );
m_next->m_prev = this ;
// memory fence before completing insertion into linked list
Kokkos::memory_fence();
if ( zero != Kokkos::atomic_exchange( & m_root->m_next , this ) ) {
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord failed locking/unlocking");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord failed locking/unlocking");
}
}
else {
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord given NULL allocation");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord given NULL allocation");
}
}
void
SharedAllocationRecord< void , void >::
increment( SharedAllocationRecord< void , void > * arg_record )
{
const int old_count = Kokkos::atomic_fetch_add( & arg_record->m_count , 1 );
if ( old_count < 0 ) { // Error
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord failed increment");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord failed increment");
}
}
SharedAllocationRecord< void , void > *
SharedAllocationRecord< void , void >::
decrement( SharedAllocationRecord< void , void > * arg_record )
{
constexpr static SharedAllocationRecord * zero = 0 ;
const int old_count = Kokkos::atomic_fetch_add( & arg_record->m_count , -1 );
#if 0
if ( old_count <= 1 ) {
- fprintf(stderr,"Kokkos::Experimental::Impl::SharedAllocationRecord '%s' at 0x%lx delete count = %d\n", arg_record->m_alloc_ptr->m_label , (unsigned long) arg_record , old_count );
+ fprintf(stderr,"Kokkos::Impl::SharedAllocationRecord '%s' at 0x%lx delete count = %d\n", arg_record->m_alloc_ptr->m_label , (unsigned long) arg_record , old_count );
fflush(stderr);
}
#endif
if ( old_count == 1 ) {
// before: arg_record->m_prev->m_next == arg_record &&
// arg_record->m_next->m_prev == arg_record
//
// after: arg_record->m_prev->m_next == arg_record->m_next &&
// arg_record->m_next->m_prev == arg_record->m_prev
SharedAllocationRecord * root_next = 0 ;
// Lock the list:
while ( ( root_next = Kokkos::atomic_exchange( & arg_record->m_root->m_next , zero ) ) == zero );
arg_record->m_next->m_prev = arg_record->m_prev ;
if ( root_next != arg_record ) {
arg_record->m_prev->m_next = arg_record->m_next ;
}
else {
// before: arg_record->m_root == arg_record->m_prev
// after: arg_record->m_root == arg_record->m_next
root_next = arg_record->m_next ;
}
// Unlock the list:
if ( zero != Kokkos::atomic_exchange( & arg_record->m_root->m_next , root_next ) ) {
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord failed decrement unlocking");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord failed decrement unlocking");
}
arg_record->m_next = 0 ;
arg_record->m_prev = 0 ;
function_type d = arg_record->m_dealloc ;
(*d)( arg_record );
arg_record = 0 ;
}
else if ( old_count < 1 ) { // Error
- fprintf(stderr,"Kokkos::Experimental::Impl::SharedAllocationRecord '%s' failed decrement count = %d\n", arg_record->m_alloc_ptr->m_label , old_count );
+ fprintf(stderr,"Kokkos::Impl::SharedAllocationRecord '%s' failed decrement count = %d\n", arg_record->m_alloc_ptr->m_label , old_count );
fflush(stderr);
- Kokkos::Impl::throw_runtime_exception("Kokkos::Experimental::Impl::SharedAllocationRecord failed decrement count");
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::SharedAllocationRecord failed decrement count");
}
return arg_record ;
}
void
SharedAllocationRecord< void , void >::
print_host_accessible_records( std::ostream & s
, const char * const space_name
, const SharedAllocationRecord * const root
, const bool detail )
{
const SharedAllocationRecord< void , void > * r = root ;
char buffer[256] ;
if ( detail ) {
do {
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "%s addr( 0x%.12lx ) list( 0x%.12lx 0x%.12lx ) extent[ 0x%.12lx + %.8ld ] count(%d) dealloc(0x%.12lx) %s\n";
}
else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "%s addr( 0x%.12llx ) list( 0x%.12llx 0x%.12llx ) extent[ 0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}
snprintf( buffer , 256
, format_string
, space_name
, reinterpret_cast<uintptr_t>( r )
, reinterpret_cast<uintptr_t>( r->m_prev )
, reinterpret_cast<uintptr_t>( r->m_next )
, reinterpret_cast<uintptr_t>( r->m_alloc_ptr )
, r->m_alloc_size
, r->use_count()
, reinterpret_cast<uintptr_t>( r->m_dealloc )
, r->m_alloc_ptr->m_label
);
std::cout << buffer ;
r = r->m_next ;
} while ( r != root );
}
else {
do {
if ( r->m_alloc_ptr ) {
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "%s [ 0x%.12lx + %ld ] %s\n";
}
else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "%s [ 0x%.12llx + %ld ] %s\n";
}
snprintf( buffer , 256
, format_string
, space_name
, reinterpret_cast< uintptr_t >( r->data() )
, r->size()
, r->m_alloc_ptr->m_label
);
}
else {
snprintf( buffer , 256 , "%s [ 0 + 0 ]\n" , space_name );
}
std::cout << buffer ;
r = r->m_next ;
} while ( r != root );
}
}
} /* namespace Impl */
-} /* namespace Experimental */
} /* namespace Kokkos */
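The record list manipulated above is protected by a small spin lock: a thread "locks" the list by atomically exchanging the root's m_next pointer with zero, and "unlocks" it by exchanging the saved head back in. A stand-alone analogue of that locking idiom using std::atomic, illustrative only and not code from this patch (all names are invented):

#include <atomic>
#include <cassert>

struct Node { Node * next ; };

std::atomic<Node*> list_head{ nullptr };

// Acquire exclusive access by swapping the head with nullptr;
// spin while another thread currently holds the list.
Node * lock_list() {
  Node * head = nullptr ;
  while ( ( head = list_head.exchange( nullptr ) ) == nullptr ) { /* spin */ }
  return head ;
}

// Release by publishing the (possibly updated) head again.
void unlock_list( Node * head ) {
  Node * prev = list_head.exchange( head );
  assert( prev == nullptr );   // the list must still have been locked
  (void) prev ;
}

int main() {
  static Node root{ &root };   // self-referencing root, as in the default-constructed record
  list_head.store( &root );
  Node * head = lock_list();
  unlock_list( head );
  return 0;
}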
diff --git a/lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.hpp b/lib/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp
similarity index 96%
rename from lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.hpp
rename to lib/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp
index 1498eafb0..a9c2d6f22 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp
@@ -1,400 +1,402 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_SHARED_ALLOC_HPP_
#define KOKKOS_SHARED_ALLOC_HPP_
#include <stdint.h>
#include <string>
namespace Kokkos {
-namespace Experimental {
namespace Impl {
template< class MemorySpace = void , class DestroyFunctor = void >
class SharedAllocationRecord ;
class SharedAllocationHeader {
private:
typedef SharedAllocationRecord<void,void> Record ;
static constexpr unsigned maximum_label_length = ( 1u << 7 /* 128 */ ) - sizeof(Record*);
template< class , class > friend class SharedAllocationRecord ;
Record * m_record ;
char m_label[ maximum_label_length ];
public:
/* Given user memory get pointer to the header */
KOKKOS_INLINE_FUNCTION static
const SharedAllocationHeader * get_header( void * alloc_ptr )
{ return reinterpret_cast<SharedAllocationHeader*>( reinterpret_cast<char*>(alloc_ptr) - sizeof(SharedAllocationHeader) ); }
};
template<>
class SharedAllocationRecord< void , void > {
protected:
static_assert( sizeof(SharedAllocationHeader) == ( 1u << 7 /* 128 */ ) , "sizeof(SharedAllocationHeader) != 128" );
template< class , class > friend class SharedAllocationRecord ;
typedef void (* function_type )( SharedAllocationRecord<void,void> * );
static int s_tracking_enabled ;
SharedAllocationHeader * const m_alloc_ptr ;
size_t const m_alloc_size ;
function_type const m_dealloc ;
SharedAllocationRecord * const m_root ;
SharedAllocationRecord * m_prev ;
SharedAllocationRecord * m_next ;
int m_count ;
SharedAllocationRecord( SharedAllocationRecord && ) = delete ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( SharedAllocationRecord && ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
/**\brief Construct and insert into 'arg_root' tracking set.
* use_count is zero.
*/
SharedAllocationRecord( SharedAllocationRecord * arg_root
, SharedAllocationHeader * arg_alloc_ptr
, size_t arg_alloc_size
, function_type arg_dealloc
);
public:
+ inline std::string get_label() const { return std::string("Unmanaged"); }
static int tracking_enabled() { return s_tracking_enabled ; }
/**\brief A host process thread claims and disables the
* shared allocation tracking flag.
*/
static void tracking_claim_and_disable();
/**\brief A host process thread releases and enables the
* shared allocation tracking flag.
*/
static void tracking_release_and_enable();
~SharedAllocationRecord() = default ;
SharedAllocationRecord()
: m_alloc_ptr( 0 )
, m_alloc_size( 0 )
, m_dealloc( 0 )
, m_root( this )
, m_prev( this )
, m_next( this )
, m_count( 0 )
{}
static constexpr unsigned maximum_label_length = SharedAllocationHeader::maximum_label_length ;
KOKKOS_INLINE_FUNCTION
const SharedAllocationHeader * head() const { return m_alloc_ptr ; }
/* User's memory begins at the end of the header */
KOKKOS_INLINE_FUNCTION
void * data() const { return reinterpret_cast<void*>( m_alloc_ptr + 1 ); }
/* User's memory begins at the end of the header */
size_t size() const { return m_alloc_size - sizeof(SharedAllocationHeader) ; }
/* Cannot be 'constexpr' because 'm_count' is volatile */
int use_count() const { return *static_cast<const volatile int *>(&m_count); }
/* Increment use count */
static void increment( SharedAllocationRecord * );
/* Decrement use count. If 1->0 then remove from the tracking list and invoke m_dealloc */
static SharedAllocationRecord * decrement( SharedAllocationRecord * );
/* Given a root record and data pointer find the record */
static SharedAllocationRecord * find( SharedAllocationRecord * const , void * const );
/* Sanity check for the whole set of records to which the input record belongs.
* Locks the set's insert/erase operations until the sanity check is complete.
*/
static bool is_sane( SharedAllocationRecord * );
/* Print host-accessible records */
static void print_host_accessible_records( std::ostream &
, const char * const space_name
, const SharedAllocationRecord * const root
, const bool detail );
};
namespace {
/* The address of this function is taken, so make sure it is unique */
template < class MemorySpace , class DestroyFunctor >
void deallocate( SharedAllocationRecord<void,void> * record_ptr )
{
typedef SharedAllocationRecord< MemorySpace , void > base_type ;
typedef SharedAllocationRecord< MemorySpace , DestroyFunctor > this_type ;
this_type * const ptr = static_cast< this_type * >(
static_cast< base_type * >( record_ptr ) );
ptr->m_destroy.destroy_shared_allocation();
delete ptr ;
}
}
/*
* Memory space specialization of SharedAllocationRecord< Space , void > requires :
*
* SharedAllocationRecord< Space , void > : public SharedAllocationRecord< void , void >
* {
* // delete allocated user memory via static_cast to this type.
* static void deallocate( const SharedAllocationRecord<void,void> * );
* Space m_space ;
* }
*/
template< class MemorySpace , class DestroyFunctor >
class SharedAllocationRecord : public SharedAllocationRecord< MemorySpace , void >
{
private:
SharedAllocationRecord( const MemorySpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc
)
/* Allocate user memory as [ SharedAllocationHeader , user_memory ] */
- : SharedAllocationRecord< MemorySpace , void >( arg_space , arg_label , arg_alloc , & Kokkos::Experimental::Impl::deallocate< MemorySpace , DestroyFunctor > )
+ : SharedAllocationRecord< MemorySpace , void >( arg_space , arg_label , arg_alloc , & Kokkos::Impl::deallocate< MemorySpace , DestroyFunctor > )
, m_destroy()
{}
SharedAllocationRecord() = delete ;
SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
public:
DestroyFunctor m_destroy ;
// Allocate with a zero use count. Incrementing the use count from zero to one
// inserts the record into the tracking list. Decrementing the count from one to zero
// removes from the tracking list and deallocates.
KOKKOS_INLINE_FUNCTION static
SharedAllocationRecord * allocate( const MemorySpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc );
#else
return (SharedAllocationRecord *) 0 ;
#endif
}
};
+template< class MemorySpace >
+class SharedAllocationRecord<MemorySpace,void> : public SharedAllocationRecord< void , void > {};
+
union SharedAllocationTracker {
private:
typedef SharedAllocationRecord<void,void> Record ;
enum : uintptr_t { DO_NOT_DEREF_FLAG = 0x01ul };
// The allocation record resides in Host memory space
uintptr_t m_record_bits ;
Record * m_record ;
public:
// Use macros instead of inline functions to reduce
// pressure on compiler optimization by reducing
// number of symbols and inline functions.
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
#define KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED \
Record::tracking_enabled()
#define KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT \
if ( ! ( m_record_bits & DO_NOT_DEREF_FLAG ) ) Record::increment( m_record );
#define KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT \
if ( ! ( m_record_bits & DO_NOT_DEREF_FLAG ) ) Record::decrement( m_record );
#else
#define KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED 0
#define KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT /* */
#define KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT /* */
#endif
/** \brief Assign a specialized record */
inline
void assign_allocated_record_to_uninitialized( Record * arg_record )
{
if ( arg_record ) {
Record::increment( m_record = arg_record );
}
else {
m_record_bits = DO_NOT_DEREF_FLAG ;
}
}
template< class MemorySpace >
constexpr
SharedAllocationRecord< MemorySpace , void > &
get_record() const
{ return * static_cast< SharedAllocationRecord< MemorySpace , void > * >( m_record ); }
template< class MemorySpace >
std::string get_label() const
{
- return ( m_record_bits & DO_NOT_DEREF_FLAG )
+ return ( m_record_bits == DO_NOT_DEREF_FLAG )
? std::string()
- : static_cast< SharedAllocationRecord< MemorySpace , void > * >( m_record )->get_label()
+ : reinterpret_cast< SharedAllocationRecord< MemorySpace , void > * >( m_record_bits & ~DO_NOT_DEREF_FLAG )->get_label()
;
}
KOKKOS_INLINE_FUNCTION
int use_count() const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
Record * const tmp = reinterpret_cast<Record*>( m_record_bits & ~DO_NOT_DEREF_FLAG );
return ( tmp ? tmp->use_count() : 0 );
#else
return 0 ;
#endif
}
KOKKOS_FORCEINLINE_FUNCTION
~SharedAllocationTracker()
{ KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT }
KOKKOS_FORCEINLINE_FUNCTION
constexpr SharedAllocationTracker()
: m_record_bits( DO_NOT_DEREF_FLAG ) {}
// Move:
KOKKOS_FORCEINLINE_FUNCTION
SharedAllocationTracker( SharedAllocationTracker && rhs )
: m_record_bits( rhs.m_record_bits )
{ rhs.m_record_bits = DO_NOT_DEREF_FLAG ; }
KOKKOS_FORCEINLINE_FUNCTION
SharedAllocationTracker & operator = ( SharedAllocationTracker && rhs )
{
// If this is tracking then must decrement
KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
// Move and reset RHS to default constructed value.
m_record_bits = rhs.m_record_bits ;
rhs.m_record_bits = DO_NOT_DEREF_FLAG ;
return *this ;
}
// Copy:
KOKKOS_FORCEINLINE_FUNCTION
SharedAllocationTracker( const SharedAllocationTracker & rhs )
: m_record_bits( KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
? rhs.m_record_bits
: rhs.m_record_bits | DO_NOT_DEREF_FLAG )
{
KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
}
/** \brief Copy construction may disable tracking. */
KOKKOS_FORCEINLINE_FUNCTION
SharedAllocationTracker( const SharedAllocationTracker & rhs
, const bool enable_tracking )
: m_record_bits( KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
&& enable_tracking
? rhs.m_record_bits
: rhs.m_record_bits | DO_NOT_DEREF_FLAG )
{ KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT }
KOKKOS_FORCEINLINE_FUNCTION
SharedAllocationTracker & operator = ( const SharedAllocationTracker & rhs )
{
// If this is tracking then must decrement
KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
m_record_bits = KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
? rhs.m_record_bits
: rhs.m_record_bits | DO_NOT_DEREF_FLAG ;
KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
return *this ;
}
/** \brief Copy assignment may disable tracking */
KOKKOS_FORCEINLINE_FUNCTION
void assign( const SharedAllocationTracker & rhs
, const bool enable_tracking )
{
KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
m_record_bits = KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
&& enable_tracking
? rhs.m_record_bits
: rhs.m_record_bits | DO_NOT_DEREF_FLAG ;
KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
}
#undef KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
#undef KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
#undef KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
};
} /* namespace Impl */
-} /* namespace Experimental */
} /* namespace Kokkos */
#endif
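SharedAllocationTracker above packs a Record pointer and a "do not dereference" marker into one word by setting the low pointer bit (DO_NOT_DEREF_FLAG), which is what the patched get_label() strips before dereferencing. A stand-alone sketch of that tagged-pointer idea, illustrative only and not code from this patch (all names are invented):

#include <stdint.h>
#include <cassert>

struct Record { int use_count ; };

struct TaggedRecord {
  enum : uintptr_t { DO_NOT_DEREF = 0x01u };
  uintptr_t bits ;

  static TaggedRecord make( Record * r , bool tracking ) {
    uintptr_t b = reinterpret_cast<uintptr_t>( r );
    if ( ! tracking ) b |= DO_NOT_DEREF ;    // mark the reference as untracked
    return TaggedRecord{ b };
  }
  Record * record() const {                  // strip the flag before dereferencing
    return ( bits == DO_NOT_DEREF )
         ? nullptr
         : reinterpret_cast<Record*>( bits & ~uintptr_t(DO_NOT_DEREF) );
  }
  bool tracking() const { return ! ( bits & DO_NOT_DEREF ); }
};

int main() {
  Record r{ 0 };
  TaggedRecord t = TaggedRecord::make( &r , true );
  assert( t.tracking() && t.record() == &r );
  TaggedRecord u = TaggedRecord::make( nullptr , false );
  assert( ! u.tracking() && u.record() == nullptr );
  return 0;
}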
diff --git a/lib/kokkos/core/src/impl/Kokkos_Tags.hpp b/lib/kokkos/core/src/impl/Kokkos_Tags.hpp
index 0bc2864ff..9545e7e6b 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Tags.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Tags.hpp
@@ -1,198 +1,89 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TAGS_HPP
#define KOKKOS_TAGS_HPP
#include <impl/Kokkos_Traits.hpp>
#include <Kokkos_Core_fwd.hpp>
#include <type_traits>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/** KOKKOS_HAVE_TYPE( Type )
*
* defines a meta-function that checks if a type exposes an internal typedef or
* type alias which matches Type
*
* e.g.
* KOKKOS_HAVE_TYPE( array_layout );
* struct Foo { using array_layout = void; };
* have_array_layout<Foo>::value == 1;
*/
-#define KOKKOS_HAVE_TYPE( Type ) \
-template <typename T> \
-struct have_##Type { \
- template <typename U> static std::false_type have_type(...); \
- template <typename U> static std::true_type have_type( typename U::Type* ); \
- using type = decltype(have_type<T>(nullptr)); \
- static constexpr bool value = type::value; \
-}
-
-/** KOKKOS_IS_CONCEPT( Concept )
- *
- * defines a meta-function that check if a type match the given Kokkos concept
- * type alias which matches Type
- *
- * e.g.
- * KOKKOS_IS_CONCEPT( array_layout );
- * struct Foo { using array_layout = Foo; };
- * is_array_layout<Foo>::value == 1;
- */
-#define KOKKOS_IS_CONCEPT( Concept ) \
-template <typename T> \
-struct is_##Concept { \
- template <typename U> static std::false_type have_concept(...); \
- template <typename U> static auto have_concept( typename U::Concept* ) \
- ->typename std::is_same<T, typename U::Concept>::type;\
- using type = decltype(have_concept<T>(nullptr)); \
- static constexpr bool value = type::value; \
-}
+#define KOKKOS_HAVE_TYPE( TYPE ) \
+template <typename T> struct have_ ## TYPE { \
+private: \
+ template <typename U, typename = void > struct X : std::false_type {}; \
+ template <typename U> struct X<U,typename std::conditional<true,void,typename U:: TYPE >::type > : std::true_type {}; \
+public: \
+ typedef typename X<T>::type type ; \
+ enum : bool { value = type::value }; \
+};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos { namespace Impl {
template <typename T>
using is_void = std::is_same<void,T>;
-// is_memory_space<T>::value
-KOKKOS_IS_CONCEPT( memory_space );
-
-// is_memory_traits<T>::value
-KOKKOS_IS_CONCEPT( memory_traits );
-
-// is_execution_space<T>::value
-KOKKOS_IS_CONCEPT( execution_space );
-
-// is_execution_policy<T>::value
-KOKKOS_IS_CONCEPT( execution_policy );
-
-// is_array_layout<T>::value
-KOKKOS_IS_CONCEPT( array_layout );
-
-// is_iteration_pattern<T>::value
-KOKKOS_IS_CONCEPT( iteration_pattern );
-
-// is_schedule_type<T>::value
-KOKKOS_IS_CONCEPT( schedule_type );
-
-// is_index_type<T>::value
-KOKKOS_IS_CONCEPT( index_type );
-
}} // namespace Kokkos::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-namespace Kokkos {
-
-template< class ExecutionSpace , class MemorySpace >
-struct Device {
- static_assert( Impl::is_execution_space<ExecutionSpace>::value
- , "Execution space is not valid" );
- static_assert( Impl::is_memory_space<MemorySpace>::value
- , "Memory space is not valid" );
- typedef ExecutionSpace execution_space;
- typedef MemorySpace memory_space;
- typedef Device<execution_space,memory_space> device_type;
-};
-}
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template< class C , class Enable = void >
-struct is_space : public Impl::false_type {};
-
-template< class C >
-struct is_space< C
- , typename Impl::enable_if<(
- Impl::is_same< C , typename C::execution_space >::value ||
- Impl::is_same< C , typename C::memory_space >::value ||
- Impl::is_same< C , Device<
- typename C::execution_space,
- typename C::memory_space> >::value
- )>::type
- >
- : public Impl::true_type
-{
- typedef typename C::execution_space execution_space ;
- typedef typename C::memory_space memory_space ;
-
- // The host_memory_space defines a space with host-resident memory.
- // If the execution space's memory space is host accessible then use that execution space.
- // else use the HostSpace.
- typedef
- typename Impl::if_c< Impl::is_same< memory_space , HostSpace >::value
-#ifdef KOKKOS_HAVE_CUDA
- || Impl::is_same< memory_space , CudaUVMSpace>::value
- || Impl::is_same< memory_space , CudaHostPinnedSpace>::value
-#endif
- , memory_space , HostSpace >::type
- host_memory_space ;
-
- // The host_execution_space defines a space which has access to HostSpace.
- // If the execution space can access HostSpace then use that execution space.
- // else use the DefaultHostExecutionSpace.
-#ifdef KOKKOS_HAVE_CUDA
- typedef
- typename Impl::if_c< Impl::is_same< execution_space , Cuda >::value
- , DefaultHostExecutionSpace , execution_space >::type
- host_execution_space ;
-#else
- typedef execution_space host_execution_space;
#endif
- typedef Device<host_execution_space,host_memory_space> host_mirror_space;
-};
-}
-}
-
-#endif
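
For reference, a minimal standalone sketch of the member-type detection idiom that the rewritten KOKKOS_HAVE_TYPE macro implements; the names demo_have_array_layout, Foo and Bar are illustrative only, and the probe is spelled directly as U::array_layout for readability rather than through the macro's token pasting:

#include <type_traits>

// Roughly what KOKKOS_HAVE_TYPE( array_layout ) expands to: the nested
// struct X is only specialized when U::array_layout names a type, so the
// trait reports whether T exposes that typedef / type alias.
template <typename T>
struct demo_have_array_layout {
private:
  template <typename U, typename = void>
  struct X : std::false_type {};
  template <typename U>
  struct X< U , typename std::conditional< true , void , typename U::array_layout >::type >
    : std::true_type {};
public:
  typedef typename X<T>::type type ;
  enum : bool { value = type::value };
};

struct Foo { using array_layout = void ; };   // exposes the alias
struct Bar {};                                // does not

static_assert(  demo_have_array_layout<Foo>::value , "Foo exposes array_layout" );
static_assert( !demo_have_array_layout<Bar>::value , "Bar does not" );
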
diff --git a/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp b/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
index 663bb1985..ee9c69e92 100644
--- a/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
@@ -1,499 +1,509 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
#ifndef KOKKOS_IMPL_TASKQUEUE_HPP
#define KOKKOS_IMPL_TASKQUEUE_HPP
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
#include <string>
#include <typeinfo>
#include <stdexcept>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-namespace Kokkos {
-
-template< typename > class TaskPolicy ;
-
-template< typename Arg1 = void , typename Arg2 = void > class Future ;
-
-} /* namespace Kokkos */
-
namespace Kokkos {
namespace Impl {
-template< typename , typename , typename > class TaskBase ;
-template< typename > class TaskExec ;
+/** \brief Implementation data for task data management, access, and execution.
+ *
+ * Curiously recurring template pattern (CRTP)
+ * to allow static_cast from the
+ * task root type and a task's FunctorType.
+ *
+ * TaskBase< Space , ResultType , FunctorType >
+ * : TaskBase< Space , ResultType , void >
+ * , FunctorType
+ * { ... };
+ *
+ * TaskBase< Space , ResultType , void >
+ * : TaskBase< Space , void , void >
+ * { ... };
+ */
+template< typename Space , typename ResultType , typename FunctorType >
+class TaskBase ;
+
+template< typename Space >
+class TaskExec ;
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Space >
class TaskQueueSpecialization ;
/** \brief Manage task allocation, deallocation, and scheduling.
*
* Task execution is deferred to the TaskQueueSpecialization.
* All other aspects of task management have shared implementation.
*/
template< typename ExecSpace >
class TaskQueue {
private:
friend class TaskQueueSpecialization< ExecSpace > ;
- friend class Kokkos::TaskPolicy< ExecSpace > ;
+ friend class Kokkos::TaskScheduler< ExecSpace > ;
using execution_space = ExecSpace ;
using specialization = TaskQueueSpecialization< execution_space > ;
using memory_space = typename specialization::memory_space ;
using device_type = Kokkos::Device< execution_space , memory_space > ;
using memory_pool = Kokkos::Experimental::MemoryPool< device_type > ;
using task_root_type = Kokkos::Impl::TaskBase<execution_space,void,void> ;
struct Destroy {
TaskQueue * m_queue ;
void destroy_shared_allocation();
};
//----------------------------------------
enum : int { NumQueue = 3 };
// Queue is organized as [ priority ][ type ]
memory_pool m_memory ;
task_root_type * volatile m_ready[ NumQueue ][ 2 ];
long m_accum_alloc ; // Accumulated number of allocations
int m_count_alloc ; // Current number of allocations
int m_max_alloc ; // Maximum number of allocations
int m_ready_count ; // Number of ready or executing
//----------------------------------------
~TaskQueue();
TaskQueue() = delete ;
TaskQueue( TaskQueue && ) = delete ;
TaskQueue( TaskQueue const & ) = delete ;
TaskQueue & operator = ( TaskQueue && ) = delete ;
TaskQueue & operator = ( TaskQueue const & ) = delete ;
TaskQueue
( const memory_space & arg_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_superblock_capacity_log2
);
// Schedule a task
// Precondition:
// task is not executing
// task->m_next is the dependence or zero
// Postcondition:
// task->m_next is linked list membership
KOKKOS_FUNCTION
void schedule( task_root_type * const );
// Complete a task
// Precondition:
// task is not executing
// task->m_next == LockTag => task is complete
// task->m_next != LockTag => task is respawn
// Postcondition:
// task->m_wait == LockTag => task is complete
// task->m_wait != LockTag => task is waiting
KOKKOS_FUNCTION
void complete( task_root_type * );
KOKKOS_FUNCTION
static bool push_task( task_root_type * volatile * const
, task_root_type * const );
KOKKOS_FUNCTION
static task_root_type * pop_task( task_root_type * volatile * const );
KOKKOS_FUNCTION static
void decrement( task_root_type * task );
public:
// If and only if the execution space is a single thread
// then execute ready tasks.
KOKKOS_INLINE_FUNCTION
void iff_single_thread_recursive_execute()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
specialization::iff_single_thread_recursive_execute( this );
#endif
}
void execute() { specialization::execute( this ); }
// Assign task pointer with reference counting of assigned tasks
template< typename LV , typename RV >
KOKKOS_FUNCTION static
void assign( TaskBase< execution_space,LV,void> ** const lhs
, TaskBase< execution_space,RV,void> * const rhs )
{
using task_lhs = TaskBase< execution_space,LV,void> ;
#if 0
{
printf( "assign( 0x%lx { 0x%lx %d %d } , 0x%lx { 0x%lx %d %d } )\n"
, uintptr_t( lhs ? *lhs : 0 )
, uintptr_t( lhs && *lhs ? (*lhs)->m_next : 0 )
, int( lhs && *lhs ? (*lhs)->m_task_type : 0 )
, int( lhs && *lhs ? (*lhs)->m_ref_count : 0 )
, uintptr_t(rhs)
, uintptr_t( rhs ? rhs->m_next : 0 )
, int( rhs ? rhs->m_task_type : 0 )
, int( rhs ? rhs->m_ref_count : 0 )
);
fflush( stdout );
}
#endif
if ( *lhs ) decrement( *lhs );
- if ( rhs ) { Kokkos::atomic_fetch_add( &(rhs->m_ref_count) , 1 ); }
+ if ( rhs ) { Kokkos::atomic_increment( &(rhs->m_ref_count) ); }
// Force write of *lhs
*static_cast< task_lhs * volatile * >(lhs) = rhs ;
Kokkos::memory_fence();
}
KOKKOS_FUNCTION
size_t allocate_block_size( size_t n ); ///< Actual block size allocated
KOKKOS_FUNCTION
void * allocate( size_t n ); ///< Allocate from the memory pool
KOKKOS_FUNCTION
void deallocate( void * p , size_t n ); ///< Deallocate to the memory pool
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<>
class TaskBase< void , void , void > {
public:
enum : int16_t { TaskTeam = 0 , TaskSingle = 1 , Aggregate = 2 };
enum : uintptr_t { LockTag = ~uintptr_t(0) , EndTag = ~uintptr_t(1) };
};
/** \brief Base class for task management, access, and execution.
*
* Inheritance structure to allow static_cast from the task root type
* and a task's FunctorType.
*
* // Enable a Future to access result data
* TaskBase< Space , ResultType , void >
* : TaskBase< void , void , void >
* { ... };
*
* // Enable a functor to access the base class
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
*
* States of a task:
*
* Constructing State, NOT IN a linked list
* m_wait == 0
* m_next == 0
*
* Scheduling transition : Constructing -> Waiting
* before:
* m_wait == 0
* m_next == this task's initial dependence, 0 if none
* after:
* m_wait == EndTag
* m_next == EndTag
*
* Waiting State, IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == next of linked list of tasks
*
* transition : Waiting -> Executing
* before:
* m_next == EndTag
 * after:
* m_next == LockTag
*
* Executing State, NOT IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == LockTag
*
* Respawn transition : Executing -> Executing-Respawn
* before:
* m_next == LockTag
* after:
* m_next == this task's updated dependence, 0 if none
*
* Executing-Respawn State, NOT IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == this task's updated dependence, 0 if none
*
* transition : Executing -> Complete
* before:
* m_wait == head of linked list
* after:
* m_wait == LockTag
*
* Complete State, NOT IN a linked list
* m_wait == LockTag: cannot add dependence
* m_next == LockTag: not a member of a wait queue
*
*/
template< typename ExecSpace >
class TaskBase< ExecSpace , void , void >
{
public:
enum : int16_t { TaskTeam = TaskBase<void,void,void>::TaskTeam
, TaskSingle = TaskBase<void,void,void>::TaskSingle
, Aggregate = TaskBase<void,void,void>::Aggregate };
enum : uintptr_t { LockTag = TaskBase<void,void,void>::LockTag
, EndTag = TaskBase<void,void,void>::EndTag };
using execution_space = ExecSpace ;
using queue_type = TaskQueue< execution_space > ;
- template< typename > friend class Kokkos::TaskPolicy ;
+ template< typename > friend class Kokkos::TaskScheduler ;
typedef void (* function_type) ( TaskBase * , void * );
// sizeof(TaskBase) == 48
function_type m_apply ; ///< Apply function pointer
queue_type * m_queue ; ///< Queue in which this task resides
TaskBase * m_wait ; ///< Linked list of tasks waiting on this
TaskBase * m_next ; ///< Waiting linked-list next
int32_t m_ref_count ; ///< Reference count
int32_t m_alloc_size ;///< Allocation size
int32_t m_dep_count ; ///< Aggregate's number of dependences
int16_t m_task_type ; ///< Type of task
int16_t m_priority ; ///< Priority of runnable task
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
KOKKOS_INLINE_FUNCTION
constexpr TaskBase() noexcept
: m_apply(0)
, m_queue(0)
, m_wait(0)
, m_next(0)
, m_ref_count(0)
, m_alloc_size(0)
, m_dep_count(0)
, m_task_type( TaskSingle )
, m_priority( 1 /* TaskRegularPriority */ )
{}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
TaskBase ** aggregate_dependences()
{ return reinterpret_cast<TaskBase**>( this + 1 ); }
using get_return_type = void ;
KOKKOS_INLINE_FUNCTION
get_return_type get() const {}
};
template < typename ExecSpace , typename ResultType >
class TaskBase< ExecSpace , ResultType , void >
: public TaskBase< ExecSpace , void , void >
{
private:
static_assert( sizeof(TaskBase<ExecSpace,void,void>) == 48 , "" );
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
public:
ResultType m_result ;
KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
KOKKOS_INLINE_FUNCTION
TaskBase()
: TaskBase< ExecSpace , void , void >()
, m_result()
{}
using get_return_type = ResultType const & ;
KOKKOS_INLINE_FUNCTION
get_return_type get() const { return m_result ; }
};
template< typename ExecSpace , typename ResultType , typename FunctorType >
class TaskBase
: public TaskBase< ExecSpace , ResultType , void >
, public FunctorType
{
private:
TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
public:
using root_type = TaskBase< ExecSpace , void , void > ;
using base_type = TaskBase< ExecSpace , ResultType , void > ;
using member_type = TaskExec< ExecSpace > ;
using functor_type = FunctorType ;
using result_type = ResultType ;
template< typename Type >
KOKKOS_INLINE_FUNCTION static
void apply_functor
( Type * const task
, typename std::enable_if
< std::is_same< typename Type::result_type , void >::value
, member_type * const
>::type member
)
{
using fType = typename Type::functor_type ;
static_cast<fType*>(task)->operator()( *member );
}
template< typename Type >
KOKKOS_INLINE_FUNCTION static
void apply_functor
( Type * const task
, typename std::enable_if
< ! std::is_same< typename Type::result_type , void >::value
, member_type * const
>::type member
)
{
using fType = typename Type::functor_type ;
static_cast<fType*>(task)->operator()( *member , task->m_result );
}
KOKKOS_FUNCTION static
void apply( root_type * root , void * exec )
{
TaskBase * const lock = reinterpret_cast< TaskBase * >( root_type::LockTag );
TaskBase * const task = static_cast< TaskBase * >( root );
member_type * const member = reinterpret_cast< member_type * >( exec );
TaskBase::template apply_functor( task , member );
// Task may be serial or team.
// If team then must synchronize before querying task->m_next.
// If team then only one thread calls destructor.
member->team_barrier();
if ( 0 == member->team_rank() && lock == task->m_next ) {
// Did not respawn, destroy the functor to free memory
static_cast<functor_type*>(task)->~functor_type();
// Cannot destroy the task until its dependences
// have been processed.
}
}
KOKKOS_INLINE_FUNCTION
TaskBase( FunctorType const & arg_functor )
: base_type()
, FunctorType( arg_functor )
{}
KOKKOS_INLINE_FUNCTION
~TaskBase() {}
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_TASKQUEUE_HPP */
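
As background for the TaskBase hierarchy documented in this header, a minimal host-only sketch (hypothetical demo_* names, no scheduling state, reference counting, or device annotations) of the static_cast layering that lets a single type-erased apply pointer recover both the result storage and the user's functor:

#include <cstdio>

// Root: type-erased handle stored in queues; knows only the apply function.
struct demo_task_root {
  void (*m_apply)( demo_task_root * );
};

// Result layer: adds the storage a future would read.
template< typename ResultType >
struct demo_task_result : public demo_task_root {
  ResultType m_result ;
};

// Functor layer: derives from the result layer AND the user functor, so a
// single static_cast chain recovers both, as in the TaskBase comment block.
template< typename ResultType , typename FunctorType >
struct demo_task : public demo_task_result< ResultType > , public FunctorType {
  demo_task( FunctorType const & f )
    : demo_task_result< ResultType >() , FunctorType( f )
    { this->m_apply = & demo_task::apply ; }

  static void apply( demo_task_root * root )
    {
      demo_task * const task = static_cast< demo_task * >( root );
      // Invoke the user functor, writing into the result layer's storage.
      task->m_result = static_cast< FunctorType & >( *task )();
    }
};

struct TimesTwo { int x ; int operator()() const { return 2 * x ; } };

int main()
{
  demo_task< int , TimesTwo > t( TimesTwo{ 21 } );
  demo_task_root * const erased = & t ;
  erased->m_apply( erased );           // dispatch through the root type only
  std::printf( "%d\n" , t.m_result );  // prints 42
}
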
diff --git a/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp b/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
index 70a880d4a..05fd06a9a 100644
--- a/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
@@ -1,569 +1,570 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
+#if defined( KOKKOS_ENABLE_TASKDAG )
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< typename ExecSpace >
void TaskQueue< ExecSpace >::Destroy::destroy_shared_allocation()
{
m_queue->~TaskQueue();
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
TaskQueue< ExecSpace >::TaskQueue
( const TaskQueue< ExecSpace >::memory_space & arg_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_superblock_capacity_log2
)
: m_memory( arg_space
, arg_memory_pool_capacity
, arg_memory_pool_superblock_capacity_log2 )
, m_ready()
, m_accum_alloc(0)
+ , m_count_alloc(0)
, m_max_alloc(0)
, m_ready_count(0)
{
for ( int i = 0 ; i < NumQueue ; ++i ) {
m_ready[i][0] = (task_root_type *) task_root_type::EndTag ;
m_ready[i][1] = (task_root_type *) task_root_type::EndTag ;
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
TaskQueue< ExecSpace >::~TaskQueue()
{
// Verify that queues are empty and ready count is zero
for ( int i = 0 ; i < NumQueue ; ++i ) {
for ( int j = 0 ; j < 2 ; ++j ) {
if ( m_ready[i][j] != (task_root_type *) task_root_type::EndTag ) {
Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready tasks");
}
}
}
if ( 0 != m_ready_count ) {
Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready or executing tasks");
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::decrement
( TaskQueue< ExecSpace >::task_root_type * task )
{
const int count = Kokkos::atomic_fetch_add(&(task->m_ref_count),-1);
#if 0
if ( 1 == count ) {
printf( "decrement-destroy( 0x%lx { 0x%lx %d %d } )\n"
, uintptr_t( task )
, uintptr_t( task->m_next )
, int( task->m_task_type )
, int( task->m_ref_count )
);
}
#endif
if ( ( 1 == count ) &&
( task->m_next == (task_root_type *) task_root_type::LockTag ) ) {
// Reference count is zero and task is complete, deallocate.
task->m_queue->deallocate( task , task->m_alloc_size );
}
else if ( count <= 1 ) {
- Kokkos::abort("TaskPolicy task has negative reference count or is incomplete" );
+ Kokkos::abort("TaskScheduler task has negative reference count or is incomplete" );
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
size_t TaskQueue< ExecSpace >::allocate_block_size( size_t n )
{
return m_memory.allocate_block_size( n );
}
template< typename ExecSpace >
KOKKOS_FUNCTION
void * TaskQueue< ExecSpace >::allocate( size_t n )
{
void * const p = m_memory.allocate(n);
if ( p ) {
Kokkos::atomic_increment( & m_accum_alloc );
Kokkos::atomic_increment( & m_count_alloc );
if ( m_max_alloc < m_count_alloc ) m_max_alloc = m_count_alloc ;
}
return p ;
}
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::deallocate( void * p , size_t n )
{
m_memory.deallocate( p , n );
Kokkos::atomic_decrement( & m_count_alloc );
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
bool TaskQueue< ExecSpace >::push_task
( TaskQueue< ExecSpace >::task_root_type * volatile * const queue
, TaskQueue< ExecSpace >::task_root_type * const task
)
{
// Push task into a concurrently pushed and popped queue.
// The queue is a linked list where 'task->m_next' form the links.
// Fail the push attempt if the queue is locked;
// otherwise retry until the push succeeds.
#if 0
printf( "push_task( 0x%lx { 0x%lx } 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
, uintptr_t(queue)
, uintptr_t(*queue)
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * volatile * const next = & task->m_next ;
if ( zero != *next ) {
Kokkos::abort("TaskQueue::push_task ERROR: already a member of another queue" );
}
task_root_type * y = *queue ;
while ( lock != y ) {
*next = y ;
// Do not proceed until '*next' has been stored.
Kokkos::memory_fence();
task_root_type * const x = y ;
y = Kokkos::atomic_compare_exchange(queue,y,task);
if ( x == y ) return true ;
}
// Failed, replace 'task->m_next' value since 'task' remains
// not a member of a queue.
*next = zero ;
// Do not proceed until '*next' has been stored.
Kokkos::memory_fence();
return false ;
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
typename TaskQueue< ExecSpace >::task_root_type *
TaskQueue< ExecSpace >::pop_task
( TaskQueue< ExecSpace >::task_root_type * volatile * const queue )
{
// Pop task from a concurrently pushed and popped queue.
// The queue is a linked list where 'task->m_next' form the links.
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
// *queue is
// end => an empty queue
// lock => a locked queue
// valid
// Retry until the lock is acquired or the queue is empty.
task_root_type * task = *queue ;
while ( end != task ) {
// The only possible values for the queue are
// (1) lock, (2) end, or (3) a valid task.
// Thus zero will never appear in the queue.
//
// If the queue is locked then just read the current head
// by issuing a CAS that is guaranteed to fail.
if ( lock == task ) task = 0 ;
task_root_type * const x = task ;
task = Kokkos::atomic_compare_exchange(queue,task,lock);
if ( x == task ) break ; // CAS succeeded and queue is locked
}
if ( end != task ) {
// This thread has locked the queue and removed 'task' from the queue.
// Extract the next entry of the queue from 'task->m_next'
// and mark 'task' as popped from a queue by setting
// 'task->m_next = lock'.
task_root_type * const next =
Kokkos::atomic_exchange( & task->m_next , lock );
// Place the next entry in the head of the queue,
// which also unlocks the queue.
task_root_type * const unlock =
Kokkos::atomic_exchange( queue , next );
if ( next == zero || next == lock || lock != unlock ) {
Kokkos::abort("TaskQueue::pop_task ERROR");
}
}
#if 0
if ( end != task ) {
printf( "pop_task( 0x%lx 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
, uintptr_t(queue)
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, int(task->m_task_type)
, int(task->m_priority)
, int(task->m_ref_count) );
}
#endif
return task ;
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::schedule
( TaskQueue< ExecSpace >::task_root_type * const task )
{
// Schedule a runnable or when_all task upon construction / spawn
// and upon completion of other tasks that 'task' is waiting on.
// Precondition on runnable task state:
// task is either constructing or executing
//
// Constructing state:
// task->m_wait == 0
// task->m_next == dependence
// Executing-respawn state:
// task->m_wait == head of linked list
// task->m_next == dependence
//
// Task state transition:
// Constructing -> Waiting
// Executing-respawn -> Waiting
//
// Postcondition on task state:
// task->m_wait == head of linked list
// task->m_next == member of linked list
#if 0
printf( "schedule( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
//----------------------------------------
{
// If Constructing then task->m_wait == 0
// Change to waiting by task->m_wait = EndTag
task_root_type * const init =
Kokkos::atomic_compare_exchange( & task->m_wait , zero , end );
// Precondition
if ( lock == init ) {
Kokkos::abort("TaskQueue::schedule ERROR: task is complete");
}
// if ( init == 0 ) Constructing -> Waiting
// else Executing-Respawn -> Waiting
}
//----------------------------------------
if ( task_root_type::Aggregate != task->m_task_type ) {
// Scheduling a runnable task which may have a dependency 'dep'.
// Extract dependence, if any, from task->m_next.
// If 'dep' is not null then attempt to push 'task'
// into the wait queue of 'dep'.
// If the push succeeds then 'task' may be
// processed or executed by another thread at any time.
// If the push fails then 'dep' is complete and 'task'
// is ready to execute.
task_root_type * dep = Kokkos::atomic_exchange( & task->m_next , zero );
const bool is_ready =
( 0 == dep ) || ( ! push_task( & dep->m_wait , task ) );
// Reference count for dep was incremented when assigned
// to task->m_next so that if it completed prior to the
// above push_task dep would not be destroyed.
// dep reference count can now be decremented,
// which may deallocate the task.
TaskQueue::assign( & dep , (task_root_type *)0 );
if ( is_ready ) {
// No dependence or 'dep' is complete so push task into ready queue.
// Increment the ready count before pushing into ready queue
// to track number of ready + executing tasks.
// The ready count will be decremented when the task is complete.
Kokkos::atomic_increment( & m_ready_count );
task_root_type * volatile * const queue =
& m_ready[ task->m_priority ][ task->m_task_type ];
// A push_task fails if the ready queue is locked.
// A ready queue is only locked during a push or pop;
// i.e., it is never permanently locked.
// Retry push to ready queue until it succeeds.
// When the push succeeds then 'task' may be
// processed or executed by another thread at any time.
while ( ! push_task( queue , task ) );
}
}
//----------------------------------------
else {
// Scheduling a 'when_all' task with multiple dependences.
// This scheduling may be called when the 'when_all' is
// (1) created or
// (2) being removed from a completed task's wait list.
task_root_type ** const aggr = task->aggregate_dependences();
// Assume the 'when_all' is complete until a dependence is
// found that is not complete.
bool is_complete = true ;
for ( int i = task->m_dep_count ; 0 < i && is_complete ; ) {
--i ;
// Loop dependences looking for an incomplete task.
// Add this task to the incomplete task's wait queue.
// Remove a task 'x' from the dependence list.
// The reference count of 'x' was incremented when
// it was assigned into the dependence list.
task_root_type * x = Kokkos::atomic_exchange( aggr + i , zero );
if ( x ) {
// If x->m_wait is not locked then push succeeds
// and the aggregate is not complete.
// If the push succeeds then this when_all 'task' may be
// processed by another thread at any time.
// For example, 'x' may be completed by another
// thread and then re-schedule this when_all 'task'.
is_complete = ! push_task( & x->m_wait , task );
// Decrement reference count which had been incremented
// when 'x' was added to the dependence list.
TaskQueue::assign( & x , zero );
}
}
if ( is_complete ) {
// The when_all 'task' was not added to a wait queue because
// all dependences were complete so this aggregate is complete.
// Complete the when_all 'task' to schedule other tasks
// that are waiting for the when_all 'task' to complete.
task->m_next = lock ;
complete( task );
// '*task' may have been deleted upon completion
}
}
//----------------------------------------
// Postcondition:
// A runnable 'task' was pushed into a wait or ready queue.
// An aggregate 'task' was either pushed to a wait queue
// or completed.
// Concurrent execution may have already popped 'task'
// from a queue and processed it as appropriate.
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::complete
( TaskQueue< ExecSpace >::task_root_type * task )
{
// Complete a runnable task that has finished executing
// or a when_all task when all of its dependences are complete.
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
#if 0
printf( "complete( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
fflush( stdout );
#endif
const bool runnable = task_root_type::Aggregate != task->m_task_type ;
//----------------------------------------
if ( runnable && lock != task->m_next ) {
// A runnable task has finished executing and requested respawn.
// Schedule the task for subsequent execution.
schedule( task );
}
//----------------------------------------
else {
// This is either an aggregate or a runnable task that executed
// and did not respawn. Transition this task to complete.
// If 'task' is an aggregate then any of the runnable tasks that
// it depends upon may be attempting to complete this 'task'.
// Must only transition a task once to complete status.
// This is controlled by atomically locking the wait queue.
// Stop other tasks from adding themselves to this task's wait queue
// by locking the head of this task's wait queue.
task_root_type * x = Kokkos::atomic_exchange( & task->m_wait , lock );
if ( x != (task_root_type *) lock ) {
// This thread has transitioned this 'task' to complete.
// 'task' is no longer in a queue and is not executing
// so decrement the reference count from 'task's creation.
// If no other references to this 'task' then it will be deleted.
TaskQueue::assign( & task , zero );
// This thread has exclusive access to the wait list so
// the concurrency-safe pop_task function is not needed.
// Schedule the tasks that have been waiting on the input 'task',
// which may have been deleted.
while ( x != end ) {
// Set x->m_next = zero <= no dependence
task_root_type * const next =
(task_root_type *) Kokkos::atomic_exchange( & x->m_next , zero );
schedule( x );
x = next ;
}
}
}
if ( runnable ) {
// A runnable task was popped from a ready queue and executed.
// If respawned into a ready queue then the ready count was incremented
// so decrement whether respawned or not.
Kokkos::atomic_decrement( & m_ready_count );
}
}
//----------------------------------------------------------------------------
} /* namespace Impl */
} /* namespace Kokkos */
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
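
The push_task / pop_task pair above relies on a compare-and-swap loop over the intrusive m_next link. A minimal host-only sketch of the same push pattern, using std::atomic and a hypothetical node type, without the LockTag / EndTag sentinels or the fail-on-locked-queue behaviour of the real implementation:

#include <atomic>
#include <cassert>

struct node {
  node * m_next = nullptr ;   // intrusive link, doubles as queue-membership flag
};

// Push 'task' onto the head of a concurrently accessed singly linked list.
// Mirrors the retry structure of TaskQueue::push_task.
inline void demo_push( std::atomic<node*> & head , node * const task )
{
  assert( task->m_next == nullptr );   // not already a member of another queue
  node * expected = head.load( std::memory_order_relaxed );
  do {
    task->m_next = expected ;          // publish the link before the CAS
  } while ( ! head.compare_exchange_weak( expected , task ,
                                          std::memory_order_release ,
                                          std::memory_order_relaxed ) );
}

int main()
{
  std::atomic<node*> head{ nullptr };
  node a , b ;
  demo_push( head , & a );
  demo_push( head , & b );
  assert( head.load() == & b && b.m_next == & a && a.m_next == nullptr );
}
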
diff --git a/lib/kokkos/core/src/impl/Kokkos_Timer.hpp b/lib/kokkos/core/src/impl/Kokkos_Timer.hpp
index 1f14e4287..293e395b8 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Timer.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Timer.hpp
@@ -1,118 +1,63 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPLWALLTIME_HPP
#define KOKKOS_IMPLWALLTIME_HPP
-#include <stddef.h>
-
-#ifdef _MSC_VER
-#undef KOKKOS_USE_LIBRT
-#include <gettimeofday.c>
-#else
-#ifdef KOKKOS_USE_LIBRT
-#include <ctime>
-#else
-#include <sys/time.h>
-#endif
-#endif
+#include <Kokkos_Timer.hpp>
namespace Kokkos {
namespace Impl {
-/** \brief Time since construction */
-
-class Timer {
-private:
- #ifdef KOKKOS_USE_LIBRT
- struct timespec m_old;
- #else
- struct timeval m_old ;
- #endif
- Timer( const Timer & );
- Timer & operator = ( const Timer & );
-public:
-
- inline
- void reset() {
- #ifdef KOKKOS_USE_LIBRT
- clock_gettime(CLOCK_REALTIME, &m_old);
- #else
- gettimeofday( & m_old , ((struct timezone *) NULL ) );
- #endif
- }
-
- inline
- ~Timer() {}
-
- inline
- Timer() { reset(); }
+/** \brief Time since construction
+ * Timer promoted from the Impl namespace to the Kokkos namespace
+ * This file is included for backwards compatibility
+ */
- inline
- double seconds() const
- {
- #ifdef KOKKOS_USE_LIBRT
- struct timespec m_new;
- clock_gettime(CLOCK_REALTIME, &m_new);
-
- return ( (double) ( m_new.tv_sec - m_old.tv_sec ) ) +
- ( (double) ( m_new.tv_nsec - m_old.tv_nsec ) * 1.0e-9 );
- #else
- struct timeval m_new ;
-
- ::gettimeofday( & m_new , ((struct timezone *) NULL ) );
-
- return ( (double) ( m_new.tv_sec - m_old.tv_sec ) ) +
- ( (double) ( m_new.tv_usec - m_old.tv_usec ) * 1.0e-6 );
- #endif
- }
-};
+ using Kokkos::Timer ;
} // namespace Impl
-
- using Kokkos::Impl::Timer ;
-
} // namespace Kokkos
#endif /* #ifndef KOKKOS_IMPLWALLTIME_HPP */
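
With the timer now living in the Kokkos namespace and this header only re-exporting it, existing call sites keep compiling. A minimal usage sketch, assuming the reset()/seconds() interface shown in the removed Impl::Timer:

#include <Kokkos_Timer.hpp>
#include <cstdio>

void demo_time_region()
{
  Kokkos::Timer timer ;                 // starts timing at construction
  // ... work to be measured ...
  double const elapsed = timer.seconds();
  std::printf( "elapsed = %g s\n" , elapsed );
  timer.reset();                        // restart for the next region
}
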
diff --git a/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp b/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
new file mode 100644
index 000000000..d66fdd9a5
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
@@ -0,0 +1,414 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_CORE_IMPL_UTILITIES_HPP
+#define KOKKOS_CORE_IMPL_UTILITIES_HPP
+
+#include <Kokkos_Macros.hpp>
+#include <type_traits>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos { namespace Impl {
+
+// same as std::forward
+// needed to allow perfect forwarding on the device
+template <typename T>
+KOKKOS_INLINE_FUNCTION
+constexpr
+T&& forward( typename std::remove_reference<T>::type& arg ) noexcept
+{ return static_cast<T&&>(arg); }
+
+template <typename T>
+KOKKOS_INLINE_FUNCTION
+constexpr
+T&& forward( typename std::remove_reference<T>::type&& arg ) noexcept
+{ return static_cast<T&&>(arg); }
+
+// same as std::move
+// needed to allowing moving on the device
+template <typename T>
+KOKKOS_INLINE_FUNCTION
+constexpr
+typename std::remove_reference<T>::type&& move( T&& arg ) noexcept
+{ return static_cast<typename std::remove_reference<T>::type&&>(arg); }
+
+// empty function to allow expanding a variadic argument pack
+template<typename... Args>
+KOKKOS_INLINE_FUNCTION
+void expand_variadic(Args &&...) {}
+
+//----------------------------------------
+// C++14 integer sequence
+template< typename T , T ... Ints >
+struct integer_sequence {
+ using value_type = T ;
+ static constexpr std::size_t size() noexcept { return sizeof...(Ints); }
+};
+
+template< typename T , std::size_t N >
+struct make_integer_sequence_helper ;
+
+template< typename T , T N >
+using make_integer_sequence =
+ typename make_integer_sequence_helper<T,N>::type ;
+
+template< typename T >
+struct make_integer_sequence_helper< T , 0 >
+{ using type = integer_sequence<T> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 1 >
+{ using type = integer_sequence<T,0> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 2 >
+{ using type = integer_sequence<T,0,1> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 3 >
+{ using type = integer_sequence<T,0,1,2> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 4 >
+{ using type = integer_sequence<T,0,1,2,3> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 5 >
+{ using type = integer_sequence<T,0,1,2,3,4> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 6 >
+{ using type = integer_sequence<T,0,1,2,3,4,5> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 7 >
+{ using type = integer_sequence<T,0,1,2,3,4,5,6> ; };
+
+template< typename T >
+struct make_integer_sequence_helper< T , 8 >
+{ using type = integer_sequence<T,0,1,2,3,4,5,6,7> ; };
+
+template< typename X , typename Y >
+struct make_integer_sequence_concat ;
+
+template< typename T , T ... x , T ... y >
+struct make_integer_sequence_concat< integer_sequence<T,x...>
+ , integer_sequence<T,y...> >
+{ using type = integer_sequence< T , x ... , (sizeof...(x)+y)... > ; };
+
+template< typename T , std::size_t N >
+struct make_integer_sequence_helper {
+ using type = typename make_integer_sequence_concat
+ < typename make_integer_sequence_helper< T , N/2 >::type
+ , typename make_integer_sequence_helper< T , N - N/2 >::type
+ >::type ;
+};
+
+//----------------------------------------
+
+template <std::size_t... Indices>
+using index_sequence = integer_sequence<std::size_t, Indices...>;
+
+template< std::size_t N >
+using make_index_sequence = make_integer_sequence< std::size_t, N>;
+
+//----------------------------------------
+
+template <unsigned I, typename IntegerSequence>
+struct integer_sequence_at;
+
+template <unsigned I, typename T, T h0, T... tail>
+struct integer_sequence_at<I, integer_sequence<T, h0, tail...> >
+ : public integer_sequence_at<I-1u, integer_sequence<T,tail...> >
+{
+ static_assert( 8 <= I , "Reasoning Error" );
+ static_assert( I < integer_sequence<T, h0, tail...>::size(), "Error: Index out of bounds");
+};
+
+template < typename T, T h0, T... tail>
+struct integer_sequence_at<0u, integer_sequence<T,h0, tail...> >
+{
+ using type = T;
+ static constexpr T value = h0;
+};
+
+template < typename T, T h0, T h1, T... tail>
+struct integer_sequence_at<1u, integer_sequence<T, h0, h1, tail...> >
+{
+ using type = T;
+ static constexpr T value = h1;
+};
+
+template < typename T, T h0, T h1, T h2, T... tail>
+struct integer_sequence_at<2u, integer_sequence<T, h0, h1, h2, tail...> >
+{
+ using type = T;
+ static constexpr T value = h2;
+};
+
+template < typename T, T h0, T h1, T h2, T h3, T... tail>
+struct integer_sequence_at<3u, integer_sequence<T, h0, h1, h2, h3, tail...> >
+{
+ using type = T;
+ static constexpr T value = h3;
+};
+
+template < typename T, T h0, T h1, T h2, T h3, T h4, T... tail>
+struct integer_sequence_at<4u, integer_sequence<T, h0, h1, h2, h3, h4, tail...> >
+{
+ using type = T;
+ static constexpr T value = h4;
+};
+
+template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T... tail>
+struct integer_sequence_at<5u, integer_sequence<T, h0, h1, h2, h3, h4, h5, tail...> >
+{
+ using type = T;
+ static constexpr T value = h5;
+};
+
+template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T... tail>
+struct integer_sequence_at<6u, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, tail...> >
+{
+ using type = T;
+ static constexpr T value = h6;
+};
+
+template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T h7, T... tail>
+struct integer_sequence_at<7u, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, h7, tail...> >
+{
+ using type = T;
+ static constexpr T value = h7;
+};
+
+//----------------------------------------
+
+template <typename T>
+constexpr
+T at( const unsigned, integer_sequence<T> ) noexcept
+{ return ~static_cast<T>(0); }
+
+template <typename T, T h0, T... tail>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0> ) noexcept
+{ return i==0u ? h0 : ~static_cast<T>(0); }
+
+template <typename T, T h0, T h1>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 : ~static_cast<T>(0);
+}
+
+template <typename T, T h0, T h1, T h2>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1, h2> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 :
+ i==2u ? h2 : ~static_cast<T>(0);
+}
+
+template <typename T, T h0, T h1, T h2, T h3>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 :
+ i==2u ? h2 :
+ i==3u ? h3 : ~static_cast<T>(0);
+}
+
+template <typename T, T h0, T h1, T h2, T h3, T h4>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 :
+ i==2u ? h2 :
+ i==3u ? h3 :
+ i==4u ? h4 : ~static_cast<T>(0);
+}
+
+template <typename T, T h0, T h1, T h2, T h3, T h4, T h5>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 :
+ i==2u ? h2 :
+ i==3u ? h3 :
+ i==4u ? h4 :
+ i==5u ? h5 : ~static_cast<T>(0);
+}
+
+template <typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 :
+ i==2u ? h2 :
+ i==3u ? h3 :
+ i==4u ? h4 :
+ i==5u ? h5 :
+ i==6u ? h6 : ~static_cast<T>(0);
+}
+
+template <typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T h7, T... tail>
+constexpr
+T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, h7, tail...> ) noexcept
+{ return i==0u ? h0 :
+ i==1u ? h1 :
+ i==2u ? h2 :
+ i==3u ? h3 :
+ i==4u ? h4 :
+ i==5u ? h5 :
+ i==6u ? h6 :
+ i==7u ? h7 : at(i-8u, integer_sequence<T, tail...>{} );
+}
+
+//----------------------------------------
+
+
+template < typename IntegerSequence
+ , typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
+ >
+struct reverse_integer_sequence_helper;
+
+template <typename T, T h0, T... tail, T... results>
+struct reverse_integer_sequence_helper< integer_sequence<T, h0, tail...>, integer_sequence<T, results...> >
+ : public reverse_integer_sequence_helper< integer_sequence<T, tail...>, integer_sequence<T, h0, results...> >
+{};
+
+template <typename T, T... results>
+struct reverse_integer_sequence_helper< integer_sequence<T>, integer_sequence<T, results...> >
+{
+ using type = integer_sequence<T, results...>;
+};
+
+
+template <typename IntegerSequence>
+using reverse_integer_sequence = typename reverse_integer_sequence_helper<IntegerSequence>::type;
+
+//----------------------------------------
+
+template < typename IntegerSequence
+ , typename Result
+ , typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
+ >
+struct exclusive_scan_integer_sequence_helper;
+
+template <typename T, T h0, T... tail, typename Result, T... results>
+struct exclusive_scan_integer_sequence_helper
+ < integer_sequence<T, h0, tail...>
+ , Result
+ , integer_sequence<T, results...> >
+ : public exclusive_scan_integer_sequence_helper
+ < integer_sequence<T, tail...>
+ , std::integral_constant<T,Result::value+h0>
+ , integer_sequence<T, 0, (results+h0)...> >
+{};
+
+template <typename T, typename Result, T... results>
+struct exclusive_scan_integer_sequence_helper
+ < integer_sequence<T>, Result, integer_sequence<T, results...> >
+{
+ using type = integer_sequence<T, results...>;
+ static constexpr T value = Result::value ;
+};
+
+template <typename IntegerSequence>
+struct exclusive_scan_integer_sequence
+{
+ using value_type = typename IntegerSequence::value_type;
+ using helper =
+ exclusive_scan_integer_sequence_helper
+ < reverse_integer_sequence<IntegerSequence>
+ , std::integral_constant< value_type , 0 >
+ > ;
+ using type = typename helper::type ;
+ static constexpr value_type value = helper::value ;
+};
+
+//----------------------------------------
+
+template < typename IntegerSequence
+ , typename Result
+ , typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
+ >
+struct inclusive_scan_integer_sequence_helper;
+
+template <typename T, T h0, T... tail, typename Result, T... results>
+struct inclusive_scan_integer_sequence_helper
+ < integer_sequence<T, h0, tail...>
+ , Result
+ , integer_sequence<T, results...> >
+ : public inclusive_scan_integer_sequence_helper
+ < integer_sequence<T, tail...>
+ , std::integral_constant<T,Result::value+h0>
+ , integer_sequence<T, h0, (results+h0)...> >
+{};
+
+template <typename T, typename Result, T... results>
+struct inclusive_scan_integer_sequence_helper
+ < integer_sequence<T>, Result, integer_sequence<T, results...> >
+{
+ using type = integer_sequence<T, results...>;
+ static constexpr T value = Result::value ;
+};
+
+template <typename IntegerSequence>
+struct inclusive_scan_integer_sequence
+{
+ using value_type = typename IntegerSequence::value_type;
+ using helper =
+ inclusive_scan_integer_sequence_helper
+ < reverse_integer_sequence<IntegerSequence>
+ , std::integral_constant< value_type , 0 >
+ > ;
+ using type = typename helper::type ;
+ static constexpr value_type value = helper::value ;
+};
+
+}} // namespace Kokkos::Impl
+
+
+#endif //KOKKOS_CORE_IMPL_UTILITIES
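
A few compile-time checks illustrating what the sequence helpers in this new header compute; these are not part of the patch and assume a translation unit with the Kokkos impl headers on the include path:

#include <cstddef>
#include <type_traits>
#include <impl/Kokkos_Utilities.hpp>

namespace demo {
using namespace Kokkos::Impl ;

// make_index_sequence<4> expands to integer_sequence<std::size_t,0,1,2,3>
static_assert( std::is_same< make_index_sequence<4>
                           , integer_sequence<std::size_t,0,1,2,3> >::value , "" );

// integer_sequence_at<I,Seq> picks the I-th element of the pack
static_assert( integer_sequence_at< 2 , integer_sequence<int,5,7,9,11> >::value == 9 , "" );

// reverse_integer_sequence reverses the pack
static_assert( std::is_same< reverse_integer_sequence< integer_sequence<int,1,2,3> >
                           , integer_sequence<int,3,2,1> >::value , "" );

// exclusive_scan_integer_sequence yields the prefix sums; 'value' is the total
using scan = exclusive_scan_integer_sequence< integer_sequence<int,2,4,6> > ;
static_assert( std::is_same< scan::type , integer_sequence<int,0,2,6> >::value , "" );
static_assert( scan::value == 12 , "" );

} // namespace demo
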
diff --git a/lib/kokkos/core/src/impl/KokkosExp_ViewArray.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewArray.hpp
similarity index 96%
rename from lib/kokkos/core/src/impl/KokkosExp_ViewArray.hpp
rename to lib/kokkos/core/src/impl/Kokkos_ViewArray.hpp
index 17d28ace4..c55636b64 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_ViewArray.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_ViewArray.hpp
@@ -1,606 +1,606 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXPERIMENTAL_VIEW_ARRAY_MAPPING_HPP
#define KOKKOS_EXPERIMENTAL_VIEW_ARRAY_MAPPING_HPP
#include <Kokkos_Array.hpp>
namespace Kokkos {
namespace Experimental {
namespace Impl {
template< class DataType , class ArrayLayout , class V , size_t N , class P >
struct ViewDataAnalysis< DataType , ArrayLayout , Kokkos::Array<V,N,P> >
{
private:
typedef ViewArrayAnalysis<DataType> array_analysis ;
static_assert( std::is_same<P,void>::value , "" );
static_assert( std::is_same<typename array_analysis::non_const_value_type , Kokkos::Array<V,N,P> >::value , "" );
static_assert( std::is_scalar<V>::value , "View of Array type must be of a scalar type" );
public:
typedef Kokkos::Array<> specialize ;
typedef typename array_analysis::dimension dimension ;
private:
enum { is_const = std::is_same< typename array_analysis::value_type
, typename array_analysis::const_value_type
>::value };
typedef typename dimension::template append<N>::type array_scalar_dimension ;
typedef typename std::conditional< is_const , const V , V >::type scalar_type ;
typedef V non_const_scalar_type ;
typedef const V const_scalar_type ;
public:
typedef typename array_analysis::value_type value_type ;
typedef typename array_analysis::const_value_type const_value_type ;
typedef typename array_analysis::non_const_value_type non_const_value_type ;
typedef typename ViewDataType< value_type , dimension >::type type ;
typedef typename ViewDataType< const_value_type , dimension >::type const_type ;
typedef typename ViewDataType< non_const_value_type , dimension >::type non_const_type ;
typedef typename ViewDataType< scalar_type , array_scalar_dimension >::type scalar_array_type ;
typedef typename ViewDataType< const_scalar_type , array_scalar_dimension >::type const_scalar_array_type ;
typedef typename ViewDataType< non_const_scalar_type , array_scalar_dimension >::type non_const_scalar_array_type ;
};
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
/** \brief View mapping for non-specialized data type and standard layout */
template< class Traits >
class ViewMapping< Traits ,
typename std::enable_if<(
std::is_same< typename Traits::specialize , Kokkos::Array<> >::value &&
( std::is_same< typename Traits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename Traits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename Traits::array_layout , Kokkos::LayoutStride >::value )
)>::type >
{
private:
template< class , class ... > friend class ViewMapping ;
- template< class , class ... > friend class Kokkos::Experimental::View ;
+ template< class , class ... > friend class Kokkos::View ;
typedef ViewOffset< typename Traits::dimension
, typename Traits::array_layout
, void
> offset_type ;
typedef typename Traits::value_type::pointer handle_type ;
handle_type m_handle ;
offset_type m_offset ;
size_t m_stride ;
typedef typename Traits::value_type::value_type scalar_type ;
typedef Kokkos::Array< scalar_type , ~size_t(0) , Kokkos::Array<>::contiguous > contiguous_reference ;
typedef Kokkos::Array< scalar_type , ~size_t(0) , Kokkos::Array<>::strided > strided_reference ;
enum { is_contiguous_reference =
( Traits::rank == 0 ) || ( std::is_same< typename Traits::array_layout , Kokkos::LayoutRight >::value ) };
enum { Array_N = Traits::value_type::size() };
enum { Array_S = is_contiguous_reference ? Array_N : 1 };
KOKKOS_INLINE_FUNCTION
ViewMapping( const handle_type & arg_handle , const offset_type & arg_offset )
: m_handle( arg_handle )
, m_offset( arg_offset )
, m_stride( is_contiguous_reference ? 0 : arg_offset.span() )
{}
public:
//----------------------------------------
// Domain dimensions
enum { Rank = Traits::dimension::rank };
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr size_t extent( const iType & r ) const
{ return m_offset.m_dim.extent(r); }
KOKKOS_INLINE_FUNCTION constexpr
typename Traits::array_layout layout() const
{ return m_offset.layout(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return m_offset.dimension_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return m_offset.dimension_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return m_offset.dimension_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return m_offset.dimension_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return m_offset.dimension_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return m_offset.dimension_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return m_offset.dimension_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return m_offset.dimension_7(); }
// Is a regular layout with uniform striding for each index.
using is_regular = typename offset_type::is_regular ;
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return m_offset.stride_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return m_offset.stride_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return m_offset.stride_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return m_offset.stride_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return m_offset.stride_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return m_offset.stride_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return m_offset.stride_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return m_offset.stride_7(); }
//----------------------------------------
// Range span
/** \brief Span of the mapped range */
KOKKOS_INLINE_FUNCTION constexpr size_t span() const
{ return m_offset.span() * Array_N ; }
/** \brief Is the mapped range span contiguous */
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const
{ return m_offset.span_is_contiguous(); }
typedef typename std::conditional< is_contiguous_reference , contiguous_reference , strided_reference >::type reference_type ;
typedef handle_type pointer_type ;
/** \brief If data references are lvalue references then a pointer to memory can be queried */
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const
{ return m_handle ; }
//----------------------------------------
// The View class performs all rank and bounds checking before
// calling these element reference methods.
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference() const { return reference_type( m_handle + 0 , Array_N , 0 ); }
template< typename I0 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type
reference( const I0 & i0 ) const
{ return reference_type( m_handle + m_offset(i0) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 ) const
{ return reference_type( m_handle + m_offset(i0,i1) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 , typename I2 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 ) const
{ return reference_type( m_handle + m_offset(i0,i1,i2) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 , typename I2 , typename I3 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3 ) const
{ return reference_type( m_handle + m_offset(i0,i1,i2,i3) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 ) const
{ return reference_type( m_handle + m_offset(i0,i1,i2,i3,i4) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 ) const
{ return reference_type( m_handle + m_offset(i0,i1,i2,i3,i4,i5) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6 ) const
{ return reference_type( m_handle + m_offset(i0,i1,i2,i3,i4,i5,i6) * Array_S , Array_N , m_stride ); }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6 , typename I7 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6 , const I7 & i7 ) const
{ return reference_type( m_handle + m_offset(i0,i1,i2,i3,i4,i5,i6,i7) * Array_S , Array_N , m_stride ); }
//----------------------------------------
private:
enum { MemorySpanMask = 8 - 1 /* Force alignment on 8 byte boundary */ };
enum { MemorySpanSize = sizeof(scalar_type) };
public:
/** \brief Span, in bytes, of the referenced memory */
KOKKOS_INLINE_FUNCTION constexpr size_t memory_span() const
{
return ( m_offset.span() * Array_N * MemorySpanSize + MemorySpanMask ) & ~size_t(MemorySpanMask);
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION ~ViewMapping() {}
KOKKOS_INLINE_FUNCTION ViewMapping() : m_handle(), m_offset(), m_stride(0) {}
KOKKOS_INLINE_FUNCTION ViewMapping( const ViewMapping & rhs )
: m_handle( rhs.m_handle ), m_offset( rhs.m_offset ), m_stride( rhs.m_stride ) {}
KOKKOS_INLINE_FUNCTION ViewMapping & operator = ( const ViewMapping & rhs )
{ m_handle = rhs.m_handle ; m_offset = rhs.m_offset ; m_stride = rhs.m_stride ; return *this ; }
KOKKOS_INLINE_FUNCTION ViewMapping( ViewMapping && rhs )
: m_handle( rhs.m_handle ), m_offset( rhs.m_offset ), m_stride( rhs.m_stride ) {}
KOKKOS_INLINE_FUNCTION ViewMapping & operator = ( ViewMapping && rhs )
{ m_handle = rhs.m_handle ; m_offset = rhs.m_offset ; m_stride = rhs.m_stride ; return *this ; }
//----------------------------------------
template< class ... Args >
KOKKOS_INLINE_FUNCTION
ViewMapping( pointer_type ptr , Args ... args )
: m_handle( ptr )
, m_offset( std::integral_constant< unsigned , 0 >() , args... )
, m_stride( m_offset.span() )
{}
//----------------------------------------
template< class ... P >
- SharedAllocationRecord<> *
- allocate_shared( ViewCtorProp< P... > const & arg_prop
+ Kokkos::Impl::SharedAllocationRecord<> *
+ allocate_shared( Kokkos::Impl::ViewCtorProp< P... > const & arg_prop
, typename Traits::array_layout const & arg_layout
)
{
- typedef ViewCtorProp< P... > alloc_prop ;
+ typedef Kokkos::Impl::ViewCtorProp< P... > alloc_prop ;
typedef typename alloc_prop::execution_space execution_space ;
typedef typename Traits::memory_space memory_space ;
typedef ViewValueFunctor< execution_space , scalar_type > functor_type ;
- typedef SharedAllocationRecord< memory_space , functor_type > record_type ;
+ typedef Kokkos::Impl::SharedAllocationRecord< memory_space , functor_type > record_type ;
// Query the mapping for byte-size of allocation.
typedef std::integral_constant< unsigned ,
alloc_prop::allow_padding ? sizeof(scalar_type) : 0 > padding ;
m_offset = offset_type( padding(), arg_layout );
const size_t alloc_size =
( m_offset.span() * Array_N * MemorySpanSize + MemorySpanMask ) & ~size_t(MemorySpanMask);
// Allocate memory from the memory space and create tracking record.
record_type * const record =
- record_type::allocate( ((ViewCtorProp<void,memory_space> const &) arg_prop ).value
- , ((ViewCtorProp<void,std::string> const &) arg_prop ).value
+ record_type::allocate( ((Kokkos::Impl::ViewCtorProp<void,memory_space> const &) arg_prop ).value
+ , ((Kokkos::Impl::ViewCtorProp<void,std::string> const &) arg_prop ).value
, alloc_size );
if ( alloc_size ) {
m_handle =
handle_type( reinterpret_cast< pointer_type >( record->data() ) );
if ( alloc_prop::initialize ) {
// The functor constructs and destroys
- record->m_destroy = functor_type( ((ViewCtorProp<void,execution_space> const & )arg_prop).value
+ record->m_destroy = functor_type( ((Kokkos::Impl::ViewCtorProp<void,execution_space> const & )arg_prop).value
, (pointer_type) m_handle
, m_offset.span() * Array_N
);
record->m_destroy.construct_shared_allocation();
}
}
return record ;
}
};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/** \brief Assign compatible default mappings */
template< class DstTraits , class SrcTraits >
class ViewMapping< DstTraits , SrcTraits ,
typename std::enable_if<(
std::is_same< typename DstTraits::memory_space , typename SrcTraits::memory_space >::value
&&
std::is_same< typename DstTraits::specialize , Kokkos::Array<> >::value
&&
(
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutStride >::value
)
&&
std::is_same< typename SrcTraits::specialize , Kokkos::Array<> >::value
&&
(
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutStride >::value
)
)>::type >
{
public:
enum { is_assignable = true };
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker TrackType ;
+ typedef Kokkos::Impl::SharedAllocationTracker TrackType ;
typedef ViewMapping< DstTraits , void > DstType ;
typedef ViewMapping< SrcTraits , void > SrcType ;
KOKKOS_INLINE_FUNCTION
static void assign( DstType & dst , const SrcType & src , const TrackType & src_track )
{
static_assert( std::is_same< typename DstTraits::value_type , typename SrcTraits::value_type >::value ||
std::is_same< typename DstTraits::value_type , typename SrcTraits::const_value_type >::value
, "View assignment must have same value type or const = non-const" );
static_assert( ViewDimensionAssignable< typename DstTraits::dimension , typename SrcTraits::dimension >::value
, "View assignment must have compatible dimensions" );
static_assert( std::is_same< typename DstTraits::array_layout , typename SrcTraits::array_layout >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutStride >::value ||
( DstTraits::dimension::rank == 0 ) ||
( DstTraits::dimension::rank == 1 && DstTraits::dimension::rank_dynamic == 1 )
, "View assignment must have compatible layout or have rank <= 1" );
typedef typename DstType::offset_type dst_offset_type ;
dst.m_offset = dst_offset_type( src.m_offset );
dst.m_handle = src.m_handle ;
dst.m_stride = src.m_stride ;
}
};
/** \brief Assign Array to non-Array */
template< class DstTraits , class SrcTraits >
class ViewMapping< DstTraits , SrcTraits ,
typename std::enable_if<(
std::is_same< typename DstTraits::memory_space , typename SrcTraits::memory_space >::value
&&
std::is_same< typename DstTraits::specialize , void >::value
&&
(
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutStride >::value
)
&&
std::is_same< typename SrcTraits::specialize , Kokkos::Array<> >::value
&&
(
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutStride >::value
)
)>::type >
{
public:
// Can only convert to View::array_type
enum { is_assignable = std::is_same< typename DstTraits::data_type , typename SrcTraits::scalar_array_type >::value &&
std::is_same< typename DstTraits::array_layout , typename SrcTraits::array_layout >::value };
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker TrackType ;
+ typedef Kokkos::Impl::SharedAllocationTracker TrackType ;
typedef ViewMapping< DstTraits , void > DstType ;
typedef ViewMapping< SrcTraits , void > SrcType ;
KOKKOS_INLINE_FUNCTION
static void assign( DstType & dst , const SrcType & src , const TrackType & src_track )
{
static_assert( is_assignable , "Can only convert to array_type" );
typedef typename DstType::offset_type dst_offset_type ;
// Array dimension becomes the last dimension.
// Arguments beyond the destination rank are ignored.
if ( src.span_is_contiguous() ) { // not padded
dst.m_offset = dst_offset_type( std::integral_constant<unsigned,0>() ,
typename DstTraits::array_layout
( ( 0 < SrcType::Rank ? src.dimension_0() : SrcTraits::value_type::size() )
, ( 1 < SrcType::Rank ? src.dimension_1() : SrcTraits::value_type::size() )
, ( 2 < SrcType::Rank ? src.dimension_2() : SrcTraits::value_type::size() )
, ( 3 < SrcType::Rank ? src.dimension_3() : SrcTraits::value_type::size() )
, ( 4 < SrcType::Rank ? src.dimension_4() : SrcTraits::value_type::size() )
, ( 5 < SrcType::Rank ? src.dimension_5() : SrcTraits::value_type::size() )
, ( 6 < SrcType::Rank ? src.dimension_6() : SrcTraits::value_type::size() )
, ( 7 < SrcType::Rank ? src.dimension_7() : SrcTraits::value_type::size() )
) );
}
else { // is padded
typedef std::integral_constant<unsigned,sizeof(typename SrcTraits::value_type::value_type)> padded ;
dst.m_offset = dst_offset_type( padded() ,
typename DstTraits::array_layout
( ( 0 < SrcType::Rank ? src.dimension_0() : SrcTraits::value_type::size() )
, ( 1 < SrcType::Rank ? src.dimension_1() : SrcTraits::value_type::size() )
, ( 2 < SrcType::Rank ? src.dimension_2() : SrcTraits::value_type::size() )
, ( 3 < SrcType::Rank ? src.dimension_3() : SrcTraits::value_type::size() )
, ( 4 < SrcType::Rank ? src.dimension_4() : SrcTraits::value_type::size() )
, ( 5 < SrcType::Rank ? src.dimension_5() : SrcTraits::value_type::size() )
, ( 6 < SrcType::Rank ? src.dimension_6() : SrcTraits::value_type::size() )
, ( 7 < SrcType::Rank ? src.dimension_7() : SrcTraits::value_type::size() )
) );
}
dst.m_handle = src.m_handle ;
}
};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
template< class SrcTraits , class ... Args >
struct ViewMapping
< typename std::enable_if<(
std::is_same< typename SrcTraits::specialize , Kokkos::Array<> >::value
&&
(
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutStride >::value
)
)>::type
, SrcTraits
, Args ... >
{
private:
static_assert( SrcTraits::rank == sizeof...(Args) , "" );
enum : bool
{ R0 = is_integral_extent<0,Args...>::value
, R1 = is_integral_extent<1,Args...>::value
, R2 = is_integral_extent<2,Args...>::value
, R3 = is_integral_extent<3,Args...>::value
, R4 = is_integral_extent<4,Args...>::value
, R5 = is_integral_extent<5,Args...>::value
, R6 = is_integral_extent<6,Args...>::value
, R7 = is_integral_extent<7,Args...>::value
};
enum { rank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
+ unsigned(R4) + unsigned(R5) + unsigned(R6) + unsigned(R7) };
// Whether right-most rank is a range.
enum { R0_rev = 0 == SrcTraits::rank ? false : (
1 == SrcTraits::rank ? R0 : (
2 == SrcTraits::rank ? R1 : (
3 == SrcTraits::rank ? R2 : (
4 == SrcTraits::rank ? R3 : (
5 == SrcTraits::rank ? R4 : (
6 == SrcTraits::rank ? R5 : (
7 == SrcTraits::rank ? R6 : R7 ))))))) };
// Subview's layout
typedef typename std::conditional<
( /* Same array layout IF */
( rank == 0 ) /* output rank zero */
||
// OutputRank 1 or 2, InputLayout Left, Interval 0
// because single stride one or second index has a stride.
( rank <= 2 && R0 && std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value )
||
// OutputRank 1 or 2, InputLayout Right, Interval [InputRank-1]
// because single stride one or second index has a stride.
( rank <= 2 && R0_rev && std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value )
), typename SrcTraits::array_layout , Kokkos::LayoutStride
>::type array_layout ;
typedef typename SrcTraits::value_type value_type ;
typedef typename std::conditional< rank == 0 , value_type ,
typename std::conditional< rank == 1 , value_type * ,
typename std::conditional< rank == 2 , value_type ** ,
typename std::conditional< rank == 3 , value_type *** ,
typename std::conditional< rank == 4 , value_type **** ,
typename std::conditional< rank == 5 , value_type ***** ,
typename std::conditional< rank == 6 , value_type ****** ,
typename std::conditional< rank == 7 , value_type ******* ,
value_type ********
>::type >::type >::type >::type >::type >::type >::type >::type
data_type ;
public:
- typedef Kokkos::Experimental::ViewTraits
+ typedef Kokkos::ViewTraits
< data_type
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > traits_type ;
- typedef Kokkos::Experimental::View
+ typedef Kokkos::View
< data_type
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > type ;
KOKKOS_INLINE_FUNCTION
static void assign( ViewMapping< traits_type , void > & dst
, ViewMapping< SrcTraits , void > const & src
, Args ... args )
{
typedef ViewMapping< traits_type , void > DstType ;
typedef typename DstType::offset_type dst_offset_type ;
typedef typename DstType::handle_type dst_handle_type ;
const SubviewExtents< SrcTraits::rank , rank >
extents( src.m_offset.m_dim , args... );
dst.m_offset = dst_offset_type( src.m_offset , extents );
dst.m_handle = dst_handle_type( src.m_handle +
src.m_offset( extents.domain_offset(0)
, extents.domain_offset(1)
, extents.domain_offset(2)
, extents.domain_offset(3)
, extents.domain_offset(4)
, extents.domain_offset(5)
, extents.domain_offset(6)
, extents.domain_offset(7)
) );
}
};
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_EXPERIMENTAL_VIEW_ARRAY_MAPPING_HPP */
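As an illustrative aside, the mapping above is what lets a Kokkos::View be declared directly over Kokkos::Array elements, with the array extent appended as a trailing scalar dimension. The following is a minimal sketch of that usage, assuming a stock host build of Kokkos; the label "a", the extent 10, and the element type Kokkos::Array<double,3> are chosen purely for illustration and are not taken from the patch.
#include <Kokkos_Core.hpp>
int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    // Rank-1 View whose elements are fixed-size arrays of 3 doubles.
    // The ViewDataAnalysis specialization appends the array extent
    // N = 3 as a trailing dimension, so scalar_array_type behaves
    // like double*[3] over the same allocation.
    Kokkos::View< Kokkos::Array<double,3> * > a( "a" , 10 );
    // Element access returns an Array-shaped proxy into the underlying
    // scalar storage (contiguous for LayoutRight, strided otherwise),
    // not a plain Kokkos::Array value.
    auto a5 = a( 5 );
    a5[0] = 1.0 ;   // writes the first scalar of element 5
  }
  Kokkos::finalize();
  return 0 ;
}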
diff --git a/lib/kokkos/core/src/impl/KokkosExp_ViewCtor.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewCtor.hpp
similarity index 99%
rename from lib/kokkos/core/src/impl/KokkosExp_ViewCtor.hpp
rename to lib/kokkos/core/src/impl/Kokkos_ViewCtor.hpp
index 6525fed0a..6381aee46 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_ViewCtor.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_ViewCtor.hpp
@@ -1,252 +1,250 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXPERIMENTAL_IMPL_VIEW_CTOR_PROP_HPP
#define KOKKOS_EXPERIMENTAL_IMPL_VIEW_CTOR_PROP_HPP
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/* For backward compatibility */
struct ViewAllocateWithoutInitializing {
const std::string label ;
ViewAllocateWithoutInitializing() : label() {}
explicit
ViewAllocateWithoutInitializing( const std::string & arg_label ) : label( arg_label ) {}
explicit
ViewAllocateWithoutInitializing( const char * const arg_label ) : label( arg_label ) {}
};
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
struct WithoutInitializing_t {};
struct AllowPadding_t {};
struct NullSpace_t {};
//----------------------------------------------------------------------------
/**\brief Whether a type can be used for a view label */
template < typename >
struct is_view_label : public std::false_type {};
template<>
struct is_view_label< std::string > : public std::true_type {};
template< unsigned N >
struct is_view_label< char[N] > : public std::true_type {};
template< unsigned N >
struct is_view_label< const char[N] > : public std::true_type {};
//----------------------------------------------------------------------------
template< typename ... P >
struct ViewCtorProp ;
/* std::integral_constant<unsigned,I> are dummy arguments
* that avoid duplicate base class errors
*/
template< unsigned I >
struct ViewCtorProp< void , std::integral_constant<unsigned,I> >
{
ViewCtorProp() = default ;
ViewCtorProp( const ViewCtorProp & ) = default ;
ViewCtorProp & operator = ( const ViewCtorProp & ) = default ;
template< typename P >
ViewCtorProp( const P & ) {}
};
/* Property flags have constexpr value */
template< typename P >
struct ViewCtorProp
< typename std::enable_if<
std::is_same< P , AllowPadding_t >::value ||
std::is_same< P , WithoutInitializing_t >::value
>::type
, P
>
{
ViewCtorProp() = default ;
ViewCtorProp( const ViewCtorProp & ) = default ;
ViewCtorProp & operator = ( const ViewCtorProp & ) = default ;
typedef P type ;
ViewCtorProp( const type & ) {}
static constexpr type value = type();
};
/* Map input label type to std::string */
template< typename Label >
struct ViewCtorProp
< typename std::enable_if< is_view_label< Label >::value >::type
, Label
>
{
ViewCtorProp() = default ;
ViewCtorProp( const ViewCtorProp & ) = default ;
ViewCtorProp & operator = ( const ViewCtorProp & ) = default ;
typedef std::string type ;
ViewCtorProp( const type & arg ) : value( arg ) {}
ViewCtorProp( type && arg ) : value( arg ) {}
type value ;
};
template< typename Space >
struct ViewCtorProp
< typename std::enable_if<
Kokkos::Impl::is_memory_space<Space>::value ||
Kokkos::Impl::is_execution_space<Space>::value
>::type
, Space
>
{
ViewCtorProp() = default ;
ViewCtorProp( const ViewCtorProp & ) = default ;
ViewCtorProp & operator = ( const ViewCtorProp & ) = default ;
typedef Space type ;
ViewCtorProp( const type & arg ) : value( arg ) {}
type value ;
};
template< typename T >
struct ViewCtorProp < void , T * >
{
ViewCtorProp() = default ;
ViewCtorProp( const ViewCtorProp & ) = default ;
ViewCtorProp & operator = ( const ViewCtorProp & ) = default ;
typedef T * type ;
KOKKOS_INLINE_FUNCTION
ViewCtorProp( const type arg ) : value( arg ) {}
type value ;
};
template< typename ... P >
struct ViewCtorProp : public ViewCtorProp< void , P > ...
{
private:
typedef Kokkos::Impl::has_condition< void , Kokkos::Impl::is_memory_space , P ... >
var_memory_space ;
typedef Kokkos::Impl::has_condition< void , Kokkos::Impl::is_execution_space , P ... >
var_execution_space ;
struct VOIDDUMMY{};
typedef Kokkos::Impl::has_condition< VOIDDUMMY , std::is_pointer , P ... >
var_pointer ;
public:
/* Flags for the common properties */
enum { has_memory_space = var_memory_space::value };
enum { has_execution_space = var_execution_space::value };
enum { has_pointer = var_pointer::value };
enum { has_label = Kokkos::Impl::has_type< std::string , P... >::value };
enum { allow_padding = Kokkos::Impl::has_type< AllowPadding_t , P... >::value };
enum { initialize = ! Kokkos::Impl::has_type< WithoutInitializing_t , P ... >::value };
typedef typename var_memory_space::type memory_space ;
typedef typename var_execution_space::type execution_space ;
typedef typename var_pointer::type pointer_type ;
/* Copy from a matching argument list.
* Requires std::is_same< P , ViewCtorProp< void , Args > >::value ...
*/
template< typename ... Args >
inline
ViewCtorProp( Args const & ... args )
: ViewCtorProp< void , P >( args ) ...
{}
template< typename ... Args >
KOKKOS_INLINE_FUNCTION
ViewCtorProp( pointer_type arg0 , Args const & ... args )
: ViewCtorProp< void , pointer_type >( arg0 )
, ViewCtorProp< void , typename ViewCtorProp< void , Args >::type >( args ) ...
{}
/* Copy from a matching property subset */
template< typename ... Args >
ViewCtorProp( ViewCtorProp< Args ... > const & arg )
: ViewCtorProp< void , Args >( ((ViewCtorProp<void,Args> const &) arg ) ) ...
{}
};
} /* namespace Impl */
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif
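For context on how these constructor properties are consumed, here is a minimal sketch, assuming a stock host build of Kokkos, of the backward-compatible ViewAllocateWithoutInitializing wrapper defined above; the label "x", the element type double*, and the extent 100 are illustrative only. The WithoutInitializing_t flag propagates into ViewCtorProp, so its initialize property is false and no element construction happens during allocation.
#include <Kokkos_Core.hpp>
int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    // The label is carried as the std::string property of ViewCtorProp;
    // the elements of x are left uninitialized by the allocation.
    Kokkos::View<double*> x( Kokkos::ViewAllocateWithoutInitializing("x") , 100 );
    // Initialize explicitly when defined values are required.
    Kokkos::deep_copy( x , 0.0 );
  }
  Kokkos::finalize();
  return 0 ;
}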
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewDefault.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewDefault.hpp
deleted file mode 100644
index 94c8e13c1..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_ViewDefault.hpp
+++ /dev/null
@@ -1,886 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_VIEWDEFAULT_HPP
-#define KOKKOS_VIEWDEFAULT_HPP
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template<>
-struct ViewAssignment< ViewDefault , ViewDefault , void >
-{
- typedef ViewDefault Specialize ;
-
- //------------------------------------
- /** \brief Compatible value and shape and LayoutLeft/Right to LayoutStride*/
-
- template< class DT , class DL , class DD , class DM ,
- class ST , class SL , class SD , class SM >
- KOKKOS_INLINE_FUNCTION
- ViewAssignment( View<DT,DL,DD,DM,Specialize> & dst ,
- const View<ST,SL,SD,SM,Specialize> & src ,
- const typename enable_if<(
- ViewAssignable< ViewTraits<DT,DL,DD,DM> ,
- ViewTraits<ST,SL,SD,SM> >::value
- ||
- ( ViewAssignable< ViewTraits<DT,DL,DD,DM> ,
- ViewTraits<ST,SL,SD,SM> >::assignable_value
- &&
- ShapeCompatible< typename ViewTraits<DT,DL,DD,DM>::shape_type ,
- typename ViewTraits<ST,SL,SD,SM>::shape_type >::value
- &&
- is_same< typename ViewTraits<DT,DL,DD,DM>::array_layout,LayoutStride>::value
- && (is_same< typename ViewTraits<ST,SL,SD,SM>::array_layout,LayoutLeft>::value ||
- is_same< typename ViewTraits<ST,SL,SD,SM>::array_layout,LayoutRight>::value))
- )>::type * = 0 )
- {
- dst.m_offset_map.assign( src.m_offset_map );
-
- dst.m_management = src.m_management ;
-
- dst.m_ptr_on_device = ViewDataManagement< ViewTraits<DT,DL,DD,DM> >::create_handle( src.m_ptr_on_device, src.m_tracker );
-
- if( dst.is_managed )
- dst.m_tracker = src.m_tracker ;
- else {
- dst.m_tracker = AllocationTracker();
- dst.m_management.set_unmanaged();
- }
- }
-
-
- /** \brief Assign 1D Strided View to LayoutLeft or LayoutRight if stride[0]==1 */
-
- template< class DT , class DL , class DD , class DM ,
- class ST , class SD , class SM >
- KOKKOS_INLINE_FUNCTION
- ViewAssignment( View<DT,DL,DD,DM,Specialize> & dst ,
- const View<ST,LayoutStride,SD,SM,Specialize> & src ,
- const typename enable_if<(
- (
- ViewAssignable< ViewTraits<DT,DL,DD,DM> ,
- ViewTraits<ST,LayoutStride,SD,SM> >::value
- ||
- ( ViewAssignable< ViewTraits<DT,DL,DD,DM> ,
- ViewTraits<ST,LayoutStride,SD,SM> >::assignable_value
- &&
- ShapeCompatible< typename ViewTraits<DT,DL,DD,DM>::shape_type ,
- typename ViewTraits<ST,LayoutStride,SD,SM>::shape_type >::value
- )
- )
- &&
- (View<DT,DL,DD,DM,Specialize>::rank==1)
- && (is_same< typename ViewTraits<DT,DL,DD,DM>::array_layout,LayoutLeft>::value ||
- is_same< typename ViewTraits<DT,DL,DD,DM>::array_layout,LayoutRight>::value)
- )>::type * = 0 )
- {
- size_t strides[8];
- src.stride(strides);
- if(strides[0]!=1) {
- Kokkos::abort("Trying to assign strided 1D View to LayoutRight or LayoutLeft which is not stride-1");
- }
- dst.m_offset_map.assign( src.dimension_0(), 0, 0, 0, 0, 0, 0, 0, 0 );
-
- dst.m_management = src.m_management ;
-
- dst.m_ptr_on_device = ViewDataManagement< ViewTraits<DT,DL,DD,DM> >::create_handle( src.m_ptr_on_device, src.m_tracker );
-
- if( dst.is_managed )
- dst.m_tracker = src.m_tracker ;
- else {
- dst.m_tracker = AllocationTracker();
- dst.m_management.set_unmanaged();
- }
- }
-
- //------------------------------------
- /** \brief Deep copy data from compatible value type, layout, rank, and specialization.
- * Check the dimensions and allocation lengths at runtime.
- */
- template< class DT , class DL , class DD , class DM ,
- class ST , class SL , class SD , class SM >
- inline static
- void deep_copy( const View<DT,DL,DD,DM,Specialize> & dst ,
- const View<ST,SL,SD,SM,Specialize> & src ,
- const typename Impl::enable_if<(
- Impl::is_same< typename ViewTraits<DT,DL,DD,DM>::value_type ,
- typename ViewTraits<ST,SL,SD,SM>::non_const_value_type >::value
- &&
- Impl::is_same< typename ViewTraits<DT,DL,DD,DM>::array_layout ,
- typename ViewTraits<ST,SL,SD,SM>::array_layout >::value
- &&
- ( unsigned(ViewTraits<DT,DL,DD,DM>::rank) == unsigned(ViewTraits<ST,SL,SD,SM>::rank) )
- )>::type * = 0 )
- {
- typedef typename ViewTraits<DT,DL,DD,DM>::memory_space dst_memory_space ;
- typedef typename ViewTraits<ST,SL,SD,SM>::memory_space src_memory_space ;
-
- if ( dst.ptr_on_device() != src.ptr_on_device() ) {
-
- Impl::assert_shapes_are_equal( dst.m_offset_map , src.m_offset_map );
-
- const size_t nbytes = dst.m_offset_map.scalar_size * dst.m_offset_map.capacity();
-
- DeepCopy< dst_memory_space , src_memory_space >( dst.ptr_on_device() , src.ptr_on_device() , nbytes );
- }
- }
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template< class ExecSpace , class DT , class DL, class DD, class DM, class DS >
-struct ViewDefaultConstruct< ExecSpace , Kokkos::View<DT,DL,DD,DM,DS> , true >
-{
- Kokkos::View<DT,DL,DD,DM,DS> * const m_ptr ;
-
- KOKKOS_FORCEINLINE_FUNCTION
- void operator()( const typename ExecSpace::size_type& i ) const
- { new(m_ptr+i) Kokkos::View<DT,DL,DD,DM,DS>(); }
-
- ViewDefaultConstruct( Kokkos::View<DT,DL,DD,DM,DS> * pointer , size_t capacity )
- : m_ptr( pointer )
- {
- Kokkos::RangePolicy< ExecSpace > range( 0 , capacity );
- parallel_for( range , *this );
- ExecSpace::fence();
- }
-};
-
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- , class SubArg4_type , class SubArg5_type , class SubArg6_type , class SubArg7_type
- >
-struct ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , SubArg4_type , SubArg5_type , SubArg6_type , SubArg7_type >
-{
-private:
-
- typedef View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > SrcViewType ;
-
- enum { V0 = Impl::is_same< SubArg0_type , void >::value ? 1 : 0 };
- enum { V1 = Impl::is_same< SubArg1_type , void >::value ? 1 : 0 };
- enum { V2 = Impl::is_same< SubArg2_type , void >::value ? 1 : 0 };
- enum { V3 = Impl::is_same< SubArg3_type , void >::value ? 1 : 0 };
- enum { V4 = Impl::is_same< SubArg4_type , void >::value ? 1 : 0 };
- enum { V5 = Impl::is_same< SubArg5_type , void >::value ? 1 : 0 };
- enum { V6 = Impl::is_same< SubArg6_type , void >::value ? 1 : 0 };
- enum { V7 = Impl::is_same< SubArg7_type , void >::value ? 1 : 0 };
-
- // The source view rank must be equal to the input argument rank
- // Once a void argument is encountered all subsequent arguments must be void.
- enum { InputRank =
- Impl::StaticAssert<( SrcViewType::rank ==
- ( V0 ? 0 : (
- V1 ? 1 : (
- V2 ? 2 : (
- V3 ? 3 : (
- V4 ? 4 : (
- V5 ? 5 : (
- V6 ? 6 : (
- V7 ? 7 : 8 ))))))) ))
- &&
- ( SrcViewType::rank ==
- ( 8 - ( V0 + V1 + V2 + V3 + V4 + V5 + V6 + V7 ) ) )
- >::value ? SrcViewType::rank : 0 };
-
- enum { R0 = Impl::ViewOffsetRange< SubArg0_type >::is_range ? 1 : 0 };
- enum { R1 = Impl::ViewOffsetRange< SubArg1_type >::is_range ? 1 : 0 };
- enum { R2 = Impl::ViewOffsetRange< SubArg2_type >::is_range ? 1 : 0 };
- enum { R3 = Impl::ViewOffsetRange< SubArg3_type >::is_range ? 1 : 0 };
- enum { R4 = Impl::ViewOffsetRange< SubArg4_type >::is_range ? 1 : 0 };
- enum { R5 = Impl::ViewOffsetRange< SubArg5_type >::is_range ? 1 : 0 };
- enum { R6 = Impl::ViewOffsetRange< SubArg6_type >::is_range ? 1 : 0 };
- enum { R7 = Impl::ViewOffsetRange< SubArg7_type >::is_range ? 1 : 0 };
-
- enum { OutputRank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
- + unsigned(R4) + unsigned(R5) + unsigned(R6) + unsigned(R7) };
-
- // Reverse
- enum { R0_rev = 0 == InputRank ? 0u : (
- 1 == InputRank ? unsigned(R0) : (
- 2 == InputRank ? unsigned(R1) : (
- 3 == InputRank ? unsigned(R2) : (
- 4 == InputRank ? unsigned(R3) : (
- 5 == InputRank ? unsigned(R4) : (
- 6 == InputRank ? unsigned(R5) : (
- 7 == InputRank ? unsigned(R6) : unsigned(R7) ))))))) };
-
- typedef typename SrcViewType::array_layout SrcViewLayout ;
-
- // Choose array layout, attempting to preserve original layout if at all possible.
- typedef typename Impl::if_c<
- ( // Same Layout IF
- // OutputRank 0
- ( OutputRank == 0 )
- ||
- // OutputRank 1 or 2, InputLayout Left, Interval 0
- // because single stride one or second index has a stride.
- ( OutputRank <= 2 && R0 && Impl::is_same<SrcViewLayout,LayoutLeft>::value )
- ||
- // OutputRank 1 or 2, InputLayout Right, Interval [InputRank-1]
- // because single stride one or second index has a stride.
- ( OutputRank <= 2 && R0_rev && Impl::is_same<SrcViewLayout,LayoutRight>::value )
- ), SrcViewLayout , Kokkos::LayoutStride >::type OutputViewLayout ;
-
- // Choose data type as a purely dynamic rank array to accomodate a runtime range.
- typedef typename Impl::if_c< OutputRank == 0 , typename SrcViewType::value_type ,
- typename Impl::if_c< OutputRank == 1 , typename SrcViewType::value_type *,
- typename Impl::if_c< OutputRank == 2 , typename SrcViewType::value_type **,
- typename Impl::if_c< OutputRank == 3 , typename SrcViewType::value_type ***,
- typename Impl::if_c< OutputRank == 4 , typename SrcViewType::value_type ****,
- typename Impl::if_c< OutputRank == 5 , typename SrcViewType::value_type *****,
- typename Impl::if_c< OutputRank == 6 , typename SrcViewType::value_type ******,
- typename Impl::if_c< OutputRank == 7 , typename SrcViewType::value_type *******,
- typename SrcViewType::value_type ********
- >::type >::type >::type >::type >::type >::type >::type >::type OutputData ;
-
- // Choose space.
- // If the source view's template arg1 or arg2 is a space then use it,
- // otherwise use the source view's execution space.
-
- typedef typename Impl::if_c< Impl::is_space< SrcArg1Type >::value , SrcArg1Type ,
- typename Impl::if_c< Impl::is_space< SrcArg2Type >::value , SrcArg2Type , typename SrcViewType::device_type
- >::type >::type OutputSpace ;
-
-public:
-
- // If keeping the layout then match non-data type arguments
- // else keep execution space and memory traits.
- typedef typename
- Impl::if_c< Impl::is_same< SrcViewLayout , OutputViewLayout >::value
- , Kokkos::View< OutputData , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , Kokkos::View< OutputData , OutputViewLayout , OutputSpace
- , typename SrcViewType::memory_traits
- , Impl::ViewDefault >
- >::type type ;
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-// Construct subview of a Rank 8 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- , class SubArg4_type , class SubArg5_type , class SubArg6_type , class SubArg7_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- , const SubArg2_type & arg2
- , const SubArg3_type & arg3
- , const SubArg4_type & arg4
- , const SubArg5_type & arg5
- , const SubArg6_type & arg6
- , const SubArg7_type & arg7
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , SubArg4_type , SubArg5_type , SubArg6_type , SubArg7_type >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
- typedef Impl::ViewOffsetRange< SubArg2_type > R2 ;
- typedef Impl::ViewOffsetRange< SubArg3_type > R3 ;
- typedef Impl::ViewOffsetRange< SubArg4_type > R4 ;
- typedef Impl::ViewOffsetRange< SubArg5_type > R5 ;
- typedef Impl::ViewOffsetRange< SubArg6_type > R6 ;
- typedef Impl::ViewOffsetRange< SubArg7_type > R7 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , R2::dimension( src.m_offset_map.N2 , arg2 )
- , R3::dimension( src.m_offset_map.N3 , arg3 )
- , R4::dimension( src.m_offset_map.N4 , arg4 )
- , R5::dimension( src.m_offset_map.N5 , arg5 )
- , R6::dimension( src.m_offset_map.N6 , arg6 )
- , R7::dimension( src.m_offset_map.N7 , arg7 )
- );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- , R2::begin( arg2 )
- , R3::begin( arg3 )
- , R4::begin( arg4 )
- , R5::begin( arg5 )
- , R6::begin( arg6 )
- , R7::begin( arg7 ) );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 7 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- , class SubArg4_type , class SubArg5_type , class SubArg6_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- , const SubArg2_type & arg2
- , const SubArg3_type & arg3
- , const SubArg4_type & arg4
- , const SubArg5_type & arg5
- , const SubArg6_type & arg6
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , SubArg4_type , SubArg5_type , SubArg6_type , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
- typedef Impl::ViewOffsetRange< SubArg2_type > R2 ;
- typedef Impl::ViewOffsetRange< SubArg3_type > R3 ;
- typedef Impl::ViewOffsetRange< SubArg4_type > R4 ;
- typedef Impl::ViewOffsetRange< SubArg5_type > R5 ;
- typedef Impl::ViewOffsetRange< SubArg6_type > R6 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , R2::dimension( src.m_offset_map.N2 , arg2 )
- , R3::dimension( src.m_offset_map.N3 , arg3 )
- , R4::dimension( src.m_offset_map.N4 , arg4 )
- , R5::dimension( src.m_offset_map.N5 , arg5 )
- , R6::dimension( src.m_offset_map.N6 , arg6 )
- , 0
- );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- , R2::begin( arg2 )
- , R3::begin( arg3 )
- , R4::begin( arg4 )
- , R5::begin( arg5 )
- , R6::begin( arg6 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 6 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- , class SubArg4_type , class SubArg5_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- , const SubArg2_type & arg2
- , const SubArg3_type & arg3
- , const SubArg4_type & arg4
- , const SubArg5_type & arg5
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , SubArg4_type , SubArg5_type , void , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
- typedef Impl::ViewOffsetRange< SubArg2_type > R2 ;
- typedef Impl::ViewOffsetRange< SubArg3_type > R3 ;
- typedef Impl::ViewOffsetRange< SubArg4_type > R4 ;
- typedef Impl::ViewOffsetRange< SubArg5_type > R5 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , R2::dimension( src.m_offset_map.N2 , arg2 )
- , R3::dimension( src.m_offset_map.N3 , arg3 )
- , R4::dimension( src.m_offset_map.N4 , arg4 )
- , R5::dimension( src.m_offset_map.N5 , arg5 )
- , 0
- , 0
- );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- , R2::begin( arg2 )
- , R3::begin( arg3 )
- , R4::begin( arg4 )
- , R5::begin( arg5 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 5 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- , class SubArg4_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- , const SubArg2_type & arg2
- , const SubArg3_type & arg3
- , const SubArg4_type & arg4
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , SubArg4_type , void , void , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
- typedef Impl::ViewOffsetRange< SubArg2_type > R2 ;
- typedef Impl::ViewOffsetRange< SubArg3_type > R3 ;
- typedef Impl::ViewOffsetRange< SubArg4_type > R4 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , R2::dimension( src.m_offset_map.N2 , arg2 )
- , R3::dimension( src.m_offset_map.N3 , arg3 )
- , R4::dimension( src.m_offset_map.N4 , arg4 )
- , 0
- , 0
- , 0
- );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- , R2::begin( arg2 )
- , R3::begin( arg3 )
- , R4::begin( arg4 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 4 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- , const SubArg2_type & arg2
- , const SubArg3_type & arg3
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
- , void , void , void , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
- typedef Impl::ViewOffsetRange< SubArg2_type > R2 ;
- typedef Impl::ViewOffsetRange< SubArg3_type > R3 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , R2::dimension( src.m_offset_map.N2 , arg2 )
- , R3::dimension( src.m_offset_map.N3 , arg3 )
- , 0
- , 0
- , 0
- , 0
- );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- , R2::begin( arg2 )
- , R3::begin( arg3 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 3 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type , class SubArg2_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- , const SubArg2_type & arg2
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , SubArg2_type , void , void , void , void , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
- typedef Impl::ViewOffsetRange< SubArg2_type > R2 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , R2::dimension( src.m_offset_map.N2 , arg2 )
- , 0 , 0 , 0 , 0 , 0);
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- , R2::begin( arg2 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 2 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type , class SubArg1_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- , const SubArg1_type & arg1
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , SubArg1_type , void , void , void , void , void , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
- typedef Impl::ViewOffsetRange< SubArg1_type > R1 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , R1::dimension( src.m_offset_map.N1 , arg1 )
- , 0 , 0 , 0 , 0 , 0 , 0 );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- , R1::begin( arg1 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-// Construct subview of a Rank 1 view
-template< class DstDataType , class DstArg1Type , class DstArg2Type , class DstArg3Type >
-template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
- , class SubArg0_type
- >
-KOKKOS_INLINE_FUNCTION
-View< DstDataType , DstArg1Type , DstArg2Type , DstArg3Type , Impl::ViewDefault >::
-View( const View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault > & src
- , const SubArg0_type & arg0
- )
- : m_ptr_on_device( (typename traits::value_type*) NULL)
- , m_offset_map()
- , m_management()
- , m_tracker()
-{
- // This constructor can only be used to construct a subview
- // from the source view. This type must match the subview type
- // deduced from the source view and subview arguments.
-
- typedef Impl::ViewSubview< View< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type , Impl::ViewDefault >
- , SubArg0_type , void , void , void , void , void , void , void >
- ViewSubviewDeduction ;
-
- enum { is_a_valid_subview_constructor =
- Impl::StaticAssert<
- Impl::is_same< View , typename ViewSubviewDeduction::type >::value
- >::value
- };
-
- if ( is_a_valid_subview_constructor ) {
-
- typedef Impl::ViewOffsetRange< SubArg0_type > R0 ;
-
- // 'assign_subview' returns whether the subview offset_map
- // introduces noncontiguity in the view.
- const bool introduce_noncontiguity =
- m_offset_map.assign_subview( src.m_offset_map
- , R0::dimension( src.m_offset_map.N0 , arg0 )
- , 0 , 0 , 0 , 0 , 0 , 0 , 0 );
-
- if ( m_offset_map.capacity() ) {
-
- m_management = src.m_management ;
-
- if ( introduce_noncontiguity ) m_management.set_noncontiguous();
-
- m_ptr_on_device = src.m_ptr_on_device +
- src.m_offset_map( R0::begin( arg0 )
- );
- m_tracker = src.m_tracker ;
- }
- }
-}
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #ifndef KOKKOS_VIEWDEFAULT_HPP */
-
diff --git a/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewMapping.hpp
similarity index 89%
copy from lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp
copy to lib/kokkos/core/src/impl/Kokkos_ViewMapping.hpp
index ed56536cd..588166c18 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_ViewMapping.hpp
@@ -1,2932 +1,3156 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXPERIMENTAL_VIEW_MAPPING_HPP
#define KOKKOS_EXPERIMENTAL_VIEW_MAPPING_HPP
#include <type_traits>
#include <initializer_list>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Pair.hpp>
#include <Kokkos_Layout.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Traits.hpp>
-#include <impl/KokkosExp_ViewCtor.hpp>
+#include <impl/Kokkos_ViewCtor.hpp>
#include <impl/Kokkos_Atomic_View.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
template< unsigned I , size_t ... Args >
struct variadic_size_t
{ enum { value = ~size_t(0) }; };
template< size_t Val , size_t ... Args >
struct variadic_size_t< 0 , Val , Args ... >
{ enum { value = Val }; };
template< unsigned I , size_t Val , size_t ... Args >
struct variadic_size_t< I , Val , Args ... >
{ enum { value = variadic_size_t< I - 1 , Args ... >::value }; };
template< size_t ... Args >
struct rank_dynamic ;
template<>
struct rank_dynamic<> { enum { value = 0 }; };
template< size_t Val , size_t ... Args >
struct rank_dynamic< Val , Args... >
{
enum { value = ( Val == 0 ? 1 : 0 ) + rank_dynamic< Args... >::value };
};
#define KOKKOS_IMPL_VIEW_DIMENSION( R ) \
template< size_t V , unsigned > struct ViewDimension ## R \
{ \
enum { ArgN ## R = ( V != ~size_t(0) ? V : 1 ) }; \
enum { N ## R = ( V != ~size_t(0) ? V : 1 ) }; \
KOKKOS_INLINE_FUNCTION explicit ViewDimension ## R ( size_t ) {} \
ViewDimension ## R () = default ; \
ViewDimension ## R ( const ViewDimension ## R & ) = default ; \
ViewDimension ## R & operator = ( const ViewDimension ## R & ) = default ; \
}; \
template< unsigned RD > struct ViewDimension ## R < 0 , RD > \
{ \
enum { ArgN ## R = 0 }; \
typename std::conditional<( RD < 3 ), size_t , unsigned >::type N ## R ; \
ViewDimension ## R () = default ; \
ViewDimension ## R ( const ViewDimension ## R & ) = default ; \
ViewDimension ## R & operator = ( const ViewDimension ## R & ) = default ; \
KOKKOS_INLINE_FUNCTION explicit ViewDimension ## R ( size_t V ) : N ## R ( V ) {} \
};
KOKKOS_IMPL_VIEW_DIMENSION( 0 )
KOKKOS_IMPL_VIEW_DIMENSION( 1 )
KOKKOS_IMPL_VIEW_DIMENSION( 2 )
KOKKOS_IMPL_VIEW_DIMENSION( 3 )
KOKKOS_IMPL_VIEW_DIMENSION( 4 )
KOKKOS_IMPL_VIEW_DIMENSION( 5 )
KOKKOS_IMPL_VIEW_DIMENSION( 6 )
KOKKOS_IMPL_VIEW_DIMENSION( 7 )
#undef KOKKOS_IMPL_VIEW_DIMENSION
template< size_t ... Vals >
struct ViewDimension
: public ViewDimension0< variadic_size_t<0,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension1< variadic_size_t<1,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension2< variadic_size_t<2,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension3< variadic_size_t<3,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension4< variadic_size_t<4,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension5< variadic_size_t<5,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension6< variadic_size_t<6,Vals...>::value
, rank_dynamic< Vals... >::value >
, public ViewDimension7< variadic_size_t<7,Vals...>::value
, rank_dynamic< Vals... >::value >
{
typedef ViewDimension0< variadic_size_t<0,Vals...>::value
, rank_dynamic< Vals... >::value > D0 ;
typedef ViewDimension1< variadic_size_t<1,Vals...>::value
, rank_dynamic< Vals... >::value > D1 ;
typedef ViewDimension2< variadic_size_t<2,Vals...>::value
, rank_dynamic< Vals... >::value > D2 ;
typedef ViewDimension3< variadic_size_t<3,Vals...>::value
, rank_dynamic< Vals... >::value > D3 ;
typedef ViewDimension4< variadic_size_t<4,Vals...>::value
, rank_dynamic< Vals... >::value > D4 ;
typedef ViewDimension5< variadic_size_t<5,Vals...>::value
, rank_dynamic< Vals... >::value > D5 ;
typedef ViewDimension6< variadic_size_t<6,Vals...>::value
, rank_dynamic< Vals... >::value > D6 ;
typedef ViewDimension7< variadic_size_t<7,Vals...>::value
, rank_dynamic< Vals... >::value > D7 ;
using D0::ArgN0 ;
using D1::ArgN1 ;
using D2::ArgN2 ;
using D3::ArgN3 ;
using D4::ArgN4 ;
using D5::ArgN5 ;
using D6::ArgN6 ;
using D7::ArgN7 ;
using D0::N0 ;
using D1::N1 ;
using D2::N2 ;
using D3::N3 ;
using D4::N4 ;
using D5::N5 ;
using D6::N6 ;
using D7::N7 ;
enum { rank = sizeof...(Vals) };
enum { rank_dynamic = Impl::rank_dynamic< Vals... >::value };
ViewDimension() = default ;
ViewDimension( const ViewDimension & ) = default ;
ViewDimension & operator = ( const ViewDimension & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr
ViewDimension( size_t n0 , size_t n1 , size_t n2 , size_t n3
, size_t n4 , size_t n5 , size_t n6 , size_t n7 )
: D0( n0 )
, D1( n1 )
, D2( n2 )
, D3( n3 )
, D4( n4 )
, D5( n5 )
, D6( n6 )
, D7( n7 )
{}
KOKKOS_INLINE_FUNCTION
constexpr size_t extent( const unsigned r ) const
{
return r == 0 ? N0 : (
r == 1 ? N1 : (
r == 2 ? N2 : (
r == 3 ? N3 : (
r == 4 ? N4 : (
r == 5 ? N5 : (
r == 6 ? N6 : (
r == 7 ? N7 : 0 )))))));
}
template< size_t N >
struct prepend { typedef ViewDimension< N , Vals... > type ; };
template< size_t N >
struct append { typedef ViewDimension< Vals... , N > type ; };
};
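// [Editor's illustration - not part of the Kokkos source.] ViewDimension encodes
// static extents in the type and stores only the dynamic ones as data members;
// a '0' template value marks a run-time extent. A minimal sketch, assuming the
// names defined above in this Impl namespace:
//
//   typedef ViewDimension< 0 , 0 , 3 > Dim ;             // rank 3, two dynamic extents
//   static_assert( Dim::rank == 3 && Dim::rank_dynamic == 2 , "" );
//   Dim d( 10 , 20 , 3 , 1 , 1 , 1 , 1 , 1 );            // unused trailing extents stay 1
//   // d.extent(0) == 10 , d.extent(1) == 20 , d.extent(2) == 3 (fixed at compile time)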
template< class A , class B >
struct ViewDimensionJoin ;
template< size_t ... A , size_t ... B >
struct ViewDimensionJoin< ViewDimension< A... > , ViewDimension< B... > > {
typedef ViewDimension< A... , B... > type ;
};
//----------------------------------------------------------------------------
template< class DstDim , class SrcDim >
struct ViewDimensionAssignable ;
template< size_t ... DstArgs , size_t ... SrcArgs >
struct ViewDimensionAssignable< ViewDimension< DstArgs ... >
, ViewDimension< SrcArgs ... > >
{
typedef ViewDimension< DstArgs... > dst ;
typedef ViewDimension< SrcArgs... > src ;
enum { value =
unsigned(dst::rank) == unsigned(src::rank) && (
//Compile time check that potential static dimensions match
( ( 1 > dst::rank_dynamic && 1 > src::rank_dynamic ) ? (size_t(dst::ArgN0) == size_t(src::ArgN0)) : true ) &&
( ( 2 > dst::rank_dynamic && 2 > src::rank_dynamic ) ? (size_t(dst::ArgN1) == size_t(src::ArgN1)) : true ) &&
( ( 3 > dst::rank_dynamic && 3 > src::rank_dynamic ) ? (size_t(dst::ArgN2) == size_t(src::ArgN2)) : true ) &&
( ( 4 > dst::rank_dynamic && 4 > src::rank_dynamic ) ? (size_t(dst::ArgN3) == size_t(src::ArgN3)) : true ) &&
( ( 5 > dst::rank_dynamic && 5 > src::rank_dynamic ) ? (size_t(dst::ArgN4) == size_t(src::ArgN4)) : true ) &&
( ( 6 > dst::rank_dynamic && 6 > src::rank_dynamic ) ? (size_t(dst::ArgN5) == size_t(src::ArgN5)) : true ) &&
( ( 7 > dst::rank_dynamic && 7 > src::rank_dynamic ) ? (size_t(dst::ArgN6) == size_t(src::ArgN6)) : true ) &&
( ( 8 > dst::rank_dynamic && 8 > src::rank_dynamic ) ? (size_t(dst::ArgN7) == size_t(src::ArgN7)) : true )
)};
};
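// [Editor's illustration - not part of the Kokkos source.] Assignability requires
// equal rank and, wherever both sides have a static extent, equal extents.
// A sketch using the names above:
//
//   static_assert(   ViewDimensionAssignable< ViewDimension<0,0> , ViewDimension<0,3> >::value , "" );
//   static_assert( ! ViewDimensionAssignable< ViewDimension<0,4> , ViewDimension<0,3> >::value , "" );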
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
struct ALL_t {
KOKKOS_INLINE_FUNCTION
constexpr const ALL_t & operator()() const { return *this ; }
};
+}} // namespace Kokkos::Impl
+
+namespace Kokkos {
+namespace Experimental {
+namespace Impl {
+
+using Kokkos::Impl::ALL_t ;
+
template< class T >
struct is_integral_extent_type
{ enum { value = std::is_same<T,Kokkos::Experimental::Impl::ALL_t>::value ? 1 : 0 }; };
template< class iType >
struct is_integral_extent_type< std::pair<iType,iType> >
{ enum { value = std::is_integral<iType>::value ? 1 : 0 }; };
template< class iType >
struct is_integral_extent_type< Kokkos::pair<iType,iType> >
{ enum { value = std::is_integral<iType>::value ? 1 : 0 }; };
// Assuming '2 == initializer_list<iType>::size()'
template< class iType >
struct is_integral_extent_type< std::initializer_list<iType> >
{ enum { value = std::is_integral<iType>::value ? 1 : 0 }; };
template < unsigned I , class ... Args >
struct is_integral_extent
{
// get_type is void when sizeof...(Args) <= I
typedef typename std::remove_cv<
typename std::remove_reference<
typename Kokkos::Impl::get_type<I,Args...
>::type >::type >::type type ;
enum { value = is_integral_extent_type<type>::value };
static_assert( value ||
std::is_integral<type>::value ||
std::is_same<type,void>::value
, "subview argument must be either integral or integral extent" );
};
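// [Editor's illustration - not part of the Kokkos source.] An argument is an
// "integral extent" (it keeps a dimension in the subview) when it is ALL_t, a
// pair, or a two-entry initializer_list; a plain integer collapses that
// dimension. A sketch using the names above:
//
//   static_assert(   is_integral_extent_type< ALL_t >::value , "" );
//   static_assert(   is_integral_extent_type< Kokkos::pair<int,int> >::value , "" );
//   static_assert( ! is_integral_extent_type< int >::value , "" );
//   // is_integral_extent< 1 , int , Kokkos::pair<int,int> >::value == 1  (second arg is a range)
//   // is_integral_extent< 0 , int , Kokkos::pair<int,int> >::value == 0  (first arg is an index)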
+// Rules for subview arguments and layouts matching
+
+template<class LayoutDest, class LayoutSrc, int RankDest, int RankSrc, int CurrentArg, class ... SubViewArgs>
+struct SubviewLegalArgsCompileTime;
+
+// Rules which allow LayoutLeft to LayoutLeft assignment
+
+template<int RankDest, int RankSrc, int CurrentArg, class Arg, class ... SubViewArgs>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutLeft, Kokkos::LayoutLeft, RankDest, RankSrc, CurrentArg, Arg, SubViewArgs...> {
+ enum { value =(((CurrentArg==RankDest-1) && (Kokkos::Experimental::Impl::is_integral_extent_type<Arg>::value)) ||
+ ((CurrentArg>=RankDest) && (std::is_integral<Arg>::value)) ||
+ ((CurrentArg<RankDest) && (std::is_same<Arg,Kokkos::Impl::ALL_t>::value)) ||
+ ((CurrentArg==0) && (Kokkos::Experimental::Impl::is_integral_extent_type<Arg>::value))
+ ) && (SubviewLegalArgsCompileTime<Kokkos::LayoutLeft, Kokkos::LayoutLeft, RankDest, RankSrc, CurrentArg+1, SubViewArgs...>::value)};
+};
+
+template<int RankDest, int RankSrc, int CurrentArg, class Arg>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutLeft, Kokkos::LayoutLeft, RankDest, RankSrc, CurrentArg, Arg> {
+ enum { value = ((CurrentArg==RankDest-1) || (std::is_integral<Arg>::value)) &&
+ (CurrentArg==RankSrc-1) };
+};
+
+// Rules which allow LayoutRight to LayoutRight assignment
+
+template<int RankDest, int RankSrc, int CurrentArg, class Arg, class ... SubViewArgs>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutRight, Kokkos::LayoutRight, RankDest, RankSrc, CurrentArg, Arg, SubViewArgs...> {
+ enum { value =(((CurrentArg==RankSrc-RankDest) && (Kokkos::Experimental::Impl::is_integral_extent_type<Arg>::value)) ||
+ ((CurrentArg<RankSrc-RankDest) && (std::is_integral<Arg>::value)) ||
+ ((CurrentArg>=RankSrc-RankDest) && (std::is_same<Arg,Kokkos::Impl::ALL_t>::value))
+ ) && (SubviewLegalArgsCompileTime<Kokkos::LayoutRight, Kokkos::LayoutRight, RankDest, RankSrc, CurrentArg+1, SubViewArgs...>::value)};
+};
+
+template<int RankDest, int RankSrc, int CurrentArg, class Arg>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutRight, Kokkos::LayoutRight, RankDest, RankSrc, CurrentArg, Arg> {
+ enum { value = ((CurrentArg==RankSrc-1) && (std::is_same<Arg,Kokkos::Impl::ALL_t>::value)) };
+};
+
+// Rules which allow assignment to LayoutStride
+
+template<int RankDest, int RankSrc, int CurrentArg, class ... SubViewArgs>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutStride,Kokkos::LayoutLeft,RankDest,RankSrc,CurrentArg,SubViewArgs...> {
+ enum { value = true };
+};
+
+template<int RankDest, int RankSrc, int CurrentArg, class ... SubViewArgs>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutStride,Kokkos::LayoutRight,RankDest,RankSrc,CurrentArg,SubViewArgs...> {
+ enum { value = true };
+};
+
+template<int RankDest, int RankSrc, int CurrentArg, class ... SubViewArgs>
+struct SubviewLegalArgsCompileTime<Kokkos::LayoutStride,Kokkos::LayoutStride,RankDest,RankSrc,CurrentArg,SubViewArgs...> {
+ enum { value = true };
+};
+
+
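// [Editor's illustration - not part of the Kokkos source.] These rules decide at
// compile time whether a subview argument pattern can keep the source layout:
// for LayoutLeft the kept (range) arguments must be the leading ones, for
// LayoutRight the trailing ones; any other pattern falls back to LayoutStride.
// A sketch using the names above:
//
//   // subview( rank-3 LayoutLeft , ALL , ALL , 5 ) may stay LayoutLeft (rank 2):
//   static_assert( SubviewLegalArgsCompileTime<
//       Kokkos::LayoutLeft , Kokkos::LayoutLeft , 2 , 3 , 0 ,
//       ALL_t , ALL_t , int >::value , "" );
//   // subview( rank-3 LayoutLeft , 5 , ALL , ALL ) may not:
//   static_assert( ! SubviewLegalArgsCompileTime<
//       Kokkos::LayoutLeft , Kokkos::LayoutLeft , 2 , 3 , 0 ,
//       int , ALL_t , ALL_t >::value , "" );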
template< unsigned DomainRank , unsigned RangeRank >
struct SubviewExtents {
private:
// Cannot declare zero-length arrays
enum { InternalRangeRank = RangeRank ? RangeRank : 1u };
size_t m_begin[ DomainRank ];
size_t m_length[ InternalRangeRank ];
unsigned m_index[ InternalRangeRank ];
template< size_t ... DimArgs >
KOKKOS_FORCEINLINE_FUNCTION
bool set( unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim )
{ return true ; }
template< class T , size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
bool set( unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const T & val
, Args ... args )
{
const size_t v = static_cast<size_t>(val);
m_begin[ domain_rank ] = v ;
return set( domain_rank + 1 , range_rank , dim , args... )
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
&& ( v < dim.extent( domain_rank ) )
#endif
;
}
// ALL_t
template< size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
bool set( unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const Kokkos::Experimental::Impl::ALL_t
, Args ... args )
{
m_begin[ domain_rank ] = 0 ;
m_length[ range_rank ] = dim.extent( domain_rank );
m_index[ range_rank ] = domain_rank ;
return set( domain_rank + 1 , range_rank + 1 , dim , args... );
}
// std::pair range
template< class T , size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
bool set( unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const std::pair<T,T> & val
, Args ... args )
{
const size_t b = static_cast<size_t>( val.first );
const size_t e = static_cast<size_t>( val.second );
m_begin[ domain_rank ] = b ;
m_length[ range_rank ] = e - b ;
m_index[ range_rank ] = domain_rank ;
return set( domain_rank + 1 , range_rank + 1 , dim , args... )
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
&& ( e <= b + dim.extent( domain_rank ) )
#endif
;
}
// Kokkos::pair range
template< class T , size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
bool set( unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const Kokkos::pair<T,T> & val
, Args ... args )
{
const size_t b = static_cast<size_t>( val.first );
const size_t e = static_cast<size_t>( val.second );
m_begin[ domain_rank ] = b ;
m_length[ range_rank ] = e - b ;
m_index[ range_rank ] = domain_rank ;
return set( domain_rank + 1 , range_rank + 1 , dim , args... )
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
&& ( e <= b + dim.extent( domain_rank ) )
#endif
;
}
// { begin , end } range
template< class T , size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
bool set( unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const std::initializer_list< T > & val
, Args ... args )
{
const size_t b = static_cast<size_t>( val.begin()[0] );
const size_t e = static_cast<size_t>( val.begin()[1] );
m_begin[ domain_rank ] = b ;
m_length[ range_rank ] = e - b ;
m_index[ range_rank ] = domain_rank ;
return set( domain_rank + 1 , range_rank + 1 , dim , args... )
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
&& ( val.size() == 2 )
&& ( e <= b + dim.extent( domain_rank ) )
#endif
;
}
//------------------------------
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
template< size_t ... DimArgs >
void error( char *
, int
, unsigned
, unsigned
, const ViewDimension< DimArgs ... > & ) const
{}
template< class T , size_t ... DimArgs , class ... Args >
void error( char * buf , int buf_len
, unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const T & val
, Args ... args ) const
{
const int n = std::min( buf_len ,
snprintf( buf , buf_len
, " %lu < %lu %c"
, static_cast<unsigned long>(val)
, static_cast<unsigned long>( dim.extent( domain_rank ) )
, int( sizeof...(Args) ? ',' : ')' ) ) );
error( buf+n, buf_len-n, domain_rank + 1 , range_rank , dim , args... );
}
// ALL_t
template< size_t ... DimArgs , class ... Args >
void error( char * buf , int buf_len
, unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const Kokkos::Experimental::Impl::ALL_t
, Args ... args ) const
{
const int n = std::min( buf_len ,
snprintf( buf , buf_len
, " Kokkos::ALL %c"
, int( sizeof...(Args) ? ',' : ')' ) ) );
error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
}
// std::pair range
template< class T , size_t ... DimArgs , class ... Args >
void error( char * buf , int buf_len
, unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const std::pair<T,T> & val
, Args ... args ) const
{
// d <= e - b
const int n = std::min( buf_len ,
snprintf( buf , buf_len
, " %lu <= %lu - %lu %c"
, static_cast<unsigned long>( dim.extent( domain_rank ) )
, static_cast<unsigned long>( val.second )
, static_cast<unsigned long>( val.first )
, int( sizeof...(Args) ? ',' : ')' ) ) );
error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
}
// Kokkos::pair range
template< class T , size_t ... DimArgs , class ... Args >
void error( char * buf , int buf_len
, unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const Kokkos::pair<T,T> & val
, Args ... args ) const
{
// d <= e - b
const int n = std::min( buf_len ,
snprintf( buf , buf_len
, " %lu <= %lu - %lu %c"
, static_cast<unsigned long>( dim.extent( domain_rank ) )
, static_cast<unsigned long>( val.second )
, static_cast<unsigned long>( val.first )
, int( sizeof...(Args) ? ',' : ')' ) ) );
error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
}
// { begin , end } range
template< class T , size_t ... DimArgs , class ... Args >
void error( char * buf , int buf_len
, unsigned domain_rank
, unsigned range_rank
, const ViewDimension< DimArgs ... > & dim
, const std::initializer_list< T > & val
, Args ... args ) const
{
// d <= e - b
int n = 0 ;
if ( val.size() == 2 ) {
n = std::min( buf_len ,
snprintf( buf , buf_len
, " %lu <= %lu - %lu %c"
, static_cast<unsigned long>( dim.extent( domain_rank ) )
, static_cast<unsigned long>( val.begin()[1] )
, static_cast<unsigned long>( val.begin()[0] )
, int( sizeof...(Args) ? ',' : ')' ) ) );
}
else {
n = std::min( buf_len ,
snprintf( buf , buf_len
, " { ... }.size() == %u %c"
, unsigned(val.size())
, int( sizeof...(Args) ? ',' : ')' ) ) );
}
error( buf+n , buf_len-n , domain_rank + 1 , range_rank + 1 , dim , args... );
}
template< size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
void error( const ViewDimension< DimArgs ... > & dim , Args ... args ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_SPACE_HOST )
enum { LEN = 1024 };
char buffer[ LEN ];
const int n = snprintf(buffer,LEN,"Kokkos::subview bounds error (");
error( buffer+n , LEN-n , 0 , 0 , dim , args... );
Kokkos::Impl::throw_runtime_exception(std::string(buffer));
#else
Kokkos::abort("Kokkos::subview bounds error");
#endif
}
#else
template< size_t ... DimArgs , class ... Args >
KOKKOS_FORCEINLINE_FUNCTION
void error( const ViewDimension< DimArgs ... > & , Args ... ) const {}
#endif
public:
template< size_t ... DimArgs , class ... Args >
KOKKOS_INLINE_FUNCTION
SubviewExtents( const ViewDimension< DimArgs ... > & dim , Args ... args )
{
static_assert( DomainRank == sizeof...(DimArgs) , "" );
static_assert( DomainRank == sizeof...(Args) , "" );
// Verifies that all arguments, up to 8, are integral types,
// integral extents, or don't exist.
static_assert( RangeRank ==
unsigned( is_integral_extent<0,Args...>::value ) +
unsigned( is_integral_extent<1,Args...>::value ) +
unsigned( is_integral_extent<2,Args...>::value ) +
unsigned( is_integral_extent<3,Args...>::value ) +
unsigned( is_integral_extent<4,Args...>::value ) +
unsigned( is_integral_extent<5,Args...>::value ) +
unsigned( is_integral_extent<6,Args...>::value ) +
unsigned( is_integral_extent<7,Args...>::value ) , "" );
if ( RangeRank == 0 ) { m_length[0] = 0 ; m_index[0] = ~0u ; }
if ( ! set( 0 , 0 , dim , args... ) ) error( dim , args... );
}
template < typename iType >
KOKKOS_FORCEINLINE_FUNCTION
constexpr size_t domain_offset( const iType i ) const
{ return unsigned(i) < DomainRank ? m_begin[i] : 0 ; }
template < typename iType >
KOKKOS_FORCEINLINE_FUNCTION
constexpr size_t range_extent( const iType i ) const
{ return unsigned(i) < InternalRangeRank ? m_length[i] : 0 ; }
template < typename iType >
KOKKOS_FORCEINLINE_FUNCTION
constexpr unsigned range_index( const iType i ) const
{ return unsigned(i) < InternalRangeRank ? m_index[i] : ~0u ; }
};
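// [Editor's illustration - not part of the Kokkos source.] SubviewExtents maps the
// subview arguments onto begin offsets in the source (domain) and extents of the
// result (range). A sketch, assuming a rank-3 source of extents 10 x 20 x 30:
//
//   ViewDimension<0,0,0> dim( 10 , 20 , 30 , 1 , 1 , 1 , 1 , 1 );
//   SubviewExtents< 3 , 2 > ext( dim , ALL_t() , Kokkos::pair<int,int>( 2 , 5 ) , 7 );
//   // ext.domain_offset(0) == 0  , ext.domain_offset(1) == 2 , ext.domain_offset(2) == 7
//   // ext.range_extent(0)  == 10 , ext.range_extent(1)  == 3
//   // ext.range_index(0)   == 0  , ext.range_index(1)   == 1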
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
/** \brief Given a value type and dimension generate the View data type */
template< class T , class Dim >
struct ViewDataType ;
template< class T >
struct ViewDataType< T , ViewDimension<> >
{
typedef T type ;
};
template< class T , size_t ... Args >
struct ViewDataType< T , ViewDimension< 0 , Args... > >
{
typedef typename ViewDataType<T*,ViewDimension<Args...> >::type type ;
};
template< class T , size_t N , size_t ... Args >
struct ViewDataType< T , ViewDimension< N , Args... > >
{
typedef typename ViewDataType<T,ViewDimension<Args...> >::type type[N] ;
};
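// [Editor's illustration - not part of the Kokkos source.] ViewDataType rebuilds a
// View data type from a value type plus a ViewDimension: each dynamic extent becomes
// a '*' and each static extent an '[N]'. ViewArrayAnalysis below performs the inverse
// decomposition. A sketch using the names above:
//
//   static_assert( std::is_same< ViewDataType< double , ViewDimension<0,0,3> >::type
//                              , double**[3] >::value , "" );
//   static_assert( std::is_same< ViewArrayAnalysis< double**[3] >::dimension
//                              , ViewDimension<0,0,3> >::value , "" );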
/**\brief Analysis of View data type.
*
* Data type conforms to one of the following patterns :
* {const} value_type [][#][#][#]
* {const} value_type ***[#][#][#]
* Where the sum of counts of '*' and '[#]' is at most ten.
*
* Provide typedef for the ViewDimension<...> and value_type.
*/
template< class T >
struct ViewArrayAnalysis
{
typedef T value_type ;
typedef typename std::add_const< T >::type const_value_type ;
typedef typename std::remove_const< T >::type non_const_value_type ;
typedef ViewDimension<> static_dimension ;
typedef ViewDimension<> dynamic_dimension ;
typedef ViewDimension<> dimension ;
};
template< class T , size_t N >
struct ViewArrayAnalysis< T[N] >
{
private:
typedef ViewArrayAnalysis< T > nested ;
public:
typedef typename nested::value_type value_type ;
typedef typename nested::const_value_type const_value_type ;
typedef typename nested::non_const_value_type non_const_value_type ;
typedef typename nested::static_dimension::template prepend<N>::type
static_dimension ;
typedef typename nested::dynamic_dimension dynamic_dimension ;
typedef typename
ViewDimensionJoin< dynamic_dimension , static_dimension >::type
dimension ;
};
template< class T >
struct ViewArrayAnalysis< T[] >
{
private:
typedef ViewArrayAnalysis< T > nested ;
typedef typename nested::dimension nested_dimension ;
public:
typedef typename nested::value_type value_type ;
typedef typename nested::const_value_type const_value_type ;
typedef typename nested::non_const_value_type non_const_value_type ;
typedef typename nested::dynamic_dimension::template prepend<0>::type
dynamic_dimension ;
typedef typename nested::static_dimension static_dimension ;
typedef typename
ViewDimensionJoin< dynamic_dimension , static_dimension >::type
dimension ;
};
template< class T >
struct ViewArrayAnalysis< T* >
{
private:
typedef ViewArrayAnalysis< T > nested ;
public:
typedef typename nested::value_type value_type ;
typedef typename nested::const_value_type const_value_type ;
typedef typename nested::non_const_value_type non_const_value_type ;
typedef typename nested::dynamic_dimension::template prepend<0>::type
dynamic_dimension ;
typedef typename nested::static_dimension static_dimension ;
typedef typename
ViewDimensionJoin< dynamic_dimension , static_dimension >::type
dimension ;
};
template< class DataType , class ArrayLayout , class ValueType >
struct ViewDataAnalysis
{
private:
typedef ViewArrayAnalysis< DataType > array_analysis ;
// ValueType is an opportunity for partial specialization.
// Must match array analysis when this default template is used.
static_assert( std::is_same< ValueType , typename array_analysis::non_const_value_type >::value , "" );
public:
typedef void specialize ; // No specialization
typedef typename array_analysis::dimension dimension ;
typedef typename array_analysis::value_type value_type ;
typedef typename array_analysis::const_value_type const_value_type ;
typedef typename array_analysis::non_const_value_type non_const_value_type ;
// Generate analogous multidimensional array specification type.
typedef typename ViewDataType< value_type , dimension >::type type ;
typedef typename ViewDataType< const_value_type , dimension >::type const_type ;
typedef typename ViewDataType< non_const_value_type , dimension >::type non_const_type ;
// Generate "flattened" multidimensional array specification type.
typedef type scalar_array_type ;
typedef const_type const_scalar_array_type ;
typedef non_const_type non_const_scalar_array_type ;
};
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
template < class Dimension , class Layout , typename Enable = void >
struct ViewOffset {
using is_mapping_plugin = std::false_type ;
};
//----------------------------------------------------------------------------
// LayoutLeft AND ( 1 >= rank OR 0 == rank_dynamic ) : no padding / striding
template < class Dimension >
struct ViewOffset< Dimension , Kokkos::LayoutLeft
, typename std::enable_if<( 1 >= Dimension::rank
||
0 == Dimension::rank_dynamic
)>::type >
{
using is_mapping_plugin = std::true_type ;
using is_regular = std::true_type ;
typedef size_t size_type ;
typedef Dimension dimension_type ;
typedef Kokkos::LayoutLeft array_layout ;
dimension_type m_dim ;
//----------------------------------------
// rank 1
template< typename I0 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 ) const { return i0 ; }
// rank 2
template < typename I0 , typename I1 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 , I1 const & i1 ) const
{ return i0 + m_dim.N0 * i1 ; }
//rank 3
template < typename I0, typename I1, typename I2 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
{
return i0 + m_dim.N0 * ( i1 + m_dim.N1 * i2 );
}
//rank 4
template < typename I0, typename I1, typename I2, typename I3 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
{
return i0 + m_dim.N0 * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * i3 ));
}
//rank 5
template < typename I0, typename I1, typename I2, typename I3
, typename I4 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4 ) const
{
return i0 + m_dim.N0 * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * i4 )));
}
//rank 6
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5 ) const
{
return i0 + m_dim.N0 * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * (
i4 + m_dim.N4 * i5 ))));
}
//rank 7
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6 ) const
{
return i0 + m_dim.N0 * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * (
i4 + m_dim.N4 * (
i5 + m_dim.N5 * i6 )))));
}
//rank 8
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6, typename I7 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
{
return i0 + m_dim.N0 * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * (
i4 + m_dim.N4 * (
i5 + m_dim.N5 * (
i6 + m_dim.N6 * i7 ))))));
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr array_layout layout() const
{
return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
, m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
}
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
/* Cardinality of the domain index space */
KOKKOS_INLINE_FUNCTION
constexpr size_type size() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
/* Span of the range space */
KOKKOS_INLINE_FUNCTION
constexpr size_type span() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return true ; }
/* Strides of dimensions */
KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_dim.N0 * m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 ; }
// Stride with [ rank ] value is the total length
template< typename iType >
KOKKOS_INLINE_FUNCTION
void stride( iType * const s ) const
{
s[0] = 1 ;
if ( 0 < dimension_type::rank ) { s[1] = m_dim.N0 ; }
if ( 1 < dimension_type::rank ) { s[2] = s[1] * m_dim.N1 ; }
if ( 2 < dimension_type::rank ) { s[3] = s[2] * m_dim.N2 ; }
if ( 3 < dimension_type::rank ) { s[4] = s[3] * m_dim.N3 ; }
if ( 4 < dimension_type::rank ) { s[5] = s[4] * m_dim.N4 ; }
if ( 5 < dimension_type::rank ) { s[6] = s[5] * m_dim.N5 ; }
if ( 6 < dimension_type::rank ) { s[7] = s[6] * m_dim.N6 ; }
if ( 7 < dimension_type::rank ) { s[8] = s[7] * m_dim.N7 ; }
}
//----------------------------------------
ViewOffset() = default ;
ViewOffset( const ViewOffset & ) = default ;
ViewOffset & operator = ( const ViewOffset & ) = default ;
template< unsigned TrivialScalarSize >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( std::integral_constant<unsigned,TrivialScalarSize> const &
, Kokkos::LayoutLeft const & arg_layout
)
: m_dim( arg_layout.dimension[0], 0, 0, 0, 0, 0, 0, 0 )
{}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs )
: m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
, rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
{
static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
// Also requires equal static dimensions ...
}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs )
: m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
{
static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
, "ViewOffset LayoutLeft and LayoutRight are only compatible when rank == 1" );
}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutStride , void > & rhs )
: m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
{
static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
, "ViewOffset LayoutLeft and LayoutStride are only compatible when rank == 1" );
if ( rhs.m_stride.S0 != 1 ) {
- Kokkos::abort("Kokkos::Experimental::ViewOffset assignment of LayoutLeft from LayoutStride requires stride == 1" );
+ Kokkos::abort("Kokkos::Impl::ViewOffset assignment of LayoutLeft from LayoutStride requires stride == 1" );
}
}
//----------------------------------------
// Subview construction
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset(
const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs ,
const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub )
: m_dim( sub.range_extent(0), 0, 0, 0, 0, 0, 0, 0 )
{
static_assert( ( 0 == dimension_type::rank ) ||
( 1 == dimension_type::rank && 1 == dimension_type::rank_dynamic && 1 <= DimRHS::rank )
, "ViewOffset subview construction requires compatible rank" );
}
};
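// [Editor's illustration - not part of the Kokkos source.] With no dynamic extents
// the LayoutLeft offset is a plain column-major (Fortran-order) polynomial with no
// padding. A sketch for a fully static 10 x 3 extent:
//
//   typedef ViewOffset< ViewDimension<10,3> , Kokkos::LayoutLeft > left_offset ;
//   left_offset off ;
//   // off( i0 , i1 ) == i0 + 10 * i1
//   // off.stride_0() == 1 , off.stride_1() == 10 , off.span() == 30 , off.span_is_contiguous() == true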
//----------------------------------------------------------------------------
// LayoutLeft AND ( 1 < rank AND 0 < rank_dynamic ) : has padding / striding
template < class Dimension >
struct ViewOffset< Dimension , Kokkos::LayoutLeft
, typename std::enable_if<( 1 < Dimension::rank
&&
0 < Dimension::rank_dynamic
)>::type >
{
using is_mapping_plugin = std::true_type ;
using is_regular = std::true_type ;
typedef size_t size_type ;
typedef Dimension dimension_type ;
typedef Kokkos::LayoutLeft array_layout ;
dimension_type m_dim ;
size_type m_stride ;
//----------------------------------------
// rank 1
template< typename I0 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 ) const { return i0 ; }
// rank 2
template < typename I0 , typename I1 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 , I1 const & i1 ) const
{ return i0 + m_stride * i1 ; }
//rank 3
template < typename I0, typename I1, typename I2 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
{
return i0 + m_stride * ( i1 + m_dim.N1 * i2 );
}
//rank 4
template < typename I0, typename I1, typename I2, typename I3 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
{
return i0 + m_stride * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * i3 ));
}
//rank 5
template < typename I0, typename I1, typename I2, typename I3
, typename I4 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4 ) const
{
return i0 + m_stride * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * i4 )));
}
//rank 6
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5 ) const
{
return i0 + m_stride * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * (
i4 + m_dim.N4 * i5 ))));
}
//rank 7
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6 ) const
{
return i0 + m_stride * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * (
i4 + m_dim.N4 * (
i5 + m_dim.N5 * i6 )))));
}
//rank 8
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6, typename I7 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
{
return i0 + m_stride * (
i1 + m_dim.N1 * (
i2 + m_dim.N2 * (
i3 + m_dim.N3 * (
i4 + m_dim.N4 * (
i5 + m_dim.N5 * (
i6 + m_dim.N6 * i7 ))))));
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr array_layout layout() const
{
return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
, m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
}
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
/* Cardinality of the domain index space */
KOKKOS_INLINE_FUNCTION
constexpr size_type size() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
/* Span of the range space */
KOKKOS_INLINE_FUNCTION
constexpr size_type span() const
{ return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return m_stride == m_dim.N0 ; }
/* Strides of dimensions */
KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_stride ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_stride * m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_stride * m_dim.N1 * m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return m_stride * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 ; }
// Stride with [ rank ] value is the total length
template< typename iType >
KOKKOS_INLINE_FUNCTION
void stride( iType * const s ) const
{
s[0] = 1 ;
if ( 0 < dimension_type::rank ) { s[1] = m_stride ; }
if ( 1 < dimension_type::rank ) { s[2] = s[1] * m_dim.N1 ; }
if ( 2 < dimension_type::rank ) { s[3] = s[2] * m_dim.N2 ; }
if ( 3 < dimension_type::rank ) { s[4] = s[3] * m_dim.N3 ; }
if ( 4 < dimension_type::rank ) { s[5] = s[4] * m_dim.N4 ; }
if ( 5 < dimension_type::rank ) { s[6] = s[5] * m_dim.N5 ; }
if ( 6 < dimension_type::rank ) { s[7] = s[6] * m_dim.N6 ; }
if ( 7 < dimension_type::rank ) { s[8] = s[7] * m_dim.N7 ; }
}
//----------------------------------------
private:
template< unsigned TrivialScalarSize >
struct Padding {
enum { div = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT / ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
enum { mod = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT % ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
// If memory alignment is a multiple of the trivial scalar size then attempt to align.
enum { align = 0 != TrivialScalarSize && 0 == mod ? div : 0 };
enum { div_ok = div ? div : 1 }; // To avoid modulo-by-zero in the constexpr stride() below
KOKKOS_INLINE_FUNCTION
static constexpr size_t stride( size_t const N )
{
return ( align && ( Kokkos::Impl::MEMORY_ALIGNMENT_THRESHOLD * align < N ) && ( N % div_ok ) )
? N + align - ( N % div_ok ) : N ;
}
};
public:
ViewOffset() = default ;
ViewOffset( const ViewOffset & ) = default ;
ViewOffset & operator = ( const ViewOffset & ) = default ;
/* Enable padding for trivial scalar types with non-zero trivial scalar size */
template< unsigned TrivialScalarSize >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( std::integral_constant<unsigned,TrivialScalarSize> const & padding_type_size
, Kokkos::LayoutLeft const & arg_layout
)
: m_dim( arg_layout.dimension[0] , arg_layout.dimension[1]
, arg_layout.dimension[2] , arg_layout.dimension[3]
, arg_layout.dimension[4] , arg_layout.dimension[5]
, arg_layout.dimension[6] , arg_layout.dimension[7]
)
, m_stride( Padding<TrivialScalarSize>::stride( arg_layout.dimension[0] ) )
{}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs )
: m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
, rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
, m_stride( rhs.stride_1() )
{
static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
// Also requires equal static dimensions ...
}
//----------------------------------------
// Subview construction
// This subview must be 2 == rank and 2 == rank_dynamic
// due to only having stride #0.
// The source dimension #0 must be non-zero to preserve the stride-one leading dimension.
// At most one subsequent dimension can be non-zero.
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs ,
const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub )
: m_dim( sub.range_extent(0)
, sub.range_extent(1)
- , 0, 0, 0, 0, 0, 0 )
+ , sub.range_extent(2)
+ , sub.range_extent(3)
+ , sub.range_extent(4)
+ , sub.range_extent(5)
+ , sub.range_extent(6)
+ , sub.range_extent(7))
, m_stride( ( 1 == sub.range_index(1) ? rhs.stride_1() :
( 2 == sub.range_index(1) ? rhs.stride_2() :
( 3 == sub.range_index(1) ? rhs.stride_3() :
( 4 == sub.range_index(1) ? rhs.stride_4() :
( 5 == sub.range_index(1) ? rhs.stride_5() :
( 6 == sub.range_index(1) ? rhs.stride_6() :
( 7 == sub.range_index(1) ? rhs.stride_7() : 0 ))))))))
{
- static_assert( ( 2 == dimension_type::rank ) &&
- ( 2 == dimension_type::rank_dynamic ) &&
- ( 2 <= DimRHS::rank )
- , "ViewOffset subview construction requires compatible rank" );
+ //static_assert( ( 2 == dimension_type::rank ) &&
+ // ( 2 == dimension_type::rank_dynamic ) &&
+ // ( 2 <= DimRHS::rank )
+ // , "ViewOffset subview construction requires compatible rank" );
}
};
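// [Editor's illustration - not part of the Kokkos source.] When the leading extent is
// dynamic, LayoutLeft may pad stride #1 so each "column" starts on an aligned boundary.
// Assuming the default 64-byte Kokkos::Impl::MEMORY_ALIGNMENT and an 8-byte scalar
// (Padding<8>::align == 8), and that the extent exceeds the alignment threshold:
//
//   ViewOffset< ViewDimension<0,0> , Kokkos::LayoutLeft >
//     off( std::integral_constant<unsigned,8>() , Kokkos::LayoutLeft( 1001 , 20 ) );
//   // off.stride_1() == 1008 (padded up from 1001) , off.span() == 1008 * 20
//   // off.span_is_contiguous() == false ; an extent of 1000 would be left unpadded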
//----------------------------------------------------------------------------
// LayoutRight AND ( 1 >= rank OR 0 == rank_dynamic ) : no padding / striding
template < class Dimension >
struct ViewOffset< Dimension , Kokkos::LayoutRight
, typename std::enable_if<( 1 >= Dimension::rank
||
0 == Dimension::rank_dynamic
)>::type >
{
using is_mapping_plugin = std::true_type ;
using is_regular = std::true_type ;
typedef size_t size_type ;
typedef Dimension dimension_type ;
typedef Kokkos::LayoutRight array_layout ;
dimension_type m_dim ;
//----------------------------------------
// rank 1
template< typename I0 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 ) const { return i0 ; }
// rank 2
template < typename I0 , typename I1 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 , I1 const & i1 ) const
{ return i1 + m_dim.N1 * i0 ; }
//rank 3
template < typename I0, typename I1, typename I2 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
{
return i2 + m_dim.N2 * ( i1 + m_dim.N1 * ( i0 ));
}
//rank 4
template < typename I0, typename I1, typename I2, typename I3 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
{
return i3 + m_dim.N3 * (
i2 + m_dim.N2 * (
i1 + m_dim.N1 * ( i0 )));
}
//rank 5
template < typename I0, typename I1, typename I2, typename I3
, typename I4 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4 ) const
{
return i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * (
i1 + m_dim.N1 * ( i0 ))));
}
//rank 6
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5 ) const
{
return i5 + m_dim.N5 * (
i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * (
i1 + m_dim.N1 * ( i0 )))));
}
//rank 7
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6 ) const
{
return i6 + m_dim.N6 * (
i5 + m_dim.N5 * (
i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * (
i1 + m_dim.N1 * ( i0 ))))));
}
//rank 8
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6, typename I7 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
{
return i7 + m_dim.N7 * (
i6 + m_dim.N6 * (
i5 + m_dim.N5 * (
i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * (
i1 + m_dim.N1 * ( i0 )))))));
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr array_layout layout() const
{
return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
, m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
}
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
/* Cardinality of the domain index space */
KOKKOS_INLINE_FUNCTION
constexpr size_type size() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
/* Span of the range space */
KOKKOS_INLINE_FUNCTION
constexpr size_type span() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return true ; }
/* Strides of dimensions */
KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_dim.N7 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_dim.N7 * m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 * m_dim.N1 ; }
// Stride with [ rank ] value is the total length
template< typename iType >
KOKKOS_INLINE_FUNCTION
void stride( iType * const s ) const
{
size_type n = 1 ;
if ( 7 < dimension_type::rank ) { s[7] = n ; n *= m_dim.N7 ; }
if ( 6 < dimension_type::rank ) { s[6] = n ; n *= m_dim.N6 ; }
if ( 5 < dimension_type::rank ) { s[5] = n ; n *= m_dim.N5 ; }
if ( 4 < dimension_type::rank ) { s[4] = n ; n *= m_dim.N4 ; }
if ( 3 < dimension_type::rank ) { s[3] = n ; n *= m_dim.N3 ; }
if ( 2 < dimension_type::rank ) { s[2] = n ; n *= m_dim.N2 ; }
if ( 1 < dimension_type::rank ) { s[1] = n ; n *= m_dim.N1 ; }
if ( 0 < dimension_type::rank ) { s[0] = n ; }
s[dimension_type::rank] = n * m_dim.N0 ;
}
//----------------------------------------
ViewOffset() = default ;
ViewOffset( const ViewOffset & ) = default ;
ViewOffset & operator = ( const ViewOffset & ) = default ;
template< unsigned TrivialScalarSize >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( std::integral_constant<unsigned,TrivialScalarSize> const &
, Kokkos::LayoutRight const & arg_layout
)
: m_dim( arg_layout.dimension[0], 0, 0, 0, 0, 0, 0, 0 )
{}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs )
: m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
, rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
{
static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
// Also requires equal static dimensions ...
}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutLeft , void > & rhs )
: m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
{
static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
, "ViewOffset LayoutRight and LayoutLeft are only compatible when rank == 1" );
}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutStride , void > & rhs )
: m_dim( rhs.m_dim.N0, 0, 0, 0, 0, 0, 0, 0 )
{
static_assert( DimRHS::rank == 1 && dimension_type::rank == 1 && dimension_type::rank_dynamic == 1
, "ViewOffset LayoutLeft/Right and LayoutStride are only compatible when rank == 1" );
if ( rhs.m_stride.S0 != 1 ) {
- Kokkos::abort("Kokkos::Experimental::ViewOffset assignment of LayoutLeft/Right from LayoutStride requires stride == 1" );
+ Kokkos::abort("Kokkos::Impl::ViewOffset assignment of LayoutLeft/Right from LayoutStride requires stride == 1" );
}
}
//----------------------------------------
// Subview construction
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs
, const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub
)
: m_dim( sub.range_extent(0) , 0, 0, 0, 0, 0, 0, 0 )
{
static_assert( ( 0 == dimension_type::rank_dynamic ) ||
( 1 == dimension_type::rank && 1 == dimension_type::rank_dynamic && 1 <= DimRHS::rank )
, "ViewOffset subview construction requires compatible rank" );
}
};
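// [Editor's illustration - not part of the Kokkos source.] The LayoutRight analogue is
// row-major (C-order): the last index has stride one. A sketch for a fully static
// 10 x 3 extent:
//
//   typedef ViewOffset< ViewDimension<10,3> , Kokkos::LayoutRight > right_offset ;
//   right_offset off ;
//   // off( i0 , i1 ) == i1 + 3 * i0
//   // off.stride_1() == 1 , off.stride_0() == 3 , off.span() == 30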
//----------------------------------------------------------------------------
// LayoutRight AND ( 1 < rank AND 0 < rank_dynamic ) : has padding / striding
template < class Dimension >
struct ViewOffset< Dimension , Kokkos::LayoutRight
, typename std::enable_if<( 1 < Dimension::rank
&&
0 < Dimension::rank_dynamic
)>::type >
{
using is_mapping_plugin = std::true_type ;
using is_regular = std::true_type ;
typedef size_t size_type ;
typedef Dimension dimension_type ;
typedef Kokkos::LayoutRight array_layout ;
dimension_type m_dim ;
size_type m_stride ;
//----------------------------------------
// rank 1
template< typename I0 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 ) const { return i0 ; }
// rank 2
template < typename I0 , typename I1 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 , I1 const & i1 ) const
{ return i1 + i0 * m_stride ; }
//rank 3
template < typename I0, typename I1, typename I2 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
{ return i2 + m_dim.N2 * ( i1 ) + i0 * m_stride ; }
//rank 4
template < typename I0, typename I1, typename I2, typename I3 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
{
return i3 + m_dim.N3 * (
i2 + m_dim.N2 * ( i1 )) +
i0 * m_stride ;
}
//rank 5
template < typename I0, typename I1, typename I2, typename I3
, typename I4 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4 ) const
{
return i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * ( i1 ))) +
i0 * m_stride ;
}
//rank 6
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5 ) const
{
return i5 + m_dim.N5 * (
i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * ( i1 )))) +
i0 * m_stride ;
}
//rank 7
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6 ) const
{
return i6 + m_dim.N6 * (
i5 + m_dim.N5 * (
i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * ( i1 ))))) +
i0 * m_stride ;
}
//rank 8
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6, typename I7 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
{
return i7 + m_dim.N7 * (
i6 + m_dim.N6 * (
i5 + m_dim.N5 * (
i4 + m_dim.N4 * (
i3 + m_dim.N3 * (
i2 + m_dim.N2 * ( i1 )))))) +
i0 * m_stride ;
}
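  // Worked example (a sketch; the numbers are assumptions for illustration):
  // for a rank-2 padded LayoutRight offset with N1 = 10 and a padded leading
  // stride m_stride = 16, element (2,3) maps to
  //
  //   offset(2,3) = i1 + i0 * m_stride = 3 + 2*16 = 35
  //
  // whereas the unpadded, contiguous layout would give 3 + 2*10 = 23.  Only the
  // leading (index-0) stride carries the padding; all inner dimensions remain
  // tightly packed, as the rank 3..8 operators above show.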
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr array_layout layout() const
{
return array_layout( m_dim.N0 , m_dim.N1 , m_dim.N2 , m_dim.N3
, m_dim.N4 , m_dim.N5 , m_dim.N6 , m_dim.N7 );
}
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
/* Cardinality of the domain index space */
KOKKOS_INLINE_FUNCTION
constexpr size_type size() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
/* Span of the range space */
KOKKOS_INLINE_FUNCTION
constexpr size_type span() const
{ return m_dim.N0 * m_stride ; }
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const
{ return m_stride == m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 * m_dim.N1 ; }
/* Strides of dimensions */
KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_dim.N7 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_dim.N7 * m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_dim.N7 * m_dim.N6 * m_dim.N5 * m_dim.N4 * m_dim.N3 * m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return m_stride ; }
// Stride with [ rank ] value is the total length
template< typename iType >
KOKKOS_INLINE_FUNCTION
void stride( iType * const s ) const
{
size_type n = 1 ;
if ( 7 < dimension_type::rank ) { s[7] = n ; n *= m_dim.N7 ; }
if ( 6 < dimension_type::rank ) { s[6] = n ; n *= m_dim.N6 ; }
if ( 5 < dimension_type::rank ) { s[5] = n ; n *= m_dim.N5 ; }
if ( 4 < dimension_type::rank ) { s[4] = n ; n *= m_dim.N4 ; }
if ( 3 < dimension_type::rank ) { s[3] = n ; n *= m_dim.N3 ; }
if ( 2 < dimension_type::rank ) { s[2] = n ; n *= m_dim.N2 ; }
if ( 1 < dimension_type::rank ) { s[1] = n ; }
if ( 0 < dimension_type::rank ) { s[0] = m_stride ; }
s[dimension_type::rank] = m_stride * m_dim.N0 ;
}
//----------------------------------------
private:
template< unsigned TrivialScalarSize >
struct Padding {
enum { div = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT / ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
enum { mod = TrivialScalarSize == 0 ? 0 : Kokkos::Impl::MEMORY_ALIGNMENT % ( TrivialScalarSize ? TrivialScalarSize : 1 ) };
// If memory alignment is a multiple of the trivial scalar size then attempt to align.
enum { align = 0 != TrivialScalarSize && 0 == mod ? div : 0 };
    enum { div_ok = div ? div : 1 }; // To avoid modulo by zero in constexpr
KOKKOS_INLINE_FUNCTION
static constexpr size_t stride( size_t const N )
{
return ( align && ( Kokkos::Impl::MEMORY_ALIGNMENT_THRESHOLD * align < N ) && ( N % div_ok ) )
? N + align - ( N % div_ok ) : N ;
}
};
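  // Numeric sketch (assumes MEMORY_ALIGNMENT == 64 bytes and a small enough
  // MEMORY_ALIGNMENT_THRESHOLD; both are configuration constants, so treat the
  // numbers as assumptions): for double data TrivialScalarSize == 8, hence
  // div == 8, mod == 0 and align == 8 elements.  Then
  //
  //   Padding<8>::stride(1001) == 1001 + 8 - (1001 % 8) == 1008   // padded up
  //   Padding<8>::stride(1000) == 1000                            // already aligned
  //
  // so the leading stride is rounded up to a multiple of the alignment only
  // when it is both large enough and not already aligned.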
public:
ViewOffset() = default ;
ViewOffset( const ViewOffset & ) = default ;
ViewOffset & operator = ( const ViewOffset & ) = default ;
/* Enable padding for trivial scalar types with non-zero trivial scalar size. */
template< unsigned TrivialScalarSize >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( std::integral_constant<unsigned,TrivialScalarSize> const & padding_type_size
, Kokkos::LayoutRight const & arg_layout
)
: m_dim( arg_layout.dimension[0] , arg_layout.dimension[1]
, arg_layout.dimension[2] , arg_layout.dimension[3]
, arg_layout.dimension[4] , arg_layout.dimension[5]
, arg_layout.dimension[6] , arg_layout.dimension[7]
)
, m_stride( Padding<TrivialScalarSize>::
stride( /* 2 <= rank */
m_dim.N1 * ( dimension_type::rank == 2 ? 1 :
m_dim.N2 * ( dimension_type::rank == 3 ? 1 :
m_dim.N3 * ( dimension_type::rank == 4 ? 1 :
m_dim.N4 * ( dimension_type::rank == 5 ? 1 :
m_dim.N5 * ( dimension_type::rank == 6 ? 1 :
m_dim.N6 * ( dimension_type::rank == 7 ? 1 : m_dim.N7 )))))) ))
{}
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs )
: m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
, rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
, m_stride( rhs.stride_0() )
{
static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
// Also requires equal static dimensions ...
}
//----------------------------------------
// Subview construction
// Last dimension must be non-zero
template< class DimRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( const ViewOffset< DimRHS , Kokkos::LayoutRight , void > & rhs
, const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub
)
: m_dim( sub.range_extent(0)
, sub.range_extent(1)
- , 0, 0, 0, 0, 0, 0 )
+ , sub.range_extent(2)
+ , sub.range_extent(3)
+ , sub.range_extent(4)
+ , sub.range_extent(5)
+ , sub.range_extent(6)
+ , sub.range_extent(7))
, m_stride( 0 == sub.range_index(0) ? rhs.stride_0() : (
1 == sub.range_index(0) ? rhs.stride_1() : (
2 == sub.range_index(0) ? rhs.stride_2() : (
3 == sub.range_index(0) ? rhs.stride_3() : (
4 == sub.range_index(0) ? rhs.stride_4() : (
5 == sub.range_index(0) ? rhs.stride_5() : (
6 == sub.range_index(0) ? rhs.stride_6() : 0 )))))))
{
- // This subview must be 2 == rank and 2 == rank_dynamic
+/* // This subview must be 2 == rank and 2 == rank_dynamic
// due to only having stride #0.
// The source dimension #0 must be non-zero for stride-one leading dimension.
    // At most one subsequent dimension can be non-zero.
- static_assert( ( 2 == dimension_type::rank ) &&
- ( 2 <= DimRHS::rank )
+ static_assert( (( 2 == dimension_type::rank ) &&
+ ( 2 <= DimRHS::rank )) ||
+ ()
, "ViewOffset subview construction requires compatible rank" );
+*/
}
};
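// Subview sketch (extents are assumptions for illustration): given an unpadded
// rank-3 LayoutRight source with extents (4,5,6), i.e. strides (30,6,1), the
// slice ( [1,3), 2, ALL ) produces a rank-2 destination handled by the subview
// constructor above with
//
//   m_dim    = ( 2 , 6 )             // range_extent(0), range_extent(1)
//   m_stride = src.stride_0() = 30   // because range_index(0) == 0
//
// so dst(i0,i2) = i2 + 30*i0, while the base offset src(1,2,0) = 42 is folded
// into the data handle by the subview ViewMapping::assign further below.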
//----------------------------------------------------------------------------
/* Strided array layout only makes sense for 0 < rank */
/* rank = 0 included for DynRankView case */
template< unsigned Rank >
struct ViewStride ;
template<>
struct ViewStride<0> {
enum { S0 = 0 , S1 = 0 , S2 = 0 , S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t , size_t , size_t , size_t
, size_t , size_t , size_t , size_t )
{}
};
template<>
struct ViewStride<1> {
size_t S0 ;
enum { S1 = 0 , S2 = 0 , S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t , size_t , size_t
, size_t , size_t , size_t , size_t )
: S0( aS0 )
{}
};
template<>
struct ViewStride<2> {
size_t S0 , S1 ;
enum { S2 = 0 , S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t , size_t
, size_t , size_t , size_t , size_t )
: S0( aS0 ) , S1( aS1 )
{}
};
template<>
struct ViewStride<3> {
size_t S0 , S1 , S2 ;
enum { S3 = 0 , S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t
, size_t , size_t , size_t , size_t )
: S0( aS0 ) , S1( aS1 ) , S2( aS2 )
{}
};
template<>
struct ViewStride<4> {
size_t S0 , S1 , S2 , S3 ;
enum { S4 = 0 , S5 = 0 , S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
, size_t , size_t , size_t , size_t )
: S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
{}
};
template<>
struct ViewStride<5> {
size_t S0 , S1 , S2 , S3 , S4 ;
enum { S5 = 0 , S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
, size_t aS4 , size_t , size_t , size_t )
: S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
, S4( aS4 )
{}
};
template<>
struct ViewStride<6> {
size_t S0 , S1 , S2 , S3 , S4 , S5 ;
enum { S6 = 0 , S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
, size_t aS4 , size_t aS5 , size_t , size_t )
: S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
, S4( aS4 ) , S5( aS5 )
{}
};
template<>
struct ViewStride<7> {
size_t S0 , S1 , S2 , S3 , S4 , S5 , S6 ;
enum { S7 = 0 };
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
, size_t aS4 , size_t aS5 , size_t aS6 , size_t )
: S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
, S4( aS4 ) , S5( aS5 ) , S6( aS6 )
{}
};
template<>
struct ViewStride<8> {
size_t S0 , S1 , S2 , S3 , S4 , S5 , S6 , S7 ;
ViewStride() = default ;
ViewStride( const ViewStride & ) = default ;
ViewStride & operator = ( const ViewStride & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewStride( size_t aS0 , size_t aS1 , size_t aS2 , size_t aS3
, size_t aS4 , size_t aS5 , size_t aS6 , size_t aS7 )
: S0( aS0 ) , S1( aS1 ) , S2( aS2 ) , S3( aS3 )
, S4( aS4 ) , S5( aS5 ) , S6( aS6 ) , S7( aS7 )
{}
};
template < class Dimension >
struct ViewOffset< Dimension , Kokkos::LayoutStride
, void >
{
private:
typedef ViewStride< Dimension::rank > stride_type ;
public:
using is_mapping_plugin = std::true_type ;
using is_regular = std::true_type ;
typedef size_t size_type ;
typedef Dimension dimension_type ;
typedef Kokkos::LayoutStride array_layout ;
dimension_type m_dim ;
stride_type m_stride ;
//----------------------------------------
// rank 1
template< typename I0 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 ) const
{
return i0 * m_stride.S0 ;
}
// rank 2
template < typename I0 , typename I1 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 , I1 const & i1 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 ;
}
//rank 3
template < typename I0, typename I1, typename I2 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 +
i2 * m_stride.S2 ;
}
//rank 4
template < typename I0, typename I1, typename I2, typename I3 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 +
i2 * m_stride.S2 +
i3 * m_stride.S3 ;
}
//rank 5
template < typename I0, typename I1, typename I2, typename I3
, typename I4 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 +
i2 * m_stride.S2 +
i3 * m_stride.S3 +
i4 * m_stride.S4 ;
}
//rank 6
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 +
i2 * m_stride.S2 +
i3 * m_stride.S3 +
i4 * m_stride.S4 +
i5 * m_stride.S5 ;
}
//rank 7
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 +
i2 * m_stride.S2 +
i3 * m_stride.S3 +
i4 * m_stride.S4 +
i5 * m_stride.S5 +
i6 * m_stride.S6 ;
}
//rank 8
template < typename I0, typename I1, typename I2, typename I3
, typename I4, typename I5, typename I6, typename I7 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0, I1 const & i1, I2 const & i2, I3 const & i3
, I4 const & i4, I5 const & i5, I6 const & i6, I7 const & i7 ) const
{
return i0 * m_stride.S0 +
i1 * m_stride.S1 +
i2 * m_stride.S2 +
i3 * m_stride.S3 +
i4 * m_stride.S4 +
i5 * m_stride.S5 +
i6 * m_stride.S6 +
i7 * m_stride.S7 ;
}
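  // The strided offset is simply the dot product of indices and per-dimension
  // strides, so any regular layout can be expressed.  A usage sketch through
  // the public API (names and extents are made up for the example):
  //
  //   Kokkos::LayoutStride layout( n0 , 1 , n1 , n0 );   // dimension/stride pairs
  //   Kokkos::View< double** , Kokkos::LayoutStride > v( "v" , layout );
  //   // v(i0,i1) dereferences element  i0*1 + i1*n0 ,  i.e. column-major order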
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr array_layout layout() const
{
return array_layout( m_dim.N0 , m_stride.S0
, m_dim.N1 , m_stride.S1
, m_dim.N2 , m_stride.S2
, m_dim.N3 , m_stride.S3
, m_dim.N4 , m_stride.S4
, m_dim.N5 , m_stride.S5
, m_dim.N6 , m_stride.S6
, m_dim.N7 , m_stride.S7
);
}
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return m_dim.N2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return m_dim.N3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return m_dim.N4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return m_dim.N5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return m_dim.N6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return m_dim.N7 ; }
/* Cardinality of the domain index space */
KOKKOS_INLINE_FUNCTION
constexpr size_type size() const
{ return m_dim.N0 * m_dim.N1 * m_dim.N2 * m_dim.N3 * m_dim.N4 * m_dim.N5 * m_dim.N6 * m_dim.N7 ; }
private:
KOKKOS_INLINE_FUNCTION
static constexpr size_type Max( size_type lhs , size_type rhs )
{ return lhs < rhs ? rhs : lhs ; }
public:
/* Span of the range space, largest stride * dimension */
KOKKOS_INLINE_FUNCTION
constexpr size_type span() const
{
return Max( m_dim.N0 * m_stride.S0 ,
Max( m_dim.N1 * m_stride.S1 ,
Max( m_dim.N2 * m_stride.S2 ,
Max( m_dim.N3 * m_stride.S3 ,
Max( m_dim.N4 * m_stride.S4 ,
Max( m_dim.N5 * m_stride.S5 ,
Max( m_dim.N6 * m_stride.S6 ,
m_dim.N7 * m_stride.S7 )))))));
}
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return span() == size(); }
/* Strides of dimensions */
KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return m_stride.S0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return m_stride.S1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return m_stride.S2 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return m_stride.S3 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return m_stride.S4 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return m_stride.S5 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return m_stride.S6 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return m_stride.S7 ; }
// Stride with [ rank ] value is the total length
template< typename iType >
KOKKOS_INLINE_FUNCTION
void stride( iType * const s ) const
{
if ( 0 < dimension_type::rank ) { s[0] = m_stride.S0 ; }
if ( 1 < dimension_type::rank ) { s[1] = m_stride.S1 ; }
if ( 2 < dimension_type::rank ) { s[2] = m_stride.S2 ; }
if ( 3 < dimension_type::rank ) { s[3] = m_stride.S3 ; }
if ( 4 < dimension_type::rank ) { s[4] = m_stride.S4 ; }
if ( 5 < dimension_type::rank ) { s[5] = m_stride.S5 ; }
if ( 6 < dimension_type::rank ) { s[6] = m_stride.S6 ; }
if ( 7 < dimension_type::rank ) { s[7] = m_stride.S7 ; }
s[dimension_type::rank] = span();
}
//----------------------------------------
ViewOffset() = default ;
ViewOffset( const ViewOffset & ) = default ;
ViewOffset & operator = ( const ViewOffset & ) = default ;
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( std::integral_constant<unsigned,0> const &
, Kokkos::LayoutStride const & rhs )
: m_dim( rhs.dimension[0] , rhs.dimension[1] , rhs.dimension[2] , rhs.dimension[3]
, rhs.dimension[4] , rhs.dimension[5] , rhs.dimension[6] , rhs.dimension[7] )
, m_stride( rhs.stride[0] , rhs.stride[1] , rhs.stride[2] , rhs.stride[3]
, rhs.stride[4] , rhs.stride[5] , rhs.stride[6] , rhs.stride[7] )
{}
template< class DimRHS , class LayoutRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( const ViewOffset< DimRHS , LayoutRHS , void > & rhs )
: m_dim( rhs.m_dim.N0 , rhs.m_dim.N1 , rhs.m_dim.N2 , rhs.m_dim.N3
, rhs.m_dim.N4 , rhs.m_dim.N5 , rhs.m_dim.N6 , rhs.m_dim.N7 )
, m_stride( rhs.stride_0() , rhs.stride_1() , rhs.stride_2() , rhs.stride_3()
, rhs.stride_4() , rhs.stride_5() , rhs.stride_6() , rhs.stride_7() )
{
static_assert( int(DimRHS::rank) == int(dimension_type::rank) , "ViewOffset assignment requires equal rank" );
// Also requires equal static dimensions ...
}
//----------------------------------------
// Subview construction
private:
template< class DimRHS , class LayoutRHS >
KOKKOS_INLINE_FUNCTION static
constexpr size_t stride
( unsigned r , const ViewOffset< DimRHS , LayoutRHS , void > & rhs )
{
return r > 7 ? 0 : (
r == 0 ? rhs.stride_0() : (
r == 1 ? rhs.stride_1() : (
r == 2 ? rhs.stride_2() : (
r == 3 ? rhs.stride_3() : (
r == 4 ? rhs.stride_4() : (
r == 5 ? rhs.stride_5() : (
r == 6 ? rhs.stride_6() : rhs.stride_7() )))))));
}
public:
template< class DimRHS , class LayoutRHS >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset
( const ViewOffset< DimRHS , LayoutRHS , void > & rhs
, const SubviewExtents< DimRHS::rank , dimension_type::rank > & sub
)
// range_extent(r) returns 0 when dimension_type::rank <= r
: m_dim( sub.range_extent(0)
, sub.range_extent(1)
, sub.range_extent(2)
, sub.range_extent(3)
, sub.range_extent(4)
, sub.range_extent(5)
, sub.range_extent(6)
, sub.range_extent(7)
)
// range_index(r) returns ~0u when dimension_type::rank <= r
, m_stride( stride( sub.range_index(0), rhs )
, stride( sub.range_index(1), rhs )
, stride( sub.range_index(2), rhs )
, stride( sub.range_index(3), rhs )
, stride( sub.range_index(4), rhs )
, stride( sub.range_index(5), rhs )
, stride( sub.range_index(6), rhs )
, stride( sub.range_index(7), rhs )
)
{}
};
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
/** \brief ViewDataHandle provides the type of the 'data handle' which the view
* uses to access data with the [] operator. It also provides
* an allocate function and a function to extract a raw ptr from the
* data handle. ViewDataHandle also defines an enum ReferenceAble which
* specifies whether references/pointers to elements can be taken and a
* 'return_type' which is what the view operators will give back.
* Specialisation of this object allows three things depending
* on ViewTraits and compiler options:
* (i) Use special allocator (e.g. huge pages/small pages and pinned memory)
* (ii) Use special data handle type (e.g. add Cuda Texture Object)
* (iii) Use special access intrinsics (e.g. texture fetch and non-caching loads)
*/
template< class Traits , class Enable = void >
struct ViewDataHandle {
typedef typename Traits::value_type value_type ;
typedef typename Traits::value_type * handle_type ;
typedef typename Traits::value_type & return_type ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
+ typedef Kokkos::Impl::SharedAllocationTracker track_type ;
KOKKOS_INLINE_FUNCTION
static handle_type assign( value_type * arg_data_ptr
, track_type const & /*arg_tracker*/ )
{
return handle_type( arg_data_ptr );
}
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( handle_type const arg_data_ptr
+ , size_t offset )
+ {
+ return handle_type( arg_data_ptr + offset );
+ }
};
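/* How the handle is consumed (a sketch; `map` stands for the ViewMapping
 * defined later in this file and the variable names are illustrative only):
 *
 *   handle_type h = ViewDataHandle<Traits>::assign( raw_ptr , tracker );
 *   return_type r = h[ map.m_offset( i0 , i1 ) ];  // what View::operator() hands back
 *
 * The specializations that follow keep this interface but swap in an atomic
 * data handle, or restrict- and/or alignment-qualified pointer types.
 */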
template< class Traits >
struct ViewDataHandle< Traits ,
typename std::enable_if<( std::is_same< typename Traits::non_const_value_type
, typename Traits::value_type >::value
&&
std::is_same< typename Traits::specialize , void >::value
&&
Traits::memory_traits::Atomic
)>::type >
{
typedef typename Traits::value_type value_type ;
typedef typename Kokkos::Impl::AtomicViewDataHandle< Traits > handle_type ;
typedef typename Kokkos::Impl::AtomicDataElement< Traits > return_type ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
+ typedef Kokkos::Impl::SharedAllocationTracker track_type ;
KOKKOS_INLINE_FUNCTION
static handle_type assign( value_type * arg_data_ptr
, track_type const & /*arg_tracker*/ )
{
return handle_type( arg_data_ptr );
}
+
+ template<class SrcHandleType>
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( const SrcHandleType& arg_handle
+ , size_t offset )
+ {
+ return handle_type( arg_handle.ptr + offset );
+ }
};
+template< class Traits >
+struct ViewDataHandle< Traits ,
+ typename std::enable_if<(
+ std::is_same< typename Traits::specialize , void >::value
+ &&
+ (!Traits::memory_traits::Aligned)
+ &&
+ Traits::memory_traits::Restrict
+#ifdef KOKKOS_HAVE_CUDA
+ &&
+ (!( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
+ std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value ))
+#endif
+ &&
+ (!Traits::memory_traits::Atomic)
+ )>::type >
+{
+ typedef typename Traits::value_type value_type ;
+ typedef typename Traits::value_type * KOKKOS_RESTRICT handle_type ;
+ typedef typename Traits::value_type & KOKKOS_RESTRICT return_type ;
+ typedef Kokkos::Impl::SharedAllocationTracker track_type ;
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( value_type * arg_data_ptr
+ , track_type const & /*arg_tracker*/ )
+ {
+ return handle_type( arg_data_ptr );
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( handle_type const arg_data_ptr
+ , size_t offset )
+ {
+ return handle_type( arg_data_ptr + offset );
+ }
+};
+
+template< class Traits >
+struct ViewDataHandle< Traits ,
+ typename std::enable_if<(
+ std::is_same< typename Traits::specialize , void >::value
+ &&
+ Traits::memory_traits::Aligned
+ &&
+ (!Traits::memory_traits::Restrict)
+#ifdef KOKKOS_HAVE_CUDA
+ &&
+ (!( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
+ std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value ))
+#endif
+ &&
+ (!Traits::memory_traits::Atomic)
+ )>::type >
+{
+ typedef typename Traits::value_type value_type ;
+ typedef typename Traits::value_type * KOKKOS_ALIGN_PTR(KOKKOS_ALIGN_SIZE) handle_type ;
+ typedef typename Traits::value_type & return_type ;
+ typedef Kokkos::Impl::SharedAllocationTracker track_type ;
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( value_type * arg_data_ptr
+ , track_type const & /*arg_tracker*/ )
+ {
+ if ( reinterpret_cast<uintptr_t>(arg_data_ptr) % KOKKOS_ALIGN_SIZE ) {
+ Kokkos::abort("Assigning NonAligned View or Pointer to Kokkos::View with Aligned attribute");
+ }
+ return handle_type( arg_data_ptr );
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( handle_type const arg_data_ptr
+ , size_t offset )
+ {
+ if ( reinterpret_cast<uintptr_t>(arg_data_ptr+offset) % KOKKOS_ALIGN_SIZE ) {
+ Kokkos::abort("Assigning NonAligned View or Pointer to Kokkos::View with Aligned attribute");
+ }
+ return handle_type( arg_data_ptr + offset );
+ }
+};
+
+template< class Traits >
+struct ViewDataHandle< Traits ,
+ typename std::enable_if<(
+ std::is_same< typename Traits::specialize , void >::value
+ &&
+ Traits::memory_traits::Aligned
+ &&
+ Traits::memory_traits::Restrict
+#ifdef KOKKOS_HAVE_CUDA
+ &&
+ (!( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
+ std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value ))
+#endif
+ &&
+ (!Traits::memory_traits::Atomic)
+ )>::type >
+{
+ typedef typename Traits::value_type value_type ;
+ typedef typename Traits::value_type * KOKKOS_RESTRICT KOKKOS_ALIGN_PTR(KOKKOS_ALIGN_SIZE) handle_type ;
+ typedef typename Traits::value_type & return_type ;
+ typedef Kokkos::Impl::SharedAllocationTracker track_type ;
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( value_type * arg_data_ptr
+ , track_type const & /*arg_tracker*/ )
+ {
+ if ( reinterpret_cast<uintptr_t>(arg_data_ptr) % KOKKOS_ALIGN_SIZE ) {
+ Kokkos::abort("Assigning NonAligned View or Pointer to Kokkos::View with Aligned attribute");
+ }
+ return handle_type( arg_data_ptr );
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ static handle_type assign( handle_type const arg_data_ptr
+ , size_t offset )
+ {
+ if ( reinterpret_cast<uintptr_t>(arg_data_ptr+offset) % KOKKOS_ALIGN_SIZE ) {
+ Kokkos::abort("Assigning NonAligned View or Pointer to Kokkos::View with Aligned attribute");
+ }
+ return handle_type( arg_data_ptr + offset );
+ }
+};
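// Usage sketch (hedged: the flag spellings follow the public Kokkos::MemoryTraits
// enum; the view name and extent are made up).  These handles are requested from
// user code through the memory-traits template parameter, e.g.
//
//   Kokkos::View< double* , Kokkos::MemoryTraits< Kokkos::Aligned | Kokkos::Restrict > >
//     x( "x" , n );   // selects the aligned + restrict handle specialization above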
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
//----------------------------------------------------------------------------
/*
* The construction, assignment to default, and destruction
* are merged into a single functor.
* Primarily to work around an unresolved CUDA back-end bug
* that would lose the destruction cuda device function when
* called from the shared memory tracking destruction.
* Secondarily to have two fewer partial specializations.
*/
template< class ExecSpace
, class ValueType
, bool IsScalar = std::is_scalar< ValueType >::value
>
struct ViewValueFunctor ;
template< class ExecSpace , class ValueType >
struct ViewValueFunctor< ExecSpace , ValueType , false /* is_scalar */ >
{
typedef Kokkos::RangePolicy< ExecSpace > PolicyType ;
ExecSpace space ;
ValueType * ptr ;
size_t n ;
bool destroy ;
KOKKOS_INLINE_FUNCTION
void operator()( const size_t i ) const
{
- if ( destroy ) { (ptr+i)->~ValueType(); }
+ if ( destroy ) { (ptr+i)->~ValueType(); } // KOKKOS_CUDA_CLANG_WORKAROUND: this line causes a ptxas error (__cxa_begin_catch) in the nested_view unit test
else { new (ptr+i) ValueType(); }
}
ViewValueFunctor() = default ;
ViewValueFunctor( const ViewValueFunctor & ) = default ;
ViewValueFunctor & operator = ( const ViewValueFunctor & ) = default ;
ViewValueFunctor( ExecSpace const & arg_space
, ValueType * const arg_ptr
, size_t const arg_n )
: space( arg_space )
, ptr( arg_ptr )
, n( arg_n )
, destroy( false )
{}
void execute( bool arg )
{
destroy = arg ;
if ( ! space.in_parallel() ) {
const Kokkos::Impl::ParallelFor< ViewValueFunctor , PolicyType >
closure( *this , PolicyType( 0 , n ) );
closure.execute();
space.fence();
}
else {
for ( size_t i = 0 ; i < n ; ++i ) operator()(i);
}
}
void construct_shared_allocation()
{ execute( false ); }
void destroy_shared_allocation()
{ execute( true ); }
};
template< class ExecSpace , class ValueType >
struct ViewValueFunctor< ExecSpace , ValueType , true /* is_scalar */ >
{
typedef Kokkos::RangePolicy< ExecSpace > PolicyType ;
ExecSpace space ;
ValueType * ptr ;
size_t n ;
KOKKOS_INLINE_FUNCTION
void operator()( const size_t i ) const
{ ptr[i] = ValueType(); }
ViewValueFunctor() = default ;
ViewValueFunctor( const ViewValueFunctor & ) = default ;
ViewValueFunctor & operator = ( const ViewValueFunctor & ) = default ;
ViewValueFunctor( ExecSpace const & arg_space
, ValueType * const arg_ptr
, size_t const arg_n )
: space( arg_space )
, ptr( arg_ptr )
, n( arg_n )
{}
void construct_shared_allocation()
{
if ( ! space.in_parallel() ) {
const Kokkos::Impl::ParallelFor< ViewValueFunctor , PolicyType >
closure( *this , PolicyType( 0 , n ) );
closure.execute();
space.fence();
}
else {
for ( size_t i = 0 ; i < n ; ++i ) operator()(i);
}
}
void destroy_shared_allocation() {}
};
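// Lifecycle sketch (names are illustrative; this mirrors how allocate_shared()
// below wires the functor into the allocation record):
//
//   ViewValueFunctor< ExecSpace , T > f( space , ptr , n );
//   f.construct_shared_allocation();   // placement-new / value-initialize all n elements
//   ...                                // view is used
//   f.destroy_shared_allocation();     // run destructors (a no-op for scalar T)
//
// Both passes execute as a parallel loop over [0,n) unless already inside a
// parallel region, in which case they fall back to the serial loop above.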
//----------------------------------------------------------------------------
/** \brief View mapping for non-specialized data type and standard layout */
template< class Traits >
class ViewMapping< Traits ,
typename std::enable_if<(
std::is_same< typename Traits::specialize , void >::value
&&
ViewOffset< typename Traits::dimension
, typename Traits::array_layout
, void >::is_mapping_plugin::value
)>::type >
{
private:
template< class , class ... > friend class ViewMapping ;
- template< class , class ... > friend class Kokkos::Experimental::View ;
+ template< class , class ... > friend class Kokkos::View ;
typedef ViewOffset< typename Traits::dimension
, typename Traits::array_layout
, void
> offset_type ;
typedef typename ViewDataHandle< Traits >::handle_type handle_type ;
handle_type m_handle ;
offset_type m_offset ;
KOKKOS_INLINE_FUNCTION
ViewMapping( const handle_type & arg_handle , const offset_type & arg_offset )
: m_handle( arg_handle )
, m_offset( arg_offset )
{}
public:
//----------------------------------------
// Domain dimensions
enum { Rank = Traits::dimension::rank };
template< typename iType >
KOKKOS_INLINE_FUNCTION constexpr size_t extent( const iType & r ) const
{ return m_offset.m_dim.extent(r); }
KOKKOS_INLINE_FUNCTION constexpr
typename Traits::array_layout layout() const
{ return m_offset.layout(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return m_offset.dimension_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return m_offset.dimension_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return m_offset.dimension_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return m_offset.dimension_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return m_offset.dimension_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return m_offset.dimension_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return m_offset.dimension_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return m_offset.dimension_7(); }
// Is a regular layout with uniform striding for each index.
using is_regular = typename offset_type::is_regular ;
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return m_offset.stride_0(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return m_offset.stride_1(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return m_offset.stride_2(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return m_offset.stride_3(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return m_offset.stride_4(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return m_offset.stride_5(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return m_offset.stride_6(); }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return m_offset.stride_7(); }
template< typename iType >
KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { m_offset.stride(s); }
//----------------------------------------
// Range span
/** \brief Span of the mapped range */
KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return m_offset.span(); }
/** \brief Is the mapped range span contiguous */
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return m_offset.span_is_contiguous(); }
typedef typename ViewDataHandle< Traits >::return_type reference_type ;
typedef typename Traits::value_type * pointer_type ;
  /** \brief If data references are lvalue references then the pointer to memory can be queried */
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const
{
return std::is_lvalue_reference< reference_type >::value
? (pointer_type) m_handle
: (pointer_type) 0 ;
}
//----------------------------------------
// The View class performs all rank and bounds checking before
// calling these element reference methods.
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference() const { return m_handle[0]; }
template< typename I0 >
KOKKOS_FORCEINLINE_FUNCTION
typename
std::enable_if< std::is_integral<I0>::value &&
! std::is_same< typename Traits::array_layout , Kokkos::LayoutStride >::value
, reference_type >::type
reference( const I0 & i0 ) const { return m_handle[i0]; }
template< typename I0 >
KOKKOS_FORCEINLINE_FUNCTION
typename
std::enable_if< std::is_integral<I0>::value &&
std::is_same< typename Traits::array_layout , Kokkos::LayoutStride >::value
, reference_type >::type
reference( const I0 & i0 ) const { return m_handle[ m_offset(i0) ]; }
template< typename I0 , typename I1 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 ) const
{ return m_handle[ m_offset(i0,i1) ]; }
template< typename I0 , typename I1 , typename I2 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 ) const
{ return m_handle[ m_offset(i0,i1,i2) ]; }
template< typename I0 , typename I1 , typename I2 , typename I3 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3 ) const
{ return m_handle[ m_offset(i0,i1,i2,i3) ]; }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 ) const
{ return m_handle[ m_offset(i0,i1,i2,i3,i4) ]; }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 ) const
{ return m_handle[ m_offset(i0,i1,i2,i3,i4,i5) ]; }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6 ) const
{ return m_handle[ m_offset(i0,i1,i2,i3,i4,i5,i6) ]; }
template< typename I0 , typename I1 , typename I2 , typename I3
, typename I4 , typename I5 , typename I6 , typename I7 >
KOKKOS_FORCEINLINE_FUNCTION
reference_type reference( const I0 & i0 , const I1 & i1 , const I2 & i2 , const I3 & i3
, const I4 & i4 , const I5 & i5 , const I6 & i6 , const I7 & i7 ) const
{ return m_handle[ m_offset(i0,i1,i2,i3,i4,i5,i6,i7) ]; }
//----------------------------------------
private:
enum { MemorySpanMask = 8 - 1 /* Force alignment on 8 byte boundary */ };
enum { MemorySpanSize = sizeof(typename Traits::value_type) };
public:
/** \brief Span, in bytes, of the referenced memory */
KOKKOS_INLINE_FUNCTION constexpr size_t memory_span() const
{
return ( m_offset.span() * sizeof(typename Traits::value_type) + MemorySpanMask ) & ~size_t(MemorySpanMask);
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION ~ViewMapping() {}
KOKKOS_INLINE_FUNCTION ViewMapping() : m_handle(), m_offset() {}
KOKKOS_INLINE_FUNCTION ViewMapping( const ViewMapping & rhs )
: m_handle( rhs.m_handle ), m_offset( rhs.m_offset ) {}
KOKKOS_INLINE_FUNCTION ViewMapping & operator = ( const ViewMapping & rhs )
{ m_handle = rhs.m_handle ; m_offset = rhs.m_offset ; return *this ; }
KOKKOS_INLINE_FUNCTION ViewMapping( ViewMapping && rhs )
: m_handle( rhs.m_handle ), m_offset( rhs.m_offset ) {}
KOKKOS_INLINE_FUNCTION ViewMapping & operator = ( ViewMapping && rhs )
{ m_handle = rhs.m_handle ; m_offset = rhs.m_offset ; return *this ; }
//----------------------------------------
/**\brief Span, in bytes, of the required memory */
KOKKOS_INLINE_FUNCTION
static constexpr size_t memory_span( typename Traits::array_layout const & arg_layout )
{
typedef std::integral_constant< unsigned , 0 > padding ;
return ( offset_type( padding(), arg_layout ).span() * MemorySpanSize + MemorySpanMask ) & ~size_t(MemorySpanMask);
}
/**\brief Wrap a span of memory */
template< class ... P >
KOKKOS_INLINE_FUNCTION
- ViewMapping( ViewCtorProp< P ... > const & arg_prop
+ ViewMapping( Kokkos::Impl::ViewCtorProp< P ... > const & arg_prop
, typename Traits::array_layout const & arg_layout
)
- : m_handle( ( (ViewCtorProp<void,pointer_type> const &) arg_prop ).value )
+ : m_handle( ( (Kokkos::Impl::ViewCtorProp<void,pointer_type> const &) arg_prop ).value )
, m_offset( std::integral_constant< unsigned , 0 >() , arg_layout )
{}
//----------------------------------------
/* Allocate and construct mapped array.
* Allocate via shared allocation record and
* return that record for allocation tracking.
*/
template< class ... P >
- SharedAllocationRecord<> *
- allocate_shared( ViewCtorProp< P... > const & arg_prop
+ Kokkos::Impl::SharedAllocationRecord<> *
+ allocate_shared( Kokkos::Impl::ViewCtorProp< P... > const & arg_prop
, typename Traits::array_layout const & arg_layout )
{
- typedef ViewCtorProp< P... > alloc_prop ;
+ typedef Kokkos::Impl::ViewCtorProp< P... > alloc_prop ;
typedef typename alloc_prop::execution_space execution_space ;
typedef typename Traits::memory_space memory_space ;
typedef typename Traits::value_type value_type ;
typedef ViewValueFunctor< execution_space , value_type > functor_type ;
- typedef SharedAllocationRecord< memory_space , functor_type > record_type ;
+ typedef Kokkos::Impl::SharedAllocationRecord< memory_space , functor_type > record_type ;
// Query the mapping for byte-size of allocation.
// If padding is allowed then pass in sizeof value type
// for padding computation.
typedef std::integral_constant
< unsigned
, alloc_prop::allow_padding ? sizeof(value_type) : 0
> padding ;
m_offset = offset_type( padding(), arg_layout );
const size_t alloc_size =
( m_offset.span() * MemorySpanSize + MemorySpanMask ) & ~size_t(MemorySpanMask);
    // Create the shared memory tracking record and allocate memory from the memory space
record_type * const record =
- record_type::allocate( ( (ViewCtorProp<void,memory_space> const &) arg_prop ).value
- , ( (ViewCtorProp<void,std::string> const &) arg_prop ).value
+ record_type::allocate( ( (Kokkos::Impl::ViewCtorProp<void,memory_space> const &) arg_prop ).value
+ , ( (Kokkos::Impl::ViewCtorProp<void,std::string> const &) arg_prop ).value
, alloc_size );
    // Only set the pointer and initialize if the allocation is non-zero.
// May be zero if one of the dimensions is zero.
if ( alloc_size ) {
m_handle = handle_type( reinterpret_cast< pointer_type >( record->data() ) );
if ( alloc_prop::initialize ) {
// Assume destruction is only required when construction is requested.
// The ViewValueFunctor has both value construction and destruction operators.
- record->m_destroy = functor_type( ( (ViewCtorProp<void,execution_space> const &) arg_prop).value
+ record->m_destroy = functor_type( ( (Kokkos::Impl::ViewCtorProp<void,execution_space> const &) arg_prop).value
, (value_type *) m_handle
, m_offset.span()
);
// Construct values
record->m_destroy.construct_shared_allocation();
}
}
return record ;
}
};
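// Public-API sketch reaching allocate_shared() above (labels and extents are
// made up; Kokkos must be initialized):
//
//   Kokkos::View<double**> A( "A" , 100 , 50 );               // allocate + value-initialize
//   Kokkos::View<double**> B( Kokkos::view_alloc( Kokkos::WithoutInitializing , "B" )
//                           , 100 , 50 );                     // allocate only, skip the init pass
//
// When padding is allowed (e.g. Kokkos::AllowPadding passed to view_alloc) the
// offset is constructed with sizeof(value_type) so strides may be rounded up
// to the memory alignment.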
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/** \brief Assign compatible default mappings */
template< class DstTraits , class SrcTraits >
class ViewMapping< DstTraits , SrcTraits ,
typename std::enable_if<(
- std::is_same< typename DstTraits::memory_space , typename SrcTraits::memory_space >::value
- &&
+ /* default mappings */
std::is_same< typename DstTraits::specialize , void >::value
&&
std::is_same< typename SrcTraits::specialize , void >::value
&&
(
+ /* same layout */
std::is_same< typename DstTraits::array_layout , typename SrcTraits::array_layout >::value
||
+ /* known layout */
(
(
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename DstTraits::array_layout , Kokkos::LayoutStride >::value
)
&&
(
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutStride >::value
)
)
)
)>::type >
{
private:
+ enum { is_assignable_space =
+#if 1
+ Kokkos::Impl::MemorySpaceAccess
+ < typename DstTraits::memory_space
+ , typename SrcTraits::memory_space >::assignable };
+#else
+ std::is_same< typename DstTraits::memory_space
+ , typename SrcTraits::memory_space >::value };
+#endif
+
enum { is_assignable_value_type =
std::is_same< typename DstTraits::value_type
, typename SrcTraits::value_type >::value ||
std::is_same< typename DstTraits::value_type
, typename SrcTraits::const_value_type >::value };
enum { is_assignable_dimension =
ViewDimensionAssignable< typename DstTraits::dimension
, typename SrcTraits::dimension >::value };
enum { is_assignable_layout =
std::is_same< typename DstTraits::array_layout
, typename SrcTraits::array_layout >::value ||
std::is_same< typename DstTraits::array_layout
, Kokkos::LayoutStride >::value ||
( DstTraits::dimension::rank == 0 ) ||
( DstTraits::dimension::rank == 1 &&
DstTraits::dimension::rank_dynamic == 1 )
};
public:
- enum { is_assignable = is_assignable_value_type &&
+ enum { is_assignable = is_assignable_space &&
+ is_assignable_value_type &&
is_assignable_dimension &&
is_assignable_layout };
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker TrackType ;
+ typedef Kokkos::Impl::SharedAllocationTracker TrackType ;
typedef ViewMapping< DstTraits , void > DstType ;
typedef ViewMapping< SrcTraits , void > SrcType ;
KOKKOS_INLINE_FUNCTION
static void assign( DstType & dst , const SrcType & src , const TrackType & src_track )
{
+ static_assert( is_assignable_space
+ , "View assignment must have compatible spaces" );
+
static_assert( is_assignable_value_type
, "View assignment must have same value type or const = non-const" );
static_assert( is_assignable_dimension
, "View assignment must have compatible dimensions" );
static_assert( is_assignable_layout
, "View assignment must have compatible layout or have rank <= 1" );
typedef typename DstType::offset_type dst_offset_type ;
if ( size_t(DstTraits::dimension::rank_dynamic) < size_t(SrcTraits::dimension::rank_dynamic) ) {
typedef typename DstTraits::dimension dst_dim;
bool assignable =
( ( 1 > DstTraits::dimension::rank_dynamic && 1 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN0 == src.dimension_0() : true ) &&
( ( 2 > DstTraits::dimension::rank_dynamic && 2 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN1 == src.dimension_1() : true ) &&
( ( 3 > DstTraits::dimension::rank_dynamic && 3 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN2 == src.dimension_2() : true ) &&
( ( 4 > DstTraits::dimension::rank_dynamic && 4 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN3 == src.dimension_3() : true ) &&
( ( 5 > DstTraits::dimension::rank_dynamic && 5 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN4 == src.dimension_4() : true ) &&
( ( 6 > DstTraits::dimension::rank_dynamic && 6 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN5 == src.dimension_5() : true ) &&
( ( 7 > DstTraits::dimension::rank_dynamic && 7 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN6 == src.dimension_6() : true ) &&
( ( 8 > DstTraits::dimension::rank_dynamic && 8 <= SrcTraits::dimension::rank_dynamic ) ?
dst_dim::ArgN7 == src.dimension_7() : true )
;
if(!assignable)
Kokkos::abort("View Assignment: trying to assign runtime dimension to non matching compile time dimension.");
}
dst.m_offset = dst_offset_type( src.m_offset );
dst.m_handle = Kokkos::Experimental::Impl::ViewDataHandle< DstTraits >::assign( src.m_handle , src_track );
}
};
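// Assignment-compatibility sketch (illustrative; variable names are made up):
//
//   Kokkos::View<double*>        a( "a" , 10 );
//   Kokkos::View<const double*>  b = a;   // ok: const = non-const value type
//   Kokkos::View<double[10]>     c = a;   // ok only if the runtime extent matches,
//                                         // otherwise the Kokkos::abort() above fires
//
// Assigning between views whose memory spaces are not assignable (per
// Kokkos::Impl::MemorySpaceAccess) now fails the new is_assignable_space
// static_assert.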
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Subview mapping.
// Deduce destination view type from source view traits and subview arguments
template< class SrcTraits , class ... Args >
struct ViewMapping
< typename std::enable_if<(
std::is_same< typename SrcTraits::specialize , void >::value
&&
(
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutLeft >::value ||
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutStride >::value
)
)>::type
, SrcTraits
, Args ... >
{
private:
static_assert( SrcTraits::rank == sizeof...(Args) ,
"Subview mapping requires one argument for each dimension of source View" );
enum
{ RZ = false
, R0 = bool(is_integral_extent<0,Args...>::value)
, R1 = bool(is_integral_extent<1,Args...>::value)
, R2 = bool(is_integral_extent<2,Args...>::value)
, R3 = bool(is_integral_extent<3,Args...>::value)
, R4 = bool(is_integral_extent<4,Args...>::value)
, R5 = bool(is_integral_extent<5,Args...>::value)
, R6 = bool(is_integral_extent<6,Args...>::value)
, R7 = bool(is_integral_extent<7,Args...>::value)
};
enum { rank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
+ unsigned(R4) + unsigned(R5) + unsigned(R6) + unsigned(R7) };
// Whether right-most rank is a range.
enum { R0_rev = ( 0 == SrcTraits::rank ? RZ : (
1 == SrcTraits::rank ? R0 : (
2 == SrcTraits::rank ? R1 : (
3 == SrcTraits::rank ? R2 : (
4 == SrcTraits::rank ? R3 : (
5 == SrcTraits::rank ? R4 : (
6 == SrcTraits::rank ? R5 : (
7 == SrcTraits::rank ? R6 : R7 )))))))) };
// Subview's layout
typedef typename std::conditional<
( /* Same array layout IF */
( rank == 0 ) /* output rank zero */
||
+ SubviewLegalArgsCompileTime<typename SrcTraits::array_layout, typename SrcTraits::array_layout,
+ rank, SrcTraits::rank, 0, Args...>::value
+ ||
// OutputRank 1 or 2, InputLayout Left, Interval 0
// because single stride one or second index has a stride.
( rank <= 2 && R0 && std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutLeft >::value ) //replace with input rank
||
// OutputRank 1 or 2, InputLayout Right, Interval [InputRank-1]
// because single stride one or second index has a stride.
( rank <= 2 && R0_rev && std::is_same< typename SrcTraits::array_layout , Kokkos::LayoutRight >::value ) //replace input rank
), typename SrcTraits::array_layout , Kokkos::LayoutStride
>::type array_layout ;
typedef typename SrcTraits::value_type value_type ;
typedef typename std::conditional< rank == 0 , value_type ,
typename std::conditional< rank == 1 , value_type * ,
typename std::conditional< rank == 2 , value_type ** ,
typename std::conditional< rank == 3 , value_type *** ,
typename std::conditional< rank == 4 , value_type **** ,
typename std::conditional< rank == 5 , value_type ***** ,
typename std::conditional< rank == 6 , value_type ****** ,
typename std::conditional< rank == 7 , value_type ******* ,
value_type ********
>::type >::type >::type >::type >::type >::type >::type >::type
data_type ;
public:
- typedef Kokkos::Experimental::ViewTraits
+ typedef Kokkos::ViewTraits
< data_type
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > traits_type ;
- typedef Kokkos::Experimental::View
+ typedef Kokkos::View
< data_type
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > type ;
template< class MemoryTraits >
struct apply {
static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
- typedef Kokkos::Experimental::ViewTraits
+ typedef Kokkos::ViewTraits
< data_type
, array_layout
, typename SrcTraits::device_type
, MemoryTraits > traits_type ;
- typedef Kokkos::Experimental::View
+ typedef Kokkos::View
< data_type
, array_layout
, typename SrcTraits::device_type
, MemoryTraits > type ;
};
// The presumed type is 'ViewMapping< traits_type , void >'
// However, a compatible ViewMapping is acceptable.
template< class DstTraits >
KOKKOS_INLINE_FUNCTION
static void assign( ViewMapping< DstTraits , void > & dst
, ViewMapping< SrcTraits , void > const & src
, Args ... args )
{
static_assert(
ViewMapping< DstTraits , traits_type , void >::is_assignable ,
"Subview destination type must be compatible with subview derived type" );
typedef ViewMapping< DstTraits , void > DstType ;
typedef typename DstType::offset_type dst_offset_type ;
- typedef typename DstType::handle_type dst_handle_type ;
const SubviewExtents< SrcTraits::rank , rank >
extents( src.m_offset.m_dim , args... );
dst.m_offset = dst_offset_type( src.m_offset , extents );
- dst.m_handle = dst_handle_type( src.m_handle +
- src.m_offset( extents.domain_offset(0)
- , extents.domain_offset(1)
- , extents.domain_offset(2)
- , extents.domain_offset(3)
- , extents.domain_offset(4)
- , extents.domain_offset(5)
- , extents.domain_offset(6)
- , extents.domain_offset(7)
- ) );
+
+ dst.m_handle = ViewDataHandle< DstTraits >::assign(src.m_handle,
+ src.m_offset( extents.domain_offset(0)
+ , extents.domain_offset(1)
+ , extents.domain_offset(2)
+ , extents.domain_offset(3)
+ , extents.domain_offset(4)
+ , extents.domain_offset(5)
+ , extents.domain_offset(6)
+ , extents.domain_offset(7)
+ ));
}
};
//----------------------------------------------------------------------------
}}} // namespace Kokkos::Experimental::Impl
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Experimental {
namespace Impl {
template< unsigned , class MapType >
KOKKOS_INLINE_FUNCTION
bool view_verify_operator_bounds( const MapType & )
{ return true ; }
template< unsigned R , class MapType , class iType , class ... Args >
KOKKOS_INLINE_FUNCTION
bool view_verify_operator_bounds
( const MapType & map
, const iType & i
, Args ... args
)
{
return ( size_t(i) < map.extent(R) )
&& view_verify_operator_bounds<R+1>( map , args ... );
}
template< unsigned , class MapType >
inline
void view_error_operator_bounds( char * , int , const MapType & )
{}
template< unsigned R , class MapType , class iType , class ... Args >
inline
void view_error_operator_bounds
( char * buf
, int len
, const MapType & map
, const iType & i
, Args ... args
)
{
const int n =
    snprintf(buf,len," %lu < %lu %c"
, static_cast<unsigned long>(i)
, static_cast<unsigned long>( map.extent(R) )
, ( sizeof...(Args) ? ',' : ')' )
);
view_error_operator_bounds<R+1>(buf+n,len-n,map,args...);
}
template< class MapType , class ... Args >
KOKKOS_INLINE_FUNCTION
void view_verify_operator_bounds
- ( const MapType & map , Args ... args )
+ ( const char* label , const MapType & map , Args ... args )
{
if ( ! view_verify_operator_bounds<0>( map , args ... ) ) {
-#if defined( KOKKOS_ACTIVE_EXECUTION_SPACE_HOST )
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
enum { LEN = 1024 };
char buffer[ LEN ];
- int n = snprintf(buf,LEN,"View bounds error(" );
+ int n = snprintf(buffer,LEN,"View bounds error of view %s (", label);
view_error_operator_bounds<0>( buffer + n , LEN - n , map , args ... );
Kokkos::Impl::throw_runtime_exception(std::string(buffer));
#else
Kokkos::abort("View bounds error");
#endif
}
}
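// Sketch of the resulting diagnostic (label, extents and indices are made up):
// for a view labelled "A" with extents (4,5), an out-of-bounds access at (6,1)
// produces, on the host,
//
//   View bounds error of view A ( 6 < 4 , 1 < 5 )
//
// thrown as a std::runtime_error, while device code falls back to
// Kokkos::abort("View bounds error").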
-
-class Error_view_scalar_reference_to_non_scalar_view ;
-
} /* namespace Impl */
-} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_EXPERIMENTAL_VIEW_MAPPING_HPP */
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewOffset.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewOffset.hpp
deleted file mode 100644
index 5748e722c..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_ViewOffset.hpp
+++ /dev/null
@@ -1,1341 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_VIEWOFFSET_HPP
-#define KOKKOS_VIEWOFFSET_HPP
-
-#include <Kokkos_Pair.hpp>
-#include <Kokkos_Layout.hpp>
-#include <impl/Kokkos_Traits.hpp>
-#include <impl/Kokkos_Shape.hpp>
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos { namespace Impl {
-
-template < class ShapeType , class LayoutType , typename Enable = void >
-struct ViewOffset ;
-
-//----------------------------------------------------------------------------
-// LayoutLeft AND ( 1 >= rank OR 0 == rank_dynamic ) : no padding / striding
-template < class ShapeType >
-struct ViewOffset< ShapeType , LayoutLeft
- , typename enable_if<( 1 >= ShapeType::rank
- ||
- 0 == ShapeType::rank_dynamic
- )>::type >
- : public ShapeType
-{
- typedef size_t size_type ;
- typedef ShapeType shape_type ;
- typedef LayoutLeft array_layout ;
-
- enum { has_padding = false };
-
- template< unsigned R >
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n )
- { assign_shape_dimension<R>( *this , n ); }
-
- // Return whether the subview introduced noncontiguity
- template< class S , class L >
- KOKKOS_INLINE_FUNCTION
- typename Impl::enable_if<( 0 == shape_type::rank &&
- Impl::is_same<L,LayoutLeft>::value
- ), bool >::type
- assign_subview( const ViewOffset<S,L,void> &
- , const size_t n0
- , const size_t n1
- , const size_t n2
- , const size_t n3
- , const size_t n4
- , const size_t n5
- , const size_t n6
- , const size_t n7
- )
- {
- return false ; // did not introduce noncontiguity
- }
-
- // This subview must be 1 == rank and 1 == rank_dynamic.
- // The source dimension #0 must be non-zero and all other dimensions are zero.
- // Return whether the subview introduced noncontiguity
- template< class S , class L >
- KOKKOS_INLINE_FUNCTION
- typename Impl::enable_if<( 1 == shape_type::rank &&
- 1 == shape_type::rank_dynamic &&
- 1 <= S::rank &&
- Impl::is_same<L,LayoutLeft>::value
- ), bool >::type
- assign_subview( const ViewOffset<S,L,void> &
- , const size_t n0
- , const size_t n1
- , const size_t n2
- , const size_t n3
- , const size_t n4
- , const size_t n5
- , const size_t n6
- , const size_t n7
- )
- {
- // n1 .. n7 must be zero
- shape_type::N0 = n0 ;
- return false ; // did not introduce noncontiguity
- }
-
-
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n0 , size_t n1 , size_t n2 , size_t n3
- , size_t n4 , size_t n5 , size_t n6 , size_t n7
- , size_t = 0 )
- { shape_type::assign( *this , n0, n1, n2, n3, n4, n5, n6, n7 ); }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutLeft > & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank)
- &&
- int(ShapeRHS::rank_dynamic) <= int(shape_type::rank_dynamic)
- )>::type * = 0 )
- { shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 ); }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutRight > & rhs
- , typename enable_if<( 1 == int(ShapeRHS::rank)
- &&
- 1 == int(shape_type::rank)
- &&
- 1 == int(shape_type::rank_dynamic)
- )>::type * = 0 )
- { shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 ); }
-
- KOKKOS_INLINE_FUNCTION
- void set_padding() {}
-
- KOKKOS_INLINE_FUNCTION
- size_type cardinality() const
- { return size_type(shape_type::N0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type capacity() const
- { return size_type(shape_type::N0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
-  // Stride array; the [ rank ] entry is the total length
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- s[0] = 1 ;
- if ( 0 < shape_type::rank ) { s[1] = shape_type::N0 ; }
- if ( 1 < shape_type::rank ) { s[2] = s[1] * shape_type::N1 ; }
- if ( 2 < shape_type::rank ) { s[3] = s[2] * shape_type::N2 ; }
- if ( 3 < shape_type::rank ) { s[4] = s[3] * shape_type::N3 ; }
- if ( 4 < shape_type::rank ) { s[5] = s[4] * shape_type::N4 ; }
- if ( 5 < shape_type::rank ) { s[6] = s[5] * shape_type::N5 ; }
- if ( 6 < shape_type::rank ) { s[7] = s[6] * shape_type::N6 ; }
- if ( 7 < shape_type::rank ) { s[8] = s[7] * shape_type::N7 ; }
- }
-
- KOKKOS_INLINE_FUNCTION size_type stride_0() const { return 1 ; }
- KOKKOS_INLINE_FUNCTION size_type stride_1() const { return shape_type::N0 ; }
- KOKKOS_INLINE_FUNCTION size_type stride_2() const { return shape_type::N0 * shape_type::N1 ; }
- KOKKOS_INLINE_FUNCTION size_type stride_3() const { return shape_type::N0 * shape_type::N1 * shape_type::N2 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_4() const
- { return shape_type::N0 * shape_type::N1 * shape_type::N2 * shape_type::N3 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_5() const
- { return shape_type::N0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_6() const
- { return shape_type::N0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_7() const
- { return shape_type::N0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 ; }
-
- // rank 1
- template< typename I0 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const & i0 ) const { return i0 ; }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const & i0 , I1 const & i1 ) const
- { return i0 + shape_type::N0 * i1 ; }
-
- //rank 3
- template <typename I0, typename I1, typename I2>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0
- , I1 const& i1
- , I2 const& i2
- ) const
- {
- return i0 + shape_type::N0 * (
- i1 + shape_type::N1 * i2 );
- }
-
- //rank 4
- template <typename I0, typename I1, typename I2, typename I3>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3 ) const
- {
- return i0 + shape_type::N0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * i3 ));
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4 ) const
- {
- return i0 + shape_type::N0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * i4 )));
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4, I5 const& i5 ) const
- {
- return i0 + shape_type::N0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * (
- i4 + shape_type::N4 * i5 ))));
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6) const
- {
- return i0 + shape_type::N0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * (
- i4 + shape_type::N4 * (
- i5 + shape_type::N5 * i6 )))));
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6, I7 const& i7) const
- {
- return i0 + shape_type::N0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * (
- i4 + shape_type::N4 * (
- i5 + shape_type::N5 * (
- i6 + shape_type::N6 * i7 ))))));
- }
-};
-
-//----------------------------------------------------------------------------
-// LayoutLeft AND ( 1 < rank AND 0 < rank_dynamic ) : has padding / striding
-template < class ShapeType >
-struct ViewOffset< ShapeType , LayoutLeft
- , typename enable_if<( 1 < ShapeType::rank
- &&
- 0 < ShapeType::rank_dynamic
- )>::type >
- : public ShapeType
-{
- typedef size_t size_type ;
- typedef ShapeType shape_type ;
- typedef LayoutLeft array_layout ;
-
- enum { has_padding = true };
-
- size_type S0 ;
-
- // This subview must be 2 == rank and 2 == rank_dynamic
- // due to only having stride #0.
- // The source dimension #0 must be non-zero for stride-one leading dimension.
-  // At most one subsequent dimension can be non-zero.
- // Return whether the subview introduced noncontiguity.
- template< class S , class L >
- KOKKOS_INLINE_FUNCTION
- typename Impl::enable_if<( 2 == shape_type::rank &&
- 2 == shape_type::rank_dynamic &&
- 2 <= S::rank &&
- Impl::is_same<L,LayoutLeft>::value
- ), bool >::type
- assign_subview( const ViewOffset<S,L,void> & rhs
- , const size_t n0
- , const size_t n1
- , const size_t n2
- , const size_t n3
- , const size_t n4
- , const size_t n5
- , const size_t n6
- , const size_t n7
- )
- {
- // N1 = second non-zero dimension
- // S0 = stride for second non-zero dimension
- shape_type::N0 = n0 ;
- shape_type::N1 = 0 ;
- S0 = 0 ;
-
- if ( n1 ) { shape_type::N1 = n1 ; S0 = rhs.stride_1(); }
- else if ( 2 < S::rank && n2 ) { shape_type::N1 = n2 ; S0 = rhs.stride_2(); }
- else if ( 3 < S::rank && n3 ) { shape_type::N1 = n3 ; S0 = rhs.stride_3(); }
- else if ( 4 < S::rank && n4 ) { shape_type::N1 = n4 ; S0 = rhs.stride_4(); }
- else if ( 5 < S::rank && n5 ) { shape_type::N1 = n5 ; S0 = rhs.stride_5(); }
- else if ( 6 < S::rank && n6 ) { shape_type::N1 = n6 ; S0 = rhs.stride_6(); }
- else if ( 7 < S::rank && n7 ) { shape_type::N1 = n7 ; S0 = rhs.stride_7(); }
-
-    // Introduce noncontiguity if the first dimension changed
-    // or a range was taken of a dimension after the second.
- return ( size_t(shape_type::N0) != size_t(rhs.N0) ) || ( 0 == n1 );
- }
-
-
- template< unsigned R >
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n )
- { assign_shape_dimension<R>( *this , n ); }
-
-
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n0 , size_t n1 , size_t n2 , size_t n3
- , size_t n4 , size_t n5 , size_t n6 , size_t n7
- , size_t = 0 )
- { shape_type::assign( *this , n0, n1, n2, n3, n4, n5, n6, n7 ); S0 = shape_type::N0 ; }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutLeft > & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank)
- &&
- int(ShapeRHS::rank_dynamic) <= int(shape_type::rank_dynamic)
- &&
- int(ShapeRHS::rank_dynamic) == 0
- )>::type * = 0 )
- {
- shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 );
- S0 = shape_type::N0 ; // No padding when dynamic_rank == 0
- }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutLeft > & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank)
- &&
- int(ShapeRHS::rank_dynamic) <= int(shape_type::rank_dynamic)
- &&
- int(ShapeRHS::rank_dynamic) > 0
- )>::type * = 0 )
- {
- shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 );
- S0 = rhs.S0 ; // possibly padding when dynamic rank > 0
- }
-
- KOKKOS_INLINE_FUNCTION
- void set_padding()
- {
- enum { div = MEMORY_ALIGNMENT / shape_type::scalar_size };
- enum { mod = MEMORY_ALIGNMENT % shape_type::scalar_size };
- enum { align = 0 == mod ? div : 0 };
-
- if ( align && MEMORY_ALIGNMENT_THRESHOLD * align < S0 ) {
-
- const size_type count_mod = S0 % ( div ? div : 1 );
-
- if ( count_mod ) { S0 += align - count_mod ; }
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- size_type cardinality() const
- { return size_type(shape_type::N0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type capacity() const
- { return size_type(S0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- // Stride with [ rank ] as total length
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- s[0] = 1 ;
- if ( 0 < shape_type::rank ) { s[1] = S0 ; }
- if ( 1 < shape_type::rank ) { s[2] = s[1] * shape_type::N1 ; }
- if ( 2 < shape_type::rank ) { s[3] = s[2] * shape_type::N2 ; }
- if ( 3 < shape_type::rank ) { s[4] = s[3] * shape_type::N3 ; }
- if ( 4 < shape_type::rank ) { s[5] = s[4] * shape_type::N4 ; }
- if ( 5 < shape_type::rank ) { s[6] = s[5] * shape_type::N5 ; }
- if ( 6 < shape_type::rank ) { s[7] = s[6] * shape_type::N6 ; }
- if ( 7 < shape_type::rank ) { s[8] = s[7] * shape_type::N7 ; }
- }
-
- KOKKOS_INLINE_FUNCTION size_type stride_0() const { return 1 ; }
- KOKKOS_INLINE_FUNCTION size_type stride_1() const { return S0 ; }
- KOKKOS_INLINE_FUNCTION size_type stride_2() const { return S0 * shape_type::N1 ; }
- KOKKOS_INLINE_FUNCTION size_type stride_3() const { return S0 * shape_type::N1 * shape_type::N2 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_4() const
- { return S0 * shape_type::N1 * shape_type::N2 * shape_type::N3 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_5() const
- { return S0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_6() const
- { return S0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_7() const
- { return S0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 ; }
-
- // rank 2
- template < typename I0 , typename I1 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const & i0 , I1 const & i1) const
- { return i0 + S0 * i1 ; }
-
- //rank 3
- template <typename I0, typename I1, typename I2>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 ) const
- {
- return i0 + S0 * (
- i1 + shape_type::N1 * i2 );
- }
-
- //rank 4
- template <typename I0, typename I1, typename I2, typename I3>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3 ) const
- {
- return i0 + S0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * i3 ));
- }
-
- //rank 5
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4 ) const
- {
- return i0 + S0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * i4 )));
- }
-
- //rank 6
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4, I5 const& i5 ) const
- {
- return i0 + S0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * (
- i4 + shape_type::N4 * i5 ))));
- }
-
- //rank 7
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6 ) const
- {
- return i0 + S0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * (
- i4 + shape_type::N4 * (
- i5 + shape_type::N5 * i6 )))));
- }
-
- //rank 8
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2, I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6, I7 const& i7 ) const
- {
- return i0 + S0 * (
- i1 + shape_type::N1 * (
- i2 + shape_type::N2 * (
- i3 + shape_type::N3 * (
- i4 + shape_type::N4 * (
- i5 + shape_type::N5 * (
- i6 + shape_type::N6 * i7 ))))));
- }
-};
-
-//----------------------------------------------------------------------------
-// LayoutRight AND ( 1 >= rank OR 1 >= rank_dynamic ) : no padding / striding
-template < class ShapeType >
-struct ViewOffset< ShapeType , LayoutRight
- , typename enable_if<( 1 >= ShapeType::rank
- ||
- 1 >= ShapeType::rank_dynamic
- )>::type >
- : public ShapeType
-{
- typedef size_t size_type;
- typedef ShapeType shape_type;
- typedef LayoutRight array_layout ;
-
- enum { has_padding = false };
-
- // This subview must be 1 == rank and 1 == rank_dynamic
- // The source view's last dimension must be non-zero
- // Return whether the subview introduced noncontiguity
- template< class S , class L >
- KOKKOS_INLINE_FUNCTION
- typename Impl::enable_if<( 0 == shape_type::rank &&
- Impl::is_same<L,LayoutRight>::value
- ), bool >::type
- assign_subview( const ViewOffset<S,L,void> &
- , const size_t n0
- , const size_t n1
- , const size_t n2
- , const size_t n3
- , const size_t n4
- , const size_t n5
- , const size_t n6
- , const size_t n7
- )
- { return false ; }
-
- // This subview must be 1 == rank and 1 == rank_dynamic
- // The source view's last dimension must be non-zero
- // Return whether the subview introduced noncontiguity
- template< class S , class L >
- KOKKOS_INLINE_FUNCTION
- typename Impl::enable_if<( 1 == shape_type::rank &&
- 1 == shape_type::rank_dynamic &&
- 1 <= S::rank &&
- Impl::is_same<L,LayoutRight>::value
- ), bool >::type
- assign_subview( const ViewOffset<S,L,void> &
- , const size_t n0
- , const size_t n1
- , const size_t n2
- , const size_t n3
- , const size_t n4
- , const size_t n5
- , const size_t n6
- , const size_t n7
- )
- {
- shape_type::N0 = S::rank == 1 ? n0 : (
- S::rank == 2 ? n1 : (
- S::rank == 3 ? n2 : (
- S::rank == 4 ? n3 : (
- S::rank == 5 ? n4 : (
- S::rank == 6 ? n5 : (
- S::rank == 7 ? n6 : n7 ))))));
- // should have n0 .. n_(rank-2) equal zero
- return false ;
- }
-
- template< unsigned R >
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n )
- { assign_shape_dimension<R>( *this , n ); }
-
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n0 , size_t n1 , size_t n2 , size_t n3
- , size_t n4 , size_t n5 , size_t n6 , size_t n7
- , size_t = 0 )
- { shape_type::assign( *this , n0, n1, n2, n3, n4, n5, n6, n7 ); }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutRight > & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank)
- &&
- int(ShapeRHS::rank_dynamic) <= int(shape_type::rank_dynamic)
- )>::type * = 0 )
- { shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 ); }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutLeft > & rhs
- , typename enable_if<( 1 == int(ShapeRHS::rank)
- &&
- 1 == int(shape_type::rank)
- &&
- 1 == int(shape_type::rank_dynamic)
- )>::type * = 0 )
- { shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 ); }
-
- KOKKOS_INLINE_FUNCTION
- void set_padding() {}
-
- KOKKOS_INLINE_FUNCTION
- size_type cardinality() const
- { return size_type(shape_type::N0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type capacity() const
- { return size_type(shape_type::N0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- size_type stride_R() const
- {
- return size_type(shape_type::N1) * shape_type::N2 * shape_type::N3 *
- shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ;
- };
-
- // Stride with [rank] as total length
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- size_type n = 1 ;
- if ( 7 < shape_type::rank ) { s[7] = n ; n *= shape_type::N7 ; }
- if ( 6 < shape_type::rank ) { s[6] = n ; n *= shape_type::N6 ; }
- if ( 5 < shape_type::rank ) { s[5] = n ; n *= shape_type::N5 ; }
- if ( 4 < shape_type::rank ) { s[4] = n ; n *= shape_type::N4 ; }
- if ( 3 < shape_type::rank ) { s[3] = n ; n *= shape_type::N3 ; }
- if ( 2 < shape_type::rank ) { s[2] = n ; n *= shape_type::N2 ; }
- if ( 1 < shape_type::rank ) { s[1] = n ; n *= shape_type::N1 ; }
- if ( 0 < shape_type::rank ) { s[0] = n ; }
- s[shape_type::rank] = n * shape_type::N0 ;
- }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_7() const { return 1 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_6() const { return shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_5() const { return shape_type::N7 * shape_type::N6 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_4() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_3() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_2() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 * shape_type::N3 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_1() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 * shape_type::N3 * shape_type::N2 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_0() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 * shape_type::N3 * shape_type::N2 * shape_type::N1 ; }
-
- // rank 1
- template <typename I0>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0) const
- {
- return i0 ;
- }
-
- // rank 2
- template <typename I0, typename I1>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1 ) const
- {
- return i1 + shape_type::N1 * i0 ;
- }
-
- template <typename I0, typename I1, typename I2>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 ) const
- {
- return i2 + shape_type::N2 * (
- i1 + shape_type::N1 * ( i0 ));
- }
-
- template <typename I0, typename I1, typename I2, typename I3>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3 ) const
- {
- return i3 + shape_type::N3 * (
- i2 + shape_type::N2 * (
- i1 + shape_type::N1 * ( i0 )));
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4 ) const
- {
- return i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * (
- i1 + shape_type::N1 * ( i0 ))));
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5 ) const
- {
- return i5 + shape_type::N5 * (
- i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * (
- i1 + shape_type::N1 * ( i0 )))));
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6 ) const
- {
- return i6 + shape_type::N6 * (
- i5 + shape_type::N5 * (
- i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * (
- i1 + shape_type::N1 * ( i0 ))))));
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6, I7 const& i7 ) const
- {
- return i7 + shape_type::N7 * (
- i6 + shape_type::N6 * (
- i5 + shape_type::N5 * (
- i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * (
- i1 + shape_type::N1 * ( i0 )))))));
- }
-};
-
-//----------------------------------------------------------------------------
-// LayoutRight AND ( 1 < rank AND 1 < rank_dynamic ) : has padding / striding
-template < class ShapeType >
-struct ViewOffset< ShapeType , LayoutRight
- , typename enable_if<( 1 < ShapeType::rank
- &&
- 1 < ShapeType::rank_dynamic
- )>::type >
- : public ShapeType
-{
- typedef size_t size_type;
- typedef ShapeType shape_type;
- typedef LayoutRight array_layout ;
-
- enum { has_padding = true };
-
- size_type SR ;
-
- // This subview must be 2 == rank and 2 == rank_dynamic
- // due to only having stride #(rank-1).
- // The source dimension #(rank-1) must be non-zero for stride-one leading dimension.
- // At most one prior dimension can be non-zero.
- // Return whether the subview introduced noncontiguity.
- template< class S , class L >
- KOKKOS_INLINE_FUNCTION
- typename Impl::enable_if<( 2 == shape_type::rank &&
- 2 == shape_type::rank_dynamic &&
- 2 <= S::rank &&
- Impl::is_same<L,LayoutRight>::value
- ), bool >::type
- assign_subview( const ViewOffset<S,L,void> & rhs
- , const size_t n0
- , const size_t n1
- , const size_t n2
- , const size_t n3
- , const size_t n4
- , const size_t n5
- , const size_t n6
- , const size_t n7
- )
- {
- const size_type nR = S::rank == 2 ? n1 : (
- S::rank == 3 ? n2 : (
- S::rank == 4 ? n3 : (
- S::rank == 5 ? n4 : (
- S::rank == 6 ? n5 : (
- S::rank == 7 ? n6 : n7 )))));
-
-    // N0 = first non-zero dimension
- // N1 = last non-zero dimension
- // SR = stride for second non-zero dimension
- shape_type::N0 = 0 ;
- shape_type::N1 = nR ;
- SR = 0 ;
-
- if ( n0 ) { shape_type::N0 = n0 ; SR = rhs.stride_0(); }
- else if ( 2 < S::rank && n1 ) { shape_type::N0 = n1 ; SR = rhs.stride_1(); }
- else if ( 3 < S::rank && n2 ) { shape_type::N0 = n2 ; SR = rhs.stride_2(); }
- else if ( 4 < S::rank && n3 ) { shape_type::N0 = n3 ; SR = rhs.stride_3(); }
- else if ( 5 < S::rank && n4 ) { shape_type::N0 = n4 ; SR = rhs.stride_4(); }
- else if ( 6 < S::rank && n5 ) { shape_type::N0 = n5 ; SR = rhs.stride_5(); }
- else if ( 7 < S::rank && n6 ) { shape_type::N0 = n6 ; SR = rhs.stride_6(); }
-
-    // Introduce noncontiguity if the last dimension changed
-    // or a range was taken of a dimension other than the second-to-last dimension.
-
- return 2 == S::rank ? ( size_t(shape_type::N1) != size_t(rhs.N1) || 0 == n0 ) : (
- 3 == S::rank ? ( size_t(shape_type::N1) != size_t(rhs.N2) || 0 == n1 ) : (
- 4 == S::rank ? ( size_t(shape_type::N1) != size_t(rhs.N3) || 0 == n2 ) : (
- 5 == S::rank ? ( size_t(shape_type::N1) != size_t(rhs.N4) || 0 == n3 ) : (
- 6 == S::rank ? ( size_t(shape_type::N1) != size_t(rhs.N5) || 0 == n4 ) : (
- 7 == S::rank ? ( size_t(shape_type::N1) != size_t(rhs.N6) || 0 == n5 ) : (
- ( size_t(shape_type::N1) != size_t(rhs.N7) || 0 == n6 ) ))))));
- }
-
- template< unsigned R >
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n )
- { assign_shape_dimension<R>( *this , n ); }
-
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n0 , size_t n1 , size_t n2 , size_t n3
- , size_t n4 , size_t n5 , size_t n6 , size_t n7
- , size_t = 0 )
- {
- shape_type::assign( *this , n0, n1, n2, n3, n4, n5, n6, n7 );
- SR = size_type(shape_type::N1) * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ;
- }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutRight > & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank)
- &&
- int(ShapeRHS::rank_dynamic) <= int(shape_type::rank_dynamic)
- &&
- int(ShapeRHS::rank_dynamic) <= 1
- )>::type * = 0 )
- {
- shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 );
- SR = shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ;
- }
-
- template< class ShapeRHS >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset< ShapeRHS , LayoutRight > & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank)
- &&
- int(ShapeRHS::rank_dynamic) <= int(shape_type::rank_dynamic)
- &&
- int(ShapeRHS::rank_dynamic) > 1
- )>::type * = 0 )
- {
- shape_type::assign( *this , rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 );
- SR = rhs.SR ;
- }
-
- KOKKOS_INLINE_FUNCTION
- void set_padding()
- {
- enum { div = MEMORY_ALIGNMENT / shape_type::scalar_size };
- enum { mod = MEMORY_ALIGNMENT % shape_type::scalar_size };
- enum { align = 0 == mod ? div : 0 };
-
- if ( align && MEMORY_ALIGNMENT_THRESHOLD * align < SR ) {
-
- const size_type count_mod = SR % ( div ? div : 1 );
-
- if ( count_mod ) { SR += align - count_mod ; }
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- size_type cardinality() const
- { return size_type(shape_type::N0) * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type capacity() const { return shape_type::N0 * SR ; }
-
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- {
- size_type n = 1 ;
- if ( 7 < shape_type::rank ) { s[7] = n ; n *= shape_type::N7 ; }
- if ( 6 < shape_type::rank ) { s[6] = n ; n *= shape_type::N6 ; }
- if ( 5 < shape_type::rank ) { s[5] = n ; n *= shape_type::N5 ; }
- if ( 4 < shape_type::rank ) { s[4] = n ; n *= shape_type::N4 ; }
- if ( 3 < shape_type::rank ) { s[3] = n ; n *= shape_type::N3 ; }
- if ( 2 < shape_type::rank ) { s[2] = n ; n *= shape_type::N2 ; }
- if ( 1 < shape_type::rank ) { s[1] = n ; n *= shape_type::N1 ; }
- if ( 0 < shape_type::rank ) { s[0] = SR ; }
- s[shape_type::rank] = SR * shape_type::N0 ;
- }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_7() const { return 1 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_6() const { return shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_5() const { return shape_type::N7 * shape_type::N6 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_4() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_3() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_2() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 * shape_type::N3 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_1() const { return shape_type::N7 * shape_type::N6 * shape_type::N5 * shape_type::N4 * shape_type::N3 * shape_type::N2 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_0() const { return SR ; }
-
- // rank 2
- template <typename I0, typename I1>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1 ) const
- {
- return i1 + i0 * SR ;
- }
-
- template <typename I0, typename I1, typename I2>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 ) const
- {
- return i2 + shape_type::N2 * ( i1 ) +
- i0 * SR ;
- }
-
- template <typename I0, typename I1, typename I2, typename I3>
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3 ) const
- {
- return i3 + shape_type::N3 * (
- i2 + shape_type::N2 * ( i1 )) +
- i0 * SR ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4 ) const
- {
- return i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * ( i1 ))) +
- i0 * SR ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5 ) const
- {
- return i5 + shape_type::N5 * (
- i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * ( i1 )))) +
- i0 * SR ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6 ) const
- {
- return i6 + shape_type::N6 * (
- i5 + shape_type::N5 * (
- i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * ( i1 ))))) +
- i0 * SR ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_FORCEINLINE_FUNCTION
- size_type operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6, I7 const& i7 ) const
- {
- return i7 + shape_type::N7 * (
- i6 + shape_type::N6 * (
- i5 + shape_type::N5 * (
- i4 + shape_type::N4 * (
- i3 + shape_type::N3 * (
- i2 + shape_type::N2 * ( i1 )))))) +
- i0 * SR ;
- }
-};
-
-//----------------------------------------------------------------------------
-// LayoutStride :
-template < class ShapeType >
-struct ViewOffset< ShapeType , LayoutStride
- , typename enable_if<( 0 < ShapeType::rank )>::type >
- : public ShapeType
-{
- typedef size_t size_type;
- typedef ShapeType shape_type;
- typedef LayoutStride array_layout ;
-
- size_type S[ shape_type::rank + 1 ];
-
- template< class SType , class L >
- KOKKOS_INLINE_FUNCTION
- bool assign_subview( const ViewOffset<SType,L,void> & rhs
- , const size_type n0
- , const size_type n1
- , const size_type n2
- , const size_type n3
- , const size_type n4
- , const size_type n5
- , const size_type n6
- , const size_type n7
- )
- {
- shape_type::assign( *this, 0,0,0,0, 0,0,0,0 );
-
- for ( int i = 0 ; i < int(shape_type::rank+1) ; ++i ) { S[i] = 0 ; }
-
- // preconditions:
- // shape_type::rank <= rhs.rank
- // shape_type::rank == count of nonzero( rhs_dim[i] )
- size_type dim[8] = { n0 , n1 , n2 , n3 , n4 , n5 , n6 , n7 };
- size_type str[ SType::rank + 1 ];
-
- rhs.stride( str );
-
- // contract the zero-dimensions
- int r = 0 ;
- for ( int i = 0 ; i < int(SType::rank) ; ++i ) {
- if ( 0 != dim[i] ) {
- dim[r] = dim[i] ;
- str[r] = str[i] ;
- ++r ;
- }
- }
-
- if ( int(shape_type::rank) == r ) {
- // The shape is non-zero
- for ( int i = 0 ; i < int(shape_type::rank) ; ++i ) {
- const size_type cap = dim[i] * ( S[i] = str[i] );
- if ( S[ shape_type::rank ] < cap ) S[ shape_type::rank ] = cap ;
- }
- // set the contracted nonzero dimensions
- shape_type::assign( *this, dim[0], dim[1], dim[2], dim[3], dim[4], dim[5], dim[6], dim[7] );
- }
-
- return true ; // definitely noncontiguous
- }
-
- template< unsigned R >
- KOKKOS_INLINE_FUNCTION
- void assign( size_t n )
- { assign_shape_dimension<R>( *this , n ); }
-
- template< class ShapeRHS , class Layout >
- KOKKOS_INLINE_FUNCTION
- void assign( const ViewOffset<ShapeRHS,Layout> & rhs
- , typename enable_if<( int(ShapeRHS::rank) == int(shape_type::rank) )>::type * = 0 )
- {
- rhs.stride(S);
- shape_type::assign( *this, rhs.N0, rhs.N1, rhs.N2, rhs.N3, rhs.N4, rhs.N5, rhs.N6, rhs.N7 );
- }
-
- KOKKOS_INLINE_FUNCTION
- void assign( const LayoutStride & layout )
- {
- size_type max = 0 ;
- for ( int i = 0 ; i < shape_type::rank ; ++i ) {
- S[i] = layout.stride[i] ;
- const size_type m = layout.dimension[i] * S[i] ;
- if ( max < m ) { max = m ; }
- }
- S[ shape_type::rank ] = max ;
- shape_type::assign( *this, layout.dimension[0], layout.dimension[1],
- layout.dimension[2], layout.dimension[3],
- layout.dimension[4], layout.dimension[5],
- layout.dimension[6], layout.dimension[7] );
- }
-
- KOKKOS_INLINE_FUNCTION
- void assign( size_t s0 , size_t s1 , size_t s2 , size_t s3
- , size_t s4 , size_t s5 , size_t s6 , size_t s7
- , size_t s8 )
- {
- const size_t str[9] = { s0, s1, s2, s3, s4, s5, s6, s7, s8 };
-
- // Last argument is the total length.
- // Total length must be non-zero.
- // All strides must be non-zero and less than total length.
- bool ok = 0 < str[ shape_type::rank ] ;
-
- for ( int i = 0 ; ( i < shape_type::rank ) &&
- ( ok = 0 < str[i] && str[i] < str[ shape_type::rank ] ); ++i );
-
- if ( ok ) {
- size_t dim[8] = { 1,1,1,1,1,1,1,1 };
- int iorder[9] = { 0,0,0,0,0,0,0,0,0 };
-
- // Ordering of strides smallest to largest.
- for ( int i = 1 ; i < shape_type::rank ; ++i ) {
- int j = i ;
- for ( ; 0 < j && str[i] < str[ iorder[j-1] ] ; --j ) {
- iorder[j] = iorder[j-1] ;
- }
- iorder[j] = i ;
- }
-
- // Last argument is the total length.
- iorder[ shape_type::rank ] = shape_type::rank ;
-
- // Determine dimension associated with each stride.
- // Guarantees non-overlap by truncating dimension
- // if ( 0 != str[ iorder[i+1] ] % str[ iorder[i] ] )
- for ( int i = 0 ; i < shape_type::rank ; ++i ) {
- dim[ iorder[i] ] = str[ iorder[i+1] ] / str[ iorder[i] ] ;
- }
-
- // Assign dimensions and strides:
- shape_type::assign( *this, dim[0], dim[1], dim[2], dim[3], dim[4], dim[5], dim[6], dim[7] );
- for ( int i = 0 ; i <= shape_type::rank ; ++i ) { S[i] = str[i] ; }
- }
- else {
- shape_type::assign(*this,0,0,0,0,0,0,0,0);
- for ( int i = 0 ; i <= shape_type::rank ; ++i ) { S[i] = 0 ; }
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- void set_padding() {}
-
- KOKKOS_INLINE_FUNCTION
- size_type cardinality() const
- { return shape_type::N0 * shape_type::N1 * shape_type::N2 * shape_type::N3 * shape_type::N4 * shape_type::N5 * shape_type::N6 * shape_type::N7 ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type capacity() const { return S[ shape_type::rank ]; }
-
- template< typename iType >
- KOKKOS_INLINE_FUNCTION
- void stride( iType * const s ) const
- { for ( int i = 0 ; i <= shape_type::rank ; ++i ) { s[i] = S[i] ; } }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_0() const { return S[0] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_1() const { return S[1] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_2() const { return S[2] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_3() const { return S[3] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_4() const { return S[4] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_5() const { return S[5] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_6() const { return S[6] ; }
-
- KOKKOS_INLINE_FUNCTION
- size_type stride_7() const { return S[7] ; }
-
- // rank 1
- template <typename I0 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==1),size_type>::type
- operator()( I0 const& i0) const
- {
- return i0 * S[0] ;
- }
-
- // rank 2
- template <typename I0, typename I1>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==2),size_type>::type
- operator()( I0 const& i0, I1 const& i1 ) const
- {
- return i0 * S[0] + i1 * S[1] ;
- }
-
- template <typename I0, typename I1, typename I2>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==3),size_type>::type
- operator()( I0 const& i0, I1 const& i1, I2 const& i2 ) const
- {
- return i0 * S[0] + i1 * S[1] + i2 * S[2] ;
- }
-
- template <typename I0, typename I1, typename I2, typename I3>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==4),size_type>::type
- operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3 ) const
- {
- return i0 * S[0] + i1 * S[1] + i2 * S[2] + i3 * S[3] ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==5),size_type>::type
- operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4 ) const
- {
- return i0 * S[0] + i1 * S[1] + i2 * S[2] + i3 * S[3] + i4 * S[4] ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==6),size_type>::type
- operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5 ) const
- {
- return i0 * S[0] + i1 * S[1] + i2 * S[2] + i3 * S[3] + i4 * S[4] + i5 * S[5] ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==7),size_type>::type
- operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6 ) const
- {
- return i0 * S[0] + i1 * S[1] + i2 * S[2] + i3 * S[3] + i4 * S[4] + i5 * S[5] + i6 * S[6] ;
- }
-
- template < typename I0, typename I1, typename I2, typename I3
- ,typename I4, typename I5, typename I6, typename I7 >
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if< (std::is_integral<I0>::value) && (shape_type::rank==8),size_type>::type
- operator()( I0 const& i0, I1 const& i1, I2 const& i2 , I3 const& i3, I4 const& i4, I5 const& i5, I6 const& i6, I7 const& i7 ) const
- {
- return i0 * S[0] + i1 * S[1] + i2 * S[2] + i3 * S[3] + i4 * S[4] + i5 * S[5] + i6 * S[6] + i7 * S[7] ;
- }
-};
-
-//----------------------------------------------------------------------------
-
-template< class T >
-struct ViewOffsetRange {
-
- enum { OK_integral_type = Impl::StaticAssert< Impl::is_integral<T>::value >::value };
-
- enum { is_range = false };
-
- KOKKOS_INLINE_FUNCTION static
- size_t dimension( size_t const , T const & ) { return 0 ; }
-
- KOKKOS_INLINE_FUNCTION static
- size_t begin( T const & i ) { return size_t(i) ; }
-};
-
-template<>
-struct ViewOffsetRange<void> {
- enum { is_range = false };
-};
-
-template<>
-struct ViewOffsetRange< Kokkos::ALL > {
- enum { is_range = true };
-
- KOKKOS_INLINE_FUNCTION static
- size_t dimension( size_t const n , ALL const & ) { return n ; }
-
- KOKKOS_INLINE_FUNCTION static
- size_t begin( ALL const & ) { return 0 ; }
-};
-
-template< typename iType >
-struct ViewOffsetRange< std::pair<iType,iType> > {
-
- enum { OK_integral_type = Impl::StaticAssert< Impl::is_integral<iType>::value >::value };
-
- enum { is_range = true };
-
- KOKKOS_INLINE_FUNCTION static
- size_t dimension( size_t const n , std::pair<iType,iType> const & r )
- { return ( size_t(r.first) < size_t(r.second) && size_t(r.second) <= n ) ? size_t(r.second) - size_t(r.first) : 0 ; }
-
- KOKKOS_INLINE_FUNCTION static
- size_t begin( std::pair<iType,iType> const & r ) { return size_t(r.first) ; }
-};
-
-template< typename iType >
-struct ViewOffsetRange< Kokkos::pair<iType,iType> > {
-
- enum { OK_integral_type = Impl::StaticAssert< Impl::is_integral<iType>::value >::value };
-
- enum { is_range = true };
-
- KOKKOS_INLINE_FUNCTION static
- size_t dimension( size_t const n , Kokkos::pair<iType,iType> const & r )
- { return ( size_t(r.first) < size_t(r.second) && size_t(r.second) <= n ) ? size_t(r.second) - size_t(r.first) : 0 ; }
-
- KOKKOS_INLINE_FUNCTION static
- size_t begin( Kokkos::pair<iType,iType> const & r ) { return size_t(r.first) ; }
-};
-
-}} // namespace Kokkos::Impl
-
-#endif //KOKKOS_VIEWOFFSET_HPP
-
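For orientation, the ViewOffset specializations deleted above encode the standard column-major (LayoutLeft), row-major (LayoutRight), and explicit-stride (LayoutStride) index maps; the padded variants differ only in substituting the aligned stride S0 (or SR) that set_padding() rounds up toward MEMORY_ALIGNMENT. A minimal standalone sketch of the rank-3 maps, with N0..N2 the extents and S[] the strides (the function names are illustrative, not part of Kokkos):

#include <cstddef>

// LayoutLeft: unit stride in the first index (column-major).
std::size_t offset_left( std::size_t i0, std::size_t i1, std::size_t i2,
                         std::size_t N0, std::size_t N1 )
{ return i0 + N0 * ( i1 + N1 * i2 ); }

// LayoutRight: unit stride in the last index (row-major).
std::size_t offset_right( std::size_t i0, std::size_t i1, std::size_t i2,
                          std::size_t N1, std::size_t N2 )
{ return i2 + N2 * ( i1 + N1 * i0 ); }

// LayoutStride: one explicit stride per index.
std::size_t offset_stride( std::size_t i0, std::size_t i1, std::size_t i2,
                           const std::size_t S[3] )
{ return i0 * S[0] + i1 * S[1] + i2 * S[2]; }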
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewSupport.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewSupport.hpp
deleted file mode 100644
index 8b63039f5..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_ViewSupport.hpp
+++ /dev/null
@@ -1,393 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_VIEWSUPPORT_HPP
-#define KOKKOS_VIEWSUPPORT_HPP
-
-#include <algorithm>
-#include <Kokkos_ExecPolicy.hpp>
-#include <impl/Kokkos_Shape.hpp>
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-/** \brief Evaluate if LHS = RHS view assignment is allowed. */
-template< class ViewLHS , class ViewRHS >
-struct ViewAssignable
-{
- // Same memory space.
- // Same value type.
- // Compatible 'const' qualifier
-  // Cannot assign managed = unmanaged
- enum { assignable_value =
- ( is_same< typename ViewLHS::value_type ,
- typename ViewRHS::value_type >::value
- ||
- is_same< typename ViewLHS::value_type ,
- typename ViewRHS::const_value_type >::value )
- &&
- is_same< typename ViewLHS::memory_space ,
- typename ViewRHS::memory_space >::value
- &&
- ( ! ( ViewLHS::is_managed && ! ViewRHS::is_managed ) )
- };
-
- enum { assignable_shape =
- // Compatible shape and matching layout:
- ( ShapeCompatible< typename ViewLHS::shape_type ,
- typename ViewRHS::shape_type >::value
- &&
- is_same< typename ViewLHS::array_layout ,
- typename ViewRHS::array_layout >::value )
- ||
- // Matching layout, same rank, and LHS dynamic rank
- ( is_same< typename ViewLHS::array_layout ,
- typename ViewRHS::array_layout >::value
- &&
- int(ViewLHS::rank) == int(ViewRHS::rank)
- &&
- int(ViewLHS::rank) == int(ViewLHS::rank_dynamic) )
- ||
- // Both rank-0, any shape and layout
- ( int(ViewLHS::rank) == 0 && int(ViewRHS::rank) == 0 )
- ||
- // Both rank-1 and LHS is dynamic rank-1, any shape and layout
- ( int(ViewLHS::rank) == 1 && int(ViewRHS::rank) == 1 &&
- int(ViewLHS::rank_dynamic) == 1 )
- };
-
- enum { value = assignable_value && assignable_shape };
-};
-
-} // namespace Impl
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template< class ExecSpace , class Type , bool Initialize >
-struct ViewDefaultConstruct
-{ ViewDefaultConstruct( Type * , size_t ) {} };
-
-} // namespace Impl
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template< class OutputView , class InputView , unsigned Rank = OutputView::Rank >
-struct ViewRemap
-{
- typedef typename OutputView::size_type size_type ;
-
- const OutputView output ;
- const InputView input ;
- const size_type n0 ;
- const size_type n1 ;
- const size_type n2 ;
- const size_type n3 ;
- const size_type n4 ;
- const size_type n5 ;
- const size_type n6 ;
- const size_type n7 ;
-
- ViewRemap( const OutputView & arg_out , const InputView & arg_in )
- : output( arg_out ), input( arg_in )
- , n0( std::min( (size_t)arg_out.dimension_0() , (size_t)arg_in.dimension_0() ) )
- , n1( std::min( (size_t)arg_out.dimension_1() , (size_t)arg_in.dimension_1() ) )
- , n2( std::min( (size_t)arg_out.dimension_2() , (size_t)arg_in.dimension_2() ) )
- , n3( std::min( (size_t)arg_out.dimension_3() , (size_t)arg_in.dimension_3() ) )
- , n4( std::min( (size_t)arg_out.dimension_4() , (size_t)arg_in.dimension_4() ) )
- , n5( std::min( (size_t)arg_out.dimension_5() , (size_t)arg_in.dimension_5() ) )
- , n6( std::min( (size_t)arg_out.dimension_6() , (size_t)arg_in.dimension_6() ) )
- , n7( std::min( (size_t)arg_out.dimension_7() , (size_t)arg_in.dimension_7() ) )
- {
- typedef typename OutputView::execution_space execution_space ;
- Kokkos::RangePolicy< execution_space > range( 0 , n0 );
- parallel_for( range , *this );
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const size_type i0 ) const
- {
- for ( size_type i1 = 0 ; i1 < n1 ; ++i1 ) {
- for ( size_type i2 = 0 ; i2 < n2 ; ++i2 ) {
- for ( size_type i3 = 0 ; i3 < n3 ; ++i3 ) {
- for ( size_type i4 = 0 ; i4 < n4 ; ++i4 ) {
- for ( size_type i5 = 0 ; i5 < n5 ; ++i5 ) {
- for ( size_type i6 = 0 ; i6 < n6 ; ++i6 ) {
- for ( size_type i7 = 0 ; i7 < n7 ; ++i7 ) {
- output.at(i0,i1,i2,i3,i4,i5,i6,i7) = input.at(i0,i1,i2,i3,i4,i5,i6,i7);
- }}}}}}}
- }
-};
-
-template< class OutputView , class InputView >
-struct ViewRemap< OutputView , InputView , 0 >
-{
- typedef typename OutputView::value_type value_type ;
- typedef typename OutputView::memory_space dst_space ;
- typedef typename InputView ::memory_space src_space ;
-
- ViewRemap( const OutputView & arg_out , const InputView & arg_in )
- {
- DeepCopy< dst_space , src_space >( arg_out.ptr_on_device() ,
- arg_in.ptr_on_device() ,
- sizeof(value_type) );
- }
-};
-
-//----------------------------------------------------------------------------
-
-template< class ExecSpace , class Type >
-struct ViewDefaultConstruct< ExecSpace , Type , true >
-{
- Type * const m_ptr ;
-
- KOKKOS_FORCEINLINE_FUNCTION
- void operator()( const typename ExecSpace::size_type& i ) const
- { m_ptr[i] = Type(); }
-
- ViewDefaultConstruct( Type * pointer , size_t capacity )
- : m_ptr( pointer )
- {
- Kokkos::RangePolicy< ExecSpace > range( 0 , capacity );
- parallel_for( range , *this );
- ExecSpace::fence();
- }
-};
-
-template< class OutputView , unsigned Rank = OutputView::Rank ,
- class Enabled = void >
-struct ViewFill
-{
- typedef typename OutputView::const_value_type const_value_type ;
- typedef typename OutputView::size_type size_type ;
-
- const OutputView output ;
- const_value_type input ;
-
- ViewFill( const OutputView & arg_out , const_value_type & arg_in )
- : output( arg_out ), input( arg_in )
- {
- typedef typename OutputView::execution_space execution_space ;
- Kokkos::RangePolicy< execution_space > range( 0 , output.dimension_0() );
- parallel_for( range , *this );
- execution_space::fence();
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const size_type i0 ) const
- {
- for ( size_type i1 = 0 ; i1 < output.dimension_1() ; ++i1 ) {
- for ( size_type i2 = 0 ; i2 < output.dimension_2() ; ++i2 ) {
- for ( size_type i3 = 0 ; i3 < output.dimension_3() ; ++i3 ) {
- for ( size_type i4 = 0 ; i4 < output.dimension_4() ; ++i4 ) {
- for ( size_type i5 = 0 ; i5 < output.dimension_5() ; ++i5 ) {
- for ( size_type i6 = 0 ; i6 < output.dimension_6() ; ++i6 ) {
- for ( size_type i7 = 0 ; i7 < output.dimension_7() ; ++i7 ) {
- output.at(i0,i1,i2,i3,i4,i5,i6,i7) = input ;
- }}}}}}}
- }
-};
-
-template< class OutputView >
-struct ViewFill< OutputView , 0 >
-{
- typedef typename OutputView::const_value_type const_value_type ;
- typedef typename OutputView::memory_space dst_space ;
-
- ViewFill( const OutputView & arg_out , const_value_type & arg_in )
- {
- DeepCopy< dst_space , dst_space >( arg_out.ptr_on_device() , & arg_in ,
- sizeof(const_value_type) );
- }
-};
-
-} // namespace Impl
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-struct ViewAllocateWithoutInitializing {
-
- const std::string label ;
-
- ViewAllocateWithoutInitializing() : label() {}
- explicit ViewAllocateWithoutInitializing( const std::string & arg_label ) : label( arg_label ) {}
- explicit ViewAllocateWithoutInitializing( const char * const arg_label ) : label( arg_label ) {}
-};
-
-struct ViewAllocate {
-
- const std::string label ;
-
- ViewAllocate() : label() {}
- ViewAllocate( const std::string & arg_label ) : label( arg_label ) {}
- ViewAllocate( const char * const arg_label ) : label( arg_label ) {}
-};
-
-}
-
-namespace Kokkos {
-namespace Impl {
-
-template< class Traits , class AllocationProperties , class Enable = void >
-struct ViewAllocProp : public Kokkos::Impl::false_type {};
-
-template< class Traits >
-struct ViewAllocProp< Traits , Kokkos::ViewAllocate
- , typename Kokkos::Impl::enable_if<(
- Traits::is_managed && ! Kokkos::Impl::is_const< typename Traits::value_type >::value
- )>::type >
- : public Kokkos::Impl::true_type
-{
- typedef size_t size_type ;
- typedef const ViewAllocate & property_type ;
-
- enum { Initialize = true };
- enum { AllowPadding = false };
-
- inline
- static const std::string & label( property_type p ) { return p.label ; }
-};
-
-template< class Traits >
-struct ViewAllocProp< Traits , std::string
- , typename Kokkos::Impl::enable_if<(
- Traits::is_managed && ! Kokkos::Impl::is_const< typename Traits::value_type >::value
- )>::type >
- : public Kokkos::Impl::true_type
-{
- typedef size_t size_type ;
- typedef const std::string & property_type ;
-
- enum { Initialize = true };
- enum { AllowPadding = false };
-
- inline
- static const std::string & label( property_type s ) { return s ; }
-};
-
-template< class Traits , unsigned N >
-struct ViewAllocProp< Traits , char[N]
- , typename Kokkos::Impl::enable_if<(
- Traits::is_managed && ! Kokkos::Impl::is_const< typename Traits::value_type >::value
- )>::type >
- : public Kokkos::Impl::true_type
-{
-private:
- typedef char label_type[N] ;
-public:
-
- typedef size_t size_type ;
- typedef const label_type & property_type ;
-
- enum { Initialize = true };
- enum { AllowPadding = false };
-
- inline
- static std::string label( property_type s ) { return std::string(s) ; }
-};
-
-template< class Traits >
-struct ViewAllocProp< Traits , Kokkos::ViewAllocateWithoutInitializing
- , typename Kokkos::Impl::enable_if<(
- Traits::is_managed && ! Kokkos::Impl::is_const< typename Traits::value_type >::value
- )>::type >
- : public Kokkos::Impl::true_type
-{
- typedef size_t size_type ;
- typedef const Kokkos::ViewAllocateWithoutInitializing & property_type ;
-
- enum { Initialize = false };
- enum { AllowPadding = false };
-
- inline
- static std::string label( property_type s ) { return s.label ; }
-};
-
-} // namespace Impl
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-template< class Traits , class PointerProperties , class Enable = void >
-struct ViewRawPointerProp : public Kokkos::Impl::false_type {};
-
-template< class Traits , typename T >
-struct ViewRawPointerProp< Traits , T ,
- typename Kokkos::Impl::enable_if<(
- Impl::is_same< T , typename Traits::value_type >::value ||
- Impl::is_same< T , typename Traits::non_const_value_type >::value
- )>::type >
- : public Kokkos::Impl::true_type
-{
- typedef size_t size_type ;
-};
-
-} // namespace Impl
-} // namespace Kokkos
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #ifndef KOKKOS_VIEWSUPPORT_HPP */
-
-
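For orientation, the ViewSupport header deleted above provides three helpers: ViewAssignable (a compile-time check that two view types share value type, memory space, and a compatible shape and layout), ViewFill (a parallel loop that writes one value to every element), and ViewRemap (a parallel copy of only the overlapping region, i.e. the element-wise minimum of the output and input extents). A minimal sketch of the remap idea for plain row-major 2-D arrays, deliberately not using the Kokkos API:

#include <algorithm>
#include <cstddef>

// Copy only the overlapping block of two 2-D arrays: min(extent) elements in
// each dimension, mirroring what ViewRemap's nested loops do per dimension.
void remap_2d( double * out, std::size_t out_n0, std::size_t out_n1,
               const double * in, std::size_t in_n0, std::size_t in_n1 )
{
  const std::size_t n0 = std::min( out_n0, in_n0 );
  const std::size_t n1 = std::min( out_n1, in_n1 );
  for ( std::size_t i0 = 0 ; i0 < n0 ; ++i0 )
    for ( std::size_t i1 = 0 ; i1 < n1 ; ++i1 )
      out[ i0 * out_n1 + i1 ] = in[ i0 * in_n1 + i1 ];  // row-major addressing
}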
diff --git a/lib/kokkos/core/src/impl/KokkosExp_ViewTile.hpp b/lib/kokkos/core/src/impl/Kokkos_ViewTile.hpp
similarity index 92%
rename from lib/kokkos/core/src/impl/KokkosExp_ViewTile.hpp
rename to lib/kokkos/core/src/impl/Kokkos_ViewTile.hpp
index 8b3749e85..ecbcf72fe 100644
--- a/lib/kokkos/core/src/impl/KokkosExp_ViewTile.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_ViewTile.hpp
@@ -1,227 +1,227 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXPERIMENTAL_VIEWTILE_HPP
#define KOKKOS_EXPERIMENTAL_VIEWTILE_HPP
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
// View mapping for rank two tiled array
template< class L >
struct is_layout_tile : public std::false_type {};
template< unsigned N0 , unsigned N1 >
struct is_layout_tile< Kokkos::LayoutTileLeft<N0,N1,true> > : public std::true_type {};
template< class Dimension , class Layout >
struct ViewOffset< Dimension , Layout ,
typename std::enable_if<(
( Dimension::rank == 2 )
&&
is_layout_tile< Layout >::value
)>::type >
{
public:
enum { SHIFT_0 = Kokkos::Impl::integral_power_of_two(Layout::N0) };
enum { SHIFT_1 = Kokkos::Impl::integral_power_of_two(Layout::N1) };
enum { SHIFT_T = SHIFT_0 + SHIFT_1 };
enum { MASK_0 = Layout::N0 - 1 };
enum { MASK_1 = Layout::N1 - 1 };
// Is an irregular layout that does not have uniform striding for each index.
using is_mapping_plugin = std::true_type ;
using is_regular = std::false_type ;
typedef size_t size_type ;
typedef Dimension dimension_type ;
typedef Layout array_layout ;
dimension_type m_dim ;
size_type m_tile_N0 ;
//----------------------------------------
// Only instantiated for rank 2
template< typename I0 , typename I1 >
KOKKOS_INLINE_FUNCTION constexpr
size_type operator()( I0 const & i0 , I1 const & i1
, int = 0 , int = 0
, int = 0 , int = 0
, int = 0 , int = 0
) const
{
return /* ( ( Tile offset ) * Tile size ) */
( ( (i0>>SHIFT_0) + m_tile_N0 * (i1>>SHIFT_1) ) << SHIFT_T) +
/* ( Offset within tile ) */
( (i0 & MASK_0) + ((i1 & MASK_1)<<SHIFT_0) ) ;
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION constexpr
array_layout layout() const
{ return array_layout( m_dim.N0 , m_dim.N1 ); }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_0() const { return m_dim.N0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_1() const { return m_dim.N1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_2() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_3() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_4() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_5() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_6() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type dimension_7() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type size() const { return m_dim.N0 * m_dim.N1 ; }
// Strides are meaningless due to irregularity
KOKKOS_INLINE_FUNCTION constexpr size_type stride_0() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_1() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_2() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_3() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_4() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_5() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_6() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type stride_7() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_type span() const
{
// ( TileDim0 * ( TileDim1 ) ) * TileSize
return ( m_tile_N0 * ( ( m_dim.N1 + MASK_1 ) >> SHIFT_1 ) ) << SHIFT_T ;
}
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const
{
// Only if dimensions align with tile size
return ( m_dim.N0 & MASK_0 ) == 0 && ( m_dim.N1 & MASK_1 ) == 0 ;
}
//----------------------------------------
~ViewOffset() = default ;
- ViewOffset() = default ;
- ViewOffset( const ViewOffset & ) = default ;
- ViewOffset & operator = ( const ViewOffset & ) = default ;
+ KOKKOS_INLINE_FUNCTION ViewOffset() = default ;
+ KOKKOS_INLINE_FUNCTION ViewOffset( const ViewOffset & ) = default ;
+ KOKKOS_INLINE_FUNCTION ViewOffset & operator = ( const ViewOffset & ) = default ;
template< unsigned TrivialScalarSize >
KOKKOS_INLINE_FUNCTION
constexpr ViewOffset( std::integral_constant<unsigned,TrivialScalarSize> const & ,
array_layout const arg_layout )
: m_dim( arg_layout.dimension[0], arg_layout.dimension[1], 0, 0, 0, 0, 0, 0 )
, m_tile_N0( ( arg_layout.dimension[0] + MASK_0 ) >> SHIFT_0 /* number of tiles in first dimension */ )
{}
};
template< typename T , unsigned N0 , unsigned N1 , class ... P
, typename iType0 , typename iType1
>
struct ViewMapping
< void
- , Kokkos::Experimental::ViewTraits<T**,Kokkos::LayoutTileLeft<N0,N1,true>,P...>
+ , Kokkos::ViewTraits<T**,Kokkos::LayoutTileLeft<N0,N1,true>,P...>
, Kokkos::LayoutTileLeft<N0,N1,true>
, iType0
, iType1 >
{
typedef Kokkos::LayoutTileLeft<N0,N1,true> src_layout ;
- typedef Kokkos::Experimental::ViewTraits< T** , src_layout , P... > src_traits ;
- typedef Kokkos::Experimental::ViewTraits< T[N0][N1] , LayoutLeft , P ... > traits ;
- typedef Kokkos::Experimental::View< T[N0][N1] , LayoutLeft , P ... > type ;
+ typedef Kokkos::ViewTraits< T** , src_layout , P... > src_traits ;
+ typedef Kokkos::ViewTraits< T[N0][N1] , LayoutLeft , P ... > traits ;
+ typedef Kokkos::View< T[N0][N1] , LayoutLeft , P ... > type ;
KOKKOS_INLINE_FUNCTION static
void assign( ViewMapping< traits , void > & dst
, const ViewMapping< src_traits , void > & src
, const src_layout &
, const size_t i_tile0
, const size_t i_tile1
)
{
typedef ViewMapping< traits , void > dst_map_type ;
typedef ViewMapping< src_traits , void > src_map_type ;
typedef typename dst_map_type::handle_type dst_handle_type ;
typedef typename dst_map_type::offset_type dst_offset_type ;
typedef typename src_map_type::offset_type src_offset_type ;
dst = dst_map_type(
dst_handle_type( src.m_handle +
( ( i_tile0 + src.m_offset.m_tile_N0 * i_tile1 ) << src_offset_type::SHIFT_T ) ) ,
dst_offset_type() );
}
};
} /* namespace Impl */
} /* namespace Experimental */
} /* namespace Kokkos */
namespace Kokkos {
namespace Experimental {
template< typename T , unsigned N0 , unsigned N1 , class ... P >
KOKKOS_INLINE_FUNCTION
-Kokkos::Experimental::View< T[N0][N1] , LayoutLeft , P... >
-tile_subview( const Kokkos::Experimental::View<T**,Kokkos::LayoutTileLeft<N0,N1,true>,P...> & src
+Kokkos::View< T[N0][N1] , LayoutLeft , P... >
+tile_subview( const Kokkos::View<T**,Kokkos::LayoutTileLeft<N0,N1,true>,P...> & src
, const size_t i_tile0
, const size_t i_tile1
)
{
// Force the specialized ViewMapping for extracting a tile
// by using the first subview argument as the layout.
typedef Kokkos::LayoutTileLeft<N0,N1,true> SrcLayout ;
- return Kokkos::Experimental::View< T[N0][N1] , LayoutLeft , P... >
+ return Kokkos::View< T[N0][N1] , LayoutLeft , P... >
( src , SrcLayout() , i_tile0 , i_tile1 );
}
} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_EXPERIMENTAL_VIEWTILE_HPP */
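
For reference, the tiled offset computed by the ViewOffset specialization above is plain shift-and-mask arithmetic. The standalone sketch below is not part of the patch; the 4x4 tile size, the 8x8 extent, and the tile_offset helper are illustrative assumptions. It reproduces the operator() formula with ordinary integers so the mapping can be checked by hand:

#include <cassert>
#include <cstddef>

// Shift/mask constants for a hypothetical LayoutTileLeft<4,4> over an 8x8 extent:
// SHIFT_0/SHIFT_1 are log2 of the tile dimensions, SHIFT_T is log2 of the tile size,
// and tile_N0 is the number of tiles along the first dimension.
constexpr std::size_t SHIFT_0 = 2 ;
constexpr std::size_t SHIFT_1 = 2 ;
constexpr std::size_t SHIFT_T = SHIFT_0 + SHIFT_1 ;
constexpr std::size_t MASK_0  = ( std::size_t(1) << SHIFT_0 ) - 1 ;
constexpr std::size_t MASK_1  = ( std::size_t(1) << SHIFT_1 ) - 1 ;
constexpr std::size_t tile_N0 = 2 ;   // ( 8 + MASK_0 ) >> SHIFT_0

// Same expression as ViewOffset::operator(): ( tile index * tile size ) + offset within tile.
constexpr std::size_t tile_offset( std::size_t i0 , std::size_t i1 )
{
  return ( ( ( i0 >> SHIFT_0 ) + tile_N0 * ( i1 >> SHIFT_1 ) ) << SHIFT_T )
       + ( ( i0 & MASK_0 ) + ( ( i1 & MASK_1 ) << SHIFT_0 ) ) ;
}

int main()
{
  assert( tile_offset( 0 , 0 ) ==  0 );   // first element of tile (0,0)
  assert( tile_offset( 3 , 3 ) == 15 );   // last element of tile (0,0): 4x4 = 16 entries
  assert( tile_offset( 4 , 0 ) == 16 );   // first element of tile (1,0)
  assert( tile_offset( 0 , 4 ) == 32 );   // first element of tile (0,1): skips tile_N0 tiles
  return 0 ;
}
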
diff --git a/lib/kokkos/core/unit_test/CMakeLists.txt b/lib/kokkos/core/unit_test/CMakeLists.txt
index 5bb2b672e..795657fe8 100644
--- a/lib/kokkos/core/unit_test/CMakeLists.txt
+++ b/lib/kokkos/core/unit_test/CMakeLists.txt
@@ -1,105 +1,197 @@
#
# Add test-only library for gtest to be reused by all the subpackages
#
SET(GTEST_SOURCE_DIR ${${PARENT_PACKAGE_NAME}_SOURCE_DIR}/tpls/gtest)
INCLUDE_DIRECTORIES(${GTEST_SOURCE_DIR})
TRIBITS_ADD_LIBRARY(
kokkos_gtest
HEADERS ${GTEST_SOURCE_DIR}/gtest/gtest.h
SOURCES ${GTEST_SOURCE_DIR}/gtest/gtest-all.cc
TESTONLY
)
#
# Define the tests
#
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
-INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
+INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
IF(Kokkos_ENABLE_Serial)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Serial
- SOURCES UnitTestMain.cpp TestSerial.cpp
+ SOURCES
+ UnitTestMain.cpp
+ serial/TestSerial_Atomics.cpp
+ serial/TestSerial_Other.cpp
+ serial/TestSerial_Reductions.cpp
+ serial/TestSerial_SubView_a.cpp
+ serial/TestSerial_SubView_b.cpp
+ serial/TestSerial_SubView_c01.cpp
+ serial/TestSerial_SubView_c02.cpp
+ serial/TestSerial_SubView_c03.cpp
+ serial/TestSerial_SubView_c04.cpp
+ serial/TestSerial_SubView_c05.cpp
+ serial/TestSerial_SubView_c06.cpp
+ serial/TestSerial_SubView_c07.cpp
+ serial/TestSerial_SubView_c08.cpp
+ serial/TestSerial_SubView_c09.cpp
+ serial/TestSerial_SubView_c10.cpp
+ serial/TestSerial_SubView_c11.cpp
+ serial/TestSerial_SubView_c12.cpp
+ serial/TestSerial_Team.cpp
+ serial/TestSerial_ViewAPI_a.cpp
+ serial/TestSerial_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Pthread)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Threads
- SOURCES UnitTestMain.cpp TestThreads.cpp
+ SOURCES
+ UnitTestMain.cpp
+ threads/TestThreads_Atomics.cpp
+ threads/TestThreads_Other.cpp
+ threads/TestThreads_Reductions.cpp
+ threads/TestThreads_SubView_a.cpp
+ threads/TestThreads_SubView_b.cpp
+ threads/TestThreads_SubView_c01.cpp
+ threads/TestThreads_SubView_c02.cpp
+ threads/TestThreads_SubView_c03.cpp
+ threads/TestThreads_SubView_c04.cpp
+ threads/TestThreads_SubView_c05.cpp
+ threads/TestThreads_SubView_c06.cpp
+ threads/TestThreads_SubView_c07.cpp
+ threads/TestThreads_SubView_c08.cpp
+ threads/TestThreads_SubView_c09.cpp
+ threads/TestThreads_SubView_c10.cpp
+ threads/TestThreads_SubView_c11.cpp
+ threads/TestThreads_SubView_c12.cpp
+ threads/TestThreads_Team.cpp
+ threads/TestThreads_ViewAPI_a.cpp
+ threads/TestThreads_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_OpenMP
- SOURCES UnitTestMain.cpp TestOpenMP.cpp TestOpenMP_a.cpp TestOpenMP_b.cpp TestOpenMP_c.cpp
+ SOURCES
+ UnitTestMain.cpp
+ openmp/TestOpenMP_Atomics.cpp
+ openmp/TestOpenMP_Other.cpp
+ openmp/TestOpenMP_Reductions.cpp
+ openmp/TestOpenMP_SubView_a.cpp
+ openmp/TestOpenMP_SubView_b.cpp
+ openmp/TestOpenMP_SubView_c01.cpp
+ openmp/TestOpenMP_SubView_c02.cpp
+ openmp/TestOpenMP_SubView_c03.cpp
+ openmp/TestOpenMP_SubView_c04.cpp
+ openmp/TestOpenMP_SubView_c05.cpp
+ openmp/TestOpenMP_SubView_c06.cpp
+ openmp/TestOpenMP_SubView_c07.cpp
+ openmp/TestOpenMP_SubView_c08.cpp
+ openmp/TestOpenMP_SubView_c09.cpp
+ openmp/TestOpenMP_SubView_c10.cpp
+ openmp/TestOpenMP_SubView_c11.cpp
+ openmp/TestOpenMP_SubView_c12.cpp
+ openmp/TestOpenMP_Team.cpp
+ openmp/TestOpenMP_ViewAPI_a.cpp
+ openmp/TestOpenMP_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_QTHREAD)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Qthread
SOURCES UnitTestMain.cpp TestQthread.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Cuda)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Cuda
- SOURCES UnitTestMain.cpp TestCuda.cpp TestCuda_a.cpp TestCuda_b.cpp TestCuda_c.cpp
+ SOURCES
+ UnitTestMain.cpp
+ cuda/TestCuda_Atomics.cpp
+ cuda/TestCuda_Other.cpp
+ cuda/TestCuda_Reductions_a.cpp
+ cuda/TestCuda_Reductions_b.cpp
+ cuda/TestCuda_Spaces.cpp
+ cuda/TestCuda_SubView_a.cpp
+ cuda/TestCuda_SubView_b.cpp
+ cuda/TestCuda_SubView_c01.cpp
+ cuda/TestCuda_SubView_c02.cpp
+ cuda/TestCuda_SubView_c03.cpp
+ cuda/TestCuda_SubView_c04.cpp
+ cuda/TestCuda_SubView_c05.cpp
+ cuda/TestCuda_SubView_c06.cpp
+ cuda/TestCuda_SubView_c07.cpp
+ cuda/TestCuda_SubView_c08.cpp
+ cuda/TestCuda_SubView_c09.cpp
+ cuda/TestCuda_SubView_c10.cpp
+ cuda/TestCuda_SubView_c11.cpp
+ cuda/TestCuda_SubView_c12.cpp
+ cuda/TestCuda_Team.cpp
+ cuda/TestCuda_ViewAPI_a.cpp
+ cuda/TestCuda_ViewAPI_b.cpp
+ cuda/TestCuda_ViewAPI_c.cpp
+ cuda/TestCuda_ViewAPI_d.cpp
+ cuda/TestCuda_ViewAPI_e.cpp
+ cuda/TestCuda_ViewAPI_f.cpp
+ cuda/TestCuda_ViewAPI_g.cpp
+ cuda/TestCuda_ViewAPI_h.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Default
SOURCES UnitTestMain.cpp TestDefaultDeviceType.cpp TestDefaultDeviceType_a.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
foreach(INITTESTS_NUM RANGE 1 16)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_DefaultInit_${INITTESTS_NUM}
SOURCES UnitTestMain.cpp TestDefaultDeviceTypeInit_${INITTESTS_NUM}.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
endforeach(INITTESTS_NUM)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_HWLOC
SOURCES UnitTestMain.cpp TestHWLOC.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
diff --git a/lib/kokkos/core/unit_test/Makefile b/lib/kokkos/core/unit_test/Makefile
index 3d9d212c1..3203dec28 100644
--- a/lib/kokkos/core/unit_test/Makefile
+++ b/lib/kokkos/core/unit_test/Makefile
@@ -1,153 +1,195 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../tpls/gtest
vpath %.cpp ${KOKKOS_PATH}/core/unit_test
+vpath %.cpp ${KOKKOS_PATH}/core/unit_test/serial
+vpath %.cpp ${KOKKOS_PATH}/core/unit_test/threads
+vpath %.cpp ${KOKKOS_PATH}/core/unit_test/openmp
+vpath %.cpp ${KOKKOS_PATH}/core/unit_test/cuda
+
TEST_HEADERS = $(wildcard $(KOKKOS_PATH)/core/unit_test/*.hpp)
+TEST_HEADERS += $(wildcard $(KOKKOS_PATH)/core/unit_test/*/*.hpp)
default: build_all
echo "End Build"
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?= -lpthread
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/unit_test
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- OBJ_CUDA = TestCuda_c.o TestCuda_b.o TestCuda_a.o TestCuda.o UnitTestMain.o gtest-all.o
+ OBJ_CUDA = TestCuda_Other.o TestCuda_Reductions_a.o TestCuda_Reductions_b.o TestCuda_Atomics.o TestCuda_Team.o TestCuda_Spaces.o
+ OBJ_CUDA += TestCuda_SubView_a.o TestCuda_SubView_b.o
+ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
+ OBJ_CUDA += TestCuda_SubView_c_all.o
+else
+ OBJ_CUDA += TestCuda_SubView_c01.o TestCuda_SubView_c02.o TestCuda_SubView_c03.o
+ OBJ_CUDA += TestCuda_SubView_c04.o TestCuda_SubView_c05.o TestCuda_SubView_c06.o
+ OBJ_CUDA += TestCuda_SubView_c07.o TestCuda_SubView_c08.o TestCuda_SubView_c09.o
+ OBJ_CUDA += TestCuda_SubView_c10.o TestCuda_SubView_c11.o TestCuda_SubView_c12.o
+endif
+ OBJ_CUDA += TestCuda_ViewAPI_a.o TestCuda_ViewAPI_b.o TestCuda_ViewAPI_c.o TestCuda_ViewAPI_d.o
+ OBJ_CUDA += TestCuda_ViewAPI_e.o TestCuda_ViewAPI_f.o TestCuda_ViewAPI_g.o TestCuda_ViewAPI_h.o
+ OBJ_CUDA += UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Cuda
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- OBJ_THREADS = TestThreads.o UnitTestMain.o gtest-all.o
+ OBJ_THREADS = TestThreads_Other.o TestThreads_Reductions.o TestThreads_Atomics.o TestThreads_Team.o
+ OBJ_THREADS += TestThreads_SubView_a.o TestThreads_SubView_b.o
+ OBJ_THREADS += TestThreads_SubView_c01.o TestThreads_SubView_c02.o TestThreads_SubView_c03.o
+ OBJ_THREADS += TestThreads_SubView_c04.o TestThreads_SubView_c05.o TestThreads_SubView_c06.o
+ OBJ_THREADS += TestThreads_SubView_c07.o TestThreads_SubView_c08.o TestThreads_SubView_c09.o
+ OBJ_THREADS += TestThreads_SubView_c10.o TestThreads_SubView_c11.o TestThreads_SubView_c12.o
+ OBJ_THREADS += TestThreads_ViewAPI_a.o TestThreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Threads
TEST_TARGETS += test-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- OBJ_OPENMP = TestOpenMP_c.o TestOpenMP_b.o TestOpenMP_a.o TestOpenMP.o UnitTestMain.o gtest-all.o
+ OBJ_OPENMP = TestOpenMP_Other.o TestOpenMP_Reductions.o TestOpenMP_Atomics.o TestOpenMP_Team.o
+ OBJ_OPENMP += TestOpenMP_SubView_a.o TestOpenMP_SubView_b.o
+ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
+ OBJ_OPENMP += TestOpenMP_SubView_c_all.o
+else
+ OBJ_OPENMP += TestOpenMP_SubView_c01.o TestOpenMP_SubView_c02.o TestOpenMP_SubView_c03.o
+ OBJ_OPENMP += TestOpenMP_SubView_c04.o TestOpenMP_SubView_c05.o TestOpenMP_SubView_c06.o
+ OBJ_OPENMP += TestOpenMP_SubView_c07.o TestOpenMP_SubView_c08.o TestOpenMP_SubView_c09.o
+ OBJ_OPENMP += TestOpenMP_SubView_c10.o TestOpenMP_SubView_c11.o TestOpenMP_SubView_c12.o
+endif
+ OBJ_OPENMP += TestOpenMP_ViewAPI_a.o TestOpenMP_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_OpenMP
TEST_TARGETS += test-openmp
endif
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
- OBJ_SERIAL = TestSerial.o UnitTestMain.o gtest-all.o
+ OBJ_SERIAL = TestSerial_Other.o TestSerial_Reductions.o TestSerial_Atomics.o TestSerial_Team.o
+ OBJ_SERIAL += TestSerial_SubView_a.o TestSerial_SubView_b.o
+ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
+ OBJ_SERIAL += TestSerial_SubView_c_all.o
+else
+ OBJ_SERIAL += TestSerial_SubView_c01.o TestSerial_SubView_c02.o TestSerial_SubView_c03.o
+ OBJ_SERIAL += TestSerial_SubView_c04.o TestSerial_SubView_c05.o TestSerial_SubView_c06.o
+ OBJ_SERIAL += TestSerial_SubView_c07.o TestSerial_SubView_c08.o TestSerial_SubView_c09.o
+ OBJ_SERIAL += TestSerial_SubView_c10.o TestSerial_SubView_c11.o TestSerial_SubView_c12.o
+endif
+ OBJ_SERIAL += TestSerial_ViewAPI_a.o TestSerial_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Serial
TEST_TARGETS += test-serial
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
OBJ_QTHREAD = TestQthread.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Qthread
TEST_TARGETS += test-qthread
endif
OBJ_HWLOC = TestHWLOC.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_HWLOC
TEST_TARGETS += test-hwloc
-OBJ_DEFAULT = TestDefaultDeviceType.o TestDefaultDeviceType_a.o UnitTestMain.o gtest-all.o
+OBJ_DEFAULT = TestDefaultDeviceType.o TestDefaultDeviceType_a.o TestDefaultDeviceType_b.o TestDefaultDeviceType_c.o TestDefaultDeviceType_d.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Default
TEST_TARGETS += test-default
NUM_INITTESTS = 16
INITTESTS_NUMBERS := $(shell seq 1 ${NUM_INITTESTS})
INITTESTS_TARGETS := $(addprefix KokkosCore_UnitTest_DefaultDeviceTypeInit_,${INITTESTS_NUMBERS})
TARGETS += ${INITTESTS_TARGETS}
INITTESTS_TEST_TARGETS := $(addprefix test-default-init-,${INITTESTS_NUMBERS})
TEST_TARGETS += ${INITTESTS_TEST_TARGETS}
OBJ_SYNCHRONIC = TestSynchronic.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Synchronic
TEST_TARGETS += test-synchronic
KokkosCore_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Cuda
KokkosCore_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Threads
KokkosCore_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_OpenMP
KokkosCore_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Serial
KokkosCore_UnitTest_Qthread: $(OBJ_QTHREAD) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_QTHREAD) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Qthread
KokkosCore_UnitTest_HWLOC: $(OBJ_HWLOC) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_HWLOC) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_HWLOC
KokkosCore_UnitTest_AllocationTracker: $(OBJ_ALLOCATIONTRACKER) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ALLOCATIONTRACKER) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_AllocationTracker
KokkosCore_UnitTest_Default: $(OBJ_DEFAULT) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_DEFAULT) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Default
${INITTESTS_TARGETS}: KokkosCore_UnitTest_DefaultDeviceTypeInit_%: TestDefaultDeviceTypeInit_%.o UnitTestMain.o gtest-all.o $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) TestDefaultDeviceTypeInit_$*.o UnitTestMain.o gtest-all.o $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_DefaultDeviceTypeInit_$*
KokkosCore_UnitTest_Synchronic: $(OBJ_SYNCHRONIC) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SYNCHRONIC) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Synchronic
test-cuda: KokkosCore_UnitTest_Cuda
./KokkosCore_UnitTest_Cuda
test-threads: KokkosCore_UnitTest_Threads
./KokkosCore_UnitTest_Threads
test-openmp: KokkosCore_UnitTest_OpenMP
./KokkosCore_UnitTest_OpenMP
test-serial: KokkosCore_UnitTest_Serial
./KokkosCore_UnitTest_Serial
test-qthread: KokkosCore_UnitTest_Qthread
./KokkosCore_UnitTest_Qthread
test-hwloc: KokkosCore_UnitTest_HWLOC
./KokkosCore_UnitTest_HWLOC
test-allocationtracker: KokkosCore_UnitTest_AllocationTracker
./KokkosCore_UnitTest_AllocationTracker
test-default: KokkosCore_UnitTest_Default
./KokkosCore_UnitTest_Default
${INITTESTS_TEST_TARGETS}: test-default-init-%: KokkosCore_UnitTest_DefaultDeviceTypeInit_%
./KokkosCore_UnitTest_DefaultDeviceTypeInit_$*
test-synchronic: KokkosCore_UnitTest_Synchronic
./KokkosCore_UnitTest_Synchronic
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(TEST_HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
diff --git a/lib/kokkos/core/unit_test/TestAggregate.hpp b/lib/kokkos/core/unit_test/TestAggregate.hpp
index 5388a6078..d22837f3e 100644
--- a/lib/kokkos/core/unit_test/TestAggregate.hpp
+++ b/lib/kokkos/core/unit_test/TestAggregate.hpp
@@ -1,109 +1,109 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef TEST_AGGREGATE_HPP
#define TEST_AGGREGATE_HPP
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
/*--------------------------------------------------------------------------*/
-#include <impl/KokkosExp_ViewArray.hpp>
+#include <impl/Kokkos_ViewArray.hpp>
namespace Test {
template< class DeviceType >
void TestViewAggregate()
{
typedef Kokkos::Array<double,32> value_type ;
typedef Kokkos::Experimental::Impl::
ViewDataAnalysis< value_type * , Kokkos::LayoutLeft , value_type >
analysis_1d ;
static_assert( std::is_same< typename analysis_1d::specialize , Kokkos::Array<> >::value , "" );
typedef Kokkos::ViewTraits< value_type ** , DeviceType > a32_traits ;
typedef Kokkos::ViewTraits< typename a32_traits::scalar_array_type , DeviceType > flat_traits ;
static_assert( std::is_same< typename a32_traits::specialize , Kokkos::Array<> >::value , "" );
static_assert( std::is_same< typename a32_traits::value_type , value_type >::value , "" );
static_assert( a32_traits::rank == 2 , "" );
static_assert( a32_traits::rank_dynamic == 2 , "" );
static_assert( std::is_same< typename flat_traits::specialize , void >::value , "" );
static_assert( flat_traits::rank == 3 , "" );
static_assert( flat_traits::rank_dynamic == 2 , "" );
static_assert( flat_traits::dimension::N2 == 32 , "" );
typedef Kokkos::View< Kokkos::Array<double,32> ** , DeviceType > a32_type ;
typedef typename a32_type::array_type a32_flat_type ;
static_assert( std::is_same< typename a32_type::value_type , value_type >::value , "" );
static_assert( std::is_same< typename a32_type::pointer_type , double * >::value , "" );
static_assert( a32_type::Rank == 2 , "" );
static_assert( a32_flat_type::Rank == 3 , "" );
a32_type x("test",4,5);
a32_flat_type y( x );
ASSERT_EQ( x.extent(0) , 4 );
ASSERT_EQ( x.extent(1) , 5 );
ASSERT_EQ( y.extent(0) , 4 );
ASSERT_EQ( y.extent(1) , 5 );
ASSERT_EQ( y.extent(2) , 32 );
}
}
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#endif /* #ifndef TEST_AGGREGATE_HPP */
diff --git a/lib/kokkos/core/unit_test/TestAggregateReduction.hpp b/lib/kokkos/core/unit_test/TestAggregateReduction.hpp
deleted file mode 100644
index bd05cd347..000000000
--- a/lib/kokkos/core/unit_test/TestAggregateReduction.hpp
+++ /dev/null
@@ -1,191 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef TEST_AGGREGATE_REDUCTION_HPP
-#define TEST_AGGREGATE_REDUCTION_HPP
-
-#include <gtest/gtest.h>
-
-#include <stdexcept>
-#include <sstream>
-#include <iostream>
-
-namespace Test {
-
-template< typename T , unsigned N >
-struct StaticArray {
- T value[N] ;
-
- KOKKOS_INLINE_FUNCTION
- StaticArray() = default;
-
- KOKKOS_INLINE_FUNCTION
- StaticArray( const StaticArray & rhs ) = default;
-
- KOKKOS_INLINE_FUNCTION
- operator T () { return value[0]; }
-
- KOKKOS_INLINE_FUNCTION
- StaticArray & operator = ( const T & rhs )
- {
- for ( unsigned i = 0 ; i < N ; ++i ) value[i] = rhs ;
- return *this ;
- }
-
- KOKKOS_INLINE_FUNCTION
- StaticArray & operator = ( const StaticArray & rhs ) = default;
-
- KOKKOS_INLINE_FUNCTION
- StaticArray operator * ( const StaticArray & rhs )
- {
- StaticArray tmp ;
- for ( unsigned i = 0 ; i < N ; ++i ) tmp.value[i] = value[i] * rhs.value[i] ;
- return tmp ;
- }
-
- KOKKOS_INLINE_FUNCTION
- StaticArray operator + ( const StaticArray & rhs )
- {
- StaticArray tmp ;
- for ( unsigned i = 0 ; i < N ; ++i ) tmp.value[i] = value[i] + rhs.value[i] ;
- return tmp ;
- }
-
- KOKKOS_INLINE_FUNCTION
- StaticArray & operator += ( const StaticArray & rhs )
- {
- for ( unsigned i = 0 ; i < N ; ++i ) value[i] += rhs.value[i] ;
- return *this ;
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator += ( const volatile StaticArray & rhs ) volatile
- {
- for ( unsigned i = 0 ; i < N ; ++i ) value[i] += rhs.value[i] ;
- }
-};
-
-static_assert(std::is_trivial<StaticArray<int, 4>>::value, "Not trivial");
-
-template< typename T , class Space >
-struct DOT {
- typedef T value_type ;
- typedef Space execution_space ;
-
- Kokkos::View< value_type * , Space > a ;
- Kokkos::View< value_type * , Space > b ;
-
- DOT( const Kokkos::View< value_type * , Space > arg_a
- , const Kokkos::View< value_type * , Space > arg_b
- )
- : a( arg_a ), b( arg_b ) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const int i , value_type & update ) const
- {
- update += a(i) * b(i);
- }
-};
-
-template< typename T , class Space >
-struct FILL {
- typedef T value_type ;
- typedef Space execution_space ;
-
- Kokkos::View< value_type * , Space > a ;
- Kokkos::View< value_type * , Space > b ;
-
- FILL( const Kokkos::View< value_type * , Space > & arg_a
- , const Kokkos::View< value_type * , Space > & arg_b
- )
- : a( arg_a ), b( arg_b ) {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const int i ) const
- {
- a(i) = i % 2 ? i + 1 : 1 ;
- b(i) = i % 2 ? 1 : i + 1 ;
- }
-};
-
-template< class Space >
-void TestViewAggregateReduction()
-{
-
-#if ! KOKKOS_USING_EXP_VIEW
-
- const int count = 2 ;
- const long result = count % 2 ? ( count * ( ( count + 1 ) / 2 ) )
- : ( ( count / 2 ) * ( count + 1 ) );
-
- Kokkos::View< long * , Space > a("a",count);
- Kokkos::View< long * , Space > b("b",count);
- Kokkos::View< StaticArray<long,4> * , Space > a4("a4",count);
- Kokkos::View< StaticArray<long,4> * , Space > b4("b4",count);
- Kokkos::View< StaticArray<long,10> * , Space > a10("a10",count);
- Kokkos::View< StaticArray<long,10> * , Space > b10("b10",count);
-
- Kokkos::parallel_for( count , FILL<long,Space>(a,b) );
- Kokkos::parallel_for( count , FILL< StaticArray<long,4> , Space >(a4,b4) );
- Kokkos::parallel_for( count , FILL< StaticArray<long,10> , Space >(a10,b10) );
-
- long r = 0;
- StaticArray<long,4> r4 ;
- StaticArray<long,10> r10 ;
-
- Kokkos::parallel_reduce( count , DOT<long,Space>(a,b) , r );
- Kokkos::parallel_reduce( count , DOT< StaticArray<long,4> , Space >(a4,b4) , r4 );
- Kokkos::parallel_reduce( count , DOT< StaticArray<long,10> , Space >(a10,b10) , r10 );
-
- ASSERT_EQ( result , r );
- for ( int i = 0 ; i < 10 ; ++i ) { ASSERT_EQ( result , r10.value[i] ); }
- for ( int i = 0 ; i < 4 ; ++i ) { ASSERT_EQ( result , r4.value[i] ); }
-
-#endif
-
-}
-
-}
-
-#endif /* #ifndef TEST_AGGREGATE_REDUCTION_HPP */
-
diff --git a/lib/kokkos/core/unit_test/TestAtomicOperations.hpp b/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
index aee4bda06..7f1519045 100644
--- a/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
@@ -1,841 +1,985 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomicOperations {
//-----------------------------------------------
//--------------zero_functor---------------------
//-----------------------------------------------
template<class T,class DEVICE_TYPE>
struct ZeroFunctor {
typedef DEVICE_TYPE execution_space;
typedef typename Kokkos::View<T,execution_space> type;
typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
type data;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
data() = 0;
}
};
//-----------------------------------------------
//--------------init_functor---------------------
//-----------------------------------------------
template<class T,class DEVICE_TYPE>
struct InitFunctor {
typedef DEVICE_TYPE execution_space;
typedef typename Kokkos::View<T,execution_space> type;
typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
type data;
T init_value ;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
data() = init_value;
}
InitFunctor(T _init_value) : init_value(_init_value) {}
};
//---------------------------------------------------
//--------------atomic_fetch_max---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct MaxFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
//Kokkos::atomic_fetch_max(&data(),(T)1);
Kokkos::atomic_fetch_max(&data(),(T)i1);
}
MaxFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T MaxAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct MaxFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T MaxAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = (i0 > i1 ? i0 : i1) ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool MaxAtomicTest(T i0, T i1)
{
T res = MaxAtomic<T,DeviceType>(i0,i1);
T resSerial = MaxAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = MaxAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_min---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct MinFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_min(&data(),(T)i1);
}
MinFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T MinAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct MinFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T MinAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = (i0 < i1 ? i0 : i1) ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool MinAtomicTest(T i0, T i1)
{
T res = MinAtomic<T,DeviceType>(i0,i1);
T resSerial = MinAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = MinAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
+//---------------------------------------------------
+//--------------atomic_increment---------------------
+//---------------------------------------------------
+
+template<class T,class DEVICE_TYPE>
+struct IncFunctor{
+ typedef DEVICE_TYPE execution_space;
+ typedef Kokkos::View<T,execution_space> type;
+ type data;
+ T i0;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(int) const {
+ Kokkos::atomic_increment(&data());
+ }
+ IncFunctor( T _i0 ) : i0(_i0) {}
+};
+
+template<class T, class execution_space >
+T IncAtomic(T i0) {
+ struct InitFunctor<T,execution_space> f_init(i0);
+ typename InitFunctor<T,execution_space>::type data("Data");
+ typename InitFunctor<T,execution_space>::h_type h_data("HData");
+ f_init.data = data;
+ Kokkos::parallel_for(1,f_init);
+ execution_space::fence();
+
+ struct IncFunctor<T,execution_space> f(i0);
+ f.data = data;
+ Kokkos::parallel_for(1,f);
+ execution_space::fence();
+
+ Kokkos::deep_copy(h_data,data);
+ T val = h_data();
+ return val;
+}
+
+template<class T>
+T IncAtomicCheck(T i0) {
+ T* data = new T[1];
+ data[0] = 0;
+
+ *data = i0 + 1;
+
+ T val = *data;
+ delete [] data;
+ return val;
+}
+
+template<class T,class DeviceType>
+bool IncAtomicTest(T i0)
+{
+ T res = IncAtomic<T,DeviceType>(i0);
+ T resSerial = IncAtomicCheck<T>(i0);
+
+ bool passed = true;
+
+ if ( resSerial != res ) {
+ passed = false;
+
+ std::cout << "Loop<"
+ << typeid(T).name()
+ << ">( test = IncAtomicTest"
+ << " FAILED : "
+ << resSerial << " != " << res
+ << std::endl ;
+ }
+
+ return passed ;
+}
+
+//---------------------------------------------------
+//--------------atomic_decrement---------------------
+//---------------------------------------------------
+
+template<class T,class DEVICE_TYPE>
+struct DecFunctor{
+ typedef DEVICE_TYPE execution_space;
+ typedef Kokkos::View<T,execution_space> type;
+ type data;
+ T i0;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(int) const {
+ Kokkos::atomic_decrement(&data());
+ }
+ DecFunctor( T _i0 ) : i0(_i0) {}
+};
+
+template<class T, class execution_space >
+T DecAtomic(T i0) {
+ struct InitFunctor<T,execution_space> f_init(i0);
+ typename InitFunctor<T,execution_space>::type data("Data");
+ typename InitFunctor<T,execution_space>::h_type h_data("HData");
+ f_init.data = data;
+ Kokkos::parallel_for(1,f_init);
+ execution_space::fence();
+
+ struct DecFunctor<T,execution_space> f(i0);
+ f.data = data;
+ Kokkos::parallel_for(1,f);
+ execution_space::fence();
+
+ Kokkos::deep_copy(h_data,data);
+ T val = h_data();
+ return val;
+}
+
+template<class T>
+T DecAtomicCheck(T i0) {
+ T* data = new T[1];
+ data[0] = 0;
+
+ *data = i0 - 1;
+
+ T val = *data;
+ delete [] data;
+ return val;
+}
+
+template<class T,class DeviceType>
+bool DecAtomicTest(T i0)
+{
+ T res = DecAtomic<T,DeviceType>(i0);
+ T resSerial = DecAtomicCheck<T>(i0);
+
+ bool passed = true;
+
+ if ( resSerial != res ) {
+ passed = false;
+
+ std::cout << "Loop<"
+ << typeid(T).name()
+ << ">( test = DecAtomicTest"
+ << " FAILED : "
+ << resSerial << " != " << res
+ << std::endl ;
+ }
+
+ return passed ;
+}
+
//---------------------------------------------------
//--------------atomic_fetch_mul---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct MulFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_mul(&data(),(T)i1);
}
MulFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T MulAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct MulFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T MulAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0*i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool MulAtomicTest(T i0, T i1)
{
T res = MulAtomic<T,DeviceType>(i0,i1);
T resSerial = MulAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = MulAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_div---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct DivFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_div(&data(),(T)i1);
}
DivFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T DivAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct DivFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T DivAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0/i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool DivAtomicTest(T i0, T i1)
{
T res = DivAtomic<T,DeviceType>(i0,i1);
T resSerial = DivAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = DivAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_mod---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct ModFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_mod(&data(),(T)i1);
}
ModFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T ModAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct ModFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T ModAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0%i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool ModAtomicTest(T i0, T i1)
{
T res = ModAtomic<T,DeviceType>(i0,i1);
T resSerial = ModAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = ModAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_and---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct AndFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_and(&data(),(T)i1);
}
AndFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T AndAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct AndFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T AndAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0&i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool AndAtomicTest(T i0, T i1)
{
T res = AndAtomic<T,DeviceType>(i0,i1);
T resSerial = AndAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = AndAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_or----------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct OrFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_or(&data(),(T)i1);
}
OrFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T OrAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct OrFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T OrAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0|i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool OrAtomicTest(T i0, T i1)
{
T res = OrAtomic<T,DeviceType>(i0,i1);
T resSerial = OrAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = OrAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_xor---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct XorFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_xor(&data(),(T)i1);
}
XorFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T XorAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct XorFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T XorAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0^i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool XorAtomicTest(T i0, T i1)
{
T res = XorAtomic<T,DeviceType>(i0,i1);
T resSerial = XorAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = XorAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_lshift---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct LShiftFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_lshift(&data(),(T)i1);
}
LShiftFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T LShiftAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct LShiftFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T LShiftAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0<<i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool LShiftAtomicTest(T i0, T i1)
{
T res = LShiftAtomic<T,DeviceType>(i0,i1);
T resSerial = LShiftAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = LShiftAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_fetch_rshift---------------------
//---------------------------------------------------
template<class T,class DEVICE_TYPE>
struct RShiftFunctor{
typedef DEVICE_TYPE execution_space;
typedef Kokkos::View<T,execution_space> type;
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
void operator()(int) const {
Kokkos::atomic_fetch_rshift(&data(),(T)i1);
}
RShiftFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
};
template<class T, class execution_space >
T RShiftAtomic(T i0 , T i1) {
struct InitFunctor<T,execution_space> f_init(i0);
typename InitFunctor<T,execution_space>::type data("Data");
typename InitFunctor<T,execution_space>::h_type h_data("HData");
f_init.data = data;
Kokkos::parallel_for(1,f_init);
execution_space::fence();
struct RShiftFunctor<T,execution_space> f(i0,i1);
f.data = data;
Kokkos::parallel_for(1,f);
execution_space::fence();
Kokkos::deep_copy(h_data,data);
T val = h_data();
return val;
}
template<class T>
T RShiftAtomicCheck(T i0 , T i1) {
T* data = new T[1];
data[0] = 0;
*data = i0>>i1 ;
T val = *data;
delete [] data;
return val;
}
template<class T,class DeviceType>
bool RShiftAtomicTest(T i0, T i1)
{
T res = RShiftAtomic<T,DeviceType>(i0,i1);
T resSerial = RShiftAtomicCheck<T>(i0,i1);
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
<< typeid(T).name()
<< ">( test = RShiftAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
<< std::endl ;
}
return passed ;
}
//---------------------------------------------------
//--------------atomic_test_control------------------
//---------------------------------------------------
template<class T,class DeviceType>
bool AtomicOperationsTestIntegralType( int i0 , int i1 , int test )
{
switch (test) {
case 1: return MaxAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 2: return MinAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 3: return MulAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 4: return DivAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 5: return ModAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 6: return AndAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 7: return OrAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 8: return XorAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 9: return LShiftAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 10: return RShiftAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
+ case 11: return IncAtomicTest<T,DeviceType>( (T)i0 );
+ case 12: return DecAtomicTest<T,DeviceType>( (T)i0 );
}
return 0;
}
template<class T,class DeviceType>
bool AtomicOperationsTestNonIntegralType( int i0 , int i1 , int test )
{
switch (test) {
case 1: return MaxAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 2: return MinAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 3: return MulAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
case 4: return DivAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
}
return 0;
}
} // namespace
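
The new IncFunctor/DecFunctor coverage above drives Kokkos::atomic_increment and Kokkos::atomic_decrement through the same init / run / deep_copy pattern used by the other functors. As a hedged usage sketch only (the name count is illustrative, a backend with lambda support is assumed, and the test itself uses explicit functors rather than lambdas):

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    // Rank-0 counter view, zero-initialized by the labeled constructor.
    Kokkos::View<int> count( "count" );

    // Each of the 100 iterations bumps the shared counter atomically,
    // using the same primitive exercised by IncFunctor above.
    Kokkos::parallel_for( 100 , KOKKOS_LAMBDA ( const int ) {
      Kokkos::atomic_increment( & count() );
    });
    Kokkos::fence();

    // Copy the result back to the host, as the Inc/Dec tests do with deep_copy.
    auto h_count = Kokkos::create_mirror_view( count );
    Kokkos::deep_copy( h_count , count );
    printf( "count = %d\n" , h_count() );   // expected: 100
  }
  Kokkos::finalize();
  return 0 ;
}
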
diff --git a/lib/kokkos/core/unit_test/TestCompilerMacros.hpp b/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
index dfa2250c0..71c221448 100644
--- a/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
+++ b/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
@@ -1,93 +1,95 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#define KOKKOS_PRAGMA_UNROLL(a)
namespace TestCompilerMacros {
template<class DEVICE_TYPE>
struct AddFunctor {
typedef DEVICE_TYPE execution_space;
typedef typename Kokkos::View<int**,execution_space> type;
type a,b;
int length;
AddFunctor(type a_, type b_):a(a_),b(b_),length(a.dimension_1()) {}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const {
#ifdef KOKKOS_HAVE_PRAGMA_UNROLL
#pragma unroll
#endif
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#pragma ivdep
#endif
#ifdef KOKKOS_HAVE_PRAGMA_VECTOR
#pragma vector always
#endif
#ifdef KOKKOS_HAVE_PRAGMA_LOOPCOUNT
#pragma loop count(128)
#endif
+#ifndef KOKKOS_HAVE_DEBUG
#ifdef KOKKOS_HAVE_PRAGMA_SIMD
#pragma simd
+#endif
#endif
for(int j=0;j<length;j++)
a(i,j) += b(i,j);
}
};
template<class DeviceType>
bool Test() {
typedef typename Kokkos::View<int**,DeviceType> type;
type a("A",1024,128);
type b("B",1024,128);
AddFunctor<DeviceType> f(a,b);
Kokkos::parallel_for(1024,f);
DeviceType::fence();
return true;
}
}
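The only functional change in this hunk is that the simd hint is now suppressed when Kokkos is built with KOKKOS_HAVE_DEBUG. A stand-alone sketch of the same guard pattern, with the illustrative function add_arrays standing in for the functor body above (not part of the patch):

// Debug builds keep the plain loop; optimized builds with pragma-simd support
// still get the vectorization hint.
void add_arrays( int * a , const int * b , int n )
{
#ifndef KOKKOS_HAVE_DEBUG
#ifdef KOKKOS_HAVE_PRAGMA_SIMD
#pragma simd
#endif
#endif
  for ( int j = 0 ; j < n ; j++ )
    a[j] += b[j];
}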
diff --git a/lib/kokkos/core/unit_test/TestCuda.cpp b/lib/kokkos/core/unit_test/TestCuda.cpp
deleted file mode 100644
index e61556625..000000000
--- a/lib/kokkos/core/unit_test/TestCuda.cpp
+++ /dev/null
@@ -1,290 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <iostream>
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
-#include <impl/Kokkos_ViewTileLeft.hpp>
-#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestMemoryPool.hpp>
-#include <TestTeamVector.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-#include <TestCXX11Deduction.hpp>
-
-#include <TestTaskPolicy.hpp>
-#include <TestPolicyConstruction.hpp>
-
-#include <TestMDRange.hpp>
-
-//----------------------------------------------------------------------------
-
-class cuda : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-void cuda::SetUpTestCase()
- {
- Kokkos::Cuda::print_configuration( std::cout );
- Kokkos::HostSpace::execution_space::initialize();
- Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
- }
-
-void cuda::TearDownTestCase()
- {
- Kokkos::Cuda::finalize();
- Kokkos::HostSpace::execution_space::finalize();
- }
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace Test {
-
-__global__
-void test_abort()
-{
- Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
- Kokkos::CudaSpace ,
- Kokkos::HostSpace >::verify();
-}
-
-__global__
-void test_cuda_spaces_int_value( int * ptr )
-{
- if ( *ptr == 42 ) { *ptr = 2 * 42 ; }
-}
-
-TEST_F( cuda , md_range ) {
- TestMDRange_2D< Kokkos::Cuda >::test_for2(100,100);
-
- TestMDRange_3D< Kokkos::Cuda >::test_for3(100,100,100);
-}
-
-TEST_F( cuda , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Cuda >() ) );
-}
-
-TEST_F( cuda , memory_space )
-{
- TestMemorySpace< Kokkos::Cuda >();
-}
-
-TEST_F( cuda, uvm )
-{
- if ( Kokkos::CudaUVMSpace::available() ) {
-
- int * uvm_ptr = (int*) Kokkos::kokkos_malloc< Kokkos::CudaUVMSpace >("uvm_ptr",sizeof(int));
-
- *uvm_ptr = 42 ;
-
- Kokkos::Cuda::fence();
- test_cuda_spaces_int_value<<<1,1>>>(uvm_ptr);
- Kokkos::Cuda::fence();
-
- EXPECT_EQ( *uvm_ptr, int(2*42) );
-
- Kokkos::kokkos_free< Kokkos::CudaUVMSpace >(uvm_ptr );
- }
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( cuda , impl_shared_alloc )
-{
- test_shared_alloc< Kokkos::CudaSpace , Kokkos::HostSpace::execution_space >();
- test_shared_alloc< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >();
- test_shared_alloc< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >();
-}
-
-TEST_F( cuda, policy_construction) {
- TestRangePolicyConstruction< Kokkos::Cuda >();
- TestTeamPolicyConstruction< Kokkos::Cuda >();
-}
-
-TEST_F( cuda , impl_view_mapping )
-{
- test_view_mapping< Kokkos::Cuda >();
- test_view_mapping< Kokkos::CudaUVMSpace >();
- test_view_mapping_subview< Kokkos::Cuda >();
- test_view_mapping_subview< Kokkos::CudaUVMSpace >();
- test_view_mapping_operator< Kokkos::Cuda >();
- test_view_mapping_operator< Kokkos::CudaUVMSpace >();
- TestViewMappingAtomic< Kokkos::Cuda >::run();
-}
-
-TEST_F( cuda , view_of_class )
-{
- TestViewMappingClassValue< Kokkos::CudaSpace >::run();
- TestViewMappingClassValue< Kokkos::CudaUVMSpace >::run();
-}
-
-template< class MemSpace >
-struct TestViewCudaTexture {
-
- enum { N = 1000 };
-
- using V = Kokkos::Experimental::View<double*,MemSpace> ;
- using T = Kokkos::Experimental::View<const double*, MemSpace, Kokkos::MemoryRandomAccess > ;
-
- V m_base ;
- T m_tex ;
-
- struct TagInit {};
- struct TagTest {};
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const TagTest & , const int i , long & error_count ) const
- { if ( m_tex[i] != i + 1 ) ++error_count ; }
-
- TestViewCudaTexture()
- : m_base("base",N)
- , m_tex( m_base )
- {}
-
- static void run()
- {
- EXPECT_TRUE( ( std::is_same< typename V::reference_type
- , double &
- >::value ) );
-
- EXPECT_TRUE( ( std::is_same< typename T::reference_type
- , const double
- >::value ) );
-
- EXPECT_TRUE( V::reference_type_is_lvalue_reference ); // An ordinary view
- EXPECT_FALSE( T::reference_type_is_lvalue_reference ); // Texture fetch returns by value
-
- TestViewCudaTexture self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda , TagInit >(0,N) , self );
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda , TagTest >(0,N) , self , error_count );
- EXPECT_EQ( error_count , 0 );
- }
-};
-
-TEST_F( cuda , impl_view_texture )
-{
- TestViewCudaTexture< Kokkos::CudaSpace >::run();
- TestViewCudaTexture< Kokkos::CudaUVMSpace >::run();
-}
-
-template< class MemSpace , class ExecSpace >
-struct TestViewCudaAccessible {
-
- enum { N = 1000 };
-
- using V = Kokkos::Experimental::View<double*,MemSpace> ;
-
- V m_base ;
-
- struct TagInit {};
- struct TagTest {};
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
-
- KOKKOS_INLINE_FUNCTION
- void operator()( const TagTest & , const int i , long & error_count ) const
- { if ( m_base[i] != i + 1 ) ++error_count ; }
-
- TestViewCudaAccessible()
- : m_base("base",N)
- {}
-
- static void run()
- {
- TestViewCudaAccessible self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< typename MemSpace::execution_space , TagInit >(0,N) , self );
- MemSpace::execution_space::fence();
- // Next access is a different execution space, must complete prior kernel.
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagTest >(0,N) , self , error_count );
- EXPECT_EQ( error_count , 0 );
- }
-};
-
-TEST_F( cuda , impl_view_accessible )
-{
- TestViewCudaAccessible< Kokkos::CudaSpace , Kokkos::Cuda >::run();
-
- TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >::run();
-
- TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >::run();
-}
-
-}
diff --git a/lib/kokkos/core/unit_test/TestCuda_a.cpp b/lib/kokkos/core/unit_test/TestCuda_a.cpp
deleted file mode 100644
index 4680c3338..000000000
--- a/lib/kokkos/core/unit_test/TestCuda_a.cpp
+++ /dev/null
@@ -1,182 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <iostream>
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
-#include <impl/Kokkos_ViewTileLeft.hpp>
-#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestMemoryPool.hpp>
-#include <TestTeamVector.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-#include <TestCXX11Deduction.hpp>
-
-#include <TestTaskPolicy.hpp>
-#include <TestPolicyConstruction.hpp>
-
-//----------------------------------------------------------------------------
-
-class cuda : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-//----------------------------------------------------------------------------
-
-namespace Test {
-
-TEST_F( cuda, view_impl )
-{
- // test_abort<<<32,32>>>(); // Aborts the kernel with CUDA version 4.1 or greater
-
- test_view_impl< Kokkos::Cuda >();
-}
-
-TEST_F( cuda, view_api )
-{
- typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess > > view_texture_managed ;
- typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess | Kokkos::Unmanaged > > view_texture_unmanaged ;
-
- TestViewAPI< double , Kokkos::Cuda >();
- TestViewAPI< double , Kokkos::CudaUVMSpace >();
-
-#if 0
- Kokkos::View<double, Kokkos::Cuda > x("x");
- Kokkos::View<double[1], Kokkos::Cuda > y("y");
- // *x = 10 ;
- // x() = 10 ;
- // y[0] = 10 ;
- // y(0) = 10 ;
-#endif
-}
-
-TEST_F( cuda , view_nested_view )
-{
- ::Test::view_nested_view< Kokkos::Cuda >();
-}
-
-TEST_F( cuda, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Cuda >();
-}
-
-TEST_F( cuda, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Cuda >();
-}
-
-TEST_F( cuda, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Cuda >();
-}
-
-TEST_F( cuda, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::Cuda >();
-}
-
-TEST_F( cuda, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_2d_from_3d ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace >();
-}
-
-TEST_F( cuda, view_subview_2d_from_5d ) {
- TestViewSubview::test_2d_subview_5d< Kokkos::CudaUVMSpace >();
-}
-
-}
diff --git a/lib/kokkos/core/unit_test/TestCuda_b.cpp b/lib/kokkos/core/unit_test/TestCuda_b.cpp
deleted file mode 100644
index d4ca949e5..000000000
--- a/lib/kokkos/core/unit_test/TestCuda_b.cpp
+++ /dev/null
@@ -1,191 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <iostream>
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
-#include <impl/Kokkos_ViewTileLeft.hpp>
-#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestMemoryPool.hpp>
-#include <TestTeamVector.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-#include <TestCXX11Deduction.hpp>
-
-#include <TestTaskPolicy.hpp>
-#include <TestPolicyConstruction.hpp>
-
-//----------------------------------------------------------------------------
-
-class cuda : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-//----------------------------------------------------------------------------
-
-namespace Test {
-
-TEST_F( cuda, range_tag )
-{
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- //TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
-}
-
-TEST_F( cuda, team_tag )
-{
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(3);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(3);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
-}
-
-TEST_F( cuda, reduce )
-{
- TestReduce< long , Kokkos::Cuda >( 10000000 );
- TestReduce< double , Kokkos::Cuda >( 1000000 );
- TestReduce< int , Kokkos::Cuda >( 0 );
-}
-
-TEST_F( cuda , reducers )
-{
- TestReducers<int, Kokkos::Cuda>::execute_integer();
- TestReducers<size_t, Kokkos::Cuda>::execute_integer();
- TestReducers<double, Kokkos::Cuda>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Cuda>::execute_basic();
-}
-
-TEST_F( cuda, reduce_team )
-{
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( cuda, shared_team )
-{
- TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-
-#if defined (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
-TEST_F( cuda, lambda_shared_team )
-{
- TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
- TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
- TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-#endif
-
-TEST_F( cuda, shmem_size) {
- TestShmemSize< Kokkos::Cuda >();
-}
-
-TEST_F( cuda, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-
-TEST_F( cuda, reduce_dynamic )
-{
- TestReduceDynamic< long , Kokkos::Cuda >( 10000000 );
- TestReduceDynamic< double , Kokkos::Cuda >( 1000000 );
-}
-
-TEST_F( cuda, reduce_dynamic_view )
-{
- TestReduceDynamicView< long , Kokkos::Cuda >( 10000000 );
- TestReduceDynamicView< double , Kokkos::Cuda >( 1000000 );
-}
-
-}
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
index 1b1e0e673..87a534f11 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
@@ -1,242 +1,101 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
//----------------------------------------------------------------------------
-#include <TestViewImpl.hpp>
#include <TestAtomic.hpp>
#include <TestViewAPI.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestCXX11.hpp>
#include <TestTeamVector.hpp>
+#include <TestUtilities.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, view_impl) {
- test_view_impl< Kokkos::DefaultExecutionSpace >();
-}
-
-TEST_F( defaultdevicetype, view_api) {
- TestViewAPI< double , Kokkos::DefaultExecutionSpace >();
-}
-
-TEST_F( defaultdevicetype, long_reduce) {
- TestReduce< long , Kokkos::DefaultExecutionSpace >( 100000 );
-}
-
-TEST_F( defaultdevicetype, double_reduce) {
- TestReduce< double , Kokkos::DefaultExecutionSpace >( 100000 );
-}
-
-TEST_F( defaultdevicetype, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::DefaultExecutionSpace >( 100000 );
-}
-
-TEST_F( defaultdevicetype, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::DefaultExecutionSpace >( 100000 );
-}
-
-TEST_F( defaultdevicetype, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::DefaultExecutionSpace >( 100000 );
-}
-
-
-TEST_F( defaultdevicetype , atomics )
+TEST_F( defaultdevicetype, host_space_access )
{
- const int loop_count = 1e4 ;
+ typedef Kokkos::HostSpace::execution_space host_exec_space ;
+ typedef Kokkos::Device< host_exec_space , Kokkos::HostSpace > device_space ;
+ typedef Kokkos::Impl::HostMirror< Kokkos::DefaultExecutionSpace >::Space mirror_space ;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< host_exec_space , Kokkos::HostSpace >::accessible , "" );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< device_space , Kokkos::HostSpace >::accessible , "" );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,3) ) );
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< mirror_space , Kokkos::HostSpace >::accessible , "" );
}
-/*TEST_F( defaultdevicetype , view_remap )
-{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::DefaultExecutionSpace > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::DefaultExecutionSpace > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::DefaultExecutionSpace > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}*/
-
-//----------------------------------------------------------------------------
-
-
-TEST_F( defaultdevicetype , view_aggregate )
-{
- TestViewAggregate< Kokkos::DefaultExecutionSpace >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( defaultdevicetype , scan )
-{
- TestScan< Kokkos::DefaultExecutionSpace >::test_range( 1 , 1000 );
- TestScan< Kokkos::DefaultExecutionSpace >( 1000000 );
- TestScan< Kokkos::DefaultExecutionSpace >( 10000000 );
- Kokkos::DefaultExecutionSpace::fence();
-}
-
-
-//----------------------------------------------------------------------------
-
-TEST_F( defaultdevicetype , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::DefaultExecutionSpace >() ) );
-}
-
-
-//----------------------------------------------------------------------------
-TEST_F( defaultdevicetype , cxx11 )
-{
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(4) ) );
-}
-
-TEST_F( defaultdevicetype , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(5) ) );
-}
-
-TEST_F( defaultdevicetype , malloc )
-{
- int* data = (int*) Kokkos::kokkos_malloc(100*sizeof(int));
- ASSERT_NO_THROW(data = (int*) Kokkos::kokkos_realloc(data,120*sizeof(int)));
- Kokkos::kokkos_free(data);
-
- int* data2 = (int*) Kokkos::kokkos_malloc(0);
- ASSERT_TRUE(data2==NULL);
- Kokkos::kokkos_free(data2);
+TEST_F( defaultdevicetype, view_api) {
+ TestViewAPI< double , Kokkos::DefaultExecutionSpace >();
}
} // namespace test
#endif
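The rewritten TestDefaultDeviceType.cpp above drops most of the old runtime checks (keeping view_api) and instead asserts, at compile time, that the host-side spaces can access HostSpace memory. A reduced sketch of that check, using only names that appear in the hunk (the assertion message is added here for readability; the patch passes an empty string):

// SpaceAccessibility<ExecSpace,MemSpace>::accessible is true when ExecSpace
// can dereference allocations made in MemSpace.
#include <Kokkos_Core.hpp>

static_assert(
  Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace::execution_space ,
                                    Kokkos::HostSpace >::accessible ,
  "the host execution space must be able to access HostSpace memory" );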
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp b/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
index a17ed97a9..caeb56c9e 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
@@ -1,419 +1,419 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#ifdef KOKKOS_HAVE_OPENMP
#include <omp.h>
#endif
#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
//----------------------------------------------------------------------------
namespace Test {
namespace Impl {
char** init_kokkos_args(bool do_threads,bool do_numa,bool do_device,bool do_other, int& nargs, Kokkos::InitArguments& init_args) {
nargs = (do_threads?1:0) +
(do_numa?1:0) +
(do_device?1:0) +
(do_other?4:0);
char** args_kokkos = new char*[nargs];
for(int i = 0; i < nargs; i++)
args_kokkos[i] = new char[20];
int threads_idx = do_other?1:0;
int numa_idx = (do_other?3:0) + (do_threads?1:0);
int device_idx = (do_other?3:0) + (do_threads?1:0) + (do_numa?1:0);
if(do_threads) {
int nthreads = 3;
#ifdef KOKKOS_HAVE_OPENMP
if(omp_get_max_threads() < 3)
nthreads = omp_get_max_threads();
#endif
if(Kokkos::hwloc::available()) {
if(Kokkos::hwloc::get_available_threads_per_core()<3)
nthreads = Kokkos::hwloc::get_available_threads_per_core()
* Kokkos::hwloc::get_available_numa_count();
}
#ifdef KOKKOS_HAVE_SERIAL
- if(Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
+ if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
+ std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
nthreads = 1;
}
#endif
init_args.num_threads = nthreads;
sprintf(args_kokkos[threads_idx],"--threads=%i",nthreads);
}
if(do_numa) {
int numa = 1;
if(Kokkos::hwloc::available())
numa = Kokkos::hwloc::get_available_numa_count();
#ifdef KOKKOS_HAVE_SERIAL
- if(Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
+ if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
+ std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
numa = 1;
}
#endif
init_args.num_numa = numa;
sprintf(args_kokkos[numa_idx],"--numa=%i",numa);
}
if(do_device) {
init_args.device_id = 0;
sprintf(args_kokkos[device_idx],"--device=%i",0);
}
if(do_other) {
sprintf(args_kokkos[0],"--dummyarg=1");
sprintf(args_kokkos[threads_idx+(do_threads?1:0)],"--dummy2arg");
sprintf(args_kokkos[threads_idx+(do_threads?1:0)+1],"dummy3arg");
sprintf(args_kokkos[device_idx+(do_device?1:0)],"dummy4arg=1");
}
return args_kokkos;
}
Kokkos::InitArguments init_initstruct(bool do_threads, bool do_numa, bool do_device) {
Kokkos::InitArguments args;
if(do_threads) {
int nthreads = 3;
#ifdef KOKKOS_HAVE_OPENMP
if(omp_get_max_threads() < 3)
nthreads = omp_get_max_threads();
#endif
if(Kokkos::hwloc::available()) {
if(Kokkos::hwloc::get_available_threads_per_core()<3)
nthreads = Kokkos::hwloc::get_available_threads_per_core()
* Kokkos::hwloc::get_available_numa_count();
}
#ifdef KOKKOS_HAVE_SERIAL
- if(Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
+ if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
+ std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
nthreads = 1;
}
#endif
args.num_threads = nthreads;
}
if(do_numa) {
int numa = 1;
if(Kokkos::hwloc::available())
numa = Kokkos::hwloc::get_available_numa_count();
#ifdef KOKKOS_HAVE_SERIAL
- if(Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- Kokkos::Impl::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
+ if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
+ std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
numa = 1;
}
#endif
args.num_numa = numa;
}
if(do_device) {
args.device_id = 0;
}
return args;
}
void check_correct_initialization(const Kokkos::InitArguments& argstruct) {
ASSERT_EQ( Kokkos::DefaultExecutionSpace::is_initialized(), 1);
ASSERT_EQ( Kokkos::HostSpace::execution_space::is_initialized(), 1);
//Figure out the number of threads the HostSpace ExecutionSpace should have initialized to
int expected_nthreads = argstruct.num_threads;
if(expected_nthreads<1) {
if(Kokkos::hwloc::available()) {
expected_nthreads = Kokkos::hwloc::get_available_numa_count()
* Kokkos::hwloc::get_available_cores_per_numa()
* Kokkos::hwloc::get_available_threads_per_core();
} else {
#ifdef KOKKOS_HAVE_OPENMP
- if(Kokkos::Impl::is_same<Kokkos::HostSpace::execution_space,Kokkos::OpenMP>::value) {
+ if(std::is_same<Kokkos::HostSpace::execution_space,Kokkos::OpenMP>::value) {
expected_nthreads = omp_get_max_threads();
} else
#endif
expected_nthreads = 1;
}
#ifdef KOKKOS_HAVE_SERIAL
- if(Kokkos::Impl::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
- Kokkos::Impl::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
+ if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
+ std::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
expected_nthreads = 1;
#endif
}
int expected_numa = argstruct.num_numa;
if(expected_numa<1) {
if(Kokkos::hwloc::available()) {
expected_numa = Kokkos::hwloc::get_available_numa_count();
} else {
expected_numa = 1;
}
#ifdef KOKKOS_HAVE_SERIAL
- if(Kokkos::Impl::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
- Kokkos::Impl::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
+ if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
+ std::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
expected_numa = 1;
#endif
}
ASSERT_EQ(Kokkos::HostSpace::execution_space::thread_pool_size(),expected_nthreads);
#ifdef KOKKOS_HAVE_CUDA
- if(Kokkos::Impl::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Cuda>::value) {
+ if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Cuda>::value) {
int device;
cudaGetDevice( &device );
int expected_device = argstruct.device_id;
if(argstruct.device_id<0) {
expected_device = 0;
}
ASSERT_EQ(expected_device,device);
}
#endif
}
//ToDo: Add check whether correct number of threads are actually started
void test_no_arguments() {
Kokkos::initialize();
check_correct_initialization(Kokkos::InitArguments());
Kokkos::finalize();
}
void test_commandline_args(int nargs, char** args, const Kokkos::InitArguments& argstruct) {
Kokkos::initialize(nargs,args);
check_correct_initialization(argstruct);
Kokkos::finalize();
}
void test_initstruct_args(const Kokkos::InitArguments& args) {
Kokkos::initialize(args);
check_correct_initialization(args);
Kokkos::finalize();
}
}
class defaultdevicetypeinit : public ::testing::Test {
protected:
static void SetUpTestCase()
{
}
static void TearDownTestCase()
{
}
};
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_01
TEST_F( defaultdevicetypeinit, no_args) {
Impl::test_no_arguments();
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_02
TEST_F( defaultdevicetypeinit, commandline_args_empty) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(false,false,false,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_03
TEST_F( defaultdevicetypeinit, commandline_args_other) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(false,false,false,true,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_04
TEST_F( defaultdevicetypeinit, commandline_args_nthreads) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(true,false,false,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_05
TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(true,true,false,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_06
TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(true,true,true,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_07
TEST_F( defaultdevicetypeinit, commandline_args_nthreads_device) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(true,false,true,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_08
TEST_F( defaultdevicetypeinit, commandline_args_numa_device) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(false,true,true,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_09
TEST_F( defaultdevicetypeinit, commandline_args_device) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(false,false,true,false,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_10
TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device_other) {
Kokkos::InitArguments argstruct;
int nargs = 0;
char** args = Impl::init_kokkos_args(true,true,true,true,nargs, argstruct);
Impl::test_commandline_args(nargs,args,argstruct);
for(int i = 0; i < nargs; i++)
delete [] args[i];
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_11
TEST_F( defaultdevicetypeinit, initstruct_default) {
Kokkos::InitArguments args;
Impl::test_initstruct_args(args);
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_12
TEST_F( defaultdevicetypeinit, initstruct_nthreads) {
Kokkos::InitArguments args = Impl::init_initstruct(true,false,false);
Impl::test_initstruct_args(args);
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_13
TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa) {
Kokkos::InitArguments args = Impl::init_initstruct(true,true,false);
Impl::test_initstruct_args(args);
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_14
TEST_F( defaultdevicetypeinit, initstruct_device) {
Kokkos::InitArguments args = Impl::init_initstruct(false,false,true);
Impl::test_initstruct_args(args);
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_15
TEST_F( defaultdevicetypeinit, initstruct_nthreads_device) {
Kokkos::InitArguments args = Impl::init_initstruct(true,false,true);
Impl::test_initstruct_args(args);
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_16
TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa_device) {
Kokkos::InitArguments args = Impl::init_initstruct(true,true,true);
Impl::test_initstruct_args(args);
}
#endif
} // namespace test
#endif
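Every change in the hunk above makes the same substitution: the compile-time type comparisons now use std::is_same from the standard <type_traits> header instead of the Kokkos::Impl::is_same trait used before. A minimal sketch of the pattern; the helper name default_thread_count is illustrative and not taken from the patch:

// When the Serial backend is the default (host) execution space, only one
// thread can be used, so the requested count is clamped to 1.
#include <type_traits>
#include <Kokkos_Core.hpp>

inline int default_thread_count( int requested )
{
#ifdef KOKKOS_HAVE_SERIAL
  if ( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
       std::is_same< Kokkos::Serial , Kokkos::DefaultHostExecutionSpace >::value ) {
    return 1 ;
  }
#endif
  return requested ;
}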
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
index c15f81223..185c1b791 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
@@ -1,76 +1,76 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
//----------------------------------------------------------------------------
#include <TestReduce.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( defaultdevicetype, reduce_instantiation_a) {
+ TestReduceCombinatoricalInstantiation<>::execute_a();
}
} // namespace test
#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_HBWAllocators.hpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
similarity index 78%
rename from lib/kokkos/core/src/impl/Kokkos_HBWAllocators.hpp
rename to lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
index be0134460..9aa540187 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HBWAllocators.hpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
@@ -1,75 +1,76 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_HBW_ALLOCATORS_HPP
-#define KOKKOS_HBW_ALLOCATORS_HPP
+#include <gtest/gtest.h>
-#ifdef KOKKOS_HAVE_HBWSPACE
+#include <Kokkos_Core.hpp>
-namespace Kokkos {
-namespace Experimental {
-namespace Impl {
+#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
+//----------------------------------------------------------------------------
+
+#include <TestReduce.hpp>
-/// class MallocAllocator
-class HBWMallocAllocator
-{
-public:
- static const char * name()
- {
- return "HBW Malloc Allocator";
- }
- static void* allocate(size_t size);
+namespace Test {
- static void deallocate(void * ptr, size_t size);
+class defaultdevicetype : public ::testing::Test {
+protected:
+ static void SetUpTestCase()
+ {
+ Kokkos::initialize();
+ }
- static void * reallocate(void * old_ptr, size_t old_size, size_t new_size);
+ static void TearDownTestCase()
+ {
+ Kokkos::finalize();
+ }
};
+
+TEST_F( defaultdevicetype, reduce_instantiation_b) {
+ TestReduceCombinatoricalInstantiation<>::execute_b();
}
-}
-} // namespace Kokkos::Impl
-#endif //KOKKOS_HAVE_HBWSPACE
-#endif //KOKKOS_HBW_ALLOCATORS_HPP
+} // namespace test
+#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
similarity index 95%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
index c15f81223..585658909 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
@@ -1,76 +1,76 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
//----------------------------------------------------------------------------
#include <TestReduce.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( defaultdevicetype, reduce_instantiation_c) {
+ TestReduceCombinatoricalInstantiation<>::execute_c();
}
} // namespace Test
#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
similarity index 97%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
copy to lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
index 1b1e0e673..2659b5c38 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
@@ -1,242 +1,237 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
//----------------------------------------------------------------------------
-#include <TestViewImpl.hpp>
#include <TestAtomic.hpp>
#include <TestViewAPI.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestCXX11.hpp>
#include <TestTeamVector.hpp>
+#include <TestUtilities.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, view_impl) {
- test_view_impl< Kokkos::DefaultExecutionSpace >();
-}
-
-TEST_F( defaultdevicetype, view_api) {
- TestViewAPI< double , Kokkos::DefaultExecutionSpace >();
+TEST_F( defaultdevicetype, test_utilities) {
+ test_utilities();
}
TEST_F( defaultdevicetype, long_reduce) {
TestReduce< long , Kokkos::DefaultExecutionSpace >( 100000 );
}
TEST_F( defaultdevicetype, double_reduce) {
TestReduce< double , Kokkos::DefaultExecutionSpace >( 100000 );
}
TEST_F( defaultdevicetype, long_reduce_dynamic ) {
TestReduceDynamic< long , Kokkos::DefaultExecutionSpace >( 100000 );
}
TEST_F( defaultdevicetype, double_reduce_dynamic ) {
TestReduceDynamic< double , Kokkos::DefaultExecutionSpace >( 100000 );
}
TEST_F( defaultdevicetype, long_reduce_dynamic_view ) {
TestReduceDynamicView< long , Kokkos::DefaultExecutionSpace >( 100000 );
}
TEST_F( defaultdevicetype , atomics )
{
const int loop_count = 1e4 ;
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,3) ) );
}
/*TEST_F( defaultdevicetype , view_remap )
{
enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
typedef Kokkos::View< double*[N1][N2][N3] ,
Kokkos::LayoutRight ,
Kokkos::DefaultExecutionSpace > output_type ;
typedef Kokkos::View< int**[N2][N3] ,
Kokkos::LayoutLeft ,
Kokkos::DefaultExecutionSpace > input_type ;
typedef Kokkos::View< int*[N0][N2][N3] ,
Kokkos::LayoutLeft ,
Kokkos::DefaultExecutionSpace > diff_type ;
output_type output( "output" , N0 );
input_type input ( "input" , N0 , N1 );
diff_type diff ( "diff" , N0 );
int value = 0 ;
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
input(i0,i1,i2,i3) = ++value ;
}}}}
// Kokkos::deep_copy( diff , input ); // throw with incompatible shape
Kokkos::deep_copy( output , input );
value = 0 ;
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
++value ;
ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
}}}}
}*/
//----------------------------------------------------------------------------
TEST_F( defaultdevicetype , view_aggregate )
{
TestViewAggregate< Kokkos::DefaultExecutionSpace >();
}
//----------------------------------------------------------------------------
TEST_F( defaultdevicetype , scan )
{
TestScan< Kokkos::DefaultExecutionSpace >::test_range( 1 , 1000 );
TestScan< Kokkos::DefaultExecutionSpace >( 1000000 );
TestScan< Kokkos::DefaultExecutionSpace >( 10000000 );
Kokkos::DefaultExecutionSpace::fence();
}
//----------------------------------------------------------------------------
TEST_F( defaultdevicetype , compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::DefaultExecutionSpace >() ) );
}
//----------------------------------------------------------------------------
TEST_F( defaultdevicetype , cxx11 )
{
ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(1) ) );
ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(2) ) );
ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(3) ) );
ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(4) ) );
}
TEST_F( defaultdevicetype , team_vector )
{
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(0) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(1) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(2) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(3) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(4) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(5) ) );
}
TEST_F( defaultdevicetype , malloc )
{
int* data = (int*) Kokkos::kokkos_malloc(100*sizeof(int));
ASSERT_NO_THROW(data = (int*) Kokkos::kokkos_realloc(data,120*sizeof(int)));
Kokkos::kokkos_free(data);
int* data2 = (int*) Kokkos::kokkos_malloc(0);
ASSERT_TRUE(data2==NULL);
Kokkos::kokkos_free(data2);
}
} // namespace Test
#endif
diff --git a/lib/kokkos/core/unit_test/TestMemoryPool.hpp b/lib/kokkos/core/unit_test/TestMemoryPool.hpp
index cf650b0bc..f83f390ac 100644
--- a/lib/kokkos/core/unit_test/TestMemoryPool.hpp
+++ b/lib/kokkos/core/unit_test/TestMemoryPool.hpp
@@ -1,820 +1,820 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_UNITTEST_MEMPOOL_HPP
#define KOKKOS_UNITTEST_MEMPOOL_HPP
#include <stdio.h>
#include <iostream>
#include <cmath>
#include <algorithm>
#include <impl/Kokkos_Timer.hpp>
//#define TESTMEMORYPOOL_PRINT
//#define TESTMEMORYPOOL_PRINT_STATUS
+#define STRIDE 1
#ifdef KOKKOS_HAVE_CUDA
-#define STRIDE 32
+#define STRIDE_ALLOC 32
#else
-#define STRIDE 1
+#define STRIDE_ALLOC 1
#endif
namespace TestMemoryPool {
struct pointer_obj {
uint64_t * ptr;
+
+ KOKKOS_INLINE_FUNCTION
+ pointer_obj() : ptr( 0 ) {}
};
struct pointer_obj2 {
void * ptr;
size_t size;
+
+ KOKKOS_INLINE_FUNCTION
+ pointer_obj2() : ptr( 0 ), size( 0 ) {}
};
template < typename PointerView, typename Allocator >
struct allocate_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
size_t m_chunk_size;
Allocator m_mempool;
allocate_memory( PointerView & ptrs, size_t num_ptrs,
size_t cs, Allocator & m )
: m_pointers( ptrs ), m_chunk_size( cs ), m_mempool( m )
{
    // Allocate a chunk for every pointer entry in parallel.
- Kokkos::parallel_for( num_ptrs * STRIDE, *this );
+ Kokkos::parallel_for( num_ptrs * STRIDE_ALLOC, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
- if ( i % STRIDE == 0 ) {
- m_pointers[i / STRIDE].ptr =
+ if ( i % STRIDE_ALLOC == 0 ) {
+ m_pointers[i / STRIDE_ALLOC].ptr =
static_cast< uint64_t * >( m_mempool.allocate( m_chunk_size ) );
}
}
};
template < typename PointerView >
struct count_invalid_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef uint64_t value_type;
PointerView m_pointers;
uint64_t & m_result;
count_invalid_memory( PointerView & ptrs, size_t num_ptrs, uint64_t & res )
: m_pointers( ptrs ), m_result( res )
{
    // Count, in parallel, how many pointers are still NULL (failed allocations).
Kokkos::parallel_reduce( num_ptrs * STRIDE, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
if ( i % STRIDE == 0 ) {
r += ( m_pointers[i / STRIDE].ptr == 0 );
}
}
};
template < typename PointerView >
struct fill_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
fill_memory( PointerView & ptrs, size_t num_ptrs ) : m_pointers( ptrs )
{
    // Write a unique value into every allocated chunk in parallel.
Kokkos::parallel_for( num_ptrs * STRIDE, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE == 0 ) {
*m_pointers[i / STRIDE].ptr = i / STRIDE ;
}
}
};
template < typename PointerView >
struct sum_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef uint64_t value_type;
PointerView m_pointers;
uint64_t & m_result;
sum_memory( PointerView & ptrs, size_t num_ptrs, uint64_t & res )
: m_pointers( ptrs ), m_result( res )
{
    // Sum, in parallel, the values stored in the allocated chunks.
Kokkos::parallel_reduce( num_ptrs * STRIDE, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
if ( i % STRIDE == 0 ) {
r += *m_pointers[i / STRIDE].ptr;
}
}
};
template < typename PointerView, typename Allocator >
struct deallocate_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
size_t m_chunk_size;
Allocator m_mempool;
deallocate_memory( PointerView & ptrs, size_t num_ptrs,
size_t cs, Allocator & m )
: m_pointers( ptrs ), m_chunk_size( cs ), m_mempool( m )
{
    // Return every chunk to the memory pool in parallel.
Kokkos::parallel_for( num_ptrs * STRIDE, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE == 0 ) {
m_mempool.deallocate( m_pointers[i / STRIDE].ptr, m_chunk_size );
}
}
};
template < typename WorkView, typename PointerView, typename ScalarView,
typename Allocator >
struct allocate_deallocate_memory {
typedef typename WorkView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
WorkView m_work;
PointerView m_pointers;
ScalarView m_ptrs_front;
ScalarView m_ptrs_back;
Allocator m_mempool;
allocate_deallocate_memory( WorkView & w, size_t work_size, PointerView & p,
ScalarView pf, ScalarView pb, Allocator & m )
: m_work( w ), m_pointers( p ), m_ptrs_front( pf ), m_ptrs_back( pb ),
m_mempool( m )
{
    // Process the work items in parallel: even entries allocate, odd entries deallocate.
- Kokkos::parallel_for( work_size * STRIDE, *this );
+ Kokkos::parallel_for( work_size * STRIDE_ALLOC, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
- if ( i % STRIDE == 0 ) {
- unsigned my_work = m_work[i / STRIDE];
+ if ( i % STRIDE_ALLOC == 0 ) {
+ unsigned my_work = m_work[i / STRIDE_ALLOC];
if ( ( my_work & 1 ) == 0 ) {
// Allocation.
size_t pos = Kokkos::atomic_fetch_add( &m_ptrs_back(), 1 );
size_t alloc_size = my_work >> 1;
m_pointers[pos].ptr = m_mempool.allocate( alloc_size );
m_pointers[pos].size = alloc_size;
}
else {
// Deallocation.
size_t pos = Kokkos::atomic_fetch_add( &m_ptrs_front(), 1 );
m_mempool.deallocate( m_pointers[pos].ptr, m_pointers[pos].size );
}
}
}
};
#define PRECISION 6
#define SHIFTW 24
#define SHIFTW2 12
template < typename F >
void print_results( const std::string & text, F elapsed_time )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< std::endl;
}
template < typename F, typename T >
void print_results( const std::string & text, unsigned long long width,
F elapsed_time, T result )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< " " << std::setw( width ) << result << std::endl;
}
template < typename F >
void print_results( const std::string & text, unsigned long long width,
F elapsed_time, const std::string & result )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< " " << std::setw( width ) << result << std::endl;
}
// This test stresses allocation and deallocation far harder than a real-world
// usage scenario in order to exercise thread safety: one loop in which all
// threads allocate, followed by a loop in which all threads deallocate.
// All of the allocation requests are for equal-sized chunks that are the base
// chunk size of the memory pool. It also tests initialization of the memory
// pool and breaking large chunks into smaller chunks to fulfill allocation
// requests. It verifies that MemoryPool(), allocate(), and deallocate() work
// correctly.
template < class Device >
bool test_mempool( size_t chunk_size, size_t total_size )
{
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< pointer_obj *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
- uint64_t result;
+ uint64_t result = 0;
size_t num_chunks = total_size / chunk_size;
bool return_val = true;
pointer_view pointers( "pointers", num_chunks );
#ifdef TESTMEMORYPOOL_PRINT
std::cout << "*** test_mempool() ***" << std::endl
<< std::setw( SHIFTW ) << "chunk_size: " << std::setw( 12 )
<< chunk_size << std::endl
<< std::setw( SHIFTW ) << "total_size: " << std::setw( 12 )
<< total_size << std::endl
<< std::setw( SHIFTW ) << "num_chunks: " << std::setw( 12 )
<< num_chunks << std::endl;
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), total_size * 1.2, 20 );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
count_invalid_memory< pointer_view > sm( pointers, num_chunks, result );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "invalid chunks: ", 16, elapsed_time, result );
timer.reset();
#endif
{
fill_memory< pointer_view > fm( pointers, num_chunks );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "fill chunks: ", elapsed_time );
timer.reset();
#endif
{
sum_memory< pointer_view > sm( pointers, num_chunks, result );
}
execution_space::fence();
#ifdef TESTMEMORYPOOL_PRINT
elapsed_time = timer.seconds();
print_results( "sum chunks: ", 16, elapsed_time, result );
#endif
if ( result != ( num_chunks * ( num_chunks - 1 ) ) / 2 ) {
std::cerr << "Invalid sum value in memory." << std::endl;
return_val = false;
}
#ifdef TESTMEMORYPOOL_PRINT
timer.reset();
#endif
{
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
count_invalid_memory< pointer_view > sm( pointers, num_chunks, result );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "invalid chunks: ", 16, elapsed_time, result );
timer.reset();
#endif
{
fill_memory< pointer_view > fm( pointers, num_chunks );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "fill chunks: ", elapsed_time );
timer.reset();
#endif
{
sum_memory< pointer_view > sm( pointers, num_chunks, result );
}
execution_space::fence();
#ifdef TESTMEMORYPOOL_PRINT
elapsed_time = timer.seconds();
print_results( "sum chunks: ", 16, elapsed_time, result );
#endif
if ( result != ( num_chunks * ( num_chunks - 1 ) ) / 2 ) {
std::cerr << "Invalid sum value in memory." << std::endl;
return_val = false;
}
#ifdef TESTMEMORYPOOL_PRINT
timer.reset();
#endif
{
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
return return_val;
}
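// Note on the expected sum (illustrative, not part of the upstream header):
// fill_memory writes the chunk index (0 .. num_chunks-1) into each chunk, so a
// correctly working pool yields sum = num_chunks * (num_chunks - 1) / 2, which
// is exactly the value the checks in test_mempool() compare against.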
template < typename T >
T smallest_power2_ge( T val )
{
// Find the most significant nonzero bit.
int first_nonzero_bit = Kokkos::Impl::bit_scan_reverse( val );
// If val is an integral power of 2, ceil( log2(val) ) is equal to the
// most significant nonzero bit. Otherwise, you need to add 1.
int lg2_size = first_nonzero_bit +
!Kokkos::Impl::is_integral_power_of_two( val );
return T(1) << T(lg2_size);
}
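// Worked example (illustrative): for val = 48, bit_scan_reverse(48) == 5 and
// 48 is not a power of two, so lg2_size == 6 and the result is 1 << 6 == 64.
// For val = 64, bit_scan_reverse(64) == 6, the power-of-two correction adds
// nothing, and the result is 64 itself.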
// This test makes allocation requests for multiple sizes and interleaves
// allocation and deallocation.
//
// There are 3 phases. The first phase does only allocations to build up a
// working state for the allocator. The second phase interleaves allocations
// and deletions. The third phase does only deallocations to undo all the
// allocations from the first phase. By first building up a working state,
// allocations and deallocations can happen in any order during the second
// phase. Each phase operates on multiple chunk sizes.
template < class Device >
void test_mempool2( unsigned base_chunk_size, size_t num_chunk_sizes,
size_t phase1_size, size_t phase2_size )
{
#ifdef TESTMEMORYPOOL_PRINT
typedef typename Device::execution_space execution_space;
#endif
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< unsigned *, device_type > work_view;
typedef Kokkos::View< size_t, device_type > scalar_view;
typedef Kokkos::View< pointer_obj2 *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
enum {
MIN_CHUNK_SIZE = 64,
MIN_BASE_CHUNK_SIZE = MIN_CHUNK_SIZE / 2 + 1
};
// Make sure the base chunk size is at least MIN_BASE_CHUNK_SIZE bytes, so
// all the different chunk sizes translate to different block sizes for the
// allocator.
if ( base_chunk_size < MIN_BASE_CHUNK_SIZE ) {
base_chunk_size = MIN_BASE_CHUNK_SIZE;
}
// Get the smallest power of 2 >= the base chunk size. The size must be
// >= MIN_CHUNK_SIZE, though.
unsigned ceil_base_chunk_size = smallest_power2_ge( base_chunk_size );
if ( ceil_base_chunk_size < MIN_CHUNK_SIZE ) {
ceil_base_chunk_size = MIN_CHUNK_SIZE;
}
  // Make sure the phase 1 size is a multiple of num_chunk_sizes.
phase1_size = ( ( phase1_size + num_chunk_sizes - 1 ) / num_chunk_sizes ) *
num_chunk_sizes;
  // Make sure the phase 2 size is a multiple of (2 * num_chunk_sizes).
phase2_size =
( ( phase2_size + 2 * num_chunk_sizes - 1 ) / ( 2 * num_chunk_sizes ) ) *
2 * num_chunk_sizes;
// The phase2 size must be <= twice the phase1 size so that deallocations
// can't happen before allocations.
if ( phase2_size > 2 * phase1_size ) phase2_size = 2 * phase1_size;
size_t phase3_size = phase1_size;
size_t half_phase2_size = phase2_size / 2;
// Each entry in the work views has the following format. The least
// significant bit indicates allocation (0) vs. deallocation (1). For
// allocation, the other bits indicate the desired allocation size.
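  // For example (illustrative): an allocation request for a 64-byte chunk is
  // encoded as (64 << 1) == 128, while every deallocation entry is simply 1;
  // the functor later recovers the size with (entry >> 1) and the kind with
  // (entry & 1).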
// Initialize the phase 1 work view with an equal number of allocations for
// each chunk size.
work_view phase1_work( "Phase 1 Work", phase1_size );
typename work_view::HostMirror host_phase1_work =
create_mirror_view(phase1_work);
size_t inner_size = phase1_size / num_chunk_sizes;
unsigned chunk_size = base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
for ( size_t j = 0; j < inner_size; ++j ) {
host_phase1_work[i * inner_size + j] = chunk_size << 1;
}
chunk_size *= 2;
}
std::random_shuffle( host_phase1_work.ptr_on_device(),
host_phase1_work.ptr_on_device() + phase1_size );
deep_copy( phase1_work, host_phase1_work );
// Initialize the phase 2 work view with half allocations and half
// deallocations with an equal number of allocations for each chunk size.
work_view phase2_work( "Phase 2 Work", phase2_size );
typename work_view::HostMirror host_phase2_work =
create_mirror_view(phase2_work);
inner_size = half_phase2_size / num_chunk_sizes;
chunk_size = base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
for ( size_t j = 0; j < inner_size; ++j ) {
host_phase2_work[i * inner_size + j] = chunk_size << 1;
}
chunk_size *= 2;
}
for ( size_t i = half_phase2_size; i < phase2_size; ++i ) {
host_phase2_work[i] = 1;
}
std::random_shuffle( host_phase2_work.ptr_on_device(),
host_phase2_work.ptr_on_device() + phase2_size );
deep_copy( phase2_work, host_phase2_work );
// Initialize the phase 3 work view with all deallocations.
work_view phase3_work( "Phase 3 Work", phase3_size );
typename work_view::HostMirror host_phase3_work =
create_mirror_view(phase3_work);
inner_size = phase3_size / num_chunk_sizes;
for ( size_t i = 0; i < phase3_size; ++i ) host_phase3_work[i] = 1;
deep_copy( phase3_work, host_phase3_work );
// Calculate the amount of memory needed for the allocator. We need to know
// the number of superblocks required for each chunk size and use that to
// calculate the amount of memory for each chunk size.
size_t lg_sb_size = 18;
size_t sb_size = 1 << lg_sb_size;
size_t total_size = 0;
size_t allocs_per_size = phase1_size / num_chunk_sizes +
half_phase2_size / num_chunk_sizes;
chunk_size = ceil_base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
size_t my_size = allocs_per_size * chunk_size;
total_size += ( my_size + sb_size - 1 ) / sb_size * sb_size;
chunk_size *= 2;
}
// Declare the queue to hold the records for allocated memory. An allocation
// adds a record to the back of the queue, and a deallocation removes a
// record from the front of the queue.
size_t num_allocations = phase1_size + half_phase2_size;
scalar_view ptrs_front( "Pointers front" );
scalar_view ptrs_back( "Pointers back" );
pointer_view pointers( "pointers", num_allocations );
#ifdef TESTMEMORYPOOL_PRINT
printf( "\n*** test_mempool2() ***\n" );
printf( " num_chunk_sizes: %12zu\n", num_chunk_sizes );
printf( " base_chunk_size: %12u\n", base_chunk_size );
printf( " ceil_base_chunk_size: %12u\n", ceil_base_chunk_size );
printf( " phase1_size: %12zu\n", phase1_size );
printf( " phase2_size: %12zu\n", phase2_size );
printf( " phase3_size: %12zu\n", phase3_size );
printf( " allocs_per_size: %12zu\n", allocs_per_size );
printf( " num_allocations: %12zu\n", num_allocations );
printf( " total_size: %12zu\n", total_size );
fflush( stdout );
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), total_size * 1.2, lg_sb_size );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase1_work, phase1_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase1: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase2_work, phase2_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase2: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase3_work, phase3_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase3: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
}
// Tests for correct behavior when the allocator is out of memory.
template < class Device >
void test_memory_exhaustion()
{
#ifdef TESTMEMORYPOOL_PRINT
typedef typename Device::execution_space execution_space;
#endif
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< pointer_obj *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
// The allocator will have a single superblock, and allocations will all be
// of the same chunk size. The allocation loop will attempt to allocate
// twice as many chunks as are available in the allocator. The
// deallocation loop will only free the successfully allocated chunks.
size_t chunk_size = 128;
size_t num_chunks = 128;
size_t half_num_chunks = num_chunks / 2;
size_t superblock_size = chunk_size * half_num_chunks;
size_t lg_superblock_size =
Kokkos::Impl::integral_power_of_two( superblock_size );
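  // With the values above (illustrative arithmetic): superblock_size is
  // 128 * 64 == 8192 bytes, so the single superblock can hold at most 64 of
  // the 128 requested 128-byte chunks; roughly half of the allocations are
  // expected to fail and leave NULL pointers behind.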
pointer_view pointers( "pointers", num_chunks );
#ifdef TESTMEMORYPOOL_PRINT
std::cout << "\n*** test_memory_exhaustion() ***" << std::endl;
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), superblock_size,
lg_superblock_size );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
    // When run in parallel, the successful allocations do not end up
    // contiguous in the pointers View. The whole View can still be looped
    // over and passed to deallocate(), because deallocate() simply does
    // nothing for NULL pointers.
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
}
}
-#ifdef TESTMEMORYPOOL_PRINT
#undef TESTMEMORYPOOL_PRINT
-#endif
-
-#ifdef TESTMEMORYPOOL_PRINT_STATUS
#undef TESTMEMORYPOOL_PRINT_STATUS
-#endif
-
-#ifdef STRIDE
#undef STRIDE
-#endif
+#undef STRIDE_ALLOC
#endif
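// ---------------------------------------------------------------------------
// Illustrative usage sketch (not part of the header itself): the three entry
// points above are typically driven from a backend-specific gtest file, for
// example with the OpenMP execution space and the parameters used by the
// (now removed) TestOpenMP_c.cpp:
//
//   TEST_F( openmp , memory_pool )
//   {
//     bool val = TestMemoryPool::test_mempool< Kokkos::OpenMP >( 128, 128000000 );
//     ASSERT_TRUE( val );
//
//     TestMemoryPool::test_mempool2< Kokkos::OpenMP >( 64, 4, 1000000, 2000000 );
//
//     TestMemoryPool::test_memory_exhaustion< Kokkos::OpenMP >();
//   }
// ---------------------------------------------------------------------------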
diff --git a/lib/kokkos/core/unit_test/TestOpenMP_c.cpp b/lib/kokkos/core/unit_test/TestOpenMP_c.cpp
deleted file mode 100644
index f0cdabe91..000000000
--- a/lib/kokkos/core/unit_test/TestOpenMP_c.cpp
+++ /dev/null
@@ -1,262 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemoryPool.hpp>
-#include <TestTaskPolicy.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-#include <TestPolicyConstruction.hpp>
-
-
-namespace Test {
-
-class openmp : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-TEST_F( openmp , view_remap )
-{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::OpenMP > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::OpenMP > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::OpenMP > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}
-
-//----------------------------------------------------------------------------
-
-
-TEST_F( openmp , view_aggregate )
-{
- TestViewAggregate< Kokkos::OpenMP >();
- TestViewAggregateReduction< Kokkos::OpenMP >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( openmp , scan )
-{
- TestScan< Kokkos::OpenMP >::test_range( 1 , 1000 );
- TestScan< Kokkos::OpenMP >( 1000000 );
- TestScan< Kokkos::OpenMP >( 10000000 );
- Kokkos::OpenMP::fence();
-}
-
-
-TEST_F( openmp , team_scan )
-{
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( openmp , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::OpenMP >() ) );
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( openmp , memory_space )
-{
- TestMemorySpace< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp , memory_pool )
-{
- bool val = TestMemoryPool::test_mempool< Kokkos::OpenMP >( 128, 128000000 );
- ASSERT_TRUE( val );
-
- TestMemoryPool::test_mempool2< Kokkos::OpenMP >( 64, 4, 1000000, 2000000 );
-
- TestMemoryPool::test_memory_exhaustion< Kokkos::OpenMP >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( openmp , template_meta_functions )
-{
- TestTemplateMetaFunctions<int, Kokkos::OpenMP >();
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
-TEST_F( openmp , cxx11 )
-{
- if ( Kokkos::Impl::is_same< Kokkos::DefaultExecutionSpace , Kokkos::OpenMP >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(4) ) );
- }
-}
-#endif
-
-TEST_F( openmp , reduction_deduction )
-{
- TestCXX11::test_reduction_deduction< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(10) ) );
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-TEST_F( openmp , task_fib )
-{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskPolicy::TestFib< Kokkos::OpenMP >::run(i, (i+1)*1000000 );
- }
-}
-
-TEST_F( openmp , task_depend )
-{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskPolicy::TestTaskDependence< Kokkos::OpenMP >::run(i);
- }
-}
-
-TEST_F( openmp , task_team )
-{
- TestTaskPolicy::TestTaskTeam< Kokkos::OpenMP >::run(1000);
- //TestTaskPolicy::TestTaskTeamValue< Kokkos::OpenMP >::run(1000); //TODO put back after testing
-}
-
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-
-
-} // namespace test
-
-
-
-
-
-
diff --git a/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp b/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
index 049138eb0..1bb45481c 100644
--- a/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
+++ b/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
@@ -1,493 +1,497 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
struct SomeTag{};
template< class ExecutionSpace >
class TestRangePolicyConstruction {
public:
TestRangePolicyConstruction() {
test_compile_time_parameters();
}
private:
void test_compile_time_parameters() {
+ {
+ Kokkos::Impl::expand_variadic();
+ Kokkos::Impl::expand_variadic(1,2,3);
+ }
{
typedef Kokkos::RangePolicy<> policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<ExecutionSpace> policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::IndexType<long>, ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,ExecutionSpace,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::RangePolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::RangePolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
}
};
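// Illustrative note (not part of the upstream test): the blocks above verify
// that RangePolicy accepts its template parameters in any order; for example
//
//   Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> >
//   Kokkos::RangePolicy< Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> >
//
// both report index_type == long and a dynamic schedule through their nested
// typedefs, which is exactly what the ASSERT_TRUE checks compare.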
template< class ExecutionSpace >
class TestTeamPolicyConstruction {
public:
TestTeamPolicyConstruction() {
test_compile_time_parameters();
test_run_time_parameters();
}
private:
void test_compile_time_parameters() {
{
typedef Kokkos::TeamPolicy<> policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,ExecutionSpace,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::TeamPolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,void >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
{
typedef Kokkos::TeamPolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
typedef typename policy_t::execution_space execution_space;
typedef typename policy_t::index_type index_type;
typedef typename policy_t::schedule_type schedule_type;
typedef typename policy_t::work_tag work_tag;
ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
ASSERT_TRUE((std::is_same<index_type ,long >::value));
ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
}
}
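// The cases above verify that TeamPolicy deduces its execution_space,
// schedule_type, index_type and work_tag from template arguments given in
// any order, falling back to Kokkos::DefaultExecutionSpace, Schedule<Static>,
// the execution space's size_type and void for whatever is omitted. A minimal
// illustrative sketch (the alias name below is hypothetical and not used by
// the tests):
typedef Kokkos::TeamPolicy< Kokkos::IndexType<long>
                          , ExecutionSpace
                          , Kokkos::Schedule<Kokkos::Dynamic> > example_policy_t; // arguments in arbitrary order
static_assert( std::is_same< typename example_policy_t::index_type , long >::value
             , "IndexType<long> overrides the default index type" );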
template<class policy_t>
void test_run_time_parameters_type() {
int league_size = 131;
int team_size = 4 < policy_t::execution_space::concurrency() ? 4 : policy_t::execution_space::concurrency();
int chunk_size = 4;
int per_team_scratch = 1024;
int per_thread_scratch = 16;
int scratch_size = per_team_scratch + per_thread_scratch*team_size;
policy_t p1(league_size,team_size);
ASSERT_EQ (p1.league_size() , league_size);
ASSERT_EQ (p1.team_size() , team_size);
ASSERT_TRUE(p1.chunk_size() > 0);
ASSERT_EQ (p1.scratch_size(0), 0);
policy_t p2 = p1.set_chunk_size(chunk_size);
ASSERT_EQ (p1.league_size() , league_size);
ASSERT_EQ (p1.team_size() , team_size);
ASSERT_TRUE(p1.chunk_size() > 0);
ASSERT_EQ (p1.scratch_size(0), 0);
ASSERT_EQ (p2.league_size() , league_size);
ASSERT_EQ (p2.team_size() , team_size);
ASSERT_EQ (p2.chunk_size() , chunk_size);
ASSERT_EQ (p2.scratch_size(0), 0);
policy_t p3 = p2.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch));
ASSERT_EQ (p2.league_size() , league_size);
ASSERT_EQ (p2.team_size() , team_size);
ASSERT_EQ (p2.chunk_size() , chunk_size);
ASSERT_EQ (p2.scratch_size(0), 0);
ASSERT_EQ (p3.league_size() , league_size);
ASSERT_EQ (p3.team_size() , team_size);
ASSERT_EQ (p3.chunk_size() , chunk_size);
ASSERT_EQ (p3.scratch_size(0), per_team_scratch);
policy_t p4 = p2.set_scratch_size(0,Kokkos::PerThread(per_thread_scratch));
ASSERT_EQ (p2.league_size() , league_size);
ASSERT_EQ (p2.team_size() , team_size);
ASSERT_EQ (p2.chunk_size() , chunk_size);
ASSERT_EQ (p2.scratch_size(0), 0);
ASSERT_EQ (p4.league_size() , league_size);
ASSERT_EQ (p4.team_size() , team_size);
ASSERT_EQ (p4.chunk_size() , chunk_size);
ASSERT_EQ (p4.scratch_size(0), per_thread_scratch*team_size);
policy_t p5 = p2.set_scratch_size(0,Kokkos::PerThread(per_thread_scratch),Kokkos::PerTeam(per_team_scratch));
ASSERT_EQ (p2.league_size() , league_size);
ASSERT_EQ (p2.team_size() , team_size);
ASSERT_EQ (p2.chunk_size() , chunk_size);
ASSERT_EQ (p2.scratch_size(0), 0);
ASSERT_EQ (p5.league_size() , league_size);
ASSERT_EQ (p5.team_size() , team_size);
ASSERT_EQ (p5.chunk_size() , chunk_size);
ASSERT_EQ (p5.scratch_size(0), scratch_size);
policy_t p6 = p2.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch),Kokkos::PerThread(per_thread_scratch));
ASSERT_EQ (p2.league_size() , league_size);
ASSERT_EQ (p2.team_size() , team_size);
ASSERT_EQ (p2.chunk_size() , chunk_size);
ASSERT_EQ (p2.scratch_size(0), 0);
ASSERT_EQ (p6.league_size() , league_size);
ASSERT_EQ (p6.team_size() , team_size);
ASSERT_EQ (p6.chunk_size() , chunk_size);
ASSERT_EQ (p6.scratch_size(0), scratch_size);
policy_t p7 = p3.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch),Kokkos::PerThread(per_thread_scratch));
ASSERT_EQ (p3.league_size() , league_size);
ASSERT_EQ (p3.team_size() , team_size);
ASSERT_EQ (p3.chunk_size() , chunk_size);
ASSERT_EQ (p3.scratch_size(0), per_team_scratch);
ASSERT_EQ (p7.league_size() , league_size);
ASSERT_EQ (p7.team_size() , team_size);
ASSERT_EQ (p7.chunk_size() , chunk_size);
ASSERT_EQ (p7.scratch_size(0), scratch_size);
}
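// set_chunk_size() and set_scratch_size() hand back an updated policy and, as
// asserted above, leave the policy they were called on untouched. A minimal
// sketch of how such a policy is typically fed to a kernel launch (the functor
// and the sizes are hypothetical):
template< class policy_type , class Functor >
static void example_scratch_launch( const Functor & f )
{
  policy_type p( 131 , 4 );                    // league_size, team_size
  p = p.set_chunk_size( 4 );
  p = p.set_scratch_size( 0 , Kokkos::PerTeam(1024) , Kokkos::PerThread(16) );
  Kokkos::parallel_for( p , f );               // each team sees 1024 + 16 * team_size bytes of level-0 scratch
}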
void test_run_time_parameters() {
test_run_time_parameters_type<Kokkos::TeamPolicy<ExecutionSpace> >();
test_run_time_parameters_type<Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > >();
test_run_time_parameters_type<Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > >();
test_run_time_parameters_type<Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace,SomeTag > >();
}
};
diff --git a/lib/kokkos/core/unit_test/TestQthread.cpp b/lib/kokkos/core/unit_test/TestQthread.cpp
index 431b844c9..a465f39ca 100644
--- a/lib/kokkos/core/unit_test/TestQthread.cpp
+++ b/lib/kokkos/core/unit_test/TestQthread.cpp
@@ -1,290 +1,287 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <Kokkos_Qthread.hpp>
-#include <Qthread/Kokkos_Qthread_TaskPolicy.hpp>
-
//----------------------------------------------------------------------------
-#include <TestViewImpl.hpp>
#include <TestAtomic.hpp>
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestTeam.hpp>
#include <TestRange.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
-#include <TestTaskPolicy.hpp>
+#include <TestTaskScheduler.hpp>
// #include <TestTeamVector.hpp>
namespace Test {
class qthread : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
int threads_count = std::max( 1u , numa_count )
* std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
Kokkos::Qthread::initialize( threads_count );
Kokkos::Qthread::print_configuration( std::cout , true );
}
static void TearDownTestCase()
{
Kokkos::Qthread::finalize();
}
};
TEST_F( qthread , compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Qthread >() ) );
}
TEST_F( qthread, view_impl) {
test_view_impl< Kokkos::Qthread >();
}
TEST_F( qthread, view_api) {
TestViewAPI< double , Kokkos::Qthread >();
}
TEST_F( qthread , view_nested_view )
{
::Test::view_nested_view< Kokkos::Qthread >();
}
TEST_F( qthread , range_tag )
{
TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
}
TEST_F( qthread , team_tag )
{
TestTeamPolicy< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
TestTeamPolicy< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
}
TEST_F( qthread, long_reduce) {
TestReduce< long , Kokkos::Qthread >( 1000000 );
}
TEST_F( qthread, double_reduce) {
TestReduce< double , Kokkos::Qthread >( 1000000 );
}
TEST_F( qthread, long_reduce_dynamic ) {
TestReduceDynamic< long , Kokkos::Qthread >( 1000000 );
}
TEST_F( qthread, double_reduce_dynamic ) {
TestReduceDynamic< double , Kokkos::Qthread >( 1000000 );
}
TEST_F( qthread, long_reduce_dynamic_view ) {
TestReduceDynamicView< long , Kokkos::Qthread >( 1000000 );
}
TEST_F( qthread, team_long_reduce) {
TestReduceTeam< long , Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 1000000 );
}
TEST_F( qthread, team_double_reduce) {
TestReduceTeam< double , Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 1000000 );
}
TEST_F( qthread , atomics )
{
const int loop_count = 1e4 ;
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,3) ) );
#if defined( KOKKOS_ENABLE_ASM )
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,3) ) );
#endif
}
TEST_F( qthread , view_remap )
{
enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
typedef Kokkos::View< double*[N1][N2][N3] ,
Kokkos::LayoutRight ,
Kokkos::Qthread > output_type ;
typedef Kokkos::View< int**[N2][N3] ,
Kokkos::LayoutLeft ,
Kokkos::Qthread > input_type ;
typedef Kokkos::View< int*[N0][N2][N3] ,
Kokkos::LayoutLeft ,
Kokkos::Qthread > diff_type ;
output_type output( "output" , N0 );
input_type input ( "input" , N0 , N1 );
diff_type diff ( "diff" , N0 );
int value = 0 ;
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
input(i0,i1,i2,i3) = ++value ;
}}}}
// Kokkos::deep_copy( diff , input ); // throw with incompatible shape
Kokkos::deep_copy( output , input );
value = 0 ;
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
++value ;
ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
}}}}
}
//----------------------------------------------------------------------------
TEST_F( qthread , view_aggregate )
{
TestViewAggregate< Kokkos::Qthread >();
}
//----------------------------------------------------------------------------
TEST_F( qthread , scan )
{
TestScan< Kokkos::Qthread >::test_range( 1 , 1000 );
TestScan< Kokkos::Qthread >( 1000000 );
TestScan< Kokkos::Qthread >( 10000000 );
Kokkos::Qthread::fence();
}
TEST_F( qthread, team_shared ) {
TestSharedTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >();
}
TEST_F( qthread, shmem_size) {
TestShmemSize< Kokkos::Qthread >();
}
TEST_F( qthread , team_scan )
{
TestScanTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 10 );
TestScanTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 10000 );
}
#if 0 /* disable */
TEST_F( qthread , team_vector )
{
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(0) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(1) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(2) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(3) ) );
ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(4) ) );
}
#endif
//----------------------------------------------------------------------------
TEST_F( qthread , task_policy )
{
- TestTaskPolicy::test_task_dep< Kokkos::Qthread >( 10 );
- for ( long i = 0 ; i < 25 ; ++i ) TestTaskPolicy::test_fib< Kokkos::Qthread >(i);
- for ( long i = 0 ; i < 35 ; ++i ) TestTaskPolicy::test_fib2< Kokkos::Qthread >(i);
+ TestTaskScheduler::test_task_dep< Kokkos::Qthread >( 10 );
+ for ( long i = 0 ; i < 25 ; ++i ) TestTaskScheduler::test_fib< Kokkos::Qthread >(i);
+ for ( long i = 0 ; i < 35 ; ++i ) TestTaskScheduler::test_fib2< Kokkos::Qthread >(i);
}
TEST_F( qthread , task_team )
{
- TestTaskPolicy::test_task_team< Kokkos::Qthread >(1000);
+ TestTaskScheduler::test_task_team< Kokkos::Qthread >(1000);
}
//----------------------------------------------------------------------------
} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestRange.hpp b/lib/kokkos/core/unit_test/TestRange.hpp
index be8b4f90a..e342e844c 100644
--- a/lib/kokkos/core/unit_test/TestRange.hpp
+++ b/lib/kokkos/core/unit_test/TestRange.hpp
@@ -1,242 +1,242 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
namespace {
template< class ExecSpace, class ScheduleType >
struct TestRange {
typedef int value_type ; ///< typedef required for the parallel_reduce
typedef Kokkos::View<int*,ExecSpace> view_type ;
view_type m_flags ;
struct VerifyInitTag {};
struct ResetTag {};
struct VerifyResetTag {};
TestRange( const size_t N )
: m_flags( Kokkos::ViewAllocateWithoutInitializing("flags"), N )
{}
static void test_for( const size_t N )
{
TestRange functor(N);
typename view_type::HostMirror host_flags = Kokkos::create_mirror_view( functor.m_flags );
Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType,VerifyInitTag>(0,N) , functor );
Kokkos::deep_copy( host_flags , functor.m_flags );
size_t error_count = 0 ;
for ( size_t i = 0 ; i < N ; ++i ) {
if ( int(i) != host_flags(i) ) ++error_count ;
}
ASSERT_EQ( error_count , size_t(0) );
Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType,ResetTag>(0,N) , functor );
Kokkos::parallel_for( std::string("TestKernelFor") , Kokkos::RangePolicy<ExecSpace,ScheduleType,VerifyResetTag>(0,N) , functor );
Kokkos::deep_copy( host_flags , functor.m_flags );
error_count = 0 ;
for ( size_t i = 0 ; i < N ; ++i ) {
if ( int(2*i) != host_flags(i) ) ++error_count ;
}
ASSERT_EQ( error_count , size_t(0) );
}
KOKKOS_INLINE_FUNCTION
void operator()( const int i ) const
{ m_flags(i) = i ; }
KOKKOS_INLINE_FUNCTION
void operator()( const VerifyInitTag & , const int i ) const
{ if ( i != m_flags(i) ) { printf("TestRange::test_for error at %d != %d\n",i,m_flags(i)); } }
KOKKOS_INLINE_FUNCTION
void operator()( const ResetTag & , const int i ) const
{ m_flags(i) = 2 * m_flags(i); }
KOKKOS_INLINE_FUNCTION
void operator()( const VerifyResetTag & , const int i ) const
{ if ( 2 * i != m_flags(i) ) { printf("TestRange::test_for error at %d != %d\n",i,m_flags(i)); } }
//----------------------------------------
struct OffsetTag {};
static void test_reduce( const size_t N )
{
TestRange functor(N);
int total = 0 ;
Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
Kokkos::parallel_reduce( "TestKernelReduce" , Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor , total );
// sum( 0 .. N-1 )
ASSERT_EQ( size_t((N-1)*(N)/2) , size_t(total) );
Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,ScheduleType,OffsetTag>(0,N) , functor , total );
// sum( 1 .. N )
ASSERT_EQ( size_t((N)*(N+1)/2) , size_t(total) );
}
KOKKOS_INLINE_FUNCTION
void operator()( const int i , value_type & update ) const
{ update += m_flags(i); }
KOKKOS_INLINE_FUNCTION
void operator()( const OffsetTag & , const int i , value_type & update ) const
{ update += 1 + m_flags(i); }
//----------------------------------------
static void test_scan( const size_t N )
{
TestRange functor(N);
Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
Kokkos::parallel_scan( "TestKernelScan" , Kokkos::RangePolicy<ExecSpace,ScheduleType,OffsetTag>(0,N) , functor );
}
KOKKOS_INLINE_FUNCTION
void operator()( const OffsetTag & , const int i , value_type & update , bool final ) const
{
update += m_flags(i);
if ( final ) {
if ( update != (i*(i+1))/2 ) {
printf("TestRange::test_scan error %d : %d != %d\n",i,(i*(i+1))/2,m_flags(i));
}
}
}
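// Note on the scan contract exercised above: parallel_scan may call the
// operator more than once per index, but only when 'final' is true does
// 'update' enter the call already holding the sum over indices 0..i-1, so
// after adding m_flags(i) (which test_scan has set to i) it must equal the
// inclusive prefix sum i*(i+1)/2 that the check compares against.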
static void test_dynamic_policy( const size_t N ) {
typedef Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
{
Kokkos::View<size_t*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > count("Count",ExecSpace::concurrency());
Kokkos::View<int*,ExecSpace> a("A",N);
Kokkos::parallel_for( policy_t(0,N),
KOKKOS_LAMBDA (const typename policy_t::member_type& i) {
for(int k=0; k<(i<N/2?1:10000); k++ )
a(i)++;
count(ExecSpace::hardware_thread_id())++;
});
int error = 0;
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N), KOKKOS_LAMBDA(const typename policy_t::member_type& i, int& lsum) {
lsum += ( a(i)!= (i<N/2?1:10000) );
},error);
ASSERT_EQ(error,0);
- if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<const size_t>(4*ExecSpace::concurrency())) ) {
+ if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<size_t>(4*ExecSpace::concurrency())) ) {
size_t min = N;
size_t max = 0;
for(int t=0; t<ExecSpace::concurrency(); t++) {
if(count(t)<min) min = count(t);
if(count(t)>max) max = count(t);
}
ASSERT_TRUE(min<max);
//if(ExecSpace::concurrency()>2)
// ASSERT_TRUE(2*min<max);
}
}
{
Kokkos::View<size_t*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > count("Count",ExecSpace::concurrency());
Kokkos::View<int*,ExecSpace> a("A",N);
int sum = 0;
Kokkos::parallel_reduce( policy_t(0,N),
KOKKOS_LAMBDA (const typename policy_t::member_type& i, int& lsum) {
for(int k=0; k<(i<N/2?1:10000); k++ )
a(i)++;
count(ExecSpace::hardware_thread_id())++;
lsum++;
},sum);
ASSERT_EQ(sum,N);
int error = 0;
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N), KOKKOS_LAMBDA(const typename policy_t::member_type& i, int& lsum) {
lsum += ( a(i)!= (i<N/2?1:10000) );
},error);
ASSERT_EQ(error,0);
- if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<const size_t>(4*ExecSpace::concurrency())) ) {
+ if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<size_t>(4*ExecSpace::concurrency())) ) {
size_t min = N;
size_t max = 0;
for(int t=0; t<ExecSpace::concurrency(); t++) {
if(count(t)<min) min = count(t);
if(count(t)>max) max = count(t);
}
ASSERT_TRUE(min<max);
//if(ExecSpace::concurrency()>2)
// ASSERT_TRUE(2*min<max);
}
}
}
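// What the two dynamic-schedule blocks above check: iterations in the upper
// half of the range do roughly 10000x more inner-loop work, so with
// Schedule<Kokkos::Dynamic> the threads that draw cheap chunks keep pulling
// new ones while the others are still busy, and the per-thread iteration
// counts recorded in 'count' end up unequal (min < max). A static schedule
// would hand every thread the same number of iterations regardless of cost.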
};
} /* namespace */
} /* namespace Test */
/*--------------------------------------------------------------------------*/
diff --git a/lib/kokkos/core/unit_test/TestReduce.hpp b/lib/kokkos/core/unit_test/TestReduce.hpp
index 53fc393bc..a15fab17a 100644
--- a/lib/kokkos/core/unit_test/TestReduce.hpp
+++ b/lib/kokkos/core/unit_test/TestReduce.hpp
@@ -1,1872 +1,1907 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <limits>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
template< typename ScalarType , class DeviceType >
class ReduceFunctor
{
public:
typedef DeviceType execution_space ;
typedef typename execution_space::size_type size_type ;
struct value_type {
ScalarType value[3] ;
};
const size_type nwork ;
ReduceFunctor( const size_type & arg_nwork ) : nwork( arg_nwork ) {}
ReduceFunctor( const ReduceFunctor & rhs )
: nwork( rhs.nwork ) {}
/*
KOKKOS_INLINE_FUNCTION
void init( value_type & dst ) const
{
dst.value[0] = 0 ;
dst.value[1] = 0 ;
dst.value[2] = 0 ;
}
*/
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst ,
const volatile value_type & src ) const
{
dst.value[0] += src.value[0] ;
dst.value[1] += src.value[1] ;
dst.value[2] += src.value[2] ;
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type iwork , value_type & dst ) const
{
dst.value[0] += 1 ;
dst.value[1] += iwork + 1 ;
dst.value[2] += nwork - iwork ;
}
};
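// ReduceFunctor above is the classic custom-value-type reduction pattern: the
// nested value_type aggregates three partial sums, join() combines per-thread
// contributions, and the commented-out init() can be omitted because default
// value-initialization of the struct already yields zeros (the tests below
// rely on exactly that). A minimal usage sketch (the function name is
// illustrative and not used by the tests):
template< class DeviceType >
inline void example_struct_reduce( const size_t nwork )
{
  typename ReduceFunctor< long , DeviceType >::value_type result ;
  Kokkos::parallel_reduce( nwork , ReduceFunctor< long , DeviceType >( nwork ) , result );
  // result.value[0] == nwork ; result.value[1] and result.value[2] both equal sum(1..nwork)
}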
template< class DeviceType >
class ReduceFunctorFinal : public ReduceFunctor< long , DeviceType > {
public:
typedef typename ReduceFunctor< long , DeviceType >::value_type value_type ;
ReduceFunctorFinal( const size_t n )
: ReduceFunctor<long,DeviceType>(n)
{}
KOKKOS_INLINE_FUNCTION
void final( value_type & dst ) const
{
dst.value[0] = - dst.value[0] ;
dst.value[1] = - dst.value[1] ;
dst.value[2] = - dst.value[2] ;
}
};
template< typename ScalarType , class DeviceType >
class RuntimeReduceFunctor
{
public:
// Required for functor:
typedef DeviceType execution_space ;
typedef ScalarType value_type[] ;
const unsigned value_count ;
// Unit test details:
typedef typename execution_space::size_type size_type ;
const size_type nwork ;
RuntimeReduceFunctor( const size_type arg_nwork ,
const size_type arg_count )
: value_count( arg_count )
, nwork( arg_nwork ) {}
KOKKOS_INLINE_FUNCTION
void init( ScalarType dst[] ) const
{
for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] = 0 ;
}
KOKKOS_INLINE_FUNCTION
void join( volatile ScalarType dst[] ,
const volatile ScalarType src[] ) const
{
for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] += src[i] ;
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type iwork , ScalarType dst[] ) const
{
const size_type tmp[3] = { 1 , iwork + 1 , nwork - iwork };
for ( size_type i = 0 ; i < value_count ; ++i ) {
dst[i] += tmp[ i % 3 ];
}
}
};
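// RuntimeReduceFunctor above shows the runtime-length array-reduction pattern:
// value_type is declared as ScalarType[] and the functor exposes value_count,
// so the library knows how many entries to allocate, initialize and join. A
// minimal usage sketch mirroring the raw-pointer result form used by the
// tests below (names and the fixed buffer size are illustrative):
template< class DeviceType >
inline void example_array_reduce( const size_t nwork , const unsigned count )
{
  long result[ 32 ] = { 0 };    // sketch assumes count <= 32
  Kokkos::parallel_reduce( nwork ,
                           RuntimeReduceFunctor< long , DeviceType >( nwork , count ) ,
                           result );
}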
template< typename ScalarType , class DeviceType >
class RuntimeReduceMinMax
{
public:
// Required for functor:
typedef DeviceType execution_space ;
typedef ScalarType value_type[] ;
const unsigned value_count ;
// Unit test details:
typedef typename execution_space::size_type size_type ;
const size_type nwork ;
const ScalarType amin ;
const ScalarType amax ;
RuntimeReduceMinMax( const size_type arg_nwork ,
const size_type arg_count )
: value_count( arg_count )
, nwork( arg_nwork )
, amin( std::numeric_limits<ScalarType>::min() )
, amax( std::numeric_limits<ScalarType>::max() )
{}
KOKKOS_INLINE_FUNCTION
void init( ScalarType dst[] ) const
{
for ( unsigned i = 0 ; i < value_count ; ++i ) {
dst[i] = i % 2 ? amax : amin ;
}
}
KOKKOS_INLINE_FUNCTION
void join( volatile ScalarType dst[] ,
const volatile ScalarType src[] ) const
{
for ( unsigned i = 0 ; i < value_count ; ++i ) {
dst[i] = i % 2 ? ( dst[i] < src[i] ? dst[i] : src[i] ) // min
: ( dst[i] > src[i] ? dst[i] : src[i] ); // max
}
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type iwork , ScalarType dst[] ) const
{
const ScalarType tmp[2] = { ScalarType(iwork + 1)
, ScalarType(nwork - iwork) };
for ( size_type i = 0 ; i < value_count ; ++i ) {
dst[i] = i % 2 ? ( dst[i] < tmp[i%2] ? dst[i] : tmp[i%2] )
: ( dst[i] > tmp[i%2] ? dst[i] : tmp[i%2] );
}
}
};
template< class DeviceType >
class RuntimeReduceFunctorFinal : public RuntimeReduceFunctor< long , DeviceType > {
public:
typedef RuntimeReduceFunctor< long , DeviceType > base_type ;
typedef typename base_type::value_type value_type ;
typedef long scalar_type ;
RuntimeReduceFunctorFinal( const size_t theNwork , const size_t count ) : base_type(theNwork,count) {}
KOKKOS_INLINE_FUNCTION
void final( value_type dst ) const
{
for ( unsigned i = 0 ; i < base_type::value_count ; ++i ) {
dst[i] = - dst[i] ;
}
}
};
} // namespace Test
namespace {
template< typename ScalarType , class DeviceType >
class TestReduce
{
public:
typedef DeviceType execution_space ;
typedef typename execution_space::size_type size_type ;
//------------------------------------
TestReduce( const size_type & nwork )
{
run_test(nwork);
run_test_final(nwork);
}
void run_test( const size_type & nwork )
{
typedef Test::ReduceFunctor< ScalarType , execution_space > functor_type ;
typedef typename functor_type::value_type value_type ;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
const unsigned long nw = nwork ;
const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
: (nw/2) * ( nw + 1 );
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
Kokkos::parallel_reduce( nwork , functor_type(nwork) , result[i] );
}
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
for ( unsigned j = 0 ; j < Count ; ++j ) {
const unsigned long correct = 0 == j % 3 ? nw : nsum ;
ASSERT_EQ( (ScalarType) correct , result[i].value[j] );
}
}
}
void run_test_final( const size_type & nwork )
{
typedef Test::ReduceFunctorFinal< execution_space > functor_type ;
typedef typename functor_type::value_type value_type ;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
const unsigned long nw = nwork ;
const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
: (nw/2) * ( nw + 1 );
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
if(i%2==0)
Kokkos::parallel_reduce( nwork , functor_type(nwork) , result[i] );
else
Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork) , result[i] );
}
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
for ( unsigned j = 0 ; j < Count ; ++j ) {
const unsigned long correct = 0 == j % 3 ? nw : nsum ;
ASSERT_EQ( (ScalarType) correct , - result[i].value[j] );
}
}
}
};
template< typename ScalarType , class DeviceType >
class TestReduceDynamic
{
public:
typedef DeviceType execution_space ;
typedef typename execution_space::size_type size_type ;
//------------------------------------
TestReduceDynamic( const size_type nwork )
{
run_test_dynamic(nwork);
run_test_dynamic_minmax(nwork);
run_test_dynamic_final(nwork);
}
void run_test_dynamic( const size_type nwork )
{
typedef Test::RuntimeReduceFunctor< ScalarType , execution_space > functor_type ;
enum { Count = 3 };
enum { Repeat = 100 };
ScalarType result[ Repeat ][ Count ] ;
const unsigned long nw = nwork ;
const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
: (nw/2) * ( nw + 1 );
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
if(i%2==0)
Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
else
Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork,Count) , result[i] );
}
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
for ( unsigned j = 0 ; j < Count ; ++j ) {
const unsigned long correct = 0 == j % 3 ? nw : nsum ;
ASSERT_EQ( (ScalarType) correct , result[i][j] );
}
}
}
void run_test_dynamic_minmax( const size_type nwork )
{
typedef Test::RuntimeReduceMinMax< ScalarType , execution_space > functor_type ;
enum { Count = 2 };
enum { Repeat = 100 };
ScalarType result[ Repeat ][ Count ] ;
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
if(i%2==0)
Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
else
Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork,Count) , result[i] );
}
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = j % 2 ? 1 : nwork ;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ if ( nwork == 0 )
+ {
+ ScalarType amin( std::numeric_limits<ScalarType>::min() );
+ ScalarType amax( std::numeric_limits<ScalarType>::max() );
+ const ScalarType correct = (j%2) ? amax : amin;
+ ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ } else {
+ const unsigned long correct = j % 2 ? 1 : nwork ;
+ ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ }
}
}
}
void run_test_dynamic_final( const size_type nwork )
{
typedef Test::RuntimeReduceFunctorFinal< execution_space > functor_type ;
enum { Count = 3 };
enum { Repeat = 100 };
typename functor_type::scalar_type result[ Repeat ][ Count ] ;
const unsigned long nw = nwork ;
const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
: (nw/2) * ( nw + 1 );
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
if(i%2==0)
Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
else
Kokkos::parallel_reduce( "TestKernelReduce" , nwork , functor_type(nwork,Count) , result[i] );
}
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
for ( unsigned j = 0 ; j < Count ; ++j ) {
const unsigned long correct = 0 == j % 3 ? nw : nsum ;
ASSERT_EQ( (ScalarType) correct , - result[i][j] );
}
}
}
};
template< typename ScalarType , class DeviceType >
class TestReduceDynamicView
{
public:
typedef DeviceType execution_space ;
typedef typename execution_space::size_type size_type ;
//------------------------------------
TestReduceDynamicView( const size_type nwork )
{
run_test_dynamic_view(nwork);
}
void run_test_dynamic_view( const size_type nwork )
{
typedef Test::RuntimeReduceFunctor< ScalarType , execution_space > functor_type ;
typedef Kokkos::View< ScalarType* , DeviceType > result_type ;
typedef typename result_type::HostMirror result_host_type ;
const unsigned CountLimit = 23 ;
const unsigned long nw = nwork ;
const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
: (nw/2) * ( nw + 1 );
for ( unsigned count = 0 ; count < CountLimit ; ++count ) {
result_type result("result",count);
result_host_type host_result = Kokkos::create_mirror( result );
// Test result to host pointer:
std::string str("TestKernelReduce");
if(count%2==0)
Kokkos::parallel_reduce( nw , functor_type(nw,count) , host_result.ptr_on_device() );
else
Kokkos::parallel_reduce( str , nw , functor_type(nw,count) , host_result.ptr_on_device() );
for ( unsigned j = 0 ; j < count ; ++j ) {
const unsigned long correct = 0 == j % 3 ? nw : nsum ;
ASSERT_EQ( host_result(j), (ScalarType) correct );
host_result(j) = 0 ;
}
}
}
};
}
// Computes y^T*A*x
// (modified from kokkos-tutorials/GTC2016/Exercises/ThreeLevelPar )
#if ( ! defined( KOKKOS_HAVE_CUDA ) ) || defined( KOKKOS_CUDA_USE_LAMBDA )
template< typename ScalarType , class DeviceType >
class TestTripleNestedReduce
{
public:
typedef DeviceType execution_space ;
typedef typename execution_space::size_type size_type ;
//------------------------------------
- TestTripleNestedReduce( const size_type & nrows , const size_type & ncols
+ TestTripleNestedReduce( const size_type & nrows , const size_type & ncols
, const size_type & team_size , const size_type & vector_length )
{
run_test( nrows , ncols , team_size, vector_length );
}
- void run_test( const size_type & nrows , const size_type & ncols
+ void run_test( const size_type & nrows , const size_type & ncols
, const size_type & team_size, const size_type & vector_length )
{
//typedef Kokkos::LayoutLeft Layout;
typedef Kokkos::LayoutRight Layout;
typedef Kokkos::View<ScalarType* , DeviceType> ViewVector;
typedef Kokkos::View<ScalarType** , Layout , DeviceType> ViewMatrix;
ViewVector y( "y" , nrows );
ViewVector x( "x" , ncols );
ViewMatrix A( "A" , nrows , ncols );
typedef Kokkos::RangePolicy<DeviceType> range_policy;
// Initialize y vector
Kokkos::parallel_for( range_policy( 0 , nrows ) , KOKKOS_LAMBDA( const int i ) { y( i ) = 1; } );
// Initialize x vector
Kokkos::parallel_for( range_policy( 0 , ncols ) , KOKKOS_LAMBDA( const int i ) { x( i ) = 1; } );
typedef Kokkos::TeamPolicy<DeviceType> team_policy;
typedef typename Kokkos::TeamPolicy<DeviceType>::member_type member_type;
// Initialize A matrix, note 2D indexing computation
Kokkos::parallel_for( team_policy( nrows , Kokkos::AUTO ) , KOKKOS_LAMBDA( const member_type& teamMember ) {
const int j = teamMember.league_rank();
Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember , ncols ) , [&] ( const int i ) {
A( j , i ) = 1;
} );
} );
- // Three level parallelism kernel to force caching of vector x
+ // Three level parallelism kernel to force caching of vector x
ScalarType result = 0.0;
int chunk_size = 128;
Kokkos::parallel_reduce( team_policy( nrows/chunk_size , team_size , vector_length ) , KOKKOS_LAMBDA ( const member_type& teamMember , double &update ) {
const int row_start = teamMember.league_rank() * chunk_size;
const int row_end = row_start + chunk_size;
Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember , row_start , row_end ) , [&] ( const int i ) {
ScalarType sum_i = 0.0;
Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( teamMember , ncols ) , [&] ( const int j , ScalarType &innerUpdate ) {
innerUpdate += A( i , j ) * x( j );
} , sum_i );
Kokkos::single( Kokkos::PerThread( teamMember ) , [&] () {
update += y( i ) * sum_i;
} );
} );
} , result );
const ScalarType solution = ( ScalarType ) nrows * ( ScalarType ) ncols;
ASSERT_EQ( solution , result );
}
};
#else /* #if ( ! defined( KOKKOS_HAVE_CUDA ) ) || defined( KOKKOS_CUDA_USE_LAMBDA ) */
template< typename ScalarType , class DeviceType >
class TestTripleNestedReduce
{
public:
typedef DeviceType execution_space ;
typedef typename execution_space::size_type size_type ;
- TestTripleNestedReduce( const size_type & , const size_type
+ TestTripleNestedReduce( const size_type & , const size_type
, const size_type & , const size_type )
{ }
};
#endif
//--------------------------------------------------------------------------
namespace Test {
namespace ReduceCombinatorical {
template<class Scalar,class Space = Kokkos::HostSpace>
struct AddPlus {
public:
//Required
typedef AddPlus reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
AddPlus(value_type& result_):result(&result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest += src + 1;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest += src + 1;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = value_type();
}
result_view_type result_view() const {
return result;
}
};
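// AddPlus above models the reducer concept that parallel_reduce accepts in
// place of a plain result argument: the nested reducer_type, value_type and
// result_view_type typedefs identify it as a reducer, join() supplies the
// combine operation (here deliberately adding an extra 1 per join), and
// init() supplies the identity. A minimal usage sketch; the functor below is
// illustrative only and is not used by the tests:
struct ExampleIndexSum {
  KOKKOS_INLINE_FUNCTION
  void operator() ( const int & i , double & lsum ) const { lsum += i ; }
};
inline void example_reducer_use()
{
  double value = 0;
  Kokkos::parallel_reduce( Kokkos::RangePolicy<>(0,1000) , ExampleIndexSum() , AddPlus<double>(value) );
  // with more than one contributing thread the extra +1 per join() pushes
  // 'value' above the plain sum 1000*999/2, which is what the combinatorial
  // test below checks for.
}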
template<int ISTEAM>
struct FunctorScalar;
template<>
struct FunctorScalar<0>{
FunctorScalar(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i,double& update) const {
update+=i;
}
};
template<>
struct FunctorScalar<1>{
FunctorScalar(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team,double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
};
template<int ISTEAM>
struct FunctorScalarInit;
template<>
struct FunctorScalarInit<0> {
FunctorScalarInit(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, double& update) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
void init(double& update) const {
update = 0.0;
}
};
template<>
struct FunctorScalarInit<1> {
FunctorScalarInit(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team,double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
KOKKOS_INLINE_FUNCTION
void init(double& update) const {
update = 0.0;
}
};
template<int ISTEAM>
struct FunctorScalarFinal;
template<>
struct FunctorScalarFinal<0> {
FunctorScalarFinal(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, double& update) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
void final(double& update) const {
result() = update;
}
};
template<>
struct FunctorScalarFinal<1> {
FunctorScalarFinal(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team, double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
KOKKOS_INLINE_FUNCTION
void final(double& update) const {
result() = update;
}
};
template<int ISTEAM>
struct FunctorScalarJoin;
template<>
struct FunctorScalarJoin<0> {
FunctorScalarJoin(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, double& update) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
};
template<>
struct FunctorScalarJoin<1> {
FunctorScalarJoin(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team,double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
};
template<int ISTEAM>
struct FunctorScalarJoinFinal;
template<>
struct FunctorScalarJoinFinal<0> {
FunctorScalarJoinFinal(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, double& update) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
void final(double& update) const {
result() = update;
}
};
template<>
struct FunctorScalarJoinFinal<1> {
FunctorScalarJoinFinal(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team,double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
void final(double& update) const {
result() = update;
}
};
template<int ISTEAM>
struct FunctorScalarJoinInit;
template<>
struct FunctorScalarJoinInit<0> {
FunctorScalarJoinInit(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, double& update) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
void init(double& update) const {
update = 0.0;
}
};
template<>
struct FunctorScalarJoinInit<1> {
FunctorScalarJoinInit(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team,double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
void init(double& update) const {
update = 0.0;
}
};
template<int ISTEAM>
struct FunctorScalarJoinFinalInit;
template<>
struct FunctorScalarJoinFinalInit<0> {
FunctorScalarJoinFinalInit(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, double& update) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
void final(double& update) const {
result() = update;
}
KOKKOS_INLINE_FUNCTION
void init(double& update) const {
update = 0.0;
}
};
template<>
struct FunctorScalarJoinFinalInit<1> {
FunctorScalarJoinFinalInit(Kokkos::View<double> r):result(r) {}
Kokkos::View<double> result;
typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
void operator() (const team_type& team,double& update) const {
update+=1.0/team.team_size()*team.league_rank();
}
KOKKOS_INLINE_FUNCTION
void join(volatile double& dst, const volatile double& update) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
void final(double& update) const {
result() = update;
}
KOKKOS_INLINE_FUNCTION
void init(double& update) const {
update = 0.0;
}
};
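// The FunctorScalar* variants above enumerate every combination of the
// optional reduction hooks parallel_reduce recognizes on a functor:
// join() (custom combine), init() (custom identity) and final()
// (post-processing of the reduced value), each in a range (<0>) and a
// team (<1>) flavor, so the combinatorial test below can instantiate
// parallel_reduce against all of them.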
struct Functor1 {
KOKKOS_INLINE_FUNCTION
void operator() (const int& i,double& update) const {
update+=i;
}
};
struct Functor2 {
typedef double value_type[];
const unsigned value_count;
Functor2(unsigned n):value_count(n){}
KOKKOS_INLINE_FUNCTION
void operator() (const unsigned& i,double update[]) const {
for(unsigned j=0;j<value_count;j++)
update[j]+=i;
}
KOKKOS_INLINE_FUNCTION
void init( double dst[] ) const
{
for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] = 0 ;
}
KOKKOS_INLINE_FUNCTION
void join( volatile double dst[] ,
const volatile double src[] ) const
{
for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] += src[i] ;
}
};
}
}
namespace Test {
template<class ExecSpace = Kokkos::DefaultExecutionSpace>
struct TestReduceCombinatoricalInstantiation {
template<class ... Args>
static void CallParallelReduce(Args... args) {
Kokkos::parallel_reduce(args...);
}
template<class ... Args>
static void AddReturnArgument(Args... args) {
Kokkos::View<double,Kokkos::HostSpace> result_view("ResultView");
double expected_result = 1000.0*999.0/2.0;
double value = 0;
Kokkos::parallel_reduce(args...,value);
ASSERT_EQ(expected_result,value);
result_view() = 0;
CallParallelReduce(args...,result_view);
ASSERT_EQ(expected_result,result_view());
value = 0;
CallParallelReduce(args...,Kokkos::View<double,Kokkos::HostSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>(&value));
ASSERT_EQ(expected_result,value);
result_view() = 0;
const Kokkos::View<double,Kokkos::HostSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> result_view_const_um = result_view;
CallParallelReduce(args...,result_view_const_um);
ASSERT_EQ(expected_result,result_view_const_um());
value = 0;
CallParallelReduce(args...,Test::ReduceCombinatorical::AddPlus<double>(value));
if((Kokkos::DefaultExecutionSpace::concurrency() > 1) && (ExecSpace::concurrency()>1))
ASSERT_TRUE(expected_result<value);
else if((Kokkos::DefaultExecutionSpace::concurrency() > 1) || (ExecSpace::concurrency()>1))
ASSERT_TRUE(expected_result<=value);
else
ASSERT_EQ(expected_result,value);
value = 0;
Test::ReduceCombinatorical::AddPlus<double> add(value);
CallParallelReduce(args...,add);
if((Kokkos::DefaultExecutionSpace::concurrency() > 1) && (ExecSpace::concurrency()>1))
ASSERT_TRUE(expected_result<value);
else if((Kokkos::DefaultExecutionSpace::concurrency() > 1) || (ExecSpace::concurrency()>1))
ASSERT_TRUE(expected_result<=value);
else
ASSERT_EQ(expected_result,value);
}
template<class ... Args>
static void AddLambdaRange(void*,Args... args) {
AddReturnArgument(args..., KOKKOS_LAMBDA (const int& i , double& lsum) {
lsum += i;
});
}
template<class ... Args>
static void AddLambdaTeam(void*,Args... args) {
AddReturnArgument(args..., KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& team, double& update) {
update+=1.0/team.team_size()*team.league_rank();
});
}
template<class ... Args>
static void AddLambdaRange(Kokkos::InvalidType,Args... args) {
}
template<class ... Args>
static void AddLambdaTeam(Kokkos::InvalidType,Args... args) {
}
template<int ISTEAM, class ... Args>
static void AddFunctor(Args... args) {
Kokkos::View<double> result_view("FunctorView");
auto h_r = Kokkos::create_mirror_view(result_view);
Test::ReduceCombinatorical::FunctorScalar<ISTEAM> functor(result_view);
double expected_result = 1000.0*999.0/2.0;
AddReturnArgument(args..., functor);
AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalar<ISTEAM>(result_view));
AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarInit<ISTEAM>(result_view));
AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarJoin<ISTEAM>(result_view));
AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarJoinInit<ISTEAM>(result_view));
h_r() = 0;
Kokkos::deep_copy(result_view,h_r);
CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarFinal<ISTEAM>(result_view));
Kokkos::deep_copy(h_r,result_view);
ASSERT_EQ(expected_result,h_r());
h_r() = 0;
Kokkos::deep_copy(result_view,h_r);
CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarJoinFinal<ISTEAM>(result_view));
Kokkos::deep_copy(h_r,result_view);
ASSERT_EQ(expected_result,h_r());
h_r() = 0;
Kokkos::deep_copy(result_view,h_r);
CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarJoinFinalInit<ISTEAM>(result_view));
Kokkos::deep_copy(h_r,result_view);
ASSERT_EQ(expected_result,h_r());
}
template<class ... Args>
static void AddFunctorLambdaRange(Args... args) {
AddFunctor<0,Args...>(args...);
#ifdef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
AddLambdaRange(typename std::conditional<std::is_same<ExecSpace,Kokkos::DefaultExecutionSpace>::value,void*,Kokkos::InvalidType>::type(), args...);
#endif
}
template<class ... Args>
static void AddFunctorLambdaTeam(Args... args) {
AddFunctor<1,Args...>(args...);
#ifdef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
AddLambdaTeam(typename std::conditional<std::is_same<ExecSpace,Kokkos::DefaultExecutionSpace>::value,void*,Kokkos::InvalidType>::type(), args...);
#endif
}
template<class ... Args>
static void AddPolicy(Args... args) {
int N = 1000;
Kokkos::RangePolicy<ExecSpace> policy(0,N);
AddFunctorLambdaRange(args...,1000);
AddFunctorLambdaRange(args...,N);
AddFunctorLambdaRange(args...,policy);
AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace>(0,N));
AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(0,N));
AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Static> >(0,N).set_chunk_size(10));
AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(0,N).set_chunk_size(10));
AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace>(N,Kokkos::AUTO));
AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(N,Kokkos::AUTO));
AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Static> >(N,Kokkos::AUTO).set_chunk_size(10));
AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(N,Kokkos::AUTO).set_chunk_size(10));
}
- static void AddLabel() {
- std::string s("Std::String");
+ static void execute_a() {
AddPolicy();
- AddPolicy("Char Constant");
+ }
+
+ static void execute_b() {
+ std::string s("Std::String");
AddPolicy(s.c_str());
- AddPolicy(s);
+ AddPolicy("Char Constant");
}
- static void execute() {
- AddLabel();
+ static void execute_c() {
+ std::string s("Std::String");
+ AddPolicy(s);
}
};
template<class Scalar, class ExecSpace = Kokkos::DefaultExecutionSpace>
struct TestReducers {
struct SumFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value += values(i);
}
};
struct ProdFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value *= values(i);
}
};
struct MinFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
if(values(i) < value)
value = values(i);
}
};
struct MaxFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
if(values(i) > value)
value = values(i);
}
};
struct MinLocFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i,
typename Kokkos::Experimental::MinLoc<Scalar,int>::value_type& value) const {
if(values(i) < value.val) {
value.val = values(i);
value.loc = i;
}
}
};
struct MaxLocFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i,
typename Kokkos::Experimental::MaxLoc<Scalar,int>::value_type& value) const {
if(values(i) > value.val) {
value.val = values(i);
value.loc = i;
}
}
};
struct MinMaxLocFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i,
typename Kokkos::Experimental::MinMaxLoc<Scalar,int>::value_type& value) const {
if(values(i) > value.max_val) {
value.max_val = values(i);
value.max_loc = i;
}
if(values(i) < value.min_val) {
value.min_val = values(i);
value.min_loc = i;
}
}
};
struct BAndFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value = value & values(i);
}
};
struct BOrFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value = value | values(i);
}
};
struct BXorFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value = value ^ values(i);
}
};
struct LAndFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value = value && values(i);
}
};
struct LOrFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
value = value || values(i);
}
};
struct LXorFunctor {
Kokkos::View<const Scalar*,ExecSpace> values;
KOKKOS_INLINE_FUNCTION
void operator() (const int& i, Scalar& value) const {
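// Logical XOR accumulation: (a ? !b : b) is true exactly when a and b differ.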
value = value ? (!values(i)) : values(i);
}
};
static void test_sum(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_sum = 0;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100);
reference_sum += h_values(i);
}
Kokkos::deep_copy(values,h_values);
SumFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar sum_scalar = init;
Kokkos::Experimental::Sum<Scalar> reducer_scalar(sum_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(sum_scalar,reference_sum);
Scalar sum_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(sum_scalar_view,reference_sum);
}
{
Scalar sum_scalar_init = init;
Kokkos::Experimental::Sum<Scalar> reducer_scalar_init(sum_scalar_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(sum_scalar_init,reference_sum);
Scalar sum_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(sum_scalar_init_view,reference_sum);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> sum_view("View");
sum_view() = init;
Kokkos::Experimental::Sum<Scalar> reducer_view(sum_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar sum_view_scalar = sum_view();
ASSERT_EQ(sum_view_scalar,reference_sum);
Scalar sum_view_view = reducer_view.result_view()();
ASSERT_EQ(sum_view_view,reference_sum);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> sum_view_init("View");
sum_view_init() = init;
Kokkos::Experimental::Sum<Scalar> reducer_view_init(sum_view_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
Scalar sum_view_init_scalar = sum_view_init();
ASSERT_EQ(sum_view_init_scalar,reference_sum);
Scalar sum_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(sum_view_init_view,reference_sum);
}
}
static void test_prod(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_prod = 1;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%4+1);
reference_prod *= h_values(i);
}
Kokkos::deep_copy(values,h_values);
ProdFunctor f;
f.values = values;
Scalar init = 1;
if(std::is_arithmetic<Scalar>::value)
{
Scalar prod_scalar = init;
Kokkos::Experimental::Prod<Scalar> reducer_scalar(prod_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(prod_scalar,reference_prod);
Scalar prod_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(prod_scalar_view,reference_prod);
}
{
Scalar prod_scalar_init = init;
Kokkos::Experimental::Prod<Scalar> reducer_scalar_init(prod_scalar_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(prod_scalar_init,reference_prod);
Scalar prod_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(prod_scalar_init_view,reference_prod);
}
if(std::is_arithmetic<Scalar>::value)
{
Kokkos::View<Scalar,Kokkos::HostSpace> prod_view("View");
prod_view() = init;
Kokkos::Experimental::Prod<Scalar> reducer_view(prod_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar prod_view_scalar = prod_view();
ASSERT_EQ(prod_view_scalar,reference_prod);
Scalar prod_view_view = reducer_view.result_view()();
ASSERT_EQ(prod_view_view,reference_prod);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> prod_view_init("View");
prod_view_init() = init;
Kokkos::Experimental::Prod<Scalar> reducer_view_init(prod_view_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
Scalar prod_view_init_scalar = prod_view_init();
ASSERT_EQ(prod_view_init_scalar,reference_prod);
Scalar prod_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(prod_view_init_view,reference_prod);
}
}
static void test_min(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_min = std::numeric_limits<Scalar>::max();
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100000);
if(h_values(i)<reference_min)
reference_min = h_values(i);
}
Kokkos::deep_copy(values,h_values);
MinFunctor f;
f.values = values;
Scalar init = std::numeric_limits<Scalar>::max();
{
Scalar min_scalar = init;
Kokkos::Experimental::Min<Scalar> reducer_scalar(min_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(min_scalar,reference_min);
Scalar min_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(min_scalar_view,reference_min);
}
{
Scalar min_scalar_init = init;
Kokkos::Experimental::Min<Scalar> reducer_scalar_init(min_scalar_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(min_scalar_init,reference_min);
Scalar min_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(min_scalar_init_view,reference_min);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> min_view("View");
min_view() = init;
Kokkos::Experimental::Min<Scalar> reducer_view(min_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar min_view_scalar = min_view();
ASSERT_EQ(min_view_scalar,reference_min);
Scalar min_view_view = reducer_view.result_view()();
ASSERT_EQ(min_view_view,reference_min);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> min_view_init("View");
min_view_init() = init;
Kokkos::Experimental::Min<Scalar> reducer_view_init(min_view_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
Scalar min_view_init_scalar = min_view_init();
ASSERT_EQ(min_view_init_scalar,reference_min);
Scalar min_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(min_view_init_view,reference_min);
}
}
static void test_max(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_max = std::numeric_limits<Scalar>::min();
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100000+1);
if(h_values(i)>reference_max)
reference_max = h_values(i);
}
Kokkos::deep_copy(values,h_values);
MaxFunctor f;
f.values = values;
Scalar init = std::numeric_limits<Scalar>::min();
{
Scalar max_scalar = init;
Kokkos::Experimental::Max<Scalar> reducer_scalar(max_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(max_scalar,reference_max);
Scalar max_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(max_scalar_view,reference_max);
}
{
Scalar max_scalar_init = init;
Kokkos::Experimental::Max<Scalar> reducer_scalar_init(max_scalar_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(max_scalar_init,reference_max);
Scalar max_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(max_scalar_init_view,reference_max);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> max_view("View");
max_view() = init;
Kokkos::Experimental::Max<Scalar> reducer_view(max_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar max_view_scalar = max_view();
ASSERT_EQ(max_view_scalar,reference_max);
Scalar max_view_view = reducer_view.result_view()();
ASSERT_EQ(max_view_view,reference_max);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> max_view_init("View");
max_view_init() = init;
Kokkos::Experimental::Max<Scalar> reducer_view_init(max_view_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
Scalar max_view_init_scalar = max_view_init();
ASSERT_EQ(max_view_init_scalar,reference_max);
Scalar max_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(max_view_init_view,reference_max);
}
}
static void test_minloc(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_min = std::numeric_limits<Scalar>::max();
int reference_loc = -1;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100000);
if(h_values(i)<reference_min) {
reference_min = h_values(i);
reference_loc = i;
+ } else if (h_values(i) == reference_min) {
+ // make min unique
+ h_values(i) += std::numeric_limits<Scalar>::epsilon();
}
}
Kokkos::deep_copy(values,h_values);
MinLocFunctor f;
typedef typename Kokkos::Experimental::MinLoc<Scalar,int>::value_type value_type;
f.values = values;
Scalar init = std::numeric_limits<Scalar>::max();
{
value_type min_scalar;
Kokkos::Experimental::MinLoc<Scalar,int> reducer_scalar(min_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(min_scalar.val,reference_min);
ASSERT_EQ(min_scalar.loc,reference_loc);
value_type min_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(min_scalar_view.val,reference_min);
ASSERT_EQ(min_scalar_view.loc,reference_loc);
}
{
value_type min_scalar_init;
Kokkos::Experimental::MinLoc<Scalar,int> reducer_scalar_init(min_scalar_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(min_scalar_init.val,reference_min);
ASSERT_EQ(min_scalar_init.loc,reference_loc);
value_type min_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(min_scalar_init_view.val,reference_min);
ASSERT_EQ(min_scalar_init_view.loc,reference_loc);
}
{
Kokkos::View<value_type,Kokkos::HostSpace> min_view("View");
Kokkos::Experimental::MinLoc<Scalar,int> reducer_view(min_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
value_type min_view_scalar = min_view();
ASSERT_EQ(min_view_scalar.val,reference_min);
ASSERT_EQ(min_view_scalar.loc,reference_loc);
value_type min_view_view = reducer_view.result_view()();
ASSERT_EQ(min_view_view.val,reference_min);
ASSERT_EQ(min_view_view.loc,reference_loc);
}
{
Kokkos::View<value_type,Kokkos::HostSpace> min_view_init("View");
Kokkos::Experimental::MinLoc<Scalar,int> reducer_view_init(min_view_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
value_type min_view_init_scalar = min_view_init();
ASSERT_EQ(min_view_init_scalar.val,reference_min);
ASSERT_EQ(min_view_init_scalar.loc,reference_loc);
value_type min_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(min_view_init_view.val,reference_min);
ASSERT_EQ(min_view_init_view.loc,reference_loc);
}
}
static void test_maxloc(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_max = std::numeric_limits<Scalar>::min();
int reference_loc = -1;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100000);
if(h_values(i)>reference_max) {
reference_max = h_values(i);
reference_loc = i;
+ } else if (h_values(i) == reference_max) {
+ // make max unique
+ h_values(i) -= std::numeric_limits<Scalar>::epsilon();
}
}
Kokkos::deep_copy(values,h_values);
MaxLocFunctor f;
typedef typename Kokkos::Experimental::MaxLoc<Scalar,int>::value_type value_type;
f.values = values;
Scalar init = std::numeric_limits<Scalar>::min();
{
value_type max_scalar;
Kokkos::Experimental::MaxLoc<Scalar,int> reducer_scalar(max_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(max_scalar.val,reference_max);
ASSERT_EQ(max_scalar.loc,reference_loc);
value_type max_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(max_scalar_view.val,reference_max);
ASSERT_EQ(max_scalar_view.loc,reference_loc);
}
{
value_type max_scalar_init;
Kokkos::Experimental::MaxLoc<Scalar,int> reducer_scalar_init(max_scalar_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(max_scalar_init.val,reference_max);
ASSERT_EQ(max_scalar_init.loc,reference_loc);
value_type max_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(max_scalar_init_view.val,reference_max);
ASSERT_EQ(max_scalar_init_view.loc,reference_loc);
}
{
Kokkos::View<value_type,Kokkos::HostSpace> max_view("View");
Kokkos::Experimental::MaxLoc<Scalar,int> reducer_view(max_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
value_type max_view_scalar = max_view();
ASSERT_EQ(max_view_scalar.val,reference_max);
ASSERT_EQ(max_view_scalar.loc,reference_loc);
value_type max_view_view = reducer_view.result_view()();
ASSERT_EQ(max_view_view.val,reference_max);
ASSERT_EQ(max_view_view.loc,reference_loc);
}
{
Kokkos::View<value_type,Kokkos::HostSpace> max_view_init("View");
Kokkos::Experimental::MaxLoc<Scalar,int> reducer_view_init(max_view_init,init);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
value_type max_view_init_scalar = max_view_init();
ASSERT_EQ(max_view_init_scalar.val,reference_max);
ASSERT_EQ(max_view_init_scalar.loc,reference_loc);
value_type max_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(max_view_init_view.val,reference_max);
ASSERT_EQ(max_view_init_view.loc,reference_loc);
}
}
static void test_minmaxloc(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_max = std::numeric_limits<Scalar>::min();
Scalar reference_min = std::numeric_limits<Scalar>::max();
int reference_minloc = -1;
int reference_maxloc = -1;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100000);
+ }
+ for(int i=0; i<N; i++) {
if(h_values(i)>reference_max) {
reference_max = h_values(i);
reference_maxloc = i;
+ } else if (h_values(i) == reference_max) {
+ // make max unique
+ h_values(i) -= std::numeric_limits<Scalar>::epsilon();
}
+ }
+ for(int i=0; i<N; i++) {
if(h_values(i)<reference_min) {
reference_min = h_values(i);
reference_minloc = i;
+ } else if (h_values(i) == reference_min) {
+ // make min unique
+ h_values(i) += std::numeric_limits<Scalar>::epsilon();
}
}
Kokkos::deep_copy(values,h_values);
MinMaxLocFunctor f;
typedef typename Kokkos::Experimental::MinMaxLoc<Scalar,int>::value_type value_type;
f.values = values;
Scalar init_min = std::numeric_limits<Scalar>::max();
Scalar init_max = std::numeric_limits<Scalar>::min();
{
value_type minmax_scalar;
Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_scalar(minmax_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(minmax_scalar.min_val,reference_min);
+ for(int i=0; i<N; i++) {
+ if((i == minmax_scalar.min_loc) && (h_values(i)==reference_min))
+ reference_minloc = i;
+ }
ASSERT_EQ(minmax_scalar.min_loc,reference_minloc);
ASSERT_EQ(minmax_scalar.max_val,reference_max);
+ for(int i=0; i<N; i++) {
+ if((i == minmax_scalar.max_loc) && (h_values(i)==reference_max))
+ reference_maxloc = i;
+ }
ASSERT_EQ(minmax_scalar.max_loc,reference_maxloc);
value_type minmax_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(minmax_scalar_view.min_val,reference_min);
ASSERT_EQ(minmax_scalar_view.min_loc,reference_minloc);
ASSERT_EQ(minmax_scalar_view.max_val,reference_max);
ASSERT_EQ(minmax_scalar_view.max_loc,reference_maxloc);
}
{
value_type minmax_scalar_init;
Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_scalar_init(minmax_scalar_init,init_min,init_max);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
ASSERT_EQ(minmax_scalar_init.min_val,reference_min);
ASSERT_EQ(minmax_scalar_init.min_loc,reference_minloc);
ASSERT_EQ(minmax_scalar_init.max_val,reference_max);
ASSERT_EQ(minmax_scalar_init.max_loc,reference_maxloc);
value_type minmax_scalar_init_view = reducer_scalar_init.result_view()();
ASSERT_EQ(minmax_scalar_init_view.min_val,reference_min);
ASSERT_EQ(minmax_scalar_init_view.min_loc,reference_minloc);
ASSERT_EQ(minmax_scalar_init_view.max_val,reference_max);
ASSERT_EQ(minmax_scalar_init_view.max_loc,reference_maxloc);
}
{
Kokkos::View<value_type,Kokkos::HostSpace> minmax_view("View");
Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_view(minmax_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
value_type minmax_view_scalar = minmax_view();
ASSERT_EQ(minmax_view_scalar.min_val,reference_min);
ASSERT_EQ(minmax_view_scalar.min_loc,reference_minloc);
ASSERT_EQ(minmax_view_scalar.max_val,reference_max);
ASSERT_EQ(minmax_view_scalar.max_loc,reference_maxloc);
value_type minmax_view_view = reducer_view.result_view()();
ASSERT_EQ(minmax_view_view.min_val,reference_min);
ASSERT_EQ(minmax_view_view.min_loc,reference_minloc);
ASSERT_EQ(minmax_view_view.max_val,reference_max);
ASSERT_EQ(minmax_view_view.max_loc,reference_maxloc);
}
{
Kokkos::View<value_type,Kokkos::HostSpace> minmax_view_init("View");
Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_view_init(minmax_view_init,init_min,init_max);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
value_type minmax_view_init_scalar = minmax_view_init();
ASSERT_EQ(minmax_view_init_scalar.min_val,reference_min);
ASSERT_EQ(minmax_view_init_scalar.min_loc,reference_minloc);
ASSERT_EQ(minmax_view_init_scalar.max_val,reference_max);
ASSERT_EQ(minmax_view_init_scalar.max_loc,reference_maxloc);
value_type minmax_view_init_view = reducer_view_init.result_view()();
ASSERT_EQ(minmax_view_init_view.min_val,reference_min);
ASSERT_EQ(minmax_view_init_view.min_loc,reference_minloc);
ASSERT_EQ(minmax_view_init_view.max_val,reference_max);
ASSERT_EQ(minmax_view_init_view.max_loc,reference_maxloc);
}
}
static void test_BAnd(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
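// Scalar() | (~Scalar()) yields an all-bits-set value, the identity element for bitwise AND.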
Scalar reference_band = Scalar() | (~Scalar());
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%100000+1);
reference_band = reference_band & h_values(i);
}
Kokkos::deep_copy(values,h_values);
BAndFunctor f;
f.values = values;
Scalar init = Scalar() | (~Scalar());
{
Scalar band_scalar = init;
Kokkos::Experimental::BAnd<Scalar> reducer_scalar(band_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(band_scalar,reference_band);
Scalar band_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(band_scalar_view,reference_band);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> band_view("View");
band_view() = init;
Kokkos::Experimental::BAnd<Scalar> reducer_view(band_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar band_view_scalar = band_view();
ASSERT_EQ(band_view_scalar,reference_band);
Scalar band_view_view = reducer_view.result_view()();
ASSERT_EQ(band_view_view,reference_band);
}
}
static void test_BOr(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_bor = Scalar() & (~Scalar());
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)((rand()%100000+1)*2);
reference_bor = reference_bor | h_values(i);
}
Kokkos::deep_copy(values,h_values);
BOrFunctor f;
f.values = values;
Scalar init = Scalar() & (~Scalar());
{
Scalar bor_scalar = init;
Kokkos::Experimental::BOr<Scalar> reducer_scalar(bor_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(bor_scalar,reference_bor);
Scalar bor_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(bor_scalar_view,reference_bor);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> bor_view("View");
bor_view() = init;
Kokkos::Experimental::BOr<Scalar> reducer_view(bor_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar bor_view_scalar = bor_view();
ASSERT_EQ(bor_view_scalar,reference_bor);
Scalar bor_view_view = reducer_view.result_view()();
ASSERT_EQ(bor_view_view,reference_bor);
}
}
static void test_BXor(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_bxor = Scalar() & (~Scalar());
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)((rand()%100000+1)*2);
reference_bxor = reference_bxor ^ h_values(i);
}
Kokkos::deep_copy(values,h_values);
BXorFunctor f;
f.values = values;
Scalar init = Scalar() & (~Scalar());
{
Scalar bxor_scalar = init;
Kokkos::Experimental::BXor<Scalar> reducer_scalar(bxor_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(bxor_scalar,reference_bxor);
Scalar bxor_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(bxor_scalar_view,reference_bxor);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> bxor_view("View");
bxor_view() = init;
Kokkos::Experimental::BXor<Scalar> reducer_view(bxor_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar bxor_view_scalar = bxor_view();
ASSERT_EQ(bxor_view_scalar,reference_bxor);
Scalar bxor_view_view = reducer_view.result_view()();
ASSERT_EQ(bxor_view_view,reference_bxor);
}
}
static void test_LAnd(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_land = 1;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%2);
reference_land = reference_land && h_values(i);
}
Kokkos::deep_copy(values,h_values);
LAndFunctor f;
f.values = values;
Scalar init = 1;
{
Scalar land_scalar = init;
Kokkos::Experimental::LAnd<Scalar> reducer_scalar(land_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(land_scalar,reference_land);
Scalar land_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(land_scalar_view,reference_land);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> land_view("View");
land_view() = init;
Kokkos::Experimental::LAnd<Scalar> reducer_view(land_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar land_view_scalar = land_view();
ASSERT_EQ(land_view_scalar,reference_land);
Scalar land_view_view = reducer_view.result_view()();
ASSERT_EQ(land_view_view,reference_land);
}
}
static void test_LOr(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_lor = 0;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%2);
reference_lor = reference_lor || h_values(i);
}
Kokkos::deep_copy(values,h_values);
LOrFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar lor_scalar = init;
Kokkos::Experimental::LOr<Scalar> reducer_scalar(lor_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(lor_scalar,reference_lor);
Scalar lor_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(lor_scalar_view,reference_lor);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> lor_view("View");
lor_view() = init;
Kokkos::Experimental::LOr<Scalar> reducer_view(lor_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar lor_view_scalar = lor_view();
ASSERT_EQ(lor_view_scalar,reference_lor);
Scalar lor_view_view = reducer_view.result_view()();
ASSERT_EQ(lor_view_view,reference_lor);
}
}
static void test_LXor(int N) {
Kokkos::View<Scalar*,ExecSpace> values("Values",N);
auto h_values = Kokkos::create_mirror_view(values);
Scalar reference_lxor = 0;
for(int i=0; i<N; i++) {
h_values(i) = (Scalar)(rand()%2);
reference_lxor = reference_lxor ? (!h_values(i)) : h_values(i);
}
Kokkos::deep_copy(values,h_values);
LXorFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar lxor_scalar = init;
Kokkos::Experimental::LXor<Scalar> reducer_scalar(lxor_scalar);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
ASSERT_EQ(lxor_scalar,reference_lxor);
Scalar lxor_scalar_view = reducer_scalar.result_view()();
ASSERT_EQ(lxor_scalar_view,reference_lxor);
}
{
Kokkos::View<Scalar,Kokkos::HostSpace> lxor_view("View");
lxor_view() = init;
Kokkos::Experimental::LXor<Scalar> reducer_view(lxor_view);
Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
Scalar lxor_view_scalar = lxor_view();
ASSERT_EQ(lxor_view_scalar,reference_lxor);
Scalar lxor_view_view = reducer_view.result_view()();
ASSERT_EQ(lxor_view_view,reference_lxor);
}
}
static void execute_float() {
test_sum(10001);
test_prod(35);
test_min(10003);
test_minloc(10003);
test_max(10007);
test_maxloc(10007);
test_minmaxloc(10007);
}
static void execute_integer() {
test_sum(10001);
test_prod(35);
test_min(10003);
test_minloc(10003);
test_max(10007);
test_maxloc(10007);
test_minmaxloc(10007);
test_BAnd(35);
test_BOr(35);
test_BXor(35);
test_LAnd(35);
test_LOr(35);
test_LXor(35);
}
static void execute_basic() {
test_sum(10001);
test_prod(35);
}
};
}
/*--------------------------------------------------------------------------*/
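Editorial aside: the TestReducers functions above all follow the same pattern of handing a
reducer object to Kokkos::parallel_reduce alongside a functor. A minimal usage sketch of that
pattern is given below for orientation; it assumes the Kokkos::Experimental reducer interface
from this snapshot of the library and lambda dispatch support (KOKKOS_LAMBDA), and the view
name "data" and the problem size are purely illustrative.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 100;
    Kokkos::View<double*> data("data", N);
    Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) { data(i) = 1.0 * i; });

    // Sum reduction: the reducer wraps the scalar that receives the final result.
    double total = 0.0;
    Kokkos::Experimental::Sum<double> sum_reducer(total);
    Kokkos::parallel_reduce(Kokkos::RangePolicy<>(0, N),
      KOKKOS_LAMBDA(const int i, double& lsum) { lsum += data(i); },
      sum_reducer);

    // MinLoc reduction: value_type carries the minimum and the index where it occurs.
    typedef Kokkos::Experimental::MinLoc<double, int> minloc_type;
    minloc_type::value_type result;
    minloc_type minloc_reducer(result);
    Kokkos::parallel_reduce(Kokkos::RangePolicy<>(0, N),
      KOKKOS_LAMBDA(const int i, minloc_type::value_type& lmin) {
        if (data(i) < lmin.val) { lmin.val = data(i); lmin.loc = i; }
      },
      minloc_reducer);

    printf("sum = %f, min = %f at %d\n", total, result.val, result.loc);
  }
  Kokkos::finalize();
  return 0;
}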
diff --git a/lib/kokkos/core/unit_test/TestSerial.cpp b/lib/kokkos/core/unit_test/TestSerial.cpp
deleted file mode 100644
index d85614e66..000000000
--- a/lib/kokkos/core/unit_test/TestSerial.cpp
+++ /dev/null
@@ -1,571 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-#include <impl/Kokkos_ViewTileLeft.hpp>
-#include <TestTile.hpp>
-
-#include <impl/Kokkos_Serial_TaskPolicy.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestViewImpl.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewOfClass.hpp>
-#include <TestViewSubview.hpp>
-#include <TestAtomic.hpp>
-#include <TestAtomicOperations.hpp>
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestTaskPolicy.hpp>
-#include <TestMemoryPool.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-#include <TestPolicyConstruction.hpp>
-
-#include <TestMDRange.hpp>
-
-namespace Test {
-
-class serial : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::HostSpace::execution_space::initialize();
- }
- static void TearDownTestCase()
- {
- Kokkos::HostSpace::execution_space::finalize();
- }
-};
-
-TEST_F( serial , md_range ) {
- TestMDRange_2D< Kokkos::Serial >::test_for2(100,100);
-
- TestMDRange_3D< Kokkos::Serial >::test_for3(100,100,100);
-}
-
-TEST_F( serial , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::Serial >();
-}
-
-TEST_F( serial, policy_construction) {
- TestRangePolicyConstruction< Kokkos::Serial >();
- TestTeamPolicyConstruction< Kokkos::Serial >();
-}
-
-TEST_F( serial , impl_view_mapping ) {
- test_view_mapping< Kokkos::Serial >();
- test_view_mapping_subview< Kokkos::Serial >();
- test_view_mapping_operator< Kokkos::Serial >();
- TestViewMappingAtomic< Kokkos::Serial >::run();
-}
-
-TEST_F( serial, view_impl) {
- test_view_impl< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_api) {
- TestViewAPI< double , Kokkos::Serial >();
-}
-
-TEST_F( serial , view_nested_view )
-{
- ::Test::view_nested_view< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::Serial >();
-}
-
-TEST_F( serial, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::Serial >();
-}
-
-TEST_F( serial , range_tag )
-{
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
-}
-
-TEST_F( serial , team_tag )
-{
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
-}
-
-TEST_F( serial, long_reduce) {
- TestReduce< long , Kokkos::Serial >( 1000000 );
-}
-
-TEST_F( serial, double_reduce) {
- TestReduce< double , Kokkos::Serial >( 1000000 );
-}
-
-TEST_F( serial , reducers )
-{
- TestReducers<int, Kokkos::Serial>::execute_integer();
- TestReducers<size_t, Kokkos::Serial>::execute_integer();
- TestReducers<double, Kokkos::Serial>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Serial>::execute_basic();
-}
-
-TEST_F( serial, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Serial >( 1000000 );
-}
-
-TEST_F( serial, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Serial >( 1000000 );
-}
-
-TEST_F( serial, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Serial >( 1000000 );
-}
-
-TEST_F( serial , scan )
-{
- TestScan< Kokkos::Serial >::test_range( 1 , 1000 );
- TestScan< Kokkos::Serial >( 10 );
- TestScan< Kokkos::Serial >( 10000 );
-}
-
-TEST_F( serial , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( serial , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( serial , team_shared_request) {
- TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-
-#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
-TEST_F( serial , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-#endif
-
-TEST_F( serial, shmem_size) {
- TestShmemSize< Kokkos::Serial >();
-}
-
-TEST_F( serial , team_scan )
-{
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
-}
-
-
-TEST_F( serial , view_remap )
-{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Serial > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Serial > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Serial > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( serial , view_aggregate )
-{
- TestViewAggregate< Kokkos::Serial >();
- TestViewAggregateReduction< Kokkos::Serial >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( serial , atomics )
-{
- const int loop_count = 1e6 ;
-
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,3) ) );
-}
-
-TEST_F( serial , atomic_operations )
-{
- const int start = 1; //Avoid zero for division
- const int end = 11;
- for (int i = start; i < end; ++i)
- {
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 4 ) ) );
- }
-
-}
-//----------------------------------------------------------------------------
-
-TEST_F( serial, tile_layout )
-{
- TestTile::test< Kokkos::Serial , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Serial , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Serial , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Serial , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Serial , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Serial , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 11 );
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( serial , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Serial >() ) );
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( serial , memory_space )
-{
- TestMemorySpace< Kokkos::Serial >();
-}
-
-TEST_F( serial , memory_pool )
-{
- bool val = TestMemoryPool::test_mempool< Kokkos::Serial >( 128, 128000000 );
- ASSERT_TRUE( val );
-
- TestMemoryPool::test_mempool2< Kokkos::Serial >( 64, 4, 1000000, 2000000 );
-
- TestMemoryPool::test_memory_exhaustion< Kokkos::Serial >();
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-TEST_F( serial , task_fib )
-{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskPolicy::TestFib< Kokkos::Serial >::run(i);
- }
-}
-
-TEST_F( serial , task_depend )
-{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskPolicy::TestTaskDependence< Kokkos::Serial >::run(i);
- }
-}
-
-TEST_F( serial , task_team )
-{
- TestTaskPolicy::TestTaskTeam< Kokkos::Serial >::run(1000);
- //TestTaskPolicy::TestTaskTeamValue< Kokkos::Serial >::run(1000); //put back after testing
-}
-
-TEST_F( serial , old_task_policy )
-{
- TestTaskPolicy::test_task_dep< Kokkos::Serial >( 10 );
- // TestTaskPolicy::test_norm2< Kokkos::Serial >( 1000 );
- // for ( long i = 0 ; i < 30 ; ++i ) TestTaskPolicy::test_fib< Kokkos::Serial >(i);
- // for ( long i = 0 ; i < 40 ; ++i ) TestTaskPolicy::test_fib2< Kokkos::Serial >(i);
- for ( long i = 0 ; i < 20 ; ++i ) TestTaskPolicy::test_fib< Kokkos::Serial >(i);
- for ( long i = 0 ; i < 25 ; ++i ) TestTaskPolicy::test_fib2< Kokkos::Serial >(i);
-}
-
-TEST_F( serial , old_task_team )
-{
- TestTaskPolicy::test_task_team< Kokkos::Serial >(1000);
-}
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-
-//----------------------------------------------------------------------------
-
-TEST_F( serial , template_meta_functions )
-{
- TestTemplateMetaFunctions<int, Kokkos::Serial >();
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL )
-TEST_F( serial , cxx11 )
-{
- if ( Kokkos::Impl::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Serial >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(4) ) );
- }
-}
-#endif
-
-TEST_F( serial , reduction_deduction )
-{
- TestCXX11::test_reduction_deduction< Kokkos::Serial >();
-}
-
-TEST_F( serial , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(10) ) );
-}
-
-} // namespace test
-
diff --git a/lib/kokkos/core/unit_test/TestSharedAlloc.hpp b/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
index 611668881..291f9f60e 100644
--- a/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
+++ b/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
@@ -1,215 +1,215 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
struct SharedAllocDestroy {
volatile int * count ;
SharedAllocDestroy() = default ;
SharedAllocDestroy( int * arg ) : count( arg ) {}
void destroy_shared_allocation()
{
- Kokkos::atomic_fetch_add( count , 1 );
+ Kokkos::atomic_increment( count );
}
};
template< class MemorySpace , class ExecutionSpace >
void test_shared_alloc()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- typedef const Kokkos::Experimental::Impl::SharedAllocationHeader Header ;
- typedef Kokkos::Experimental::Impl::SharedAllocationTracker Tracker ;
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< void , void > RecordBase ;
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< MemorySpace , void > RecordMemS ;
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< MemorySpace , SharedAllocDestroy > RecordFull ;
+ typedef const Kokkos::Impl::SharedAllocationHeader Header ;
+ typedef Kokkos::Impl::SharedAllocationTracker Tracker ;
+ typedef Kokkos::Impl::SharedAllocationRecord< void , void > RecordBase ;
+ typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace , void > RecordMemS ;
+ typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace , SharedAllocDestroy > RecordFull ;
static_assert( sizeof(Tracker) == sizeof(int*), "SharedAllocationTracker has wrong size!" );
MemorySpace s ;
const size_t N = 1200 ;
const size_t size = 8 ;
RecordMemS * rarray[ N ];
Header * harray[ N ];
RecordMemS ** const r = rarray ;
Header ** const h = harray ;
Kokkos::RangePolicy< ExecutionSpace > range(0,N);
//----------------------------------------
{
// Since always executed on host space, leave [=]
Kokkos::parallel_for( range , [=]( size_t i ){
char name[64] ;
sprintf(name,"test_%.2d",int(i));
r[i] = RecordMemS::allocate( s , name , size * ( i + 1 ) );
h[i] = Header::get_header( r[i]->data() );
ASSERT_EQ( r[i]->use_count() , 0 );
for ( size_t j = 0 ; j < ( i / 10 ) + 1 ; ++j ) RecordBase::increment( r[i] );
ASSERT_EQ( r[i]->use_count() , ( i / 10 ) + 1 );
ASSERT_EQ( r[i] , RecordMemS::get_record( r[i]->data() ) );
});
// Sanity check for the whole set of allocation records to which this record belongs.
RecordBase::is_sane( r[0] );
// RecordMemS::print_records( std::cout , s , true );
Kokkos::parallel_for( range , [=]( size_t i ){
while ( 0 != ( r[i] = static_cast< RecordMemS *>( RecordBase::decrement( r[i] ) ) ) ) {
if ( r[i]->use_count() == 1 ) RecordBase::is_sane( r[i] );
}
});
}
//----------------------------------------
{
int destroy_count = 0 ;
SharedAllocDestroy counter( & destroy_count );
Kokkos::parallel_for( range , [=]( size_t i ){
char name[64] ;
sprintf(name,"test_%.2d",int(i));
RecordFull * rec = RecordFull::allocate( s , name , size * ( i + 1 ) );
rec->m_destroy = counter ;
r[i] = rec ;
h[i] = Header::get_header( r[i]->data() );
ASSERT_EQ( r[i]->use_count() , 0 );
for ( size_t j = 0 ; j < ( i / 10 ) + 1 ; ++j ) RecordBase::increment( r[i] );
ASSERT_EQ( r[i]->use_count() , ( i / 10 ) + 1 );
ASSERT_EQ( r[i] , RecordMemS::get_record( r[i]->data() ) );
});
RecordBase::is_sane( r[0] );
Kokkos::parallel_for( range , [=]( size_t i ){
while ( 0 != ( r[i] = static_cast< RecordMemS *>( RecordBase::decrement( r[i] ) ) ) ) {
if ( r[i]->use_count() == 1 ) RecordBase::is_sane( r[i] );
}
});
ASSERT_EQ( destroy_count , int(N) );
}
//----------------------------------------
{
int destroy_count = 0 ;
{
RecordFull * rec = RecordFull::allocate( s , "test" , size );
// ... Construction of the allocated { rec->data() , rec->size() }
// Copy destruction function object into the allocation record
rec->m_destroy = SharedAllocDestroy( & destroy_count );
ASSERT_EQ( rec->use_count() , 0 );
// Start tracking, increments the use count from 0 to 1
Tracker track ;
track.assign_allocated_record_to_uninitialized( rec );
ASSERT_EQ( rec->use_count() , 1 );
ASSERT_EQ( track.use_count() , 1 );
// Verify construction / destruction increment
for ( size_t i = 0 ; i < N ; ++i ) {
ASSERT_EQ( rec->use_count() , 1 );
{
Tracker local_tracker ;
local_tracker.assign_allocated_record_to_uninitialized( rec );
ASSERT_EQ( rec->use_count() , 2 );
ASSERT_EQ( local_tracker.use_count() , 2 );
}
ASSERT_EQ( rec->use_count() , 1 );
ASSERT_EQ( track.use_count() , 1 );
}
Kokkos::parallel_for( range , [=]( size_t i ){
Tracker local_tracker ;
local_tracker.assign_allocated_record_to_uninitialized( rec );
ASSERT_GT( rec->use_count() , 1 );
});
ASSERT_EQ( rec->use_count() , 1 );
ASSERT_EQ( track.use_count() , 1 );
// Destruction of 'track' object deallocates the 'rec' and invokes the destroy function object.
}
ASSERT_EQ( destroy_count , 1 );
}
#endif /* #if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST ) */
}
}
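Editorial aside: the hunk above replaces Kokkos::atomic_fetch_add( count , 1 ) with
Kokkos::atomic_increment( count ) in the shared-allocation destroy counter. A minimal sketch
contrasting the two calls is given below for reference; it uses only the public Kokkos atomic
API shown in the hunk, and the counter variable and output are illustrative.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    volatile int count = 0;
    // atomic_fetch_add takes the increment explicitly and returns the previous value.
    Kokkos::atomic_fetch_add(&count, 1);
    // atomic_increment bumps the counter by one when the old value is not needed --
    // the form the patch adopts for the destroy-count bookkeeping.
    Kokkos::atomic_increment(&count);
    printf("count = %d\n", (int)count);
  }
  Kokkos::finalize();
  return 0;
}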
diff --git a/lib/kokkos/core/unit_test/TestSynchronic.cpp b/lib/kokkos/core/unit_test/TestSynchronic.cpp
index 9121dc15a..f6a3f38e3 100644
--- a/lib/kokkos/core/unit_test/TestSynchronic.cpp
+++ b/lib/kokkos/core/unit_test/TestSynchronic.cpp
@@ -1,448 +1,448 @@
/*
Copyright (c) 2014, NVIDIA Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
//#undef _WIN32_WINNT
//#define _WIN32_WINNT 0x0602
-#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || defined(__APPLE__)
+#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || defined(__APPLE__) || defined(__ARM_ARCH_8A)
// Skip for now
#else
#include <gtest/gtest.h>
#ifdef USEOMP
#include <omp.h>
#endif
#include <iostream>
#include <sstream>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <cstring>
#include <ctime>
//#include <details/config>
//#undef __SYNCHRONIC_COMPATIBLE
#include <impl/Kokkos_Synchronic.hpp>
#include <impl/Kokkos_Synchronic_n3998.hpp>
#include "TestSynchronic.hpp"
// Uncomment to allow test to dump output
//#define VERBOSE_TEST
namespace Test {
unsigned next_table[] =
{
0, 1, 2, 3, //0-3
4, 4, 6, 6, //4-7
8, 8, 8, 8, //8-11
12, 12, 12, 12, //12-15
16, 16, 16, 16, //16-19
16, 16, 16, 16, //20-23
24, 24, 24, 24, //24-27
24, 24, 24, 24, //28-31
32, 32, 32, 32, //32-35
32, 32, 32, 32, //36-39
40, 40, 40, 40, //40-43
40, 40, 40, 40, //44-47
48, 48, 48, 48, //48-51
48, 48, 48, 48, //52-55
56, 56, 56, 56, //56-59
56, 56, 56, 56, //60-63
};
//change this if you want to allow oversubscription of the system; by default only the range {1-(system size)} is tested
#define FOR_GAUNTLET(x) for(unsigned x = (std::min)(std::thread::hardware_concurrency()*8,unsigned(sizeof(next_table)/sizeof(unsigned))); x; x = next_table[x-1])
//set this to override the benchmark of barriers to use OMP barriers instead of n3998 std::barrier
//#define USEOMP
#if defined(__SYNCHRONIC_COMPATIBLE)
#define PREFIX "futex-"
#else
#define PREFIX "backoff-"
#endif
//this test uses a custom Mersenne twister to eliminate implementation variation
MersenneTwister mt;
int dummya = 1, dummyb =1;
int dummy1 = 1;
std::atomic<int> dummy2(1);
std::atomic<int> dummy3(1);
double time_item(int const count = (int)1E8) {
clock_t const start = clock();
for(int i = 0;i < count; ++i)
mt.integer();
clock_t const end = clock();
double elapsed_seconds = (end - start) / double(CLOCKS_PER_SEC);
return elapsed_seconds / count;
}
double time_nil(int const count = (int)1E08) {
clock_t const start = clock();
dummy3 = count;
for(int i = 0;i < (int)1E6; ++i) {
if(dummy1) {
// Do some work while holding the lock
int workunits = dummy3;//(int) (mtc.poissonInterval((float)num_items_critical) + 0.5f);
for (int j = 1; j < workunits; j++)
dummy1 &= j; // Do one work unit
dummy2.fetch_add(dummy1,std::memory_order_relaxed);
}
}
clock_t const end = clock();
double elapsed_seconds = (end - start) / double(CLOCKS_PER_SEC);
return elapsed_seconds / count;
}
template <class mutex_type>
void testmutex_inner(mutex_type& m, std::atomic<int>& t,std::atomic<int>& wc,std::atomic<int>& wnc, int const num_iterations,
int const num_items_critical, int const num_items_noncritical, MersenneTwister& mtc, MersenneTwister& mtnc, bool skip) {
for(int k = 0; k < num_iterations; ++k) {
if(num_items_noncritical) {
// Do some work without holding the lock
int workunits = num_items_noncritical;//(int) (mtnc.poissonInterval((float)num_items_noncritical) + 0.5f);
for (int i = 1; i < workunits; i++)
mtnc.integer(); // Do one work unit
wnc.fetch_add(workunits,std::memory_order_relaxed);
}
t.fetch_add(1,std::memory_order_relaxed);
if(!skip) {
std::unique_lock<mutex_type> l(m);
if(num_items_critical) {
// Do some work while holding the lock
int workunits = num_items_critical;//(int) (mtc.poissonInterval((float)num_items_critical) + 0.5f);
for (int i = 1; i < workunits; i++)
mtc.integer(); // Do one work unit
wc.fetch_add(workunits,std::memory_order_relaxed);
}
}
}
}
template <class mutex_type>
void testmutex_outer(std::map<std::string,std::vector<double>>& results, std::string const& name, double critical_fraction, double critical_duration) {
std::ostringstream truename;
truename << name << " (f=" << critical_fraction << ",d=" << critical_duration << ")";
std::vector<double>& data = results[truename.str()];
double const workItemTime = time_item() ,
nilTime = time_nil();
int const num_items_critical = (critical_duration <= 0 ? 0 : (std::max)( int(critical_duration / workItemTime + 0.5), int(100 * nilTime / workItemTime + 0.5))),
num_items_noncritical = (num_items_critical <= 0 ? 0 : int( ( 1 - critical_fraction ) * num_items_critical / critical_fraction + 0.5 ));
FOR_GAUNTLET(num_threads) {
//Kokkos::Impl::portable_sleep(std::chrono::microseconds(2000000));
int const num_iterations = (num_items_critical + num_items_noncritical != 0) ?
#ifdef __SYNCHRONIC_JUST_YIELD
int( 1 / ( 8 * workItemTime ) / (num_items_critical + num_items_noncritical) / num_threads + 0.5 ) :
#else
int( 1 / ( 8 * workItemTime ) / (num_items_critical + num_items_noncritical) / num_threads + 0.5 ) :
#endif
#ifdef WIN32
int( 1 / workItemTime / (20 * num_threads * num_threads) );
#else
int( 1 / workItemTime / (200 * num_threads * num_threads) );
#endif
#ifdef VERBOSE_TEST
std::cerr << "running " << truename.str() << " #" << num_threads << ", " << num_iterations << " * " << num_items_noncritical << "\n" << std::flush;
#endif
std::atomic<int> t[2], wc[2], wnc[2];
clock_t start[2], end[2];
for(int pass = 0; pass < 2; ++pass) {
t[pass] = 0;
wc[pass] = 0;
wnc[pass] = 0;
srand(num_threads);
std::vector<MersenneTwister> randomsnc(num_threads),
randomsc(num_threads);
mutex_type m;
start[pass] = clock();
#ifdef USEOMP
omp_set_num_threads(num_threads);
std::atomic<int> _j(0);
#pragma omp parallel
{
int const j = _j.fetch_add(1,std::memory_order_relaxed);
testmutex_inner(m, t[pass], wc[pass], wnc[pass], num_iterations, num_items_critical, num_items_noncritical, randomsc[j], randomsnc[j], pass==0);
num_threads = omp_get_num_threads();
}
#else
std::vector<std::thread*> threads(num_threads);
for(unsigned j = 0; j < num_threads; ++j)
threads[j] = new std::thread([&,j](){
testmutex_inner(m, t[pass], wc[pass], wnc[pass], num_iterations, num_items_critical, num_items_noncritical, randomsc[j], randomsnc[j], pass==0);
}
);
for(unsigned j = 0; j < num_threads; ++j) {
threads[j]->join();
delete threads[j];
}
#endif
end[pass] = clock();
}
if(t[0] != t[1]) throw std::string("mismatched iteration counts");
if(wnc[0] != wnc[1]) throw std::string("mismatched work item counts");
double elapsed_seconds_0 = (end[0] - start[0]) / double(CLOCKS_PER_SEC),
elapsed_seconds_1 = (end[1] - start[1]) / double(CLOCKS_PER_SEC);
double time = (elapsed_seconds_1 - elapsed_seconds_0 - wc[1]*workItemTime) / num_iterations;
data.push_back(time);
#ifdef VERBOSE_TEST
std::cerr << truename.str() << " : " << num_threads << "," << elapsed_seconds_1 / num_iterations << " - " << elapsed_seconds_0 / num_iterations << " - " << wc[1]*workItemTime/num_iterations << " = " << time << " \n";
#endif
}
}
template <class barrier_type>
void testbarrier_inner(barrier_type& b, int const num_threads, int const j, std::atomic<int>& t,std::atomic<int>& w,
int const num_iterations_odd, int const num_iterations_even,
int const num_items_noncritical, MersenneTwister& arg_mt, bool skip) {
for(int k = 0; k < (std::max)(num_iterations_even,num_iterations_odd); ++k) {
if(k >= (~j & 0x1 ? num_iterations_odd : num_iterations_even )) {
if(!skip)
b.arrive_and_drop();
break;
}
if(num_items_noncritical) {
// Do some work without holding the lock
int workunits = (int) (arg_mt.poissonInterval((float)num_items_noncritical) + 0.5f);
for (int i = 1; i < workunits; i++)
arg_mt.integer(); // Do one work unit
w.fetch_add(workunits,std::memory_order_relaxed);
}
t.fetch_add(1,std::memory_order_relaxed);
if(!skip) {
int const thiscount = (std::min)(k+1,num_iterations_odd)*((num_threads>>1)+(num_threads&1)) + (std::min)(k+1,num_iterations_even)*(num_threads>>1);
if(t.load(std::memory_order_relaxed) > thiscount) {
std::cerr << "FAILURE: some threads have run ahead of the barrier (" << t.load(std::memory_order_relaxed) << ">" << thiscount << ").\n";
EXPECT_TRUE(false);
}
#ifdef USEOMP
#pragma omp barrier
#else
b.arrive_and_wait();
#endif
if(t.load(std::memory_order_relaxed) < thiscount) {
std::cerr << "FAILURE: some threads have fallen behind the barrier (" << t.load(std::memory_order_relaxed) << "<" << thiscount << ").\n";
EXPECT_TRUE(false);
}
}
}
}
template <class barrier_type>
void testbarrier_outer(std::map<std::string,std::vector<double>>& results, std::string const& name, double barrier_frequency, double phase_duration, bool randomIterations = false) {
std::vector<double>& data = results[name];
double const workItemTime = time_item();
int const num_items_noncritical = int( phase_duration / workItemTime + 0.5 );
FOR_GAUNTLET(num_threads) {
int const num_iterations = int( barrier_frequency );
#ifdef VERBOSE_TEST
std::cerr << "running " << name << " #" << num_threads << ", " << num_iterations << " * " << num_items_noncritical << "\r" << std::flush;
#endif
srand(num_threads);
MersenneTwister local_mt;
int const num_iterations_odd = randomIterations ? int(local_mt.poissonInterval((float)num_iterations)+0.5f) : num_iterations,
num_iterations_even = randomIterations ? int(local_mt.poissonInterval((float)num_iterations)+0.5f) : num_iterations;
std::atomic<int> t[2], w[2];
std::chrono::time_point<std::chrono::high_resolution_clock> start[2], end[2];
for(int pass = 0; pass < 2; ++pass) {
t[pass] = 0;
w[pass] = 0;
srand(num_threads);
std::vector<MersenneTwister> randoms(num_threads);
barrier_type b(num_threads);
start[pass] = std::chrono::high_resolution_clock::now();
#ifdef USEOMP
omp_set_num_threads(num_threads);
std::atomic<int> _j(0);
#pragma omp parallel
{
int const j = _j.fetch_add(1,std::memory_order_relaxed);
testbarrier_inner(b, num_threads, j, t[pass], w[pass], num_iterations_odd, num_iterations_even, num_items_noncritical, randoms[j], pass==0);
num_threads = omp_get_num_threads();
}
#else
std::vector<std::thread*> threads(num_threads);
for(unsigned j = 0; j < num_threads; ++j)
threads[j] = new std::thread([&,j](){
testbarrier_inner(b, num_threads, j, t[pass], w[pass], num_iterations_odd, num_iterations_even, num_items_noncritical, randoms[j], pass==0);
});
for(unsigned j = 0; j < num_threads; ++j) {
threads[j]->join();
delete threads[j];
}
#endif
end[pass] = std::chrono::high_resolution_clock::now();
}
if(t[0] != t[1]) throw std::string("mismatched iteration counts");
if(w[0] != w[1]) throw std::string("mismatched work item counts");
int const phases = (std::max)(num_iterations_odd, num_iterations_even);
std::chrono::duration<double> elapsed_seconds_0 = end[0]-start[0],
elapsed_seconds_1 = end[1]-start[1];
double const time = (elapsed_seconds_1.count() - elapsed_seconds_0.count()) / phases;
data.push_back(time);
#ifdef VERBOSE_TEST
std::cerr << name << " : " << num_threads << "," << elapsed_seconds_1.count() / phases << " - " << elapsed_seconds_0.count() / phases << " = " << time << " \n";
#endif
}
}
template <class... T>
struct mutex_tester;
template <class F>
struct mutex_tester<F> {
static void run(std::map<std::string,std::vector<double>>& results, std::string const name[], double critical_fraction, double critical_duration) {
testmutex_outer<F>(results, *name, critical_fraction, critical_duration);
}
};
template <class F, class... T>
struct mutex_tester<F,T...> {
static void run(std::map<std::string,std::vector<double>>& results, std::string const name[], double critical_fraction, double critical_duration) {
mutex_tester<F>::run(results, name, critical_fraction, critical_duration);
mutex_tester<T...>::run(results, ++name, critical_fraction, critical_duration);
}
};
TEST( synchronic, main )
{
//warm up
time_item();
//measure up
#ifdef VERBOSE_TEST
std::cerr << "measuring work item speed...\r";
std::cerr << "work item speed is " << time_item() << " per item, nil is " << time_nil() << "\n";
#endif
try {
std::pair<double,double> testpoints[] = { {1, 0}, /*{1E-1, 10E-3}, {5E-1, 2E-6}, {3E-1, 50E-9},*/ };
for(auto x : testpoints ) {
std::map<std::string,std::vector<double>> results;
//testbarrier_outer<std::barrier>(results, PREFIX"bar 1khz 100us", 1E3, x.second);
std::string const names[] = {
PREFIX"tkt", PREFIX"mcs", PREFIX"ttas", PREFIX"std"
#ifdef WIN32
,PREFIX"srw"
#endif
};
//run -->
mutex_tester<
ticket_mutex, mcs_mutex, ttas_mutex, std::mutex
#ifdef WIN32
,srw_mutex
#endif
>::run(results, names, x.first, x.second);
//<-- run
#ifdef VERBOSE_TEST
std::cout << "threads";
for(auto & i : results)
std::cout << ",\"" << i.first << '\"';
std::cout << std::endl;
int j = 0;
FOR_GAUNTLET(num_threads) {
std::cout << num_threads;
for(auto & i : results)
std::cout << ',' << i.second[j];
std::cout << std::endl;
++j;
}
#endif
}
}
catch(std::string & e) {
std::cerr << "EXCEPTION : " << e << std::endl;
EXPECT_TRUE( false );
}
}
} // namespace Test
#endif
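//----------------------------------------
// The FOR_GAUNTLET macro drives every benchmark in TestSynchronic.cpp: it
// starts at min(8 * hardware_concurrency, 64) threads and repeatedly maps the
// current count through next_table to the next smaller "interesting" count
// until it reaches zero. A standalone sketch of just that walk; the table
// values are copied from the test, while the printf driver is added purely
// for illustration.
#include <algorithm>
#include <cstdio>
#include <thread>

static const unsigned next_table[] = {
   0,  1,  2,  3,   4,  4,  6,  6,   8,  8,  8,  8,  12, 12, 12, 12,
  16, 16, 16, 16,  16, 16, 16, 16,  24, 24, 24, 24,  24, 24, 24, 24,
  32, 32, 32, 32,  32, 32, 32, 32,  40, 40, 40, 40,  40, 40, 40, 40,
  48, 48, 48, 48,  48, 48, 48, 48,  56, 56, 56, 56,  56, 56, 56, 56,
};

int main() {
  const unsigned table_size = sizeof(next_table) / sizeof(unsigned); // 64
  const unsigned start =
      (std::min)(std::thread::hardware_concurrency() * 8, table_size);
  // Same shape as: for(unsigned x = start; x; x = next_table[x-1])
  // e.g. from 64 this visits 64, 56, 48, 40, 32, 24, 16, 12, 8, 6, 4, 3, 2, 1.
  for (unsigned x = start; x; x = next_table[x - 1])
    std::printf("running gauntlet pass with %u threads\n", x);
}
//----------------------------------------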
diff --git a/lib/kokkos/core/unit_test/TestSynchronic.hpp b/lib/kokkos/core/unit_test/TestSynchronic.hpp
index d820129e8..f4341b978 100644
--- a/lib/kokkos/core/unit_test/TestSynchronic.hpp
+++ b/lib/kokkos/core/unit_test/TestSynchronic.hpp
@@ -1,240 +1,241 @@
/*
Copyright (c) 2014, NVIDIA Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef TEST_SYNCHRONIC_HPP
#define TEST_SYNCHRONIC_HPP
#include <impl/Kokkos_Synchronic.hpp>
#include <mutex>
+#include <cmath>
namespace Test {
template <bool truly>
struct dumb_mutex {
dumb_mutex () : locked(0) {
}
void lock() {
while(1) {
bool state = false;
if (locked.compare_exchange_weak(state,true,std::memory_order_acquire)) {
break;
}
while (locked.load(std::memory_order_relaxed)) {
if (!truly) {
Kokkos::Impl::portable_yield();
}
}
}
}
void unlock() {
locked.store(false,std::memory_order_release);
}
private :
std::atomic<bool> locked;
};
#ifdef WIN32
#include <winsock2.h>
#include <windows.h>
#include <synchapi.h>
struct srw_mutex {
srw_mutex () {
InitializeSRWLock(&_lock);
}
void lock() {
AcquireSRWLockExclusive(&_lock);
}
void unlock() {
ReleaseSRWLockExclusive(&_lock);
}
private :
SRWLOCK _lock;
};
#endif
struct ttas_mutex {
ttas_mutex() : locked(false) {
}
ttas_mutex(const ttas_mutex&) = delete;
ttas_mutex& operator=(const ttas_mutex&) = delete;
void lock() {
for(int i = 0;; ++i) {
bool state = false;
if(locked.compare_exchange_weak(state,true,std::memory_order_relaxed,Kokkos::Impl::notify_none))
break;
locked.expect_update(true);
}
std::atomic_thread_fence(std::memory_order_acquire);
}
void unlock() {
locked.store(false,std::memory_order_release);
}
private :
Kokkos::Impl::synchronic<bool> locked;
};
struct ticket_mutex {
ticket_mutex() : active(0), queue(0) {
}
ticket_mutex(const ticket_mutex&) = delete;
ticket_mutex& operator=(const ticket_mutex&) = delete;
void lock() {
int const me = queue.fetch_add(1, std::memory_order_relaxed);
while(me != active.load_when_equal(me, std::memory_order_acquire))
;
}
void unlock() {
active.fetch_add(1,std::memory_order_release);
}
private :
Kokkos::Impl::synchronic<int> active;
std::atomic<int> queue;
};
struct mcs_mutex {
mcs_mutex() : head(nullptr) {
}
mcs_mutex(const mcs_mutex&) = delete;
mcs_mutex& operator=(const mcs_mutex&) = delete;
struct unique_lock {
unique_lock(mcs_mutex & arg_m) : m(arg_m), next(nullptr), ready(false) {
unique_lock * const h = m.head.exchange(this,std::memory_order_acquire);
if(__builtin_expect(h != nullptr,0)) {
h->next.store(this,std::memory_order_seq_cst,Kokkos::Impl::notify_one);
while(!ready.load_when_not_equal(false,std::memory_order_acquire))
;
}
}
unique_lock(const unique_lock&) = delete;
unique_lock& operator=(const unique_lock&) = delete;
~unique_lock() {
unique_lock * h = this;
if(__builtin_expect(!m.head.compare_exchange_strong(h,nullptr,std::memory_order_release, std::memory_order_relaxed),0)) {
unique_lock * n = next.load(std::memory_order_relaxed);
while(!n)
n = next.load_when_not_equal(n,std::memory_order_relaxed);
n->ready.store(true,std::memory_order_release,Kokkos::Impl::notify_one);
}
}
private:
mcs_mutex & m;
Kokkos::Impl::synchronic<unique_lock*> next;
Kokkos::Impl::synchronic<bool> ready;
};
private :
std::atomic<unique_lock*> head;
};
}
namespace std {
template<>
struct unique_lock<Test::mcs_mutex> : Test::mcs_mutex::unique_lock {
unique_lock(Test::mcs_mutex & arg_m) : Test::mcs_mutex::unique_lock(arg_m) {
}
unique_lock(const unique_lock&) = delete;
unique_lock& operator=(const unique_lock&) = delete;
};
}
/* #include <cmath> */
#include <stdlib.h>
namespace Test {
//-------------------------------------
// MersenneTwister
//-------------------------------------
#define MT_IA 397
#define MT_LEN 624
class MersenneTwister
{
volatile unsigned long m_buffer[MT_LEN][64/sizeof(unsigned long)];
volatile int m_index;
public:
MersenneTwister() {
for (int i = 0; i < MT_LEN; i++)
m_buffer[i][0] = rand();
m_index = 0;
for (int i = 0; i < MT_LEN * 100; i++)
integer();
}
unsigned long integer() {
// Indices
int i = m_index;
int i2 = m_index + 1; if (i2 >= MT_LEN) i2 = 0; // wrap-around
int j = m_index + MT_IA; if (j >= MT_LEN) j -= MT_LEN; // wrap-around
// Twist
unsigned long s = (m_buffer[i][0] & 0x80000000) | (m_buffer[i2][0] & 0x7fffffff);
unsigned long r = m_buffer[j][0] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
m_buffer[m_index][0] = r;
m_index = i2;
// Swizzle
r ^= (r >> 11);
r ^= (r << 7) & 0x9d2c5680UL;
r ^= (r << 15) & 0xefc60000UL;
r ^= (r >> 18);
return r;
}
float poissonInterval(float ooLambda) {
return -logf(1.0f - integer() * 2.3283e-10f) * ooLambda;
}
};
} // namespace Test
#endif //TEST_HPP
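//----------------------------------------
// Of the mutexes defined above, ticket_mutex is the easiest to restate without
// the Kokkos synchronic machinery: queue.fetch_add hands out tickets, and a
// thread owns the lock once active reaches its ticket number. The sketch below
// replaces synchronic<int>::load_when_equal with a plain spinning load on a
// std::atomic, so it only illustrates the ticket protocol, not the
// backoff/futex behaviour the benchmark actually measures. SpinTicketMutex and
// the main() driver are hypothetical names.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

class SpinTicketMutex {
  std::atomic<int> active{0};  // ticket currently being served
  std::atomic<int> queue{0};   // next ticket to hand out
public:
  void lock() {
    const int me = queue.fetch_add(1, std::memory_order_relaxed); // take a ticket
    while (active.load(std::memory_order_acquire) != me)          // wait to be served
      std::this_thread::yield();
  }
  void unlock() {
    active.fetch_add(1, std::memory_order_release);               // serve the next ticket
  }
};

int main() {
  SpinTicketMutex m;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < 4; ++t)
    threads.emplace_back([&] {
      for (int i = 0; i < 100000; ++i) {
        m.lock();
        ++counter;        // protected increment; tickets give FIFO fairness
        m.unlock();
      }
    });
  for (auto& th : threads) th.join();
  std::printf("counter = %ld (expected %d)\n", counter, 4 * 100000);
}
//----------------------------------------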
diff --git a/lib/kokkos/core/unit_test/TestTaskPolicy.hpp b/lib/kokkos/core/unit_test/TestTaskPolicy.hpp
deleted file mode 100644
index 71790f6de..000000000
--- a/lib/kokkos/core/unit_test/TestTaskPolicy.hpp
+++ /dev/null
@@ -1,1145 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-
-#ifndef KOKKOS_UNITTEST_TASKPOLICY_HPP
-#define KOKKOS_UNITTEST_TASKPOLICY_HPP
-
-#include <stdio.h>
-#include <iostream>
-#include <cmath>
-#include <Kokkos_TaskPolicy.hpp>
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace TestTaskPolicy {
-
-namespace {
-
-long eval_fib( long n )
-{
- constexpr long mask = 0x03 ;
-
- long fib[4] = { 0 , 1 , 1 , 2 };
-
- for ( long i = 2 ; i <= n ; ++i ) {
- fib[ i & mask ] = fib[ ( i - 1 ) & mask ] + fib[ ( i - 2 ) & mask ];
- }
-
- return fib[ n & mask ];
-}
-
-}
-
-template< typename Space >
-struct TestFib
-{
- typedef Kokkos::TaskPolicy<Space> policy_type ;
- typedef Kokkos::Future<long,Space> future_type ;
- typedef long value_type ;
-
- policy_type policy ;
- future_type fib_m1 ;
- future_type fib_m2 ;
- const value_type n ;
-
- KOKKOS_INLINE_FUNCTION
- TestFib( const policy_type & arg_policy , const value_type arg_n )
- : policy(arg_policy)
- , fib_m1() , fib_m2()
- , n( arg_n )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & , value_type & result )
- {
-#if 0
- printf( "\nTestFib(%ld) %d %d\n"
- , n
- , int( ! fib_m1.is_null() )
- , int( ! fib_m2.is_null() )
- );
-#endif
-
- if ( n < 2 ) {
- result = n ;
- }
- else if ( ! fib_m2.is_null() && ! fib_m1.is_null() ) {
- result = fib_m1.get() + fib_m2.get();
- }
- else {
-
- // Spawn new children and respawn myself to sum their results:
- // Spawn lower value at higher priority as it has a shorter
- // path to completion.
-
- fib_m2 = policy.task_spawn( TestFib(policy,n-2)
- , Kokkos::TaskSingle
- , Kokkos::TaskHighPriority );
-
- fib_m1 = policy.task_spawn( TestFib(policy,n-1)
- , Kokkos::TaskSingle );
-
- Kokkos::Future<Space> dep[] = { fib_m1 , fib_m2 };
-
- Kokkos::Future<Space> fib_all = policy.when_all( 2 , dep );
-
- if ( ! fib_m2.is_null() && ! fib_m1.is_null() && ! fib_all.is_null() ) {
- // High priority to retire this branch
- policy.respawn( this , Kokkos::TaskHighPriority , fib_all );
- }
- else {
-#if 0
- printf( "TestFib(%ld) insufficient memory alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
- , n
- , policy.allocation_capacity()
- , policy.allocated_task_count_max()
- , policy.allocated_task_count_accum()
- );
-#endif
- Kokkos::abort("TestFib insufficient memory");
-
- }
- }
- }
-
- static void run( int i , size_t MemoryCapacity = 16000 )
- {
- typedef typename policy_type::memory_space memory_space ;
-
- enum { Log2_SuperBlockSize = 12 };
-
- policy_type root_policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
-
- future_type f = root_policy.host_spawn( TestFib(root_policy,i) , Kokkos::TaskSingle );
- Kokkos::wait( root_policy );
- ASSERT_EQ( eval_fib(i) , f.get() );
-
-#if 0
- fprintf( stdout , "\nTestFib::run(%d) spawn_size(%d) when_all_size(%d) alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
- , i
- , int(root_policy.template spawn_allocation_size<TestFib>())
- , int(root_policy.when_all_allocation_size(2))
- , root_policy.allocation_capacity()
- , root_policy.allocated_task_count_max()
- , root_policy.allocated_task_count_accum()
- );
- fflush( stdout );
-#endif
- }
-
-};
-
-} // namespace TestTaskPolicy
-
-//----------------------------------------------------------------------------
-
-namespace TestTaskPolicy {
-
-template< class Space >
-struct TestTaskDependence {
-
- typedef Kokkos::TaskPolicy<Space> policy_type ;
- typedef Kokkos::Future<Space> future_type ;
- typedef Kokkos::View<long,Space> accum_type ;
- typedef void value_type ;
-
- policy_type m_policy ;
- accum_type m_accum ;
- long m_count ;
-
- KOKKOS_INLINE_FUNCTION
- TestTaskDependence( long n
- , const policy_type & arg_policy
- , const accum_type & arg_accum )
- : m_policy( arg_policy )
- , m_accum( arg_accum )
- , m_count( n )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & )
- {
- enum { CHUNK = 8 };
- const int n = CHUNK < m_count ? CHUNK : m_count ;
-
- if ( 1 < m_count ) {
- future_type f[ CHUNK ] ;
-
- const int inc = ( m_count + n - 1 ) / n ;
-
- for ( int i = 0 ; i < n ; ++i ) {
- long begin = i * inc ;
- long count = begin + inc < m_count ? inc : m_count - begin ;
- f[i] = m_policy.task_spawn( TestTaskDependence(count,m_policy,m_accum) , Kokkos::TaskSingle );
- }
-
- m_count = 0 ;
-
- m_policy.respawn( this , m_policy.when_all( n , f ) );
- }
- else if ( 1 == m_count ) {
- Kokkos::atomic_increment( & m_accum() );
- }
- }
-
- static void run( int n )
- {
- typedef typename policy_type::memory_space memory_space ;
-
- // enum { MemoryCapacity = 4000 }; // Triggers infinite loop in memory pool
- enum { MemoryCapacity = 16000 };
- enum { Log2_SuperBlockSize = 12 };
- policy_type policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
-
- accum_type accum("accum");
-
- typename accum_type::HostMirror host_accum =
- Kokkos::create_mirror_view( accum );
-
- policy.host_spawn( TestTaskDependence(n,policy,accum) , Kokkos::TaskSingle );
-
- Kokkos::wait( policy );
-
- Kokkos::deep_copy( host_accum , accum );
-
- ASSERT_EQ( host_accum() , n );
- }
-};
-
-} // namespace TestTaskPolicy
-
-//----------------------------------------------------------------------------
-
-namespace TestTaskPolicy {
-
-template< class ExecSpace >
-struct TestTaskTeam {
-
- //enum { SPAN = 8 };
- enum { SPAN = 33 };
- //enum { SPAN = 1 };
-
- typedef void value_type ;
- typedef Kokkos::TaskPolicy<ExecSpace> policy_type ;
- typedef Kokkos::Future<ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
-
- policy_type policy ;
- future_type future ;
-
- view_type parfor_result ;
- view_type parreduce_check ;
- view_type parscan_result ;
- view_type parscan_check ;
- const long nvalue ;
-
- KOKKOS_INLINE_FUNCTION
- TestTaskTeam( const policy_type & arg_policy
- , const view_type & arg_parfor_result
- , const view_type & arg_parreduce_check
- , const view_type & arg_parscan_result
- , const view_type & arg_parscan_check
- , const long arg_nvalue )
- : policy(arg_policy)
- , future()
- , parfor_result( arg_parfor_result )
- , parreduce_check( arg_parreduce_check )
- , parscan_result( arg_parscan_result )
- , parscan_check( arg_parscan_check )
- , nvalue( arg_nvalue )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & member )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
-
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
- future = policy.task_spawn
- ( TestTaskTeam( policy ,
- parfor_result ,
- parreduce_check,
- parscan_result,
- parscan_check,
- begin - 1 )
- , Kokkos::TaskTeam );
-
- assert( ! future.is_null() );
-
- policy.respawn( this , future );
- }
- return ;
- }
-
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parfor_result[i] = i ; }
- );
-
- // test parallel_reduce without join
-
- long tot = 0;
- long expected = (begin+end-1)*(end-begin)*0.5;
-
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &res) { res += parfor_result[i]; }
- , tot);
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parreduce_check[i] = expected-tot ; }
- );
-
- // test parallel_reduce with join
-
- tot = 0;
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &res) { res += parfor_result[i]; }
- , [&]( long& val1, const long& val2) { val1 += val2; }
- , tot);
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parreduce_check[i] += expected-tot ; }
- );
-
-#if 0
- // test parallel_scan
-
- // Exclusive scan
- Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &val , const bool final ) {
- if ( final ) { parscan_result[i] = val; }
- val += i;
- }
- );
-
- if ( member.team_rank() == 0 ) {
- for ( long i = begin ; i < end ; ++i ) {
- parscan_check[i] = (i*(i-1)-begin*(begin-1))*0.5-parscan_result[i];
- }
- }
-
- // Inclusive scan
- Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &val , const bool final ) {
- val += i;
- if ( final ) { parscan_result[i] = val; }
- }
- );
-
- if ( member.team_rank() == 0 ) {
- for ( long i = begin ; i < end ; ++i ) {
- parscan_check[i] += (i*(i+1)-begin*(begin-1))*0.5-parscan_result[i];
- }
- }
-#endif
-
- }
-
- static void run( long n )
- {
- // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
- // const unsigned memory_capacity = 100000 ; // fails with SPAN=1 for serial and OMP
- const unsigned memory_capacity = 400000 ;
-
- policy_type root_policy( typename policy_type::memory_space()
- , memory_capacity );
-
- view_type root_parfor_result("parfor_result",n+1);
- view_type root_parreduce_check("parreduce_check",n+1);
- view_type root_parscan_result("parscan_result",n+1);
- view_type root_parscan_check("parscan_check",n+1);
-
- typename view_type::HostMirror
- host_parfor_result = Kokkos::create_mirror_view( root_parfor_result );
- typename view_type::HostMirror
- host_parreduce_check = Kokkos::create_mirror_view( root_parreduce_check );
- typename view_type::HostMirror
- host_parscan_result = Kokkos::create_mirror_view( root_parscan_result );
- typename view_type::HostMirror
- host_parscan_check = Kokkos::create_mirror_view( root_parscan_check );
-
- future_type f = root_policy.host_spawn(
- TestTaskTeam( root_policy ,
- root_parfor_result ,
- root_parreduce_check ,
- root_parscan_result,
- root_parscan_check,
- n ) ,
- Kokkos::TaskTeam );
-
- Kokkos::wait( root_policy );
-
- Kokkos::deep_copy( host_parfor_result , root_parfor_result );
- Kokkos::deep_copy( host_parreduce_check , root_parreduce_check );
- Kokkos::deep_copy( host_parscan_result , root_parscan_result );
- Kokkos::deep_copy( host_parscan_check , root_parscan_check );
-
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i ;
- if ( host_parfor_result(i) != answer ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_for result(" << i << ") = "
- << host_parfor_result(i) << " != " << answer << std::endl ;
- }
- if ( host_parreduce_check(i) != 0 ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_reduce check(" << i << ") = "
- << host_parreduce_check(i) << " != 0" << std::endl ;
- } //TODO
- if ( host_parscan_check(i) != 0 ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_scan check(" << i << ") = "
- << host_parscan_check(i) << " != 0" << std::endl ;
- }
- }
- }
-};
-
-template< class ExecSpace >
-struct TestTaskTeamValue {
-
- enum { SPAN = 8 };
-
- typedef long value_type ;
- typedef Kokkos::TaskPolicy<ExecSpace> policy_type ;
- typedef Kokkos::Future<value_type,ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
-
- policy_type policy ;
- future_type future ;
-
- view_type result ;
- const long nvalue ;
-
- KOKKOS_INLINE_FUNCTION
- TestTaskTeamValue( const policy_type & arg_policy
- , const view_type & arg_result
- , const long arg_nvalue )
- : policy(arg_policy)
- , future()
- , result( arg_result )
- , nvalue( arg_nvalue )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type const & member
- , value_type & final )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
-
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
-
- future = policy.task_spawn
- ( TestTaskTeamValue( policy , result , begin - 1 )
- , Kokkos::TaskTeam );
-
- assert( ! future.is_null() );
-
- policy.respawn( this , future );
- }
- return ;
- }
-
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { result[i] = i + 1 ; }
- );
-
- if ( member.team_rank() == 0 ) {
- final = result[nvalue] ;
- }
-
- Kokkos::memory_fence();
- }
-
- static void run( long n )
- {
- // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
- const unsigned memory_capacity = 100000 ;
-
- policy_type root_policy( typename policy_type::memory_space()
- , memory_capacity );
-
- view_type root_result("result",n+1);
-
- typename view_type::HostMirror
- host_result = Kokkos::create_mirror_view( root_result );
-
- future_type fv = root_policy.host_spawn
- ( TestTaskTeamValue( root_policy, root_result, n ) , Kokkos::TaskTeam );
-
- Kokkos::wait( root_policy );
-
- Kokkos::deep_copy( host_result , root_result );
-
- if ( fv.get() != n + 1 ) {
- std::cerr << "TestTaskTeamValue ERROR future = "
- << fv.get() << " != " << n + 1 << std::endl ;
- }
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i + 1 ;
- if ( host_result(i) != answer ) {
- std::cerr << "TestTaskTeamValue ERROR result(" << i << ") = "
- << host_result(i) << " != " << answer << std::endl ;
- }
- }
- }
-};
-} // namespace TestTaskPolicy
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-namespace TestTaskPolicy {
-
-template< class ExecSpace >
-struct FibChild {
-
- typedef long value_type ;
-
- Kokkos::Experimental::TaskPolicy<ExecSpace> policy ;
- Kokkos::Experimental::Future<long,ExecSpace> fib_1 ;
- Kokkos::Experimental::Future<long,ExecSpace> fib_2 ;
- const value_type n ;
- int has_nested ;
-
- KOKKOS_INLINE_FUNCTION
- FibChild( const Kokkos::Experimental::TaskPolicy<ExecSpace> & arg_policy
- , const value_type arg_n )
- : policy(arg_policy)
- , fib_1() , fib_2()
- , n( arg_n ), has_nested(0) {}
-
- KOKKOS_INLINE_FUNCTION
- void apply( value_type & result )
- {
- typedef Kokkos::Experimental::Future<long,ExecSpace> future_type ;
-
- if ( n < 2 ) {
-
- has_nested = -1 ;
-
- result = n ;
- }
- else {
- if ( has_nested == 0 ) {
- // Spawn new children and respawn myself to sum their results:
- // Spawn lower value at higher priority as it has a shorter
- // path to completion.
- if ( fib_2.is_null() ) {
- fib_2 = policy.task_create( FibChild(policy,n-2) );
- }
-
- if ( ! fib_2.is_null() && fib_1.is_null() ) {
- fib_1 = policy.task_create( FibChild(policy,n-1) );
- }
-
- if ( ! fib_1.is_null() ) {
- has_nested = 2 ;
-
- policy.spawn( fib_2 , true /* high priority */ );
- policy.spawn( fib_1 );
- policy.add_dependence( this , fib_1 );
- policy.add_dependence( this , fib_2 );
- policy.respawn( this );
- }
- else {
- // Release task memory before spawning the task,
- // after spawning memory cannot be released.
- fib_2 = future_type();
- // Respawn when more memory is available
- policy.respawn_needing_memory( this );
- }
- }
- else if ( has_nested == 2 ) {
-
- has_nested = -1 ;
-
- result = fib_1.get() + fib_2.get();
-
-if ( false ) {
- printf("FibChild %ld = fib(%ld), task_count(%d)\n"
- , long(n), long(result), policy.allocated_task_count());
-}
-
- }
- else {
- printf("FibChild(%ld) execution error\n",(long)n);
- Kokkos::abort("FibChild execution error");
- }
- }
- }
-};
-
-template< class ExecSpace >
-struct FibChild2 {
-
- typedef long value_type ;
-
- Kokkos::Experimental::TaskPolicy<ExecSpace> policy ;
- Kokkos::Experimental::Future<long,ExecSpace> fib_a ;
- Kokkos::Experimental::Future<long,ExecSpace> fib_b ;
- const value_type n ;
- int has_nested ;
-
- KOKKOS_INLINE_FUNCTION
- FibChild2( const Kokkos::Experimental::TaskPolicy<ExecSpace> & arg_policy
- , const value_type arg_n )
- : policy(arg_policy)
- , n( arg_n ), has_nested(0) {}
-
- KOKKOS_INLINE_FUNCTION
- void apply( value_type & result )
- {
- if ( 0 == has_nested ) {
- if ( n < 2 ) {
-
- has_nested = -1 ;
-
- result = n ;
- }
- else if ( n < 4 ) {
- // Spawn new children and respawn myself to sum their results:
- // result = Fib(n-1) + Fib(n-2)
- has_nested = 2 ;
-
- // Spawn lower value at higher priority as it has a shorter
- // path to completion.
-
- policy.clear_dependence( this );
- fib_a = policy.spawn( policy.task_create( FibChild2(policy,n-1) ) );
- fib_b = policy.spawn( policy.task_create( FibChild2(policy,n-2) ) , true );
- policy.add_dependence( this , fib_a );
- policy.add_dependence( this , fib_b );
- policy.respawn( this );
- }
- else {
- // Spawn new children and respawn myself to sum their results:
- // result = Fib(n-1) + Fib(n-2)
- // result = ( Fib(n-2) + Fib(n-3) ) + ( Fib(n-3) + Fib(n-4) )
- // result = ( ( Fib(n-3) + Fib(n-4) ) + Fib(n-3) ) + ( Fib(n-3) + Fib(n-4) )
- // result = 3 * Fib(n-3) + 2 * Fib(n-4)
- has_nested = 4 ;
-
- // Spawn lower value at higher priority as it has a shorter
- // path to completion.
-
- policy.clear_dependence( this );
- fib_a = policy.spawn( policy.task_create( FibChild2(policy,n-3) ) );
- fib_b = policy.spawn( policy.task_create( FibChild2(policy,n-4) ) , true );
- policy.add_dependence( this , fib_a );
- policy.add_dependence( this , fib_b );
- policy.respawn( this );
- }
- }
- else if ( 2 == has_nested || 4 == has_nested ) {
- result = ( has_nested == 2 ) ? fib_a.get() + fib_b.get()
- : 3 * fib_a.get() + 2 * fib_b.get() ;
-
- has_nested = -1 ;
- }
- else {
- printf("FibChild2(%ld) execution error\n",(long)n);
- Kokkos::abort("FibChild2 execution error");
- }
- }
-};
-
-template< class ExecSpace >
-void test_fib( long n , const unsigned task_max_count = 4096 )
-{
- const unsigned task_max_size = 256 ;
- const unsigned task_dependence = 4 ;
-
- Kokkos::Experimental::TaskPolicy<ExecSpace>
- policy( task_max_count
- , task_max_size
- , task_dependence );
-
- Kokkos::Experimental::Future<long,ExecSpace> f =
- policy.spawn( policy.proc_create( FibChild<ExecSpace>(policy,n) ) );
-
- Kokkos::Experimental::wait( policy );
-
- if ( f.get() != eval_fib(n) ) {
- std::cout << "Fib(" << n << ") = " << f.get();
- std::cout << " != " << eval_fib(n);
- std::cout << std::endl ;
- }
-}
-
-template< class ExecSpace >
-void test_fib2( long n , const unsigned task_max_count = 1024 )
-{
- const unsigned task_max_size = 256 ;
- const unsigned task_dependence = 4 ;
-
- Kokkos::Experimental::TaskPolicy<ExecSpace>
- policy( task_max_count
- , task_max_size
- , task_dependence );
-
- Kokkos::Experimental::Future<long,ExecSpace> f =
- policy.spawn( policy.proc_create( FibChild2<ExecSpace>(policy,n) ) );
-
- Kokkos::Experimental::wait( policy );
-
- if ( f.get() != eval_fib(n) ) {
- std::cout << "Fib2(" << n << ") = " << f.get();
- std::cout << " != " << eval_fib(n);
- std::cout << std::endl ;
- }
-}
-
-//----------------------------------------------------------------------------
-
-template< class ExecSpace >
-struct Norm2 {
-
- typedef double value_type ;
-
- const double * const m_x ;
-
- Norm2( const double * x ) : m_x(x) {}
-
- inline
- void init( double & val ) const { val = 0 ; }
-
- KOKKOS_INLINE_FUNCTION
- void operator()( int i , double & val ) const { val += m_x[i] * m_x[i] ; }
-
- void apply( double & dst ) const { dst = std::sqrt( dst ); }
-};
-
-template< class ExecSpace >
-void test_norm2( const int n )
-{
- const unsigned task_max_count = 1024 ;
- const unsigned task_max_size = 256 ;
- const unsigned task_dependence = 4 ;
-
- Kokkos::Experimental::TaskPolicy<ExecSpace>
- policy( task_max_count
- , task_max_size
- , task_dependence );
-
- double * const x = new double[n];
-
- for ( int i = 0 ; i < n ; ++i ) x[i] = 1 ;
-
- Kokkos::RangePolicy<ExecSpace> r(0,n);
-
- Kokkos::Experimental::Future<double,ExecSpace> f =
- Kokkos::Experimental::spawn_reduce( policy , r , Norm2<ExecSpace>(x) );
-
- Kokkos::Experimental::wait( policy );
-
-#if defined(PRINT)
- std::cout << "Norm2: " << f.get() << std::endl ;
-#endif
-
- delete[] x ;
-}
-
-//----------------------------------------------------------------------------
-
-template< class Space >
-struct TaskDep {
-
- typedef int value_type ;
- typedef Kokkos::Experimental::TaskPolicy< Space > policy_type ;
-
- const policy_type policy ;
- const int input ;
-
- TaskDep( const policy_type & arg_p , const int arg_i )
- : policy( arg_p ), input( arg_i ) {}
-
- KOKKOS_INLINE_FUNCTION
- void apply( int & val )
- {
- val = input ;
- const int num = policy.get_dependence( this );
-
- for ( int i = 0 ; i < num ; ++i ) {
- Kokkos::Experimental::Future<int,Space> f = policy.get_dependence( this , i );
- val += f.get();
- }
- }
-};
-
-
-template< class Space >
-void test_task_dep( const int n )
-{
- enum { NTEST = 64 };
-
- const unsigned task_max_count = 1024 ;
- const unsigned task_max_size = 64 ;
- const unsigned task_dependence = 4 ;
-
- Kokkos::Experimental::TaskPolicy<Space>
- policy( task_max_count
- , task_max_size
- , task_dependence );
-
- Kokkos::Experimental::Future<int,Space> f[ NTEST ];
-
- for ( int i = 0 ; i < NTEST ; ++i ) {
- // Create task in the "constructing" state with capacity for 'n+1' dependences
- f[i] = policy.proc_create( TaskDep<Space>(policy,0) , n + 1 );
-
- if ( f[i].get_task_state() != Kokkos::Experimental::TASK_STATE_CONSTRUCTING ) {
- Kokkos::Impl::throw_runtime_exception("get_task_state() != Kokkos::Experimental::TASK_STATE_CONSTRUCTING");
- }
-
- // Only use 'n' dependences
-
- for ( int j = 0 ; j < n ; ++j ) {
-
- Kokkos::Experimental::Future<int,Space> nested =
- policy.proc_create( TaskDep<Space>(policy,j+1) );
-
- policy.spawn( nested );
-
- // Add dependence to a "constructing" task
- policy.add_dependence( f[i] , nested );
- }
-
- // Spawn task from the "constructing" to the "waiting" state
- policy.spawn( f[i] );
- }
-
- const int answer = n % 2 ? n * ( ( n + 1 ) / 2 ) : ( n / 2 ) * ( n + 1 );
-
- Kokkos::Experimental::wait( policy );
-
- int error = 0 ;
- for ( int i = 0 ; i < NTEST ; ++i ) {
- if ( f[i].get_task_state() != Kokkos::Experimental::TASK_STATE_COMPLETE ) {
- Kokkos::Impl::throw_runtime_exception("get_task_state() != Kokkos::Experimental::TASK_STATE_COMPLETE");
- }
- if ( answer != f[i].get() && 0 == error ) {
- std::cout << "test_task_dep(" << n << ") ERROR at[" << i << "]"
- << " answer(" << answer << ") != result(" << f[i].get() << ")" << std::endl ;
- }
- }
-}
-
-//----------------------------------------------------------------------------
-
-template< class ExecSpace >
-struct TaskTeam {
-
- enum { SPAN = 8 };
-
- typedef void value_type ;
- typedef Kokkos::Experimental::TaskPolicy<ExecSpace> policy_type ;
- typedef Kokkos::Experimental::Future<void,ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
-
- policy_type policy ;
- future_type future ;
-
- view_type result ;
- const long nvalue ;
-
- KOKKOS_INLINE_FUNCTION
- TaskTeam( const policy_type & arg_policy
- , const view_type & arg_result
- , const long arg_nvalue )
- : policy(arg_policy)
- , future()
- , result( arg_result )
- , nvalue( arg_nvalue )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void apply( const typename policy_type::member_type & member )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
-
- if ( 0 < begin && future.get_task_state() == Kokkos::Experimental::TASK_STATE_NULL ) {
- if ( member.team_rank() == 0 ) {
- future = policy.spawn( policy.task_create_team( TaskTeam( policy , result , begin - 1 ) ) );
- policy.clear_dependence( this );
- policy.add_dependence( this , future );
- policy.respawn( this );
- }
- return ;
- }
-
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { result[i] = i + 1 ; }
- );
- }
-};
-
-template< class ExecSpace >
-struct TaskTeamValue {
-
- enum { SPAN = 8 };
-
- typedef long value_type ;
- typedef Kokkos::Experimental::TaskPolicy<ExecSpace> policy_type ;
- typedef Kokkos::Experimental::Future<value_type,ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
-
- policy_type policy ;
- future_type future ;
-
- view_type result ;
- const long nvalue ;
-
- KOKKOS_INLINE_FUNCTION
- TaskTeamValue( const policy_type & arg_policy
- , const view_type & arg_result
- , const long arg_nvalue )
- : policy(arg_policy)
- , future()
- , result( arg_result )
- , nvalue( arg_nvalue )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void apply( const typename policy_type::member_type & member , value_type & final )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
-
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
-
- future = policy.task_create_team( TaskTeamValue( policy , result , begin - 1 ) );
-
- policy.spawn( future );
- policy.add_dependence( this , future );
- policy.respawn( this );
- }
- return ;
- }
-
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { result[i] = i + 1 ; }
- );
-
- if ( member.team_rank() == 0 ) {
- final = result[nvalue] ;
- }
-
- Kokkos::memory_fence();
- }
-};
-
-template< class ExecSpace >
-void test_task_team( long n )
-{
- typedef TaskTeam< ExecSpace > task_type ;
- typedef TaskTeamValue< ExecSpace > task_value_type ;
- typedef typename task_type::view_type view_type ;
- typedef typename task_type::policy_type policy_type ;
-
- typedef typename task_type::future_type future_type ;
- typedef typename task_value_type::future_type future_value_type ;
-
- const unsigned task_max_count = 1024 ;
- const unsigned task_max_size = 256 ;
- const unsigned task_dependence = 4 ;
-
- policy_type
- policy( task_max_count
- , task_max_size
- , task_dependence );
-
- view_type result("result",n+1);
-
- typename view_type::HostMirror
- host_result = Kokkos::create_mirror_view( result );
-
- future_type f = policy.proc_create_team( task_type( policy , result , n ) );
-
- ASSERT_FALSE( f.is_null() );
-
- policy.spawn( f );
-
- Kokkos::Experimental::wait( policy );
-
- Kokkos::deep_copy( host_result , result );
-
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i + 1 ;
- if ( host_result(i) != answer ) {
- std::cerr << "test_task_team void ERROR result(" << i << ") = "
- << host_result(i) << " != " << answer << std::endl ;
- }
- }
-
- future_value_type fv = policy.proc_create_team( task_value_type( policy , result , n ) );
-
- ASSERT_FALSE( fv.is_null() );
-
- policy.spawn( fv );
-
- Kokkos::Experimental::wait( policy );
-
- Kokkos::deep_copy( host_result , result );
-
- if ( fv.get() != n + 1 ) {
- std::cerr << "test_task_team value ERROR future = "
- << fv.get() << " != " << n + 1 << std::endl ;
- }
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i + 1 ;
- if ( host_result(i) != answer ) {
- std::cerr << "test_task_team value ERROR result(" << i << ") = "
- << host_result(i) << " != " << answer << std::endl ;
- }
- }
-}
-
-//----------------------------------------------------------------------------
-
-template< class ExecSpace >
-struct TaskLatchAdd {
-
- typedef void value_type ;
- typedef Kokkos::Experimental::Future< Kokkos::Experimental::Latch , ExecSpace > future_type ;
-
- future_type latch ;
- volatile int * count ;
-
- KOKKOS_INLINE_FUNCTION
- TaskLatchAdd( const future_type & arg_latch
- , volatile int * const arg_count )
- : latch( arg_latch )
- , count( arg_count )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void apply()
- {
- Kokkos::atomic_fetch_add( count , 1 );
- latch.add(1);
- }
-};
-
-template< class ExecSpace >
-struct TaskLatchRun {
-
- typedef void value_type ;
- typedef Kokkos::Experimental::TaskPolicy< ExecSpace > policy_type ;
- typedef Kokkos::Experimental::Future< Kokkos::Experimental::Latch , ExecSpace > future_type ;
-
- policy_type policy ;
- int total ;
- volatile int count ;
-
- KOKKOS_INLINE_FUNCTION
- TaskLatchRun( const policy_type & arg_policy , const int arg_total )
- : policy(arg_policy), total(arg_total), count(0) {}
-
- KOKKOS_INLINE_FUNCTION
- void apply()
- {
- if ( 0 == count && 0 < total ) {
- future_type latch = policy.create_latch( total );
-
- for ( int i = 0 ; i < total ; ++i ) {
- auto f = policy.task_create( TaskLatchAdd<ExecSpace>(latch,&count) , 0 );
- if ( f.is_null() ) {
- Kokkos::abort("TaskLatchAdd allocation FAILED" );
- }
-
- if ( policy.spawn( f ).is_null() ) {
- Kokkos::abort("TaskLatcAdd spawning FAILED" );
- }
- }
-
- policy.add_dependence( this , latch );
- policy.respawn( this );
- }
- else if ( count != total ) {
- printf("TaskLatchRun FAILED %d != %d\n",count,total);
- }
- }
-};
-
-
-template< class ExecSpace >
-void test_latch( int n )
-{
- typedef TaskLatchRun< ExecSpace > task_type ;
- typedef typename task_type::policy_type policy_type ;
-
- // Primary + latch + n * LatchAdd
- //
- // This test uses several two different block sizes for allocation from the
- // memory pool, so the memory size requested must be big enough to cause two
- // or more superblocks to be used. Currently, the superblock size in the
- // task policy is 2^16, so make the minimum requested memory size greater
- // than this.
- const unsigned task_max_count = n + 2 < 256 ? 256 : n + 2;
- const unsigned task_max_size = 256;
- const unsigned task_dependence = 4 ;
-
- policy_type
- policy( task_max_count
- , task_max_size
- , task_dependence );
-
- policy.spawn( policy.proc_create( TaskLatchRun<ExecSpace>(policy,n) ) );
-
- wait( policy );
-}
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-} // namespace TestTaskPolicy
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-#endif /* #ifndef KOKKOS_UNITTEST_TASKPOLICY_HPP */
-
-
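//----------------------------------------
// Both the removed TestTaskPolicy.hpp and the TestTaskScheduler.hpp added
// below check their task-based results against eval_fib, which keeps only a
// four-slot ring buffer indexed with i & 0x03 (i.e. i mod 4): fib(i) depends
// only on the two previous values, so any power-of-two buffer holding at
// least three slots, wrapped with a bitwise AND, is enough. A hypothetical
// standalone check of that reasoning; eval_fib_ring mirrors the helper from
// the headers, and the two-variable loop in main() is the reference.
#include <cassert>

long eval_fib_ring(long n) {
  constexpr long mask = 0x03;            // buffer size 4, so i & mask == i % 4
  long fib[4] = {0, 1, 1, 2};            // fib(0..3) pre-seeded
  for (long i = 2; i <= n; ++i)          // slots for i-1 and i-2 are still live
    fib[i & mask] = fib[(i - 1) & mask] + fib[(i - 2) & mask];
  return fib[n & mask];
}

int main() {
  long a = 0, b = 1;                     // plain two-variable Fibonacci
  for (long n = 0; n <= 40; ++n) {
    assert(eval_fib_ring(n) == a);       // ring-buffer version agrees
    long next = a + b; a = b; b = next;
  }
}
//----------------------------------------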
diff --git a/lib/kokkos/core/unit_test/TestTaskScheduler.hpp b/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
new file mode 100644
index 000000000..113455398
--- /dev/null
+++ b/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
@@ -0,0 +1,551 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+
+#ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP
+#define KOKKOS_UNITTEST_TASKSCHEDULER_HPP
+
+#include <stdio.h>
+#include <iostream>
+#include <cmath>
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace TestTaskScheduler {
+
+namespace {
+
+inline
+long eval_fib( long n )
+{
+ constexpr long mask = 0x03 ;
+
+ long fib[4] = { 0 , 1 , 1 , 2 };
+
+ for ( long i = 2 ; i <= n ; ++i ) {
+ fib[ i & mask ] = fib[ ( i - 1 ) & mask ] + fib[ ( i - 2 ) & mask ];
+ }
+
+ return fib[ n & mask ];
+}
+
+}
+
+template< typename Space >
+struct TestFib
+{
+ typedef Kokkos::TaskScheduler<Space> policy_type ;
+ typedef Kokkos::Future<long,Space> future_type ;
+ typedef long value_type ;
+
+ policy_type policy ;
+ future_type fib_m1 ;
+ future_type fib_m2 ;
+ const value_type n ;
+
+ KOKKOS_INLINE_FUNCTION
+ TestFib( const policy_type & arg_policy , const value_type arg_n )
+ : policy(arg_policy)
+ , fib_m1() , fib_m2()
+ , n( arg_n )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type & , value_type & result )
+ {
+#if 0
+ printf( "\nTestFib(%ld) %d %d\n"
+ , n
+ , int( ! fib_m1.is_null() )
+ , int( ! fib_m2.is_null() )
+ );
+#endif
+
+ if ( n < 2 ) {
+ result = n ;
+ }
+ else if ( ! fib_m2.is_null() && ! fib_m1.is_null() ) {
+ result = fib_m1.get() + fib_m2.get();
+ }
+ else {
+
+ // Spawn new children and respawn myself to sum their results:
+ // Spawn lower value at higher priority as it has a shorter
+ // path to completion.
+
+ fib_m2 = policy.task_spawn( TestFib(policy,n-2)
+ , Kokkos::TaskSingle
+ , Kokkos::TaskHighPriority );
+
+ fib_m1 = policy.task_spawn( TestFib(policy,n-1)
+ , Kokkos::TaskSingle );
+
+ Kokkos::Future<Space> dep[] = { fib_m1 , fib_m2 };
+
+ Kokkos::Future<Space> fib_all = policy.when_all( 2 , dep );
+
+ if ( ! fib_m2.is_null() && ! fib_m1.is_null() && ! fib_all.is_null() ) {
+ // High priority to retire this branch
+ policy.respawn( this , Kokkos::TaskHighPriority , fib_all );
+ }
+ else {
+#if 1
+ printf( "TestFib(%ld) insufficient memory alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
+ , n
+ , policy.allocation_capacity()
+ , policy.allocated_task_count_max()
+ , policy.allocated_task_count_accum()
+ );
+#endif
+ Kokkos::abort("TestFib insufficient memory");
+
+ }
+ }
+ }
+
+ static void run( int i , size_t MemoryCapacity = 16000 )
+ {
+ typedef typename policy_type::memory_space memory_space ;
+
+ enum { Log2_SuperBlockSize = 12 };
+
+ policy_type root_policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
+
+ future_type f = root_policy.host_spawn( TestFib(root_policy,i) , Kokkos::TaskSingle );
+ Kokkos::wait( root_policy );
+ ASSERT_EQ( eval_fib(i) , f.get() );
+
+#if 0
+ fprintf( stdout , "\nTestFib::run(%d) spawn_size(%d) when_all_size(%d) alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
+ , i
+ , int(root_policy.template spawn_allocation_size<TestFib>())
+ , int(root_policy.when_all_allocation_size(2))
+ , root_policy.allocation_capacity()
+ , root_policy.allocated_task_count_max()
+ , root_policy.allocated_task_count_accum()
+ );
+ fflush( stdout );
+#endif
+ }
+
+};
+
+} // namespace TestTaskScheduler
+
+//----------------------------------------------------------------------------
+
+namespace TestTaskScheduler {
+
+template< class Space >
+struct TestTaskDependence {
+
+ typedef Kokkos::TaskScheduler<Space> policy_type ;
+ typedef Kokkos::Future<Space> future_type ;
+ typedef Kokkos::View<long,Space> accum_type ;
+ typedef void value_type ;
+
+ policy_type m_policy ;
+ accum_type m_accum ;
+ long m_count ;
+
+ KOKKOS_INLINE_FUNCTION
+ TestTaskDependence( long n
+ , const policy_type & arg_policy
+ , const accum_type & arg_accum )
+ : m_policy( arg_policy )
+ , m_accum( arg_accum )
+ , m_count( n )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type & )
+ {
+ enum { CHUNK = 8 };
+ const int n = CHUNK < m_count ? CHUNK : m_count ;
+
+ if ( 1 < m_count ) {
+ future_type f[ CHUNK ] ;
+
+ const int inc = ( m_count + n - 1 ) / n ;
+
+ for ( int i = 0 ; i < n ; ++i ) {
+ long begin = i * inc ;
+ long count = begin + inc < m_count ? inc : m_count - begin ;
+ f[i] = m_policy.task_spawn( TestTaskDependence(count,m_policy,m_accum) , Kokkos::TaskSingle );
+ }
+
+ m_count = 0 ;
+
+ m_policy.respawn( this , m_policy.when_all( n , f ) );
+ }
+ else if ( 1 == m_count ) {
+ Kokkos::atomic_increment( & m_accum() );
+ }
+ }
+
+ static void run( int n )
+ {
+ typedef typename policy_type::memory_space memory_space ;
+
+ // enum { MemoryCapacity = 4000 }; // Triggers infinite loop in memory pool
+ enum { MemoryCapacity = 16000 };
+ enum { Log2_SuperBlockSize = 12 };
+ policy_type policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
+
+ accum_type accum("accum");
+
+ typename accum_type::HostMirror host_accum =
+ Kokkos::create_mirror_view( accum );
+
+ policy.host_spawn( TestTaskDependence(n,policy,accum) , Kokkos::TaskSingle );
+
+ Kokkos::wait( policy );
+
+ Kokkos::deep_copy( host_accum , accum );
+
+ ASSERT_EQ( host_accum() , n );
+ }
+};
+
+} // namespace TestTaskScheduler
+
+//----------------------------------------------------------------------------
+
+namespace TestTaskScheduler {
+
+template< class ExecSpace >
+struct TestTaskTeam {
+
+ //enum { SPAN = 8 };
+ enum { SPAN = 33 };
+ //enum { SPAN = 1 };
+
+ typedef void value_type ;
+ typedef Kokkos::TaskScheduler<ExecSpace> policy_type ;
+ typedef Kokkos::Future<ExecSpace> future_type ;
+ typedef Kokkos::View<long*,ExecSpace> view_type ;
+
+ policy_type policy ;
+ future_type future ;
+
+ view_type parfor_result ;
+ view_type parreduce_check ;
+ view_type parscan_result ;
+ view_type parscan_check ;
+ const long nvalue ;
+
+ KOKKOS_INLINE_FUNCTION
+ TestTaskTeam( const policy_type & arg_policy
+ , const view_type & arg_parfor_result
+ , const view_type & arg_parreduce_check
+ , const view_type & arg_parscan_result
+ , const view_type & arg_parscan_check
+ , const long arg_nvalue )
+ : policy(arg_policy)
+ , future()
+ , parfor_result( arg_parfor_result )
+ , parreduce_check( arg_parreduce_check )
+ , parscan_result( arg_parscan_result )
+ , parscan_check( arg_parscan_check )
+ , nvalue( arg_nvalue )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type & member )
+ {
+ const long end = nvalue + 1 ;
+ const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
+
+ if ( 0 < begin && future.is_null() ) {
+ if ( member.team_rank() == 0 ) {
+ future = policy.task_spawn
+ ( TestTaskTeam( policy ,
+ parfor_result ,
+ parreduce_check,
+ parscan_result,
+ parscan_check,
+ begin - 1 )
+ , Kokkos::TaskTeam );
+
+ assert( ! future.is_null() );
+
+ policy.respawn( this , future );
+ }
+ return ;
+ }
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i ) { parfor_result[i] = i ; }
+ );
+
+ // test parallel_reduce without join
+
+ long tot = 0;
+ long expected = (begin+end-1)*(end-begin)*0.5;
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i, long &res) { res += parfor_result[i]; }
+ , tot);
+ Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i ) { parreduce_check[i] = expected-tot ; }
+ );
+
+ // test parallel_reduce with join
+
+ tot = 0;
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i, long &res) { res += parfor_result[i]; }
+ , [&]( long& val1, const long& val2) { val1 += val2; }
+ , tot);
+ Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i ) { parreduce_check[i] += expected-tot ; }
+ );
+
+ // test parallel_scan
+
+ // Exclusive scan
+ Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i, long &val , const bool final ) {
+ if ( final ) { parscan_result[i] = val; }
+ val += i;
+ }
+ );
+ if ( member.team_rank() == 0 ) {
+ for ( long i = begin ; i < end ; ++i ) {
+ parscan_check[i] = (i*(i-1)-begin*(begin-1))*0.5-parscan_result[i];
+ }
+ }
+
+ // Inclusive scan
+ Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i, long &val , const bool final ) {
+ val += i;
+ if ( final ) { parscan_result[i] = val; }
+ }
+ );
+ if ( member.team_rank() == 0 ) {
+ for ( long i = begin ; i < end ; ++i ) {
+ parscan_check[i] += (i*(i+1)-begin*(begin-1))*0.5-parscan_result[i];
+ }
+ }
+ // ThreadVectorRange check
+ /*
+ long result = 0;
+ expected = (begin+end-1)*(end-begin)*0.5;
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , 0 , 1 )
+ , [&] ( const int i , long & outerUpdate ) {
+ long sum_j = 0.0;
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( member , end - begin )
+ , [&] ( const int j , long &innerUpdate ) {
+ innerUpdate += begin+j;
+ } , sum_j );
+ outerUpdate += sum_j ;
+ } , result );
+ Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i ) {
+ parreduce_check[i] += result-expected ;
+ }
+ );
+ */
+ }
+
+ static void run( long n )
+ {
+ // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
+ // const unsigned memory_capacity = 100000 ; // fails with SPAN=1 for serial and OMP
+ const unsigned memory_capacity = 400000 ;
+
+ policy_type root_policy( typename policy_type::memory_space()
+ , memory_capacity );
+
+ view_type root_parfor_result("parfor_result",n+1);
+ view_type root_parreduce_check("parreduce_check",n+1);
+ view_type root_parscan_result("parscan_result",n+1);
+ view_type root_parscan_check("parscan_check",n+1);
+
+ typename view_type::HostMirror
+ host_parfor_result = Kokkos::create_mirror_view( root_parfor_result );
+ typename view_type::HostMirror
+ host_parreduce_check = Kokkos::create_mirror_view( root_parreduce_check );
+ typename view_type::HostMirror
+ host_parscan_result = Kokkos::create_mirror_view( root_parscan_result );
+ typename view_type::HostMirror
+ host_parscan_check = Kokkos::create_mirror_view( root_parscan_check );
+
+ future_type f = root_policy.host_spawn(
+ TestTaskTeam( root_policy ,
+ root_parfor_result ,
+ root_parreduce_check ,
+ root_parscan_result,
+ root_parscan_check,
+ n ) ,
+ Kokkos::TaskTeam );
+
+ Kokkos::wait( root_policy );
+
+ Kokkos::deep_copy( host_parfor_result , root_parfor_result );
+ Kokkos::deep_copy( host_parreduce_check , root_parreduce_check );
+ Kokkos::deep_copy( host_parscan_result , root_parscan_result );
+ Kokkos::deep_copy( host_parscan_check , root_parscan_check );
+
+ for ( long i = 0 ; i <= n ; ++i ) {
+ const long answer = i ;
+ if ( host_parfor_result(i) != answer ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_for result(" << i << ") = "
+ << host_parfor_result(i) << " != " << answer << std::endl ;
+ }
+ if ( host_parreduce_check(i) != 0 ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_reduce check(" << i << ") = "
+ << host_parreduce_check(i) << " != 0" << std::endl ;
+ }
+ if ( host_parscan_check(i) != 0 ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_scan check(" << i << ") = "
+ << host_parscan_check(i) << " != 0" << std::endl ;
+ }
+ }
+ }
+};
+
+template< class ExecSpace >
+struct TestTaskTeamValue {
+
+ enum { SPAN = 8 };
+
+ typedef long value_type ;
+ typedef Kokkos::TaskScheduler<ExecSpace> policy_type ;
+ typedef Kokkos::Future<value_type,ExecSpace> future_type ;
+ typedef Kokkos::View<long*,ExecSpace> view_type ;
+
+ policy_type policy ;
+ future_type future ;
+
+ view_type result ;
+ const long nvalue ;
+
+ KOKKOS_INLINE_FUNCTION
+ TestTaskTeamValue( const policy_type & arg_policy
+ , const view_type & arg_result
+ , const long arg_nvalue )
+ : policy(arg_policy)
+ , future()
+ , result( arg_result )
+ , nvalue( arg_nvalue )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type const & member
+ , value_type & final )
+ {
+ const long end = nvalue + 1 ;
+ const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
+
+ if ( 0 < begin && future.is_null() ) {
+ if ( member.team_rank() == 0 ) {
+
+ future = policy.task_spawn
+ ( TestTaskTeamValue( policy , result , begin - 1 )
+ , Kokkos::TaskTeam );
+
+ assert( ! future.is_null() );
+
+ policy.respawn( this , future );
+ }
+ return ;
+ }
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
+ , [&]( int i ) { result[i] = i + 1 ; }
+ );
+
+ if ( member.team_rank() == 0 ) {
+ final = result[nvalue] ;
+ }
+
+ Kokkos::memory_fence();
+ }
+
+ static void run( long n )
+ {
+ // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
+ const unsigned memory_capacity = 100000 ;
+
+ policy_type root_policy( typename policy_type::memory_space()
+ , memory_capacity );
+
+ view_type root_result("result",n+1);
+
+ typename view_type::HostMirror
+ host_result = Kokkos::create_mirror_view( root_result );
+
+ future_type fv = root_policy.host_spawn
+ ( TestTaskTeamValue( root_policy, root_result, n ) , Kokkos::TaskTeam );
+
+ Kokkos::wait( root_policy );
+
+ Kokkos::deep_copy( host_result , root_result );
+
+ if ( fv.get() != n + 1 ) {
+ std::cerr << "TestTaskTeamValue ERROR future = "
+ << fv.get() << " != " << n + 1 << std::endl ;
+ }
+ for ( long i = 0 ; i <= n ; ++i ) {
+ const long answer = i + 1 ;
+ if ( host_result(i) != answer ) {
+ std::cerr << "TestTaskTeamValue ERROR result(" << i << ") = "
+ << host_result(i) << " != " << answer << std::endl ;
+ }
+ }
+ }
+};
+} // namespace TestTaskScheduler
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
+#endif /* #ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP */
+
+
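The task-DAG pattern exercised by TestFib and TestTaskDependence above follows one recipe: spawn child tasks with task_spawn, aggregate their futures with when_all, respawn the parent against that aggregate, and re-enter operator() once the children have completed. Below is a minimal sketch of that recipe, assuming KOKKOS_ENABLE_TASKDAG is defined; the functor name MinimalSum and the 16000-byte memory-pool size are illustrative choices, not part of the unit tests.

#include <Kokkos_Core.hpp>

// Sketch only: a parent task that spawns one child, respawns itself against the
// child's future, and on re-entry returns child + 1.
template< typename Space >
struct MinimalSum {
  typedef Kokkos::TaskScheduler<Space> scheduler_type ;
  typedef Kokkos::Future<long,Space>   future_type ;
  typedef long                         value_type ;

  scheduler_type sched ;
  future_type    child ;
  long           n ;

  KOKKOS_INLINE_FUNCTION
  MinimalSum( const scheduler_type & arg_sched , const long arg_n )
    : sched(arg_sched), child(), n(arg_n) {}

  KOKKOS_INLINE_FUNCTION
  void operator()( typename scheduler_type::member_type & , value_type & result )
  {
    if ( n == 0 ) { result = 0 ; }
    else if ( ! child.is_null() ) { result = child.get() + 1 ; }   // child has completed
    else {
      child = sched.task_spawn( MinimalSum(sched,n-1) , Kokkos::TaskSingle );
      sched.respawn( this , child );                               // run again once the child is done
    }
  }

  static long run( const long n )
  {
    // The scheduler owns a memory pool from which tasks are allocated (16000 bytes here).
    scheduler_type sched( typename scheduler_type::memory_space() , 16000 );
    future_type f = sched.host_spawn( MinimalSum(sched,n) , Kokkos::TaskSingle );
    Kokkos::wait( sched );
    return f.get();   // expected to equal n
  }
};

Respawning instead of blocking is what lets a waiting parent release its worker; this is why TestFib's operator() distinguishes the first entry (futures are null) from the re-entry after fib_all completes.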
diff --git a/lib/kokkos/core/unit_test/TestTeam.hpp b/lib/kokkos/core/unit_test/TestTeam.hpp
index db6b0cff7..23ad2be3f 100644
--- a/lib/kokkos/core/unit_test/TestTeam.hpp
+++ b/lib/kokkos/core/unit_test/TestTeam.hpp
@@ -1,910 +1,923 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
namespace {
template< class ExecSpace, class ScheduleType >
struct TestTeamPolicy {
typedef typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type team_member ;
typedef Kokkos::View<int**,ExecSpace> view_type ;
view_type m_flags ;
TestTeamPolicy( const size_t league_size )
: m_flags( Kokkos::ViewAllocateWithoutInitializing("flags")
, Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( *this )
, league_size )
{}
struct VerifyInitTag {};
KOKKOS_INLINE_FUNCTION
void operator()( const team_member & member ) const
{
const int tid = member.team_rank() + member.team_size() * member.league_rank();
m_flags( member.team_rank() , member.league_rank() ) = tid ;
}
KOKKOS_INLINE_FUNCTION
void operator()( const VerifyInitTag & , const team_member & member ) const
{
const int tid = member.team_rank() + member.team_size() * member.league_rank();
if ( tid != m_flags( member.team_rank() , member.league_rank() ) ) {
printf("TestTeamPolicy member(%d,%d) error %d != %d\n"
, member.league_rank() , member.team_rank()
, tid , m_flags( member.team_rank() , member.league_rank() ) );
}
}
// included for test_small_league_size
TestTeamPolicy()
: m_flags()
{}
// included for test_small_league_size
struct NoOpTag {} ;
KOKKOS_INLINE_FUNCTION
void operator()( const NoOpTag & , const team_member & member ) const
{}
static void test_small_league_size() {
int bs = 8; // batch size (number of elements per batch)
int ns = 16; // total number of "problems" to process
// calculate total scratch memory space size
const int level = 0;
int mem_size = 960;
const int num_teams = ns/bs;
const Kokkos::TeamPolicy< ExecSpace, NoOpTag > policy(num_teams, Kokkos::AUTO());
Kokkos::parallel_for ( policy.set_scratch_size(level, Kokkos::PerTeam(mem_size), Kokkos::PerThread(0))
, TestTeamPolicy()
);
}
static void test_for( const size_t league_size )
{
TestTeamPolicy functor( league_size );
const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size , team_size ) , functor );
Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace , VerifyInitTag >( league_size , team_size ) , functor );
test_small_league_size();
}
struct ReduceTag {};
typedef long value_type ;
KOKKOS_INLINE_FUNCTION
void operator()( const team_member & member , value_type & update ) const
{
update += member.team_rank() + member.team_size() * member.league_rank();
}
KOKKOS_INLINE_FUNCTION
void operator()( const ReduceTag & , const team_member & member , value_type & update ) const
{
update += 1 + member.team_rank() + member.team_size() * member.league_rank();
}
static void test_reduce( const size_t league_size )
{
TestTeamPolicy functor( league_size );
const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
const long N = team_size * league_size ;
long total = 0 ;
Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size , team_size ) , functor , total );
ASSERT_EQ( size_t((N-1)*(N))/2 , size_t(total) );
Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace , ReduceTag >( league_size , team_size ) , functor , total );
ASSERT_EQ( (size_t(N)*size_t(N+1))/2 , size_t(total) );
}
};
}
}
/*--------------------------------------------------------------------------*/
namespace Test {
template< typename ScalarType , class DeviceType, class ScheduleType >
class ReduceTeamFunctor
{
public:
typedef DeviceType execution_space ;
typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
typedef typename execution_space::size_type size_type ;
struct value_type {
ScalarType value[3] ;
};
const size_type nwork ;
ReduceTeamFunctor( const size_type & arg_nwork ) : nwork( arg_nwork ) {}
ReduceTeamFunctor( const ReduceTeamFunctor & rhs )
: nwork( rhs.nwork ) {}
KOKKOS_INLINE_FUNCTION
void init( value_type & dst ) const
{
dst.value[0] = 0 ;
dst.value[1] = 0 ;
dst.value[2] = 0 ;
}
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst ,
const volatile value_type & src ) const
{
dst.value[0] += src.value[0] ;
dst.value[1] += src.value[1] ;
dst.value[2] += src.value[2] ;
}
KOKKOS_INLINE_FUNCTION
void operator()( const typename policy_type::member_type ind , value_type & dst ) const
{
const int thread_rank = ind.team_rank() + ind.team_size() * ind.league_rank();
const int thread_size = ind.team_size() * ind.league_size();
const int chunk = ( nwork + thread_size - 1 ) / thread_size ;
size_type iwork = chunk * thread_rank ;
const size_type iwork_end = iwork + chunk < nwork ? iwork + chunk : nwork ;
for ( ; iwork < iwork_end ; ++iwork ) {
dst.value[0] += 1 ;
dst.value[1] += iwork + 1 ;
dst.value[2] += nwork - iwork ;
}
}
};
} // namespace Test
namespace {
template< typename ScalarType , class DeviceType, class ScheduleType >
class TestReduceTeam
{
public:
typedef DeviceType execution_space ;
typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
typedef typename execution_space::size_type size_type ;
//------------------------------------
TestReduceTeam( const size_type & nwork )
{
run_test(nwork);
}
void run_test( const size_type & nwork )
{
typedef Test::ReduceTeamFunctor< ScalarType , execution_space , ScheduleType> functor_type ;
typedef typename functor_type::value_type value_type ;
typedef Kokkos::View< value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type ;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
const unsigned long nw = nwork ;
const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
: (nw/2) * ( nw + 1 );
const unsigned team_size = policy_type::team_size_recommended( functor_type(nwork) );
const unsigned league_size = ( nwork + team_size - 1 ) / team_size ;
policy_type team_exec( league_size , team_size );
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
result_type tmp( & result[i] );
Kokkos::parallel_reduce( team_exec , functor_type(nwork) , tmp );
}
execution_space::fence();
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
for ( unsigned j = 0 ; j < Count ; ++j ) {
const unsigned long correct = 0 == j % 3 ? nw : nsum ;
ASSERT_EQ( (ScalarType) correct , result[i].value[j] );
}
}
}
};
}
/*--------------------------------------------------------------------------*/
namespace Test {
template< class DeviceType, class ScheduleType >
class ScanTeamFunctor
{
public:
typedef DeviceType execution_space ;
typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
typedef long int value_type ;
Kokkos::View< value_type , execution_space > accum ;
Kokkos::View< value_type , execution_space > total ;
ScanTeamFunctor() : accum("accum"), total("total") {}
KOKKOS_INLINE_FUNCTION
void init( value_type & error ) const { error = 0 ; }
KOKKOS_INLINE_FUNCTION
void join( value_type volatile & error ,
value_type volatile const & input ) const
{ if ( input ) error = 1 ; }
struct JoinMax {
typedef long int value_type ;
KOKKOS_INLINE_FUNCTION
void join( value_type volatile & dst
, value_type volatile const & input ) const
{ if ( dst < input ) dst = input ; }
};
KOKKOS_INLINE_FUNCTION
void operator()( const typename policy_type::member_type ind , value_type & error ) const
{
if ( 0 == ind.league_rank() && 0 == ind.team_rank() ) {
const long int thread_count = ind.league_size() * ind.team_size();
total() = ( thread_count * ( thread_count + 1 ) ) / 2 ;
}
// Team max:
const int long m = ind.team_reduce( (long int) ( ind.league_rank() + ind.team_rank() ) , JoinMax() );
if ( m != ind.league_rank() + ( ind.team_size() - 1 ) ) {
printf("ScanTeamFunctor[%d.%d of %d.%d] reduce_max_answer(%ld) != reduce_max(%ld)\n"
, ind.league_rank(), ind.team_rank()
, ind.league_size(), ind.team_size()
, (long int)(ind.league_rank() + ( ind.team_size() - 1 )) , m );
}
// Scan:
const long int answer =
( ind.league_rank() + 1 ) * ind.team_rank() +
( ind.team_rank() * ( ind.team_rank() + 1 ) ) / 2 ;
const long int result =
ind.team_scan( ind.league_rank() + 1 + ind.team_rank() + 1 );
const long int result2 =
ind.team_scan( ind.league_rank() + 1 + ind.team_rank() + 1 );
if ( answer != result || answer != result2 ) {
printf("ScanTeamFunctor[%d.%d of %d.%d] answer(%ld) != scan_first(%ld) or scan_second(%ld)\n",
ind.league_rank(), ind.team_rank(),
ind.league_size(), ind.team_size(),
answer,result,result2);
error = 1 ;
}
const long int thread_rank = ind.team_rank() +
ind.team_size() * ind.league_rank();
ind.team_scan( 1 + thread_rank , accum.ptr_on_device() );
}
};
template< class DeviceType, class ScheduleType >
class TestScanTeam
{
public:
typedef DeviceType execution_space ;
typedef long int value_type ;
typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
typedef Test::ScanTeamFunctor<DeviceType, ScheduleType> functor_type ;
//------------------------------------
TestScanTeam( const size_t nteam )
{
run_test(nteam);
}
void run_test( const size_t nteam )
{
typedef Kokkos::View< long int , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
-
const unsigned REPEAT = 100000 ;
- const unsigned Repeat = ( REPEAT + nteam - 1 ) / nteam ;
+ unsigned Repeat;
+ if ( nteam == 0 )
+ {
+ Repeat = 1;
+ } else {
+ Repeat = ( REPEAT + nteam - 1 ) / nteam ; // nteam > 0 in this branch; the guard above avoids a division by zero
+ }
functor_type functor ;
policy_type team_exec( nteam , policy_type::team_size_max( functor ) );
for ( unsigned i = 0 ; i < Repeat ; ++i ) {
long int accum = 0 ;
long int total = 0 ;
long int error = 0 ;
Kokkos::deep_copy( functor.accum , total );
Kokkos::parallel_reduce( team_exec , functor , result_type( & error ) );
DeviceType::fence();
Kokkos::deep_copy( accum , functor.accum );
Kokkos::deep_copy( total , functor.total );
ASSERT_EQ( error , 0 );
ASSERT_EQ( total , accum );
}
execution_space::fence();
}
};
} // namespace Test
/*--------------------------------------------------------------------------*/
namespace Test {
template< class ExecSpace, class ScheduleType >
struct SharedTeamFunctor {
typedef ExecSpace execution_space ;
typedef int value_type ;
typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
enum { SHARED_COUNT = 1000 };
typedef typename ExecSpace::scratch_memory_space shmem_space ;
// tbd: MemoryUnmanaged should be the default for shared memory space
typedef Kokkos::View<int*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
// Tell how much shared memory will be required by this functor:
inline
unsigned team_shmem_size( int team_size ) const
{
return shared_int_array_type::shmem_size( SHARED_COUNT ) +
shared_int_array_type::shmem_size( SHARED_COUNT );
}
KOKKOS_INLINE_FUNCTION
void operator()( const typename policy_type::member_type & ind , value_type & update ) const
{
const shared_int_array_type shared_A( ind.team_shmem() , SHARED_COUNT );
const shared_int_array_type shared_B( ind.team_shmem() , SHARED_COUNT );
if ((shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0) ||
(shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0)) {
printf ("Failed to allocate shared memory of size %lu\n",
static_cast<unsigned long> (SHARED_COUNT));
++update; // failure to allocate is an error
}
else {
for ( int i = ind.team_rank() ; i < SHARED_COUNT ; i += ind.team_size() ) {
shared_A[i] = i + ind.league_rank();
shared_B[i] = 2 * i + ind.league_rank();
}
ind.team_barrier();
if ( ind.team_rank() + 1 == ind.team_size() ) {
for ( int i = 0 ; i < SHARED_COUNT ; ++i ) {
if ( shared_A[i] != i + ind.league_rank() ) {
++update ;
}
if ( shared_B[i] != 2 * i + ind.league_rank() ) {
++update ;
}
}
}
}
}
};
}
namespace {
template< class ExecSpace, class ScheduleType >
struct TestSharedTeam {
TestSharedTeam()
{ run(); }
void run()
{
typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor ;
typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size );
typename Functor::value_type error_count = 0 ;
Kokkos::parallel_reduce( team_exec , Functor() , result_type( & error_count ) );
ASSERT_EQ( error_count , 0 );
}
};
}
namespace Test {
#if defined (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
template< class MemorySpace, class ExecSpace, class ScheduleType >
struct TestLambdaSharedTeam {
TestLambdaSharedTeam()
{ run(); }
void run()
{
typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor ;
//typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
typedef Kokkos::View< typename Functor::value_type , MemorySpace, Kokkos::MemoryUnmanaged > result_type ;
typedef typename ExecSpace::scratch_memory_space shmem_space ;
// tbd: MemoryUnmanaged should be the default for shared memory space
typedef Kokkos::View<int*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
const int SHARED_COUNT = 1000;
int team_size = 1;
#ifdef KOKKOS_HAVE_CUDA
if(std::is_same<ExecSpace,Kokkos::Cuda>::value)
team_size = 128;
#endif
Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size);
team_exec = team_exec.set_scratch_size(0,Kokkos::PerTeam(SHARED_COUNT*2*sizeof(int)));
typename Functor::value_type error_count = 0 ;
Kokkos::parallel_reduce( team_exec , KOKKOS_LAMBDA
( const typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type & ind , int & update ) {
const shared_int_array_type shared_A( ind.team_shmem() , SHARED_COUNT );
const shared_int_array_type shared_B( ind.team_shmem() , SHARED_COUNT );
if ((shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0) ||
(shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0)) {
printf ("Failed to allocate shared memory of size %lu\n",
static_cast<unsigned long> (SHARED_COUNT));
++update; // failure to allocate is an error
} else {
for ( int i = ind.team_rank() ; i < SHARED_COUNT ; i += ind.team_size() ) {
shared_A[i] = i + ind.league_rank();
shared_B[i] = 2 * i + ind.league_rank();
}
ind.team_barrier();
if ( ind.team_rank() + 1 == ind.team_size() ) {
for ( int i = 0 ; i < SHARED_COUNT ; ++i ) {
if ( shared_A[i] != i + ind.league_rank() ) {
++update ;
}
if ( shared_B[i] != 2 * i + ind.league_rank() ) {
++update ;
}
}
}
}
}, result_type( & error_count ) );
ASSERT_EQ( error_count , 0 );
}
};
#endif
}
namespace Test {
template< class ExecSpace, class ScheduleType >
struct ScratchTeamFunctor {
typedef ExecSpace execution_space ;
typedef int value_type ;
typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
enum { SHARED_TEAM_COUNT = 100 };
enum { SHARED_THREAD_COUNT = 10 };
typedef typename ExecSpace::scratch_memory_space shmem_space ;
// tbd: MemoryUnmanaged should be the default for shared memory space
typedef Kokkos::View<size_t*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
KOKKOS_INLINE_FUNCTION
void operator()( const typename policy_type::member_type & ind , value_type & update ) const
{
- const shared_int_array_type scratch_ptr( ind.team_scratch(1) , 2*ind.team_size() );
+ const shared_int_array_type scratch_ptr( ind.team_scratch(1) , 3*ind.team_size() );
const shared_int_array_type scratch_A( ind.team_scratch(1) , SHARED_TEAM_COUNT );
const shared_int_array_type scratch_B( ind.thread_scratch(1) , SHARED_THREAD_COUNT );
if ((scratch_ptr.ptr_on_device () == NULL ) ||
(scratch_A. ptr_on_device () == NULL && SHARED_TEAM_COUNT > 0) ||
(scratch_B. ptr_on_device () == NULL && SHARED_THREAD_COUNT > 0)) {
printf ("Failed to allocate shared memory of size %lu\n",
static_cast<unsigned long> (SHARED_TEAM_COUNT));
++update; // failure to allocate is an error
}
else {
Kokkos::parallel_for(Kokkos::TeamThreadRange(ind,0,(int)SHARED_TEAM_COUNT),[&] (const int &i) {
scratch_A[i] = i + ind.league_rank();
});
for(int i=0; i<SHARED_THREAD_COUNT; i++)
scratch_B[i] = 10000*ind.league_rank() + 100*ind.team_rank() + i;
scratch_ptr[ind.team_rank()] = (size_t) scratch_A.ptr_on_device();
scratch_ptr[ind.team_rank() + ind.team_size()] = (size_t) scratch_B.ptr_on_device();
ind.team_barrier();
for( int i = 0; i<SHARED_TEAM_COUNT; i++) {
if(scratch_A[i] != size_t(i + ind.league_rank()))
++update;
}
for( int i = 0; i < ind.team_size(); i++) {
if(scratch_ptr[0]!=scratch_ptr[i]) ++update;
}
if(scratch_ptr[1+ind.team_size()] - scratch_ptr[0 + ind.team_size()] <
SHARED_THREAD_COUNT*sizeof(size_t))
++update;
for( int i = 1; i < ind.team_size(); i++) {
if((scratch_ptr[i+ind.team_size()] - scratch_ptr[i-1+ind.team_size()]) !=
(scratch_ptr[1+ind.team_size()] - scratch_ptr[0 + ind.team_size()])) ++update;
}
}
}
};
}
namespace {
template< class ExecSpace, class ScheduleType >
struct TestScratchTeam {
TestScratchTeam()
{ run(); }
void run()
{
typedef Test::ScratchTeamFunctor<ExecSpace, ScheduleType> Functor ;
typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size );
typename Functor::value_type error_count = 0 ;
int team_scratch_size = Functor::shared_int_array_type::shmem_size(Functor::SHARED_TEAM_COUNT) +
- Functor::shared_int_array_type::shmem_size(2*team_size);
+ Functor::shared_int_array_type::shmem_size(3*team_size);
int thread_scratch_size = Functor::shared_int_array_type::shmem_size(Functor::SHARED_THREAD_COUNT);
Kokkos::parallel_reduce( team_exec.set_scratch_size(0,Kokkos::PerTeam(team_scratch_size),
Kokkos::PerThread(thread_scratch_size)) ,
Functor() , result_type( & error_count ) );
ASSERT_EQ( error_count , 0 );
}
};
}
namespace Test {
template< class ExecSpace>
KOKKOS_INLINE_FUNCTION
int test_team_mulit_level_scratch_loop_body(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team) {
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team1(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread1(team.thread_scratch(0),16);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team2(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread2(team.thread_scratch(0),16);
-
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team1(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread1(team.thread_scratch(1),16000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team2(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread2(team.thread_scratch(1),16000);
-
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team3(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread3(team.thread_scratch(0),16);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team3(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread3(team.thread_scratch(1),16000);
-
-
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128), [&] (const int& i) {
- a_team1(i) = 1000000 + i;
- a_team2(i) = 2000000 + i;
- a_team3(i) = 3000000 + i;
- });
- team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i){
- a_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
- a_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
- a_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
- });
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team1(team.team_scratch(0),128);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread1(team.thread_scratch(0),16);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team2(team.team_scratch(0),128);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread2(team.thread_scratch(0),16);
+
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team1(team.team_scratch(1),128000);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread1(team.thread_scratch(1),16000);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team2(team.team_scratch(1),128000);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread2(team.thread_scratch(1),16000);
+
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team3(team.team_scratch(0),128);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread3(team.thread_scratch(0),16);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team3(team.team_scratch(1),128000);
+ Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread3(team.thread_scratch(1),16000);
+
+ // The explicit types for 0 and 128 are here to test TeamThreadRange accepting different
+ // types for begin and end.
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,int(0),unsigned(128)), [&] (const int& i)
+ {
+ a_team1(i) = 1000000 + i;
+ a_team2(i) = 2000000 + i;
+ a_team3(i) = 3000000 + i;
+ });
+ team.team_barrier();
+ Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i)
+ {
+ a_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
+ a_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
+ a_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
+ });
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i) {
- b_team1(i) = 1000000 + i;
- b_team2(i) = 2000000 + i;
- b_team3(i) = 3000000 + i;
- });
- team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i){
- b_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
- b_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
- b_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
- });
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i)
+ {
+ b_team1(i) = 1000000 + i;
+ b_team2(i) = 2000000 + i;
+ b_team3(i) = 3000000 + i;
+ });
+ team.team_barrier();
+ Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i)
+ {
+ b_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
+ b_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
+ b_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
+ });
- team.team_barrier();
- int error = 0;
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128), [&] (const int& i) {
- if(a_team1(i) != 1000000 + i) error++;
- if(a_team2(i) != 2000000 + i) error++;
- if(a_team3(i) != 3000000 + i) error++;
- });
- team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i){
- if(a_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
- if(a_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
- if(a_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
- });
+ team.team_barrier();
+ int error = 0;
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128), [&] (const int& i)
+ {
+ if(a_team1(i) != 1000000 + i) error++;
+ if(a_team2(i) != 2000000 + i) error++;
+ if(a_team3(i) != 3000000 + i) error++;
+ });
+ team.team_barrier();
+ Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i)
+ {
+ if(a_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
+ if(a_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
+ if(a_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
+ });
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i) {
- if(b_team1(i) != 1000000 + i) error++;
- if(b_team2(i) != 2000000 + i) error++;
- if(b_team3(i) != 3000000 + i) error++;
- });
- team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i){
- if(b_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
- if(b_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
- if( b_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
- });
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i)
+ {
+ if(b_team1(i) != 1000000 + i) error++;
+ if(b_team2(i) != 2000000 + i) error++;
+ if(b_team3(i) != 3000000 + i) error++;
+ });
+ team.team_barrier();
+ Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i)
+ {
+ if(b_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
+ if(b_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
+ if( b_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
+ });
return error;
}
-
struct TagReduce {};
struct TagFor {};
template< class ExecSpace, class ScheduleType >
struct ClassNoShmemSizeFunction {
Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
void operator() (const TagFor&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team) const {
int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
errors() += error;
}
KOKKOS_INLINE_FUNCTION
void operator() (const TagReduce&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team, int& error) const {
error += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
}
void run() {
Kokkos::View<int,ExecSpace> d_errors = Kokkos::View<int,ExecSpace>("Errors");
errors = d_errors;
const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
{
Kokkos::TeamPolicy<TagFor,ExecSpace,ScheduleType> policy(10,8,16);
Kokkos::parallel_for(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
*this);
Kokkos::fence();
typename Kokkos::View<int,ExecSpace>::HostMirror h_errors = Kokkos::create_mirror_view(d_errors);
Kokkos::deep_copy(h_errors,d_errors);
ASSERT_EQ(h_errors(),0);
}
{
int error = 0;
Kokkos::TeamPolicy<TagReduce,ExecSpace,ScheduleType> policy(10,8,16);
Kokkos::parallel_reduce(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
*this,error);
Kokkos::fence();
ASSERT_EQ(error,0);
}
};
};
template< class ExecSpace, class ScheduleType >
struct ClassWithShmemSizeFunction {
Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
void operator() (const TagFor&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team) const {
int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
errors() += error;
}
KOKKOS_INLINE_FUNCTION
void operator() (const TagReduce&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team, int& error) const {
error += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
}
void run() {
Kokkos::View<int,ExecSpace> d_errors = Kokkos::View<int,ExecSpace>("Errors");
errors = d_errors;
const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
{
Kokkos::TeamPolicy<TagFor,ExecSpace,ScheduleType> policy(10,8,16);
Kokkos::parallel_for(policy.set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
*this);
Kokkos::fence();
typename Kokkos::View<int,ExecSpace>::HostMirror h_errors= Kokkos::create_mirror_view(d_errors);
Kokkos::deep_copy(h_errors,d_errors);
ASSERT_EQ(h_errors(),0);
}
{
int error = 0;
Kokkos::TeamPolicy<TagReduce,ExecSpace,ScheduleType> policy(10,8,16);
Kokkos::parallel_reduce(policy.set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
*this,error);
Kokkos::fence();
ASSERT_EQ(error,0);
}
};
unsigned team_shmem_size(int team_size) const {
const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
return per_team0 + team_size * per_thread0;
}
};
template< class ExecSpace, class ScheduleType >
void test_team_mulit_level_scratch_test_lambda() {
#ifdef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
Kokkos::View<int,ExecSpace> d_errors("Errors");
errors = d_errors;
const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
Kokkos::TeamPolicy<ExecSpace,ScheduleType> policy(10,8,16);
Kokkos::parallel_for(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team) {
int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
errors() += error;
});
Kokkos::fence();
typename Kokkos::View<int,ExecSpace>::HostMirror h_errors= Kokkos::create_mirror_view(errors);
Kokkos::deep_copy(h_errors,d_errors);
ASSERT_EQ(h_errors(),0);
int error = 0;
Kokkos::parallel_reduce(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team, int& count) {
count += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
},error);
ASSERT_EQ(error,0);
Kokkos::fence();
#endif
}
}
namespace {
template< class ExecSpace, class ScheduleType >
struct TestMultiLevelScratchTeam {
TestMultiLevelScratchTeam()
{ run(); }
void run()
{
#ifdef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
Test::test_team_mulit_level_scratch_test_lambda<ExecSpace, ScheduleType>();
#endif
Test::ClassNoShmemSizeFunction<ExecSpace, ScheduleType> c1;
c1.run();
Test::ClassWithShmemSizeFunction<ExecSpace, ScheduleType> c2;
c2.run();
}
};
}
namespace Test {
template< class ExecSpace >
struct TestShmemSize {
TestShmemSize() { run(); }
void run()
{
typedef Kokkos::View< long***, ExecSpace > view_type;
size_t d1 = 5;
size_t d2 = 6;
size_t d3 = 7;
size_t size = view_type::shmem_size( d1, d2, d3 );
ASSERT_EQ( size, d1 * d2 * d3 * sizeof(long) );
}
};
}
/*--------------------------------------------------------------------------*/
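The scratch-memory tests above all follow the same recipe: compute the per-team and per-thread byte counts with shmem_size, hand them to the TeamPolicy through set_scratch_size, and then carve unmanaged Views out of team_scratch()/thread_scratch() inside the functor. A minimal sketch of that recipe follows; the functor name ScratchUser, the league size of 10, and the element counts 128 and 16 are illustrative values, not taken from the tests.

#include <Kokkos_Core.hpp>

// Sketch only: per-team and per-thread level-0 scratch, sized up front on the policy.
template< class ExecSpace >
struct ScratchUser {
  typedef Kokkos::TeamPolicy< ExecSpace >                                policy_type ;
  typedef typename ExecSpace::scratch_memory_space                       shmem_space ;
  typedef Kokkos::View< double*, shmem_space, Kokkos::MemoryUnmanaged >  scratch_view ;

  KOKKOS_INLINE_FUNCTION
  void operator()( const typename policy_type::member_type & team ) const
  {
    scratch_view team_buf  ( team.team_scratch(0)   , 128 );  // shared by the whole team
    scratch_view thread_buf( team.thread_scratch(0) , 16  );  // private to each thread

    Kokkos::parallel_for( Kokkos::TeamThreadRange(team,0,128) ,
      [&]( const int i ) { team_buf(i) = i + team.league_rank(); } );
    Kokkos::parallel_for( Kokkos::ThreadVectorRange(team,16) ,
      [&]( const int i ) { thread_buf(i) = i; } );
    team.team_barrier();
  }

  static void run()
  {
    const int per_team   = scratch_view::shmem_size( 128 );
    const int per_thread = scratch_view::shmem_size( 16 );

    policy_type policy( 10 , Kokkos::AUTO() );
    Kokkos::parallel_for(
      policy.set_scratch_size( 0 , Kokkos::PerTeam(per_team) , Kokkos::PerThread(per_thread) ) ,
      ScratchUser() );
    Kokkos::fence();
  }
};

Declaring the byte budget on the policy up front is the point of the shmem_size arithmetic in TestScratchTeam and TestMultiLevelScratchTeam: team_scratch()/thread_scratch() can only hand out what was reserved, and the functors above treat a NULL pointer from an undersized reservation as a test failure.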
diff --git a/lib/kokkos/core/unit_test/TestTeamVector.hpp b/lib/kokkos/core/unit_test/TestTeamVector.hpp
index 48187f036..d9b06c29e 100644
--- a/lib/kokkos/core/unit_test/TestTeamVector.hpp
+++ b/lib/kokkos/core/unit_test/TestTeamVector.hpp
@@ -1,646 +1,673 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <iostream>
#include <cstdlib>
namespace TestTeamVector {
struct my_complex {
double re,im;
int dummy;
KOKKOS_INLINE_FUNCTION
my_complex() {
re = 0.0;
im = 0.0;
dummy = 0;
}
KOKKOS_INLINE_FUNCTION
my_complex(const my_complex& src) {
re = src.re;
im = src.im;
dummy = src.dummy;
}
KOKKOS_INLINE_FUNCTION
my_complex(const volatile my_complex& src) {
re = src.re;
im = src.im;
dummy = src.dummy;
}
KOKKOS_INLINE_FUNCTION
my_complex(const double& val) {
re = val;
im = 0.0;
dummy = 0;
}
KOKKOS_INLINE_FUNCTION
my_complex& operator += (const my_complex& src) {
re += src.re;
im += src.im;
dummy += src.dummy;
return *this;
}
KOKKOS_INLINE_FUNCTION
void operator += (const volatile my_complex& src) volatile {
re += src.re;
im += src.im;
dummy += src.dummy;
}
KOKKOS_INLINE_FUNCTION
my_complex& operator *= (const my_complex& src) {
double re_tmp = re*src.re - im*src.im;
double im_tmp = re * src.im + im * src.re;
re = re_tmp;
im = im_tmp;
dummy *= src.dummy;
return *this;
}
KOKKOS_INLINE_FUNCTION
void operator *= (const volatile my_complex& src) volatile {
double re_tmp = re*src.re - im*src.im;
double im_tmp = re * src.im + im * src.re;
re = re_tmp;
im = im_tmp;
dummy *= src.dummy;
}
KOKKOS_INLINE_FUNCTION
bool operator == (const my_complex& src) {
return (re == src.re) && (im == src.im) && ( dummy == src.dummy );
}
KOKKOS_INLINE_FUNCTION
bool operator != (const my_complex& src) {
return (re != src.re) || (im != src.im) || ( dummy != src.dummy );
}
KOKKOS_INLINE_FUNCTION
bool operator != (const double& val) {
return (re != val) ||
(im != 0) || (dummy != 0);
}
KOKKOS_INLINE_FUNCTION
my_complex& operator= (const int& val) {
re = val;
im = 0.0;
dummy = 0;
return *this;
}
KOKKOS_INLINE_FUNCTION
my_complex& operator= (const double& val) {
re = val;
im = 0.0;
dummy = 0;
return *this;
}
KOKKOS_INLINE_FUNCTION
operator double() {
return re;
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_team_for {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_team_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
typedef typename shared_int::size_type size_type;
const size_type shmemSize = team.team_size () * 13;
shared_int values = shared_int (team.team_shmem (), shmemSize);
if (values.ptr_on_device () == NULL || values.dimension_0 () < shmemSize) {
printf ("FAILED to allocate shared memory of size %u\n",
static_cast<unsigned int> (shmemSize));
}
else {
// Initialize shared memory
values(team.team_rank ()) = 0;
// Accumulate value into per thread shared memory
// This is non blocking
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i) {
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i)
+ {
values(team.team_rank ()) += i - team.league_rank () + team.league_size () + team.team_size ();
});
// Wait for all memory to be written
team.team_barrier ();
// One thread per team executes the comparison
- Kokkos::single(Kokkos::PerTeam(team),[&]() {
+ Kokkos::single(Kokkos::PerTeam(team),[&]()
+ {
Scalar test = 0;
Scalar value = 0;
for (int i = 0; i < 131; ++i) {
test += i - team.league_rank () + team.league_size () + team.team_size ();
}
for (int i = 0; i < team.team_size (); ++i) {
value += values(i);
}
if (test != value) {
printf ("FAILED team_parallel_for %i %i %f %f\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value));
flag() = 1;
}
});
}
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_team_reduce {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_team_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
Scalar value = Scalar();
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val) {
+ Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val)
+ {
val += i - team.league_rank () + team.league_size () + team.team_size ();
},value);
team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]() {
+ Kokkos::single(Kokkos::PerTeam(team),[&]()
+ {
Scalar test = 0;
for (int i = 0; i < 131; ++i) {
test += i - team.league_rank () + team.league_size () + team.team_size ();
}
if (test != value) {
if(team.league_rank() == 0)
printf ("FAILED team_parallel_reduce %i %i %f %f %lu\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value),sizeof(Scalar));
flag() = 1;
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_team_reduce_join {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_team_reduce_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
Scalar value = 0;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131)
- , [&] (int i, Scalar& val) {
+ , [&] (int i, Scalar& val)
+ {
val += i - team.league_rank () + team.league_size () + team.team_size ();
}
- , [&] (volatile Scalar& val, const volatile Scalar& src) {val+=src;}
+ , [&] (volatile Scalar& val, const volatile Scalar& src)
+ {val+=src;}
, value
);
team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]() {
+ Kokkos::single(Kokkos::PerTeam(team),[&]()
+ {
Scalar test = 0;
for (int i = 0; i < 131; ++i) {
test += i - team.league_rank () + team.league_size () + team.team_size ();
}
if (test != value) {
printf ("FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value));
flag() = 1;
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_team_vector_for {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_team_vector_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
typedef typename shared_int::size_type size_type;
const size_type shmemSize = team.team_size () * 13;
shared_int values = shared_int (team.team_shmem (), shmemSize);
if (values.ptr_on_device () == NULL || values.dimension_0 () < shmemSize) {
printf ("FAILED to allocate shared memory of size %u\n",
static_cast<unsigned int> (shmemSize));
}
else {
- Kokkos::single(Kokkos::PerThread(team),[&] () {
+ Kokkos::single(Kokkos::PerThread(team),[&] ()
+ {
values(team.team_rank ()) = 0;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i) {
- Kokkos::single(Kokkos::PerThread(team),[&] () {
+ Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i)
+ {
+ Kokkos::single(Kokkos::PerThread(team),[&] ()
+ {
values(team.team_rank ()) += i - team.league_rank () + team.league_size () + team.team_size ();
});
});
team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]() {
+ Kokkos::single(Kokkos::PerTeam(team),[&]()
+ {
Scalar test = 0;
Scalar value = 0;
for (int i = 0; i < 131; ++i) {
test += i - team.league_rank () + team.league_size () + team.team_size ();
}
for (int i = 0; i < team.team_size (); ++i) {
value += values(i);
}
if (test != value) {
printf ("FAILED team_vector_parallel_for %i %i %f %f\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value));
flag() = 1;
}
});
}
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_team_vector_reduce {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_team_vector_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
Scalar value = Scalar();
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val) {
+ Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val)
+ {
val += i - team.league_rank () + team.league_size () + team.team_size ();
},value);
team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]() {
+ Kokkos::single(Kokkos::PerTeam(team),[&]()
+ {
Scalar test = 0;
for (int i = 0; i < 131; ++i) {
test += i - team.league_rank () + team.league_size () + team.team_size ();
}
if (test != value) {
if(team.league_rank() == 0)
printf ("FAILED team_vector_parallel_reduce %i %i %f %f %lu\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value),sizeof(Scalar));
flag() = 1;
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_team_vector_reduce_join {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_team_vector_reduce_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
Scalar value = 0;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131)
- , [&] (int i, Scalar& val) {
+ , [&] (int i, Scalar& val)
+ {
val += i - team.league_rank () + team.league_size () + team.team_size ();
}
- , [&] (volatile Scalar& val, const volatile Scalar& src) {val+=src;}
+ , [&] (volatile Scalar& val, const volatile Scalar& src)
+ {val+=src;}
, value
);
team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]() {
+ Kokkos::single(Kokkos::PerTeam(team),[&]()
+ {
Scalar test = 0;
for (int i = 0; i < 131; ++i) {
test += i - team.league_rank () + team.league_size () + team.team_size ();
}
if (test != value) {
printf ("FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value));
flag() = 1;
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_vec_single {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_vec_single(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
// Warning: this test case intentionally violates permissible semantics
// It is not valid to get references to members of the enclosing region
// inside a parallel_for and write to them.
Scalar value = 0;
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13),[&] (int i) {
+ Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13),[&] (int i)
+ {
value = i; // This write is violating Kokkos semantics for nested parallelism
});
- Kokkos::single(Kokkos::PerThread(team),[&] (Scalar& val) {
+ Kokkos::single(Kokkos::PerThread(team),[&] (Scalar& val)
+ {
val = 1;
},value);
Scalar value2 = 0;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13), [&] (int i, Scalar& val) {
+ Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13), [&] (int i, Scalar& val)
+ {
val += value;
},value2);
if(value2!=(value*13)) {
printf("FAILED vector_single broadcast %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) value2,(double) value);
flag()=1;
}
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_vec_for {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_vec_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
shared_int values = shared_int(team.team_shmem(),team.team_size()*13);
if (values.ptr_on_device () == NULL ||
values.dimension_0() < (unsigned) team.team_size() * 13) {
printf ("FAILED to allocate memory of size %i\n",
static_cast<int> (team.team_size () * 13));
flag() = 1;
}
else {
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13), [&] (int i) {
+ Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13), [&] (int i)
+ {
values(13*team.team_rank() + i) = i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
});
- Kokkos::single(Kokkos::PerThread(team),[&] () {
+ Kokkos::single(Kokkos::PerThread(team),[&] ()
+ {
Scalar test = 0;
Scalar value = 0;
for (int i = 0; i < 13; ++i) {
test += i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
value += values(13*team.team_rank() + i);
}
if (test != value) {
printf ("FAILED vector_par_for %i %i %f %f\n",
team.league_rank (), team.team_rank (),
static_cast<double> (test), static_cast<double> (value));
flag() = 1;
}
});
}
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_vec_red {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_vec_red(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val) {
+ Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val)
+ {
val += i;
}, value);
- Kokkos::single(Kokkos::PerThread(team),[&] () {
+ Kokkos::single(Kokkos::PerThread(team),[&] ()
+ {
Scalar test = 0;
for(int i = 0; i < 13; i++) {
test+=i;
}
if(test!=value) {
printf("FAILED vector_par_reduce %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) value);
flag()=1;
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_vec_red_join {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_vec_red_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
Scalar value = 1;
Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13)
- , [&] (int i, Scalar& val) { val *= i; }
- , [&] (Scalar& val, const Scalar& src) {val*=src;}
+ , [&] (int i, Scalar& val)
+ { val *= i; }
+ , [&] (Scalar& val, const Scalar& src)
+ {val*=src;}
, value
);
- Kokkos::single(Kokkos::PerThread(team),[&] () {
+ Kokkos::single(Kokkos::PerThread(team),[&] ()
+ {
Scalar test = 1;
for(int i = 0; i < 13; i++) {
test*=i;
}
if(test!=value) {
printf("FAILED vector_par_reduce_join %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) value);
flag()=1;
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_vec_scan {
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_vec_scan(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team) const {
- Kokkos::parallel_scan(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val, bool final) {
+ Kokkos::parallel_scan(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val, bool final)
+ {
val += i;
if(final) {
Scalar test = 0;
for(int k = 0; k <= i; k++) {
test+=k;
}
if(test!=val) {
printf("FAILED vector_par_scan %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) val);
flag()=1;
}
}
});
}
};
template<typename Scalar, class ExecutionSpace>
struct functor_reduce {
typedef double value_type;
typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
typedef ExecutionSpace execution_space;
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
functor_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
KOKKOS_INLINE_FUNCTION
void operator() (typename policy_type::member_type team, double& sum) const {
sum += team.league_rank() * 100 + team.thread_rank();
}
};
template<typename Scalar,class ExecutionSpace>
bool test_scalar(int nteams, int team_size, int test) {
Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> d_flag("flag");
typename Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace>::HostMirror h_flag("h_flag");
h_flag() = 0 ;
Kokkos::deep_copy(d_flag,h_flag);
if(test==0)
Kokkos::parallel_for( std::string("A") , Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_vec_red<Scalar, ExecutionSpace>(d_flag));
if(test==1)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_vec_red_join<Scalar, ExecutionSpace>(d_flag));
if(test==2)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_vec_scan<Scalar, ExecutionSpace>(d_flag));
if(test==3)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_vec_for<Scalar, ExecutionSpace>(d_flag));
if(test==4)
Kokkos::parallel_for( "B" , Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_vec_single<Scalar, ExecutionSpace>(d_flag));
if(test==5)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
functor_team_for<Scalar, ExecutionSpace>(d_flag));
if(test==6)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
functor_team_reduce<Scalar, ExecutionSpace>(d_flag));
if(test==7)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
functor_team_reduce_join<Scalar, ExecutionSpace>(d_flag));
if(test==8)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_team_vector_for<Scalar, ExecutionSpace>(d_flag));
if(test==9)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_team_vector_reduce<Scalar, ExecutionSpace>(d_flag));
if(test==10)
Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
functor_team_vector_reduce_join<Scalar, ExecutionSpace>(d_flag));
Kokkos::deep_copy(h_flag,d_flag);
return (h_flag() == 0);
}
template<class ExecutionSpace>
bool Test(int test) {
bool passed = true;
passed = passed && test_scalar<int, ExecutionSpace>(317,33,test);
passed = passed && test_scalar<long long int, ExecutionSpace>(317,33,test);
passed = passed && test_scalar<float, ExecutionSpace>(317,33,test);
passed = passed && test_scalar<double, ExecutionSpace>(317,33,test);
passed = passed && test_scalar<my_complex, ExecutionSpace>(317,33,test);
return passed;
}
}
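The functors above exercise Kokkos' three-level parallel hierarchy: a league of teams (TeamPolicy), threads within a team (TeamThreadRange), and vector lanes within a thread (ThreadVectorRange), with Kokkos::single restricting work to a single lane or a single thread. A minimal sketch of how those constructs compose, using only calls that appear in the tests above; the functor name, loop bounds, and the commented-out dispatch are illustrative and not part of the patch:

#include <Kokkos_Core.hpp>

template<class ExecutionSpace>
struct hierarchy_sketch {                      // illustrative name
  typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;

  KOKKOS_INLINE_FUNCTION
  void operator() (typename policy_type::member_type team) const {
    // Thread level: 131 iterations divided among the threads of this team.
    Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i) {
      double partial = 0;
      // Vector level: 13 iterations divided among this thread's vector lanes.
      Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13),
        [&] (int j, double& val) { val += i + j; }, partial);
      // Exactly one vector lane of this thread executes the body.
      Kokkos::single(Kokkos::PerThread(team),[&] () { (void) partial; });
    });
    // All threads of the team synchronize before the next phase.
    team.team_barrier();
  }
};

// Dispatch mirroring test_scalar() above: 317 teams, 33 threads per team,
// vector length 8.
// Kokkos::parallel_for(Kokkos::TeamPolicy<ExecutionSpace>(317,33,8),
//                      hierarchy_sketch<ExecutionSpace>());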
diff --git a/lib/kokkos/core/unit_test/TestThreads.cpp b/lib/kokkos/core/unit_test/TestThreads.cpp
deleted file mode 100644
index 93049b95d..000000000
--- a/lib/kokkos/core/unit_test/TestThreads.cpp
+++ /dev/null
@@ -1,614 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-
-#if defined( KOKKOS_HAVE_PTHREAD )
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-#include <Threads/Kokkos_Threads_TaskPolicy.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestViewImpl.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-#include <TestAtomic.hpp>
-#include <TestAtomicOperations.hpp>
-
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestTaskPolicy.hpp>
-#include <TestMemoryPool.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-
-#include <TestPolicyConstruction.hpp>
-
-#include <TestMDRange.hpp>
-
-namespace Test {
-
-class threads : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- // Finalize without initialize is a no-op:
- Kokkos::Threads::finalize();
-
- const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
- const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-
- unsigned threads_count = 0 ;
-
- // Initialize and finalize with no threads:
- Kokkos::Threads::initialize( 1u );
- Kokkos::Threads::finalize();
-
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
-
- Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::finalize();
-
- threads_count = std::max( 1u , numa_count * 2 )
- * std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
-
- Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::finalize();
-
- // Quick attempt to verify thread start/terminate don't have race condition:
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
- for ( unsigned i = 0 ; i < 10 ; ++i ) {
- Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::sleep();
- Kokkos::Threads::wake();
- Kokkos::Threads::finalize();
- }
-
- Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::print_configuration( std::cout , true /* detailed */ );
- }
-
- static void TearDownTestCase()
- {
- Kokkos::Threads::finalize();
- }
-};
-
-TEST_F( threads , init ) {
- ;
-}
-
-TEST_F( threads , md_range ) {
- TestMDRange_2D< Kokkos::Threads >::test_for2(100,100);
-
- TestMDRange_3D< Kokkos::Threads >::test_for3(100,100,100);
-}
-
-TEST_F( threads , dispatch )
-{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
-}
-
-TEST_F( threads , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::Threads >();
-}
-
-TEST_F( threads, policy_construction) {
- TestRangePolicyConstruction< Kokkos::Threads >();
- TestTeamPolicyConstruction< Kokkos::Threads >();
-}
-
-TEST_F( threads , impl_view_mapping ) {
- test_view_mapping< Kokkos::Threads >();
- test_view_mapping_subview< Kokkos::Threads >();
- test_view_mapping_operator< Kokkos::Threads >();
- TestViewMappingAtomic< Kokkos::Threads >::run();
-}
-
-
-TEST_F( threads, view_impl) {
- test_view_impl< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_api) {
- TestViewAPI< double , Kokkos::Threads >();
-}
-
-TEST_F( threads , view_nested_view )
-{
- ::Test::view_nested_view< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::Threads >();
-}
-
-TEST_F( threads, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::Threads >();
-}
-
-
-TEST_F( threads, view_aggregate ) {
- TestViewAggregate< Kokkos::Threads >();
- TestViewAggregateReduction< Kokkos::Threads >();
-}
-
-TEST_F( threads , range_tag )
-{
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
-}
-
-TEST_F( threads , team_tag )
-{
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
-}
-
-TEST_F( threads, long_reduce) {
- TestReduce< long , Kokkos::Threads >( 1000000 );
-}
-
-TEST_F( threads, double_reduce) {
- TestReduce< double , Kokkos::Threads >( 1000000 );
-}
-
-TEST_F( threads , reducers )
-{
- TestReducers<int, Kokkos::Threads>::execute_integer();
- TestReducers<size_t, Kokkos::Threads>::execute_integer();
- TestReducers<double, Kokkos::Threads>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Threads>::execute_basic();
-}
-
-TEST_F( threads, team_long_reduce) {
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( threads, team_double_reduce) {
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( threads, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Threads >( 1000000 );
-}
-
-TEST_F( threads, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Threads >( 1000000 );
-}
-
-TEST_F( threads, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Threads >( 1000000 );
-}
-
-TEST_F( threads, team_shared_request) {
- TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-
-#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
-TEST_F( threads, team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-#endif
-
-TEST_F( threads, shmem_size) {
- TestShmemSize< Kokkos::Threads >();
-}
-
-TEST_F( threads , view_remap )
-{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Threads > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Threads > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Threads > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( threads , atomics )
-{
- const int loop_count = 1e6 ;
-
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<3>, Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<3>, Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<3>, Kokkos::Threads>(loop_count,3) ) );
-}
-
-TEST_F( threads , atomic_operations )
-{
- const int start = 1; //Avoid zero for division
- const int end = 11;
- for (int i = start; i < end; ++i)
- {
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 9 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 4 ) ) );
- }
-
-}
-
-//----------------------------------------------------------------------------
-
-#if 0
-TEST_F( threads , scan_small )
-{
- typedef TestScan< Kokkos::Threads , Kokkos::Impl::ThreadsExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
- TestScanFunctor( 10 );
- TestScanFunctor( 10000 );
- }
- TestScanFunctor( 1000000 );
- TestScanFunctor( 10000000 );
-
- Kokkos::Threads::fence();
-}
-#endif
-
-TEST_F( threads , scan )
-{
- TestScan< Kokkos::Threads >::test_range( 1 , 1000 );
- TestScan< Kokkos::Threads >( 1000000 );
- TestScan< Kokkos::Threads >( 10000000 );
- Kokkos::Threads::fence();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( threads , team_scan )
-{
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( threads , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Threads >() ) );
-}
-
-TEST_F( threads , memory_space )
-{
- TestMemorySpace< Kokkos::Threads >();
-}
-
-TEST_F( threads , memory_pool )
-{
- bool val = TestMemoryPool::test_mempool< Kokkos::Threads >( 128, 128000000 );
- ASSERT_TRUE( val );
-
- TestMemoryPool::test_mempool2< Kokkos::Threads >( 64, 4, 1000000, 2000000 );
-
- TestMemoryPool::test_memory_exhaustion< Kokkos::Threads >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( threads , template_meta_functions )
-{
- TestTemplateMetaFunctions<int, Kokkos::Threads >();
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
-TEST_F( threads , cxx11 )
-{
- if ( Kokkos::Impl::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Threads >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(4) ) );
- }
-}
-
-TEST_F( threads , reduction_deduction )
-{
- TestCXX11::test_reduction_deduction< Kokkos::Threads >();
-}
-#endif /* #if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS ) */
-
-TEST_F( threads , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(10) ) );
-}
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-TEST_F( threads , task_policy )
-{
- TestTaskPolicy::test_task_dep< Kokkos::Threads >( 10 );
-
- for ( long i = 0 ; i < 25 ; ++i ) {
-// printf( "test_fib(): %2ld\n", i );
- TestTaskPolicy::test_fib< Kokkos::Threads >(i);
- }
- for ( long i = 0 ; i < 35 ; ++i ) {
-// printf( "test_fib2(): %2ld\n", i );
- TestTaskPolicy::test_fib2< Kokkos::Threads >(i);
- }
-}
-
-TEST_F( threads , task_team )
-{
- TestTaskPolicy::test_task_team< Kokkos::Threads >(1000);
-}
-
-TEST_F( threads , task_latch )
-{
- TestTaskPolicy::test_latch< Kokkos::Threads >(10);
- TestTaskPolicy::test_latch< Kokkos::Threads >(1000);
-}
-
-#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
-
-} // namespace Test
-
-#endif /* #if defined( KOKKOS_HAVE_PTHREAD ) */
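The hunk above deletes the monolithic gtest driver for the Kokkos::Threads backend. For orientation, a minimal sketch of the fixture pattern that file used, assuming the same 2017-era static Kokkos::Threads::initialize()/finalize() API and KOKKOS_LAMBDA dispatch visible in the deleted code; the fixture name, thread count, and range bound are illustrative:

#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>

class threads_fixture : public ::testing::Test {   // illustrative name
protected:
  // Bring the Threads backend up once for the whole test case ...
  static void SetUpTestCase()    { Kokkos::Threads::initialize( 4 ); }
  // ... and tear it down after the last test has run.
  static void TearDownTestCase() { Kokkos::Threads::finalize(); }
};

TEST_F( threads_fixture , dispatch_sketch )
{
  // A trivial dispatch, analogous to the deleted "dispatch" test above.
  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >(0,10)
                      , KOKKOS_LAMBDA( int ) {} );
}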
diff --git a/lib/kokkos/core/unit_test/TestTile.hpp b/lib/kokkos/core/unit_test/TestTile.hpp
index dfb2bd81b..842131deb 100644
--- a/lib/kokkos/core/unit_test/TestTile.hpp
+++ b/lib/kokkos/core/unit_test/TestTile.hpp
@@ -1,153 +1,154 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
#ifndef TEST_TILE_HPP
#define TEST_TILE_HPP
#include <Kokkos_Core.hpp>
+#include <impl/Kokkos_ViewTile.hpp>
namespace TestTile {
template < typename Device , typename TileLayout>
struct ReduceTileErrors
{
typedef Device execution_space ;
typedef Kokkos::View< ptrdiff_t**, TileLayout, Device> array_type;
typedef Kokkos::View< ptrdiff_t[ TileLayout::N0 ][ TileLayout::N1 ], Kokkos::LayoutLeft , Device > tile_type ;
array_type m_array ;
typedef ptrdiff_t value_type;
ReduceTileErrors( array_type a )
: m_array(a)
{}
KOKKOS_INLINE_FUNCTION
static void init( value_type & errors )
{
errors = 0;
}
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & errors ,
const volatile value_type & src_errors )
{
errors += src_errors;
}
// Initialize
KOKKOS_INLINE_FUNCTION
void operator()( size_t iwork ) const
{
const size_t i = iwork % m_array.dimension_0();
const size_t j = iwork / m_array.dimension_0();
if ( j < m_array.dimension_1() ) {
m_array(i,j) = & m_array(i,j) - & m_array(0,0);
// printf("m_array(%d,%d) = %d\n",int(i),int(j),int(m_array(i,j)));
}
}
// Verify:
KOKKOS_INLINE_FUNCTION
void operator()( size_t iwork , value_type & errors ) const
{
const size_t tile_dim0 = ( m_array.dimension_0() + TileLayout::N0 - 1 ) / TileLayout::N0 ;
const size_t tile_dim1 = ( m_array.dimension_1() + TileLayout::N1 - 1 ) / TileLayout::N1 ;
const size_t itile = iwork % tile_dim0 ;
const size_t jtile = iwork / tile_dim0 ;
if ( jtile < tile_dim1 ) {
- tile_type tile = Kokkos::tile_subview( m_array , itile , jtile );
+ tile_type tile = Kokkos::Experimental::tile_subview( m_array , itile , jtile );
if ( tile(0,0) != ptrdiff_t(( itile + jtile * tile_dim0 ) * TileLayout::N0 * TileLayout::N1 ) ) {
++errors ;
}
else {
for ( size_t j = 0 ; j < size_t(TileLayout::N1) ; ++j ) {
for ( size_t i = 0 ; i < size_t(TileLayout::N0) ; ++i ) {
const size_t iglobal = i + itile * TileLayout::N0 ;
const size_t jglobal = j + jtile * TileLayout::N1 ;
if ( iglobal < m_array.dimension_0() && jglobal < m_array.dimension_1() ) {
if ( tile(i,j) != ptrdiff_t( tile(0,0) + i + j * TileLayout::N0 ) ) ++errors ;
// printf("tile(%d,%d)(%d,%d) = %d\n",int(itile),int(jtile),int(i),int(j),int(tile(i,j)));
}
}
}
}
}
}
};
template< class Space , unsigned N0 , unsigned N1 >
void test( const size_t dim0 , const size_t dim1 )
{
typedef Kokkos::LayoutTileLeft<N0,N1> array_layout ;
typedef ReduceTileErrors< Space , array_layout > functor_type ;
const size_t tile_dim0 = ( dim0 + N0 - 1 ) / N0 ;
const size_t tile_dim1 = ( dim1 + N1 - 1 ) / N1 ;
typename functor_type::array_type array("",dim0,dim1);
Kokkos::parallel_for( Kokkos::RangePolicy<Space,size_t>(0,dim0*dim1) , functor_type( array ) );
ptrdiff_t error = 0 ;
Kokkos::parallel_reduce( Kokkos::RangePolicy<Space,size_t>(0,tile_dim0*tile_dim1) , functor_type( array ) , error );
EXPECT_EQ( error , ptrdiff_t(0) );
}
} /* namespace TestTile */
#endif //TEST_TILE_HPP
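The only functional change in this hunk is the move of tile_subview into the Kokkos::Experimental namespace, together with the impl/Kokkos_ViewTile.hpp include it now requires. A minimal sketch of the tiled-layout API being tested, assuming a host execution space; the function name, view extents, and tile coordinates are illustrative:

#include <Kokkos_Core.hpp>
#include <impl/Kokkos_ViewTile.hpp>

inline void tile_sketch()   // illustrative name
{
  // A 2D view whose storage is blocked into 8x8 tiles, tiles ordered left-first.
  typedef Kokkos::LayoutTileLeft<8,8> tiled_layout ;
  Kokkos::View< ptrdiff_t** , tiled_layout , Kokkos::DefaultHostExecutionSpace >
    a( "a" , 20 , 30 );

  // Extract tile (1,2) as a contiguous 8x8 LayoutLeft sub-view of 'a'.
  auto tile = Kokkos::Experimental::tile_subview( a , 1 , 2 );
  tile(0,0) = 42 ;   // writes into the corresponding element of 'a'
}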
diff --git a/lib/kokkos/core/unit_test/TestUtilities.hpp b/lib/kokkos/core/unit_test/TestUtilities.hpp
new file mode 100644
index 000000000..947be03e3
--- /dev/null
+++ b/lib/kokkos/core/unit_test/TestUtilities.hpp
@@ -0,0 +1,306 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <gtest/gtest.h>
+
+#include <stdexcept>
+#include <sstream>
+#include <iostream>
+
+#include <Kokkos_Core.hpp>
+
+/*--------------------------------------------------------------------------*/
+
+namespace Test {
+
+inline
+void test_utilities()
+{
+ using namespace Kokkos::Impl;
+ {
+ using i = integer_sequence<int>;
+ using j = make_integer_sequence<int,0>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 0u, "Error: integer_sequence.size()" );
+ }
+
+
+ {
+ using i = integer_sequence<int,0>;
+ using j = make_integer_sequence<int,1>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 1u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ }
+
+
+ {
+ using i = integer_sequence<int,0,1>;
+ using j = make_integer_sequence<int,2>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 2u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2>;
+ using j = make_integer_sequence<int,3>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 3u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3>;
+ using j = make_integer_sequence<int,4>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 4u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3,4>;
+ using j = make_integer_sequence<int,5>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 5u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3,4,5>;
+ using j = make_integer_sequence<int,6>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 6u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3,4,5,6>;
+ using j = make_integer_sequence<int,7>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 7u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3,4,5,6,7>;
+ using j = make_integer_sequence<int,8>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 8u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3,4,5,6,7,8>;
+ using j = make_integer_sequence<int,9>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 9u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<8, i>::value == 8, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(8, i{}) == 8, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = integer_sequence<int,0,1,2,3,4,5,6,7,8,9>;
+ using j = make_integer_sequence<int,10>;
+
+ static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( i::size() == 10u, "Error: integer_sequence.size()" );
+
+ static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<8, i>::value == 8, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at<9, i>::value == 9, "Error: integer_sequence_at" );
+
+ static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(8, i{}) == 8, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at(9, i{}) == 9, "Error: at(unsigned, integer_sequence)" );
+ }
+
+ {
+ using i = make_integer_sequence<int, 5>;
+ using r = reverse_integer_sequence<i>;
+ using gr = integer_sequence<int, 4, 3, 2, 1, 0>;
+
+ static_assert( std::is_same<r,gr>::value, "Error: reverse_integer_sequence" );
+ }
+
+ {
+ using s = make_integer_sequence<int,10>;
+ using e = exclusive_scan_integer_sequence<s>;
+ using i = inclusive_scan_integer_sequence<s>;
+
+ using ge = integer_sequence<int, 0, 0, 1, 3, 6, 10, 15, 21, 28, 36>;
+ using gi = integer_sequence<int, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45>;
+
+ static_assert( e::value == 45, "Error: scan value");
+ static_assert( i::value == 45, "Error: scan value");
+
+ static_assert( std::is_same< e::type, ge >::value, "Error: exclusive_scan");
+ static_assert( std::is_same< i::type, gi >::value, "Error: inclusive_scan");
+ }
+
+
+}
+
+} // namespace Test
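The static_asserts in this new header check a local re-implementation of C++14's integer_sequence plus Kokkos-specific helpers (integer_sequence_at, at(), reverse_integer_sequence, and the scan sequences), which have no direct standard-library counterpart. For the part that does have one, a minimal comparison sketch against the std utilities:

#include <utility>
#include <type_traits>

// std::make_integer_sequence<int,5> enumerates 0,1,2,3,4 -- the same property
// the asserts above verify for the Kokkos::Impl version.
using five = std::make_integer_sequence<int, 5>;
static_assert( std::is_same< five, std::integer_sequence<int,0,1,2,3,4> >::value,
               "make_integer_sequence<int,N> enumerates 0..N-1" );
static_assert( five::size() == 5u, "size() reports the element count" );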
diff --git a/lib/kokkos/core/unit_test/TestViewAPI.hpp b/lib/kokkos/core/unit_test/TestViewAPI.hpp
index ae4c6d218..88b474db1 100644
--- a/lib/kokkos/core/unit_test/TestViewAPI.hpp
+++ b/lib/kokkos/core/unit_test/TestViewAPI.hpp
@@ -1,1416 +1,1361 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Test {
-#if KOKKOS_USING_EXP_VIEW
-
template< class T , class ... P >
size_t allocation_count( const Kokkos::View<T,P...> & view )
{
const size_t card = view.size();
const size_t alloc = view.span();
const int memory_span = Kokkos::View<int*>::required_allocation_size(100);
return (card <= alloc && memory_span == 400) ? alloc : 0 ;
}
-#else
-
-template< class T , class L , class D , class M , class S >
-size_t allocation_count( const Kokkos::View<T,L,D,M,S> & view )
-{
- const size_t card = Kokkos::Impl::cardinality_count( view.shape() );
- const size_t alloc = view.capacity();
-
- return card <= alloc ? alloc : 0 ;
-}
-
-#endif
-
/*--------------------------------------------------------------------------*/
template< typename T, class DeviceType>
struct TestViewOperator
{
typedef typename DeviceType::execution_space execution_space ;
static const unsigned N = 100 ;
static const unsigned D = 3 ;
typedef Kokkos::View< T*[D] , execution_space > view_type ;
const view_type v1 ;
const view_type v2 ;
TestViewOperator()
: v1( "v1" , N )
, v2( "v2" , N )
{}
static void testit()
{
Kokkos::parallel_for( N , TestViewOperator() );
}
KOKKOS_INLINE_FUNCTION
void operator()( const unsigned i ) const
{
const unsigned X = 0 ;
const unsigned Y = 1 ;
const unsigned Z = 2 ;
v2(i,X) = v1(i,X);
v2(i,Y) = v1(i,Y);
v2(i,Z) = v1(i,Z);
}
};
/*--------------------------------------------------------------------------*/
template< class DataType ,
class DeviceType ,
unsigned Rank = Kokkos::ViewTraits< DataType >::rank >
struct TestViewOperator_LeftAndRight ;
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 8 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i7 = 0 ; i7 < unsigned(left.dimension_7()) ; ++i7 )
for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6, i7 ) -
& left( 0, 0, 0, 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
if ( & left(i0,i1,i2,i3,i4,i5,i6,i7) !=
& left_stride(i0,i1,i2,i3,i4,i5,i6,i7) ) {
update |= 4 ;
}
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
for ( unsigned i7 = 0 ; i7 < unsigned(right.dimension_7()) ; ++i7 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6, i7 ) -
& right( 0, 0, 0, 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
if ( & right(i0,i1,i2,i3,i4,i5,i6,i7) !=
& right_stride(i0,i1,i2,i3,i4,i5,i6,i7) ) {
update |= 8 ;
}
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 7 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6 ) -
& left( 0, 0, 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6 ) -
& right( 0, 0, 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 6 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5 ) -
& left( 0, 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
{
const long j = & right( i0, i1, i2, i3, i4, i5 ) -
& right( 0, 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 5 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4 ) -
& left( 0, 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
if ( & left( i0, i1, i2, i3, i4 ) !=
& left_stride( i0, i1, i2, i3, i4 ) ) { update |= 4 ; }
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
{
const long j = & right( i0, i1, i2, i3, i4 ) -
& right( 0, 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
if ( & right( i0, i1, i2, i3, i4 ) !=
& right_stride( i0, i1, i2, i3, i4 ) ) { update |= 8 ; }
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 4 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2, i3 ) -
& left( 0, 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
{
const long j = & right( i0, i1, i2, i3 ) -
& right( 0, 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 3 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( std::string("left") )
, right( std::string("right") )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1, i2 ) -
& left( 0, 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
if ( & left(i0,i1,i2) != & left_stride(i0,i1,i2) ) { update |= 4 ; }
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
{
const long j = & right( i0, i1, i2 ) -
& right( 0, 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
if ( & right(i0,i1,i2) != & right_stride(i0,i1,i2) ) { update |= 8 ; }
}
-#if KOKKOS_USING_EXP_VIEW
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
{
if ( & left(i0,i1,i2) != & left(i0,i1,i2,0,0,0,0,0) ) { update |= 3 ; }
if ( & right(i0,i1,i2) != & right(i0,i1,i2,0,0,0,0,0) ) { update |= 3 ; }
}
-#endif
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 2 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
left_view left ;
right_view right ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
long offset ;
offset = -1 ;
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
const long j = & left( i0, i1 ) -
& left( 0, 0 );
if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
offset = j ;
}
offset = -1 ;
for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
{
const long j = & right( i0, i1 ) -
& right( 0, 0 );
if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
offset = j ;
}
-#if KOKKOS_USING_EXP_VIEW
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
{
if ( & left(i0,i1) != & left(i0,i1,0,0,0,0,0,0) ) { update |= 3 ; }
if ( & right(i0,i1) != & right(i0,i1,0,0,0,0,0,0) ) { update |= 3 ; }
}
-#endif
}
};
template< class DataType , class DeviceType >
struct TestViewOperator_LeftAndRight< DataType , DeviceType , 1 >
{
typedef typename DeviceType::execution_space execution_space ;
typedef typename DeviceType::memory_space memory_space ;
typedef typename execution_space::size_type size_type ;
typedef int value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
const volatile value_type & input )
{ update |= input ; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
{ update = 0 ; }
typedef Kokkos::
View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
typedef Kokkos::
View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
left_view left ;
right_view right ;
stride_view left_stride ;
stride_view right_stride ;
long left_alloc ;
long right_alloc ;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
TestViewOperator_LeftAndRight driver ;
int error_flag = 0 ;
Kokkos::parallel_reduce( 1 , driver , error_flag );
ASSERT_EQ( error_flag , 0 );
}
KOKKOS_INLINE_FUNCTION
void operator()( const size_type , value_type & update ) const
{
for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
{
-#if KOKKOS_USING_EXP_VIEW
if ( & left(i0) != & left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
if ( & right(i0) != & right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
-#endif
if ( & left(i0) != & left_stride(i0) ) { update |= 4 ; }
if ( & right(i0) != & right_stride(i0) ) { update |= 8 ; }
}
}
};
template<class Layout, class DeviceType>
struct TestViewMirror {
template<class MemoryTraits>
void static test_mirror() {
Kokkos::View<double*, Layout, Kokkos::HostSpace> a_org("A",1000);
Kokkos::View<double*, Layout, Kokkos::HostSpace, MemoryTraits> a_h = a_org;
auto a_h2 = Kokkos::create_mirror(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror(DeviceType(),a_h);
int equal_ptr_h_h2 = (a_h.data() ==a_h2.data())?1:0;
int equal_ptr_h_d = (a_h.data() ==a_d. data())?1:0;
int equal_ptr_h2_d = (a_h2.data()==a_d. data())?1:0;
ASSERT_EQ(equal_ptr_h_h2,0);
ASSERT_EQ(equal_ptr_h_d ,0);
ASSERT_EQ(equal_ptr_h2_d,0);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
}
template<class MemoryTraits>
void static test_mirror_view() {
Kokkos::View<double*, Layout, Kokkos::HostSpace> a_org("A",1000);
Kokkos::View<double*, Layout, Kokkos::HostSpace, MemoryTraits> a_h = a_org;
auto a_h2 = Kokkos::create_mirror_view(Kokkos::HostSpace(),a_h);
auto a_d = Kokkos::create_mirror_view(DeviceType(),a_h);
int equal_ptr_h_h2 = a_h.data() ==a_h2.data()?1:0;
int equal_ptr_h_d = a_h.data() ==a_d. data()?1:0;
int equal_ptr_h2_d = a_h2.data()==a_d. data()?1:0;
int is_same_memspace = std::is_same<Kokkos::HostSpace,typename DeviceType::memory_space>::value?1:0;
ASSERT_EQ(equal_ptr_h_h2,1);
ASSERT_EQ(equal_ptr_h_d ,is_same_memspace);
ASSERT_EQ(equal_ptr_h2_d ,is_same_memspace);
ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
}
void static testit() {
test_mirror<Kokkos::MemoryTraits<0>>();
test_mirror<Kokkos::MemoryTraits<Kokkos::Unmanaged>>();
test_mirror_view<Kokkos::MemoryTraits<0>>();
test_mirror_view<Kokkos::MemoryTraits<Kokkos::Unmanaged>>();
}
};
/*--------------------------------------------------------------------------*/
template< typename T, class DeviceType >
class TestViewAPI
{
public:
typedef DeviceType device ;
enum { N0 = 1000 ,
N1 = 3 ,
N2 = 5 ,
N3 = 7 };
typedef Kokkos::View< T , device > dView0 ;
typedef Kokkos::View< T* , device > dView1 ;
typedef Kokkos::View< T*[N1] , device > dView2 ;
typedef Kokkos::View< T*[N1][N2] , device > dView3 ;
typedef Kokkos::View< T*[N1][N2][N3] , device > dView4 ;
typedef Kokkos::View< const T*[N1][N2][N3] , device > const_dView4 ;
typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged ;
typedef typename dView0::host_mirror_space host ;
TestViewAPI()
{
run_test_mirror();
run_test();
run_test_scalar();
run_test_const();
run_test_subview();
run_test_subview_strided();
run_test_vector();
TestViewOperator< T , device >::testit();
TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2][3] , device >::testit();
TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2] , device >::testit();
TestViewOperator_LeftAndRight< int[2][3][4][2][3][4] , device >::testit();
TestViewOperator_LeftAndRight< int[2][3][4][2][3] , device >::testit();
TestViewOperator_LeftAndRight< int[2][3][4][2] , device >::testit();
TestViewOperator_LeftAndRight< int[2][3][4] , device >::testit();
TestViewOperator_LeftAndRight< int[2][3] , device >::testit();
TestViewOperator_LeftAndRight< int[2] , device >::testit();
TestViewMirror<Kokkos::LayoutLeft, device >::testit();
TestViewMirror<Kokkos::LayoutRight, device >::testit();
}
static void run_test_mirror()
{
typedef Kokkos::View< int , host > view_type ;
typedef typename view_type::HostMirror mirror_type ;
static_assert( std::is_same< typename view_type::memory_space
, typename mirror_type::memory_space
>::value , "" );
view_type a("a");
mirror_type am = Kokkos::create_mirror_view(a);
mirror_type ax = Kokkos::create_mirror(a);
ASSERT_EQ( & a() , & am() );
}
static void run_test_scalar()
{
typedef typename dView0::HostMirror hView0 ;
dView0 dx , dy ;
hView0 hx , hy ;
dx = dView0( "dx" );
dy = dView0( "dy" );
hx = Kokkos::create_mirror( dx );
hy = Kokkos::create_mirror( dy );
hx() = 1 ;
Kokkos::deep_copy( dx , hx );
Kokkos::deep_copy( dy , dx );
Kokkos::deep_copy( hy , dy );
ASSERT_EQ( hx(), hy() );
}
static void run_test()
{
// mfh 14 Feb 2014: This test doesn't actually create instances of
// these types. In order to avoid "declared but unused typedef"
// warnings, we declare empty instances of these types, with the
// usual "(void)" marker to avoid compiler warnings for unused
// variables.
typedef typename dView0::HostMirror hView0 ;
typedef typename dView1::HostMirror hView1 ;
typedef typename dView2::HostMirror hView2 ;
typedef typename dView3::HostMirror hView3 ;
typedef typename dView4::HostMirror hView4 ;
{
hView0 thing;
(void) thing;
}
{
hView1 thing;
(void) thing;
}
{
hView2 thing;
(void) thing;
}
{
hView3 thing;
(void) thing;
}
{
hView4 thing;
(void) thing;
}
dView4 dx , dy , dz ;
hView4 hx , hy , hz ;
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
ASSERT_TRUE( hx.ptr_on_device() == 0 );
ASSERT_TRUE( hy.ptr_on_device() == 0 );
ASSERT_TRUE( hz.ptr_on_device() == 0 );
ASSERT_EQ( dx.dimension_0() , 0u );
ASSERT_EQ( dy.dimension_0() , 0u );
ASSERT_EQ( dz.dimension_0() , 0u );
ASSERT_EQ( hx.dimension_0() , 0u );
ASSERT_EQ( hy.dimension_0() , 0u );
ASSERT_EQ( hz.dimension_0() , 0u );
ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
ASSERT_EQ( dz.dimension_1() , unsigned(N1) );
ASSERT_EQ( hx.dimension_1() , unsigned(N1) );
ASSERT_EQ( hy.dimension_1() , unsigned(N1) );
ASSERT_EQ( hz.dimension_1() , unsigned(N1) );
dx = dView4( "dx" , N0 );
dy = dView4( "dy" , N0 );
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(1) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(1) );
- #endif
dView4_unmanaged unmanaged_dx = dx;
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(1) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(1) );
- #endif
dView4_unmanaged unmanaged_from_ptr_dx = dView4_unmanaged(dx.ptr_on_device(),
dx.dimension_0(),
dx.dimension_1(),
dx.dimension_2(),
dx.dimension_3());
{
// Destruction of this view should be harmless
const_dView4 unmanaged_from_ptr_const_dx( dx.ptr_on_device() ,
dx.dimension_0() ,
dx.dimension_1() ,
dx.dimension_2() ,
dx.dimension_3() );
}
const_dView4 const_dx = dx ;
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(2) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(2) );
- #endif
{
const_dView4 const_dx2;
const_dx2 = const_dx;
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(3) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(3) );
- #endif
const_dx2 = dy;
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(2) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(2) );
- #endif
const_dView4 const_dx3(dx);
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(3) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(3) );
- #endif
dView4_unmanaged dx4_unmanaged(dx);
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(3) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(3) );
- #endif
}
- #if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( dx.use_count() , size_t(2) );
- #else
- ASSERT_EQ( dx.tracker().ref_count() , size_t(2) );
- #endif
ASSERT_FALSE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( const_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_from_ptr_dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
ASSERT_NE( dx , dy );
ASSERT_EQ( dx.dimension_0() , unsigned(N0) );
ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
ASSERT_EQ( dx.dimension_2() , unsigned(N2) );
ASSERT_EQ( dx.dimension_3() , unsigned(N3) );
ASSERT_EQ( dy.dimension_0() , unsigned(N0) );
ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
ASSERT_EQ( dy.dimension_2() , unsigned(N2) );
ASSERT_EQ( dy.dimension_3() , unsigned(N3) );
ASSERT_EQ( unmanaged_from_ptr_dx.capacity(),unsigned(N0)*unsigned(N1)*unsigned(N2)*unsigned(N3) );
hx = Kokkos::create_mirror( dx );
hy = Kokkos::create_mirror( dy );
// T v1 = hx() ; // Generates compile error as intended
// T v2 = hx(0,0) ; // Generates compile error as intended
// hx(0,0) = v2 ; // Generates compile error as intended
-#if ! KOKKOS_USING_EXP_VIEW
// Testing with asynchronous deep copy with respect to device
{
size_t count = 0 ;
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
hx(ip,i1,i2,i3) = ++count ;
}}}}
Kokkos::deep_copy(typename hView4::execution_space(), dx , hx );
Kokkos::deep_copy(typename hView4::execution_space(), dy , dx );
Kokkos::deep_copy(typename hView4::execution_space(), hy , dy );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
}}}}
Kokkos::deep_copy(typename hView4::execution_space(), dx , T(0) );
Kokkos::deep_copy(typename hView4::execution_space(), hx , dx );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
}}}}
}
// Testing with asynchronous deep copy with respect to host
{
size_t count = 0 ;
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
hx(ip,i1,i2,i3) = ++count ;
}}}}
Kokkos::deep_copy(typename dView4::execution_space(), dx , hx );
Kokkos::deep_copy(typename dView4::execution_space(), dy , dx );
Kokkos::deep_copy(typename dView4::execution_space(), hy , dy );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
}}}}
Kokkos::deep_copy(typename dView4::execution_space(), dx , T(0) );
Kokkos::deep_copy(typename dView4::execution_space(), hx , dx );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
}}}}
}
-#endif /* #if ! KOKKOS_USING_EXP_VIEW */
// Testing with synchronous deep copy
{
size_t count = 0 ;
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
hx(ip,i1,i2,i3) = ++count ;
}}}}
Kokkos::deep_copy( dx , hx );
Kokkos::deep_copy( dy , dx );
Kokkos::deep_copy( hy , dy );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
}}}}
Kokkos::deep_copy( dx , T(0) );
Kokkos::deep_copy( hx , dx );
for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
}}}}
}
dz = dx ; ASSERT_EQ( dx, dz); ASSERT_NE( dy, dz);
dz = dy ; ASSERT_EQ( dy, dz); ASSERT_NE( dx, dz);
dx = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
dy = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
dz = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
}
typedef T DataType[2] ;
static void
check_auto_conversion_to_const(
const Kokkos::View< const DataType , device > & arg_const ,
const Kokkos::View< DataType , device > & arg )
{
ASSERT_TRUE( arg_const == arg );
}
static void run_test_const()
{
typedef Kokkos::View< DataType , device > typeX ;
typedef Kokkos::View< const DataType , device > const_typeX ;
typedef Kokkos::View< const DataType , device , Kokkos::MemoryRandomAccess > const_typeR ;
typeX x( "X" );
const_typeX xc = x ;
const_typeR xr = x ;
ASSERT_TRUE( xc == x );
ASSERT_TRUE( x == xc );
// For CUDA the constant random access View does not return
// an lvalue reference due to retrieving through texture cache
// therefore not allowed to query the underlying pointer.
#if defined( KOKKOS_HAVE_CUDA )
if ( ! std::is_same< typename device::execution_space , Kokkos::Cuda >::value )
#endif
{
ASSERT_TRUE( x.ptr_on_device() == xr.ptr_on_device() );
}
// typeX xf = xc ; // setting non-const from const must not compile
check_auto_conversion_to_const( x , x );
}
static void run_test_subview()
{
typedef Kokkos::View< const T , device > sView ;
dView0 d0( "d0" );
dView1 d1( "d1" , N0 );
dView2 d2( "d2" , N0 );
dView3 d3( "d3" , N0 );
dView4 d4( "d4" , N0 );
sView s0 = d0 ;
sView s1 = Kokkos::subview( d1 , 1 );
sView s2 = Kokkos::subview( d2 , 1 , 1 );
sView s3 = Kokkos::subview( d3 , 1 , 1 , 1 );
sView s4 = Kokkos::subview( d4 , 1 , 1 , 1 , 1 );
}
static void run_test_subview_strided()
{
typedef Kokkos::View< int **** , Kokkos::LayoutLeft , host > view_left_4 ;
typedef Kokkos::View< int **** , Kokkos::LayoutRight , host > view_right_4 ;
typedef Kokkos::View< int ** , Kokkos::LayoutLeft , host > view_left_2 ;
typedef Kokkos::View< int ** , Kokkos::LayoutRight , host > view_right_2 ;
typedef Kokkos::View< int * , Kokkos::LayoutStride , host > view_stride_1 ;
typedef Kokkos::View< int ** , Kokkos::LayoutStride , host > view_stride_2 ;
view_left_2 xl2("xl2", 100 , 200 );
view_right_2 xr2("xr2", 100 , 200 );
view_stride_1 yl1 = Kokkos::subview( xl2 , 0 , Kokkos::ALL() );
view_stride_1 yl2 = Kokkos::subview( xl2 , 1 , Kokkos::ALL() );
view_stride_1 yr1 = Kokkos::subview( xr2 , 0 , Kokkos::ALL() );
view_stride_1 yr2 = Kokkos::subview( xr2 , 1 , Kokkos::ALL() );
ASSERT_EQ( yl1.dimension_0() , xl2.dimension_1() );
ASSERT_EQ( yl2.dimension_0() , xl2.dimension_1() );
ASSERT_EQ( yr1.dimension_0() , xr2.dimension_1() );
ASSERT_EQ( yr2.dimension_0() , xr2.dimension_1() );
ASSERT_EQ( & yl1(0) - & xl2(0,0) , 0 );
ASSERT_EQ( & yl2(0) - & xl2(1,0) , 0 );
ASSERT_EQ( & yr1(0) - & xr2(0,0) , 0 );
ASSERT_EQ( & yr2(0) - & xr2(1,0) , 0 );
view_left_4 xl4( "xl4" , 10 , 20 , 30 , 40 );
view_right_4 xr4( "xr4" , 10 , 20 , 30 , 40 );
view_stride_2 yl4 = Kokkos::subview( xl4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
view_stride_2 yr4 = Kokkos::subview( xr4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
ASSERT_EQ( yl4.dimension_0() , xl4.dimension_1() );
ASSERT_EQ( yl4.dimension_1() , xl4.dimension_3() );
ASSERT_EQ( yr4.dimension_0() , xr4.dimension_1() );
ASSERT_EQ( yr4.dimension_1() , xr4.dimension_3() );
ASSERT_EQ( & yl4(4,4) - & xl4(1,4,2,4) , 0 );
ASSERT_EQ( & yr4(4,4) - & xr4(1,4,2,4) , 0 );
}
static void run_test_vector()
{
static const unsigned Length = 1000 , Count = 8 ;
typedef Kokkos::View< T* , Kokkos::LayoutLeft , host > vector_type ;
typedef Kokkos::View< T** , Kokkos::LayoutLeft , host > multivector_type ;
typedef Kokkos::View< T* , Kokkos::LayoutRight , host > vector_right_type ;
typedef Kokkos::View< T** , Kokkos::LayoutRight , host > multivector_right_type ;
typedef Kokkos::View< const T* , Kokkos::LayoutRight, host > const_vector_right_type ;
typedef Kokkos::View< const T* , Kokkos::LayoutLeft , host > const_vector_type ;
typedef Kokkos::View< const T** , Kokkos::LayoutLeft , host > const_multivector_type ;
multivector_type mv = multivector_type( "mv" , Length , Count );
multivector_right_type mv_right = multivector_right_type( "mv" , Length , Count );
vector_type v1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
vector_type v2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
vector_type v3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
vector_type rv1 = Kokkos::subview( mv_right , 0 , Kokkos::ALL() );
vector_type rv2 = Kokkos::subview( mv_right , 1 , Kokkos::ALL() );
vector_type rv3 = Kokkos::subview( mv_right , 2 , Kokkos::ALL() );
multivector_type mv1 = Kokkos::subview( mv , std::make_pair( 1 , 998 ) ,
std::make_pair( 2 , 5 ) );
multivector_right_type mvr1 =
Kokkos::subview( mv_right ,
std::make_pair( 1 , 998 ) ,
std::make_pair( 2 , 5 ) );
const_vector_type cv1 = Kokkos::subview( mv , Kokkos::ALL(), 0 );
const_vector_type cv2 = Kokkos::subview( mv , Kokkos::ALL(), 1 );
const_vector_type cv3 = Kokkos::subview( mv , Kokkos::ALL(), 2 );
vector_right_type vr1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
vector_right_type vr2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
vector_right_type vr3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
const_vector_right_type cvr1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
const_vector_right_type cvr2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
const_vector_right_type cvr3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
ASSERT_TRUE( & v1[0] == & v1(0) );
ASSERT_TRUE( & v1[0] == & mv(0,0) );
ASSERT_TRUE( & v2[0] == & mv(0,1) );
ASSERT_TRUE( & v3[0] == & mv(0,2) );
ASSERT_TRUE( & cv1[0] == & mv(0,0) );
ASSERT_TRUE( & cv2[0] == & mv(0,1) );
ASSERT_TRUE( & cv3[0] == & mv(0,2) );
ASSERT_TRUE( & vr1[0] == & mv(0,0) );
ASSERT_TRUE( & vr2[0] == & mv(0,1) );
ASSERT_TRUE( & vr3[0] == & mv(0,2) );
ASSERT_TRUE( & cvr1[0] == & mv(0,0) );
ASSERT_TRUE( & cvr2[0] == & mv(0,1) );
ASSERT_TRUE( & cvr3[0] == & mv(0,2) );
ASSERT_TRUE( & mv1(0,0) == & mv( 1 , 2 ) );
ASSERT_TRUE( & mv1(1,1) == & mv( 2 , 3 ) );
ASSERT_TRUE( & mv1(3,2) == & mv( 4 , 4 ) );
ASSERT_TRUE( & mvr1(0,0) == & mv_right( 1 , 2 ) );
ASSERT_TRUE( & mvr1(1,1) == & mv_right( 2 , 3 ) );
ASSERT_TRUE( & mvr1(3,2) == & mv_right( 4 , 4 ) );
const_vector_type c_cv1( v1 );
typename vector_type::const_type c_cv2( v2 );
typename const_vector_type::const_type c_ccv2( v2 );
const_multivector_type cmv( mv );
typename multivector_type::const_type cmvX( cmv );
typename const_multivector_type::const_type ccmvX( cmv );
}
};
} // namespace Test
/*--------------------------------------------------------------------------*/
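run_test() in the file above walks through the basic View life cycle: construction, host mirrors, reference counting via use_count(), synchronous deep_copy, and reset to an empty view. A minimal usage sketch of the mirror/deep-copy round trip that the test verifies element by element, using only the Kokkos calls already shown above (it assumes Kokkos::initialize() has been called first):

    #include <Kokkos_Core.hpp>

    // Fill a device view through its host mirror and copy back -- the same
    // pattern run_test() checks with ASSERT_EQ on every element.
    inline void mirror_roundtrip_sketch()
    {
      Kokkos::View<double*[3]> d( "d" , 100 );    // device allocation
      auto h = Kokkos::create_mirror_view( d );   // host mirror (may alias d on host-only builds)

      for ( size_t i = 0 ; i < h.dimension_0() ; ++i )
        for ( size_t j = 0 ; j < 3 ; ++j )
          h(i,j) = double( i + j );

      Kokkos::deep_copy( d , h );   // host -> device
      Kokkos::deep_copy( h , d );   // device -> host; values survive the round trip
    }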
diff --git a/lib/kokkos/core/unit_test/TestViewImpl.hpp b/lib/kokkos/core/unit_test/TestViewImpl.hpp
deleted file mode 100644
index c34ef759d..000000000
--- a/lib/kokkos/core/unit_test/TestViewImpl.hpp
+++ /dev/null
@@ -1,289 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <stdexcept>
-#include <sstream>
-#include <iostream>
-
-#include <Kokkos_Core.hpp>
-
-/*--------------------------------------------------------------------------*/
-
-#if KOKKOS_USING_EXP_VIEW
-
-namespace Test {
-
-template < class Device >
-void test_view_impl() {}
-
-}
-
-#else
-
-/*--------------------------------------------------------------------------*/
-
-namespace Test {
-
-struct DummyMemorySpace
-{
- typedef DummyMemorySpace memory_space ;
- typedef unsigned size_type ;
-};
-
-/*--------------------------------------------------------------------------*/
-
-template< class Type >
-struct DefineShape {
- typedef typename Kokkos::Impl::AnalyzeShape<Type>::shape type ;
-};
-
-template< class Type >
-struct ExtractValueType {
- typedef typename Kokkos::Impl::AnalyzeShape<Type>::value_type type ;
-};
-
-template< class Type >
-struct ArrayType { typedef Type type ; };
-
-template < class Device >
-void test_view_impl()
-{
- //typedef typename Device::memory_space memory_space ; // unused
-
- typedef ArrayType< int[100] >::type type_01 ;
- typedef ArrayType< int* >::type type_11 ;
- typedef ArrayType< int[5][6][700] >::type type_03 ;
- typedef ArrayType< double*[8][9][900] >::type type_14 ;
- typedef ArrayType< long** >::type type_22 ;
- typedef ArrayType< short **[5][6][7] >::type type_25 ;
- typedef ArrayType< const short **[5][6][7] >::type const_type_25 ;
- typedef ArrayType< short***[5][6][7] >::type type_36 ;
- typedef ArrayType< const short***[5][6][7] >::type const_type_36 ;
-
- // mfh 14 Feb 2014: With gcc 4.8.2 -Wall, this emits a warning:
- //
- // typedef ‘ok_const_25’ locally defined but not used [-Wunused-local-typedefs]
- //
- // It's unfortunate that this is the case, because the typedef is
- // being used for a compile-time check! We deal with this by
- // declaring an instance of ok_const_25, and marking it with
- // "(void)" so that instance doesn't emit an "unused variable"
- // warning.
- //
- // typedef typename Kokkos::Impl::StaticAssertSame<
- // typename Kokkos::Impl::AnalyzeShape<type_25>::const_type ,
- // typename Kokkos::Impl::AnalyzeShape<const_type_25>::type
- // > ok_const_25 ;
-
- typedef typename Kokkos::Impl::StaticAssertSame<
- typename Kokkos::Impl::AnalyzeShape<type_25>::const_type,
- typename Kokkos::Impl::AnalyzeShape<const_type_25>::type
- > ok_const_25 ;
-
- typedef typename Kokkos::Impl::StaticAssertSame<
- typename Kokkos::Impl::AnalyzeShape<type_36>::const_type,
- typename Kokkos::Impl::AnalyzeShape<const_type_36>::type
- > ok_const_36 ;
- {
- ok_const_25 thing_25 ;
- ok_const_36 thing_36 ;
- (void) thing_25 ; // silence warning for unused variable
- (void) thing_36 ; // silence warning for unused variable
- }
-
- ASSERT_TRUE( ( Kokkos::Impl::is_same< ExtractValueType<type_03>::type , int >::value ) );
- ASSERT_TRUE( ( Kokkos::Impl::is_same< ExtractValueType<type_14>::type , double >::value ) );
- ASSERT_TRUE( ( Kokkos::Impl::is_same< ExtractValueType<type_22>::type , long >::value ) );
- ASSERT_TRUE( ( Kokkos::Impl::is_same< ExtractValueType<type_36>::type , short >::value ) );
-
- ASSERT_FALSE( ( Kokkos::Impl::is_same< ExtractValueType<type_36>::type , int >::value ) );
-
- typedef typename DefineShape< type_01 >::type shape_01_type ;
- typedef typename DefineShape< type_11 >::type shape_11_type ;
- typedef typename DefineShape< type_03 >::type shape_03_type ;
- typedef typename DefineShape< type_14 >::type shape_14_type ;
- typedef typename DefineShape< type_22 >::type shape_22_type ;
- typedef typename DefineShape< type_36 >::type shape_36_type ;
-
- ASSERT_TRUE( ( Kokkos::Impl::StaticAssert< shape_36_type::rank == 6 >::value ) );
- ASSERT_TRUE( ( Kokkos::Impl::StaticAssert< shape_03_type::rank == 3 >::value ) );
-
- shape_01_type shape_01 ; shape_01_type::assign( shape_01 );
- shape_11_type shape_11 ; shape_11_type::assign( shape_11, 1000 );
- shape_03_type shape_03 ; shape_03_type::assign( shape_03 );
- shape_14_type shape_14 ; shape_14_type::assign( shape_14 , 0 );
- shape_22_type shape_22 ; shape_22_type::assign( shape_22 , 0 , 0 );
- shape_36_type shape_36 ; shape_36_type::assign( shape_36 , 10 , 20 , 30 );
-
- ASSERT_TRUE( shape_01.rank_dynamic == 0u );
- ASSERT_TRUE( shape_01.rank == 1u );
- ASSERT_TRUE( shape_01.N0 == 100u );
-
- ASSERT_TRUE( shape_11.rank_dynamic == 1u );
- ASSERT_TRUE( shape_11.rank == 1u );
- ASSERT_TRUE( shape_11.N0 == 1000u );
-
- ASSERT_TRUE( shape_03.rank_dynamic == 0u );
- ASSERT_TRUE( shape_03.rank == 3u );
- ASSERT_TRUE( shape_03.N0 == 5u );
- ASSERT_TRUE( shape_03.N1 == 6u );
- ASSERT_TRUE( shape_03.N2 == 700u );
-
- ASSERT_TRUE( shape_14.rank_dynamic == 1u );
- ASSERT_TRUE( shape_14.rank == 4u );
- ASSERT_TRUE( shape_14.N0 == 0u );
- ASSERT_TRUE( shape_14.N1 == 8u );
- ASSERT_TRUE( shape_14.N2 == 9u );
- ASSERT_TRUE( shape_14.N3 == 900u );
-
- ASSERT_TRUE( shape_22.rank_dynamic == 2u );
- ASSERT_TRUE( shape_22.rank == 2u );
- ASSERT_TRUE( shape_22.N0 == 0u );
- ASSERT_TRUE( shape_22.N1 == 0u );
-
- ASSERT_TRUE( shape_36.rank_dynamic == 3u );
- ASSERT_TRUE( shape_36.rank == 6u );
- ASSERT_TRUE( shape_36.N0 == 10u );
- ASSERT_TRUE( shape_36.N1 == 20u );
- ASSERT_TRUE( shape_36.N2 == 30u );
- ASSERT_TRUE( shape_36.N3 == 5u );
- ASSERT_TRUE( shape_36.N4 == 6u );
- ASSERT_TRUE( shape_36.N5 == 7u );
-
-
- ASSERT_TRUE( shape_01 == shape_01 );
- ASSERT_TRUE( shape_11 == shape_11 );
- ASSERT_TRUE( shape_36 == shape_36 );
- ASSERT_TRUE( shape_01 != shape_36 );
- ASSERT_TRUE( shape_22 != shape_36 );
-
- //------------------------------------------------------------------------
-
- typedef Kokkos::Impl::ViewOffset< shape_01_type , Kokkos::LayoutLeft > shape_01_left_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_11_type , Kokkos::LayoutLeft > shape_11_left_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_03_type , Kokkos::LayoutLeft > shape_03_left_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_14_type , Kokkos::LayoutLeft > shape_14_left_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_22_type , Kokkos::LayoutLeft > shape_22_left_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_36_type , Kokkos::LayoutLeft > shape_36_left_offset ;
-
- typedef Kokkos::Impl::ViewOffset< shape_01_type , Kokkos::LayoutRight > shape_01_right_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_11_type , Kokkos::LayoutRight > shape_11_right_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_03_type , Kokkos::LayoutRight > shape_03_right_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_14_type , Kokkos::LayoutRight > shape_14_right_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_22_type , Kokkos::LayoutRight > shape_22_right_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_36_type , Kokkos::LayoutRight > shape_36_right_offset ;
-
- ASSERT_TRUE( ! shape_01_left_offset::has_padding );
- ASSERT_TRUE( ! shape_11_left_offset::has_padding );
- ASSERT_TRUE( ! shape_03_left_offset::has_padding );
- ASSERT_TRUE( shape_14_left_offset::has_padding );
- ASSERT_TRUE( shape_22_left_offset::has_padding );
- ASSERT_TRUE( shape_36_left_offset::has_padding );
-
- ASSERT_TRUE( ! shape_01_right_offset::has_padding );
- ASSERT_TRUE( ! shape_11_right_offset::has_padding );
- ASSERT_TRUE( ! shape_03_right_offset::has_padding );
- ASSERT_TRUE( ! shape_14_right_offset::has_padding );
- ASSERT_TRUE( shape_22_right_offset::has_padding );
- ASSERT_TRUE( shape_36_right_offset::has_padding );
-
- //------------------------------------------------------------------------
-
- typedef Kokkos::Impl::ViewOffset< shape_01_type , Kokkos::LayoutStride > shape_01_stride_offset ;
- typedef Kokkos::Impl::ViewOffset< shape_36_type , Kokkos::LayoutStride > shape_36_stride_offset ;
-
- {
- shape_01_stride_offset stride_offset_01 ;
-
- stride_offset_01.assign( 1, stride_offset_01.N0, 0,0,0,0,0,0,0 );
-
- ASSERT_EQ( int(stride_offset_01.S[0]) , int(1) );
- ASSERT_EQ( int(stride_offset_01.S[1]) , int(stride_offset_01.N0) );
- }
-
- {
- shape_36_stride_offset stride_offset_36 ;
-
- size_t str[7] ;
- str[5] = 1 ;
- str[4] = str[5] * stride_offset_36.N5 ;
- str[3] = str[4] * stride_offset_36.N4 ;
- str[2] = str[3] * stride_offset_36.N3 ;
- str[1] = str[2] * 100 ;
- str[0] = str[1] * 200 ;
- str[6] = str[0] * 300 ;
-
- stride_offset_36.assign( str[0] , str[1] , str[2] , str[3] , str[4] , str[5] , str[6] , 0 , 0 );
-
- ASSERT_EQ( size_t(stride_offset_36.S[6]) , size_t(str[6]) );
- ASSERT_EQ( size_t(stride_offset_36.N2) , size_t(100) );
- ASSERT_EQ( size_t(stride_offset_36.N1) , size_t(200) );
- ASSERT_EQ( size_t(stride_offset_36.N0) , size_t(300) );
- }
-
- //------------------------------------------------------------------------
-
- {
- const int rank = 6 ;
- const int order[] = { 5 , 3 , 1 , 0 , 2 , 4 };
- const unsigned dim[] = { 2 , 3 , 5 , 7 , 11 , 13 };
- Kokkos::LayoutStride stride_6 = Kokkos::LayoutStride::order_dimensions( rank , order , dim );
- size_t n = 1 ;
- for ( int i = 0 ; i < rank ; ++i ) {
- ASSERT_EQ( size_t(dim[i]) , size_t( stride_6.dimension[i] ) );
- ASSERT_EQ( size_t(n) , size_t( stride_6.stride[ order[i] ] ) );
- n *= dim[order[i]] ;
- }
- }
-
- //------------------------------------------------------------------------
-}
-
-} /* namespace Test */
-
-#endif
-
-/*--------------------------------------------------------------------------*/
-
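The deleted TestViewImpl.hpp exercised the legacy shape/ViewOffset machinery that only existed when KOKKOS_USING_EXP_VIEW was off. One call it used, Kokkos::LayoutStride::order_dimensions, is public API rather than Impl code; a short hedged sketch of that call for building an arbitrarily strided view (the names and sizes here are illustrative, not taken from the patch):

    #include <Kokkos_Core.hpp>

    // Build a LayoutStride from an index ordering (dimension indices listed
    // fastest to slowest) plus extents, then allocate a view with that layout.
    inline void strided_view_sketch()
    {
      const int      order[3] = { 2 , 0 , 1 };
      const unsigned dim[3]   = { 10 , 20 , 30 };
      Kokkos::LayoutStride layout =
        Kokkos::LayoutStride::order_dimensions( 3 , order , dim );
      Kokkos::View< double*** , Kokkos::LayoutStride > v( "v" , layout );
      (void) v;
    }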
diff --git a/lib/kokkos/core/unit_test/TestViewMapping.hpp b/lib/kokkos/core/unit_test/TestViewMapping.hpp
index eddb81bed..8989ee74c 100644
--- a/lib/kokkos/core/unit_test/TestViewMapping.hpp
+++ b/lib/kokkos/core/unit_test/TestViewMapping.hpp
@@ -1,1307 +1,1427 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
template< class Space >
void test_view_mapping()
{
typedef typename Space::execution_space ExecSpace ;
typedef Kokkos::Experimental::Impl::ViewDimension<> dim_0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<2> dim_s2 ;
typedef Kokkos::Experimental::Impl::ViewDimension<2,3> dim_s2_s3 ;
typedef Kokkos::Experimental::Impl::ViewDimension<2,3,4> dim_s2_s3_s4 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0> dim_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,3> dim_s0_s3 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,3,4> dim_s0_s3_s4 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0> dim_s0_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,4> dim_s0_s0_s4 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0> dim_s0_s0_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0> dim_s0_s0_s0_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0> dim_s0_s0_s0_s0_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0_s0 ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0_s0_s0 ;
// Fully static dimensions should not be larger than an int
ASSERT_LE( sizeof(dim_0) , sizeof(int) );
ASSERT_LE( sizeof(dim_s2) , sizeof(int) );
ASSERT_LE( sizeof(dim_s2_s3) , sizeof(int) );
ASSERT_LE( sizeof(dim_s2_s3_s4) , sizeof(int) );
// Rank 1 is size_t
ASSERT_EQ( sizeof(dim_s0) , sizeof(size_t) );
ASSERT_EQ( sizeof(dim_s0_s3) , sizeof(size_t) );
ASSERT_EQ( sizeof(dim_s0_s3_s4) , sizeof(size_t) );
// Allow for padding
ASSERT_LE( sizeof(dim_s0_s0) , 2 * sizeof(size_t) );
ASSERT_LE( sizeof(dim_s0_s0_s4) , 2 * sizeof(size_t) );
ASSERT_LE( sizeof(dim_s0_s0_s0) , 4 * sizeof(size_t) );
ASSERT_EQ( sizeof(dim_s0_s0_s0_s0) , 4 * sizeof(unsigned) );
ASSERT_LE( sizeof(dim_s0_s0_s0_s0_s0) , 6 * sizeof(unsigned) );
ASSERT_EQ( sizeof(dim_s0_s0_s0_s0_s0_s0) , 6 * sizeof(unsigned) );
ASSERT_LE( sizeof(dim_s0_s0_s0_s0_s0_s0_s0) , 8 * sizeof(unsigned) );
ASSERT_EQ( sizeof(dim_s0_s0_s0_s0_s0_s0_s0_s0) , 8 * sizeof(unsigned) );
- ASSERT_EQ( int(dim_0::rank) , int(0) );
- ASSERT_EQ( int(dim_0::rank_dynamic) , int(0) );
-
- ASSERT_EQ( int(dim_s2::rank) , int(1) );
- ASSERT_EQ( int(dim_s2::rank_dynamic) , int(0) );
-
- ASSERT_EQ( int(dim_s2_s3::rank) , int(2) );
- ASSERT_EQ( int(dim_s2_s3::rank_dynamic) , int(0) );
-
- ASSERT_EQ( int(dim_s2_s3_s4::rank) , int(3) );
- ASSERT_EQ( int(dim_s2_s3_s4::rank_dynamic) , int(0) );
-
- ASSERT_EQ( int(dim_s0::rank) , int(1) );
- ASSERT_EQ( int(dim_s0::rank_dynamic) , int(1) );
-
- ASSERT_EQ( int(dim_s0_s3::rank) , int(2) );
- ASSERT_EQ( int(dim_s0_s3::rank_dynamic) , int(1) );
-
- ASSERT_EQ( int(dim_s0_s3_s4::rank) , int(3) );
- ASSERT_EQ( int(dim_s0_s3_s4::rank_dynamic) , int(1) );
-
- ASSERT_EQ( int(dim_s0_s0_s4::rank) , int(3) );
- ASSERT_EQ( int(dim_s0_s0_s4::rank_dynamic) , int(2) );
-
- ASSERT_EQ( int(dim_s0_s0_s0::rank) , int(3) );
- ASSERT_EQ( int(dim_s0_s0_s0::rank_dynamic) , int(3) );
-
- ASSERT_EQ( int(dim_s0_s0_s0_s0::rank) , int(4) );
- ASSERT_EQ( int(dim_s0_s0_s0_s0::rank_dynamic) , int(4) );
-
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0::rank) , int(5) );
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0::rank_dynamic) , int(5) );
-
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0_s0::rank) , int(6) );
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0_s0::rank_dynamic) , int(6) );
-
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0_s0_s0::rank) , int(7) );
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) , int(7) );
-
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank) , int(8) );
- ASSERT_EQ( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) , int(8) );
+ static_assert( int(dim_0::rank) == int(0) , "" );
+ static_assert( int(dim_0::rank_dynamic) == int(0) , "" );
+ static_assert( int(dim_0::ArgN0) == 1 , "" );
+ static_assert( int(dim_0::ArgN1) == 1 , "" );
+ static_assert( int(dim_0::ArgN2) == 1 , "" );
+
+ static_assert( int(dim_s2::rank) == int(1) , "" );
+ static_assert( int(dim_s2::rank_dynamic) == int(0) , "" );
+ static_assert( int(dim_s2::ArgN0) == 2 , "" );
+ static_assert( int(dim_s2::ArgN1) == 1 , "" );
+
+ static_assert( int(dim_s2_s3::rank) == int(2) , "" );
+ static_assert( int(dim_s2_s3::rank_dynamic) == int(0) , "" );
+ static_assert( int(dim_s2_s3::ArgN0) == 2 , "" );
+ static_assert( int(dim_s2_s3::ArgN1) == 3 , "" );
+ static_assert( int(dim_s2_s3::ArgN2) == 1 , "" );
+
+ static_assert( int(dim_s2_s3_s4::rank) == int(3) , "" );
+ static_assert( int(dim_s2_s3_s4::rank_dynamic) == int(0) , "" );
+ static_assert( int(dim_s2_s3_s4::ArgN0) == 2 , "" );
+ static_assert( int(dim_s2_s3_s4::ArgN1) == 3 , "" );
+ static_assert( int(dim_s2_s3_s4::ArgN2) == 4 , "" );
+ static_assert( int(dim_s2_s3_s4::ArgN3) == 1 , "" );
+
+ static_assert( int(dim_s0::rank) == int(1) , "" );
+ static_assert( int(dim_s0::rank_dynamic) == int(1) , "" );
+
+ static_assert( int(dim_s0_s3::rank) == int(2) , "" );
+ static_assert( int(dim_s0_s3::rank_dynamic) == int(1) , "" );
+ static_assert( int(dim_s0_s3::ArgN0) == 0 , "" );
+ static_assert( int(dim_s0_s3::ArgN1) == 3 , "" );
+
+ static_assert( int(dim_s0_s3_s4::rank) == int(3) , "" );
+ static_assert( int(dim_s0_s3_s4::rank_dynamic) == int(1) , "" );
+ static_assert( int(dim_s0_s3_s4::ArgN0) == 0 , "" );
+ static_assert( int(dim_s0_s3_s4::ArgN1) == 3 , "" );
+ static_assert( int(dim_s0_s3_s4::ArgN2) == 4 , "" );
+
+ static_assert( int(dim_s0_s0_s4::rank) == int(3) , "" );
+ static_assert( int(dim_s0_s0_s4::rank_dynamic) == int(2) , "" );
+ static_assert( int(dim_s0_s0_s4::ArgN0) == 0 , "" );
+ static_assert( int(dim_s0_s0_s4::ArgN1) == 0 , "" );
+ static_assert( int(dim_s0_s0_s4::ArgN2) == 4 , "" );
+
+ static_assert( int(dim_s0_s0_s0::rank) == int(3) , "" );
+ static_assert( int(dim_s0_s0_s0::rank_dynamic) == int(3) , "" );
+
+ static_assert( int(dim_s0_s0_s0_s0::rank) == int(4) , "" );
+ static_assert( int(dim_s0_s0_s0_s0::rank_dynamic) == int(4) , "" );
+
+ static_assert( int(dim_s0_s0_s0_s0_s0::rank) == int(5) , "" );
+ static_assert( int(dim_s0_s0_s0_s0_s0::rank_dynamic) == int(5) , "" );
+
+ static_assert( int(dim_s0_s0_s0_s0_s0_s0::rank) == int(6) , "" );
+ static_assert( int(dim_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(6) , "" );
+
+ static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0::rank) == int(7) , "" );
+ static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(7) , "" );
+
+ static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank) == int(8) , "" );
+ static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(8) , "" );
dim_s0 d1( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0 d2( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0_s0 d3( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0_s0_s0 d4( 2, 3, 4, 5, 6, 7, 8, 9 );
ASSERT_EQ( d1.N0 , 2 );
ASSERT_EQ( d2.N0 , 2 );
ASSERT_EQ( d3.N0 , 2 );
ASSERT_EQ( d4.N0 , 2 );
ASSERT_EQ( d1.N1 , 1 );
ASSERT_EQ( d2.N1 , 3 );
ASSERT_EQ( d3.N1 , 3 );
ASSERT_EQ( d4.N1 , 3 );
ASSERT_EQ( d1.N2 , 1 );
ASSERT_EQ( d2.N2 , 1 );
ASSERT_EQ( d3.N2 , 4 );
ASSERT_EQ( d4.N2 , 4 );
ASSERT_EQ( d1.N3 , 1 );
ASSERT_EQ( d2.N3 , 1 );
ASSERT_EQ( d3.N3 , 1 );
ASSERT_EQ( d4.N3 , 5 );
//----------------------------------------
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s0 , Kokkos::LayoutStride > stride_s0_s0_s0 ;
//----------------------------------------
// Static dimension
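// For the fully static 2x3x4 extent with LayoutLeft the offset map is
// stateless: the leftmost index strides fastest, so offset(i,j,k) is
// i + 2*j + 6*k, giving stride_0 = 1, stride_1 = 2, stride_2 = 6 and
// span() = 24, exactly what the assertions below check.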
{
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4 , Kokkos::LayoutLeft > left_s2_s3_s4 ;
ASSERT_EQ( sizeof(left_s2_s3_s4) , sizeof(dim_s2_s3_s4) );
left_s2_s3_s4 off3 ;
stride_s0_s0_s0 stride3( off3 );
ASSERT_EQ( off3.stride_0() , 1 );
ASSERT_EQ( off3.stride_1() , 2 );
ASSERT_EQ( off3.stride_2() , 6 );
ASSERT_EQ( off3.span() , 24 );
ASSERT_EQ( off3.stride_0() , stride3.stride_0() );
ASSERT_EQ( off3.stride_1() , stride3.stride_1() );
ASSERT_EQ( off3.stride_2() , stride3.stride_2() );
ASSERT_EQ( off3.span() , stride3.span() );
int offset = 0 ;
for ( int k = 0 ; k < 4 ; ++k ){
for ( int j = 0 ; j < 3 ; ++j ){
for ( int i = 0 ; i < 2 ; ++i , ++offset ){
ASSERT_EQ( off3(i,j,k) , offset );
ASSERT_EQ( stride3(i,j,k) , off3(i,j,k) );
}}}
}
//----------------------------------------
// Small dimension is unpadded
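// With a small leading extent (N0 = 2) the LayoutLeft mapping inserts no
// padding, so span() equals the exact element count 2*3*4 and the offsets
// enumerate 0..23 with the leftmost index varying fastest. The
// std::integral_constant argument presumably passes the scalar size
// (sizeof(int) here) that the padding heuristic consults.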
{
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
, Kokkos::LayoutLeft( 2, 3, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
ASSERT_EQ( dyn_off3.m_dim.N0 , 2 );
ASSERT_EQ( dyn_off3.m_dim.N1 , 3 );
ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
ASSERT_EQ( dyn_off3.size() , 2 * 3 * 4 );
const Kokkos::LayoutLeft layout = dyn_off3.layout();
ASSERT_EQ( layout.dimension[0] , 2 );
ASSERT_EQ( layout.dimension[1] , 3 );
ASSERT_EQ( layout.dimension[2] , 4 );
ASSERT_EQ( layout.dimension[3] , 1 );
ASSERT_EQ( layout.dimension[4] , 1 );
ASSERT_EQ( layout.dimension[5] , 1 );
ASSERT_EQ( layout.dimension[6] , 1 );
ASSERT_EQ( layout.dimension[7] , 1 );
ASSERT_EQ( stride3.m_dim.rank , 3 );
ASSERT_EQ( stride3.m_dim.N0 , 2 );
ASSERT_EQ( stride3.m_dim.N1 , 3 );
ASSERT_EQ( stride3.m_dim.N2 , 4 );
ASSERT_EQ( stride3.m_dim.N3 , 1 );
ASSERT_EQ( stride3.size() , 2 * 3 * 4 );
int offset = 0 ;
for ( int k = 0 ; k < 4 ; ++k ){
for ( int j = 0 ; j < 3 ; ++j ){
for ( int i = 0 ; i < 2 ; ++i , ++offset ){
ASSERT_EQ( offset , dyn_off3(i,j,k) );
ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
}}}
ASSERT_EQ( dyn_off3.span() , offset );
ASSERT_EQ( stride3.span() , dyn_off3.span() );
}
// Large dimension is likely padded
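// With N0 = 2000 the leading dimension may be padded for alignment, so
// consecutive logical indices need not map to consecutive offsets. The
// loop below therefore only checks that offsets strictly increase in
// layout order and that the last offset stays within span().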
{
constexpr int N0 = 2000 ;
constexpr int N1 = 300 ;
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
, Kokkos::LayoutLeft( N0, N1, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
ASSERT_EQ( dyn_off3.m_dim.N0 , N0 );
ASSERT_EQ( dyn_off3.m_dim.N1 , N1 );
ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
ASSERT_EQ( dyn_off3.size() , N0 * N1 * 4 );
ASSERT_EQ( stride3.m_dim.rank , 3 );
ASSERT_EQ( stride3.m_dim.N0 , N0 );
ASSERT_EQ( stride3.m_dim.N1 , N1 );
ASSERT_EQ( stride3.m_dim.N2 , 4 );
ASSERT_EQ( stride3.m_dim.N3 , 1 );
ASSERT_EQ( stride3.size() , N0 * N1 * 4 );
ASSERT_EQ( stride3.span() , dyn_off3.span() );
int offset = 0 ;
for ( int k = 0 ; k < 4 ; ++k ){
for ( int j = 0 ; j < N1 ; ++j ){
for ( int i = 0 ; i < N0 ; ++i ){
ASSERT_LE( offset , dyn_off3(i,j,k) );
ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
offset = dyn_off3(i,j,k) + 1 ;
}}}
ASSERT_LE( offset , dyn_off3.span() );
}
//----------------------------------------
// Static dimension
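// Same static 2x3x4 extent, now with LayoutRight: the rightmost index
// strides fastest, so stride_2 = 1, stride_1 = 4, stride_0 = 12 and
// span() = 24.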
{
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4 , Kokkos::LayoutRight > right_s2_s3_s4 ;
ASSERT_EQ( sizeof(right_s2_s3_s4) , sizeof(dim_s2_s3_s4) );
right_s2_s3_s4 off3 ;
stride_s0_s0_s0 stride3( off3 );
ASSERT_EQ( off3.stride_0() , 12 );
ASSERT_EQ( off3.stride_1() , 4 );
ASSERT_EQ( off3.stride_2() , 1 );
ASSERT_EQ( off3.dimension_0() , stride3.dimension_0() );
ASSERT_EQ( off3.dimension_1() , stride3.dimension_1() );
ASSERT_EQ( off3.dimension_2() , stride3.dimension_2() );
ASSERT_EQ( off3.stride_0() , stride3.stride_0() );
ASSERT_EQ( off3.stride_1() , stride3.stride_1() );
ASSERT_EQ( off3.stride_2() , stride3.stride_2() );
ASSERT_EQ( off3.span() , stride3.span() );
int offset = 0 ;
for ( int i = 0 ; i < 2 ; ++i ){
for ( int j = 0 ; j < 3 ; ++j ){
for ( int k = 0 ; k < 4 ; ++k , ++offset ){
ASSERT_EQ( off3(i,j,k) , offset );
ASSERT_EQ( off3(i,j,k) , stride3(i,j,k) );
}}}
ASSERT_EQ( off3.span() , offset );
}
//----------------------------------------
// Small dimension is unpadded
{
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
, Kokkos::LayoutRight( 2, 3, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
ASSERT_EQ( dyn_off3.m_dim.N0 , 2 );
ASSERT_EQ( dyn_off3.m_dim.N1 , 3 );
ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
ASSERT_EQ( dyn_off3.size() , 2 * 3 * 4 );
ASSERT_EQ( dyn_off3.dimension_0() , stride3.dimension_0() );
ASSERT_EQ( dyn_off3.dimension_1() , stride3.dimension_1() );
ASSERT_EQ( dyn_off3.dimension_2() , stride3.dimension_2() );
ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
ASSERT_EQ( dyn_off3.span() , stride3.span() );
int offset = 0 ;
for ( int i = 0 ; i < 2 ; ++i ){
for ( int j = 0 ; j < 3 ; ++j ){
for ( int k = 0 ; k < 4 ; ++k , ++offset ){
ASSERT_EQ( offset , dyn_off3(i,j,k) );
ASSERT_EQ( dyn_off3(i,j,k) , stride3(i,j,k) );
}}}
ASSERT_EQ( dyn_off3.span() , offset );
}
// Large dimension is likely padded
{
constexpr int N0 = 2000 ;
constexpr int N1 = 300 ;
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
, Kokkos::LayoutRight( N0, N1, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
ASSERT_EQ( dyn_off3.m_dim.N0 , N0 );
ASSERT_EQ( dyn_off3.m_dim.N1 , N1 );
ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
ASSERT_EQ( dyn_off3.size() , N0 * N1 * 4 );
ASSERT_EQ( dyn_off3.dimension_0() , stride3.dimension_0() );
ASSERT_EQ( dyn_off3.dimension_1() , stride3.dimension_1() );
ASSERT_EQ( dyn_off3.dimension_2() , stride3.dimension_2() );
ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
ASSERT_EQ( dyn_off3.span() , stride3.span() );
int offset = 0 ;
for ( int i = 0 ; i < N0 ; ++i ){
for ( int j = 0 ; j < N1 ; ++j ){
for ( int k = 0 ; k < 4 ; ++k ){
ASSERT_LE( offset , dyn_off3(i,j,k) );
ASSERT_EQ( dyn_off3(i,j,k) , stride3(i,j,k) );
offset = dyn_off3(i,j,k) + 1 ;
}}}
ASSERT_LE( offset , dyn_off3.span() );
}
//----------------------------------------
// Subview
{
// Mapping rank 4 to rank 3
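// SubviewExtents<4,3> records how a rank-4 domain collapses to rank 3:
// a scalar index (N0/2) pins a dimension, ALL keeps the full extent, and
// std::pair / Kokkos::pair keep half-open ranges. domain_offset() is the
// lower bound in each source dimension, range_index() maps each surviving
// subview rank back to its source dimension, and range_extent() is the
// length of each surviving range, as asserted below.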
typedef Kokkos::Experimental::Impl::SubviewExtents<4,3> SubviewExtents ;
constexpr int N0 = 1000 ;
constexpr int N1 = 2000 ;
constexpr int N2 = 3000 ;
constexpr int N3 = 4000 ;
Kokkos::Experimental::Impl::ViewDimension<N0,N1,N2,N3> dim ;
SubviewExtents tmp( dim
, N0 / 2
, Kokkos::Experimental::ALL
, std::pair<int,int>( N2 / 4 , 10 + N2 / 4 )
, Kokkos::pair<int,int>( N3 / 4 , 20 + N3 / 4 )
);
ASSERT_EQ( tmp.domain_offset(0) , N0 / 2 );
ASSERT_EQ( tmp.domain_offset(1) , 0 );
ASSERT_EQ( tmp.domain_offset(2) , N2 / 4 );
ASSERT_EQ( tmp.domain_offset(3) , N3 / 4 );
ASSERT_EQ( tmp.range_index(0) , 1 );
ASSERT_EQ( tmp.range_index(1) , 2 );
ASSERT_EQ( tmp.range_index(2) , 3 );
ASSERT_EQ( tmp.range_extent(0) , N1 );
ASSERT_EQ( tmp.range_extent(1) , 10 );
ASSERT_EQ( tmp.range_extent(2) , 20 );
}
//----------------------------------------
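// Building a LayoutStride offset from a parent LayoutLeft/LayoutRight
// offset plus SubviewExtents keeps the parent's strides while shrinking
// the extents, so the subview addresses the same memory (checked element
// by element below) and its span() can only be less than or equal to the
// parent's span().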
{
constexpr int N0 = 2000 ;
constexpr int N1 = 300 ;
constexpr int sub_N0 = 1000 ;
constexpr int sub_N1 = 200 ;
constexpr int sub_N2 = 4 ;
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
, Kokkos::LayoutLeft( N0, N1, 0, 0, 0, 0, 0, 0 ) );
Kokkos::Experimental::Impl::SubviewExtents< 3 , 3 >
sub( dyn_off3.m_dim
, Kokkos::pair<int,int>(0,sub_N0)
, Kokkos::pair<int,int>(0,sub_N1)
, Kokkos::pair<int,int>(0,sub_N2)
);
stride_s0_s0_s0 stride3( dyn_off3 , sub );
ASSERT_EQ( stride3.dimension_0() , sub_N0 );
ASSERT_EQ( stride3.dimension_1() , sub_N1 );
ASSERT_EQ( stride3.dimension_2() , sub_N2 );
ASSERT_EQ( stride3.size() , sub_N0 * sub_N1 * sub_N2 );
ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
ASSERT_GE( dyn_off3.span() , stride3.span() );
for ( int k = 0 ; k < sub_N2 ; ++k ){
for ( int j = 0 ; j < sub_N1 ; ++j ){
for ( int i = 0 ; i < sub_N0 ; ++i ){
ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
}}}
}
{
constexpr int N0 = 2000 ;
constexpr int N1 = 300 ;
constexpr int sub_N0 = 1000 ;
constexpr int sub_N1 = 200 ;
constexpr int sub_N2 = 4 ;
typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
, Kokkos::LayoutRight( N0, N1, 0, 0, 0, 0, 0, 0 ) );
Kokkos::Experimental::Impl::SubviewExtents< 3 , 3 >
sub( dyn_off3.m_dim
, Kokkos::pair<int,int>(0,sub_N0)
, Kokkos::pair<int,int>(0,sub_N1)
, Kokkos::pair<int,int>(0,sub_N2)
);
stride_s0_s0_s0 stride3( dyn_off3 , sub );
ASSERT_EQ( stride3.dimension_0() , sub_N0 );
ASSERT_EQ( stride3.dimension_1() , sub_N1 );
ASSERT_EQ( stride3.dimension_2() , sub_N2 );
ASSERT_EQ( stride3.size() , sub_N0 * sub_N1 * sub_N2 );
ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
ASSERT_GE( dyn_off3.span() , stride3.span() );
for ( int i = 0 ; i < sub_N0 ; ++i ){
for ( int j = 0 ; j < sub_N1 ; ++j ){
for ( int k = 0 ; k < sub_N2 ; ++k ){
ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
}}}
}
//----------------------------------------
// View data analysis
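// rank_dynamic<...>::value counts the leading zero entries of a
// compile-time extent list, i.e. the number of runtime ("dynamic")
// dimensions declared before the first static extent.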
{
using namespace Kokkos::Experimental::Impl ;
static_assert( rank_dynamic<>::value == 0 , "" );
static_assert( rank_dynamic<1>::value == 0 , "" );
static_assert( rank_dynamic<0>::value == 1 , "" );
static_assert( rank_dynamic<0,1>::value == 1 , "" );
static_assert( rank_dynamic<0,0,1>::value == 2 , "" );
}
{
using namespace Kokkos::Experimental::Impl ;
typedef ViewArrayAnalysis< int[] > a_int_r1 ;
typedef ViewArrayAnalysis< int**[4][5][6] > a_int_r5 ;
typedef ViewArrayAnalysis< const int[] > a_const_int_r1 ;
typedef ViewArrayAnalysis< const int**[4][5][6] > a_const_int_r5 ;
static_assert( a_int_r1::dimension::rank == 1 , "" );
static_assert( a_int_r1::dimension::rank_dynamic == 1 , "" );
+ static_assert( a_int_r5::dimension::ArgN0 == 0 , "" );
+ static_assert( a_int_r5::dimension::ArgN1 == 0 , "" );
+ static_assert( a_int_r5::dimension::ArgN2 == 4 , "" );
+ static_assert( a_int_r5::dimension::ArgN3 == 5 , "" );
+ static_assert( a_int_r5::dimension::ArgN4 == 6 , "" );
+ static_assert( a_int_r5::dimension::ArgN5 == 1 , "" );
+
static_assert( std::is_same< typename a_int_r1::dimension , ViewDimension<0> >::value , "" );
static_assert( std::is_same< typename a_int_r1::non_const_value_type , int >::value , "" );
static_assert( a_const_int_r1::dimension::rank == 1 , "" );
static_assert( a_const_int_r1::dimension::rank_dynamic == 1 , "" );
static_assert( std::is_same< typename a_const_int_r1::dimension , ViewDimension<0> >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::non_const_value_type , int >::value , "" );
static_assert( a_const_int_r5::dimension::rank == 5 , "" );
static_assert( a_const_int_r5::dimension::rank_dynamic == 2 , "" );
- static_assert( std::is_same< typename a_const_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
+ static_assert( a_const_int_r5::dimension::ArgN0 == 0 , "" );
+ static_assert( a_const_int_r5::dimension::ArgN1 == 0 , "" );
+ static_assert( a_const_int_r5::dimension::ArgN2 == 4 , "" );
+ static_assert( a_const_int_r5::dimension::ArgN3 == 5 , "" );
+ static_assert( a_const_int_r5::dimension::ArgN4 == 6 , "" );
+ static_assert( a_const_int_r5::dimension::ArgN5 == 1 , "" );
+ static_assert( std::is_same< typename a_const_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
static_assert( std::is_same< typename a_const_int_r5::non_const_value_type , int >::value , "" );
static_assert( a_int_r5::dimension::rank == 5 , "" );
static_assert( a_int_r5::dimension::rank_dynamic == 2 , "" );
static_assert( std::is_same< typename a_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
static_assert( std::is_same< typename a_int_r5::non_const_value_type , int >::value , "" );
}
{
using namespace Kokkos::Experimental::Impl ;
typedef int t_i4[4] ;
// The dimensions of t_i4 are appended to the multidimensional array type.
typedef ViewArrayAnalysis< t_i4 ***[3] > a_int_r5 ;
static_assert( a_int_r5::dimension::rank == 5 , "" );
static_assert( a_int_r5::dimension::rank_dynamic == 3 , "" );
static_assert( a_int_r5::dimension::ArgN0 == 0 , "" );
static_assert( a_int_r5::dimension::ArgN1 == 0 , "" );
static_assert( a_int_r5::dimension::ArgN2 == 0 , "" );
static_assert( a_int_r5::dimension::ArgN3 == 3 , "" );
static_assert( a_int_r5::dimension::ArgN4 == 4 , "" );
static_assert( std::is_same< typename a_int_r5::non_const_value_type , int >::value , "" );
}
{
using namespace Kokkos::Experimental::Impl ;
typedef ViewDataAnalysis< const int[] , void > a_const_int_r1 ;
static_assert( std::is_same< typename a_const_int_r1::specialize , void >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::dimension , Kokkos::Experimental::Impl::ViewDimension<0> >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::type , const int * >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::value_type , const int >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::scalar_array_type , const int * >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::const_type , const int * >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::const_value_type , const int >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::const_scalar_array_type , const int * >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::non_const_type , int * >::value , "" );
static_assert( std::is_same< typename a_const_int_r1::non_const_value_type , int >::value , "" );
typedef ViewDataAnalysis< const int**[4] , void > a_const_int_r3 ;
static_assert( std::is_same< typename a_const_int_r3::specialize , void >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::dimension , Kokkos::Experimental::Impl::ViewDimension<0,0,4> >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::type , const int**[4] >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::value_type , const int >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::scalar_array_type , const int**[4] >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::const_type , const int**[4] >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::const_value_type , const int >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::const_scalar_array_type , const int**[4] >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::non_const_type , int**[4] >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::non_const_value_type , int >::value , "" );
static_assert( std::is_same< typename a_const_int_r3::non_const_scalar_array_type , int**[4] >::value , "" );
// std::cout << "typeid(const int**[4]).name() = " << typeid(const int**[4]).name() << std::endl ;
}
//----------------------------------------
{
constexpr int N = 10 ;
- typedef Kokkos::Experimental::View<int*,Space> T ;
- typedef Kokkos::Experimental::View<const int*,Space> C ;
+ typedef Kokkos::View<int*,Space> T ;
+ typedef Kokkos::View<const int*,Space> C ;
int data[N] ;
T vr1(data,N); // view of non-const
C cr1(vr1); // view of const from view of non-const
C cr2( (const int *) data , N );
// Generate static_assert error:
// T tmp( cr1 );
ASSERT_EQ( vr1.span() , N );
ASSERT_EQ( cr1.span() , N );
ASSERT_EQ( vr1.data() , & data[0] );
ASSERT_EQ( cr1.data() , & data[0] );
ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type , int >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::memory_space , typename Space::memory_space >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::reference_type , int & >::value ) );
ASSERT_EQ( T::Rank , 1 );
ASSERT_TRUE( ( std::is_same< typename C::data_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::const_data_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::non_const_data_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::scalar_array_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::const_scalar_array_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::non_const_scalar_array_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::value_type , const int >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::const_value_type , const int >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::non_const_value_type , int >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::memory_space , typename Space::memory_space >::value ) );
ASSERT_TRUE( ( std::is_same< typename C::reference_type , const int & >::value ) );
ASSERT_EQ( C::Rank , 1 );
ASSERT_EQ( vr1.dimension_0() , N );
- if ( Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename Space::memory_space , Kokkos::HostSpace >::value ) {
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , typename Space::memory_space >::accessible ) {
for ( int i = 0 ; i < N ; ++i ) data[i] = i + 1 ;
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 1 );
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( cr1[i] , i + 1 );
{
T tmp( vr1 );
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 1 );
for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 2 ;
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 2 );
}
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 2 );
}
}
{
constexpr int N = 10 ;
- typedef Kokkos::Experimental::View<int*,Space> T ;
- typedef Kokkos::Experimental::View<const int*,Space> C ;
+ typedef Kokkos::View<int*,Space> T ;
+ typedef Kokkos::View<const int*,Space> C ;
T vr1("vr1",N);
C cr1(vr1);
ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type , int* >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type , int >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::memory_space , typename Space::memory_space >::value ) );
ASSERT_TRUE( ( std::is_same< typename T::reference_type , int & >::value ) );
ASSERT_EQ( T::Rank , 1 );
ASSERT_EQ( vr1.dimension_0() , N );
- if ( Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename Space::memory_space , Kokkos::HostSpace >::value ) {
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , typename Space::memory_space >::accessible ) {
for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 1 ;
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 1 );
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( cr1[i] , i + 1 );
{
T tmp( vr1 );
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 1 );
for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 2 ;
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 2 );
}
for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 2 );
}
}
// Testing proper handling of zero-length allocations
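// A zero-length allocation must still produce a valid, empty view whose
// extent reports 0; both the managed view and a const copy of it are
// checked.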
{
constexpr int N = 0 ;
- typedef Kokkos::Experimental::View<int*,Space> T ;
- typedef Kokkos::Experimental::View<const int*,Space> C ;
+ typedef Kokkos::View<int*,Space> T ;
+ typedef Kokkos::View<const int*,Space> C ;
T vr1("vr1",N);
C cr1(vr1);
ASSERT_EQ( vr1.dimension_0() , 0 );
ASSERT_EQ( cr1.dimension_0() , 0 );
}
// Testing allocation using a memory-space instance.
// The execution space of the memory space must be available for view data initialization.
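// view_alloc() bundles allocation properties; the label, a memory-space
// instance, WithoutInitializing and AllowPadding are evidently accepted in
// varying order, as the va..vk constructions below exercise.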
if ( std::is_same< ExecSpace , typename ExecSpace::memory_space::execution_space >::value ) {
using namespace Kokkos::Experimental ;
typedef typename ExecSpace::memory_space memory_space ;
typedef View<int*,memory_space> V ;
constexpr int N = 10 ;
memory_space mem_space ;
V v( "v" , N );
V va( view_alloc() , N );
V vb( view_alloc( "vb" ) , N );
V vc( view_alloc( "vc" , AllowPadding ) , N );
V vd( view_alloc( "vd" , WithoutInitializing ) , N );
V ve( view_alloc( "ve" , WithoutInitializing , AllowPadding ) , N );
V vf( view_alloc( "vf" , mem_space , WithoutInitializing , AllowPadding ) , N );
V vg( view_alloc( mem_space , "vg" , WithoutInitializing , AllowPadding ) , N );
V vh( view_alloc( WithoutInitializing , AllowPadding ) , N );
V vi( view_alloc( WithoutInitializing ) , N );
V vj( view_alloc( std::string("vj") , AllowPadding ) , N );
V vk( view_alloc( mem_space , std::string("vk") , AllowPadding ) , N );
}
{
- typedef Kokkos::Experimental::ViewTraits<int***,Kokkos::LayoutStride,ExecSpace> traits_t ;
+ typedef Kokkos::ViewTraits<int***,Kokkos::LayoutStride,ExecSpace> traits_t ;
typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0> dims_t ;
typedef Kokkos::Experimental::Impl::ViewOffset< dims_t , Kokkos::LayoutStride > offset_t ;
Kokkos::LayoutStride stride ;
stride.dimension[0] = 3 ;
stride.dimension[1] = 4 ;
stride.dimension[2] = 5 ;
stride.stride[0] = 4 ;
stride.stride[1] = 1 ;
stride.stride[2] = 12 ;
const offset_t offset( std::integral_constant<unsigned,0>() , stride );
ASSERT_EQ( offset.dimension_0() , 3 );
ASSERT_EQ( offset.dimension_1() , 4 );
ASSERT_EQ( offset.dimension_2() , 5 );
ASSERT_EQ( offset.stride_0() , 4 );
ASSERT_EQ( offset.stride_1() , 1 );
ASSERT_EQ( offset.stride_2() , 12 );
ASSERT_EQ( offset.span() , 60 );
ASSERT_TRUE( offset.span_is_contiguous() );
Kokkos::Experimental::Impl::ViewMapping< traits_t , void >
v( Kokkos::Experimental::Impl::ViewCtorProp<int*>((int*)0), stride );
}
{
- typedef Kokkos::Experimental::View<int**,Space> V ;
+ typedef Kokkos::View<int**,Space> V ;
typedef typename V::HostMirror M ;
+ typedef typename Kokkos::View<int**,Space>::array_layout layout_type;
constexpr int N0 = 10 ;
constexpr int N1 = 11 ;
V a("a",N0,N1);
M b = Kokkos::Experimental::create_mirror(a);
M c = Kokkos::Experimental::create_mirror_view(a);
M d ;
for ( int i0 = 0 ; i0 < N0 ; ++i0 )
for ( int i1 = 0 ; i1 < N1 ; ++i1 )
b(i0,i1) = 1 + i0 + i1 * N0 ;
Kokkos::Experimental::deep_copy( a , b );
Kokkos::Experimental::deep_copy( c , a );
for ( int i0 = 0 ; i0 < N0 ; ++i0 )
for ( int i1 = 0 ; i1 < N1 ; ++i1 )
ASSERT_EQ( b(i0,i1) , c(i0,i1) );
Kokkos::Experimental::resize( b , 5 , 6 );
+
+ for ( int i0 = 0 ; i0 < 5 ; ++i0 )
+ for ( int i1 = 0 ; i1 < 6 ; ++i1 ) {
+ int val = 1 + i0 + i1 * N0;
+ ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+ ASSERT_EQ( b(i0,i1) , val );
+ }
+
Kokkos::Experimental::realloc( c , 5 , 6 );
Kokkos::Experimental::realloc( d , 5 , 6 );
ASSERT_EQ( b.dimension_0() , 5 );
ASSERT_EQ( b.dimension_1() , 6 );
ASSERT_EQ( c.dimension_0() , 5 );
ASSERT_EQ( c.dimension_1() , 6 );
ASSERT_EQ( d.dimension_0() , 5 );
ASSERT_EQ( d.dimension_1() , 6 );
+
+ layout_type layout(7,8);
+ Kokkos::Experimental::resize( b , layout );
+ for ( int i0 = 0 ; i0 < 7 ; ++i0 )
+ for ( int i1 = 6 ; i1 < 8 ; ++i1 )
+ b(i0,i1) = 1 + i0 + i1 * N0 ;
+
+ for ( int i0 = 5 ; i0 < 7 ; ++i0 )
+ for ( int i1 = 0 ; i1 < 8 ; ++i1 )
+ b(i0,i1) = 1 + i0 + i1 * N0 ;
+
+ for ( int i0 = 0 ; i0 < 7 ; ++i0 )
+ for ( int i1 = 0 ; i1 < 8 ; ++i1 ) {
+ int val = 1 + i0 + i1 * N0;
+ ASSERT_EQ( b(i0,i1) , val );
+ }
+
+ Kokkos::Experimental::realloc( c , layout );
+ Kokkos::Experimental::realloc( d , layout );
+
+ ASSERT_EQ( b.dimension_0() , 7 );
+ ASSERT_EQ( b.dimension_1() , 8 );
+ ASSERT_EQ( c.dimension_0() , 7 );
+ ASSERT_EQ( c.dimension_1() , 8 );
+ ASSERT_EQ( d.dimension_0() , 7 );
+ ASSERT_EQ( d.dimension_1() , 8 );
+
+ }
+
+ {
+ typedef Kokkos::View<int**,Kokkos::LayoutStride,Space> V ;
+ typedef typename V::HostMirror M ;
+ typedef typename Kokkos::View<int**,Kokkos::LayoutStride,Space>::array_layout layout_type;
+
+ constexpr int N0 = 10 ;
+ constexpr int N1 = 11 ;
+
+ const int dimensions[] = {N0,N1};
+ const int order[] = {1,0};
+
+ V a("a",Kokkos::LayoutStride::order_dimensions(2,order,dimensions));
+ M b = Kokkos::Experimental::create_mirror(a);
+ M c = Kokkos::Experimental::create_mirror_view(a);
+ M d ;
+
+ for ( int i0 = 0 ; i0 < N0 ; ++i0 )
+ for ( int i1 = 0 ; i1 < N1 ; ++i1 )
+ b(i0,i1) = 1 + i0 + i1 * N0 ;
+
+ Kokkos::Experimental::deep_copy( a , b );
+ Kokkos::Experimental::deep_copy( c , a );
+
+ for ( int i0 = 0 ; i0 < N0 ; ++i0 )
+ for ( int i1 = 0 ; i1 < N1 ; ++i1 )
+ ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+
+ const int dimensions2[] = {7,8};
+ const int order2[] = {1,0};
+ layout_type layout = layout_type::order_dimensions(2,order2,dimensions2);
+ Kokkos::Experimental::resize( b , layout );
+
+ for ( int i0 = 0 ; i0 < 7 ; ++i0 )
+ for ( int i1 = 0 ; i1 < 8 ; ++i1 ) {
+ int val = 1 + i0 + i1 * N0;
+ ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+ ASSERT_EQ( b(i0,i1) , val );
+ }
+
+ Kokkos::Experimental::realloc( c , layout );
+ Kokkos::Experimental::realloc( d , layout );
+
+ ASSERT_EQ( b.dimension_0() , 7 );
+ ASSERT_EQ( b.dimension_1() , 8 );
+ ASSERT_EQ( c.dimension_0() , 7 );
+ ASSERT_EQ( c.dimension_1() , 8 );
+ ASSERT_EQ( d.dimension_0() , 7 );
+ ASSERT_EQ( d.dimension_1() , 8 );
+
}
{
- typedef Kokkos::Experimental::View<int*,Space> V ;
- typedef Kokkos::Experimental::View<int*,Space,Kokkos::MemoryUnmanaged> U ;
+ typedef Kokkos::View<int*,Space> V ;
+ typedef Kokkos::View<int*,Space,Kokkos::MemoryUnmanaged> U ;
V a("a",10);
ASSERT_EQ( a.use_count() , 1 );
V b = a ;
ASSERT_EQ( a.use_count() , 2 );
ASSERT_EQ( b.use_count() , 2 );
{
U c = b ; // 'c' is compile-time unmanaged
ASSERT_EQ( a.use_count() , 2 );
ASSERT_EQ( b.use_count() , 2 );
ASSERT_EQ( c.use_count() , 2 );
V d = c ; // 'd' is run-time unmanaged
ASSERT_EQ( a.use_count() , 2 );
ASSERT_EQ( b.use_count() , 2 );
ASSERT_EQ( c.use_count() , 2 );
ASSERT_EQ( d.use_count() , 2 );
}
ASSERT_EQ( a.use_count() , 2 );
ASSERT_EQ( b.use_count() , 2 );
b = V();
ASSERT_EQ( a.use_count() , 1 );
ASSERT_EQ( b.use_count() , 0 );
-#if KOKKOS_USING_EXP_VIEW && ! defined ( KOKKOS_CUDA_USE_LAMBDA )
+#if ! defined ( KOKKOS_CUDA_USE_LAMBDA )
/* Cannot launch host lambda when CUDA lambda is enabled */
- typedef typename Kokkos::Impl::is_space< Space >::host_execution_space
+ typedef typename Kokkos::Impl::HostMirror< Space >::Space::execution_space
host_exec_space ;
Kokkos::parallel_for(
Kokkos::RangePolicy< host_exec_space >(0,10) ,
KOKKOS_LAMBDA( int i ){
// 'a' is captured by copy, and the capture mechanism
// converts 'a' to an unmanaged copy.
// When the parallel dispatch accepts a move for the lambda,
// this count should become 1.
ASSERT_EQ( a.use_count() , 2 );
V x = a ;
ASSERT_EQ( a.use_count() , 2 );
ASSERT_EQ( x.use_count() , 2 );
});
#endif /* #if ! defined ( KOKKOS_CUDA_USE_LAMBDA ) */
}
}
template< class Space >
struct TestViewMappingSubview
{
typedef typename Space::execution_space ExecSpace ;
typedef typename Space::memory_space MemSpace ;
typedef Kokkos::pair<int,int> range ;
enum { AN = 10 };
- typedef Kokkos::Experimental::View<int*,ExecSpace> AT ;
- typedef Kokkos::Experimental::View<const int*,ExecSpace> ACT ;
- typedef Kokkos::Experimental::Subview< AT , range > AS ;
+ typedef Kokkos::View<int*,ExecSpace> AT ;
+ typedef Kokkos::View<const int*,ExecSpace> ACT ;
+ typedef Kokkos::Subview< AT , range > AS ;
enum { BN0 = 10 , BN1 = 11 , BN2 = 12 };
- typedef Kokkos::Experimental::View<int***,ExecSpace> BT ;
- typedef Kokkos::Experimental::Subview< BT , range , range , range > BS ;
+ typedef Kokkos::View<int***,ExecSpace> BT ;
+ typedef Kokkos::Subview< BT , range , range , range > BS ;
enum { CN0 = 10 , CN1 = 11 , CN2 = 12 };
- typedef Kokkos::Experimental::View<int***[13][14],ExecSpace> CT ;
- typedef Kokkos::Experimental::Subview< CT , range , range , range , int , int > CS ;
+ typedef Kokkos::View<int***[13][14],ExecSpace> CT ;
+ typedef Kokkos::Subview< CT , range , range , range , int , int > CS ;
enum { DN0 = 10 , DN1 = 11 , DN2 = 12 , DN3 = 13 , DN4 = 14 };
- typedef Kokkos::Experimental::View<int***[DN3][DN4],ExecSpace> DT ;
- typedef Kokkos::Experimental::Subview< DT , int , range , range , range , int > DS ;
+ typedef Kokkos::View<int***[DN3][DN4],ExecSpace> DT ;
+ typedef Kokkos::Subview< DT , int , range , range , range , int > DS ;
- typedef Kokkos::Experimental::View<int***[13][14],Kokkos::LayoutLeft,ExecSpace> DLT ;
- typedef Kokkos::Experimental::Subview< DLT , range , int , int , int , int > DLS1 ;
+ typedef Kokkos::View<int***[13][14],Kokkos::LayoutLeft,ExecSpace> DLT ;
+ typedef Kokkos::Subview< DLT , range , int , int , int , int > DLS1 ;
static_assert( DLS1::rank == 1 && std::is_same< typename DLS1::array_layout , Kokkos::LayoutLeft >::value
, "Subview layout error for rank 1 subview of left-most range of LayoutLeft" );
- typedef Kokkos::Experimental::View<int***[13][14],Kokkos::LayoutRight,ExecSpace> DRT ;
- typedef Kokkos::Experimental::Subview< DRT , int , int , int , int , range > DRS1 ;
+ typedef Kokkos::View<int***[13][14],Kokkos::LayoutRight,ExecSpace> DRT ;
+ typedef Kokkos::Subview< DRT , int , int , int , int , range > DRS1 ;
static_assert( DRS1::rank == 1 && std::is_same< typename DRS1::array_layout , Kokkos::LayoutRight >::value
, "Subview layout error for rank 1 subview of right-most range of LayoutRight" );
AT Aa ;
AS Ab ;
ACT Ac ;
BT Ba ;
BS Bb ;
CT Ca ;
CS Cb ;
DT Da ;
DS Db ;
TestViewMappingSubview()
: Aa("Aa",AN)
, Ab( Kokkos::Experimental::subview( Aa , std::pair<int,int>(1,AN-1) ) )
, Ac( Aa , std::pair<int,int>(1,AN-1) )
, Ba("Ba",BN0,BN1,BN2)
, Bb( Kokkos::Experimental::subview( Ba
, std::pair<int,int>(1,BN0-1)
, std::pair<int,int>(1,BN1-1)
, std::pair<int,int>(1,BN2-1)
) )
, Ca("Ca",CN0,CN1,CN2)
, Cb( Kokkos::Experimental::subview( Ca
, std::pair<int,int>(1,CN0-1)
, std::pair<int,int>(1,CN1-1)
, std::pair<int,int>(1,CN2-1)
, 1
, 2
) )
, Da("Da",DN0,DN1,DN2)
, Db( Kokkos::Experimental::subview( Da
, 1
, std::pair<int,int>(1,DN1-1)
, std::pair<int,int>(1,DN2-1)
, std::pair<int,int>(1,DN3-1)
, 2
) )
{
}
KOKKOS_INLINE_FUNCTION
void operator()( const int , long & error_count ) const
{
auto Ad = Kokkos::Experimental::subview< Kokkos::MemoryUnmanaged >( Aa , Kokkos::pair<int,int>(1,AN-1) );
for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ab[i-1] ) ++error_count ;
for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ac[i-1] ) ++error_count ;
for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ad[i-1] ) ++error_count ;
for ( int i2 = 1 ; i2 < BN2-1 ; ++i2 ) {
for ( int i1 = 1 ; i1 < BN1-1 ; ++i1 ) {
for ( int i0 = 1 ; i0 < BN0-1 ; ++i0 ) {
if ( & Ba(i0,i1,i2) != & Bb(i0-1,i1-1,i2-1) ) ++error_count ;
}}}
for ( int i2 = 1 ; i2 < CN2-1 ; ++i2 ) {
for ( int i1 = 1 ; i1 < CN1-1 ; ++i1 ) {
for ( int i0 = 1 ; i0 < CN0-1 ; ++i0 ) {
if ( & Ca(i0,i1,i2,1,2) != & Cb(i0-1,i1-1,i2-1) ) ++error_count ;
}}}
for ( int i2 = 1 ; i2 < DN3-1 ; ++i2 ) {
for ( int i1 = 1 ; i1 < DN2-1 ; ++i1 ) {
for ( int i0 = 1 ; i0 < DN1-1 ; ++i0 ) {
if ( & Da(1,i0,i1,i2,2) != & Db(i0-1,i1-1,i2-1) ) ++error_count ;
}}}
}
static void run()
{
TestViewMappingSubview self ;
ASSERT_EQ( self.Aa.dimension_0() , AN );
ASSERT_EQ( self.Ab.dimension_0() , AN - 2 );
ASSERT_EQ( self.Ac.dimension_0() , AN - 2 );
ASSERT_EQ( self.Ba.dimension_0() , BN0 );
ASSERT_EQ( self.Ba.dimension_1() , BN1 );
ASSERT_EQ( self.Ba.dimension_2() , BN2 );
ASSERT_EQ( self.Bb.dimension_0() , BN0 - 2 );
ASSERT_EQ( self.Bb.dimension_1() , BN1 - 2 );
ASSERT_EQ( self.Bb.dimension_2() , BN2 - 2 );
ASSERT_EQ( self.Ca.dimension_0() , CN0 );
ASSERT_EQ( self.Ca.dimension_1() , CN1 );
ASSERT_EQ( self.Ca.dimension_2() , CN2 );
ASSERT_EQ( self.Ca.dimension_3() , 13 );
ASSERT_EQ( self.Ca.dimension_4() , 14 );
ASSERT_EQ( self.Cb.dimension_0() , CN0 - 2 );
ASSERT_EQ( self.Cb.dimension_1() , CN1 - 2 );
ASSERT_EQ( self.Cb.dimension_2() , CN2 - 2 );
ASSERT_EQ( self.Da.dimension_0() , DN0 );
ASSERT_EQ( self.Da.dimension_1() , DN1 );
ASSERT_EQ( self.Da.dimension_2() , DN2 );
ASSERT_EQ( self.Da.dimension_3() , DN3 );
ASSERT_EQ( self.Da.dimension_4() , DN4 );
ASSERT_EQ( self.Db.dimension_0() , DN1 - 2 );
ASSERT_EQ( self.Db.dimension_1() , DN2 - 2 );
ASSERT_EQ( self.Db.dimension_2() , DN3 - 2 );
ASSERT_EQ( self.Da.stride_1() , self.Db.stride_0() );
ASSERT_EQ( self.Da.stride_2() , self.Db.stride_1() );
ASSERT_EQ( self.Da.stride_3() , self.Db.stride_2() );
long error_count = -1 ;
Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >(0,1) , self , error_count );
ASSERT_EQ( error_count , 0 );
}
};
template< class Space >
void test_view_mapping_subview()
{
typedef typename Space::execution_space ExecSpace ;
TestViewMappingSubview< ExecSpace >::run();
}
/*--------------------------------------------------------------------------*/
template< class ViewType >
struct TestViewMapOperator {
static_assert( ViewType::reference_type_is_lvalue_reference
, "Test only valid for lvalue reference type" );
const ViewType v ;
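// test_left / test_right walk the view in its layout's fastest-varying
// index order and verify that the pointer offsets from &v(0,...,0) are
// non-decreasing and remain strictly inside span(); any violation is
// accumulated into error_count by the parallel_reduce in run().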
KOKKOS_INLINE_FUNCTION
void test_left( size_t i0 , long & error_count ) const
{
typename ViewType::value_type * const base_ptr = & v(0,0,0,0,0,0,0,0);
const size_t n1 = v.dimension_1();
const size_t n2 = v.dimension_2();
const size_t n3 = v.dimension_3();
const size_t n4 = v.dimension_4();
const size_t n5 = v.dimension_5();
const size_t n6 = v.dimension_6();
const size_t n7 = v.dimension_7();
long offset = 0 ;
for ( size_t i7 = 0 ; i7 < n7 ; ++i7 )
for ( size_t i6 = 0 ; i6 < n6 ; ++i6 )
for ( size_t i5 = 0 ; i5 < n5 ; ++i5 )
for ( size_t i4 = 0 ; i4 < n4 ; ++i4 )
for ( size_t i3 = 0 ; i3 < n3 ; ++i3 )
for ( size_t i2 = 0 ; i2 < n2 ; ++i2 )
for ( size_t i1 = 0 ; i1 < n1 ; ++i1 )
{
const long d = & v(i0,i1,i2,i3,i4,i5,i6,i7) - base_ptr ;
if ( d < offset ) ++error_count ;
offset = d ;
}
if ( v.span() <= size_t(offset) ) ++error_count ;
}
KOKKOS_INLINE_FUNCTION
void test_right( size_t i0 , long & error_count ) const
{
typename ViewType::value_type * const base_ptr = & v(0,0,0,0,0,0,0,0);
const size_t n1 = v.dimension_1();
const size_t n2 = v.dimension_2();
const size_t n3 = v.dimension_3();
const size_t n4 = v.dimension_4();
const size_t n5 = v.dimension_5();
const size_t n6 = v.dimension_6();
const size_t n7 = v.dimension_7();
long offset = 0 ;
for ( size_t i1 = 0 ; i1 < n1 ; ++i1 )
for ( size_t i2 = 0 ; i2 < n2 ; ++i2 )
for ( size_t i3 = 0 ; i3 < n3 ; ++i3 )
for ( size_t i4 = 0 ; i4 < n4 ; ++i4 )
for ( size_t i5 = 0 ; i5 < n5 ; ++i5 )
for ( size_t i6 = 0 ; i6 < n6 ; ++i6 )
for ( size_t i7 = 0 ; i7 < n7 ; ++i7 )
{
const long d = & v(i0,i1,i2,i3,i4,i5,i6,i7) - base_ptr ;
if ( d < offset ) ++error_count ;
offset = d ;
}
if ( v.span() <= size_t(offset) ) ++error_count ;
}
KOKKOS_INLINE_FUNCTION
void operator()( size_t i , long & error_count ) const
{
if ( std::is_same< typename ViewType::array_layout , Kokkos::LayoutLeft >::value )
test_left(i,error_count);
else if ( std::is_same< typename ViewType::array_layout , Kokkos::LayoutRight >::value )
test_right(i,error_count);
}
constexpr static size_t N0 = 10 ;
constexpr static size_t N1 = 9 ;
constexpr static size_t N2 = 8 ;
constexpr static size_t N3 = 7 ;
constexpr static size_t N4 = 6 ;
constexpr static size_t N5 = 5 ;
constexpr static size_t N6 = 4 ;
constexpr static size_t N7 = 3 ;
TestViewMapOperator() : v( "Test" , N0, N1, N2, N3, N4, N5, N6, N7 ) {}
static void run()
{
TestViewMapOperator self ;
ASSERT_EQ( self.v.dimension_0() , ( 0 < ViewType::rank ? N0 : 1 ) );
ASSERT_EQ( self.v.dimension_1() , ( 1 < ViewType::rank ? N1 : 1 ) );
ASSERT_EQ( self.v.dimension_2() , ( 2 < ViewType::rank ? N2 : 1 ) );
ASSERT_EQ( self.v.dimension_3() , ( 3 < ViewType::rank ? N3 : 1 ) );
ASSERT_EQ( self.v.dimension_4() , ( 4 < ViewType::rank ? N4 : 1 ) );
ASSERT_EQ( self.v.dimension_5() , ( 5 < ViewType::rank ? N5 : 1 ) );
ASSERT_EQ( self.v.dimension_6() , ( 6 < ViewType::rank ? N6 : 1 ) );
ASSERT_EQ( self.v.dimension_7() , ( 7 < ViewType::rank ? N7 : 1 ) );
ASSERT_LE( self.v.dimension_0()*
self.v.dimension_1()*
self.v.dimension_2()*
self.v.dimension_3()*
self.v.dimension_4()*
self.v.dimension_5()*
self.v.dimension_6()*
self.v.dimension_7()
, self.v.span() );
long error_count ;
Kokkos::RangePolicy< typename ViewType::execution_space > range(0,self.v.dimension_0());
Kokkos::parallel_reduce( range , self , error_count );
ASSERT_EQ( 0 , error_count );
}
};
template< class Space >
void test_view_mapping_operator()
{
typedef typename Space::execution_space ExecSpace ;
- TestViewMapOperator< Kokkos::Experimental::View<int,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int*,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int**,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int***,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int****,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int*****,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int******,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int*******,Kokkos::LayoutLeft,ExecSpace> >::run();
-
- TestViewMapOperator< Kokkos::Experimental::View<int,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int*,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int**,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int***,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int****,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int*****,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int******,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::Experimental::View<int*******,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int**,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int***,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int****,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*****,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int******,Kokkos::LayoutLeft,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*******,Kokkos::LayoutLeft,ExecSpace> >::run();
+
+ TestViewMapOperator< Kokkos::View<int,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int**,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int***,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int****,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*****,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int******,Kokkos::LayoutRight,ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*******,Kokkos::LayoutRight,ExecSpace> >::run();
}
/*--------------------------------------------------------------------------*/
template< class Space >
struct TestViewMappingAtomic {
typedef typename Space::execution_space ExecSpace ;
typedef typename Space::memory_space MemSpace ;
typedef Kokkos::MemoryTraits< Kokkos::Atomic > mem_trait ;
- typedef Kokkos::Experimental::View< int * , ExecSpace > T ;
- typedef Kokkos::Experimental::View< int * , ExecSpace , mem_trait > T_atom ;
+ typedef Kokkos::View< int * , ExecSpace > T ;
+ typedef Kokkos::View< int * , ExecSpace , mem_trait > T_atom ;
T x ;
T_atom x_atom ;
constexpr static size_t N = 100000 ;
struct TagInit {};
struct TagUpdate {};
struct TagVerify {};
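// Three-phase test: TagInit fills x(i) = i, TagUpdate lets all N threads
// atomically increment x_atom(i%2) through the Atomic memory trait
// (x_atom aliases the same allocation as x), and TagVerify checks that
// each of the two contended entries received exactly N/2 increments while
// every other entry is untouched.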
KOKKOS_INLINE_FUNCTION
void operator()( const TagInit & , const int i ) const
{ x(i) = i ; }
KOKKOS_INLINE_FUNCTION
void operator()( const TagUpdate & , const int i ) const
{ x_atom(i%2) += 1 ; }
KOKKOS_INLINE_FUNCTION
void operator()( const TagVerify & , const int i , long & error_count ) const
{
if ( i < 2 ) { if ( x(i) != int(i + N / 2) ) ++error_count ; }
else { if ( x(i) != int(i) ) ++error_count ; }
}
TestViewMappingAtomic()
: x("x",N)
, x_atom( x )
{}
static void run()
{
ASSERT_TRUE( T::reference_type_is_lvalue_reference );
ASSERT_FALSE( T_atom::reference_type_is_lvalue_reference );
TestViewMappingAtomic self ;
Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace , TagInit >(0,N) , self );
Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace , TagUpdate >(0,N) , self );
long error_count = -1 ;
Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagVerify >(0,N) , self , error_count );
ASSERT_EQ( 0 , error_count );
}
};
/*--------------------------------------------------------------------------*/
template< class Space >
struct TestViewMappingClassValue {
typedef typename Space::execution_space ExecSpace ;
typedef typename Space::memory_space MemSpace ;
struct ValueType {
KOKKOS_INLINE_FUNCTION
ValueType()
{
#if 0
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
printf("TestViewMappingClassValue construct on Cuda\n");
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
printf("TestViewMappingClassValue construct on Host\n");
#else
printf("TestViewMappingClassValue construct unknown\n");
#endif
#endif
}
KOKKOS_INLINE_FUNCTION
~ValueType()
{
#if 0
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
printf("TestViewMappingClassValue destruct on Cuda\n");
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
printf("TestViewMappingClassValue destruct on Host\n");
#else
printf("TestViewMappingClassValue destruct unknown\n");
#endif
#endif
}
};
static void run()
{
using namespace Kokkos::Experimental ;
ExecSpace::fence();
{
View< ValueType , ExecSpace > a("a");
ExecSpace::fence();
}
ExecSpace::fence();
}
};
} /* namespace Test */
/*--------------------------------------------------------------------------*/
diff --git a/lib/kokkos/core/unit_test/TestViewOfClass.hpp b/lib/kokkos/core/unit_test/TestViewOfClass.hpp
index 9b23a5d55..381b8786b 100644
--- a/lib/kokkos/core/unit_test/TestViewOfClass.hpp
+++ b/lib/kokkos/core/unit_test/TestViewOfClass.hpp
@@ -1,163 +1,131 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
/*--------------------------------------------------------------------------*/
namespace Test {
template< class Space >
struct NestedView {
Kokkos::View<int*,Space> member ;
public:
KOKKOS_INLINE_FUNCTION
NestedView() : member()
{}
KOKKOS_INLINE_FUNCTION
NestedView & operator = ( const Kokkos::View<int*,Space> & lhs )
{
member = lhs ;
if ( member.dimension_0() ) Kokkos::atomic_add( & member(0) , 1 );
return *this ;
}
KOKKOS_INLINE_FUNCTION
~NestedView()
{
if ( member.dimension_0() ) {
Kokkos::atomic_add( & member(0) , -1 );
}
}
};
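// NestedView increments tracking(0) whenever a non-empty view is assigned
// into a member and decrements it in the destructor, so the host copy of
// the counter must return to 0 once every View<NestedView*> goes out of
// scope; that final check is what verifies element destructors really run.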
template< class Space >
struct NestedViewFunctor {
Kokkos::View< NestedView<Space> * , Space > nested ;
Kokkos::View<int*,Space> array ;
NestedViewFunctor(
const Kokkos::View< NestedView<Space> * , Space > & arg_nested ,
const Kokkos::View<int*,Space> & arg_array )
: nested( arg_nested )
, array( arg_array )
{}
KOKKOS_INLINE_FUNCTION
void operator()( int i ) const
{ nested[i] = array ; }
};
template< class Space >
void view_nested_view()
{
Kokkos::View<int*,Space> tracking("tracking",1);
typename Kokkos::View<int*,Space>::HostMirror
host_tracking = Kokkos::create_mirror( tracking );
{
Kokkos::View< NestedView<Space> * , Space > a("a_nested_view",2);
Kokkos::parallel_for( Kokkos::RangePolicy<Space>(0,2) , NestedViewFunctor<Space>( a , tracking ) );
Kokkos::deep_copy( host_tracking , tracking );
ASSERT_EQ( 2 , host_tracking(0) );
Kokkos::View< NestedView<Space> * , Space > b("b_nested_view",2);
Kokkos::parallel_for( Kokkos::RangePolicy<Space>(0,2) , NestedViewFunctor<Space>( b , tracking ) );
Kokkos::deep_copy( host_tracking , tracking );
ASSERT_EQ( 4 , host_tracking(0) );
}
Kokkos::deep_copy( host_tracking , tracking );
-#if KOKKOS_USING_EXP_VIEW
ASSERT_EQ( 0 , host_tracking(0) );
-#endif
-
}
}
-#if ! KOKKOS_USING_EXP_VIEW
-
-namespace Kokkos {
-namespace Impl {
-
-template< class ExecSpace , class S >
-struct ViewDefaultConstruct< ExecSpace , Test::NestedView<S> , true >
-{
- typedef Test::NestedView<S> type ;
- type * const m_ptr ;
-
- KOKKOS_FORCEINLINE_FUNCTION
- void operator()( const typename ExecSpace::size_type& i ) const
- { new(m_ptr+i) type(); }
-
- ViewDefaultConstruct( type * pointer , size_t capacity )
- : m_ptr( pointer )
- {
- Kokkos::RangePolicy< ExecSpace > range( 0 , capacity );
- parallel_for( range , *this );
- ExecSpace::fence();
- }
-};
-
-} // namespace Impl
-} // namespace Kokkos
-
-#endif
-
/*--------------------------------------------------------------------------*/
diff --git a/lib/kokkos/core/unit_test/TestViewSubview.hpp b/lib/kokkos/core/unit_test/TestViewSubview.hpp
index 3846354b8..1c2575b6f 100644
--- a/lib/kokkos/core/unit_test/TestViewSubview.hpp
+++ b/lib/kokkos/core/unit_test/TestViewSubview.hpp
@@ -1,874 +1,1239 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
/*--------------------------------------------------------------------------*/
namespace TestViewSubview {
template<class Layout, class Space>
struct getView {
static
Kokkos::View<double**,Layout,Space> get(int n, int m) {
return Kokkos::View<double**,Layout,Space>("G",n,m);
}
};
template<class Space>
struct getView<Kokkos::LayoutStride,Space> {
static
Kokkos::View<double**,Kokkos::LayoutStride,Space> get(int n, int m) {
const int rank = 2 ;
const int order[] = { 0, 1 };
const unsigned dim[] = { unsigned(n), unsigned(m) };
Kokkos::LayoutStride stride = Kokkos::LayoutStride::order_dimensions( rank , order , dim );
return Kokkos::View<double**,Kokkos::LayoutStride,Space>("G",stride);
}
};
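// getView abstracts over the layout so the same subview tests run for
// LayoutLeft, LayoutRight and an explicitly strided layout; the
// LayoutStride specialization builds its layout via order_dimensions with
// order {0,1}, i.e. dimension 0 varies fastest, matching a left-ordered
// 2-D view of the same shape.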
template<class ViewType, class Space>
struct fill_1D {
typedef typename Space::execution_space execution_space;
typedef typename ViewType::size_type size_type;
ViewType a;
double val;
fill_1D(ViewType a_, double val_):a(a_),val(val_) {
}
KOKKOS_INLINE_FUNCTION
void operator() (const int i) const {
a(i) = val;
}
};
template<class ViewType, class Space>
struct fill_2D {
typedef typename Space::execution_space execution_space;
typedef typename ViewType::size_type size_type;
ViewType a;
double val;
fill_2D(ViewType a_, double val_):a(a_),val(val_) {
}
KOKKOS_INLINE_FUNCTION
void operator() (const int i) const{
for(int j = 0; j < static_cast<int>(a.dimension_1()); j++)
a(i,j) = val;
}
};
template<class Layout, class Space>
void test_auto_1d ()
{
typedef Kokkos::View<double**, Layout, Space> mv_type;
typedef typename mv_type::size_type size_type;
const double ZERO = 0.0;
const double ONE = 1.0;
const double TWO = 2.0;
const size_type numRows = 10;
const size_type numCols = 3;
mv_type X = getView<Layout,Space>::get(numRows, numCols);
typename mv_type::HostMirror X_h = Kokkos::create_mirror_view (X);
fill_2D<mv_type,Space> f1(X, ONE);
Kokkos::parallel_for(X.dimension_0(),f1);
Kokkos::deep_copy (X_h, X);
for (size_type j = 0; j < numCols; ++j) {
for (size_type i = 0; i < numRows; ++i) {
ASSERT_TRUE(X_h(i,j) == ONE);
}
}
fill_2D<mv_type,Space> f2(X, 0.0);
Kokkos::parallel_for(X.dimension_0(),f2);
Kokkos::deep_copy (X_h, X);
for (size_type j = 0; j < numCols; ++j) {
for (size_type i = 0; i < numRows; ++i) {
ASSERT_TRUE(X_h(i,j) == ZERO);
}
}
fill_2D<mv_type,Space> f3(X, TWO);
Kokkos::parallel_for(X.dimension_0(),f3);
Kokkos::deep_copy (X_h, X);
for (size_type j = 0; j < numCols; ++j) {
for (size_type i = 0; i < numRows; ++i) {
ASSERT_TRUE(X_h(i,j) == TWO);
}
}
for (size_type j = 0; j < numCols; ++j) {
- auto X_j = Kokkos::subview (X, Kokkos::ALL(), j);
+ auto X_j = Kokkos::subview (X, Kokkos::ALL, j);
fill_1D<decltype(X_j),Space> f4(X_j, ZERO);
Kokkos::parallel_for(X_j.dimension_0(),f4);
Kokkos::deep_copy (X_h, X);
for (size_type i = 0; i < numRows; ++i) {
ASSERT_TRUE(X_h(i,j) == ZERO);
}
for (size_type jj = 0; jj < numCols; ++jj) {
- auto X_jj = Kokkos::subview (X, Kokkos::ALL(), jj);
+ auto X_jj = Kokkos::subview (X, Kokkos::ALL, jj);
fill_1D<decltype(X_jj),Space> f5(X_jj, ONE);
Kokkos::parallel_for(X_jj.dimension_0(),f5);
Kokkos::deep_copy (X_h, X);
for (size_type i = 0; i < numRows; ++i) {
ASSERT_TRUE(X_h(i,jj) == ONE);
}
}
}
}
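// Illustrative sketch of the pattern exercised by test_auto_1d (assumes a
// host-accessible memory space such as Kokkos::HostSpace; the names
// sketch_column_subview, M and col1 are hypothetical): subview with
// Kokkos::ALL and a fixed column index returns a rank-1 view that aliases
// that column, so writes through the subview are visible in the parent view.
inline void sketch_column_subview() {
  Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::HostSpace> M("M", 10, 3);
  auto col1 = Kokkos::subview(M, Kokkos::ALL, 1); // rank-1 view of column 1
  col1(0) = 42.0;                                 // same memory as M(0,1)
}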
template<class LD, class LS, class Space>
void test_1d_strided_assignment_impl(bool a, bool b, bool c, bool d, int n, int m) {
Kokkos::View<double**,LS,Space> l2d("l2d",n,m);
int col = n>2?2:0;
int row = m>2?2:0;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
if(a) {
- Kokkos::View<double*,LD,Space> l1da = Kokkos::subview(l2d,Kokkos::ALL(),row);
+ Kokkos::View<double*,LD,Space> l1da = Kokkos::subview(l2d,Kokkos::ALL,row);
ASSERT_TRUE( & l1da(0) == & l2d(0,row) );
if(n>1)
ASSERT_TRUE( & l1da(1) == & l2d(1,row) );
}
if(b && n>13) {
Kokkos::View<double*,LD,Space> l1db = Kokkos::subview(l2d,std::pair<unsigned,unsigned>(2,13),row);
ASSERT_TRUE( & l1db(0) == & l2d(2,row) );
ASSERT_TRUE( & l1db(1) == & l2d(3,row) );
}
if(c) {
- Kokkos::View<double*,LD,Space> l1dc = Kokkos::subview(l2d,col,Kokkos::ALL());
+ Kokkos::View<double*,LD,Space> l1dc = Kokkos::subview(l2d,col,Kokkos::ALL);
ASSERT_TRUE( & l1dc(0) == & l2d(col,0) );
if(m>1)
ASSERT_TRUE( & l1dc(1) == & l2d(col,1) );
}
if(d && m>13) {
Kokkos::View<double*,LD,Space> l1dd = Kokkos::subview(l2d,col,std::pair<unsigned,unsigned>(2,13));
ASSERT_TRUE( & l1dd(0) == & l2d(col,2) );
ASSERT_TRUE( & l1dd(1) == & l2d(col,3) );
}
}
}
template<class Space >
void test_1d_strided_assignment() {
test_1d_strided_assignment_impl<Kokkos::LayoutStride,Kokkos::LayoutLeft,Space>(true,true,true,true,17,3);
test_1d_strided_assignment_impl<Kokkos::LayoutStride,Kokkos::LayoutRight,Space>(true,true,true,true,17,3);
test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,false,false,17,3);
test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,false,false,17,3);
test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(false,false,true,true,17,3);
test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(false,false,true,true,17,3);
test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,false,false,17,1);
test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,true,true,1,17);
test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,true,true,1,17);
test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,false,false,17,1);
test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(true,true,true,true,17,1);
test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(false,false,true,true,1,17);
test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(false,false,true,true,1,17);
test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(true,true,true,true,17,1);
}
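// Sketch of what the layout combinations above check (assumes Kokkos::HostSpace
// for host access; the names sketch_strided_column, A and col are hypothetical):
// for a LayoutRight matrix a row is contiguous while a column is strided, so a
// column subview generally needs a LayoutStride destination; that is why the
// LayoutStride-destination cases above are enabled for every source layout.
inline void sketch_strided_column() {
  Kokkos::View<double**, Kokkos::LayoutRight, Kokkos::HostSpace> A("A", 4, 5);
  Kokkos::View<double*, Kokkos::LayoutStride, Kokkos::HostSpace> col =
    Kokkos::subview(A, Kokkos::ALL, 2);   // strided view of column 2
  col(0) = 1.0;                           // same memory as A(0,2)
}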
template< class Space >
void test_left_0()
{
typedef Kokkos::View< int [2][3][4][5][2][3][4][5] , Kokkos::LayoutLeft , Space >
view_static_8_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_static_8_type x_static_8("x_static_left_8");
ASSERT_TRUE( x_static_8.is_contiguous() );
Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x_static_8 , 0, 0, 0, 0, 0, 0, 0, 0 );
ASSERT_TRUE( x0.is_contiguous() );
ASSERT_TRUE( & x0() == & x_static_8(0,0,0,0,0,0,0,0) );
Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
Kokkos::subview( x_static_8, Kokkos::pair<int,int>(0,2), 1, 2, 3, 0, 1, 2, 3 );
ASSERT_TRUE( x1.is_contiguous() );
ASSERT_TRUE( & x1(0) == & x_static_8(0,1,2,3,0,1,2,3) );
ASSERT_TRUE( & x1(1) == & x_static_8(1,1,2,3,0,1,2,3) );
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
Kokkos::subview( x_static_8, Kokkos::pair<int,int>(0,2), 1, 2, 3
, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( ! x2.is_contiguous() );
ASSERT_TRUE( & x2(0,0) == & x_static_8(0,1,2,3,0,1,2,3) );
ASSERT_TRUE( & x2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
ASSERT_TRUE( & x2(0,1) == & x_static_8(0,1,2,3,1,1,2,3) );
ASSERT_TRUE( & x2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
// Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
Kokkos::subview( x_static_8, 1, Kokkos::pair<int,int>(0,2), 2, 3
, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( ! sx2.is_contiguous() );
ASSERT_TRUE( & sx2(0,0) == & x_static_8(1,0,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(0,1) == & x_static_8(1,0,2,3,1,1,2,3) );
ASSERT_TRUE( & sx2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
Kokkos::subview( x_static_8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
, 1, Kokkos::pair<int,int>(1,3) /* of [5] */
, 1, Kokkos::pair<int,int>(0,2) /* of [3] */
, 2, Kokkos::pair<int,int>(2,4) /* of [5] */
);
ASSERT_TRUE( ! sx4.is_contiguous() );
for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x_static_8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
}
}
}
template< class Space >
void test_left_1()
{
typedef Kokkos::View< int ****[2][3][4][5] , Kokkos::LayoutLeft , Space >
view_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_type x8("x_left_8",2,3,4,5);
ASSERT_TRUE( x8.is_contiguous() );
Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x8 , 0, 0, 0, 0, 0, 0, 0, 0 );
ASSERT_TRUE( x0.is_contiguous() );
ASSERT_TRUE( & x0() == & x8(0,0,0,0,0,0,0,0) );
Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
Kokkos::subview( x8, Kokkos::pair<int,int>(0,2), 1, 2, 3, 0, 1, 2, 3 );
ASSERT_TRUE( x1.is_contiguous() );
ASSERT_TRUE( & x1(0) == & x8(0,1,2,3,0,1,2,3) );
ASSERT_TRUE( & x1(1) == & x8(1,1,2,3,0,1,2,3) );
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
Kokkos::subview( x8, Kokkos::pair<int,int>(0,2), 1, 2, 3
, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( ! x2.is_contiguous() );
ASSERT_TRUE( & x2(0,0) == & x8(0,1,2,3,0,1,2,3) );
ASSERT_TRUE( & x2(1,0) == & x8(1,1,2,3,0,1,2,3) );
ASSERT_TRUE( & x2(0,1) == & x8(0,1,2,3,1,1,2,3) );
ASSERT_TRUE( & x2(1,1) == & x8(1,1,2,3,1,1,2,3) );
// Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
Kokkos::subview( x8, 1, Kokkos::pair<int,int>(0,2), 2, 3
, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( ! sx2.is_contiguous() );
ASSERT_TRUE( & sx2(0,0) == & x8(1,0,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(1,0) == & x8(1,1,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(0,1) == & x8(1,0,2,3,1,1,2,3) );
ASSERT_TRUE( & sx2(1,1) == & x8(1,1,2,3,1,1,2,3) );
Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
Kokkos::subview( x8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
, 1, Kokkos::pair<int,int>(1,3) /* of [5] */
, 1, Kokkos::pair<int,int>(0,2) /* of [3] */
, 2, Kokkos::pair<int,int>(2,4) /* of [5] */
);
ASSERT_TRUE( ! sx4.is_contiguous() );
for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
}
}
}
template< class Space >
void test_left_2()
{
typedef Kokkos::View< int **** , Kokkos::LayoutLeft , Space > view_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_type x4("x4",2,3,4,5);
ASSERT_TRUE( x4.is_contiguous() );
Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x4 , 0, 0, 0, 0 );
ASSERT_TRUE( x0.is_contiguous() );
ASSERT_TRUE( & x0() == & x4(0,0,0,0) );
Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
Kokkos::subview( x4, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( x1.is_contiguous() );
ASSERT_TRUE( & x1(0) == & x4(0,1,2,3) );
ASSERT_TRUE( & x1(1) == & x4(1,1,2,3) );
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
Kokkos::subview( x4, Kokkos::pair<int,int>(0,2), 1, Kokkos::pair<int,int>(1,3), 2 );
ASSERT_TRUE( ! x2.is_contiguous() );
ASSERT_TRUE( & x2(0,0) == & x4(0,1,1,2) );
ASSERT_TRUE( & x2(1,0) == & x4(1,1,1,2) );
ASSERT_TRUE( & x2(0,1) == & x4(0,1,2,2) );
ASSERT_TRUE( & x2(1,1) == & x4(1,1,2,2) );
// Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
Kokkos::subview( x4, 1, Kokkos::pair<int,int>(0,2)
, 2, Kokkos::pair<int,int>(1,4) );
ASSERT_TRUE( ! sx2.is_contiguous() );
ASSERT_TRUE( & sx2(0,0) == & x4(1,0,2,1) );
ASSERT_TRUE( & sx2(1,0) == & x4(1,1,2,1) );
ASSERT_TRUE( & sx2(0,1) == & x4(1,0,2,2) );
ASSERT_TRUE( & sx2(1,1) == & x4(1,1,2,2) );
ASSERT_TRUE( & sx2(0,2) == & x4(1,0,2,3) );
ASSERT_TRUE( & sx2(1,2) == & x4(1,1,2,3) );
Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
Kokkos::subview( x4, Kokkos::pair<int,int>(1,2) /* of [2] */
, Kokkos::pair<int,int>(1,3) /* of [3] */
, Kokkos::pair<int,int>(0,4) /* of [4] */
, Kokkos::pair<int,int>(2,4) /* of [5] */
);
ASSERT_TRUE( ! sx4.is_contiguous() );
for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x4( 1+i0, 1+i1, 0+i2, 2+i3 ) );
}
}
}
template< class Space >
void test_left_3()
{
typedef Kokkos::View< int ** , Kokkos::LayoutLeft , Space > view_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_type xm("x4",10,5);
ASSERT_TRUE( xm.is_contiguous() );
Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( xm , 5, 3 );
ASSERT_TRUE( x0.is_contiguous() );
ASSERT_TRUE( & x0() == & xm(5,3) );
Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( xm, Kokkos::ALL(), 3 );
+ Kokkos::subview( xm, Kokkos::ALL, 3 );
ASSERT_TRUE( x1.is_contiguous() );
for ( int i = 0 ; i < int(xm.dimension_0()) ; ++i ) {
ASSERT_TRUE( & x1(i) == & xm(i,3) );
}
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL() );
+ Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL );
ASSERT_TRUE( ! x2.is_contiguous() );
for ( int j = 0 ; j < int(x2.dimension_1()) ; ++j )
for ( int i = 0 ; i < int(x2.dimension_0()) ; ++i ) {
ASSERT_TRUE( & x2(i,j) == & xm(1+i,j) );
}
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2c =
- Kokkos::subview( xm, Kokkos::ALL(), std::pair<int,int>(2,4) );
+ Kokkos::subview( xm, Kokkos::ALL, std::pair<int,int>(2,4) );
ASSERT_TRUE( x2c.is_contiguous() );
for ( int j = 0 ; j < int(x2c.dimension_1()) ; ++j )
for ( int i = 0 ; i < int(x2c.dimension_0()) ; ++i ) {
ASSERT_TRUE( & x2c(i,j) == & xm(i,2+j) );
}
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2_n1 =
- Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL() );
+ Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL );
ASSERT_TRUE( x2_n1.dimension_0() == 0 );
ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2_n2 =
- Kokkos::subview( xm , Kokkos::ALL() , std::pair<int,int>(1,1) );
+ Kokkos::subview( xm , Kokkos::ALL , std::pair<int,int>(1,1) );
ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
ASSERT_TRUE( x2_n2.dimension_1() == 0 );
}
}
//----------------------------------------------------------------------------
template< class Space >
void test_right_0()
{
typedef Kokkos::View< int [2][3][4][5][2][3][4][5] , Kokkos::LayoutRight , Space >
view_static_8_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_static_8_type x_static_8("x_static_right_8");
Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( x_static_8 , 0, 0, 0, 0, 0, 0, 0, 0 );
ASSERT_TRUE( & x0() == & x_static_8(0,0,0,0,0,0,0,0) );
Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
Kokkos::subview( x_static_8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
ASSERT_TRUE( x1.dimension_0() == 2 );
ASSERT_TRUE( & x1(0) == & x_static_8(0,1,2,3,0,1,2,1) );
ASSERT_TRUE( & x1(1) == & x_static_8(0,1,2,3,0,1,2,2) );
Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
Kokkos::subview( x_static_8, 0, 1, 2, Kokkos::pair<int,int>(1,3)
, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
ASSERT_TRUE( x2.dimension_0() == 2 );
ASSERT_TRUE( x2.dimension_1() == 2 );
ASSERT_TRUE( & x2(0,0) == & x_static_8(0,1,2,1,0,1,2,1) );
ASSERT_TRUE( & x2(1,0) == & x_static_8(0,1,2,2,0,1,2,1) );
ASSERT_TRUE( & x2(0,1) == & x_static_8(0,1,2,1,0,1,2,2) );
ASSERT_TRUE( & x2(1,1) == & x_static_8(0,1,2,2,0,1,2,2) );
// Kokkos::View<int**,Kokkos::LayoutRight,Space> error_2 =
Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
Kokkos::subview( x_static_8, 1, Kokkos::pair<int,int>(0,2), 2, 3
, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( sx2.dimension_0() == 2 );
ASSERT_TRUE( sx2.dimension_1() == 2 );
ASSERT_TRUE( & sx2(0,0) == & x_static_8(1,0,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(0,1) == & x_static_8(1,0,2,3,1,1,2,3) );
ASSERT_TRUE( & sx2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
Kokkos::subview( x_static_8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
, 1, Kokkos::pair<int,int>(1,3) /* of [5] */
, 1, Kokkos::pair<int,int>(0,2) /* of [3] */
, 2, Kokkos::pair<int,int>(2,4) /* of [5] */
);
ASSERT_TRUE( sx4.dimension_0() == 2 );
ASSERT_TRUE( sx4.dimension_1() == 2 );
ASSERT_TRUE( sx4.dimension_2() == 2 );
ASSERT_TRUE( sx4.dimension_3() == 2 );
for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x_static_8(0, 0+i0, 1, 1+i1, 1, 0+i2, 2, 2+i3) );
}
}
}
template< class Space >
void test_right_1()
{
typedef Kokkos::View< int ****[2][3][4][5] , Kokkos::LayoutRight , Space >
view_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_type x8("x_right_8",2,3,4,5);
Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( x8 , 0, 0, 0, 0, 0, 0, 0, 0 );
ASSERT_TRUE( & x0() == & x8(0,0,0,0,0,0,0,0) );
Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
Kokkos::subview( x8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
ASSERT_TRUE( & x1(0) == & x8(0,1,2,3,0,1,2,1) );
ASSERT_TRUE( & x1(1) == & x8(0,1,2,3,0,1,2,2) );
Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
Kokkos::subview( x8, 0, 1, 2, Kokkos::pair<int,int>(1,3)
, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
ASSERT_TRUE( & x2(0,0) == & x8(0,1,2,1,0,1,2,1) );
ASSERT_TRUE( & x2(1,0) == & x8(0,1,2,2,0,1,2,1) );
ASSERT_TRUE( & x2(0,1) == & x8(0,1,2,1,0,1,2,2) );
ASSERT_TRUE( & x2(1,1) == & x8(0,1,2,2,0,1,2,2) );
// Kokkos::View<int**,Kokkos::LayoutRight,Space> error_2 =
Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
Kokkos::subview( x8, 1, Kokkos::pair<int,int>(0,2), 2, 3
, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
ASSERT_TRUE( & sx2(0,0) == & x8(1,0,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(1,0) == & x8(1,1,2,3,0,1,2,3) );
ASSERT_TRUE( & sx2(0,1) == & x8(1,0,2,3,1,1,2,3) );
ASSERT_TRUE( & sx2(1,1) == & x8(1,1,2,3,1,1,2,3) );
Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
Kokkos::subview( x8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
, 1, Kokkos::pair<int,int>(1,3) /* of [5] */
, 1, Kokkos::pair<int,int>(0,2) /* of [3] */
, 2, Kokkos::pair<int,int>(2,4) /* of [5] */
);
for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
}
}
}
template< class Space >
void test_right_3()
{
typedef Kokkos::View< int ** , Kokkos::LayoutRight , Space > view_type ;
- if(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,Space>::value) {
+ if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
view_type xm("x4",10,5);
ASSERT_TRUE( xm.is_contiguous() );
Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( xm , 5, 3 );
ASSERT_TRUE( x0.is_contiguous() );
ASSERT_TRUE( & x0() == & xm(5,3) );
Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( xm, 3, Kokkos::ALL() );
+ Kokkos::subview( xm, 3, Kokkos::ALL );
ASSERT_TRUE( x1.is_contiguous() );
for ( int i = 0 ; i < int(xm.dimension_1()) ; ++i ) {
ASSERT_TRUE( & x1(i) == & xm(3,i) );
}
Kokkos::View<int**,Kokkos::LayoutRight,Space> x2c =
- Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL() );
+ Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL );
ASSERT_TRUE( x2c.is_contiguous() );
for ( int j = 0 ; j < int(x2c.dimension_1()) ; ++j )
for ( int i = 0 ; i < int(x2c.dimension_0()) ; ++i ) {
ASSERT_TRUE( & x2c(i,j) == & xm(1+i,j) );
}
Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( xm, Kokkos::ALL(), std::pair<int,int>(2,4) );
+ Kokkos::subview( xm, Kokkos::ALL, std::pair<int,int>(2,4) );
ASSERT_TRUE( ! x2.is_contiguous() );
for ( int j = 0 ; j < int(x2.dimension_1()) ; ++j )
for ( int i = 0 ; i < int(x2.dimension_0()) ; ++i ) {
ASSERT_TRUE( & x2(i,j) == & xm(i,2+j) );
}
Kokkos::View<int**,Kokkos::LayoutRight,Space> x2_n1 =
- Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL() );
+ Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL );
ASSERT_TRUE( x2_n1.dimension_0() == 0 );
ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
Kokkos::View<int**,Kokkos::LayoutRight,Space> x2_n2 =
- Kokkos::subview( xm , Kokkos::ALL() , std::pair<int,int>(1,1) );
+ Kokkos::subview( xm , Kokkos::ALL , std::pair<int,int>(1,1) );
ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
ASSERT_TRUE( x2_n2.dimension_1() == 0 );
}
}
namespace Impl {
constexpr int N0=113;
constexpr int N1=11;
constexpr int N2=17;
constexpr int N3=5;
constexpr int N4=7;
template<class SubView,class View>
void test_Check1D(SubView a, View b, std::pair<int,int> range) {
int errors = 0;
for(int i=0;i<range.second-range.first;i++) {
if(a(i)!=b(i+range.first))
errors++;
}
if(errors>0)
std::cout << "Error Suviews test_Check1D: " << errors <<std::endl;
ASSERT_TRUE( errors == 0 );
}
template<class SubView,class View>
void test_Check1D2D(SubView a, View b, int i0, std::pair<int,int> range) {
int errors = 0;
for(int i1=0;i1<range.second-range.first;i1++) {
if(a(i1)!=b(i0,i1+range.first))
errors++;
}
if(errors>0)
std::cout << "Error Suviews test_Check1D2D: " << errors <<std::endl;
ASSERT_TRUE( errors == 0 );
}
template<class SubView,class View>
void test_Check2D3D(SubView a, View b, int i0, std::pair<int,int> range1, std::pair<int,int> range2) {
int errors = 0;
for(int i1=0;i1<range1.second-range1.first;i1++) {
for(int i2=0;i2<range2.second-range2.first;i2++) {
if(a(i1,i2)!=b(i0,i1+range1.first,i2+range2.first))
errors++;
}
}
if(errors>0)
std::cout << "Error Suviews test_Check2D3D: " << errors <<std::endl;
ASSERT_TRUE( errors == 0 );
}
template<class SubView,class View>
void test_Check3D5D(SubView a, View b, int i0, int i1, std::pair<int,int> range2, std::pair<int,int> range3, std::pair<int,int> range4) {
int errors = 0;
for(int i2=0;i2<range2.second-range2.first;i2++) {
for(int i3=0;i3<range3.second-range3.first;i3++) {
for(int i4=0;i4<range4.second-range4.first;i4++) {
if(a(i2,i3,i4)!=b(i0,i1,i2+range2.first,i3+range3.first,i4+range4.first))
errors++;
}
}
}
if(errors>0)
std::cout << "Error Suviews test_Check3D5D: " << errors <<std::endl;
ASSERT_TRUE( errors == 0 );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg>
+template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
void test_1d_assign_impl() {
{ //Breaks
- Kokkos::View<int*,LayoutOrg,Space> a("A",N0);
+ Kokkos::View<int*,LayoutOrg,Space> a_org("A",N0);
+ Kokkos::View<int*,LayoutOrg,Space,MemTraits> a(a_org);
Kokkos::fence();
for(int i=0; i<N0; i++)
- a(i) = i;
+ a_org(i) = i;
- Kokkos::View<int[N0],Layout,Space> a1(a);
+ Kokkos::View<int[N0],Layout,Space,MemTraits> a1(a);
Kokkos::fence();
test_Check1D(a1,a,std::pair<int,int>(0,N0));
- Kokkos::View<int[N0],LayoutSub,Space> a2(a1);
+ Kokkos::View<int[N0],LayoutSub,Space,MemTraits> a2(a1);
Kokkos::fence();
test_Check1D(a2,a,std::pair<int,int>(0,N0));
a1 = a;
test_Check1D(a1,a,std::pair<int,int>(0,N0));
//Runtime Fail expected
//Kokkos::View<int[N1]> afail1(a);
//Compile Time Fail expected
//Kokkos::View<int[N1]> afail2(a1);
}
{ // Works
- Kokkos::View<int[N0],LayoutOrg,Space> a("A");
- Kokkos::View<int*,Layout,Space> a1(a);
+ Kokkos::View<int[N0],LayoutOrg,Space,MemTraits> a("A");
+ Kokkos::View<int*,Layout,Space,MemTraits> a1(a);
Kokkos::fence();
test_Check1D(a1,a,std::pair<int,int>(0,N0));
a1 = a;
Kokkos::fence();
test_Check1D(a1,a,std::pair<int,int>(0,N0));
}
}
-template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg>
+template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg,class MemTraits>
void test_2d_subview_3d_impl_type() {
Kokkos::View<int***,LayoutOrg,Space> a_org("A",N0,N1,N2);
- Kokkos::View<Type,Layout,Space> a(a_org);
+ Kokkos::View<Type,Layout,Space,MemTraits> a(a_org);
for(int i0=0; i0<N0; i0++)
for(int i1=0; i1<N1; i1++)
for(int i2=0; i2<N2; i2++)
- a(i0,i1,i2) = i0*1000000+i1*1000+i2;
- Kokkos::View<TypeSub,LayoutSub,Space> a1;
- a1 = Kokkos::subview(a,3,Kokkos::ALL(),Kokkos::ALL());
+ a_org(i0,i1,i2) = i0*1000000+i1*1000+i2;
+ Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a1;
+ a1 = Kokkos::subview(a,3,Kokkos::ALL,Kokkos::ALL);
Kokkos::fence();
test_Check2D3D(a1,a,3,std::pair<int,int>(0,N1),std::pair<int,int>(0,N2));
- Kokkos::View<TypeSub,LayoutSub,Space> a2(a,3,Kokkos::ALL(),Kokkos::ALL());
+ Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a2(a,3,Kokkos::ALL,Kokkos::ALL);
Kokkos::fence();
test_Check2D3D(a2,a,3,std::pair<int,int>(0,N1),std::pair<int,int>(0,N2));
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg>
+template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
void test_2d_subview_3d_impl_layout() {
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int** ,LayoutSub, Layout, LayoutOrg>();
+ test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int** ,LayoutSub, Layout, LayoutOrg>();
+ test_2d_subview_3d_impl_type<Space,int* [N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int* [N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int* [N1][N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int[N1][N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int* [N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int** ,LayoutSub, Layout, LayoutOrg>();
+ test_2d_subview_3d_impl_type<Space,int** [N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int** [N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int** [N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int*** ,int[N1][N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int*** ,int* [N2],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_3d_impl_type<Space,int*** ,int** ,LayoutSub, Layout, LayoutOrg>();
+ test_2d_subview_3d_impl_type<Space,int*** ,int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int*** ,int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,int*** ,int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_2d_subview_3d_impl_type<Space,const int** [N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int** [N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int** [N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_2d_subview_3d_impl_type<Space,const int*** ,const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int*** ,const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type<Space,const int*** ,const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
}
-template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg>
-void test_2d_subview_5d_impl_type() {
+template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+void test_3d_subview_5d_impl_type() {
Kokkos::View<int*****,LayoutOrg,Space> a_org("A",N0,N1,N2,N3,N4);
- Kokkos::View<Type,Layout,Space> a(a_org);
+ Kokkos::View<Type,Layout,Space,MemTraits> a(a_org);
for(int i0=0; i0<N0; i0++)
for(int i1=0; i1<N1; i1++)
for(int i2=0; i2<N2; i2++)
for(int i3=0; i3<N3; i3++)
for(int i4=0; i4<N4; i4++)
- a(i0,i1,i2,i3,i4) = i0*1000000+i1*10000+i2*100+i3*10+i4;
- Kokkos::View<TypeSub,LayoutSub,Space> a1;
- a1 = Kokkos::subview(a,3,5,Kokkos::ALL(),Kokkos::ALL(),Kokkos::ALL());
+ a_org(i0,i1,i2,i3,i4) = i0*1000000+i1*10000+i2*100+i3*10+i4;
+ Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a1;
+ a1 = Kokkos::subview(a,3,5,Kokkos::ALL,Kokkos::ALL,Kokkos::ALL);
Kokkos::fence();
test_Check3D5D(a1,a,3,5,std::pair<int,int>(0,N2),std::pair<int,int>(0,N3),std::pair<int,int>(0,N4));
- Kokkos::View<TypeSub,LayoutSub,Space> a2(a,3,5,Kokkos::ALL(),Kokkos::ALL(),Kokkos::ALL());
+ Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a2(a,3,5,Kokkos::ALL,Kokkos::ALL,Kokkos::ALL);
Kokkos::fence();
test_Check3D5D(a2,a,3,5,std::pair<int,int>(0,N2),std::pair<int,int>(0,N3),std::pair<int,int>(0,N4));
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg>
-void test_2d_subview_5d_impl_layout() {
- test_2d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg>();
-
- test_2d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg>();
-
- test_2d_subview_5d_impl_type<Space, int** [N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int** [N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int** [N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int** [N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg>();
-
- test_2d_subview_5d_impl_type<Space, int*** [N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int*** [N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int*** [N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int*** [N3][N4],int*** ,LayoutSub, Layout, LayoutOrg>();
-
- test_2d_subview_5d_impl_type<Space, int**** [N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int**** [N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int**** [N4],int** [N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int**** [N4],int*** ,LayoutSub, Layout, LayoutOrg>();
-
- test_2d_subview_5d_impl_type<Space, int***** ,int[N2][N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int***** ,int* [N3][N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int***** ,int** [N4],LayoutSub, Layout, LayoutOrg>();
- test_2d_subview_5d_impl_type<Space, int***** ,int*** ,LayoutSub, Layout, LayoutOrg>();
+template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+void test_3d_subview_5d_impl_layout() {
+ test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, int**** [N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int**** [N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int**** [N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int**** [N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, int***** ,int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int***** ,int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int***** ,int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, int***** ,int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, const int**** [N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int**** [N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int**** [N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int**** [N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+
+ test_3d_subview_5d_impl_type<Space, const int***** ,const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int***** ,const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int***** ,const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type<Space, const int***** ,const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
}
+
+inline
+void test_subview_legal_args_right() {
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
}
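+// Reading of these compile-time checks (editor's interpretation): the value
+// appears to be 1 exactly when the listed argument pack, applied to a
+// LayoutRight source of the given rank, can yield a subview that is itself
+// LayoutRight rather than requiring LayoutStride. For example, the value==1
+// case <int,int,ALL_t,ALL_t,ALL_t> corresponds to a pattern like the
+// following (src and dst are hypothetical names):
+//
+//   Kokkos::View<int*****, Kokkos::LayoutRight> src("src", 4, 4, 4, 4, 4);
+//   Kokkos::View<int***, Kokkos::LayoutRight> dst =
+//       Kokkos::subview(src, 0, 0, Kokkos::ALL, Kokkos::ALL, Kokkos::ALL);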
-template< class Space >
+inline
+void test_subview_legal_args_left() {
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
+
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
+ ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+}
+
+}
+
+template< class Space, class MemTraits = void>
void test_1d_assign() {
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft, MemTraits>();
//Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft ,Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft ,Kokkos::LayoutLeft, MemTraits>();
//Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutRight ,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight ,Kokkos::LayoutRight >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutRight ,Kokkos::LayoutRight >();
+ Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight ,Kokkos::LayoutRight, MemTraits>();
+ Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutRight ,Kokkos::LayoutRight, MemTraits>();
//Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
//Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
}
-template<class Space >
+template<class Space, class MemTraits = void>
void test_2d_subview_3d() {
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight, Kokkos::LayoutRight>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft>();
+ Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
+ Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
+ Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight, MemTraits>();
+ Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
}
-template<class Space >
-void test_2d_subview_5d() {
- Impl::test_2d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight>();
- Impl::test_2d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight>();
- Impl::test_2d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft>();
- Impl::test_2d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft>();
+template<class Space, class MemTraits = void>
+void test_3d_subview_5d_right() {
+ Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
+ Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight, MemTraits>();
+}
+
+template<class Space, class MemTraits = void>
+void test_3d_subview_5d_left() {
+ Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
}
+
+
+namespace Impl {
+
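+  // FillView_3D / FillView_4D decode the flat iteration index ii into view
+  // coordinates according to the view's Layout and store a value that encodes
+  // those coordinates, so the subview checks below can verify which parent
+  // element each subview entry refers to.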
+ template<class Layout, class Space>
+ struct FillView_3D {
+ Kokkos::View<int***,Layout,Space> a;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const int& ii) const {
+ const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ ii % a.dimension_0(): ii / (a.dimension_1()*a.dimension_2());
+ const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ (ii / a.dimension_0()) % a.dimension_1() : (ii / a.dimension_2()) % a.dimension_1();
+ const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
+ ii / (a.dimension_0() * a.dimension_1()) : ii % a.dimension_2();
+ a(i,j,k) = 1000000 * i + 1000 * j + k;
+ }
+ };
+
+ template<class Layout, class Space>
+ struct FillView_4D {
+ Kokkos::View<int****,Layout,Space> a;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const int& ii) const {
+ const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ ii % a.dimension_0(): ii / (a.dimension_1()*a.dimension_2()*a.dimension_3());
+ const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ (ii / a.dimension_0()) % a.dimension_1() : (ii / (a.dimension_2()*a.dimension_3()) % a.dimension_1());
+ const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
+ (ii / (a.dimension_0() * a.dimension_1())) % a.dimension_2() : (ii / a.dimension_3()) % a.dimension_2();
+ const int l = std::is_same<Layout,Kokkos::LayoutRight>::value ?
+ ii / (a.dimension_0() * a.dimension_1() * a.dimension_2()) : ii % a.dimension_3();
+ a(i,j,k,l) = 1000000 * i + 10000 * j + 100 * k + l;
+ }
+ };
+
+ template<class Layout, class Space, class MemTraits>
+ struct CheckSubviewCorrectness_3D_3D {
+ Kokkos::View<const int***,Layout,Space,MemTraits> a;
+ Kokkos::View<const int***,Layout,Space,MemTraits> b;
+ int offset_0,offset_2;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const int& ii) const {
+ const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ ii % b.dimension_0(): ii / (b.dimension_1()*b.dimension_2());
+ const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ (ii / b.dimension_0()) % b.dimension_1() : (ii / b.dimension_2()) % b.dimension_1();
+ const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
+ ii / (b.dimension_0() * b.dimension_1()) : ii % b.dimension_2();
+ if( a(i+offset_0,j,k+offset_2) != b(i,j,k))
+ Kokkos::abort("Error: check_subview_correctness 3D-3D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)");
+ }
+ };
+
+ template<class Layout, class Space, class MemTraits>
+ struct CheckSubviewCorrectness_3D_4D {
+ Kokkos::View<const int****,Layout,Space,MemTraits> a;
+ Kokkos::View<const int***,Layout,Space,MemTraits> b;
+ int offset_0,offset_2,index;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const int& ii) const {
+ const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ ii % b.dimension_0(): ii / (b.dimension_1()*b.dimension_2());
+ const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
+ (ii / b.dimension_0()) % b.dimension_1() : (ii / b.dimension_2()) % b.dimension_1();
+ const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
+ ii / (b.dimension_0() * b.dimension_1()) : ii % b.dimension_2();
+
+ int i0,i1,i2,i3;
+ if(std::is_same<Layout,Kokkos::LayoutLeft>::value) {
+ i0 = i + offset_0;
+ i1 = j;
+ i2 = k + offset_2;
+ i3 = index;
+ } else {
+ i0 = index;
+ i1 = i + offset_0;
+ i2 = j;
+ i3 = k + offset_2;
+ }
+ if( a(i0,i1,i2,i3) != b(i,j,k))
+ Kokkos::abort("Error: check_subview_correctness 3D-4D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)");
+ }
+ };
+}
+
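+// The drivers below allocate a parent view, fill it with coordinate-encoding
+// values, take a subview with known offsets, and run the matching
+// CheckSubviewCorrectness functor to verify that b(i,j,k) matches
+// a(i+offset_0, j, k+offset_2) (plus a fixed index when the parent is 4D).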
+template<class Space, class MemTraits = void>
+void test_layoutleft_to_layoutleft() {
+ Impl::test_subview_legal_args_left();
+
+ {
+ Kokkos::View<int***,Kokkos::LayoutLeft,Space> a("A",100,4,3);
+ Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::ALL);
+
+ Impl::FillView_3D<Kokkos::LayoutLeft,Space> fill;
+ fill.a = a;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+
+ Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ check.a = a;
+ check.b = b;
+ check.offset_0 = 16;
+ check.offset_2 = 0;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ }
+ {
+ Kokkos::View<int***,Kokkos::LayoutLeft,Space> a("A",100,4,5);
+ Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::pair<int,int>(1,3));
+
+ Impl::FillView_3D<Kokkos::LayoutLeft,Space> fill;
+ fill.a = a;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+
+ Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ check.a = a;
+ check.b = b;
+ check.offset_0 = 16;
+ check.offset_2 = 1;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ }
+ {
+ Kokkos::View<int****,Kokkos::LayoutLeft,Space> a("A",100,4,5,3);
+ Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::pair<int,int>(1,3),1);
+
+ Impl::FillView_4D<Kokkos::LayoutLeft,Space> fill;
+ fill.a = a;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)*a.extent(3)), fill);
+
+ Impl::CheckSubviewCorrectness_3D_4D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ check.a = a;
+ check.b = b;
+ check.offset_0 = 16;
+ check.offset_2 = 1;
+ check.index = 1;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ }
+}
+
+template<class Space, class MemTraits = void>
+void test_layoutright_to_layoutright() {
+ Impl::test_subview_legal_args_right();
+
+ {
+ Kokkos::View<int***,Kokkos::LayoutRight,Space> a("A",100,4,3);
+ Kokkos::View<int***,Kokkos::LayoutRight,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::ALL);
+
+ Impl::FillView_3D<Kokkos::LayoutRight,Space> fill;
+ fill.a = a;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+
+ Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutRight,Space,MemTraits> check;
+ check.a = a;
+ check.b = b;
+ check.offset_0 = 16;
+ check.offset_2 = 0;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ }
+ {
+ Kokkos::View<int****,Kokkos::LayoutRight,Space> a("A",3,4,5,100);
+ Kokkos::View<int***,Kokkos::LayoutRight,Space> b(a,1,Kokkos::pair<int,int>(1,3),Kokkos::ALL,Kokkos::ALL);
+
+
+ Impl::FillView_4D<Kokkos::LayoutRight,Space> fill;
+ fill.a = a;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)*a.extent(3)), fill);
+
+ Impl::CheckSubviewCorrectness_3D_4D<Kokkos::LayoutRight,Space,MemTraits> check;
+ check.a = a;
+ check.b = b;
+ check.offset_0 = 1;
+ check.offset_2 = 0;
+ check.index = 1;
+ Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ }
+}
+
+
}
//----------------------------------------------------------------------------
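The subview test drivers above now take a trailing MemTraits template parameter that defaults to void. Illustrative only (assuming a CUDA build), the same driver can be instantiated with the default or with explicit memory traits:

    test_2d_subview_3d< Kokkos::Cuda >();
    test_2d_subview_3d< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();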
diff --git a/lib/kokkos/containers/performance_tests/TestCuda.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
similarity index 65%
copy from lib/kokkos/containers/performance_tests/TestCuda.cpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda.hpp
index 8183adaa6..a49d9ef41 100644
--- a/lib/kokkos/containers/performance_tests/TestCuda.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
@@ -1,109 +1,107 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <stdint.h>
-#include <string>
-#include <iostream>
-#include <iomanip>
-#include <sstream>
-#include <fstream>
-
+#ifndef KOKKOS_TEST_CUDAHPP
+#define KOKKOS_TEST_CUDAHPP
#include <gtest/gtest.h>
+#include <Kokkos_Macros.hpp>
+
#include <Kokkos_Core.hpp>
-#if defined( KOKKOS_HAVE_CUDA )
+#include <TestTile.hpp>
-#include <TestDynRankView.hpp>
+//----------------------------------------------------------------------------
-#include <Kokkos_UnorderedMap.hpp>
+#include <TestSharedAlloc.hpp>
+#include <TestViewMapping.hpp>
-#include <TestGlobal2LocalIds.hpp>
-#include <TestUnorderedMapPerformance.hpp>
+#include <TestViewAPI.hpp>
+#include <TestViewOfClass.hpp>
+#include <TestViewSubview.hpp>
+#include <TestAtomic.hpp>
+#include <TestAtomicOperations.hpp>
+#include <TestRange.hpp>
+#include <TestTeam.hpp>
+#include <TestReduce.hpp>
+#include <TestScan.hpp>
+#include <TestAggregate.hpp>
+#include <TestCompilerMacros.hpp>
+#include <TestTaskScheduler.hpp>
+#include <TestMemoryPool.hpp>
-namespace Performance {
+#include <TestCXX11.hpp>
+#include <TestCXX11Deduction.hpp>
+#include <TestTeamVector.hpp>
+#include <TestTemplateMetaFunctions.hpp>
+
+#include <TestPolicyConstruction.hpp>
+
+#include <TestMDRange.hpp>
+
+namespace Test {
+
+// For some reason the SetUp and TearDown definitions can only appear in one .cpp file; the other test translation units see only these declarations.
class cuda : public ::testing::Test {
protected:
- static void SetUpTestCase()
+ static void SetUpTestCase();
+ static void TearDownTestCase();
+};
+
+#ifdef TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
+void cuda::SetUpTestCase()
{
- std::cout << std::setprecision(5) << std::scientific;
+ Kokkos::Cuda::print_configuration( std::cout );
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
}
- static void TearDownTestCase()
+
+void cuda::TearDownTestCase()
{
Kokkos::Cuda::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
-};
-
-TEST_F( cuda, dynrankview_perf )
-{
- std::cout << "Cuda" << std::endl;
- std::cout << " DynRankView vs View: Initialization Only " << std::endl;
- test_dynrankview_op_perf<Kokkos::Cuda>( 4096 );
-}
-
-TEST_F( cuda, global_2_local)
-{
- std::cout << "Cuda" << std::endl;
- std::cout << "size, create, generate, fill, find" << std::endl;
- for (unsigned i=Performance::begin_id_size; i<=Performance::end_id_size; i *= Performance::id_step)
- test_global_to_local_ids<Kokkos::Cuda>(i);
-}
-
-TEST_F( cuda, unordered_map_performance_near)
-{
- Perf::run_performance_tests<Kokkos::Cuda,true>("cuda-near");
-}
-
-TEST_F( cuda, unordered_map_performance_far)
-{
- Perf::run_performance_tests<Kokkos::Cuda,false>("cuda-far");
+#endif
}
-
-}
-
-#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
+#endif
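The SetUpTestCase/TearDownTestCase bodies above are compiled into exactly one translation unit of the split CUDA test suite: that file defines the macro before including this header (as TestCuda_Other.cpp does below), while every other TestCuda_*.cpp includes the header without it. Minimal sketch of that single file:

    #define TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
    #include <cuda/TestCuda.hpp>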
diff --git a/lib/kokkos/core/unit_test/TestCuda_c.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
similarity index 63%
rename from lib/kokkos/core/unit_test/TestCuda_c.cpp
rename to lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
index 70584cead..113b72c70 100644
--- a/lib/kokkos/core/unit_test/TestCuda_c.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
@@ -1,375 +1,168 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <iostream>
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
-#include <impl/Kokkos_ViewTileLeft.hpp>
-#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-#include <TestAtomicOperations.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestMemoryPool.hpp>
-#include <TestTeamVector.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-#include <TestCXX11Deduction.hpp>
-
-#include <TestTaskPolicy.hpp>
-#include <TestPolicyConstruction.hpp>
-
-//----------------------------------------------------------------------------
-
-class cuda : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-//----------------------------------------------------------------------------
+#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, atomic )
+TEST_F( cuda , atomics )
{
const int loop_count = 1e3 ;
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,3) ) );
-
}
TEST_F( cuda , atomic_operations )
{
const int start = 1; //Avoid zero for division
const int end = 11;
for (int i = start; i < end; ++i)
{
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 4 ) ) );
}
}
-//----------------------------------------------------------------------------
-
-TEST_F( cuda, tile_layout)
-{
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Cuda , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 9 );
-
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 11 );
-}
-
-TEST_F( cuda , view_aggregate )
-{
- TestViewAggregate< Kokkos::Cuda >();
- TestViewAggregateReduction< Kokkos::Cuda >();
-}
-
-TEST_F( cuda , scan )
-{
- TestScan< Kokkos::Cuda >::test_range( 1 , 1000 );
- TestScan< Kokkos::Cuda >( 1000000 );
- TestScan< Kokkos::Cuda >( 10000000 );
-
- TestScan< Kokkos::Cuda >( 0 );
- TestScan< Kokkos::Cuda >( 0 , 0 );
-
- Kokkos::Cuda::fence();
-}
-
-TEST_F( cuda , team_scan )
-{
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
-}
-
-TEST_F( cuda , memory_pool )
-{
-// typedef Kokkos::CudaUVMSpace device_type;
- typedef Kokkos::Cuda device_type;
-
- bool val = TestMemoryPool::test_mempool< device_type >( 128, 128000000 );
- ASSERT_TRUE( val );
-
- Kokkos::Cuda::fence();
-
- TestMemoryPool::test_mempool2< device_type >( 64, 4, 100000, 200000 );
-
- Kokkos::Cuda::fence();
-
- TestMemoryPool::test_memory_exhaustion< Kokkos::Cuda >();
-
- Kokkos::Cuda::fence();
-}
-
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( cuda , template_meta_functions )
-{
- TestTemplateMetaFunctions<int, Kokkos::Cuda >();
-}
-
-//----------------------------------------------------------------------------
-
-namespace Test {
-
-TEST_F( cuda , reduction_deduction )
-{
- TestCXX11::test_reduction_deduction< Kokkos::Cuda >();
-}
-
-TEST_F( cuda , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(10) ) );
-}
-
-TEST_F( cuda, triple_nested_parallelism )
-{
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 16 , 16 );
-}
-
-}
-
-//----------------------------------------------------------------------------
-
-#if defined( KOKKOS_ENABLE_TASKPOLICY )
-
-TEST_F( cuda , task_fib )
-{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskPolicy::TestFib< Kokkos::Cuda >::run(i, (i+1)*1000000 );
- }
-}
-
-TEST_F( cuda , task_depend )
-{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskPolicy::TestTaskDependence< Kokkos::Cuda >::run(i);
- }
-}
-
-TEST_F( cuda , task_team )
-{
- //TestTaskPolicy::TestTaskTeam< Kokkos::Cuda >::run(1000);
- TestTaskPolicy::TestTaskTeam< Kokkos::Cuda >::run(104);
- TestTaskPolicy::TestTaskTeamValue< Kokkos::Cuda >::run(1000);
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( cuda , old_task_policy )
-{
- TestTaskPolicy::test_task_dep< Kokkos::Cuda >( 10 );
-
- for ( long i = 0 ; i < 15 ; ++i ) {
- // printf("TestTaskPolicy::test_fib< Kokkos::Cuda >(%d);\n",i);
- TestTaskPolicy::test_fib< Kokkos::Cuda >(i,4096);
- }
- for ( long i = 0 ; i < 35 ; ++i ) {
- // printf("TestTaskPolicy::test_fib2< Kokkos::Cuda >(%d);\n",i);
- TestTaskPolicy::test_fib2< Kokkos::Cuda >(i,4096);
- }
-}
-
-TEST_F( cuda , old_task_team )
-{
- TestTaskPolicy::test_task_team< Kokkos::Cuda >(1000);
-}
-
-TEST_F( cuda , old_task_latch )
-{
- TestTaskPolicy::test_latch< Kokkos::Cuda >(10);
- TestTaskPolicy::test_latch< Kokkos::Cuda >(1000);
-}
-
-#endif // #if defined( KOKKOS_ENABLE_TASKPOLICY )
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
new file mode 100644
index 000000000..80de6618e
--- /dev/null
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
@@ -0,0 +1,189 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#define TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
+#include <cuda/TestCuda.hpp>
+
+namespace Test {
+
+TEST_F( cuda , init ) {
+ ;
+}
+
+TEST_F( cuda , md_range ) {
+ TestMDRange_2D< Kokkos::Cuda >::test_for2(100,100);
+
+ TestMDRange_3D< Kokkos::Cuda >::test_for3(100,100,100);
+}
+
+TEST_F( cuda, policy_construction) {
+ TestRangePolicyConstruction< Kokkos::Cuda >();
+ TestTeamPolicyConstruction< Kokkos::Cuda >();
+}
+
+TEST_F( cuda , range_tag )
+{
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
+
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
+
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
+
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
+
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
+ TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
+}
+
+
+//----------------------------------------------------------------------------
+
+TEST_F( cuda , compiler_macros )
+{
+ ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Cuda >() ) );
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( cuda , memory_pool )
+{
+ bool val = TestMemoryPool::test_mempool< Kokkos::Cuda >( 128, 128000000 );
+ ASSERT_TRUE( val );
+
+ TestMemoryPool::test_mempool2< Kokkos::Cuda >( 64, 4, 1000000, 2000000 );
+
+ TestMemoryPool::test_memory_exhaustion< Kokkos::Cuda >();
+}
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+TEST_F( cuda , task_fib )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Cuda >::run(i, (i+1)*(i+1)*10000 );
+ }
+}
+
+TEST_F( cuda , task_depend )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Cuda >::run(i);
+ }
+}
+
+TEST_F( cuda , task_team )
+{
+ TestTaskScheduler::TestTaskTeam< Kokkos::Cuda >::run(1000);
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Cuda >::run(1000); //put back after testing
+}
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA )
+TEST_F( cuda , cxx11 )
+{
+ if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Cuda >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(1) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(2) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(3) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(4) ) );
+ }
+}
+#endif
+
+TEST_F( cuda, tile_layout )
+{
+ TestTile::test< Kokkos::Cuda , 1 , 1 >( 1 , 1 );
+ TestTile::test< Kokkos::Cuda , 1 , 1 >( 2 , 3 );
+ TestTile::test< Kokkos::Cuda , 1 , 1 >( 9 , 10 );
+
+ TestTile::test< Kokkos::Cuda , 2 , 2 >( 1 , 1 );
+ TestTile::test< Kokkos::Cuda , 2 , 2 >( 2 , 3 );
+ TestTile::test< Kokkos::Cuda , 2 , 2 >( 4 , 4 );
+ TestTile::test< Kokkos::Cuda , 2 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::Cuda , 2 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::Cuda , 4 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::Cuda , 4 , 4 >( 1 , 1 );
+ TestTile::test< Kokkos::Cuda , 4 , 4 >( 4 , 4 );
+ TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 11 );
+
+ TestTile::test< Kokkos::Cuda , 8 , 8 >( 1 , 1 );
+ TestTile::test< Kokkos::Cuda , 8 , 8 >( 4 , 4 );
+ TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 9 );
+ TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 11 );
+}
+
+#if defined (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
+#if defined (KOKKOS_COMPILER_CLANG)
+TEST_F( cuda , dispatch )
+{
+ const int repeat = 100 ;
+ for ( int i = 0 ; i < repeat ; ++i ) {
+ for ( int j = 0 ; j < repeat ; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >(0,j)
+ , KOKKOS_LAMBDA( int ) {} );
+ }}
+}
+#endif
+#endif
+
+} // namespace Test
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
similarity index 85%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
index 61d2e3570..b9ab9fe72 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
@@ -1,56 +1,56 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda , reducers )
+{
+ TestReducers<int, Kokkos::Cuda>::execute_integer();
+ TestReducers<size_t, Kokkos::Cuda>::execute_integer();
+ TestReducers<double, Kokkos::Cuda>::execute_float();
+ TestReducers<Kokkos::complex<double>, Kokkos::Cuda>::execute_basic();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
new file mode 100644
index 000000000..c588d752d
--- /dev/null
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
@@ -0,0 +1,130 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <cuda/TestCuda.hpp>
+
+namespace Test {
+
+TEST_F( cuda, long_reduce) {
+ TestReduce< long , Kokkos::Cuda >( 0 );
+ TestReduce< long , Kokkos::Cuda >( 1000000 );
+}
+
+TEST_F( cuda, double_reduce) {
+ TestReduce< double , Kokkos::Cuda >( 0 );
+ TestReduce< double , Kokkos::Cuda >( 1000000 );
+}
+
+TEST_F( cuda, long_reduce_dynamic ) {
+ TestReduceDynamic< long , Kokkos::Cuda >( 0 );
+ TestReduceDynamic< long , Kokkos::Cuda >( 1000000 );
+}
+
+TEST_F( cuda, double_reduce_dynamic ) {
+ TestReduceDynamic< double , Kokkos::Cuda >( 0 );
+ TestReduceDynamic< double , Kokkos::Cuda >( 1000000 );
+}
+
+TEST_F( cuda, long_reduce_dynamic_view ) {
+ TestReduceDynamicView< long , Kokkos::Cuda >( 0 );
+ TestReduceDynamicView< long , Kokkos::Cuda >( 1000000 );
+}
+
+TEST_F( cuda , scan )
+{
+ TestScan< Kokkos::Cuda >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Cuda >( 0 );
+ TestScan< Kokkos::Cuda >( 100000 );
+ TestScan< Kokkos::Cuda >( 10000000 );
+ Kokkos::Cuda::fence();
+}
+
+#if 0
+TEST_F( cuda , scan_small )
+{
+ typedef TestScan< Kokkos::Cuda , Kokkos::Impl::CudaExecUseScanSmall > TestScanFunctor ;
+ for ( int i = 0 ; i < 1000 ; ++i ) {
+ TestScanFunctor( 10 );
+ TestScanFunctor( 10000 );
+ }
+ TestScanFunctor( 1000000 );
+ TestScanFunctor( 10000000 );
+
+ Kokkos::Cuda::fence();
+}
+#endif
+
+TEST_F( cuda , team_scan )
+{
+ TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+}
+
+TEST_F( cuda , team_long_reduce) {
+ TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+}
+
+TEST_F( cuda , team_double_reduce) {
+ TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+}
+
+TEST_F( cuda , reduction_deduction )
+{
+ TestCXX11::test_reduction_deduction< Kokkos::Cuda >();
+}
+
+} // namespace Test
+
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
new file mode 100644
index 000000000..f3cbc3b88
--- /dev/null
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
@@ -0,0 +1,399 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <cuda/TestCuda.hpp>
+
+namespace Test {
+
+__global__
+void test_abort()
+{
+ Kokkos::abort("test_abort");
+}
+
+__global__
+void test_cuda_spaces_int_value( int * ptr )
+{
+ if ( *ptr == 42 ) { *ptr = 2 * 42 ; }
+}
+
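+// space_access pins down, at compile time, the assignable/accessible matrix
+// between HostSpace and the three CUDA memory spaces (CudaSpace, CudaUVMSpace,
+// CudaHostPinnedSpace), plus the corresponding HostMirror space mappings.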
+TEST_F( cuda , space_access )
+{
+ //--------------------------------------
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+
+ //--------------------------------------
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace >::accessible , "" );
+
+ //--------------------------------------
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+
+ //--------------------------------------
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace >::assignable , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+
+ static_assert(
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+
+ //--------------------------------------
+
+ static_assert(
+ ! Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::HostSpace >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaSpace >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaUVMSpace >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+
+ static_assert(
+ ! Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaSpace >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+
+
+ static_assert(
+ std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space
+ , Kokkos::HostSpace >::value , "" );
+
+ static_assert(
+ std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaUVMSpace >::Space
+ , Kokkos::Device< Kokkos::HostSpace::execution_space
+ , Kokkos::CudaUVMSpace > >::value , "" );
+
+ static_assert(
+ std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaHostPinnedSpace >::Space
+ , Kokkos::CudaHostPinnedSpace >::value , "" );
+
+ static_assert(
+ std::is_same< Kokkos::Device< Kokkos::HostSpace::execution_space
+ , Kokkos::CudaUVMSpace >
+ , Kokkos::Device< Kokkos::HostSpace::execution_space
+ , Kokkos::CudaUVMSpace > >::value , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility
+ < Kokkos::Impl::HostMirror< Kokkos::Cuda >::Space
+ , Kokkos::HostSpace
+ >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility
+ < Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space
+ , Kokkos::HostSpace
+ >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility
+ < Kokkos::Impl::HostMirror< Kokkos::CudaUVMSpace >::Space
+ , Kokkos::HostSpace
+ >::accessible , "" );
+
+ static_assert(
+ Kokkos::Impl::SpaceAccessibility
+ < Kokkos::Impl::HostMirror< Kokkos::CudaHostPinnedSpace >::Space
+ , Kokkos::HostSpace
+ >::accessible , "" );
+}
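// Editorial sketch (not part of this patch): the HostMirror assertions above
// are what make the usual create_mirror_view / deep_copy staging pattern work.
// A CudaSpace view gets a HostSpace mirror plus an explicit copy; for a
// CudaUVMSpace view the mirror can alias the same allocation, so no host-side
// copy is required. Function and label names below are illustrative only.
void example_mirror_staging()
{
  Kokkos::View< double* , Kokkos::CudaSpace > d( "d" , 16 );

  // Mirror lives in Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space,
  // i.e. HostSpace, per the static_assert above.
  auto h = Kokkos::create_mirror_view( d );

  for ( size_t i = 0 ; i < h.dimension_0() ; ++i ) { h(i) = double(i); }

  Kokkos::deep_copy( d , h );   // explicit host -> device copy
}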
+
+TEST_F( cuda, uvm )
+{
+ if ( Kokkos::CudaUVMSpace::available() ) {
+
+ int * uvm_ptr = (int*) Kokkos::kokkos_malloc< Kokkos::CudaUVMSpace >("uvm_ptr",sizeof(int));
+
+ *uvm_ptr = 42 ;
+
+ Kokkos::Cuda::fence();
+ test_cuda_spaces_int_value<<<1,1>>>(uvm_ptr);
+ Kokkos::Cuda::fence();
+
+ EXPECT_EQ( *uvm_ptr, int(2*42) );
+
+ Kokkos::kokkos_free< Kokkos::CudaUVMSpace >(uvm_ptr );
+
+ }
+}
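// Editorial sketch (not part of this patch): the same host-write /
// device-update / host-read round trip as the test above, expressed with a
// rank-0 View so the UVM allocation is managed automatically. Assumes CUDA
// extended lambdas (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA) are enabled.
void example_uvm_round_trip()
{
  Kokkos::View< int , Kokkos::CudaUVMSpace > v( "v" );

  v() = 42 ;                // host write to UVM memory
  Kokkos::Cuda::fence();    // host and device must not touch UVM concurrently

  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0 , 1 ) ,
    KOKKOS_LAMBDA( const int ) { v() = 2 * v(); } );

  Kokkos::Cuda::fence();    // make the device update visible to the host
  // v() == 84 here
}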
+
+TEST_F( cuda, uvm_num_allocs )
+{
+ // The max number of uvm allocations allowed is 65536
+ #define MAX_NUM_ALLOCS 65536
+
+ if ( Kokkos::CudaUVMSpace::available() ) {
+
+ struct TestMaxUVMAllocs {
+
+ using view_type = Kokkos::View< double* , Kokkos::CudaUVMSpace >;
+ using view_of_view_type = Kokkos::View< view_type[ MAX_NUM_ALLOCS ]
+ , Kokkos::CudaUVMSpace >;
+
+ TestMaxUVMAllocs()
+ : view_allocs_test("view_allocs_test")
+ {
+
+ for ( auto i = 0; i < MAX_NUM_ALLOCS ; ++i ) {
+
+ // Kokkos will throw a runtime exception if an attempt is made to
+ // allocate more than the maximum number of uvm allocations
+
+ // In this test the limit is hit at i = MAX_NUM_ALLOCS - 1, because the
+ // 'outer' view itself counts as one UVM allocation, leaving only
+ // 65535 slots for the inner views, i.e. i in [0, 65535)
+
+ // The test will catch the exception thrown in this case and continue
+
+ if ( i == ( MAX_NUM_ALLOCS - 1) ) {
+ EXPECT_ANY_THROW( { view_allocs_test(i) = view_type("inner_view",1); } ) ;
+ }
+ else {
+ if(i<MAX_NUM_ALLOCS - 1000) {
+ EXPECT_NO_THROW( { view_allocs_test(i) = view_type("inner_view",1); } ) ;
+ } else { // This might or might not throw depending on compilation options.
+ try {
+ view_allocs_test(i) = view_type("inner_view",1);
+ }
+ catch (...) {}
+ }
+ }
+
+ } //end allocation for loop
+
+ for ( auto i = 0; i < MAX_NUM_ALLOCS -1; ++i ) {
+
+ view_allocs_test(i) = view_type();
+
+ } //end deallocation for loop
+
+ view_allocs_test = view_of_view_type(); // deallocate the view of views
+ }
+
+ // Member
+ view_of_view_type view_allocs_test ;
+ } ;
+
+ // trigger the test via the TestMaxUVMAllocs constructor
+ TestMaxUVMAllocs() ;
+
+ }
+ #undef MAX_NUM_ALLOCS
+}
+
+template< class MemSpace , class ExecSpace >
+struct TestViewCudaAccessible {
+
+ enum { N = 1000 };
+
+ using V = Kokkos::View<double*,MemSpace> ;
+
+ V m_base ;
+
+ struct TagInit {};
+ struct TagTest {};
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const TagTest & , const int i , long & error_count ) const
+ { if ( m_base[i] != i + 1 ) ++error_count ; }
+
+ TestViewCudaAccessible()
+ : m_base("base",N)
+ {}
+
+ static void run()
+ {
+ TestViewCudaAccessible self ;
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename MemSpace::execution_space , TagInit >(0,N) , self );
+ MemSpace::execution_space::fence();
+ // The next access comes from a different execution space, so the prior kernel must complete first.
+ long error_count = -1 ;
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagTest >(0,N) , self , error_count );
+ EXPECT_EQ( error_count , 0 );
+ }
+};
+
+TEST_F( cuda , impl_view_accessible )
+{
+ TestViewCudaAccessible< Kokkos::CudaSpace , Kokkos::Cuda >::run();
+
+ TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >::run();
+
+ TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >::run();
+}
+
+template< class MemSpace >
+struct TestViewCudaTexture {
+
+ enum { N = 1000 };
+
+ using V = Kokkos::View<double*,MemSpace> ;
+ using T = Kokkos::View<const double*, MemSpace, Kokkos::MemoryRandomAccess > ;
+
+ V m_base ;
+ T m_tex ;
+
+ struct TagInit {};
+ struct TagTest {};
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const TagTest & , const int i , long & error_count ) const
+ { if ( m_tex[i] != i + 1 ) ++error_count ; }
+
+ TestViewCudaTexture()
+ : m_base("base",N)
+ , m_tex( m_base )
+ {}
+
+ static void run()
+ {
+ EXPECT_TRUE( ( std::is_same< typename V::reference_type
+ , double &
+ >::value ) );
+
+ EXPECT_TRUE( ( std::is_same< typename T::reference_type
+ , const double
+ >::value ) );
+
+ EXPECT_TRUE( V::reference_type_is_lvalue_reference ); // An ordinary view
+ EXPECT_FALSE( T::reference_type_is_lvalue_reference ); // Texture fetch returns by value
+
+ TestViewCudaTexture self ;
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda , TagInit >(0,N) , self );
+ long error_count = -1 ;
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda , TagTest >(0,N) , self , error_count );
+ EXPECT_EQ( error_count , 0 );
+ }
+};
+
+
+TEST_F( cuda , impl_view_texture )
+{
+ TestViewCudaTexture< Kokkos::CudaSpace >::run();
+ TestViewCudaTexture< Kokkos::CudaUVMSpace >::run();
+}
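// Editorial sketch (not part of this patch): the texture-fetch path exercised
// above is obtained in user code by aliasing an existing view through a const,
// MemoryRandomAccess view, exactly as TestViewCudaTexture does with m_tex.
// Assumes CUDA extended lambdas are enabled; names are illustrative only.
double example_random_access_sum( const Kokkos::View< double* , Kokkos::CudaSpace > & x )
{
  // Shares x's allocation; reads go through the read-only (texture) cache path
  // and, as asserted above, return by value rather than by lvalue reference.
  Kokkos::View< const double* , Kokkos::CudaSpace , Kokkos::MemoryRandomAccess > xr( x );

  double sum = 0 ;
  Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda >( 0 , xr.dimension_0() ) ,
    KOKKOS_LAMBDA( const int i , double & t ) { t += xr(i); } , sum );
  return sum ;
}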
+
+} // namespace test
+
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
similarity index 62%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
index c15f81223..fd8a647ef 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
@@ -1,76 +1,92 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#include <gtest/gtest.h>
+namespace Test {
+
+TEST_F( cuda, view_subview_auto_1d_left ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Cuda >();
+}
+
+TEST_F( cuda, view_subview_auto_1d_right ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Cuda >();
+}
-#include <Kokkos_Core.hpp>
+TEST_F( cuda, view_subview_auto_1d_stride ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Cuda >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+TEST_F( cuda, view_subview_assign_strided ) {
+ TestViewSubview::test_1d_strided_assignment< Kokkos::Cuda >();
+}
-#include <TestReduce.hpp>
+TEST_F( cuda, view_subview_left_0 ) {
+ TestViewSubview::test_left_0< Kokkos::CudaUVMSpace >();
+}
+TEST_F( cuda, view_subview_left_1 ) {
+ TestViewSubview::test_left_1< Kokkos::CudaUVMSpace >();
+}
-namespace Test {
+TEST_F( cuda, view_subview_left_2 ) {
+ TestViewSubview::test_left_2< Kokkos::CudaUVMSpace >();
+}
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
+TEST_F( cuda, view_subview_left_3 ) {
+ TestViewSubview::test_left_3< Kokkos::CudaUVMSpace >();
+}
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+TEST_F( cuda, view_subview_right_0 ) {
+ TestViewSubview::test_right_0< Kokkos::CudaUVMSpace >();
+}
+TEST_F( cuda, view_subview_right_1 ) {
+ TestViewSubview::test_right_1< Kokkos::CudaUVMSpace >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( cuda, view_subview_right_3 ) {
+ TestViewSubview::test_right_3< Kokkos::CudaUVMSpace >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
similarity index 73%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
index c15f81223..053fcfc20 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
@@ -1,76 +1,60 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
-
-#include <TestReduce.hpp>
-
+#include <cuda/TestCuda.hpp>
namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
-
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
-
+TEST_F( cuda, view_subview_layoutleft_to_layoutleft) {
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( cuda, view_subview_layoutright_to_layoutright) {
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
index 86bc94ab0..4c5f2ef72 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_subview_1d_assign ) {
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
index 86bc94ab0..aee6f1730 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_subview_1d_assign_atomic ) {
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
index 61d2e3570..2ef48c686 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_1d_assign_randomaccess ) {
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
index 86bc94ab0..aec123ac2 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_subview_2d_from_3d ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
index 61d2e3570..e8ad23199 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_2d_from_3d_atomic ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
index 61d2e3570..e86b4513f 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_2d_from_3d_randomaccess ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
index 86bc94ab0..ad9dcc0fd 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_subview_3d_from_5d_left ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
index 61d2e3570..f97d97e59 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_3d_from_5d_left_atomic ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
index 61d2e3570..2a07f28f8 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_3d_from_5d_left_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
index 86bc94ab0..3c51d9420 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_subview_3d_from_5d_right ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
index 61d2e3570..835caa7b8 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_3d_from_5d_right_atomic ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
index 61d2e3570..53bd5eee2 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda, view_subview_3d_from_5d_right_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
new file mode 100644
index 000000000..e4348319f
--- /dev/null
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
@@ -0,0 +1,12 @@
+#include<cuda/TestCuda_SubView_c01.cpp>
+#include<cuda/TestCuda_SubView_c02.cpp>
+#include<cuda/TestCuda_SubView_c03.cpp>
+#include<cuda/TestCuda_SubView_c04.cpp>
+#include<cuda/TestCuda_SubView_c05.cpp>
+#include<cuda/TestCuda_SubView_c06.cpp>
+#include<cuda/TestCuda_SubView_c07.cpp>
+#include<cuda/TestCuda_SubView_c08.cpp>
+#include<cuda/TestCuda_SubView_c09.cpp>
+#include<cuda/TestCuda_SubView_c10.cpp>
+#include<cuda/TestCuda_SubView_c11.cpp>
+#include<cuda/TestCuda_SubView_c12.cpp>
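// Editorial note (not part of this patch): this translation unit only
// re-aggregates the twelve TestCuda_SubView_c* pieces above into one object;
// splitting the subview tests this way is presumably a compile-time / memory
// trade-off for nvcc, not a functional change.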
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
new file mode 100644
index 000000000..800a458af
--- /dev/null
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
@@ -0,0 +1,120 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <cuda/TestCuda.hpp>
+
+namespace Test {
+
+TEST_F( cuda , team_tag )
+{
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
+ TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+}
+
+TEST_F( cuda , team_shared_request) {
+ TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+// This test requests too much L0 scratch
+//TEST_F( cuda, team_scratch_request) {
+// TestScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
+// TestScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+//}
+
+#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
+TEST_F( cuda , team_lambda_shared_request) {
+ TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
+ TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
+ TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+#endif
+
+TEST_F( cuda, shmem_size) {
+ TestShmemSize< Kokkos::Cuda >();
+}
+
+TEST_F( cuda, multi_level_scratch) {
+ TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+TEST_F( cuda , team_vector )
+{
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(0) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(1) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(2) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(3) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(4) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(5) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(6) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(7) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(8) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(9) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(10) ) );
+}
+
+TEST_F( cuda, triple_nested_parallelism )
+{
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 32 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 16 , 16 );
+}
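// Editorial sketch (not part of this patch): the hierarchical parallelism the
// team tests above exercise - a league of teams, a thread range per team, and
// a vector range per thread. League/team/vector sizes are illustrative only;
// assumes CUDA extended lambdas are enabled.
void example_team_vector( const int league_size )
{
  typedef Kokkos::TeamPolicy< Kokkos::Cuda > policy_type ;
  typedef policy_type::member_type           member_type ;

  Kokkos::parallel_for( policy_type( league_size , Kokkos::AUTO , 32 ) ,
    KOKKOS_LAMBDA( const member_type & team )
  {
    // One iteration range split across the threads of this team.
    Kokkos::parallel_for( Kokkos::TeamThreadRange( team , 16 ) , [&]( const int i )
    {
      double s = 0 ;
      // Innermost range mapped onto the vector lanes of the thread.
      Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team , 32 ) ,
        [&]( const int j , double & t ) { t += double( i * j ); } , s );
    });
  });
}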
+
+
+} // namespace test
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
similarity index 84%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
index 61d2e3570..c01ca1c14 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
@@ -1,56 +1,59 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
+namespace Test {
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+TEST_F( cuda , impl_view_mapping_a ) {
+ test_view_mapping< Kokkos::CudaSpace >();
+ test_view_mapping_operator< Kokkos::CudaSpace >();
+}
+TEST_F( cuda , view_of_class )
+{
+ TestViewMappingClassValue< Kokkos::CudaSpace >::run();
+ TestViewMappingClassValue< Kokkos::CudaUVMSpace >::run();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
index 61d2e3570..8e821ada0 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
@@ -1,56 +1,53 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( cuda , impl_view_mapping_d ) {
+ test_view_mapping< Kokkos::CudaHostPinnedSpace >();
+ test_view_mapping_operator< Kokkos::CudaHostPinnedSpace >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
index 86bc94ab0..cf29a68e9 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
@@ -1,55 +1,53 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda , impl_view_mapping_c ) {
+ test_view_mapping< Kokkos::CudaUVMSpace >();
+ test_view_mapping_operator< Kokkos::CudaUVMSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
similarity index 55%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
index c15f81223..db14b5158 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
@@ -1,76 +1,112 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#include <gtest/gtest.h>
+namespace Test {
-#include <Kokkos_Core.hpp>
+TEST_F( cuda , view_nested_view )
+{
+ ::Test::view_nested_view< Kokkos::Cuda >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
-#include <TestReduce.hpp>
+TEST_F( cuda , view_remap )
+{
+ enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-namespace Test {
+ typedef Kokkos::View< double*[N1][N2][N3] ,
+ Kokkos::LayoutRight ,
+ Kokkos::CudaUVMSpace > output_type ;
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
+ typedef Kokkos::View< int**[N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::CudaUVMSpace > input_type ;
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+ typedef Kokkos::View< int*[N0][N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::CudaUVMSpace > diff_type ;
+ output_type output( "output" , N0 );
+ input_type input ( "input" , N0 , N1 );
+ diff_type diff ( "diff" , N0 );
+
+ Kokkos::fence();
+ int value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ input(i0,i1,i2,i3) = ++value ;
+ }}}}
+ Kokkos::fence();
+
+ // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
+ Kokkos::deep_copy( output , input );
+
+ Kokkos::fence();
+ value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ ++value ;
+ ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
+ }}}}
+ Kokkos::fence();
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( cuda , view_aggregate )
+{
+ TestViewAggregate< Kokkos::Cuda >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( cuda , template_meta_functions )
+{
+ TestTemplateMetaFunctions<int, Kokkos::Cuda >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
similarity index 73%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
index c15f81223..07d425647 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
@@ -1,76 +1,63 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
-
-#include <TestReduce.hpp>
-
+#include <cuda/TestCuda.hpp>
namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
-
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
-
+TEST_F( cuda , impl_shared_alloc ) {
+ test_shared_alloc< Kokkos::CudaSpace , Kokkos::HostSpace::execution_space >();
+ test_shared_alloc< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >();
+ test_shared_alloc< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( cuda , impl_view_mapping_b ) {
+ test_view_mapping_subview< Kokkos::CudaSpace >();
+ test_view_mapping_subview< Kokkos::CudaUVMSpace >();
+ test_view_mapping_subview< Kokkos::CudaHostPinnedSpace >();
+ TestViewMappingAtomic< Kokkos::CudaSpace >::run();
+ TestViewMappingAtomic< Kokkos::CudaUVMSpace >::run();
+ TestViewMappingAtomic< Kokkos::CudaHostPinnedSpace >::run();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
similarity index 82%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
index 61d2e3570..34721f02d 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
@@ -1,56 +1,55 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
+namespace Test {
-using Kokkos::Experimental::tile_subview ;
+TEST_F( cuda, view_api_a) {
+ typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess > > view_texture_managed ;
+ typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess | Kokkos::Unmanaged > > view_texture_unmanaged ;
+ TestViewAPI< double , Kokkos::Cuda >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
index 86bc94ab0..abbcf3bf8 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_api_b) {
+ TestViewAPI< double , Kokkos::CudaUVMSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
index 86bc94ab0..989964203 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <cuda/TestCuda.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( cuda, view_api_c) {
+ TestViewAPI< double , Kokkos::CudaHostPinnedSpace >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/unit_test/TestOpenMP_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
similarity index 62%
copy from lib/kokkos/core/unit_test/TestOpenMP_a.cpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
index 64eac6680..01324a1ee 100644
--- a/lib/kokkos/core/unit_test/TestOpenMP_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
@@ -1,150 +1,116 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
+#ifndef KOKKOS_TEST_OPENMP_HPP
+#define KOKKOS_TEST_OPENMP_HPP
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
-//----------------------------------------------------------------------------
+#include <TestTile.hpp>
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
+//----------------------------------------------------------------------------
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
+
+#include <TestViewAPI.hpp>
+#include <TestViewOfClass.hpp>
+#include <TestViewSubview.hpp>
+#include <TestAtomic.hpp>
+#include <TestAtomicOperations.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
#include <TestCompilerMacros.hpp>
+#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
#include <TestTemplateMetaFunctions.hpp>
#include <TestPolicyConstruction.hpp>
+#include <TestMDRange.hpp>
namespace Test {
class openmp : public ::testing::Test {
protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
+ static void SetUpTestCase()
+ {
+ const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
+ const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
+ const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-TEST_F( openmp, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::OpenMP >();
-}
+ const unsigned threads_count = std::max( 1u , numa_count ) *
+ std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
-TEST_F( openmp, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::OpenMP >();
-}
+ Kokkos::OpenMP::initialize( threads_count );
+ Kokkos::OpenMP::print_configuration( std::cout , true );
+ srand(10231);
+ }
-TEST_F( openmp, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::OpenMP >();
-}
+ static void TearDownTestCase()
+ {
+ Kokkos::OpenMP::finalize();
-TEST_F( openmp, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::OpenMP >();
-}
+ omp_set_num_threads(1);
-TEST_F( openmp, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_2d_from_3d ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP >();
-}
+ ASSERT_EQ( 1 , omp_get_max_threads() );
+ }
+};
-TEST_F( openmp, view_subview_2d_from_5d ) {
- TestViewSubview::test_2d_subview_5d< Kokkos::OpenMP >();
}
-
-} // namespace test
-
+#endif
diff --git a/lib/kokkos/core/unit_test/TestOpenMP.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
similarity index 80%
rename from lib/kokkos/core/unit_test/TestOpenMP.cpp
rename to lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
index 6e8fc4517..91722c849 100644
--- a/lib/kokkos/core/unit_test/TestOpenMP.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
@@ -1,262 +1,168 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-#include <TestAtomicOperations.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemoryPool.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-#include <TestPolicyConstruction.hpp>
-
-#include <TestMDRange.hpp>
+#include <openmp/TestOpenMP.hpp>
namespace Test {
-class openmp : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
- const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-
- const unsigned threads_count = std::max( 1u , numa_count ) *
- std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
-
- Kokkos::OpenMP::initialize( threads_count );
- Kokkos::OpenMP::print_configuration( std::cout , true );
- srand(10231);
- }
-
- static void TearDownTestCase()
- {
- Kokkos::OpenMP::finalize();
-
- omp_set_num_threads(1);
-
- ASSERT_EQ( 1 , omp_get_max_threads() );
- }
-};
-
-
-TEST_F( openmp , md_range ) {
- TestMDRange_2D< Kokkos::OpenMP >::test_for2(100,100);
-
- TestMDRange_3D< Kokkos::OpenMP >::test_for3(100,100,100);
-}
-
-TEST_F( openmp , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, policy_construction) {
- TestRangePolicyConstruction< Kokkos::OpenMP >();
- TestTeamPolicyConstruction< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp , impl_view_mapping ) {
- test_view_mapping< Kokkos::OpenMP >();
- test_view_mapping_subview< Kokkos::OpenMP >();
- test_view_mapping_operator< Kokkos::OpenMP >();
- TestViewMappingAtomic< Kokkos::OpenMP >::run();
-}
-
-TEST_F( openmp, view_impl) {
- test_view_impl< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_api) {
- TestViewAPI< double , Kokkos::OpenMP >();
-}
-
-TEST_F( openmp , view_nested_view )
-{
- ::Test::view_nested_view< Kokkos::OpenMP >();
-}
-
TEST_F( openmp , atomics )
{
const int loop_count = 1e4 ;
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,3) ) );
ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,1) ) );
ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,2) ) );
ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,3) ) );
}
TEST_F( openmp , atomic_operations )
{
const int start = 1; //Avoid zero for division
const int end = 11;
for (int i = start; i < end; ++i)
{
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 4 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 1 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 2 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 3 ) ) );
ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 4 ) ) );
}
}
} // namespace test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
new file mode 100644
index 000000000..c69103635
--- /dev/null
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
@@ -0,0 +1,189 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <openmp/TestOpenMP.hpp>
+
+namespace Test {
+
+TEST_F( openmp , init ) {
+ ;
+}
+
+TEST_F( openmp , md_range ) {
+ TestMDRange_2D< Kokkos::OpenMP >::test_for2(100,100);
+
+ TestMDRange_3D< Kokkos::OpenMP >::test_for3(100,100,100);
+}
+
+TEST_F( openmp, policy_construction) {
+ TestRangePolicyConstruction< Kokkos::OpenMP >();
+ TestTeamPolicyConstruction< Kokkos::OpenMP >();
+}
+
+TEST_F( openmp , range_tag )
+{
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(0);
+
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
+
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(3);
+
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
+
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
+ TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+}
+
+
+//----------------------------------------------------------------------------
+
+TEST_F( openmp , compiler_macros )
+{
+ ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::OpenMP >() ) );
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( openmp , memory_pool )
+{
+ bool val = TestMemoryPool::test_mempool< Kokkos::OpenMP >( 128, 128000000 );
+ ASSERT_TRUE( val );
+
+ TestMemoryPool::test_mempool2< Kokkos::OpenMP >( 64, 4, 1000000, 2000000 );
+
+ TestMemoryPool::test_memory_exhaustion< Kokkos::OpenMP >();
+}
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+TEST_F( openmp , task_fib )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::OpenMP >::run(i, (i+1)*(i+1)*10000 );
+ }
+}
+
+TEST_F( openmp , task_depend )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::OpenMP >::run(i);
+ }
+}
+
+TEST_F( openmp , task_team )
+{
+ TestTaskScheduler::TestTaskTeam< Kokkos::OpenMP >::run(1000);
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::OpenMP >::run(1000); //put back after testing
+}
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP )
+TEST_F( openmp , cxx11 )
+{
+ if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::OpenMP >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(1) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(2) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(3) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(4) ) );
+ }
+}
+#endif
+
+TEST_F( openmp, tile_layout )
+{
+ TestTile::test< Kokkos::OpenMP , 1 , 1 >( 1 , 1 );
+ TestTile::test< Kokkos::OpenMP , 1 , 1 >( 2 , 3 );
+ TestTile::test< Kokkos::OpenMP , 1 , 1 >( 9 , 10 );
+
+ TestTile::test< Kokkos::OpenMP , 2 , 2 >( 1 , 1 );
+ TestTile::test< Kokkos::OpenMP , 2 , 2 >( 2 , 3 );
+ TestTile::test< Kokkos::OpenMP , 2 , 2 >( 4 , 4 );
+ TestTile::test< Kokkos::OpenMP , 2 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::OpenMP , 2 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::OpenMP , 4 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::OpenMP , 4 , 4 >( 1 , 1 );
+ TestTile::test< Kokkos::OpenMP , 4 , 4 >( 4 , 4 );
+ TestTile::test< Kokkos::OpenMP , 4 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::OpenMP , 4 , 4 >( 9 , 11 );
+
+ TestTile::test< Kokkos::OpenMP , 8 , 8 >( 1 , 1 );
+ TestTile::test< Kokkos::OpenMP , 8 , 8 >( 4 , 4 );
+ TestTile::test< Kokkos::OpenMP , 8 , 8 >( 9 , 9 );
+ TestTile::test< Kokkos::OpenMP , 8 , 8 >( 9 , 11 );
+}
+
+
+TEST_F( openmp , dispatch )
+{
+ const int repeat = 100 ;
+ for ( int i = 0 ; i < repeat ; ++i ) {
+ for ( int j = 0 ; j < repeat ; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >(0,j)
+ , KOKKOS_LAMBDA( int ) {} );
+ }}
+}
+
+
+} // namespace test
+
diff --git a/lib/kokkos/core/unit_test/TestOpenMP_b.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
similarity index 50%
copy from lib/kokkos/core/unit_test/TestOpenMP_b.cpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
index 6cc247601..d41e1493e 100644
--- a/lib/kokkos/core/unit_test/TestOpenMP_b.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
@@ -1,185 +1,138 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemoryPool.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-#include <TestPolicyConstruction.hpp>
-
+#include <openmp/TestOpenMP.hpp>
namespace Test {
-class openmp : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-TEST_F( openmp , range_tag )
-{
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
-}
-
-TEST_F( openmp , team_tag )
-{
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
-}
-
TEST_F( openmp, long_reduce) {
+ TestReduce< long , Kokkos::OpenMP >( 0 );
TestReduce< long , Kokkos::OpenMP >( 1000000 );
}
TEST_F( openmp, double_reduce) {
+ TestReduce< double , Kokkos::OpenMP >( 0 );
TestReduce< double , Kokkos::OpenMP >( 1000000 );
}
+TEST_F( openmp , reducers )
+{
+ TestReducers<int, Kokkos::OpenMP>::execute_integer();
+ TestReducers<size_t, Kokkos::OpenMP>::execute_integer();
+ TestReducers<double, Kokkos::OpenMP>::execute_float();
+ TestReducers<Kokkos::complex<double>, Kokkos::OpenMP>::execute_basic();
+}
+
TEST_F( openmp, long_reduce_dynamic ) {
+ TestReduceDynamic< long , Kokkos::OpenMP >( 0 );
TestReduceDynamic< long , Kokkos::OpenMP >( 1000000 );
}
TEST_F( openmp, double_reduce_dynamic ) {
+ TestReduceDynamic< double , Kokkos::OpenMP >( 0 );
TestReduceDynamic< double , Kokkos::OpenMP >( 1000000 );
}
TEST_F( openmp, long_reduce_dynamic_view ) {
+ TestReduceDynamicView< long , Kokkos::OpenMP >( 0 );
TestReduceDynamicView< long , Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp , reducers )
+TEST_F( openmp , scan )
{
- TestReducers<int, Kokkos::OpenMP>::execute_integer();
- TestReducers<size_t, Kokkos::OpenMP>::execute_integer();
- TestReducers<double, Kokkos::OpenMP>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::OpenMP>::execute_basic();
+ TestScan< Kokkos::OpenMP >::test_range( 1 , 1000 );
+ TestScan< Kokkos::OpenMP >( 0 );
+ TestScan< Kokkos::OpenMP >( 100000 );
+ TestScan< Kokkos::OpenMP >( 10000000 );
+ Kokkos::OpenMP::fence();
+}
+
+#if 0
+TEST_F( openmp , scan_small )
+{
+ typedef TestScan< Kokkos::OpenMP , Kokkos::Impl::OpenMPExecUseScanSmall > TestScanFunctor ;
+ for ( int i = 0 ; i < 1000 ; ++i ) {
+ TestScanFunctor( 10 );
+ TestScanFunctor( 10000 );
+ }
+ TestScanFunctor( 1000000 );
+ TestScanFunctor( 10000000 );
+
+ Kokkos::OpenMP::fence();
}
+#endif
-TEST_F( openmp, team_long_reduce) {
+TEST_F( openmp , team_scan )
+{
+ TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+}
+
+TEST_F( openmp , team_long_reduce) {
+ TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( openmp, team_double_reduce) {
+TEST_F( openmp , team_double_reduce) {
+ TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( openmp, team_shared_request) {
- TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-
-TEST_F( openmp, team_scratch_request) {
- TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-
-#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
-TEST_F( openmp, team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
-}
-#endif
-
-TEST_F( openmp, shmem_size) {
- TestShmemSize< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp , reduction_deduction )
+{
+ TestCXX11::test_reduction_deduction< Kokkos::OpenMP >();
}
} // namespace test
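The hunk above shows the pattern repeated throughout this patch: each per-feature OpenMP test file drops its private copy of the gtest/Kokkos includes and the `openmp` fixture class and instead includes a shared `openmp/TestOpenMP.hpp`. That header is not part of this excerpt; the following is only a sketch of what it presumably collects, reconstructed from the boilerplate removed here (the include-guard name and the exact list of helper headers are assumptions, not taken from the patch):

#ifndef KOKKOS_TEST_OPENMP_HPP   // hypothetical guard name
#define KOKKOS_TEST_OPENMP_HPP

#include <gtest/gtest.h>

#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]        // host-side lambda capture, as in the removed code

#include <Kokkos_Core.hpp>

// Shared test helpers used by the per-feature translation units
// (subset shown; the removed includes list the full set).
#include <TestViewSubview.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestTeamVector.hpp>
#include <TestCXX11.hpp>

namespace Test {

// gtest fixture shared by all openmp test files; SetUpTestCase /
// TearDownTestCase are defined once elsewhere, presumably initializing
// and finalizing the Kokkos::OpenMP backend for the whole test case.
class openmp : public ::testing::Test {
protected:
  static void SetUpTestCase();
  static void TearDownTestCase();
};

} // namespace Test

#endif // KOKKOS_TEST_OPENMP_HPP

Splitting the former monolithic test file into per-feature translation units that all share one fixture header presumably keeps per-file compile time and compiler memory in check for these template-heavy tests, while the gtest test names stay unchanged.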
diff --git a/lib/kokkos/core/unit_test/TestOpenMP_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
similarity index 70%
copy from lib/kokkos/core/unit_test/TestOpenMP_a.cpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
index 64eac6680..9854417e4 100644
--- a/lib/kokkos/core/unit_test/TestOpenMP_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
@@ -1,150 +1,92 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemoryPool.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-#include <TestPolicyConstruction.hpp>
-
+#include <openmp/TestOpenMP.hpp>
namespace Test {
-class openmp : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
TEST_F( openmp, view_subview_auto_1d_left ) {
TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_auto_1d_right ) {
TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_auto_1d_stride ) {
TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_assign_strided ) {
TestViewSubview::test_1d_strided_assignment< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_left_0 ) {
TestViewSubview::test_left_0< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_left_1 ) {
TestViewSubview::test_left_1< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_left_2 ) {
TestViewSubview::test_left_2< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_left_3 ) {
TestViewSubview::test_left_3< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_right_0 ) {
TestViewSubview::test_right_0< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_right_1 ) {
TestViewSubview::test_right_1< Kokkos::OpenMP >();
}
TEST_F( openmp, view_subview_right_3 ) {
TestViewSubview::test_right_3< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_2d_from_3d ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_2d_from_5d ) {
- TestViewSubview::test_2d_subview_5d< Kokkos::OpenMP >();
-}
-
} // namespace test
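The subview tests kept above (test_auto_1d, test_left_*, test_right_*, test_1d_strided_assignment) exercise Kokkos' layout deduction for slices of multidimensional views. As a rough illustration of the kind of code they cover (the view name, extents, and indices below are made up, not taken from TestViewSubview.hpp):

#include <Kokkos_Core.hpp>

// Illustrative only: a row of a LayoutRight (row-major) matrix is contiguous,
// so Kokkos::subview deduces a rank-1 view with unit stride for it, while a
// column slice of the same matrix comes back as a strided rank-1 view.
void subview_sketch()
{
  Kokkos::View<double**, Kokkos::LayoutRight, Kokkos::OpenMP> a("a", 10, 20);

  auto row = Kokkos::subview(a, 3, Kokkos::ALL());   // rank-1, contiguous
  row(5) = 1.0;                                      // aliases a(3,5)

  auto col = Kokkos::subview(a, Kokkos::ALL(), 7);   // rank-1, strided
  col(2) = 2.0;                                      // aliases a(2,7)
}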
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
similarity index 72%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
index c15f81223..2aa1fc5c6 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
@@ -1,76 +1,60 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
-
-#include <TestReduce.hpp>
-
+#include <openmp/TestOpenMP.hpp>
namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
-
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
-
+TEST_F( openmp, view_subview_layoutleft_to_layoutleft) {
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( openmp, view_subview_layoutright_to_layoutright) {
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
index 86bc94ab0..1a6871cfc 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( openmp, view_subview_1d_assign ) {
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
index 86bc94ab0..b04edbb99 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( openmp, view_subview_1d_assign_atomic ) {
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
index 61d2e3570..765e23583 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_1d_assign_randomaccess ) {
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
index 86bc94ab0..9d8b62708 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( openmp, view_subview_2d_from_3d ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
index 61d2e3570..9c19cf0e5 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_2d_from_3d_atomic ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
index 61d2e3570..c1bdf7235 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_2d_from_3d_randomaccess ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
index 86bc94ab0..08a3b5a54 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( openmp, view_subview_3d_from_5d_left ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
index 61d2e3570..0864ebbda 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_3d_from_5d_left_atomic ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
index 61d2e3570..e38dfecbf 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_3d_from_5d_left_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
index 61d2e3570..b7e4683d2 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_3d_from_5d_right ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
index 61d2e3570..fc3e66fd4 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_3d_from_5d_right_atomic ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
index 61d2e3570..e21a13ee5 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( openmp, view_subview_3d_from_5d_right_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
new file mode 100644
index 000000000..9da159ab5
--- /dev/null
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
@@ -0,0 +1,12 @@
+#include<openmp/TestOpenMP_SubView_c01.cpp>
+#include<openmp/TestOpenMP_SubView_c02.cpp>
+#include<openmp/TestOpenMP_SubView_c03.cpp>
+#include<openmp/TestOpenMP_SubView_c04.cpp>
+#include<openmp/TestOpenMP_SubView_c05.cpp>
+#include<openmp/TestOpenMP_SubView_c06.cpp>
+#include<openmp/TestOpenMP_SubView_c07.cpp>
+#include<openmp/TestOpenMP_SubView_c08.cpp>
+#include<openmp/TestOpenMP_SubView_c09.cpp>
+#include<openmp/TestOpenMP_SubView_c10.cpp>
+#include<openmp/TestOpenMP_SubView_c11.cpp>
+#include<openmp/TestOpenMP_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/TestOpenMP_b.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
similarity index 52%
rename from lib/kokkos/core/unit_test/TestOpenMP_b.cpp
rename to lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
index 6cc247601..1539e30e1 100644
--- a/lib/kokkos/core/unit_test/TestOpenMP_b.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
@@ -1,185 +1,122 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Macros.hpp>
-#ifdef KOKKOS_LAMBDA
-#undef KOKKOS_LAMBDA
-#endif
-#define KOKKOS_LAMBDA [=]
-
-#include <Kokkos_Core.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestSharedAlloc.hpp>
-#include <TestViewMapping.hpp>
-
-#include <TestRange.hpp>
-#include <TestTeam.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestMemoryPool.hpp>
-
-
-#include <TestCXX11.hpp>
-#include <TestCXX11Deduction.hpp>
-#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
-#include <TestTemplateMetaFunctions.hpp>
-
-#include <TestPolicyConstruction.hpp>
-
+#include <openmp/TestOpenMP.hpp>
namespace Test {
-class openmp : public ::testing::Test {
-protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
-
-TEST_F( openmp , range_tag )
-{
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
-}
-
TEST_F( openmp , team_tag )
{
+ TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
}
-TEST_F( openmp, long_reduce) {
- TestReduce< long , Kokkos::OpenMP >( 1000000 );
-}
-
-TEST_F( openmp, double_reduce) {
- TestReduce< double , Kokkos::OpenMP >( 1000000 );
-}
-
-TEST_F( openmp, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::OpenMP >( 1000000 );
-}
-
-TEST_F( openmp, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::OpenMP >( 1000000 );
-}
-
-TEST_F( openmp, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::OpenMP >( 1000000 );
-}
-
-TEST_F( openmp , reducers )
-{
- TestReducers<int, Kokkos::OpenMP>::execute_integer();
- TestReducers<size_t, Kokkos::OpenMP>::execute_integer();
- TestReducers<double, Kokkos::OpenMP>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::OpenMP>::execute_basic();
-}
-
-TEST_F( openmp, team_long_reduce) {
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( openmp, team_double_reduce) {
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
-}
-
-TEST_F( openmp, team_shared_request) {
+TEST_F( openmp , team_shared_request) {
TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
}
TEST_F( openmp, team_scratch_request) {
TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
}
#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
-TEST_F( openmp, team_lambda_shared_request) {
+TEST_F( openmp , team_lambda_shared_request) {
TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
TEST_F( openmp, shmem_size) {
TestShmemSize< Kokkos::OpenMP >();
}
TEST_F( openmp, multi_level_scratch) {
TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
}
+TEST_F( openmp , team_vector )
+{
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(0) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(1) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(2) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(3) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(4) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(5) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(6) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(7) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(8) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(9) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(10) ) );
+}
+
+#ifdef KOKKOS_COMPILER_GNU
+#if ( KOKKOS_COMPILER_GNU == 472 )
+#define SKIP_TEST
+#endif
+#endif
+
+#ifndef SKIP_TEST
+TEST_F( openmp, triple_nested_parallelism )
+{
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 32 , 32 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 32 , 16 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 16 , 16 );
+}
+#endif
+
} // namespace test
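The OpenMP cases above (team_vector, triple_nested_parallelism) exercise Kokkos' hierarchical parallelism. As a minimal sketch of the underlying pattern, not part of the patch and not the test harness itself, the following assumes Kokkos::OpenMP has already been initialized; the function name team_reduce_example and its arguments are illustrative only.

#include <Kokkos_Core.hpp>

// Sketch only: a team-level reduction in the spirit of the tests above.
double team_reduce_example( const int nrow , const int ncol )
{
  typedef Kokkos::TeamPolicy< Kokkos::OpenMP > policy_type ;
  typedef policy_type::member_type member_type ;

  Kokkos::View< double** , Kokkos::OpenMP > a( "a" , nrow , ncol );
  Kokkos::deep_copy( a , 1.0 );

  double total = 0 ;
  Kokkos::parallel_reduce( policy_type( nrow , Kokkos::AUTO ) ,
    KOKKOS_LAMBDA( const member_type & team , double & update ) {
      const int row = team.league_rank();
      double row_sum = 0 ;
      // Nested reduction over the columns handled by this team.
      Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team , ncol ) ,
        [&]( const int col , double & inner ) { inner += a( row , col ); } ,
        row_sum );
      // The nested result is broadcast to all team members, so only one
      // member per team contributes it to the outer reduction.
      Kokkos::single( Kokkos::PerTeam( team ) , [&]() { update += row_sum ; } );
    } , total );

  return total ;  // expected: nrow * ncol
}

The single(PerTeam) guard keeps the team-broadcast partial result from being counted once per thread, which is the same correctness property the team reduction tests above check at scale.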
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
index 86bc94ab0..82cbf3ea1 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
@@ -1,55 +1,53 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( openmp , impl_view_mapping_a ) {
+ test_view_mapping< Kokkos::OpenMP >();
+ test_view_mapping_operator< Kokkos::OpenMP >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
similarity index 52%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
index c15f81223..b2d4f87fd 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
@@ -1,76 +1,121 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <openmp/TestOpenMP.hpp>
-#include <gtest/gtest.h>
+namespace Test {
-#include <Kokkos_Core.hpp>
+TEST_F( openmp , impl_shared_alloc ) {
+ test_shared_alloc< Kokkos::HostSpace , Kokkos::OpenMP >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+TEST_F( openmp , impl_view_mapping_b ) {
+ test_view_mapping_subview< Kokkos::OpenMP >();
+ TestViewMappingAtomic< Kokkos::OpenMP >::run();
+}
-#include <TestReduce.hpp>
+TEST_F( openmp, view_api) {
+ TestViewAPI< double , Kokkos::OpenMP >();
+}
+TEST_F( openmp , view_nested_view )
+{
+ ::Test::view_nested_view< Kokkos::OpenMP >();
+}
-namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+TEST_F( openmp , view_remap )
+{
+ enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
+ typedef Kokkos::View< double*[N1][N2][N3] ,
+ Kokkos::LayoutRight ,
+ Kokkos::OpenMP > output_type ;
+
+ typedef Kokkos::View< int**[N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::OpenMP > input_type ;
+
+ typedef Kokkos::View< int*[N0][N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::OpenMP > diff_type ;
+
+ output_type output( "output" , N0 );
+ input_type input ( "input" , N0 , N1 );
+ diff_type diff ( "diff" , N0 );
+
+ int value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ input(i0,i1,i2,i3) = ++value ;
+ }}}}
+
+ // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
+ Kokkos::deep_copy( output , input );
+
+ value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ ++value ;
+ ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
+ }}}}
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( openmp , view_aggregate )
+{
+ TestViewAggregate< Kokkos::OpenMP >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( openmp , template_meta_functions )
+{
+ TestTemplateMetaFunctions<int, Kokkos::OpenMP >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/unit_test/TestMemorySpaceTracking.hpp b/lib/kokkos/core/unit_test/serial/TestSerial.hpp
similarity index 63%
rename from lib/kokkos/core/unit_test/TestMemorySpaceTracking.hpp
rename to lib/kokkos/core/unit_test/serial/TestSerial.hpp
index 575f2f2c2..a966257fc 100644
--- a/lib/kokkos/core/unit_test/TestMemorySpaceTracking.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial.hpp
@@ -1,100 +1,102 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
+#ifndef KOKKOS_TEST_SERIALHPP
+#define KOKKOS_TEST_SERIALHPP
#include <gtest/gtest.h>
-#include <iostream>
-#include <Kokkos_Core.hpp>
+#include <Kokkos_Macros.hpp>
+#ifdef KOKKOS_LAMBDA
+#undef KOKKOS_LAMBDA
+#endif
+#define KOKKOS_LAMBDA [=]
-/*--------------------------------------------------------------------------*/
+#include <Kokkos_Core.hpp>
-namespace {
+#include <TestTile.hpp>
-template<class Arg1>
-class TestMemorySpace {
-public:
+//----------------------------------------------------------------------------
- typedef typename Arg1::memory_space MemorySpace;
- TestMemorySpace() { run_test(); }
+#include <TestSharedAlloc.hpp>
+#include <TestViewMapping.hpp>
- void run_test()
- {
-#if ! KOKKOS_USING_EXP_VIEW
+#include <TestViewAPI.hpp>
+#include <TestViewOfClass.hpp>
+#include <TestViewSubview.hpp>
+#include <TestAtomic.hpp>
+#include <TestAtomicOperations.hpp>
+#include <TestRange.hpp>
+#include <TestTeam.hpp>
+#include <TestReduce.hpp>
+#include <TestScan.hpp>
+#include <TestAggregate.hpp>
+#include <TestCompilerMacros.hpp>
+#include <TestTaskScheduler.hpp>
+#include <TestMemoryPool.hpp>
- Kokkos::View<int* ,Arg1> invalid;
- ASSERT_EQ(0u, invalid.tracker().ref_count() );
- {
- Kokkos::View<int* ,Arg1> a("A",10);
+#include <TestCXX11.hpp>
+#include <TestCXX11Deduction.hpp>
+#include <TestTeamVector.hpp>
+#include <TestTemplateMetaFunctions.hpp>
- ASSERT_EQ(1u, a.tracker().ref_count() );
+#include <TestPolicyConstruction.hpp>
- {
- Kokkos::View<int* ,Arg1> b = a;
- ASSERT_EQ(2u, b.tracker().ref_count() );
+#include <TestMDRange.hpp>
- Kokkos::View<int* ,Arg1> D("D",10);
- ASSERT_EQ(1u, D.tracker().ref_count() );
+namespace Test {
- {
- Kokkos::View<int* ,Arg1> E("E",10);
- ASSERT_EQ(1u, E.tracker().ref_count() );
- }
-
- ASSERT_EQ(2u, b.tracker().ref_count() );
- }
- ASSERT_EQ(1u, a.tracker().ref_count() );
+class serial : public ::testing::Test {
+protected:
+ static void SetUpTestCase()
+ {
+ Kokkos::HostSpace::execution_space::initialize();
+ }
+ static void TearDownTestCase()
+ {
+ Kokkos::HostSpace::execution_space::finalize();
}
-
-#endif
-
- }
};
}
-
-/*--------------------------------------------------------------------------*/
-
-
-
+#endif
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
new file mode 100644
index 000000000..6eec0683a
--- /dev/null
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
@@ -0,0 +1,168 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <serial/TestSerial.hpp>
+
+namespace Test {
+
+TEST_F( serial , atomics )
+{
+ const int loop_count = 1e6 ;
+
+ ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,3) ) );
+}
+
+TEST_F( serial , atomic_operations )
+{
+ const int start = 1; //Avoid zero for division
+ const int end = 11;
+ for (int i = start; i < end; ++i)
+ {
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 12) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 4 ) ) );
+ }
+
+}
+
+} // namespace test
+
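TestAtomic::Loop above drives contended read-modify-write updates through several scalar types. A minimal sketch of the same idea using Kokkos' atomic free functions, assuming the Serial backend is enabled and initialized (the helper name atomic_sum_example is illustrative only):

#include <Kokkos_Core.hpp>

// Sketch only: every iteration atomically increments one shared counter.
long atomic_sum_example( const int n )
{
  Kokkos::View< long , Kokkos::Serial > total( "total" );   // rank-0, zero-initialized

  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n ) ,
    KOKKOS_LAMBDA( const int i ) {
      Kokkos::atomic_add( & total() , (long) 1 );
    } );
  Kokkos::fence();

  return total() ;  // expected: n
}

On Kokkos::Serial there is no real contention, but the identical code runs unchanged on the threaded backends, which is what makes the per-backend test split in this patch cheap to maintain.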
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
new file mode 100644
index 000000000..b1c32cfaf
--- /dev/null
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
@@ -0,0 +1,165 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <serial/TestSerial.hpp>
+
+namespace Test {
+
+TEST_F( serial , md_range ) {
+ TestMDRange_2D< Kokkos::Serial >::test_for2(100,100);
+
+ TestMDRange_3D< Kokkos::Serial >::test_for3(100,100,100);
+}
+
+TEST_F( serial, policy_construction) {
+ TestRangePolicyConstruction< Kokkos::Serial >();
+ TestTeamPolicyConstruction< Kokkos::Serial >();
+}
+
+TEST_F( serial , range_tag )
+{
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
+
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
+ TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+}
+
+
+//----------------------------------------------------------------------------
+
+TEST_F( serial , compiler_macros )
+{
+ ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Serial >() ) );
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( serial , memory_pool )
+{
+ bool val = TestMemoryPool::test_mempool< Kokkos::Serial >( 128, 128000000 );
+ ASSERT_TRUE( val );
+
+ TestMemoryPool::test_mempool2< Kokkos::Serial >( 64, 4, 1000000, 2000000 );
+
+ TestMemoryPool::test_memory_exhaustion< Kokkos::Serial >();
+}
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+TEST_F( serial , task_fib )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Serial >::run(i);
+ }
+}
+
+TEST_F( serial , task_depend )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Serial >::run(i);
+ }
+}
+
+TEST_F( serial , task_team )
+{
+ TestTaskScheduler::TestTaskTeam< Kokkos::Serial >::run(1000);
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Serial >::run(1000); //put back after testing
+}
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL )
+TEST_F( serial , cxx11 )
+{
+ if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Serial >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(1) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(2) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(3) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(4) ) );
+ }
+}
+#endif
+
+TEST_F( serial, tile_layout )
+{
+ TestTile::test< Kokkos::Serial , 1 , 1 >( 1 , 1 );
+ TestTile::test< Kokkos::Serial , 1 , 1 >( 2 , 3 );
+ TestTile::test< Kokkos::Serial , 1 , 1 >( 9 , 10 );
+
+ TestTile::test< Kokkos::Serial , 2 , 2 >( 1 , 1 );
+ TestTile::test< Kokkos::Serial , 2 , 2 >( 2 , 3 );
+ TestTile::test< Kokkos::Serial , 2 , 2 >( 4 , 4 );
+ TestTile::test< Kokkos::Serial , 2 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::Serial , 2 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::Serial , 4 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::Serial , 4 , 4 >( 1 , 1 );
+ TestTile::test< Kokkos::Serial , 4 , 4 >( 4 , 4 );
+ TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 11 );
+
+ TestTile::test< Kokkos::Serial , 8 , 8 >( 1 , 1 );
+ TestTile::test< Kokkos::Serial , 8 , 8 >( 4 , 4 );
+ TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 9 );
+ TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 11 );
+}
+
+
+
+
+} // namespace test
+
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
new file mode 100644
index 000000000..25b5ac6d1
--- /dev/null
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
@@ -0,0 +1,122 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <serial/TestSerial.hpp>
+
+namespace Test {
+
+TEST_F( serial, long_reduce) {
+ TestReduce< long , Kokkos::Serial >( 0 );
+ TestReduce< long , Kokkos::Serial >( 1000000 );
+}
+
+TEST_F( serial, double_reduce) {
+ TestReduce< double , Kokkos::Serial >( 0 );
+ TestReduce< double , Kokkos::Serial >( 1000000 );
+}
+
+TEST_F( serial , reducers )
+{
+ TestReducers<int, Kokkos::Serial>::execute_integer();
+ TestReducers<size_t, Kokkos::Serial>::execute_integer();
+ TestReducers<double, Kokkos::Serial>::execute_float();
+ TestReducers<Kokkos::complex<double>, Kokkos::Serial>::execute_basic();
+}
+
+TEST_F( serial, long_reduce_dynamic ) {
+ TestReduceDynamic< long , Kokkos::Serial >( 0 );
+ TestReduceDynamic< long , Kokkos::Serial >( 1000000 );
+}
+
+TEST_F( serial, double_reduce_dynamic ) {
+ TestReduceDynamic< double , Kokkos::Serial >( 0 );
+ TestReduceDynamic< double , Kokkos::Serial >( 1000000 );
+}
+
+TEST_F( serial, long_reduce_dynamic_view ) {
+ TestReduceDynamicView< long , Kokkos::Serial >( 0 );
+ TestReduceDynamicView< long , Kokkos::Serial >( 1000000 );
+}
+
+TEST_F( serial , scan )
+{
+ TestScan< Kokkos::Serial >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Serial >( 0 );
+ TestScan< Kokkos::Serial >( 10 );
+ TestScan< Kokkos::Serial >( 10000 );
+}
+
+TEST_F( serial , team_scan )
+{
+ TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+}
+
+TEST_F( serial , team_long_reduce) {
+ TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+}
+
+TEST_F( serial , team_double_reduce) {
+ TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+}
+
+TEST_F( serial , reduction_deduction )
+{
+ TestCXX11::test_reduction_deduction< Kokkos::Serial >();
+}
+
+} // namespace test
+
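The reduction cases above all build on the flat parallel_reduce pattern. As a minimal sketch, assuming Kokkos::Serial is initialized (reduce_example is an illustrative name, not part of the test suite):

#include <Kokkos_Core.hpp>

// Sketch only: sum of 0..n-1 via a flat reduction on the Serial backend.
long reduce_example( const int n )
{
  long sum = 0 ;
  Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n ) ,
    KOKKOS_LAMBDA( const int i , long & update ) { update += i ; } ,
    sum );
  return sum ;  // expected: (long) n * ( n - 1 ) / 2
}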
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
similarity index 62%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
index c15f81223..bc838ccde 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
@@ -1,76 +1,92 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#include <gtest/gtest.h>
+namespace Test {
+
+TEST_F( serial, view_subview_auto_1d_left ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Serial >();
+}
+
+TEST_F( serial, view_subview_auto_1d_right ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Serial >();
+}
-#include <Kokkos_Core.hpp>
+TEST_F( serial, view_subview_auto_1d_stride ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Serial >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+TEST_F( serial, view_subview_assign_strided ) {
+ TestViewSubview::test_1d_strided_assignment< Kokkos::Serial >();
+}
-#include <TestReduce.hpp>
+TEST_F( serial, view_subview_left_0 ) {
+ TestViewSubview::test_left_0< Kokkos::Serial >();
+}
+TEST_F( serial, view_subview_left_1 ) {
+ TestViewSubview::test_left_1< Kokkos::Serial >();
+}
-namespace Test {
+TEST_F( serial, view_subview_left_2 ) {
+ TestViewSubview::test_left_2< Kokkos::Serial >();
+}
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
+TEST_F( serial, view_subview_left_3 ) {
+ TestViewSubview::test_left_3< Kokkos::Serial >();
+}
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+TEST_F( serial, view_subview_right_0 ) {
+ TestViewSubview::test_right_0< Kokkos::Serial >();
+}
+TEST_F( serial, view_subview_right_1 ) {
+ TestViewSubview::test_right_1< Kokkos::Serial >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( serial, view_subview_right_3 ) {
+ TestViewSubview::test_right_3< Kokkos::Serial >();
}
} // namespace test
-#endif
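The TestViewSubview cases above check that slices of every layout alias their parent allocation correctly. A minimal sketch of the mechanism, assuming Kokkos::Serial (subview_example is an illustrative name):

#include <Kokkos_Core.hpp>

// Sketch only: a 1-d slice of a 2-d view; writes through the slice are
// visible through the parent because both views share one allocation.
void subview_example()
{
  Kokkos::View< double** , Kokkos::LayoutLeft , Kokkos::Serial > a( "a" , 10 , 4 );

  auto col2 = Kokkos::subview( a , Kokkos::ALL() , 2 );   // column 2 of 'a'

  for ( int i = 0 ; i < 10 ; ++i ) col2( i ) = i ;
  // Now a(i,2) == i for every i.
}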
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
similarity index 72%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
index c15f81223..e6a5b56d3 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
@@ -1,76 +1,60 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
-
-#include <TestReduce.hpp>
-
+#include <serial/TestSerial.hpp>
namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
-
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
-
+TEST_F( serial, view_subview_layoutleft_to_layoutleft) {
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( serial, view_subview_layoutright_to_layoutright) {
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
index 86bc94ab0..0b7a0d3bf 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( serial, view_subview_1d_assign ) {
+ TestViewSubview::test_1d_assign< Kokkos::Serial >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
index 86bc94ab0..8ca7285c1 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( serial, view_subview_1d_assign_atomic ) {
+ TestViewSubview::test_1d_assign< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
index 61d2e3570..1d156c741 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_1d_assign_randomaccess ) {
+ TestViewSubview::test_1d_assign< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
index 86bc94ab0..ebf0e5c99 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( serial, view_subview_2d_from_3d ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
index 61d2e3570..74acb92f1 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_2d_from_3d_atomic ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
index 61d2e3570..8075d46e0 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_2d_from_3d_randomaccess ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
index 86bc94ab0..9ce822264 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( serial, view_subview_3d_from_5d_left ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
index 61d2e3570..c8a5c8f33 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_3d_from_5d_left_atomic ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
index 61d2e3570..b66f15f17 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_3d_from_5d_left_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
index 86bc94ab0..5e5e3cf3d 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( serial, view_subview_3d_from_5d_right ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
index 61d2e3570..55a353bca 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_3d_from_5d_right_atomic ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
index 61d2e3570..a168e1e23 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( serial, view_subview_3d_from_5d_right_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
new file mode 100644
index 000000000..a489b0fcb
--- /dev/null
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
@@ -0,0 +1,12 @@
+#include<serial/TestSerial_SubView_c01.cpp>
+#include<serial/TestSerial_SubView_c02.cpp>
+#include<serial/TestSerial_SubView_c03.cpp>
+#include<serial/TestSerial_SubView_c04.cpp>
+#include<serial/TestSerial_SubView_c05.cpp>
+#include<serial/TestSerial_SubView_c06.cpp>
+#include<serial/TestSerial_SubView_c07.cpp>
+#include<serial/TestSerial_SubView_c08.cpp>
+#include<serial/TestSerial_SubView_c09.cpp>
+#include<serial/TestSerial_SubView_c10.cpp>
+#include<serial/TestSerial_SubView_c11.cpp>
+#include<serial/TestSerial_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
new file mode 100644
index 000000000..3318e5f24
--- /dev/null
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
@@ -0,0 +1,117 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <serial/TestSerial.hpp>
+
+namespace Test {
+
+TEST_F( serial , team_tag )
+{
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
+ TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+}
+
+TEST_F( serial , team_shared_request) {
+ TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+TEST_F( serial, team_scratch_request) {
+ TestScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
+TEST_F( serial , team_lambda_shared_request) {
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+#endif
+
+TEST_F( serial, shmem_size) {
+ TestShmemSize< Kokkos::Serial >();
+}
+
+TEST_F( serial, multi_level_scratch) {
+ TestMultiLevelScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+TEST_F( serial , team_vector )
+{
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(0) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(1) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(2) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(3) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(4) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(5) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(6) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(7) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(8) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(9) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(10) ) );
+}
+
+#ifdef KOKKOS_COMPILER_GNU
+#if ( KOKKOS_COMPILER_GNU == 472 )
+#define SKIP_TEST
+#endif
+#endif
+
+#ifndef SKIP_TEST
+TEST_F( serial, triple_nested_parallelism )
+{
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 32 , 32 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 32 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 16 , 16 );
+}
+#endif
+
+} // namespace test
+
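
For readers following the team tests registered above: TestTeamPolicy, TestSharedTeam and the related cases all dispatch their kernels through Kokkos::TeamPolicy. A minimal sketch of that dispatch pattern, assuming a C++11 host build and using an illustrative league size of 16 with a team size of 1 (neither value is taken from the tests), is:

#include <Kokkos_Core.hpp>

int main( int argc , char ** argv )
{
  Kokkos::initialize( argc , argv );
  {
    typedef Kokkos::TeamPolicy< Kokkos::Serial > policy_type ;
    typedef policy_type::member_type             member_type ;

    // Sum the league ranks of 16 one-thread teams; the reduction
    // result should be 0 + 1 + ... + 15 = 120.
    long result = 0 ;
    Kokkos::parallel_reduce( policy_type( 16 , 1 ) ,
      KOKKOS_LAMBDA( const member_type & team , long & update ) {
        update += team.league_rank();
      } , result );
  }
  Kokkos::finalize();
  return 0 ;
}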
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
index 86bc94ab0..4c655fe77 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
@@ -1,55 +1,53 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( serial , impl_view_mapping_a ) {
+ test_view_mapping< Kokkos::Serial >();
+ test_view_mapping_operator< Kokkos::Serial >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
similarity index 52%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
index c15f81223..4947f2eaa 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
@@ -1,76 +1,121 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <serial/TestSerial.hpp>
-#include <gtest/gtest.h>
+namespace Test {
-#include <Kokkos_Core.hpp>
+TEST_F( serial , impl_shared_alloc ) {
+ test_shared_alloc< Kokkos::HostSpace , Kokkos::Serial >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+TEST_F( serial , impl_view_mapping_b ) {
+ test_view_mapping_subview< Kokkos::Serial >();
+ TestViewMappingAtomic< Kokkos::Serial >::run();
+}
-#include <TestReduce.hpp>
+TEST_F( serial, view_api) {
+ TestViewAPI< double , Kokkos::Serial >();
+}
+TEST_F( serial , view_nested_view )
+{
+ ::Test::view_nested_view< Kokkos::Serial >();
+}
-namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+TEST_F( serial , view_remap )
+{
+ enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
+ typedef Kokkos::View< double*[N1][N2][N3] ,
+ Kokkos::LayoutRight ,
+ Kokkos::Serial > output_type ;
+
+ typedef Kokkos::View< int**[N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::Serial > input_type ;
+
+ typedef Kokkos::View< int*[N0][N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::Serial > diff_type ;
+
+ output_type output( "output" , N0 );
+ input_type input ( "input" , N0 , N1 );
+ diff_type diff ( "diff" , N0 );
+
+ int value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ input(i0,i1,i2,i3) = ++value ;
+ }}}}
+
+ // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
+ Kokkos::deep_copy( output , input );
+
+ value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ ++value ;
+ ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
+ }}}}
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( serial , view_aggregate )
+{
+ TestViewAggregate< Kokkos::Serial >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( serial , template_meta_functions )
+{
+ TestTemplateMetaFunctions<int, Kokkos::Serial >();
}
} // namespace test
-#endif
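
The view_remap case in the hunk above relies on Kokkos::deep_copy performing an element-by-element remap when the source and destination Views have matching extents but different layouts and assignable value types. A minimal host-side sketch of the same behavior, with illustrative 3x4 extents rather than the N0..N3 dimensions used in the test, would look like:

#include <Kokkos_Core.hpp>
#include <cassert>

int main( int argc , char ** argv )
{
  Kokkos::initialize( argc , argv );
  {
    Kokkos::View< int**    , Kokkos::LayoutLeft  , Kokkos::Serial > a( "a" , 3 , 4 );
    Kokkos::View< double** , Kokkos::LayoutRight , Kokkos::Serial > b( "b" , 3 , 4 );

    for ( int i = 0 ; i < 3 ; ++i )
      for ( int j = 0 ; j < 4 ; ++j ) a(i,j) = 10 * i + j ;

    // Layouts and value types differ, so deep_copy falls back to an
    // element-wise remap instead of a raw memory copy.
    Kokkos::deep_copy( b , a );

    assert( b(2,3) == 23.0 );
  }
  Kokkos::finalize();
  return 0 ;
}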
diff --git a/lib/kokkos/core/unit_test/TestOpenMP_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads.hpp
similarity index 61%
rename from lib/kokkos/core/unit_test/TestOpenMP_a.cpp
rename to lib/kokkos/core/unit_test/threads/TestThreads.hpp
index 64eac6680..bb9f36581 100644
--- a/lib/kokkos/core/unit_test/TestOpenMP_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads.hpp
@@ -1,150 +1,114 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
+#ifndef KOKKOS_TEST_THREADSHPP
+#define KOKKOS_TEST_THREADSHPP
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
-//----------------------------------------------------------------------------
+#include <TestTile.hpp>
-#include <TestViewImpl.hpp>
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewSubview.hpp>
-#include <TestViewOfClass.hpp>
+//----------------------------------------------------------------------------
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
+
+#include <TestViewAPI.hpp>
+#include <TestViewOfClass.hpp>
+#include <TestViewSubview.hpp>
+#include <TestAtomic.hpp>
+#include <TestAtomicOperations.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
-#include <TestAggregateReduction.hpp>
#include <TestCompilerMacros.hpp>
+#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
-#include <TestMemorySpaceTracking.hpp>
#include <TestTemplateMetaFunctions.hpp>
#include <TestPolicyConstruction.hpp>
+#include <TestMDRange.hpp>
namespace Test {
-class openmp : public ::testing::Test {
+class threads : public ::testing::Test {
protected:
- static void SetUpTestCase();
- static void TearDownTestCase();
-};
+ static void SetUpTestCase()
+ {
+ const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
+ const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
+ const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-TEST_F( openmp, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::OpenMP >();
-}
+ unsigned threads_count = 0 ;
-TEST_F( openmp, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::OpenMP >();
-}
+ threads_count = std::max( 1u , numa_count )
+ * std::max( 2u , cores_per_numa * threads_per_core );
-TEST_F( openmp, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::OpenMP >();
-}
+ Kokkos::Threads::initialize( threads_count );
+ Kokkos::Threads::print_configuration( std::cout , true /* detailed */ );
+ }
-TEST_F( openmp, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::OpenMP >();
-}
-
-TEST_F( openmp, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP >();
-}
+ static void TearDownTestCase()
+ {
+ Kokkos::Threads::finalize();
+ }
+};
-TEST_F( openmp, view_subview_2d_from_3d ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP >();
-}
-TEST_F( openmp, view_subview_2d_from_5d ) {
- TestViewSubview::test_2d_subview_5d< Kokkos::OpenMP >();
}
-
-} // namespace test
-
+#endif
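
The SetUpTestCase above sizes the thread pool from hwloc and calls Kokkos::Threads::initialize() directly. Outside a test fixture the same effect is usually obtained through the generic entry point; a minimal sketch, assuming the Kokkos::InitArguments overload of Kokkos::initialize() provided by this Kokkos version and an illustrative count of 4 threads:

#include <Kokkos_Core.hpp>
#include <iostream>

int main()
{
  Kokkos::InitArguments args ;
  args.num_threads = 4 ;        // illustrative; the fixture above derives this from hwloc
  Kokkos::initialize( args );   // initializes the enabled host backend, e.g. Kokkos::Threads

  Kokkos::DefaultHostExecutionSpace::print_configuration( std::cout , true );

  Kokkos::finalize();
  return 0 ;
}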
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
new file mode 100644
index 000000000..8ce32fc33
--- /dev/null
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
@@ -0,0 +1,168 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <threads/TestThreads.hpp>
+
+namespace Test {
+
+TEST_F( threads , atomics )
+{
+ const int loop_count = 1e4 ;
+
+ ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,3) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,1) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,2) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,3) ) );
+}
+
+TEST_F( threads , atomic_operations )
+{
+ const int start = 1; //Avoid zero for division
+ const int end = 11;
+ for (int i = start; i < end; ++i)
+ {
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 4 ) ) );
+ }
+
+}
+
+} // namespace test
+
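
The atomics cases above verify that concurrent read-modify-write operations on the Threads backend are race-free. A minimal sketch of the underlying usage pattern, assuming a build in which Kokkos::Threads is the initialized default host execution space and using an illustrative iteration count of 10000:

#include <Kokkos_Core.hpp>
#include <cassert>

int main( int argc , char ** argv )
{
  Kokkos::initialize( argc , argv );
  {
    // Rank-0 counter shared by all threads; Views are zero-initialized.
    Kokkos::View< long , Kokkos::Threads > count( "count" );

    Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >( 0 , 10000 ) ,
      KOKKOS_LAMBDA( const int ) {
        Kokkos::atomic_add( & count() , 1L );   // contended but race-free update
      } );
    Kokkos::fence();

    long h = 0 ;
    Kokkos::deep_copy( h , count );
    assert( h == 10000 );
  }
  Kokkos::finalize();
  return 0 ;
}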
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
new file mode 100644
index 000000000..d9f17cc88
--- /dev/null
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
@@ -0,0 +1,189 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <threads/TestThreads.hpp>
+
+namespace Test {
+
+TEST_F( threads , init ) {
+ ;
+}
+
+TEST_F( threads , md_range ) {
+ TestMDRange_2D< Kokkos::Threads >::test_for2(100,100);
+
+ TestMDRange_3D< Kokkos::Threads >::test_for3(100,100,100);
+}
+
+TEST_F( threads, policy_construction) {
+ TestRangePolicyConstruction< Kokkos::Threads >();
+ TestTeamPolicyConstruction< Kokkos::Threads >();
+}
+
+TEST_F( threads , range_tag )
+{
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(0);
+
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
+
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(3);
+
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
+
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
+ TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+}
+
+
+//----------------------------------------------------------------------------
+
+TEST_F( threads , compiler_macros )
+{
+ ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Threads >() ) );
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( threads , memory_pool )
+{
+ bool val = TestMemoryPool::test_mempool< Kokkos::Threads >( 128, 128000000 );
+ ASSERT_TRUE( val );
+
+ TestMemoryPool::test_mempool2< Kokkos::Threads >( 64, 4, 1000000, 2000000 );
+
+ TestMemoryPool::test_memory_exhaustion< Kokkos::Threads >();
+}
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+/*
+TEST_F( threads , task_fib )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Threads >::run(i);
+ }
+}
+
+TEST_F( threads , task_depend )
+{
+ for ( int i = 0 ; i < 25 ; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Threads >::run(i);
+ }
+}
+
+TEST_F( threads , task_team )
+{
+ TestTaskScheduler::TestTaskTeam< Kokkos::Threads >::run(1000);
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Threads >::run(1000); //put back after testing
+}
+*/
+#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS )
+TEST_F( threads , cxx11 )
+{
+ if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Threads >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(1) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(2) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(3) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(4) ) );
+ }
+}
+#endif
+
+TEST_F( threads, tile_layout )
+{
+ TestTile::test< Kokkos::Threads , 1 , 1 >( 1 , 1 );
+ TestTile::test< Kokkos::Threads , 1 , 1 >( 2 , 3 );
+ TestTile::test< Kokkos::Threads , 1 , 1 >( 9 , 10 );
+
+ TestTile::test< Kokkos::Threads , 2 , 2 >( 1 , 1 );
+ TestTile::test< Kokkos::Threads , 2 , 2 >( 2 , 3 );
+ TestTile::test< Kokkos::Threads , 2 , 2 >( 4 , 4 );
+ TestTile::test< Kokkos::Threads , 2 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::Threads , 2 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::Threads , 4 , 2 >( 9 , 9 );
+
+ TestTile::test< Kokkos::Threads , 4 , 4 >( 1 , 1 );
+ TestTile::test< Kokkos::Threads , 4 , 4 >( 4 , 4 );
+ TestTile::test< Kokkos::Threads , 4 , 4 >( 9 , 9 );
+ TestTile::test< Kokkos::Threads , 4 , 4 >( 9 , 11 );
+
+ TestTile::test< Kokkos::Threads , 8 , 8 >( 1 , 1 );
+ TestTile::test< Kokkos::Threads , 8 , 8 >( 4 , 4 );
+ TestTile::test< Kokkos::Threads , 8 , 8 >( 9 , 9 );
+ TestTile::test< Kokkos::Threads , 8 , 8 >( 9 , 11 );
+}
+
+
+TEST_F( threads , dispatch )
+{
+ const int repeat = 100 ;
+ for ( int i = 0 ; i < repeat ; ++i ) {
+ for ( int j = 0 ; j < repeat ; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >(0,j)
+ , KOKKOS_LAMBDA( int ) {} );
+ }}
+}
+
+
+} // namespace test
+
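Every TEST_F in the file above binds to a "threads" fixture declared in threads/TestThreads.hpp, which also pulls in the backend-agnostic Test*.hpp templates these cases instantiate. A minimal sketch of such a fixture, under the assumption that it simply brings Kokkos up and down once per test case (the real header may also size the thread pool and include many more test headers; the fixture name below is made up to avoid clashing with the real one):

#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>

// Sketch only: initialize Kokkos once for the whole test case, finalize at the end.
class threads_fixture_sketch : public ::testing::Test {
protected:
  static void SetUpTestCase()    { Kokkos::initialize(); }
  static void TearDownTestCase() { Kokkos::finalize();   }
};

// Same shape as the 'dispatch' test above: many tiny launches with an empty body,
// which stresses launch overhead rather than any numerical result.
TEST_F( threads_fixture_sketch, empty_dispatch )
{
  Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::Threads>( 0, 10 ),
                        KOKKOS_LAMBDA( int ) {} );
}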
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
new file mode 100644
index 000000000..a637d1e3a
--- /dev/null
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
@@ -0,0 +1,138 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <threads/TestThreads.hpp>
+
+namespace Test {
+
+TEST_F( threads, long_reduce) {
+ TestReduce< long , Kokkos::Threads >( 0 );
+ TestReduce< long , Kokkos::Threads >( 1000000 );
+}
+
+TEST_F( threads, double_reduce) {
+ TestReduce< double , Kokkos::Threads >( 0 );
+ TestReduce< double , Kokkos::Threads >( 1000000 );
+}
+
+TEST_F( threads , reducers )
+{
+ TestReducers<int, Kokkos::Threads>::execute_integer();
+ TestReducers<size_t, Kokkos::Threads>::execute_integer();
+ TestReducers<double, Kokkos::Threads>::execute_float();
+ TestReducers<Kokkos::complex<double>, Kokkos::Threads>::execute_basic();
+}
+
+TEST_F( threads, long_reduce_dynamic ) {
+ TestReduceDynamic< long , Kokkos::Threads >( 0 );
+ TestReduceDynamic< long , Kokkos::Threads >( 1000000 );
+}
+
+TEST_F( threads, double_reduce_dynamic ) {
+ TestReduceDynamic< double , Kokkos::Threads >( 0 );
+ TestReduceDynamic< double , Kokkos::Threads >( 1000000 );
+}
+
+TEST_F( threads, long_reduce_dynamic_view ) {
+ TestReduceDynamicView< long , Kokkos::Threads >( 0 );
+ TestReduceDynamicView< long , Kokkos::Threads >( 1000000 );
+}
+
+TEST_F( threads , scan )
+{
+ TestScan< Kokkos::Threads >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Threads >( 0 );
+ TestScan< Kokkos::Threads >( 100000 );
+ TestScan< Kokkos::Threads >( 10000000 );
+ Kokkos::Threads::fence();
+}
+
+#if 0
+TEST_F( threads , scan_small )
+{
+ typedef TestScan< Kokkos::Threads , Kokkos::Impl::ThreadsExecUseScanSmall > TestScanFunctor ;
+ for ( int i = 0 ; i < 1000 ; ++i ) {
+ TestScanFunctor( 10 );
+ TestScanFunctor( 10000 );
+ }
+ TestScanFunctor( 1000000 );
+ TestScanFunctor( 10000000 );
+
+ Kokkos::Threads::fence();
+}
+#endif
+
+TEST_F( threads , team_scan )
+{
+ TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+}
+
+TEST_F( threads , team_long_reduce) {
+ TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+}
+
+TEST_F( threads , team_double_reduce) {
+ TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+}
+
+TEST_F( threads , reduction_deduction )
+{
+ TestCXX11::test_reduction_deduction< Kokkos::Threads >();
+}
+
+} // namespace test
+
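TestReduce, TestReducers, TestScan and the TestReduceTeam/TestScanTeam templates used above are shared across all backends; this per-backend file only pins the execution space and the problem sizes. As a standalone illustration of what a long_reduce-style check amounts to (this is not the template itself, and check_long_sum is a hypothetical name):

#include <Kokkos_Core.hpp>

// Illustration only: sum 0..n-1 on the Threads backend and compare with the
// closed form, which is essentially what a TestReduce<long, Threads> case verifies.
bool check_long_sum( const long n )
{
  long sum = 0;
  Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Threads>( 0, n ),
                           KOKKOS_LAMBDA( const long i, long & partial ) {
    partial += i;
  }, sum );
  return sum == n * ( n - 1 ) / 2;
}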
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
similarity index 62%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
index c15f81223..2df9e19de 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
@@ -1,76 +1,92 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#include <gtest/gtest.h>
+namespace Test {
+
+TEST_F( threads, view_subview_auto_1d_left ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Threads >();
+}
+
+TEST_F( threads, view_subview_auto_1d_right ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Threads >();
+}
-#include <Kokkos_Core.hpp>
+TEST_F( threads, view_subview_auto_1d_stride ) {
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Threads >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+TEST_F( threads, view_subview_assign_strided ) {
+ TestViewSubview::test_1d_strided_assignment< Kokkos::Threads >();
+}
-#include <TestReduce.hpp>
+TEST_F( threads, view_subview_left_0 ) {
+ TestViewSubview::test_left_0< Kokkos::Threads >();
+}
+TEST_F( threads, view_subview_left_1 ) {
+ TestViewSubview::test_left_1< Kokkos::Threads >();
+}
-namespace Test {
+TEST_F( threads, view_subview_left_2 ) {
+ TestViewSubview::test_left_2< Kokkos::Threads >();
+}
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
+TEST_F( threads, view_subview_left_3 ) {
+ TestViewSubview::test_left_3< Kokkos::Threads >();
+}
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+TEST_F( threads, view_subview_right_0 ) {
+ TestViewSubview::test_right_0< Kokkos::Threads >();
+}
+TEST_F( threads, view_subview_right_1 ) {
+ TestViewSubview::test_right_1< Kokkos::Threads >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( threads, view_subview_right_3 ) {
+ TestViewSubview::test_right_3< Kokkos::Threads >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
similarity index 72%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
index c15f81223..d57dbe97c 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
@@ -1,76 +1,60 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
-
-#include <TestReduce.hpp>
-
+#include <threads/TestThreads.hpp>
namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
-
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
-
+TEST_F( threads, view_subview_layoutleft_to_layoutleft) {
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( threads, view_subview_layoutright_to_layoutright) {
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
index 86bc94ab0..67d998c0e 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( threads, view_subview_1d_assign ) {
+ TestViewSubview::test_1d_assign< Kokkos::Threads >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
index 61d2e3570..e340240c4 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_1d_assign_atomic ) {
+ TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
index 61d2e3570..ad27fa0fa 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_1d_assign_randomaccess ) {
+ TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
index 86bc94ab0..6fca47cc4 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( threads, view_subview_2d_from_3d ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
index 61d2e3570..c7dfca941 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_2d_from_3d_atomic ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
index 61d2e3570..38e839491 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_2d_from_3d_randomaccess ) {
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
index 86bc94ab0..1f01fe6b5 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( threads, view_subview_3d_from_5d_left ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
index 61d2e3570..e9a1ccbe3 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_3d_from_5d_left_atomic ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
index 61d2e3570..c8b6c8743 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_3d_from_5d_left_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
similarity index 89%
copy from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
index 86bc94ab0..7cef6fa07 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
@@ -1,55 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( threads, view_subview_3d_from_5d_right ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads >();
+}
-}} // namespace Kokkos::Impl
+} // namespace test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
similarity index 88%
copy from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
index 61d2e3570..d67bf3157 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_3d_from_5d_right_atomic ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
diff --git a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
similarity index 88%
rename from lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
rename to lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
index 61d2e3570..e8a2c825c 100644
--- a/lib/kokkos/core/src/impl/Kokkos_ViewTileLeft.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
@@ -1,56 +1,52 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_VIEWTILELEFT_HPP
-#define KOKKOS_VIEWTILELEFT_HPP
-
-#include <impl/KokkosExp_ViewTile.hpp>
-
-namespace Kokkos {
-
-using Kokkos::Experimental::tile_subview ;
+namespace Test {
+TEST_F( threads, view_subview_3d_from_5d_right_randomaccess ) {
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-#endif /* #ifndef KOKKOS_VIEWTILELEFT_HPP */
+} // namespace test
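Splitting the subview coverage into TestThreads_SubView_a/_b/_c01..c12 keeps each translation unit small, since every file re-instantiates the heavy TestViewSubview templates for the Threads backend with plain, Atomic and RandomAccess memory traits. A self-contained illustration of the kind of slicing these cases exercise (sketch only, not the test code):

#include <Kokkos_Core.hpp>

// Sketch only: carve a rank-3 slice out of a rank-5 LayoutRight view, the same
// shape of operation the test_3d_subview_5d_right cases above exercise.
void subview_3d_from_5d_sketch()
{
  Kokkos::View<int*****, Kokkos::LayoutRight, Kokkos::Threads>
    a( "a", 2, 3, 4, 5, 6 );

  // Fix the two leading indices, keep the remaining three dimensions whole.
  auto slice = Kokkos::subview( a, 1, 2, Kokkos::ALL(), Kokkos::ALL(), Kokkos::ALL() );

  // Threads views live in HostSpace, so the slice can be touched directly here.
  slice( 0, 0, 0 ) = 42;
}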
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
new file mode 100644
index 000000000..03f31b78c
--- /dev/null
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
@@ -0,0 +1,122 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+#include <threads/TestThreads.hpp>
+
+namespace Test {
+
+TEST_F( threads , team_tag )
+{
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
+ TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+}
+
+TEST_F( threads , team_shared_request) {
+ TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+TEST_F( threads, team_scratch_request) {
+ TestScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
+TEST_F( threads , team_lambda_shared_request) {
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+#endif
+
+TEST_F( threads, shmem_size) {
+ TestShmemSize< Kokkos::Threads >();
+}
+
+TEST_F( threads, multi_level_scratch) {
+ TestMultiLevelScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+}
+
+TEST_F( threads , team_vector )
+{
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(0) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(1) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(2) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(3) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(4) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(5) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(6) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(7) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(8) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(9) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(10) ) );
+}
+
+#ifdef KOKKOS_COMPILER_GNU
+#if ( KOKKOS_COMPILER_GNU == 472 )
+#define SKIP_TEST
+#endif
+#endif
+
+#ifndef SKIP_TEST
+TEST_F( threads, triple_nested_parallelism )
+{
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 32 , 32 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 32 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 16 , 16 );
+}
+#endif
+
+} // namespace Test
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
similarity index 89%
rename from lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
rename to lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
index 86bc94ab0..46a576b02 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Singleton.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
@@ -1,55 +1,53 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#ifndef KOKKOS_SINGLETON_HPP
-#define KOKKOS_SINGLETON_HPP
-
-#include <Kokkos_Macros.hpp>
-#include <cstddef>
-
-namespace Kokkos { namespace Impl {
+namespace Test {
+TEST_F( threads , impl_view_mapping_a ) {
+ test_view_mapping< Kokkos::Threads >();
+ test_view_mapping_operator< Kokkos::Threads >();
+}
-}} // namespace Kokkos::Impl
+} // namespace Test
-#endif // KOKKOS_SINGLETON_HPP
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
similarity index 52%
copy from lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
copy to lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
index c15f81223..b5d6ac843 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
@@ -1,76 +1,121 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+#include <threads/TestThreads.hpp>
-#include <gtest/gtest.h>
+namespace Test {
-#include <Kokkos_Core.hpp>
+TEST_F( threads , impl_shared_alloc ) {
+ test_shared_alloc< Kokkos::HostSpace , Kokkos::Threads >();
+}
-#if !defined(KOKKOS_HAVE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+TEST_F( threads , impl_view_mapping_b ) {
+ test_view_mapping_subview< Kokkos::Threads >();
+ TestViewMappingAtomic< Kokkos::Threads >::run();
+}
-#include <TestReduce.hpp>
+TEST_F( threads, view_api) {
+ TestViewAPI< double , Kokkos::Threads >();
+}
+TEST_F( threads , view_nested_view )
+{
+ ::Test::view_nested_view< Kokkos::Threads >();
+}
-namespace Test {
-class defaultdevicetype : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- Kokkos::initialize();
- }
- static void TearDownTestCase()
- {
- Kokkos::finalize();
- }
-};
+TEST_F( threads , view_remap )
+{
+ enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
+ typedef Kokkos::View< double*[N1][N2][N3] ,
+ Kokkos::LayoutRight ,
+ Kokkos::Threads > output_type ;
+
+ typedef Kokkos::View< int**[N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::Threads > input_type ;
+
+ typedef Kokkos::View< int*[N0][N2][N3] ,
+ Kokkos::LayoutLeft ,
+ Kokkos::Threads > diff_type ;
+
+ output_type output( "output" , N0 );
+ input_type input ( "input" , N0 , N1 );
+ diff_type diff ( "diff" , N0 );
+
+ int value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ input(i0,i1,i2,i3) = ++value ;
+ }}}}
+
+ // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
+ Kokkos::deep_copy( output , input );
+
+ value = 0 ;
+ for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
+ for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
+ for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
+ for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
+ ++value ;
+ ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
+ }}}}
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( threads , view_aggregate )
+{
+ TestViewAggregate< Kokkos::Threads >();
+}
-TEST_F( defaultdevicetype, reduce_instantiation) {
- TestReduceCombinatoricalInstantiation<>::execute();
+TEST_F( threads , template_meta_functions )
+{
+ TestTemplateMetaFunctions<int, Kokkos::Threads >();
}
} // namespace test
-#endif
diff --git a/lib/kokkos/doc/README b/lib/kokkos/doc/README
deleted file mode 100644
index 31e75f365..000000000
--- a/lib/kokkos/doc/README
+++ /dev/null
@@ -1,32 +0,0 @@
-Kokkos uses the Doxygen tool for providing three documentation
-sources:
-- man pages
-- Latex User Guide
-- HTML Online User Guide.
-
-Man Pages
-
-Man pages are available for all files and functions in the directory
-TRILINOS_HOME/doc/kokkos/man, where TRILINOS_HOME is the location of your
-copy of Trilinos. To use these pages with the Unix man utility, add
-the directory to your man path as follows:
-
-setenv MANPATH `echo $MANPATH`:TRILINOS_HOME/doc/kokkos/man
-
-
-LaTeX User Guide
-
-A postscript version of this guide is in
-TRILINOS_HOME/doc/kokkos/latex/user_guide.ps. The LaTeX source is in the
-directory TRILINOS_HOME/doc/kokkos/latex.
-
-HTML Online User Guide
-
-The online guide is initiated by pointing your browser to
-TRILINOS_HOME/doc/kokkos/html/index.html
-
-Any question, comments or suggestions are welcome. Please send to
-Mike Heroux at
-
-320-845-7695
-maherou@sandia.gov
diff --git a/lib/kokkos/doc/design_notes_space_instances.md b/lib/kokkos/doc/design_notes_space_instances.md
new file mode 100644
index 000000000..487fa25bc
--- /dev/null
+++ b/lib/kokkos/doc/design_notes_space_instances.md
@@ -0,0 +1,166 @@
+# Design Notes for Execution and Memory Space Instances
+
+
+## Execution Spaces
+
+ * Work is *dispatched* to an execution space instance
+
+
+
+## Host Associated Execution Space Instances
+
+Vocabulary and examples assuming C++11 Threads Support Library
+
+ * A host-side *control* thread dispatches work to an instance
+
+ * `this_thread` is the control thread
+
+ * `main` is the initial control thread
+
+ * An execution space instance is a pool of threads
+
+ * All instances are disjoint thread pools
+
+ * Exactly one control thread is associated with
+ an instance and only that control thread may
+ dispatch work to that instance
+
+ * A control thread may be a member of an instance;
+ if so, it is also the control thread associated
+ with that instance
+
+ * The pool of threads associated with an instance is not mutable
+
+ * The pool of threads associated with an instance may be masked
+
+ - Allows work to be dispatched to a subset of the pool
+
+ - Example: only one hyperthread per core of the instance
+
+ - When a mask is applied to an instance, that mask
+ remains until it is cleared or another mask is applied
+
+ - Masking is kept portable by defining it as a fraction
+ of the available resources (threads)
+
+ * Instances are shared (reference-counted) objects,
+ just like `Kokkos::View`
+
+```
+struct StdThread {
+ void mask( float fraction );
+ void unmask() { mask( 1.0 ); }
+};
+```
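+
+A minimal usage sketch of the masking idea above; `StdThread`, `mask`, and
+`unmask` are the hypothetical interface from this note, not an existing
+Kokkos API:
+
+```
+StdThread pool ; // a previously requested instance
+
+// Restrict subsequent dispatches to roughly half of the pool,
+// e.g. one hyperthread per core; the mask persists until changed.
+pool.mask( 0.5 );
+
+// ... dispatch work to the masked subset ...
+
+pool.unmask(); // equivalent to mask( 1.0 )
+```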
+
+
+
+### Requesting an Execution Space Instance
+
+ * `Space::request(` *who* `,` *what* `,` *control-opt* `)`
+
+ * *who* is an identifier for subsequent queries regarding
+ who requested each instance
+
+ * *what* is the number of threads and how they should be placed
+
+ - Placement within locality-topology hierarchy; e.g., HWLOC
+
+ - Compact within a level of hierarchy, or striped across that level;
+ e.g., socket or NUMA region
+
+ - Granularity of request is core
+
+ * *control-opt* optionally specifies whether the instance
+ has a new control thread
+
+ - *control-opt* includes a control function / closure
+
+ - The new control thread is a member of the instance
+
+ - The control function is called by the new control thread
+ and is passed a `const` instance
+
+ - The instance is **not** returned to the creating control thread
+
+ * A `std::thread` that is not a member of an instance is
+ *hard blocked* on a `std::mutex`
+
+ - One global mutex or one mutex per thread?
+
+ * A `std::thread` that is a member of an instance is
+ *spinning* waiting for work, or is working
+
+```
+struct StdThread {
+
+ struct Resource ;
+
+ static StdThread request(); // default
+
+ static StdThread request( const std::string & , const Resource & );
+
+ // If the instance can be reserved then
+ // allocate a copy of ControlClosure and invoke
+ // ControlClosure::operator()( const StdThread instance ) const
+ template< class ControlClosure >
+ static bool request( const std::string & , const Resource &
+ , const ControlClosure & );
+};
+```
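+
+A hypothetical call site for the closure-based overload above; the closure
+body and the `Resource` value are placeholders for illustration only:
+
+```
+struct SolverControl {
+  // Called by the newly created control thread; the instance is
+  // relinquished when this function returns (reference count must be zero).
+  void operator()( const StdThread instance ) const
+  { /* dispatch work to 'instance' here */ }
+};
+
+StdThread::Resource one_socket ; // e.g. compact placement on one socket
+
+const bool reserved =
+  StdThread::request( "solver" , one_socket , SolverControl() );
+```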
+
+### Relinquishing an Execution Space Instance
+
+ * Releasing the last reference to an instance
+ relinquishes the pool of threads
+
+ * If a control thread was created for the instance, then
+ the instance is relinquished when that control thread returns
+ from the control function
+
+ - Requires the reference count to be zero; it is an error if not
+
+ * No *forced* relinquish
+
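+A sketch of the reference-counting rule above, again using the hypothetical
+`StdThread` handle:
+
+```
+{
+  StdThread pool = StdThread::request(); // reference count 1
+  StdThread alias = pool ;               // reference count 2
+} // both handles destroyed here: the count reaches zero
+  // and the pool of threads is relinquished
+```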
+
+
+## CUDA Associated Execution Space Instances
+
+ * Only a single CUDA architecture
+
+ * An instance is a device + stream
+
+ * A stream is exclusive to an instance
+
+ * Only a host-side control thread can dispatch work to an instance
+
+ * Finite number of streams per device
+
+ * ISSUE: How to use CUDA `const` memory with multiple streams?
+
+ * Masking can be mapped to restricting the number of CUDA blocks
+ to the fraction of available resources; e.g., maximum resident blocks
+
+
+### Requesting an Execution Space Instance
+
+ * `Space::request(` *who* `,` *what* `)`
+
+ * *who* is an identifier for subsequent queries regarding
+ who requested each instance
+
+ * *what* is which device; the stream is a requested/relinquished resource
+
+
+```
+struct Cuda {
+
+ struct Resource ;
+
+ static Cuda request();
+
+ static Cuda request( const std::string & , const Resource & );
+};
+```
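+
+As above, a hypothetical call site; two requests against the same device
+would map to two distinct streams:
+
+```
+Cuda::Resource device0 ; // selects which device, e.g. device 0
+
+Cuda assembly = Cuda::request( "assembly" , device0 );
+Cuda solver   = Cuda::request( "solver"   , device0 );
+
+// Both instances share the device but own exclusive streams; each
+// stream is relinquished with the last reference to its instance.
+```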
+
+
diff --git a/lib/kokkos/example/common/VectorImport.hpp b/lib/kokkos/example/common/VectorImport.hpp
index 8ecd74d46..48b28f8c2 100644
--- a/lib/kokkos/example/common/VectorImport.hpp
+++ b/lib/kokkos/example/common/VectorImport.hpp
@@ -1,294 +1,294 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_VECTORIMPORT_HPP
#define KOKKOS_VECTORIMPORT_HPP
#include <utility>
#include <limits>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <Kokkos_Core.hpp>
#include <WrapMPI.hpp>
namespace Kokkos {
namespace Example {
template< class CommMessageType , class CommIdentType , class VectorType >
struct VectorImport ;
} // namespace Example
} // namespace Kokkos
#if ! defined( KOKKOS_HAVE_MPI )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Example {
template< class CommMessageType , class CommIdentType , class VectorType >
struct VectorImport {
const MPI_Comm comm ;
const unsigned count_owned ;
const unsigned count_receive ;
VectorImport( MPI_Comm arg_comm ,
const CommMessageType & ,
const CommMessageType & ,
const CommIdentType & ,
const unsigned arg_count_owned ,
const unsigned arg_count_receive )
: comm( arg_comm )
, count_owned( arg_count_owned )
, count_receive( arg_count_receive )
{}
inline
void operator()( const VectorType & ) const {}
};
} // namespace Example
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#else /* defined( KOKKOS_HAVE_MPI ) */
namespace Kokkos {
namespace Example {
template< class CommMessageType , class CommIdentType , class VectorType >
class VectorImport {
private:
// rank == 1 or array_layout == LayoutRight
enum { OK = Kokkos::Impl::StaticAssert<
( VectorType::rank == 1 ) ||
- Kokkos::Impl::is_same< typename VectorType::array_layout , Kokkos::LayoutRight >::value
+ std::is_same< typename VectorType::array_layout , Kokkos::LayoutRight >::value
>::value };
typedef typename VectorType::HostMirror HostVectorType ;
enum { ReceiveInPlace =
- Kokkos::Impl::is_same< typename VectorType::memory_space ,
+ std::is_same< typename VectorType::memory_space ,
typename HostVectorType::memory_space >::value };
const CommMessageType recv_msg ;
const CommMessageType send_msg ;
const CommIdentType send_nodeid ;
VectorType send_buffer ;
HostVectorType host_send_buffer ;
HostVectorType host_recv_buffer ;
unsigned chunk ;
public:
const MPI_Comm comm ;
const unsigned count_owned ;
const unsigned count_receive ;
struct Pack {
typedef typename VectorType::execution_space execution_space ;
const CommIdentType index ;
const VectorType source ;
const VectorType buffer ;
KOKKOS_INLINE_FUNCTION
void operator()( const unsigned i ) const
{ buffer( i ) = source( index(i) ); }
Pack( const CommIdentType & arg_index ,
const VectorType & arg_source ,
const VectorType & arg_buffer )
: index( arg_index )
, source( arg_source )
, buffer( arg_buffer )
{
Kokkos::parallel_for( index.dimension_0() , *this );
execution_space::fence();
}
};
VectorImport( MPI_Comm arg_comm ,
const CommMessageType & arg_recv_msg ,
const CommMessageType & arg_send_msg ,
const CommIdentType & arg_send_nodeid ,
const unsigned arg_count_owned ,
const unsigned arg_count_receive )
: recv_msg( arg_recv_msg )
, send_msg( arg_send_msg )
, send_nodeid( arg_send_nodeid )
, send_buffer()
, host_send_buffer()
, host_recv_buffer()
, comm( arg_comm )
, count_owned( arg_count_owned )
, count_receive( arg_count_receive )
{
if ( ! ReceiveInPlace ) {
host_recv_buffer = HostVectorType("recv_buffer",count_receive);
}
unsigned send_count = 0 ;
for ( unsigned i = 0 ; i < send_msg.dimension_0() ; ++i ) { send_count += send_msg(i,1); }
send_buffer = VectorType("send_buffer",send_count);
host_send_buffer = Kokkos::create_mirror_view( send_buffer );
}
inline
void operator()( const VectorType & v ) const
{
typedef typename VectorType::value_type scalar_type ;
const int mpi_tag = 42 ;
const unsigned chunk = v.dimension_1();
// Subvector for receives
const std::pair<unsigned,unsigned> recv_range( count_owned , count_owned + count_receive );
const VectorType recv_vector = Kokkos::subview( v , recv_range );
std::vector< MPI_Request > recv_request( recv_msg.dimension_0() , MPI_REQUEST_NULL );
{ // Post receives
scalar_type * ptr =
ReceiveInPlace ? recv_vector.ptr_on_device() : host_recv_buffer.ptr_on_device();
for ( size_t i = 0 ; i < recv_msg.dimension_0() ; ++i ) {
const int proc = recv_msg(i,0);
const int count = recv_msg(i,1) * chunk ;
MPI_Irecv( ptr , count * sizeof(scalar_type) , MPI_BYTE ,
proc , mpi_tag , comm , & recv_request[i] );
ptr += count ;
}
}
MPI_Barrier( comm );
{ // Pack and send
const Pack pack( send_nodeid , v , send_buffer );
Kokkos::deep_copy( host_send_buffer , send_buffer );
scalar_type * ptr = host_send_buffer.ptr_on_device();
for ( size_t i = 0 ; i < send_msg.dimension_0() ; ++i ) {
const int proc = send_msg(i,0);
const int count = send_msg(i,1) * chunk ;
// MPI_Ssend blocks until
// (1) a receive is matched for the message and
// (2) the send buffer can be re-used.
//
// It is suggested that MPI_Ssend will have the best performance:
// http://www.mcs.anl.gov/research/projects/mpi/sendmode.html .
MPI_Ssend( ptr ,
count * sizeof(scalar_type) , MPI_BYTE ,
proc , mpi_tag , comm );
ptr += count ;
}
}
// Wait for receives and verify:
for ( size_t i = 0 ; i < recv_msg.dimension_0() ; ++i ) {
MPI_Status recv_status ;
int recv_which = 0 ;
int recv_size = 0 ;
MPI_Waitany( recv_msg.dimension_0() , & recv_request[0] , & recv_which , & recv_status );
const int recv_proc = recv_status.MPI_SOURCE ;
MPI_Get_count( & recv_status , MPI_BYTE , & recv_size );
// Verify message properly received:
const int expected_proc = recv_msg(recv_which,0);
const int expected_size = recv_msg(recv_which,1) * chunk * sizeof(scalar_type);
if ( ( expected_proc != recv_proc ) ||
( expected_size != recv_size ) ) {
int local_rank = 0 ;
MPI_Comm_rank( comm , & local_rank );
std::ostringstream msg ;
msg << "VectorImport error:"
<< " P" << local_rank
<< " received from P" << recv_proc
<< " size " << recv_size
<< " expected " << expected_size
<< " from P" << expected_proc ;
throw std::runtime_error( msg.str() );
}
}
// Copy received data to device memory.
if ( ! ReceiveInPlace ) { Kokkos::deep_copy( recv_vector , host_recv_buffer ); }
}
};
} // namespace Example
} // namespace Kokkos
#endif
//----------------------------------------------------------------------------
#endif /* #ifndef KOKKOS_VECTORIMPORT_HPP */
diff --git a/lib/kokkos/example/feint/ElemFunctor.hpp b/lib/kokkos/example/feint/ElemFunctor.hpp
index 651e34c2e..583c4fda1 100644
--- a/lib/kokkos/example/feint/ElemFunctor.hpp
+++ b/lib/kokkos/example/feint/ElemFunctor.hpp
@@ -1,489 +1,485 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXAMPLE_FEINT_FUNCTORS_HPP
#define KOKKOS_EXAMPLE_FEINT_FUNCTORS_HPP
#include <stdio.h>
#include <Kokkos_Core.hpp>
#include <BoxElemFixture.hpp>
namespace Kokkos {
namespace Example {
/** \brief Numerically integrate a function on a finite element mesh and
* project the integrated values to nodes.
*/
template< class FixtureType ,
class FunctionType ,
bool PerformScatterAddWithAtomic >
struct FiniteElementIntegration ;
// Specialized for an 'Example::BoxElemFixture' finite element mesh
template< class Device , BoxElemPart::ElemOrder ElemOrder , class GridMap ,
class FunctionType ,
bool PerformScatterAddWithAtomic >
struct FiniteElementIntegration<
Kokkos::Example::BoxElemFixture< Device , ElemOrder , GridMap > ,
FunctionType ,
PerformScatterAddWithAtomic >
{
// Element mesh types:
typedef Kokkos::Example::BoxElemFixture< Device , ElemOrder >
BoxFixtureType ;
typedef Kokkos::Example::HexElement_Data< BoxFixtureType::ElemNode >
HexElemDataType ;
enum { ElemNodeCount = HexElemDataType::element_node_count };
enum { IntegrationCount = HexElemDataType::integration_count };
enum { ValueCount = FunctionType::value_count };
// Dictionary of view types:
typedef View<int*, Device> ElemErrorType ;
typedef View<double*[ElemNodeCount][ValueCount],Device> ElemValueType ;
typedef View<double*[ValueCount], Device> NodeValueType ;
// Data members for this Functor:
const HexElemDataType m_hex_elem_data ; ///< Master element
const BoxFixtureType m_box_fixture ; ///< Unstructured mesh data
const FunctionType m_function ; ///< Function to integrate
const ElemErrorType m_elem_error ; ///< Flags for element errors
const ElemValueType m_elem_integral ; ///< Per-element quantities
const NodeValueType m_node_lumped ; ///< Quantities lumped to nodes
//----------------------------------------
FiniteElementIntegration(
const BoxFixtureType & box_fixture ,
const FunctionType & function )
: m_hex_elem_data()
, m_box_fixture( box_fixture ) // Shallow copy of the mesh fixture
, m_function( function )
, m_elem_error( "elem_error" , box_fixture.elem_count() )
, m_elem_integral( "elem_integral" , box_fixture.elem_count() )
, m_node_lumped( "node_lumped" , box_fixture.node_count() )
{}
//----------------------------------------
// Device for parallel dispatch.
typedef typename Device::execution_space execution_space;
// Value type for global parallel reduction.
struct value_type {
double value[ ValueCount ]; ///< Integrated quantities
int error ; ///< Element inversion flag
};
//----------------------------------------
// Transform element interpolation function gradients and
// compute determinant of spatial jacobian.
KOKKOS_INLINE_FUNCTION
float transform_gradients(
const float grad[][ ElemNodeCount ] , // Gradient of bases master element
const double coord[][ ElemNodeCount ] ,
float dpsi[][ ElemNodeCount ] ) const
{
enum { TensorDim = 9 };
enum { j11 = 0 , j12 = 1 , j13 = 2 ,
j21 = 3 , j22 = 4 , j23 = 5 ,
j31 = 6 , j32 = 7 , j33 = 8 };
// Temporary for jacobian accumulation is double for summation accuracy.
double J[ TensorDim ] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
for( int i = 0; i < ElemNodeCount ; ++i ) {
J[j11] += grad[0][i] * coord[0][i] ;
J[j12] += grad[0][i] * coord[1][i] ;
J[j13] += grad[0][i] * coord[2][i] ;
J[j21] += grad[1][i] * coord[0][i] ;
J[j22] += grad[1][i] * coord[1][i] ;
J[j23] += grad[1][i] * coord[2][i] ;
J[j31] += grad[2][i] * coord[0][i] ;
J[j32] += grad[2][i] * coord[1][i] ;
J[j33] += grad[2][i] * coord[2][i] ;
}
// Inverse jacobian, compute as double and store as float.
float invJ[ TensorDim ] = {
float( J[j22] * J[j33] - J[j23] * J[j32] ) ,
float( J[j13] * J[j32] - J[j12] * J[j33] ) ,
float( J[j12] * J[j23] - J[j13] * J[j22] ) ,
float( J[j23] * J[j31] - J[j21] * J[j33] ) ,
float( J[j11] * J[j33] - J[j13] * J[j31] ) ,
float( J[j13] * J[j21] - J[j11] * J[j23] ) ,
float( J[j21] * J[j32] - J[j22] * J[j31] ) ,
float( J[j12] * J[j31] - J[j11] * J[j32] ) ,
float( J[j11] * J[j22] - J[j12] * J[j21] ) };
const float detJ = J[j11] * invJ[j11] +
J[j21] * invJ[j12] +
J[j31] * invJ[j13] ;
{
const float detJinv = 1.0 / detJ ;
for ( int i = 0 ; i < TensorDim ; ++i ) { invJ[i] *= detJinv ; }
}
// Transform gradients:
for ( int i = 0; i < ElemNodeCount ; ++i ) {
dpsi[0][i] = grad[0][i] * invJ[j11] +
grad[1][i] * invJ[j12] +
grad[2][i] * invJ[j13];
dpsi[1][i] = grad[0][i] * invJ[j21] +
grad[1][i] * invJ[j22] +
grad[2][i] * invJ[j23];
dpsi[2][i] = grad[0][i] * invJ[j31] +
grad[1][i] * invJ[j32] +
grad[2][i] * invJ[j33];
}
return detJ ;
}
// Functor's function called for each element in the mesh
// to numerically integrate the function and add element quantities
// to the global integral.
KOKKOS_INLINE_FUNCTION
void operator()( const int ielem , value_type & update ) const
{
// Local temporaries for gathering nodal data.
double node_coord[3][ ElemNodeCount ];
int inode[ ElemNodeCount ] ;
// Gather indices of element's node from global memory to local memory.
for ( int i = 0 ; i < ElemNodeCount ; ++i ) {
inode[i] = m_box_fixture.elem_node( ielem , i );
}
// Gather coordinates of element's nodes from global memory to local memory.
for ( int i = 0 ; i < ElemNodeCount ; ++i ) {
node_coord[0][i] = m_box_fixture.node_coord( inode[i] , 0 );
node_coord[1][i] = m_box_fixture.node_coord( inode[i] , 1 );
node_coord[2][i] = m_box_fixture.node_coord( inode[i] , 2 );
}
// Local temporary to accumulate numerical integration
// of vector valued function.
double accum[ ValueCount ];
for ( int j = 0 ; j < ValueCount ; ++j ) { accum[j] = 0 ; }
int error = 0 ;
// Numerical integration loop for this element:
for ( int k = 0 ; k < IntegrationCount ; ++k ) {
// Integration point in space as interpolated from nodal coordinates:
double point[3] = { 0 , 0 , 0 };
for ( int i = 0 ; i < ElemNodeCount ; ++i ) {
point[0] += node_coord[0][i] * m_hex_elem_data.values[k][i] ;
point[1] += node_coord[1][i] * m_hex_elem_data.values[k][i] ;
point[2] += node_coord[2][i] * m_hex_elem_data.values[k][i] ;
}
// Example function vector value at cubature point:
double val_at_pt[ ValueCount ];
m_function( point , val_at_pt );
// Temporary array for transformed element basis functions' gradient.
// Not used in this example, but computed anyway by the more general
// deformation function.
float dpsi[3][ ElemNodeCount ];
// Compute deformation jacobian, transform basis function gradient,
// and return determinant of deformation jacobian.
float detJ = transform_gradients( m_hex_elem_data.gradients[k] ,
node_coord , dpsi );
// Check for inverted spatial jacobian
if ( detJ <= 0 ) { error = 1 ; detJ = 0 ; }
// Integration weight.
const float w = m_hex_elem_data.weights[k] * detJ ;
// Cubature of function.
for ( int j = 0 ; j < ValueCount ; ++j ) {
accum[j] += val_at_pt[j] * w ;
}
}
m_elem_error(ielem) = error ;
// Element contribution to global integral:
if ( error ) { update.error = 1 ; }
for ( int j = 0 ; j < ValueCount ; ++j ) { update.value[j] += accum[j] ; }
// Element-node quantity for lumping to nodes:
for ( int i = 0 ; i < ElemNodeCount ; ++i ) {
for ( int j = 0 ; j < ValueCount ; ++j ) {
// Save element's integral apportionment to nodes to global memory
m_elem_integral( ielem , i , j ) = accum[j] / ElemNodeCount ;
}
}
if ( PerformScatterAddWithAtomic ) {
// Option to immediately scatter-add the integrated quantities to nodes.
// This is a race condition as two or more threads could attempt
// concurrent update of nodal values. The atomic_fetch_add (+=)
// function guarantees that the summation will occur correctly;
// however, there can be no guarantee for the order of summation.
// Due to non-associativity of floating point arithmetic the result
// is non-deterministic within bounds of floating point round-off.
for ( int i = 0 ; i < ElemNodeCount ; ++i ) {
for ( int j = 0 ; j < ValueCount ; ++j ) {
Kokkos::atomic_fetch_add( & m_node_lumped( inode[i] , j ) ,
m_elem_integral( ielem , i , j ) );
}
}
}
}
//--------------------------------------------------------------------------
// Initialization of the global reduction value.
KOKKOS_INLINE_FUNCTION
void init( value_type & update ) const
{
for ( int j = 0 ; j < ValueCount ; ++j ) update.value[j] = 0 ;
update.error = 0 ;
}
// Join two contributions to global reduction value.
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & update ,
volatile const value_type & input ) const
{
for ( int j = 0 ; j < ValueCount ; ++j ) update.value[j] += input.value[j] ;
if ( input.error ) update.error = 1 ;
}
};
} /* namespace Example */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Example {
template< class ViewElemNode ,
class ViewNodeScan ,
class ViewNodeElem >
void map_node_to_elem( const ViewElemNode & elem_node ,
const ViewNodeScan & node_scan ,
const ViewNodeElem & node_elem );
/** \brief Functor to gather-sum elements' per-node quantities
* to element nodes. Gather-sum is thread safe and
* does not require atomic updates.
*/
template< class ViewNodeValue ,
class ViewElemValue ,
bool AlreadyUsedAtomic >
struct LumpElemToNode {
typedef typename ViewElemValue::execution_space execution_space ;
// In this example we know that the ViewElemValue
// array specification is < double*[nNode][nValue] >
-#if KOKKOS_USING_EXP_VIEW
enum { value_count = ViewElemValue::dimension::N2 };
-#else
- enum { value_count = ViewElemValue::shape_type::N2 };
-#endif
ViewNodeValue m_node_value ; ///< Integrated values at nodes
ViewElemValue m_elem_value ; ///< Values apportioned to nodes
View<int*, execution_space> m_node_scan ; ///< Offsets for nodes->element
View<int*[2],execution_space> m_node_elem ; ///< Node->element connectivity
// Only allocate node->element connectivity if we have
// not already used atomic updates for the nodes.
template< class ViewElemNode >
LumpElemToNode( const ViewNodeValue & node_value ,
const ViewElemValue & elem_value ,
const ViewElemNode & elem_node )
: m_node_value( node_value )
, m_elem_value( elem_value )
, m_node_scan( "node_scan" ,
AlreadyUsedAtomic ? 0 : node_value.dimension_0() + 1 )
, m_node_elem( "node_elem" ,
AlreadyUsedAtomic ? 0 : elem_node.dimension_0() *
elem_node.dimension_1() )
{
if ( ! AlreadyUsedAtomic ) {
map_node_to_elem( elem_node , m_node_scan , m_node_elem );
}
}
//----------------------------------------
struct value_type { double value[ value_count ]; };
KOKKOS_INLINE_FUNCTION
void operator()( const int inode , value_type & update ) const
{
if ( ! AlreadyUsedAtomic ) {
// Sum element quantities to a local variable.
value_type local ;
for ( int j = 0 ; j < value_count ; ++j ) { local.value[j] = 0 ; }
{
// nodes' element ids span [i,end)
int i = m_node_scan(inode);
const int end = m_node_scan(inode+1);
for ( ; i < end ; ++i ) {
// element #ielem , local node #ielem_node is this node:
const int ielem = m_node_elem(i,0);
const int ielem_node = m_node_elem(i,1);
// Sum the vector-valued quantity
for ( int j = 0 ; j < value_count ; ++j ) {
local.value[j] += m_elem_value( ielem , ielem_node , j );
}
}
}
// Assign nodal quantity (no race condition).
// Sum global value.
for ( int j = 0 ; j < value_count ; ++j ) {
m_node_value( inode , j ) = local.value[j] ;
update.value[j] += local.value[j] ;
}
}
else {
// Already used atomic update of the nodal quantity,
// query and sum the value.
for ( int j = 0 ; j < value_count ; ++j ) {
update.value[j] += m_node_value( inode , j );
}
}
}
KOKKOS_INLINE_FUNCTION
void init( value_type & update ) const
{ for ( int j = 0 ; j < value_count ; ++j ) { update.value[j] = 0 ; } }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & update ,
volatile const value_type & input ) const
{
for ( int j = 0 ; j < value_count ; ++j ) {
update.value[j] += input.value[j] ;
}
}
};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
template< class ViewElemNode ,
class ViewNodeScan ,
class ViewNodeElem >
void map_node_to_elem( const ViewElemNode & elem_node ,
const ViewNodeScan & node_scan ,
const ViewNodeElem & node_elem )
{
typedef typename ViewElemNode::host_mirror_space host_mirror_space ;
const typename ViewElemNode::HostMirror host_elem_node =
Kokkos::create_mirror_view(elem_node);
const typename ViewNodeScan::HostMirror host_node_scan =
Kokkos::create_mirror_view(node_scan);
const typename ViewNodeElem::HostMirror host_node_elem =
Kokkos::create_mirror_view(node_elem);
const int elem_count = host_elem_node.dimension_0();
const int elem_node_count = host_elem_node.dimension_1();
const int node_count = host_node_scan.dimension_0() - 1 ;
const View<int*, host_mirror_space >
node_elem_count( "node_elem_count" , node_count );
Kokkos::deep_copy( host_elem_node , elem_node );
for ( int i = 0 ; i < elem_count ; ++i ) {
for ( int j = 0 ; j < elem_node_count ; ++j ) {
++node_elem_count( host_elem_node(i,j) );
}
}
for ( int i = 0 ; i < node_count ; ++i ) {
host_node_scan(i+1) += host_node_scan(i) + node_elem_count(i);
node_elem_count(i) = 0 ;
}
for ( int i = 0 ; i < elem_count ; ++i ) {
for ( int j = 0 ; j < elem_node_count ; ++j ) {
const int inode = host_elem_node(i,j);
const int offset = host_node_scan(inode) + node_elem_count(inode);
host_node_elem( offset , 0 ) = i ;
host_node_elem( offset , 1 ) = j ;
++node_elem_count(inode);
}
}
Kokkos::deep_copy( node_scan , host_node_scan );
Kokkos::deep_copy( node_elem , host_node_elem );
}
} /* namespace Example */
} /* namespace Kokkos */
#endif /* #ifndef KOKKOS_EXAMPLE_FEINT_FUNCTORS_HPP */
diff --git a/lib/kokkos/example/feint/Makefile b/lib/kokkos/example/feint/Makefile
index f198a974c..9abf51d10 100644
--- a/lib/kokkos/example/feint/Makefile
+++ b/lib/kokkos/example/feint/Makefile
@@ -1,61 +1,59 @@
KOKKOS_PATH = ../..
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+vpath %.cpp ${KOKKOS_SRC_PATH}/example/fixture ${KOKKOS_SRC_PATH}/example/feint
-vpath %.cpp ${KOKKOS_PATH}/example/fixture ${KOKKOS_PATH}/example/feint
-
-EXAMPLE_HEADERS = $(wildcard $(KOKKOS_PATH)/example/common/*.hpp ${KOKKOS_PATH}/example/fixture/*.hpp ${KOKKOS_PATH}/example/feint/*.hpp)
+EXAMPLE_HEADERS = $(wildcard $(KOKKOS_SRC_PATH)/example/common/*.hpp ${KOKKOS_SRC_PATH}/example/fixture/*.hpp ${KOKKOS_SRC_PATH}/example/feint/*.hpp)
default: build_all
echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += \
- -I${KOKKOS_PATH}/example/common \
- -I${KOKKOS_PATH}/example/fixture \
- -I${KOKKOS_PATH}/example/feint
+ -I${KOKKOS_SRC_PATH}/example/common \
+ -I${KOKKOS_SRC_PATH}/example/fixture \
+ -I${KOKKOS_SRC_PATH}/example/feint
EXE_EXAMPLE_FEINT = KokkosExample_Feint
OBJ_EXAMPLE_FEINT = BoxElemPart.o main.o
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_EXAMPLE_FEINT += feint_cuda.o
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
OBJ_EXAMPLE_FEINT += feint_threads.o
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_EXAMPLE_FEINT += feint_openmp.o
endif
TARGETS = $(EXE_EXAMPLE_FEINT)
#TEST_TARGETS =
$(EXE_EXAMPLE_FEINT) : $(OBJ_EXAMPLE_FEINT) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_EXAMPLE_FEINT) $(KOKKOS_LIBS) $(LIB) -o $(EXE_EXAMPLE_FEINT)
build_all : $(TARGETS)
test : build_all
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(EXAMPLE_HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/fenl/Makefile b/lib/kokkos/example/fenl/Makefile
index 5d8e6fd30..24a0e61c1 100644
--- a/lib/kokkos/example/fenl/Makefile
+++ b/lib/kokkos/example/fenl/Makefile
@@ -1,54 +1,50 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
vpath %.cpp ${SRC_DIR}/../fixture ${SRC_DIR}
EXAMPLE_HEADERS = $(wildcard $(SRC_DIR)/../common/*.hpp ${SRC_DIR}/../fixture/*.hpp ${SRC_DIR}/*.hpp)
default: build_all
echo "End Build"
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-# KOKKOS_INTERNAL_USE_CUDA is not exported to installed Makefile.kokkos
-# use KOKKOS_DEVICE here
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += \
-I${SRC_DIR}/../common \
-I${SRC_DIR}/../fixture \
-I${SRC_DIR}
EXE_EXAMPLE_FENL = KokkosExample_Fenl
OBJ_EXAMPLE_FENL = BoxElemPart.o main.o fenl.o
TARGETS = $(EXE_EXAMPLE_FENL)
#TEST_TARGETS =
$(EXE_EXAMPLE_FENL) : $(OBJ_EXAMPLE_FENL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_EXAMPLE_FENL) $(KOKKOS_LIBS) $(LIB) -o $(EXE_EXAMPLE_FENL)
build_all : $(TARGETS)
test : build_all
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(EXAMPLE_HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/fenl/fenl_impl.hpp b/lib/kokkos/example/fenl/fenl_impl.hpp
index 64070ce55..15583c10e 100644
--- a/lib/kokkos/example/fenl/fenl_impl.hpp
+++ b/lib/kokkos/example/fenl/fenl_impl.hpp
@@ -1,598 +1,598 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_EXAMPLE_FENL_IMPL_HPP
#define KOKKOS_EXAMPLE_FENL_IMPL_HPP
#include <math.h>
// Kokkos libraries' headers:
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_StaticCrsGraph.hpp>
#include <impl/Kokkos_Timer.hpp>
// Examples headers:
#include <BoxElemFixture.hpp>
#include <VectorImport.hpp>
#include <CGSolve.hpp>
#include <fenl.hpp>
#include <fenl_functors.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Example {
namespace FENL {
inline
double maximum( MPI_Comm comm , double local )
{
double global = local ;
#if defined( KOKKOS_HAVE_MPI )
MPI_Allreduce( & local , & global , 1 , MPI_DOUBLE , MPI_MAX , comm );
#endif
return global ;
}
} /* namespace FENL */
} /* namespace Example */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Example {
namespace FENL {
class ManufacturedSolution {
public:
// Manufactured solution for one dimensional nonlinear PDE
//
// -K T_zz + T^2 = 0 ; T(zmin) = T_zmin ; T(zmax) = T_zmax
//
// Has an analytic solution of the form:
//
// T(z) = ( a ( z - zmin ) + b )^(-2) where K = 1 / ( 6 a^2 )
//
// Given T_0 and T_L compute K for this analytic solution.
//
// Two analytic solutions:
//
// Solution with singularity:
// , a( ( 1.0 / sqrt(T_zmax) + 1.0 / sqrt(T_zmin) ) / ( zmax - zmin ) )
// , b( -1.0 / sqrt(T_zmin) )
//
// Solution without singularity:
// , a( ( 1.0 / sqrt(T_zmax) - 1.0 / sqrt(T_zmin) ) / ( zmax - zmin ) )
// , b( 1.0 / sqrt(T_zmin) )
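//
// Quick check of the K relation: with u = a ( z - zmin ) + b and
// T = u^-2, we have T_z = -2 a u^-3 and T_zz = 6 a^2 u^-4, so
// -K T_zz + T^2 = ( 1 - 6 K a^2 ) u^-4, which vanishes exactly
// when K = 1 / ( 6 a^2 ), as computed below.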
const double zmin ;
const double zmax ;
const double T_zmin ;
const double T_zmax ;
const double a ;
const double b ;
const double K ;
ManufacturedSolution( const double arg_zmin ,
const double arg_zmax ,
const double arg_T_zmin ,
const double arg_T_zmax )
: zmin( arg_zmin )
, zmax( arg_zmax )
, T_zmin( arg_T_zmin )
, T_zmax( arg_T_zmax )
, a( ( 1.0 / sqrt(T_zmax) - 1.0 / sqrt(T_zmin) ) / ( zmax - zmin ) )
, b( 1.0 / sqrt(T_zmin) )
, K( 1.0 / ( 6.0 * a * a ) )
{}
double operator()( const double z ) const
{
const double tmp = a * ( z - zmin ) + b ;
return 1.0 / ( tmp * tmp );
}
};
} /* namespace FENL */
} /* namespace Example */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Example {
namespace FENL {
template < class Space , BoxElemPart::ElemOrder ElemOrder >
Perf fenl(
MPI_Comm comm ,
const int use_print ,
const int use_trials ,
const int use_atomic ,
const int use_elems[] )
{
typedef Kokkos::Example::BoxElemFixture< Space , ElemOrder > FixtureType ;
typedef Kokkos::Example::CrsMatrix< double , Space >
SparseMatrixType ;
typedef typename SparseMatrixType::StaticCrsGraphType
SparseGraphType ;
typedef Kokkos::Example::FENL::NodeNodeGraph< typename FixtureType::elem_node_type , SparseGraphType , FixtureType::ElemNode >
NodeNodeGraphType ;
typedef Kokkos::Example::FENL::ElementComputation< FixtureType , SparseMatrixType >
ElementComputationType ;
typedef Kokkos::Example::FENL::DirichletComputation< FixtureType , SparseMatrixType >
DirichletComputationType ;
typedef NodeElemGatherFill< ElementComputationType >
NodeElemGatherFillType ;
typedef typename ElementComputationType::vector_type VectorType ;
typedef Kokkos::Example::VectorImport<
typename FixtureType::comm_list_type ,
typename FixtureType::send_nodeid_type ,
VectorType > ImportType ;
//------------------------------------
const unsigned newton_iteration_limit = 10 ;
const double newton_iteration_tolerance = 1e-7 ;
const unsigned cg_iteration_limit = 200 ;
const double cg_iteration_tolerance = 1e-7 ;
//------------------------------------
- const int print_flag = use_print && Kokkos::Impl::is_same< Kokkos::HostSpace , typename Space::memory_space >::value ;
+ const int print_flag = use_print && std::is_same< Kokkos::HostSpace , typename Space::memory_space >::value ;
int comm_rank ;
int comm_size ;
MPI_Comm_rank( comm , & comm_rank );
MPI_Comm_size( comm , & comm_size );
// Decompose by node to avoid mpi-communication for assembly
const float bubble_x = 1.0 ;
const float bubble_y = 1.0 ;
const float bubble_z = 1.0 ;
const FixtureType fixture( BoxElemPart::DecomposeNode , comm_size , comm_rank ,
use_elems[0] , use_elems[1] , use_elems[2] ,
bubble_x , bubble_y , bubble_z );
{
int global_error = ! fixture.ok();
#if defined( KOKKOS_HAVE_MPI )
int local_error = global_error ;
global_error = 0 ;
MPI_Allreduce( & local_error , & global_error , 1 , MPI_INT , MPI_SUM , comm );
#endif
if ( global_error ) {
throw std::runtime_error(std::string("Error generating finite element fixture"));
}
}
//------------------------------------
const ImportType comm_nodal_import(
comm ,
fixture.recv_node() ,
fixture.send_node() ,
fixture.send_nodeid() ,
fixture.node_count_owned() ,
fixture.node_count() - fixture.node_count_owned() );
//------------------------------------
const double bc_lower_value = 1 ;
const double bc_upper_value = 2 ;
const Kokkos::Example::FENL::ManufacturedSolution
manufactured_solution( 0 , 1 , bc_lower_value , bc_upper_value );
//------------------------------------
for ( int k = 0 ; k < comm_size && use_print ; ++k ) {
if ( k == comm_rank ) {
typename FixtureType::node_grid_type::HostMirror
h_node_grid = Kokkos::create_mirror_view( fixture.node_grid() );
typename FixtureType::node_coord_type::HostMirror
h_node_coord = Kokkos::create_mirror_view( fixture.node_coord() );
typename FixtureType::elem_node_type::HostMirror
h_elem_node = Kokkos::create_mirror_view( fixture.elem_node() );
Kokkos::deep_copy( h_node_grid , fixture.node_grid() );
Kokkos::deep_copy( h_node_coord , fixture.node_coord() );
Kokkos::deep_copy( h_elem_node , fixture.elem_node() );
std::cout << "MPI[" << comm_rank << "]" << std::endl ;
std::cout << "Node grid {" ;
for ( unsigned inode = 0 ; inode < fixture.node_count() ; ++inode ) {
std::cout << " (" << h_node_grid(inode,0)
<< "," << h_node_grid(inode,1)
<< "," << h_node_grid(inode,2)
<< ")" ;
}
std::cout << " }" << std::endl ;
std::cout << "Node coord {" ;
for ( unsigned inode = 0 ; inode < fixture.node_count() ; ++inode ) {
std::cout << " (" << h_node_coord(inode,0)
<< "," << h_node_coord(inode,1)
<< "," << h_node_coord(inode,2)
<< ")" ;
}
std::cout << " }" << std::endl ;
std::cout << "Manufactured solution"
<< " a[" << manufactured_solution.a << "]"
<< " b[" << manufactured_solution.b << "]"
<< " K[" << manufactured_solution.K << "]"
<< " {" ;
for ( unsigned inode = 0 ; inode < fixture.node_count() ; ++inode ) {
std::cout << " " << manufactured_solution( h_node_coord( inode , 2 ) );
}
std::cout << " }" << std::endl ;
std::cout << "ElemNode {" << std::endl ;
for ( unsigned ielem = 0 ; ielem < fixture.elem_count() ; ++ielem ) {
std::cout << " elem[" << ielem << "]{" ;
for ( unsigned inode = 0 ; inode < FixtureType::ElemNode ; ++inode ) {
std::cout << " " << h_elem_node(ielem,inode);
}
std::cout << " }{" ;
for ( unsigned inode = 0 ; inode < FixtureType::ElemNode ; ++inode ) {
std::cout << " (" << h_node_grid(h_elem_node(ielem,inode),0)
<< "," << h_node_grid(h_elem_node(ielem,inode),1)
<< "," << h_node_grid(h_elem_node(ielem,inode),2)
<< ")" ;
}
std::cout << " }" << std::endl ;
}
std::cout << "}" << std::endl ;
}
std::cout.flush();
MPI_Barrier( comm );
}
//------------------------------------
Kokkos::Timer wall_clock ;
Perf perf_stats = Perf() ;
for ( int itrial = 0 ; itrial < use_trials ; ++itrial ) {
Perf perf = Perf() ;
perf.global_elem_count = fixture.elem_count_global();
perf.global_node_count = fixture.node_count_global();
//----------------------------------
// Create the sparse matrix graph and element-to-graph map
// from the element-to-node identifier array.
// The graph only has rows for the owned nodes.
typename NodeNodeGraphType::Times graph_times;
const NodeNodeGraphType
mesh_to_graph( fixture.elem_node() , fixture.node_count_owned(), graph_times );
perf.map_ratio = maximum(comm, graph_times.ratio);
perf.fill_node_set = maximum(comm, graph_times.fill_node_set);
perf.scan_node_count = maximum(comm, graph_times.scan_node_count);
perf.fill_graph_entries = maximum(comm, graph_times.fill_graph_entries);
perf.sort_graph_entries = maximum(comm, graph_times.sort_graph_entries);
perf.fill_element_graph = maximum(comm, graph_times.fill_element_graph);
wall_clock.reset();
// Create the sparse matrix from the graph:
SparseMatrixType jacobian( mesh_to_graph.graph );
Space::fence();
perf.create_sparse_matrix = maximum( comm , wall_clock.seconds() );
//----------------------------------
for ( int k = 0 ; k < comm_size && print_flag ; ++k ) {
if ( k == comm_rank ) {
const unsigned nrow = jacobian.graph.numRows();
std::cout << "MPI[" << comm_rank << "]" << std::endl ;
std::cout << "JacobianGraph {" << std::endl ;
for ( unsigned irow = 0 ; irow < nrow ; ++irow ) {
std::cout << " row[" << irow << "]{" ;
const unsigned entry_end = jacobian.graph.row_map(irow+1);
for ( unsigned entry = jacobian.graph.row_map(irow) ; entry < entry_end ; ++entry ) {
std::cout << " " << jacobian.graph.entries(entry);
}
std::cout << " }" << std::endl ;
}
std::cout << "}" << std::endl ;
std::cout << "ElemGraph {" << std::endl ;
for ( unsigned ielem = 0 ; ielem < mesh_to_graph.elem_graph.dimension_0() ; ++ielem ) {
std::cout << " elem[" << ielem << "]{" ;
for ( unsigned irow = 0 ; irow < mesh_to_graph.elem_graph.dimension_1() ; ++irow ) {
std::cout << " {" ;
for ( unsigned icol = 0 ; icol < mesh_to_graph.elem_graph.dimension_2() ; ++icol ) {
std::cout << " " << mesh_to_graph.elem_graph(ielem,irow,icol);
}
std::cout << " }" ;
}
std::cout << " }" << std::endl ;
}
std::cout << "}" << std::endl ;
}
std::cout.flush();
MPI_Barrier( comm );
}
//----------------------------------
// Allocate solution vector for each node in the mesh and residual vector for each owned node
const VectorType nodal_solution( "nodal_solution" , fixture.node_count() );
const VectorType nodal_residual( "nodal_residual" , fixture.node_count_owned() );
const VectorType nodal_delta( "nodal_delta" , fixture.node_count_owned() );
// Create element computation functor
const ElementComputationType elemcomp(
use_atomic ? ElementComputationType( fixture , manufactured_solution.K , nodal_solution ,
mesh_to_graph.elem_graph , jacobian , nodal_residual )
: ElementComputationType( fixture , manufactured_solution.K , nodal_solution ) );
const NodeElemGatherFillType gatherfill(
use_atomic ? NodeElemGatherFillType()
: NodeElemGatherFillType( fixture.elem_node() ,
mesh_to_graph.elem_graph ,
nodal_residual ,
jacobian ,
elemcomp.elem_residuals ,
elemcomp.elem_jacobians ) );
// Create boundary condition functor
const DirichletComputationType dirichlet(
fixture , nodal_solution , jacobian , nodal_residual ,
2 /* apply at 'z' ends */ ,
manufactured_solution.T_zmin ,
manufactured_solution.T_zmax );
//----------------------------------
// Nonlinear Newton iteration:
double residual_norm_init = 0 ;
for ( perf.newton_iter_count = 0 ;
perf.newton_iter_count < newton_iteration_limit ;
++perf.newton_iter_count ) {
//--------------------------------
comm_nodal_import( nodal_solution );
//--------------------------------
// Element contributions to residual and jacobian
wall_clock.reset();
Kokkos::deep_copy( nodal_residual , double(0) );
Kokkos::deep_copy( jacobian.coeff , double(0) );
elemcomp.apply();
if ( ! use_atomic ) {
gatherfill.apply();
}
Space::fence();
perf.fill_time = maximum( comm , wall_clock.seconds() );
//--------------------------------
// Apply boundary conditions
wall_clock.reset();
dirichlet.apply();
Space::fence();
perf.bc_time = maximum( comm , wall_clock.seconds() );
//--------------------------------
// Evaluate convergence
const double residual_norm =
std::sqrt(
Kokkos::Example::all_reduce(
Kokkos::Example::dot( fixture.node_count_owned() , nodal_residual, nodal_residual ) , comm ) );
perf.newton_residual = residual_norm ;
if ( 0 == perf.newton_iter_count ) { residual_norm_init = residual_norm ; }
if ( residual_norm < residual_norm_init * newton_iteration_tolerance ) { break ; }
//--------------------------------
// Solve for nonlinear update
CGSolveResult cg_result ;
Kokkos::Example::cgsolve( comm_nodal_import
, jacobian
, nodal_residual
, nodal_delta
, cg_iteration_limit
, cg_iteration_tolerance
, & cg_result
);
// Update solution vector
Kokkos::Example::waxpby( fixture.node_count_owned() , nodal_solution , -1.0 , nodal_delta , 1.0 , nodal_solution );
perf.cg_iter_count += cg_result.iteration ;
perf.matvec_time += cg_result.matvec_time ;
perf.cg_time += cg_result.iter_time ;
//--------------------------------
if ( print_flag ) {
const double delta_norm =
std::sqrt(
Kokkos::Example::all_reduce(
Kokkos::Example::dot( fixture.node_count_owned() , nodal_delta, nodal_delta ) , comm ) );
if ( 0 == comm_rank ) {
std::cout << "Newton iteration[" << perf.newton_iter_count << "]"
<< " residual[" << perf.newton_residual << "]"
<< " update[" << delta_norm << "]"
<< " cg_iteration[" << cg_result.iteration << "]"
<< " cg_residual[" << cg_result.norm_res << "]"
<< std::endl ;
}
for ( int k = 0 ; k < comm_size ; ++k ) {
if ( k == comm_rank ) {
const unsigned nrow = jacobian.graph.numRows();
std::cout << "MPI[" << comm_rank << "]" << std::endl ;
std::cout << "Residual {" ;
for ( unsigned irow = 0 ; irow < nrow ; ++irow ) {
std::cout << " " << nodal_residual(irow);
}
std::cout << " }" << std::endl ;
std::cout << "Delta {" ;
for ( unsigned irow = 0 ; irow < nrow ; ++irow ) {
std::cout << " " << nodal_delta(irow);
}
std::cout << " }" << std::endl ;
std::cout << "Solution {" ;
for ( unsigned irow = 0 ; irow < nrow ; ++irow ) {
std::cout << " " << nodal_solution(irow);
}
std::cout << " }" << std::endl ;
std::cout << "Jacobian[ "
<< jacobian.graph.numRows() << " x " << Kokkos::maximum_entry( jacobian.graph )
<< " ] {" << std::endl ;
for ( unsigned irow = 0 ; irow < nrow ; ++irow ) {
std::cout << " {" ;
const unsigned entry_end = jacobian.graph.row_map(irow+1);
for ( unsigned entry = jacobian.graph.row_map(irow) ; entry < entry_end ; ++entry ) {
std::cout << " (" << jacobian.graph.entries(entry)
<< "," << jacobian.coeff(entry)
<< ")" ;
}
std::cout << " }" << std::endl ;
}
std::cout << "}" << std::endl ;
}
std::cout.flush();
MPI_Barrier( comm );
}
}
//--------------------------------
}
// Evaluate solution error
if ( 0 == itrial ) {
const typename FixtureType::node_coord_type::HostMirror
h_node_coord = Kokkos::create_mirror_view( fixture.node_coord() );
const typename VectorType::HostMirror
h_nodal_solution = Kokkos::create_mirror_view( nodal_solution );
Kokkos::deep_copy( h_node_coord , fixture.node_coord() );
Kokkos::deep_copy( h_nodal_solution , nodal_solution );
double error_max = 0 ;
for ( unsigned inode = 0 ; inode < fixture.node_count_owned() ; ++inode ) {
const double answer = manufactured_solution( h_node_coord( inode , 2 ) );
const double error = ( h_nodal_solution(inode) - answer ) / answer ;
if ( error_max < fabs( error ) ) { error_max = fabs( error ); }
}
perf.error_max = std::sqrt( Kokkos::Example::all_reduce_max( error_max , comm ) );
perf_stats = perf ;
}
else {
perf_stats.fill_node_set = std::min( perf_stats.fill_node_set , perf.fill_node_set );
perf_stats.scan_node_count = std::min( perf_stats.scan_node_count , perf.scan_node_count );
perf_stats.fill_graph_entries = std::min( perf_stats.fill_graph_entries , perf.fill_graph_entries );
perf_stats.sort_graph_entries = std::min( perf_stats.sort_graph_entries , perf.sort_graph_entries );
perf_stats.fill_element_graph = std::min( perf_stats.fill_element_graph , perf.fill_element_graph );
perf_stats.create_sparse_matrix = std::min( perf_stats.create_sparse_matrix , perf.create_sparse_matrix );
perf_stats.fill_time = std::min( perf_stats.fill_time , perf.fill_time );
perf_stats.bc_time = std::min( perf_stats.bc_time , perf.bc_time );
perf_stats.cg_time = std::min( perf_stats.cg_time , perf.cg_time );
}
}
return perf_stats ;
}
} /* namespace FENL */
} /* namespace Example */
} /* namespace Kokkos */
#endif /* #ifndef KOKKOS_EXAMPLE_FENL_IMPL_HPP */
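Note on the solver above: the Newton loop follows a standard pattern - assemble the residual and Jacobian, apply the Dirichlet conditions, test the residual against its initial norm, solve J*delta = r with CG, and update the solution via waxpby (x = x - delta). Below is a minimal, self-contained sketch of that control flow for the scalar equation f(x) = x^2 - 2; it is illustrative only and not part of the patch (the names residual, jacobian, delta and the tolerance stand in for their distributed counterparts).

#include <cmath>
#include <cstdio>

int main() {
  double x = 1.0;                      // plays the role of nodal_solution
  double residual_norm_init = 0.0;
  const double tol = 1e-10;            // plays the role of newton_iteration_tolerance
  const int iter_limit = 20;           // plays the role of newton_iteration_limit

  for (int iter = 0; iter < iter_limit; ++iter) {
    const double residual = x * x - 2.0;                    // "assemble" the residual
    const double residual_norm = std::fabs(residual);
    if (iter == 0) residual_norm_init = residual_norm;
    if (residual_norm < residual_norm_init * tol) break;    // relative convergence test

    const double jacobian = 2.0 * x;                        // "assemble" the Jacobian
    const double delta = residual / jacobian;               // linear solve (CG in the real code)
    x -= delta;                                             // waxpby-style update: x = x - delta
    std::printf("newton iter %d residual %g update %g\n", iter, residual_norm, delta);
  }
  std::printf("x = %.12f (sqrt(2) = %.12f)\n", x, std::sqrt(2.0));
  return 0;
}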
diff --git a/lib/kokkos/example/fixture/Makefile b/lib/kokkos/example/fixture/Makefile
index 990f4f18e..5e684e344 100644
--- a/lib/kokkos/example/fixture/Makefile
+++ b/lib/kokkos/example/fixture/Makefile
@@ -1,48 +1,46 @@
KOKKOS_PATH = ../..
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+vpath %.cpp ${KOKKOS_SRC_PATH}/example/fixture
-vpath %.cpp ${KOKKOS_PATH}/example/fixture
-
-EXAMPLE_HEADERS = $(wildcard $(KOKKOS_PATH)/example/common/*.hpp ${KOKKOS_PATH}/example/fixture/*.hpp )
+EXAMPLE_HEADERS = $(wildcard $(KOKKOS_SRC_PATH)/example/common/*.hpp ${KOKKOS_SRC_PATH}/example/fixture/*.hpp )
default: build_all
echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
+
+ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
+ CXX = g++
endif
+CXXFLAGS = -O3
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
KOKKOS_CXXFLAGS += \
- -I${KOKKOS_PATH}/example/common \
- -I${KOKKOS_PATH}/example/fixture
+ -I${KOKKOS_SRC_PATH}/example/common \
+ -I${KOKKOS_SRC_PATH}/example/fixture
EXE_EXAMPLE_FIXTURE = KokkosExample_Fixture
OBJ_EXAMPLE_FIXTURE = Main.o TestFixture.o BoxElemPart.o
TARGETS = $(EXE_EXAMPLE_FIXTURE)
#TEST_TARGETS =
$(EXE_EXAMPLE_FIXTURE) : $(OBJ_EXAMPLE_FIXTURE) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_EXAMPLE_FIXTURE) $(KOKKOS_LIBS) $(LIB) -o $(EXE_EXAMPLE_FIXTURE)
build_all : $(TARGETS)
test : build_all
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(EXAMPLE_HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/global_2_local_ids/Makefile b/lib/kokkos/example/global_2_local_ids/Makefile
index bf8fbea3e..42b376ec7 100644
--- a/lib/kokkos/example/global_2_local_ids/Makefile
+++ b/lib/kokkos/example/global_2_local_ids/Makefile
@@ -1,53 +1,46 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
SRC = $(wildcard $(SRC_DIR)/*.cpp)
OBJ = $(SRC:$(SRC_DIR)/%.cpp=%.o)
#SRC = $(wildcard *.cpp)
#OBJ = $(SRC:%.cpp=%.o)
default: build
echo "Start Build"
-# use installed Makefile.kokkos
-include $(KOKKOS_PATH)/Makefile.kokkos
-
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(NVCC_WRAPPER)
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "Cuda,OpenMP"
-#KOKKOS_ARCH = "SNB,Kepler35"
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
+ EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
else
-CXX = g++
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "OpenMP"
-#KOKKOS_ARCH = "SNB"
+ CXX = g++
+ EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
endif
+CXXFLAGS = -O3 -I$(SRC_DIR)
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
DEPFLAGS = -M
LIB =
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean:
rm -f *.a *.o *.cuda *.host
# Compilation rules
%.o:$(SRC_DIR)/%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/grow_array/Makefile b/lib/kokkos/example/grow_array/Makefile
index bf8fbea3e..42b376ec7 100644
--- a/lib/kokkos/example/grow_array/Makefile
+++ b/lib/kokkos/example/grow_array/Makefile
@@ -1,53 +1,46 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
SRC = $(wildcard $(SRC_DIR)/*.cpp)
OBJ = $(SRC:$(SRC_DIR)/%.cpp=%.o)
#SRC = $(wildcard *.cpp)
#OBJ = $(SRC:%.cpp=%.o)
default: build
echo "Start Build"
-# use installed Makefile.kokkos
-include $(KOKKOS_PATH)/Makefile.kokkos
-
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(NVCC_WRAPPER)
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "Cuda,OpenMP"
-#KOKKOS_ARCH = "SNB,Kepler35"
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
+ EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
else
-CXX = g++
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "OpenMP"
-#KOKKOS_ARCH = "SNB"
+ CXX = g++
+ EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
endif
+CXXFLAGS = -O3 -I$(SRC_DIR)
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
DEPFLAGS = -M
LIB =
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean:
rm -f *.a *.o *.cuda *.host
# Compilation rules
%.o:$(SRC_DIR)/%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/ichol/Makefile b/lib/kokkos/example/ichol/Makefile
deleted file mode 100644
index 57e972f04..000000000
--- a/lib/kokkos/example/ichol/Makefile
+++ /dev/null
@@ -1,63 +0,0 @@
-SCOTCH_PATH = /home/hcedwar/scotch/6.0.0
-KOKKOS_PATH = ../..
-
-vpath %.cpp ${KOKKOS_PATH}/example/ichol/src ${KOKKOS_PATH}/example/ichol/example
-
-EXAMPLE_HEADERS = $(wildcard $(KOKKOS_PATH)/example/ichol/src/*.hpp ${KOKKOS_PATH}/example/ichol/example/*.hpp )
-
-default: build_all
- echo "End Build"
-
-include $(KOKKOS_PATH)/Makefile.kokkos
-
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- CXX = $(NVCC_WRAPPER)
- CXXFLAGS ?= -O3
- LINK = $(CXX)
- LDFLAGS ?= -lpthread
-else
- CXX ?= g++
- CXXFLAGS ?= -O3
- LINK ?= $(CXX)
- LDFLAGS ?= -lpthread
-endif
-
-KOKKOS_CXXFLAGS += \
- -I${KOKKOS_PATH}/example/ichol/src \
- -I${KOKKOS_PATH}/example/ichol/example \
- -I${SCOTCH_PATH}/include
-
-EXE_EXAMPLE_ICHOL_THREADS = KokkosExample_ichol_threads
-OBJ_EXAMPLE_ICHOL_THREADS = example_chol_performance_device_pthread.o
-
-EXE_EXAMPLE_ICHOL_CUDA = KokkosExample_ichol_cuda
-OBJ_EXAMPLE_ICHOL_CUDA = example_chol_performance_device_cuda.o
-
-TARGETS = $(EXE_EXAMPLE_ICHOL_THREADS) $(EXE_EXAMPLE_ICHOL_CUDA)
-
-#TEST_TARGETS =
-
-$(EXE_EXAMPLE_ICHOL_THREADS) : $(OBJ_EXAMPLE_ICHOL_THREADS) $(KOKKOS_LINK_DEPENDS)
- $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) \
- $(OBJ_EXAMPLE_ICHOL_THREADS) $(KOKKOS_LIBS) $(LIB) \
- -L${SCOTCH_PATH}/lib -lscotch -lscotcherr -lscotcherrexit \
- -o $(EXE_EXAMPLE_ICHOL_THREADS)
-
-$(EXE_EXAMPLE_ICHOL_CUDA) : $(OBJ_EXAMPLE_ICHOL_CUDA) $(KOKKOS_LINK_DEPENDS)
- $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) \
- $(OBJ_EXAMPLE_ICHOL_CUDA) $(KOKKOS_LIBS) $(LIB) \
- -L${SCOTCH_PATH}/lib -lscotch -lscotcherr -lscotcherrexit \
- -o $(EXE_EXAMPLE_ICHOL_CUDA)
-
-build_all : $(TARGETS)
-
-test : build_all
-
-clean: kokkos-clean
- rm -f *.o $(TARGETS)
-
-# Compilation rules
-
-%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(EXAMPLE_HEADERS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
-
diff --git a/lib/kokkos/example/ichol/example/example_chol_performance_device.hpp b/lib/kokkos/example/ichol/example/example_chol_performance_device.hpp
deleted file mode 100644
index ca819e4f9..000000000
--- a/lib/kokkos/example/ichol/example/example_chol_performance_device.hpp
+++ /dev/null
@@ -1,240 +0,0 @@
-#pragma once
-#ifndef __EXAMPLE_CHOL_PERFORMANCE_DEVICE_HPP__
-#define __EXAMPLE_CHOL_PERFORMANCE_DEVICE_HPP__
-
-#include <Kokkos_Core.hpp>
-#include <impl/Kokkos_Timer.hpp>
-
-#include "util.hpp"
-
-#include "crs_matrix_base.hpp"
-#include "crs_matrix_view.hpp"
-#include "crs_row_view.hpp"
-
-#include "graph_helper_scotch.hpp"
-#include "symbolic_factor_helper.hpp"
-#include "crs_matrix_helper.hpp"
-
-#include "task_view.hpp"
-
-#include "task_factory.hpp"
-
-#include "chol.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename ValueType,
- typename OrdinalType,
- typename SizeType = OrdinalType,
- typename SpaceType = void>
- int exampleCholPerformanceDevice(const string file_input,
- const int treecut,
- const int prunecut,
- const int seed,
- const int nthreads,
- const int max_task_dependence,
- const int max_concurrency,
- const int team_size,
- const int fill_level,
- const int league_size,
- const bool skip_serial,
- const bool verbose) {
- typedef ValueType value_type;
- typedef OrdinalType ordinal_type;
- typedef SizeType size_type;
- typedef typename
- Kokkos::Impl::is_space< SpaceType >::host_mirror_space::execution_space
- HostSpaceType ;
-
- typedef TaskFactory<Kokkos::Experimental::TaskPolicy<SpaceType>,
- Kokkos::Experimental::Future<int,SpaceType> > TaskFactoryType;
-
- typedef CrsMatrixBase<value_type,ordinal_type,size_type,SpaceType>
- CrsMatrixBaseType;
-
- typedef CrsMatrixBase<value_type,ordinal_type,size_type,HostSpaceType>
- CrsMatrixBaseHostType;
-
- typedef Kokkos::MemoryUnmanaged MemoryUnmanaged ;
-
- typedef CrsMatrixBase<value_type,ordinal_type,size_type,SpaceType,MemoryUnmanaged >
- CrsMatrixNestedType;
-
-
- typedef GraphHelper_Scotch<CrsMatrixBaseHostType> GraphHelperType;
- typedef SymbolicFactorHelper<CrsMatrixBaseHostType> SymbolicFactorHelperType;
-
- typedef CrsMatrixView<CrsMatrixNestedType> CrsMatrixViewType;
- typedef TaskView<CrsMatrixViewType,TaskFactoryType> CrsTaskViewType;
-
- typedef CrsMatrixBase<CrsTaskViewType,ordinal_type,size_type,SpaceType> CrsHierMatrixBaseType;
-
- typedef CrsMatrixView<CrsHierMatrixBaseType> CrsHierMatrixViewType;
- typedef TaskView<CrsHierMatrixViewType,TaskFactoryType> CrsHierTaskViewType;
-
- int r_val = 0;
-
- Kokkos::Timer timer;
- double
- t_import = 0.0,
- t_reorder = 0.0,
- t_symbolic = 0.0,
- t_flat2hier = 0.0,
- t_factor_task = 0.0;
-
- cout << "CholPerformanceDevice:: import input file = " << file_input << endl;
- CrsMatrixBaseHostType AA("AA");
- {
- timer.reset();
-
- ifstream in;
- in.open(file_input);
- if (!in.good()) {
-      cout << "Failed to open the file: " << file_input << endl;
- return ++r_val;
- }
- AA.importMatrixMarket(in);
-
- t_import = timer.seconds();
-
- if (verbose) {
- AA.showMe( std::cout );
- std::cout << endl;
- }
- }
- cout << "CholPerformanceDevice:: import input file::time = " << t_import << endl;
-
- cout << "CholPerformanceDevice:: reorder the matrix" << endl;
- CrsMatrixBaseHostType PA("Permuted AA");
-
- // '*_UU' is the permuted base upper triangular matrix
- CrsMatrixBaseHostType host_UU("host_UU");
- CrsMatrixBaseType device_UU("UU");
- CrsHierMatrixBaseType device_HU("HU");;
-
- // typename CrsMatrixBaseHostType host_UU("host_UU");
-
- {
- typename GraphHelperType::size_type_array rptr("Graph::RowPtrArray", AA.NumRows() + 1);
- typename GraphHelperType::ordinal_type_array cidx("Graph::ColIndexArray", AA.NumNonZeros());
-
- AA.convertGraph(rptr, cidx);
- GraphHelperType S("ScotchHelper",
- AA.NumRows(),
- rptr,
- cidx,
- seed);
- {
- timer.reset();
-
- S.computeOrdering(treecut, 0);
- S.pruneTree(prunecut);
-
- PA.copy(S.PermVector(), S.InvPermVector(), AA);
-
- t_reorder = timer.seconds();
-
- if (verbose) {
- S.showMe( std::cout );
- std::cout << std::endl ;
- PA.showMe( std::cout );
- std::cout << std::endl ;
- }
- }
-
- // Symbolic factorization adds non-zero entries
- // for factorization levels.
- // Runs on the host process and currently requires std::sort.
-
- cout << "CholPerformanceDevice:: reorder the matrix::time = " << t_reorder << endl;
- {
- SymbolicFactorHelperType F(PA, league_size);
- timer.reset();
- F.createNonZeroPattern(fill_level, Uplo::Upper, host_UU);
- t_symbolic = timer.seconds();
- cout << "CholPerformanceDevice:: AA (nnz) = " << AA.NumNonZeros() << ", host_UU (nnz) = " << host_UU.NumNonZeros() << endl;
-
- if (verbose) {
- F.showMe( std::cout );
- std::cout << std::endl ;
- host_UU.showMe( std::cout );
- std::cout << std::endl ;
- }
- }
- cout << "CholPerformanceDevice:: symbolic factorization::time = " << t_symbolic << endl;
-
- //----------------------------------------------------------------------
- // Allocate device_UU conformal to host_UU
- // and deep_copy host_UU arrays to device_UU arrays.
- // Set up device_HU referencing blocks of device_UU
-
- {
- timer.reset();
-
- device_UU.copy( host_UU );
-
- CrsMatrixHelper::flat2hier(Uplo::Upper, device_UU, device_HU,
- S.NumBlocks(),
- S.RangeVector(),
- S.TreeVector());
-
-      // Filling non-zero block matrices' row ranges within the block view.
- // This is performed entirely in the 'device_HU' space.
-
- CrsMatrixHelper::fillRowViewArray( device_HU );
-
- t_flat2hier = timer.seconds();
-
- cout << "CholPerformanceDevice:: Hier (dof, nnz) = " << device_HU.NumRows() << ", " << device_HU.NumNonZeros() << endl;
- }
- cout << "CholPerformanceDevice:: copy base matrix and construct hierarchical matrix::time = " << t_flat2hier << endl;
- }
-
- cout << "CholPerformanceDevice:: max concurrency = " << max_concurrency << endl;
-
- const size_t max_task_size = 4*sizeof(CrsTaskViewType)+128;
- cout << "CholPerformanceDevice:: max task size = " << max_task_size << endl;
-
- //----------------------------------------------------------------------
- // From here onward all work is on the device.
- //----------------------------------------------------------------------
-
- {
- typename TaskFactoryType::policy_type policy(max_concurrency,
- max_task_size,
- max_task_dependence,
- team_size);
-
- cout << "CholPerformanceDevice:: ByBlocks factorize the matrix:: team_size = " << team_size << endl;
- CrsHierTaskViewType H( device_HU );
- {
- timer.reset();
- {
- // auto future = policy.proc_create_team(Chol<Uplo::Upper,AlgoChol::ByBlocks>::
- auto future = policy.proc_create_team(Chol<Uplo::Upper,AlgoChol::ByBlocks,Variant::Two>::
- TaskFunctor<CrsHierTaskViewType>(policy,H), 0);
- policy.spawn(future);
- Kokkos::Experimental::wait(policy);
- }
- t_factor_task += timer.seconds();
-
- cout << "CholPerformanceDevice:: policy.allocated_task_count = "
- << policy.allocated_task_count()
- << endl ;
-
- if (verbose) {
- host_UU.copy( device_UU );
- host_UU.showMe( std::cout );
- std::cout << endl;
- }
- }
- cout << "CholPerformanceDevice:: ByBlocks factorize the matrix::time = " << t_factor_task << endl;
- }
-
- return r_val;
- }
-}
-
-#endif
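The deleted driver above stages its work as: import a MatrixMarket file, reorder with Scotch, run the symbolic factorization, copy and reshape the matrix onto the device (flat2hier), then run the task-parallel numeric factorization, timing each stage with Kokkos::Timer (reset() before the stage, seconds() after). Below is a minimal sketch of that per-stage timing pattern using std::chrono so it builds without Kokkos; the stage functions are hypothetical placeholders, not calls from the deleted code.

#include <chrono>
#include <cstdio>

// Hypothetical stand-ins for the pipeline stages timed by the deleted driver.
static void import_matrix()   {}
static void reorder_matrix()  {}
static void symbolic_factor() {}
static void numeric_factor()  {}

int main() {
  typedef std::chrono::steady_clock clock_type;

  // Time one stage and return its wall-clock duration in seconds.
  auto time_stage = [](void (*stage)()) {
    const clock_type::time_point t0 = clock_type::now();
    stage();
    return std::chrono::duration<double>(clock_type::now() - t0).count();
  };

  std::printf("import::time          = %g\n", time_stage(import_matrix));
  std::printf("reorder::time         = %g\n", time_stage(reorder_matrix));
  std::printf("symbolic factor::time = %g\n", time_stage(symbolic_factor));
  std::printf("numeric factor::time  = %g\n", time_stage(numeric_factor));
  return 0;
}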
diff --git a/lib/kokkos/example/ichol/example/example_chol_performance_device_cuda.cpp b/lib/kokkos/example/ichol/example/example_chol_performance_device_cuda.cpp
deleted file mode 100644
index 3a0df586b..000000000
--- a/lib/kokkos/example/ichol/example/example_chol_performance_device_cuda.cpp
+++ /dev/null
@@ -1,70 +0,0 @@
-#include <Kokkos_Core.hpp>
-
-#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
-
-using namespace std;
-
-typedef double value_type;
-typedef int ordinal_type;
-typedef int size_type;
-
-#include "example_chol_performance_device.hpp"
-
-using namespace Tacho;
-
-int main (int argc, char *argv[]) {
-
- string file_input = "test.mtx";
- int nthreads = 1;
- int max_task_dependence = 3;
- int max_concurrency = 1024;
- int team_size = 1;
- int fill_level = 0;
- int treecut = 0;
- int prunecut = 0;
- int seed = 0;
- int league_size = 1;
- bool verbose = false;
- for (int i=0;i<argc;++i) {
- if ((strcmp(argv[i],"--file-input") ==0)) { file_input = argv[++i]; continue;}
- if ((strcmp(argv[i],"--nthreads") ==0)) { nthreads = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--max-task-dependence")==0)) { max_task_dependence = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--max-concurrency") ==0)) { max_concurrency = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--team-size") ==0)) { team_size = atoi(argv[++i]); continue;}
-
- if ((strcmp(argv[i],"--fill-level") ==0)) { fill_level = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--league-size") ==0)) { league_size = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--treecut") ==0)) { treecut = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--prunecut") ==0)) { prunecut = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--seed") ==0)) { seed = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--enable-verbose") ==0)) { verbose = true; continue;}
- }
-
- int r_val = 0;
- {
- typedef Kokkos::Cuda exec_space;
-
- Kokkos::DefaultHostExecutionSpace::initialize(nthreads);
-
- exec_space::initialize();
- exec_space::print_configuration(cout, true);
-
- r_val = exampleCholPerformanceDevice
- <value_type,ordinal_type,size_type,exec_space>
- (file_input,
- treecut,
- prunecut,
- seed,
- nthreads,
- max_task_dependence, max_concurrency, team_size,
- fill_level, league_size,
- (nthreads != 1), // skip_serial
- verbose);
-
- exec_space::finalize();
-
- Kokkos::DefaultHostExecutionSpace::finalize();
- }
-
- return r_val;
-}
diff --git a/lib/kokkos/example/ichol/example/example_chol_performance_device_pthread.cpp b/lib/kokkos/example/ichol/example/example_chol_performance_device_pthread.cpp
deleted file mode 100644
index 68f520cf6..000000000
--- a/lib/kokkos/example/ichol/example/example_chol_performance_device_pthread.cpp
+++ /dev/null
@@ -1,67 +0,0 @@
-#include <Kokkos_Core.hpp>
-
-#include <Kokkos_Threads.hpp>
-#include <Threads/Kokkos_Threads_TaskPolicy.hpp>
-
-using namespace std;
-
-typedef double value_type;
-typedef int ordinal_type;
-typedef int size_type;
-
-typedef Kokkos::Threads exec_space;
-
-#include "example_chol_performance_device.hpp"
-
-using namespace Tacho;
-
-int main (int argc, char *argv[]) {
-
- string file_input = "test.mtx";
- int nthreads = 1;
- int max_task_dependence = 3;
- int max_concurrency = 1024;
- int team_size = 1;
- int fill_level = 0;
- int treecut = 0;
- int prunecut = 0;
- int seed = 0;
- int league_size = 1;
- bool verbose = false;
- for (int i=0;i<argc;++i) {
- if ((strcmp(argv[i],"--file-input") ==0)) { file_input = argv[++i]; continue;}
- if ((strcmp(argv[i],"--nthreads") ==0)) { nthreads = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--max-task-dependence")==0)) { max_task_dependence = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--max-concurrency") ==0)) { max_concurrency = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--team-size") ==0)) { team_size = atoi(argv[++i]); continue;}
-
- if ((strcmp(argv[i],"--fill-level") ==0)) { fill_level = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--league-size") ==0)) { league_size = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--treecut") ==0)) { treecut = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--prunecut") ==0)) { prunecut = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--seed") ==0)) { seed = atoi(argv[++i]); continue;}
- if ((strcmp(argv[i],"--enable-verbose") ==0)) { verbose = true; continue;}
- }
-
- int r_val = 0;
- {
- exec_space::initialize(nthreads);
- exec_space::print_configuration(cout, true);
-
- r_val = exampleCholPerformanceDevice
- <value_type,ordinal_type,size_type,exec_space>
- (file_input,
- treecut,
- prunecut,
- seed,
- nthreads,
- max_task_dependence, max_concurrency, team_size,
- fill_level, league_size,
- (nthreads != 1), // skip_serial
- verbose);
-
- exec_space::finalize();
- }
-
- return r_val;
-}
diff --git a/lib/kokkos/example/ichol/src/chol.hpp b/lib/kokkos/example/ichol/src/chol.hpp
deleted file mode 100644
index e8aa4e918..000000000
--- a/lib/kokkos/example/ichol/src/chol.hpp
+++ /dev/null
@@ -1,92 +0,0 @@
-#pragma once
-#ifndef __CHOL_HPP__
-#define __CHOL_HPP__
-
-/// \file chol.hpp
-/// \brief Incomplete Cholesky factorization front interface.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "control.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- // tasking interface
- // * default behavior is for non-by-blocks tasks
- // * control is only used for by-blocks algorithms
- // ===============================================
- template<int ArgUplo, int ArgAlgo,
- int ArgVariant = Variant::One,
- template<int,int> class ControlType = Control>
- class Chol {
- public:
-
- // function interface
- // ==================
- template<typename ExecViewType>
- KOKKOS_INLINE_FUNCTION
- static int invoke(typename ExecViewType::policy_type &policy,
- const typename ExecViewType::policy_type::member_type &member,
- typename ExecViewType::matrix_type &A);
-
- // task-data parallel interface
- // ============================
- template<typename ExecViewType>
- class TaskFunctor {
- public:
- typedef typename ExecViewType::policy_type policy_type;
- typedef typename policy_type::member_type member_type;
- typedef int value_type;
-
- private:
- typename ExecViewType::matrix_type _A;
-
- policy_type _policy;
-
- public:
- KOKKOS_INLINE_FUNCTION
- TaskFunctor(const policy_type & P ,
- const typename ExecViewType::matrix_type & A)
- : _A(A),
- _policy(P)
- { }
-
- string Label() const { return "Chol"; }
-
- // task execution
- KOKKOS_INLINE_FUNCTION
- void apply(value_type &r_val) {
- r_val = Chol::invoke<ExecViewType>(_policy, _policy.member_single(), _A);
- }
-
- // task-data execution
- KOKKOS_INLINE_FUNCTION
- void apply(const member_type &member, value_type &r_val) {
-
- const int result = Chol::invoke<ExecViewType>(_policy, member, _A);
-
- if ( 0 == member.team_rank() ) { r_val = result ; }
-
- }
-
- };
-
- };
-}
-
-
-// unblocked version blas operations
-#include "scale.hpp"
-
-// blocked version blas operations
-#include "gemm.hpp"
-#include "trsm.hpp"
-#include "herk.hpp"
-
-// cholesky
-#include "chol_u.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/chol_u.hpp b/lib/kokkos/example/ichol/src/chol_u.hpp
deleted file mode 100644
index 0465ef8f3..000000000
--- a/lib/kokkos/example/ichol/src/chol_u.hpp
+++ /dev/null
@@ -1,23 +0,0 @@
-#pragma once
-#ifndef __CHOL_U_HPP__
-#define __CHOL_U_HPP__
-
-/// \file chol_u.hpp
-/// \brief Upper Cholesky factorization variations
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-// testing task-data parallelism
-// #include "chol_u_unblocked_dummy.hpp"
-
-// flame style implementation
-//#include "chol_unblocked.hpp"
-//#include "chol_u_blocked.hpp"
-
-// triple for loop
-#include "chol_u_unblocked_opt1.hpp"
-#include "chol_u_unblocked_opt2.hpp"
-
-// partitioned block algorithms: see control.hpp
-#include "chol_u_right_look_by_blocks.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/chol_u_right_look_by_blocks.hpp b/lib/kokkos/example/ichol/src/chol_u_right_look_by_blocks.hpp
deleted file mode 100644
index e21bafa9f..000000000
--- a/lib/kokkos/example/ichol/src/chol_u_right_look_by_blocks.hpp
+++ /dev/null
@@ -1,394 +0,0 @@
-#pragma once
-#ifndef __CHOL_U_RIGHT_LOOK_BY_BLOCKS_HPP__
-#define __CHOL_U_RIGHT_LOOK_BY_BLOCKS_HPP__
-
-/// \file chol_u_right_look_by_blocks.hpp
-/// \brief Cholesky factorization by-blocks
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-/// The Partitioned-Block Matrix (PBM) is sparse and a block itself is a view of a sparse matrix.
-/// The algorithm generates tasks with a given sparse block matrix structure.
-
-// basic utils
-#include "util.hpp"
-#include "control.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template< typename CrsTaskViewType >
- KOKKOS_INLINE_FUNCTION
- int releaseFutures( typename CrsTaskViewType::matrix_type & A )
- {
- typedef typename CrsTaskViewType::ordinal_type ordinal_type;
- typedef typename CrsTaskViewType::row_view_type row_view_type;
- typedef typename CrsTaskViewType::future_type future_type;
-
- row_view_type a(A,0);
-
- const ordinal_type nnz = a.NumNonZeros();
-
- for (ordinal_type j=0;j<nnz;++j) {
- a.Value(j).setFuture( future_type() );
- }
-
- return nnz ;
- }
-
- // ========================================
- // detailed workflow of by-blocks algorithm
- // ========================================
- template<int ArgVariant,
- template<int,int> class ControlType,
- typename CrsTaskViewType>
- class CholUpperRightLookByBlocks {
- public:
- KOKKOS_INLINE_FUNCTION
- static int genScalarTask(typename CrsTaskViewType::policy_type &policy,
- typename CrsTaskViewType::matrix_type &A) {
- typedef typename CrsTaskViewType::value_type value_type;
- typedef typename CrsTaskViewType::row_view_type row_view_type;
-
- typedef typename CrsTaskViewType::future_type future_type;
- typedef typename CrsTaskViewType::task_factory_type task_factory_type;
-
- row_view_type a(A, 0);
- value_type &aa = a.Value(0);
-
- // construct a task
- future_type f = task_factory_type::create(policy,
- typename Chol<Uplo::Upper,
- CtrlDetail(ControlType,AlgoChol::ByBlocks,ArgVariant,Chol)>
- ::template TaskFunctor<value_type>(policy,aa));
-
-
-if ( false ) {
- printf("Chol [%d +%d)x[%d +%d) spawn depend %d\n"
- , aa.OffsetRows()
- , aa.NumRows()
- , aa.OffsetCols()
- , aa.NumCols()
- , int( ! aa.Future().is_null() )
- );
-}
-
- // manage dependence
- task_factory_type::addDependence(policy, f, aa.Future());
- aa.setFuture(f);
-
- // spawn a task
- task_factory_type::spawn(policy, f, true /* high priority */ );
-
- return 1;
- }
-
- KOKKOS_INLINE_FUNCTION
- static int genTrsmTasks(typename CrsTaskViewType::policy_type &policy,
- typename CrsTaskViewType::matrix_type &A,
- typename CrsTaskViewType::matrix_type &B) {
- typedef typename CrsTaskViewType::ordinal_type ordinal_type;
- typedef typename CrsTaskViewType::row_view_type row_view_type;
- typedef typename CrsTaskViewType::value_type value_type;
-
- typedef typename CrsTaskViewType::future_type future_type;
- typedef typename CrsTaskViewType::task_factory_type task_factory_type;
-
- row_view_type a(A,0), b(B,0);
- value_type &aa = a.Value(0);
-
-if ( false ) {
- printf("genTrsmTasks after aa.Future().reference_count = %d\n"
- , aa.Future().reference_count());
-}
- const ordinal_type nnz = b.NumNonZeros();
- for (ordinal_type j=0;j<nnz;++j) {
- typedef typename
- Trsm< Side::Left,Uplo::Upper,Trans::ConjTranspose,
- CtrlDetail(ControlType,AlgoChol::ByBlocks,ArgVariant,Trsm)>
- ::template TaskFunctor<double,value_type,value_type>
- FunctorType ;
-
- value_type &bb = b.Value(j);
-
- future_type f = task_factory_type
- ::create(policy, FunctorType(policy,Diag::NonUnit, 1.0, aa, bb));
-
-if ( false ) {
- printf("Trsm [%d +%d)x[%d +%d) spawn depend %d %d\n"
- , bb.OffsetRows()
- , bb.NumRows()
- , bb.OffsetCols()
- , bb.NumCols()
- , int( ! aa.Future().is_null() )
- , int( ! bb.Future().is_null() )
- );
-}
-
- // trsm dependence
- task_factory_type::addDependence(policy, f, aa.Future());
-
- // self
- task_factory_type::addDependence(policy, f, bb.Future());
-
- // place task signature on b
- bb.setFuture(f);
-
- // spawn a task
- task_factory_type::spawn(policy, f, true /* high priority */);
- }
-
-if ( false ) {
- printf("genTrsmTasks after aa.Future().reference_count = %d\n"
- , aa.Future().reference_count());
-}
-
- return nnz ;
- }
-
- KOKKOS_INLINE_FUNCTION
- static int genHerkTasks(typename CrsTaskViewType::policy_type &policy,
- typename CrsTaskViewType::matrix_type &A,
- typename CrsTaskViewType::matrix_type &C) {
- typedef typename CrsTaskViewType::ordinal_type ordinal_type;
- typedef typename CrsTaskViewType::value_type value_type;
- typedef typename CrsTaskViewType::row_view_type row_view_type;
-
- typedef typename CrsTaskViewType::future_type future_type;
- typedef typename CrsTaskViewType::task_factory_type task_factory_type;
-
- // case that X.transpose, A.no_transpose, Y.no_transpose
-
- row_view_type a(A,0), c;
-
- const ordinal_type nnz = a.NumNonZeros();
- ordinal_type herk_count = 0 ;
- ordinal_type gemm_count = 0 ;
-
- // update herk
- for (ordinal_type i=0;i<nnz;++i) {
- const ordinal_type row_at_i = a.Col(i);
- value_type &aa = a.Value(i);
-
- c.setView(C, row_at_i);
-
- ordinal_type idx = 0;
- for (ordinal_type j=i;j<nnz && (idx > -2);++j) {
- const ordinal_type col_at_j = a.Col(j);
- value_type &bb = a.Value(j);
-
- if (row_at_i == col_at_j) {
- idx = c.Index(row_at_i, idx);
- if (idx >= 0) {
- ++herk_count ;
- value_type &cc = c.Value(idx);
- future_type f = task_factory_type
- ::create(policy,
- typename Herk<Uplo::Upper,Trans::ConjTranspose,
- CtrlDetail(ControlType,AlgoChol::ByBlocks,ArgVariant,Herk)>
- ::template TaskFunctor<double,value_type,value_type>(policy,-1.0, aa, 1.0, cc));
-
-
-if ( false ) {
- printf("Herk [%d +%d)x[%d +%d) spawn %d %d\n"
- , cc.OffsetRows()
- , cc.NumRows()
- , cc.OffsetCols()
- , cc.NumCols()
- , int( ! aa.Future().is_null() )
- , int( ! cc.Future().is_null() )
- );
-}
-
- // dependence
- task_factory_type::addDependence(policy, f, aa.Future());
-
- // self
- task_factory_type::addDependence(policy, f, cc.Future());
-
- // place task signature on y
- cc.setFuture(f);
-
- // spawn a task
- task_factory_type::spawn(policy, f);
- }
- } else {
- idx = c.Index(col_at_j, idx);
- if (idx >= 0) {
- ++gemm_count ;
- value_type &cc = c.Value(idx);
- future_type f = task_factory_type
- ::create(policy,
- typename Gemm<Trans::ConjTranspose,Trans::NoTranspose,
- CtrlDetail(ControlType,AlgoChol::ByBlocks,ArgVariant,Gemm)>
- ::template TaskFunctor<double,value_type,value_type,value_type>(policy,-1.0, aa, bb, 1.0, cc));
-
-
-if ( false ) {
- printf("Gemm [%d +%d)x[%d +%d) spawn %d %d %d\n"
- , cc.OffsetRows()
- , cc.NumRows()
- , cc.OffsetCols()
- , cc.NumCols()
- , int( ! aa.Future().is_null() )
- , int( ! bb.Future().is_null() )
- , int( ! cc.Future().is_null() )
- );
-}
-
- // dependence
- task_factory_type::addDependence(policy, f, aa.Future());
- task_factory_type::addDependence(policy, f, bb.Future());
-
- // self
- task_factory_type::addDependence(policy, f, cc.Future());
-
- // place task signature on y
- cc.setFuture(f);
-
- // spawn a task
- task_factory_type::spawn(policy, f);
- }
- }
- }
- }
-
-if ( false ) {
-printf("genHerkTask Herk(%ld) Gemm(%ld)\n",(long)herk_count,(long)gemm_count);
-}
-
- return herk_count + gemm_count ;
- }
-
- };
-
- // specialization for different task generation in right looking by-blocks algorithm
- // =================================================================================
- template<int ArgVariant, template<int,int> class ControlType>
- class Chol<Uplo::Upper,AlgoChol::RightLookByBlocks,ArgVariant,ControlType> {
- public:
-
- // function interface
- // ==================
- template<typename ExecViewType>
- KOKKOS_INLINE_FUNCTION
- static int invoke(typename ExecViewType::policy_type &policy,
- const typename ExecViewType::policy_type::member_type &member,
- typename ExecViewType::matrix_type & A,
- int checkpoint )
- {
- typedef typename ExecViewType::row_view_type row_view_type ;
-
- enum { CYCLE = 2 };
-
- typename ExecViewType::matrix_type
- ATL, ATR, A00, A01, A02,
- ABL, ABR, A10, A11, A12,
- A20, A21, A22;
-
- Part_2x2(A, ATL, ATR,
- /**/ABL, ABR,
- checkpoint, checkpoint, Partition::TopLeft);
-
- int tasks_spawned = 0 ;
- int futures_released = 0 ;
-
- for ( int i = 0 ; i < CYCLE && ATL.NumRows() < A.NumRows() ; ++i ) {
- Part_2x2_to_3x3(ATL, ATR, /**/ A00, A01, A02,
- /*******/ /**/ A10, A11, A12,
- ABL, ABR, /**/ A20, A21, A22,
- 1, 1, Partition::BottomRight);
- // -----------------------------------------------------
- // Spawning tasks:
-
- // A11 = chol(A11) : #task = 1
- tasks_spawned +=
- CholUpperRightLookByBlocks<ArgVariant,ControlType,ExecViewType>
- ::genScalarTask(policy, A11);
-
- // A12 = inv(triu(A11)') * A12 : #tasks = non-zero row blocks
- tasks_spawned +=
- CholUpperRightLookByBlocks<ArgVariant,ControlType,ExecViewType>
- ::genTrsmTasks(policy, A11, A12);
-
- // A22 = A22 - A12' * A12 : #tasks = highly variable
- tasks_spawned +=
- CholUpperRightLookByBlocks<ArgVariant,ControlType,ExecViewType>
- ::genHerkTasks(policy, A12, A22);
-
- // -----------------------------------------------------
- // Can release futures of A11 and A12
-
- futures_released += releaseFutures<ExecViewType>( A11 );
- futures_released += releaseFutures<ExecViewType>( A12 );
-
-if ( false ) {
- printf("Chol iteration(%d) task_count(%d) cumulative: spawn(%d) release(%d)\n"
- , int(ATL.NumRows())
- , policy.allocated_task_count()
- , tasks_spawned , futures_released
- );
-}
-
- // -----------------------------------------------------
- Merge_3x3_to_2x2(A00, A01, A02, /**/ ATL, ATR,
- A10, A11, A12, /**/ /******/
- A20, A21, A22, /**/ ABL, ABR,
- Partition::TopLeft);
-
- }
-
- return ATL.NumRows();
- }
-
- // task-data parallel interface
- // ============================
- template<typename ExecViewType>
- class TaskFunctor {
- public:
- typedef typename ExecViewType::policy_type policy_type;
- typedef typename ExecViewType::future_type future_type;
- typedef typename policy_type::member_type member_type;
- typedef int value_type;
-
- private:
- typename ExecViewType::matrix_type _A;
-
- policy_type _policy;
- int _checkpoint ;
-
- public:
- KOKKOS_INLINE_FUNCTION
- TaskFunctor(const policy_type & P ,
- const typename ExecViewType::matrix_type & A)
- : _A(A),
- _policy(P),
- _checkpoint(0)
- { }
-
- string Label() const { return "Chol"; }
-
- // task-data execution
- KOKKOS_INLINE_FUNCTION
- void apply(const member_type &member, value_type &r_val)
- {
- if (member.team_rank() == 0) {
- // Clear out previous dependence
- _policy.clear_dependence( this );
-
- _checkpoint = Chol::invoke<ExecViewType>(_policy, member, _A, _checkpoint);
-
- if ( _checkpoint < _A.NumRows() ) _policy.respawn_needing_memory(this);
-
- r_val = 0 ;
- }
- return ;
- }
-
- };
-
- };
-}
-
-#endif
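Each pass of the right-looking by-blocks factorization deleted above spawns three kinds of tasks: a Chol task on the diagonal block (A11 = chol(A11)), Trsm tasks on the blocks to its right (A12 = inv(triu(A11)') * A12), and Herk/Gemm tasks on the trailing submatrix (A22 = A22 - A12' * A12), with futures recording the dependences between them. The dense, unblocked sketch below shows the same right-looking ordering of updates without any tasking; the 3x3 test matrix is illustrative only and not part of the patch.

#include <cmath>
#include <cstdio>

int main() {
  const int N = 3;
  double A[3][3] = { { 4, 2, 2 },
                     { 2, 5, 3 },
                     { 2, 3, 6 } };   // symmetric positive definite; only the upper triangle is used

  for (int k = 0; k < N; ++k) {
    A[k][k] = std::sqrt(A[k][k]);                 // "A11 = chol(A11)"
    for (int j = k + 1; j < N; ++j)
      A[k][j] /= A[k][k];                         // "A12 = inv(triu(A11)') * A12"
    for (int i = k + 1; i < N; ++i)
      for (int j = i; j < N; ++j)
        A[i][j] -= A[k][i] * A[k][j];             // "A22 = A22 - A12' * A12"
  }

  // Print the upper-triangular factor U, where A = U' * U.
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j)
      std::printf("%8.4f", j >= i ? A[i][j] : 0.0);
    std::printf("\n");
  }
  return 0;
}

For this input the factor is U = [[2,1,1],[0,2,1],[0,0,2]], and U' * U reproduces A.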
diff --git a/lib/kokkos/example/ichol/src/chol_u_unblocked_opt1.hpp b/lib/kokkos/example/ichol/src/chol_u_unblocked_opt1.hpp
deleted file mode 100644
index 3bb99c714..000000000
--- a/lib/kokkos/example/ichol/src/chol_u_unblocked_opt1.hpp
+++ /dev/null
@@ -1,90 +0,0 @@
-#pragma once
-#ifndef __CHOL_U_UNBLOCKED_OPT1_HPP__
-#define __CHOL_U_UNBLOCKED_OPT1_HPP__
-
-/// \file chol_u_unblocked_opt1.hpp
-/// \brief Unblocked incomplete Cholesky factorization.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<>
- template<typename CrsExecViewType>
- KOKKOS_INLINE_FUNCTION
- int
- Chol<Uplo::Upper,AlgoChol::UnblockedOpt,Variant::One>
- ::invoke(typename CrsExecViewType::policy_type &policy,
- const typename CrsExecViewType::policy_type::member_type &member,
- typename CrsExecViewType::matrix_type &A) {
-
- typedef typename CrsExecViewType::value_type value_type;
- typedef typename CrsExecViewType::ordinal_type ordinal_type;
- typedef typename CrsExecViewType::row_view_type row_view_type;
-
- // row_view_type r1t, r2t;
-
- for (ordinal_type k=0;k<A.NumRows();++k) {
- //r1t.setView(A, k);
- row_view_type &r1t = A.RowView(k);
-
- // extract diagonal from alpha11
- value_type &alpha = r1t.Value(0);
-
- if (member.team_rank() == 0) {
- // if encounter null diag or wrong index, return -(row + 1)
- if (abs(alpha) == 0.0 || r1t.Col(0) != k)
- return -(k + 1);
-
- // error handling should be more carefully designed
-
- // sqrt on diag
- // alpha = sqrt(real(alpha));
- alpha = sqrt(alpha);
- }
- member.team_barrier();
-
- const ordinal_type nnz_r1t = r1t.NumNonZeros();
-
- if (nnz_r1t) {
- // inverse scale
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 1, nnz_r1t),
- [&](const ordinal_type j) {
- r1t.Value(j) /= alpha;
- });
-
- member.team_barrier();
-
- // hermitian rank update
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 1, nnz_r1t),
- [&](const ordinal_type i) {
- const ordinal_type row_at_i = r1t.Col(i);
- // const value_type val_at_i = conj(r1t.Value(i));
- const value_type val_at_i = r1t.Value(i);
-
- //r2t.setView(A, row_at_i);
- row_view_type &r2t = A.RowView(row_at_i);
- ordinal_type idx = 0;
-
- for (ordinal_type j=i;j<nnz_r1t && (idx > -2);++j) {
- const ordinal_type col_at_j = r1t.Col(j);
- idx = r2t.Index(col_at_j, idx);
-
- if (idx >= 0) {
- const value_type val_at_j = r1t.Value(j);
- r2t.Value(idx) -= val_at_i*val_at_j;
- }
- }
- });
- }
- }
- return 0;
- }
-
-}
-
-#endif
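The kernel deleted above nests data parallelism inside each task: thread 0 of the team takes the square root of the diagonal entry, then Kokkos::TeamThreadRange spreads the inverse scaling and the rank update of the current row across the team. Below is a minimal sketch of that TeamThreadRange pattern, assuming a working Kokkos build; the dense View A, its dimensions, and the printed entry are illustrative only and not taken from the deleted code.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int nrow = 4, ncol = 8;
    Kokkos::View<double**> A("A", nrow, ncol);
    Kokkos::deep_copy(A, 2.0);

    typedef Kokkos::TeamPolicy<>::member_type member_type;

    // One team per row: the team cooperatively scales columns 1..ncol-1 by the
    // "diagonal" entry A(i,0), mirroring the inverse-scale step of the kernel.
    Kokkos::parallel_for(Kokkos::TeamPolicy<>(nrow, Kokkos::AUTO),
      KOKKOS_LAMBDA(const member_type& member) {
        const int i = member.league_rank();
        const double alpha = A(i, 0);
        Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 1, ncol),
          [&](const int j) { A(i, j) /= alpha; });
      });

    // Check one entry on the host: 2.0 / 2.0 = 1.0.
    Kokkos::View<double**>::HostMirror h = Kokkos::create_mirror_view(A);
    Kokkos::deep_copy(h, A);
    std::printf("A(0,1) = %g\n", h(0, 1));
  }
  Kokkos::finalize();
  return 0;
}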
diff --git a/lib/kokkos/example/ichol/src/chol_u_unblocked_opt2.hpp b/lib/kokkos/example/ichol/src/chol_u_unblocked_opt2.hpp
deleted file mode 100644
index e7d1dc826..000000000
--- a/lib/kokkos/example/ichol/src/chol_u_unblocked_opt2.hpp
+++ /dev/null
@@ -1,154 +0,0 @@
-#pragma once
-#ifndef __CHOL_U_UNBLOCKED_OPT2_HPP__
-#define __CHOL_U_UNBLOCKED_OPT2_HPP__
-
-/// \file chol_u_unblocked_opt2.hpp
-/// \brief Unblocked incomplete Cholesky factorization; version for data-parallel execution sharing the L1 cache.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<>
- template<typename CrsExecViewType>
- KOKKOS_INLINE_FUNCTION
- int
- Chol<Uplo::Upper,AlgoChol::UnblockedOpt,Variant::Two>
- ::invoke(typename CrsExecViewType::policy_type &policy,
- const typename CrsExecViewType::policy_type::member_type &member,
- typename CrsExecViewType::matrix_type &A) {
-
- typedef typename CrsExecViewType::value_type value_type;
- typedef typename CrsExecViewType::ordinal_type ordinal_type;
- typedef typename CrsExecViewType::row_view_type row_view_type;
-
-if ( false && member.team_rank() == 0 ) {
- printf("Chol [%d +%d)x[%d +%d) begin\n"
- , A.OffsetRows()
- , A.NumRows()
- , A.OffsetCols()
- , A.NumCols()
- );
-}
-
- // row_view_type r1t, r2t;
-
- for (ordinal_type k=0;k<A.NumRows();++k) {
- //r1t.setView(A, k);
- row_view_type &r1t = A.RowView(k);
-
- // extract diagonal from alpha11
- value_type &alpha = r1t.Value(0);
-
- if (member.team_rank() == 0) {
- // if encounter null diag or wrong index, return -(row + 1)
- if (abs(alpha) == 0.0 || r1t.Col(0) != k)
- return -(k + 1);
-
- // error handling should be more carefully designed
-
- // sqrt on diag
- // alpha = sqrt(real(alpha));
- alpha = sqrt(alpha);
- }
- member.team_barrier();
-
-
-if ( false && member.team_rank() == 0 ) {
- printf("Chol [%d +%d)x[%d +%d) local row %d\n"
- , A.OffsetRows()
- , A.NumRows()
- , A.OffsetCols()
- , A.NumCols()
- , int(k)
- );
-}
-
-
- const ordinal_type nnz_r1t = r1t.NumNonZeros();
-
- if (nnz_r1t) {
- // inverse scale
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 1, nnz_r1t),
- [&](const ordinal_type j) {
- r1t.Value(j) /= alpha;
- });
-
- member.team_barrier();
-
-
-if ( false && member.team_rank() == 0 ) {
- printf("Chol [%d +%d)x[%d +%d) local row %d nnz_r1t\n"
- , A.OffsetRows()
- , A.NumRows()
- , A.OffsetCols()
- , A.NumCols()
- , int(k)
- );
-}
-
- // hermitian rank update
- for (ordinal_type i=1;i<nnz_r1t;++i) {
- const ordinal_type row_at_i = r1t.Col(i);
- // const value_type val_at_i = conj(r1t.Value(i));
- const value_type val_at_i = r1t.Value(i);
-
- //r2t.setView(A, row_at_i);
- row_view_type &r2t = A.RowView(row_at_i);
-
- ordinal_type member_idx = 0 ;
-
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, i, nnz_r1t),
- [&](const ordinal_type j) {
- if (member_idx > -2) {
- const ordinal_type col_at_j = r1t.Col(j);
- member_idx = r2t.Index(col_at_j, member_idx);
- if (member_idx >= 0) {
- const value_type val_at_j = r1t.Value(j);
- r2t.Value(member_idx) -= val_at_i*val_at_j;
- }
- }
- });
- }
- }
-
-
-if ( false ) {
-member.team_barrier();
-if ( member.team_rank() == 0 ) {
- printf("Chol [%d +%d)x[%d +%d) local row %d end\n"
- , A.OffsetRows()
- , A.NumRows()
- , A.OffsetCols()
- , A.NumCols()
- , int(k)
- );
-}
-}
-
- }
-
-
-if ( false ) {
-member.team_barrier();
-if ( member.team_rank() == 0 ) {
- printf("Chol [%d +%d)x[%d +%d) end\n"
- , A.OffsetRows()
- , A.NumRows()
- , A.OffsetCols()
- , A.NumCols()
- );
-}
-}
-
-
- return 0;
- }
-
-}
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/control.hpp b/lib/kokkos/example/ichol/src/control.hpp
deleted file mode 100644
index bf5efef9f..000000000
--- a/lib/kokkos/example/ichol/src/control.hpp
+++ /dev/null
@@ -1,110 +0,0 @@
-#pragma once
-#ifndef __CONTROL_HPP__
-#define __CONTROL_HPP__
-
-#include "util.hpp"
-
-/// \file control.hpp
-/// \brief A collection of control trees composing high-level variants of algorithms.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-/// description is a bit wrong
-
-using namespace std;
-
-namespace Tacho {
-
- // forward declaration for control tree
- template<int ArgAlgo, int ArgVariant>
- struct Control {
- static constexpr int Self[2] = { ArgAlgo, ArgVariant };
- };
-
- // ----------------------------------------------------------------------------------
-
- // - CholByblocks Variant 1
- // * partitioned block matrix (blocks are sparse)
- template<> struct Control<AlgoChol::ByBlocks,Variant::One> {
- // chol var 1 : nested data parallel for is applied in the second inner loop
- // chol var 2 : nested data parallel for is applied in the most inner loop
- static constexpr int Chol[2] = { AlgoChol::UnblockedOpt, Variant::Two };
- static constexpr int Trsm[2] = { AlgoTrsm::ForFactorBlocked, Variant::One };
- static constexpr int Herk[2] = { AlgoHerk::ForFactorBlocked, Variant::One };
- static constexpr int Gemm[2] = { AlgoGemm::ForFactorBlocked, Variant::One };
- };
-
- // - CholByBlocks Variant 2
- // * diagonal blocks have nested dense blocks
- template<> struct Control<AlgoChol::ByBlocks,Variant::Two> {
- static constexpr int Chol[2] = { AlgoChol::UnblockedOpt, Variant::One };
- static constexpr int Trsm[2] = { AlgoTrsm::ForFactorBlocked, Variant::One };
- static constexpr int Herk[2] = { AlgoHerk::ForFactorBlocked, Variant::One };
- static constexpr int Gemm[2] = { AlgoGemm::ForFactorBlocked, Variant::One };
- };
-
- // - CholByBlocks Variant 3
- // * all blocks have nested dense blocks (full supernodal algorithm)
- // template<> struct Control<AlgoChol::ByBlocks,Variant::Three> {
- // static constexpr int Chol[2] = { AlgoChol::NestedDenseBlock, Variant::One };
- // static constexpr int Trsm[2] = { AlgoTrsm::NestedDenseBlock, Variant::One };
- // static constexpr int Herk[2] = { AlgoHerk::NestedDenseBlock, Variant::One };
- // static constexpr int Gemm[2] = { AlgoGemm::NestedDenseBlock, Variant::One };
- // };
-
- // - CholByBlocks Variant 4
- // * diagonal blocks have nested hier dense blocks (hierarchical task scheduling)
- // template<> struct Control<AlgoChol::ByBlocks,Variant::Four> {
- // static constexpr int Chol[2] = { AlgoChol::NestedDenseByBlocks, Variant::One };
- // static constexpr int Trsm[2] = { AlgoTrsm::ForFactorBlocked, Variant::One };
- // static constexpr int Herk[2] = { AlgoHerk::ForFactorBlocked, Variant::One };
- // static constexpr int Gemm[2] = { AlgoGemm::ForFactorBlocked, Variant::One };
- //};
-
- // - CholByBlocks Variant 5
- // * diagonal blocks have nested hier dense blocks (hierarchical task scheduling)
- // template<> struct Control<AlgoChol::ByBlocks,Variant::Four> {
- // static constexpr int Chol[2] = { AlgoChol::NestedDenseByBlocks, Variant::One };
- // static constexpr int Trsm[2] = { AlgoTrsm::NestedDenseByBlocks, Variant::One };
- // static constexpr int Herk[2] = { AlgoHerk::NestedDenseByBlocks, Variant::One };
- // static constexpr int Gemm[2] = { AlgoGemm::NestedDenseByBlocks, Variant::One };
- // };
-
- // ----------------------------------------------------------------------------------
-
- // - CholNestedDenseBlock
- // * branch control between sparse and dense operations
- template<> struct Control<AlgoChol::NestedDenseBlock,Variant::One> {
- static constexpr int CholSparse[2] = { AlgoChol::UnblockedOpt, Variant::One };
- static constexpr int CholDense[2] = { AlgoChol::ExternalLapack, Variant::One };
- };
-
- // - CholNestedDenseBlock
- // * branch control between sparse and dense operations
- template<> struct Control<AlgoChol::NestedDenseByBlocks,Variant::One> {
- static constexpr int CholSparse[2] = { AlgoChol::UnblockedOpt, Variant::One };
- static constexpr int CholDenseByBlocks[2] = { AlgoChol::DenseByBlocks, Variant::One };
- };
-
- // ----------------------------------------------------------------------------------
-
- // - CholDenseBlock
- // * dense matrix Cholesky-by-blocks
- template<> struct Control<AlgoChol::DenseByBlocks,Variant::One> {
- static constexpr int Chol[2] = { AlgoChol::ExternalLapack, Variant::One };
- static constexpr int Trsm[2] = { AlgoTrsm::ExternalBlas, Variant::One };
- static constexpr int Herk[2] = { AlgoHerk::ExternalBlas, Variant::One };
- static constexpr int Gemm[2] = { AlgoGemm::ExternalBlas, Variant::One };
- };
-
- template<> struct Control<AlgoGemm::DenseByBlocks,Variant::One> {
- static constexpr int Gemm[2] = { AlgoGemm::ExternalBlas, Variant::One };
- };
-
- template<> struct Control<AlgoTrsm::DenseByBlocks,Variant::One> {
- static constexpr int Gemm[2] = { AlgoGemm::ExternalBlas, Variant::One };
- static constexpr int Trsm[2] = { AlgoTrsm::ExternalBlas, Variant::One };
- };
-
-}
-
-#endif
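The Control templates deleted above steer the by-blocks code at compile time: each specialization lists which inner algorithm/variant pair a given outer algorithm should dispatch to, and the CtrlDetail macro reads those pairs back when the task functors are instantiated. Below is a stripped-down sketch of that trait-specialization pattern, assuming a C++17 compiler (so the static constexpr arrays need no out-of-class definitions); the tag values and the single Chol entry are illustrative, not the original control trees.

#include <cstdio>

// Illustrative algorithm/variant tags (values are arbitrary).
struct AlgoChol { enum : int { ByBlocks = 1, UnblockedOpt = 2 }; };
struct Variant  { enum : int { One = 1, Two = 2 }; };

// Primary template: by default an algorithm/variant pair dispatches to itself.
template<int ArgAlgo, int ArgVariant>
struct Control {
  static constexpr int Chol[2] = { ArgAlgo, ArgVariant };
};

// Specialization: the by-blocks driver hands its diagonal factorization
// to the unblocked kernel, variant two.
template<>
struct Control<AlgoChol::ByBlocks, Variant::One> {
  static constexpr int Chol[2] = { AlgoChol::UnblockedOpt, Variant::Two };
};

int main() {
  typedef Control<AlgoChol::ByBlocks, Variant::One> ctrl;
  std::printf("inner Chol: algo = %d, variant = %d\n", ctrl::Chol[0], ctrl::Chol[1]);
  return 0;
}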
diff --git a/lib/kokkos/example/ichol/src/coo.hpp b/lib/kokkos/example/ichol/src/coo.hpp
deleted file mode 100644
index 977f17e5c..000000000
--- a/lib/kokkos/example/ichol/src/coo.hpp
+++ /dev/null
@@ -1,75 +0,0 @@
-#pragma once
-#ifndef __COO_HPP__
-#define __COO_HPP__
-
-/// \file coo.hpp
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- /// \class Coo
- /// \brief Sparse coordinate format; (i, j, val).
- template<typename CrsMatType>
- class Coo {
- public:
- typedef typename CrsMatType::ordinal_type ordinal_type;
- typedef typename CrsMatType::value_type value_type;
-
- public:
- ordinal_type _i,_j;
- value_type _val;
-
- public:
- ordinal_type& Row() { return _i; }
- ordinal_type& Col() { return _j; }
- value_type& Val() { return _val; }
-
- ordinal_type Row() const { return _i; }
- ordinal_type Col() const { return _j; }
- value_type Val() const { return _val; }
-
- Coo() {}
-
- Coo(const ordinal_type i,
- const ordinal_type j,
- const value_type val)
- : _i(i),
- _j(j),
- _val(val)
- { }
-
- Coo(const Coo& b)
- : _i(b._i),
- _j(b._j),
- _val(b._val)
- { }
-
- Coo<CrsMatType>& operator=(const Coo<CrsMatType> &y) {
- this->_i = y._i;
- this->_j = y._j;
- this->_val = y._val;
-
- return *this;
- }
-
-    /// \brief Compare ordering ("less") using only indices i and j.
- bool operator<(const Coo<CrsMatType> &y) const {
- ordinal_type r_val = (this->_i - y._i);
- return (r_val == 0 ? this->_j < y._j : r_val < 0);
- }
-
-    /// \brief Compare equality using only indices i and j.
- bool operator==(const Coo<CrsMatType> &y) const {
- return (this->_i == y._i) && (this->_j == y._j);
- }
-
-    /// \brief Compare inequality using only indices i and j.
- bool operator!=(const Coo<CrsMatType> &y) const {
- return !(*this == y);
- }
- };
-
-}
-#endif
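Coo's operator< and operator== compare only the (row, column) indices, which is what lets the matrix base class sort a std::vector of coordinate entries and merge duplicates while converting to CRS (the ijv2crs routine in the next file). Below is a self-contained sketch of that conversion using a stand-in Entry struct with the same ordering convention; the 2x2 input and the duplicated (1,1) entry are illustrative only.

#include <algorithm>
#include <cstdio>
#include <vector>

// Stand-in for Coo: ordered by row then column, equal when the indices match,
// so duplicates become adjacent after sorting and their values can be summed.
struct Entry {
  int i, j;
  double val;
  bool operator<(const Entry& y)  const { return i != y.i ? i < y.i : j < y.j; }
  bool operator==(const Entry& y) const { return i == y.i && j == y.j; }
};

int main() {
  // Unsorted coordinate input for a 2x2 matrix, with a duplicate (1,1) entry.
  std::vector<Entry> coo = { {1, 1, 2.0}, {0, 0, 4.0}, {1, 0, 1.0}, {1, 1, 3.0} };
  std::sort(coo.begin(), coo.end());

  const int nrow = 2;
  std::vector<int>    row_ptr(nrow + 1, 0);   // CRS row pointers (_ap)
  std::vector<int>    col;                    // CRS column indices (_aj)
  std::vector<double> val;                    // CRS values (_ax)

  for (std::size_t k = 0; k < coo.size(); ++k) {
    if (k > 0 && coo[k] == coo[k - 1]) {
      val.back() += coo[k].val;               // merge duplicate (i, j) entries
    } else {
      col.push_back(coo[k].j);
      val.push_back(coo[k].val);
      ++row_ptr[coo[k].i + 1];                // count entries per row
    }
  }
  for (int r = 0; r < nrow; ++r)
    row_ptr[r + 1] += row_ptr[r];             // prefix sum -> row offsets

  for (int r = 0; r < nrow; ++r) {
    std::printf("row %d:", r);
    for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
      std::printf(" (col %d, %g)", col[k], val[k]);
    std::printf("\n");
  }
  return 0;
}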
diff --git a/lib/kokkos/example/ichol/src/crs_matrix_base.hpp b/lib/kokkos/example/ichol/src/crs_matrix_base.hpp
deleted file mode 100644
index ad08b8757..000000000
--- a/lib/kokkos/example/ichol/src/crs_matrix_base.hpp
+++ /dev/null
@@ -1,598 +0,0 @@
-#pragma once
-#ifndef __CRS_MATRIX_BASE_HPP__
-#define __CRS_MATRIX_BASE_HPP__
-
-/// \file crs_matrix_base.hpp
-/// \brief CRS matrix base object interfaces to user provided input matrices.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "coo.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template< typename , typename > class TaskView ;
-
- template < typename CrsMatrixType >
- struct GetCrsMatrixRowViewType {
- typedef int type ;
- };
-
-
- template < typename CrsMatrixViewType , typename TaskFactoryType >
- struct GetCrsMatrixRowViewType
- < TaskView<CrsMatrixViewType,TaskFactoryType> >
- {
- typedef typename CrsMatrixViewType::row_view_type type ;
- };
-
- /// \class CrsMatrixBase
-  /// \brief CRS matrix base object using Kokkos view and subview
- template<typename ValueType,
- typename OrdinalType,
- typename SizeType = OrdinalType,
- typename SpaceType = void,
- typename MemoryTraits = void>
- class CrsMatrixBase {
- public:
- typedef ValueType value_type;
- typedef OrdinalType ordinal_type;
- typedef SpaceType space_type;
- typedef SizeType size_type;
- typedef MemoryTraits memory_traits;
-
- // 1D view, layout does not matter; no template parameters for that
- typedef Kokkos::View<size_type*, space_type,memory_traits> size_type_array;
- typedef Kokkos::View<ordinal_type*,space_type,memory_traits> ordinal_type_array;
- typedef Kokkos::View<value_type*, space_type,memory_traits> value_type_array;
-
- typedef typename size_type_array::value_type* size_type_array_ptr;
- typedef typename ordinal_type_array::value_type* ordinal_type_array_ptr;
- typedef typename value_type_array::value_type* value_type_array_ptr;
-
- // range type
- template<typename T> using range_type = pair<T,T>;
-
- // external interface
- typedef Coo<CrsMatrixBase> ijv_type;
-
- friend class CrsMatrixHelper;
-
- private:
-
- ordinal_type _m; //!< # of rows
- ordinal_type _n; //!< # of cols
- size_type _nnz; //!< # of nonzeros
- size_type_array _ap; //!< pointers to column index and values
- ordinal_type_array _aj; //!< column index compressed format
- value_type_array _ax; //!< values
-
- public:
-
- typedef typename GetCrsMatrixRowViewType< ValueType >::type row_view_type ;
- typedef Kokkos::View<row_view_type*,space_type> row_view_type_array;
-
- row_view_type_array _all_row_views ;
-
- protected:
-
- void createInternalArrays(const ordinal_type m,
- const ordinal_type n,
- const size_type nnz) {
- _m = m;
- _n = n;
- _nnz = nnz;
-
- if (static_cast<ordinal_type>(_ap.dimension_0()) < m+1)
- _ap = size_type_array("CrsMatrixBase::RowPtrArray", m+1);
-
- if (static_cast<size_type>(_aj.dimension_0()) < nnz)
- _aj = ordinal_type_array("CrsMatrixBase::ColsArray", nnz);
-
- if (static_cast<size_type>(_ax.dimension_0()) < nnz)
- _ax = value_type_array("CrsMatrixBase::ValuesArray", nnz);
- }
-
- // Copy sparse matrix structure from coordinate format in 'mm'
-    // to CRS format in the Views _ap, _aj, _ax.
- void ijv2crs(const vector<ijv_type> &mm) {
-
- ordinal_type ii = 0;
- size_type jj = 0;
-
- ijv_type prev = mm[0];
- _ap[ii++] = 0;
- _aj[jj] = prev.Col();
- _ax[jj] = prev.Val();
- ++jj;
-
- for (typename vector<ijv_type>::const_iterator it=(mm.begin()+1);it<mm.end();++it) {
- ijv_type aij = (*it);
-
- // row index
- if (aij.Row() != prev.Row()) {
- _ap[ii++] = jj;
- }
-
- if (aij == prev) {
- --jj;
- _aj[jj] = aij.Col();
- _ax[jj] += aij.Val();
- } else {
- _aj[jj] = aij.Col();
- _ax[jj] = aij.Val();
- }
- ++jj;
-
- prev = aij;
- }
-
- // add the last index to terminate the storage
- _ap[ii++] = jj;
- _nnz = jj;
- }
-
- public:
-
- KOKKOS_INLINE_FUNCTION
- void setNumNonZeros() {
- if (_m)
- _nnz = _ap[_m];
- }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumRows() const { return _m; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumCols() const { return _n; }
-
- KOKKOS_INLINE_FUNCTION
- size_type NumNonZeros() const { return _nnz; }
-
- KOKKOS_INLINE_FUNCTION
- size_type_array_ptr RowPtr() const { return &_ap[0]; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type_array_ptr ColPtr() const { return &_aj[0]; }
-
- KOKKOS_INLINE_FUNCTION
- value_type_array_ptr ValuePtr() const { return &_ax[0];}
-
- KOKKOS_INLINE_FUNCTION
- size_type RowPtr(const ordinal_type i) const { return _ap[i]; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type_array_ptr ColsInRow(const ordinal_type i) const { return _aj.data() + _ap[i] ; }
-
- KOKKOS_INLINE_FUNCTION
- value_type_array_ptr ValuesInRow(const ordinal_type i) const { return _ax.data() + _ap[i] ; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumNonZerosInRow(const ordinal_type i) const { return (_ap[i+1] - _ap[i]); }
-
- KOKKOS_INLINE_FUNCTION
- value_type& Value(const ordinal_type k) { return _ax[k]; }
-
- KOKKOS_INLINE_FUNCTION
- value_type Value(const ordinal_type k) const { return _ax[k]; }
-
- /// \brief Default constructor.
- KOKKOS_INLINE_FUNCTION
- CrsMatrixBase()
- : _m(0),
- _n(0),
- _nnz(0),
- _ap(),
- _aj(),
- _ax()
- { }
-
- /// \brief Constructor with label
- CrsMatrixBase(const string & )
- : _m(0),
- _n(0),
- _nnz(0),
- _ap(),
- _aj(),
- _ax()
- { }
-
-    /// \brief Copy constructor (shallow copy); for a deep copy, use the copy() method
- template<typename VT,
- typename OT,
- typename ST,
- typename SpT,
- typename MT>
- CrsMatrixBase(const CrsMatrixBase<VT,OT,ST,SpT,MT> &b)
- : _m(b._m),
- _n(b._n),
- _nnz(b._nnz),
- _ap(b._ap),
- _aj(b._aj),
- _ax(b._ax)
- { }
-
- /// \brief Constructor to allocate internal data structures.
- CrsMatrixBase(const string & ,
- const ordinal_type m,
- const ordinal_type n,
- const ordinal_type nnz)
- : _m(m),
- _n(n),
- _nnz(nnz),
- _ap("CrsMatrixBase::RowPtrArray", m+1),
- _aj("CrsMatrixBase::ColsArray", nnz),
- _ax("CrsMatrixBase::ValuesArray", nnz)
- { }
-
- /// \brief Constructor to attach external arrays to the matrix.
- CrsMatrixBase(const string &,
- const ordinal_type m,
- const ordinal_type n,
- const ordinal_type nnz,
- const size_type_array &ap,
- const ordinal_type_array &aj,
- const value_type_array &ax)
- : _m(m),
- _n(n),
- _nnz(nnz),
- _ap(ap),
- _aj(aj),
- _ax(ax)
- { }
-
- // Allow the copy function access to the input CrsMatrixBase
- // private data.
- template<typename, typename, typename, typename, typename>
- friend class CrsMatrixBase ;
-
- public:
- /// \brief deep copy of matrix b, potentially different spaces
- template< typename SpT >
- int
- copy(const CrsMatrixBase<ValueType,OrdinalType,SizeType,SpT,MemoryTraits> &b) {
-
- space_type::execution_space::fence();
-
- createInternalArrays(b._m, b._n, b._nnz);
-
- space_type::execution_space::fence();
-
- const auto ap_range = range_type<ordinal_type>(0, min(_ap.dimension_0(), b._ap.dimension_0()));
- const auto aj_range = range_type<size_type> (0, min(_aj.dimension_0(), b._aj.dimension_0()));
- const auto ax_range = range_type<size_type> (0, min(_ax.dimension_0(), b._ax.dimension_0()));
-
- Kokkos::deep_copy(Kokkos::subview( _ap, ap_range),
- Kokkos::subview(b._ap, ap_range));
- Kokkos::deep_copy(Kokkos::subview( _aj, aj_range),
- Kokkos::subview(b._aj, aj_range));
-
- Kokkos::deep_copy(Kokkos::subview( _ax, ax_range),
- Kokkos::subview(b._ax, ax_range));
-
- space_type::execution_space::fence();
-
- return 0;
- }
-
- /// \brief deep copy of the lower/upper triangle of matrix b
- int
- copy(const int uplo,
- const CrsMatrixBase &b) {
-
- createInternalArrays(b._m, b._n, b._nnz);
-
- // assume that matrix b is sorted.
- switch (uplo) {
- case Uplo::Lower: {
- _nnz = 0;
- for (ordinal_type i=0;i<_m;++i) {
- size_type jbegin = b._ap[i];
- size_type jend = b._ap[i+1];
- _ap[i] = _nnz;
- for (size_type j=jbegin;j<jend && (i >= b._aj[j]);++j,++_nnz) {
- _aj[_nnz] = b._aj[j];
- _ax[_nnz] = b._ax[j];
- }
- }
- _ap[_m] = _nnz;
- break;
- }
- case Uplo::Upper: {
- _nnz = 0;
- for (ordinal_type i=0;i<_m;++i) {
- size_type j = b._ap[i];
- size_type jend = b._ap[i+1];
- _ap[i] = _nnz;
- for ( ;j<jend && (i > b._aj[j]);++j) ;
- for ( ;j<jend;++j,++_nnz) {
- _aj[_nnz] = b._aj[j];
- _ax[_nnz] = b._ax[j];
- }
- }
- _ap[_m] = _nnz;
- break;
- }
- }
-
- return 0;
- }
-
- /// \brief deep copy of matrix b with given permutation vectors
- template<typename VT,
- typename OT,
- typename ST,
- typename SpT,
- typename MT>
- int
- copy(const typename CrsMatrixBase<VT,OT,ST,SpT,MT>::ordinal_type_array &p,
- const typename CrsMatrixBase<VT,OT,ST,SpT,MT>::ordinal_type_array &ip,
- const CrsMatrixBase<VT,OT,ST,SpT,MT> &b) {
-
- createInternalArrays(b._m, b._n, b._nnz);
-
- // Question: do we need to use Kokkos::vector here?
- // In other words, where do we permute the matrix in the factorization?
- // Is permuting a matrix itself a kernel?
- vector<ijv_type> tmp;
-
- // any chance to use parallel_for ?
- _nnz = 0;
- for (ordinal_type i=0;i<_m;++i) {
- ordinal_type ii = ip[i];
-
- size_type jbegin = b._ap[ii];
- size_type jend = b._ap[ii+1];
-
- _ap[i] = _nnz;
- for (size_type j=jbegin;j<jend;++j) {
- ordinal_type jj = p[b._aj[j]];
- ijv_type aij(i, jj, b._ax[j]);
- tmp.push_back(aij);
- }
-
- sort(tmp.begin(), tmp.end(), less<ijv_type>());
- for (auto it=tmp.begin();it<tmp.end();++it) {
- ijv_type aij = (*it);
-
- _aj[_nnz] = aij.Col();
- _ax[_nnz] = aij.Val();
- ++_nnz;
- }
- tmp.clear();
- }
- _ap[_m] = _nnz;
-
- return 0;
- }
-
- /// \brief add the matrix b into this matrix's existing non-zero entries
- template<typename VT,
- typename OT,
- typename ST,
- typename SpT,
- typename MT>
- int
- add(const CrsMatrixBase<VT,OT,ST,SpT,MT> &b) {
-
- const ordinal_type m = min(b._m, _m);
- for (ordinal_type i=0;i<m;++i) {
- const size_type jaend = _ap[i+1];
- const size_type jbend = b._ap[i+1];
-
- size_type ja = _ap[i];
- size_type jb = b._ap[i];
-
- for ( ;jb<jbend;++jb) {
- for ( ;(_aj[ja]<b._aj[jb] && ja<jaend);++ja);
- _ax[ja] += (_aj[ja] == b._aj[jb])*b._ax[jb];
- }
- }
-
- return 0;
- }
-
- int symmetrize(const int uplo,
- const bool conjugate = false) {
- vector<ijv_type> mm;
- mm.reserve(_nnz*2);
-
- for (ordinal_type i=0;i<_m;++i) {
- const size_type jbegin = _ap[i];
- const size_type jend = _ap[i+1];
- for (size_type jj=jbegin;jj<jend;++jj) {
- const ordinal_type j = _aj[jj];
- const value_type val = (conjugate ? conj(_ax[jj]) : _ax[jj]);
- if (uplo == Uplo::Lower && i > j) {
- mm.push_back(ijv_type(i, j, val));
- mm.push_back(ijv_type(j, i, val));
- } else if (uplo == Uplo::Upper && i < j) {
- mm.push_back(ijv_type(i, j, val));
- mm.push_back(ijv_type(j, i, val));
- } else if (i == j) {
- mm.push_back(ijv_type(i, i, val));
- }
- }
- }
- sort(mm.begin(), mm.end(), less<ijv_type>());
-
- createInternalArrays(_m, _n, mm.size());
-
- ijv2crs(mm);
-
- return 0;
- }
-
- int hermitianize(int uplo) {
- return symmetrize(uplo, true);
- }
-
- ostream& showMe(ostream &os) const {
- streamsize prec = os.precision();
- os.precision(8);
- os << scientific;
-
- os << " -- CrsMatrixBase -- " << endl
- << " # of Rows = " << _m << endl
- << " # of Cols = " << _n << endl
- << " # of NonZeros = " << _nnz << endl
- << endl
- << " RowPtrArray length = " << _ap.dimension_0() << endl
- << " ColArray length = " << _aj.dimension_0() << endl
- << " ValueArray length = " << _ax.dimension_0() << endl
- << endl;
-
- const int w = 10;
- if (_ap.size() && _aj.size() && _ax.size()) {
- os << setw(w) << "Row" << " "
- << setw(w) << "Col" << " "
- << setw(w) << "Val" << endl;
- for (ordinal_type i=0;i<_m;++i) {
- size_type jbegin = _ap[i], jend = _ap[i+1];
- for (size_type j=jbegin;j<jend;++j) {
- value_type val = _ax[j];
- os << setw(w) << i << " "
- << setw(w) << _aj[j] << " "
- << setw(w) << val << endl;
- }
- }
- }
-
- os.unsetf(ios::scientific);
- os.precision(prec);
-
- return os;
- }
-
- int importMatrixMarket(ifstream &file) {
-
- vector<ijv_type> mm;
- const ordinal_type mm_base = 1;
-
- {
- string header;
- if (file.is_open()) {
- getline(file, header);
- while (file.good()) {
- char c = file.peek();
- if (c == '%' || c == '\n') {
- file.ignore(256, '\n');
- continue;
- }
- break;
- }
- } else {
- ERROR(MSG_INVALID_INPUT(file));
- }
-
- // check the header
- bool symmetry = (header.find("symmetric") != string::npos);
-
- // read matrix specification
- ordinal_type m, n;
- size_type nnz;
-
- file >> m >> n >> nnz;
-
- mm.reserve(nnz*(symmetry ? 2 : 1));
- for (size_type i=0;i<nnz;++i) {
- ordinal_type row, col;
- value_type val;
- file >> row >> col >> val;
-
- row -= mm_base;
- col -= mm_base;
-
- mm.push_back(ijv_type(row, col, val));
- if (symmetry && row != col)
- mm.push_back(ijv_type(col, row, val));
- }
- sort(mm.begin(), mm.end(), less<ijv_type>());
-
- // construct workspace and set variables
- createInternalArrays(m, n, mm.size());
- }
-
- // change mm to crs
- ijv2crs(mm);
-
- return 0;
- }
-
- int exportMatrixMarket(ofstream &file,
- const string comment,
- const int uplo = 0) {
- streamsize prec = file.precision();
- file.precision(8);
- file << scientific;
-
- file << "%%MatrixMarket matrix coordinate "
- << (is_fundamental<value_type>::value ? "real " : "complex ")
- << ((uplo == Uplo::Upper || uplo == Uplo::Lower) ? "symmetric " : "general ")
- << endl;
-
- file << comment << endl;
-
- // count the nonzeros to be written
- size_type nnz = 0;
- for (ordinal_type i=0;i<_m;++i) {
- const size_type jbegin = _ap[i], jend = _ap[i+1];
- for (size_type j=jbegin;j<jend;++j) {
- if (uplo == Uplo::Upper && i <= _aj[j]) ++nnz;
- if (uplo == Uplo::Lower && i >= _aj[j]) ++nnz;
- if (!uplo) ++nnz;
- }
- }
- file << _m << " " << _n << " " << nnz << endl;
-
- const int w = 10;
- for (ordinal_type i=0;i<_m;++i) {
- const size_type jbegin = _ap[i], jend = _ap[i+1];
- for (size_type j=jbegin;j<jend;++j) {
- bool flag = false;
- if (uplo == Uplo::Upper && i <= _aj[j]) flag = true;
- if (uplo == Uplo::Lower && i >= _aj[j]) flag = true;
- if (!uplo) flag = true;
- if (flag) {
- value_type val = _ax[j];
- file << setw(w) << ( i+1) << " "
- << setw(w) << (_aj[j]+1) << " "
- << setw(w) << val << endl;
- }
- }
- }
-
- file.unsetf(ios::scientific);
- file.precision(prec);
-
- return 0;
- }
-
- //----------------------------------------------------------------------
-
- int convertGraph(size_type_array rptr,
- ordinal_type_array cidx) const {
- ordinal_type ii = 0;
- size_type jj = 0;
-
- for (ordinal_type i=0;i<_m;++i) {
- size_type jbegin = _ap[i], jend = _ap[i+1];
- rptr[ii++] = jj;
- for (size_type j=jbegin;j<jend;++j)
- if (i != _aj[j])
- cidx[jj++] = _aj[j];
- }
- rptr[ii] = jj;
-
- return 0;
- }
-
- //----------------------------------------------------------------------
-
- };
-
-}
-
-#endif
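For reference, the accessors in the deleted class above (RowPtr, ColsInRow, ValuesInRow, NumNonZerosInRow) all assume the standard compressed-row layout: row i owns the half-open range [_ap[i], _ap[i+1]) of _aj/_ax, with column indices sorted within each row. A minimal standalone sketch of the sorted-row lower-triangle extraction performed by copy(Uplo::Lower, b), using plain std::vector instead of the Kokkos views (names are illustrative, not Tacho's):

    #include <vector>

    // Sketch: extract the lower triangle of a CRS matrix whose column indices
    // are sorted within each row. Row i of the input owns the index range
    // [ap[i], ap[i+1]) of aj/ax.
    void lowerTriangle(int m,
                       const std::vector<int> &ap, const std::vector<int> &aj,
                       const std::vector<double> &ax,
                       std::vector<int> &lp, std::vector<int> &lj,
                       std::vector<double> &lx) {
      lp.assign(m + 1, 0);
      lj.clear();
      lx.clear();
      for (int i = 0; i < m; ++i) {
        lp[i] = static_cast<int>(lj.size());
        // columns are sorted, so stop at the first column index greater than i
        for (int k = ap[i]; k < ap[i + 1] && aj[k] <= i; ++k) {
          lj.push_back(aj[k]);
          lx.push_back(ax[k]);
        }
      }
      lp[m] = static_cast<int>(lj.size());
    }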
diff --git a/lib/kokkos/example/ichol/src/crs_matrix_base_import.hpp b/lib/kokkos/example/ichol/src/crs_matrix_base_import.hpp
deleted file mode 100644
index e1ff0f3a9..000000000
--- a/lib/kokkos/example/ichol/src/crs_matrix_base_import.hpp
+++ /dev/null
@@ -1,104 +0,0 @@
-#pragma once
-#ifndef __CRS_MATRIX_BASE_IMPL_HPP__
-#define __CRS_MATRIX_BASE_IMPL_HPP__
-
-/// \file crs_matrix_base_import.hpp
-/// \brief Implementation of external interfaces to CrsMatrixBase
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename VT,
- typename OT,
- typename ST,
- typename SpT,
- typename MT>
- inline int
- CrsMatrixBase<VT,OT,ST,SpT,MT>::importMatrixMarket(ifstream &file) {
- // skip initial title comments
- {
- ordinal_type m, n;
- size_type nnz;
-
- while (file.good()) {
- char c = file.peek();
- if (c == '%' || c == '\n') {
- file.ignore(256, '\n');
- continue;
- }
- break;
- }
-
- // read matrix specification
- file >> m >> n >> nnz;
-
- // construct workspace and set variables
- createInternalArrays(m, n, nnz);
- }
-
- // read the coordinate format (matrix-market)
- vector<ijv_type> mm;
- mm.reserve(_nnz);
- {
- // matrix market uses one-based indexing
- const ordinal_type mm_base = 1;
-
- for (size_type i=0;i<_nnz;++i) {
- ijv_type aij;
- file >> aij.Row() >> aij.Col() >> aij.Val();
-
- // convert one-based to zero-based indices
- aij.Row() -= mm_base;
- aij.Col() -= mm_base;
-
- mm.push_back(aij);
- }
- sort(mm.begin(), mm.end(), less<ijv_type>());
- }
-
- // change mm to crs
- {
- ordinal_type ii = 0;
- size_type jj = 0;
-
- ijv_type prev = mm[0];
- _ap[ii++] = 0;
- _aj[jj] = prev.Col();
- _ax[jj] = prev.Val();
- ++jj;
-
- for (typename vector<ijv_type>::iterator it=(mm.begin()+1);it<mm.end();++it) {
- ijv_type aij = (*it);
-
- // row index
- if (aij.Row() != prev.Row()) {
- _ap[ii++] = jj;
- }
-
- if (aij == prev) {
- --jj;
- _aj[jj] = aij.Col();
- _ax[jj] += aij.Val();
- } else {
- _aj[jj] = aij.Col();
- _ax[jj] = aij.Val();
- }
- ++jj;
-
- prev = aij;
- }
-
- // add the last index to terminate the storage
- _ap[ii++] = jj;
- _nnz = jj;
- }
-
- return 0;
- }
-
-}
-
-
-#endif
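The deleted import routine above reads one-based (row, col, val) triples, sorts them, and compresses them into CRS while summing duplicate entries. A standalone sketch of that coordinate-to-CRS step with plain std::tuple and std::vector (assumed types, not Tacho's ijv_type):

    #include <algorithm>
    #include <tuple>
    #include <vector>

    // Sketch: compress (row, col, val) triples into CRS arrays, merging
    // duplicates by summation, as the deleted import code does after sorting.
    void ijv2crs(int m, std::vector<std::tuple<int, int, double>> mm,
                 std::vector<int> &ap, std::vector<int> &aj,
                 std::vector<double> &ax) {
      std::sort(mm.begin(), mm.end());               // sort by (row, col)
      ap.assign(m + 1, 0);
      aj.clear();
      ax.clear();
      int row = 0;
      for (const auto &t : mm) {
        const int    r = std::get<0>(t);
        const int    c = std::get<1>(t);
        const double v = std::get<2>(t);
        if (!aj.empty() && r == row && c == aj.back()) {
          ax.back() += v;                            // duplicate entry: accumulate
          continue;
        }
        while (row < r) ap[++row] = static_cast<int>(aj.size()); // close finished rows
        aj.push_back(c);
        ax.push_back(v);
      }
      while (row < m) ap[++row] = static_cast<int>(aj.size());   // close trailing rows
    }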
diff --git a/lib/kokkos/example/ichol/src/crs_matrix_helper.hpp b/lib/kokkos/example/ichol/src/crs_matrix_helper.hpp
deleted file mode 100644
index 5b80e7793..000000000
--- a/lib/kokkos/example/ichol/src/crs_matrix_helper.hpp
+++ /dev/null
@@ -1,71 +0,0 @@
-#pragma once
-#ifndef __CRS_MATRIX_HELPER_HPP__
-#define __CRS_MATRIX_HELPER_HPP__
-
-/// \file crs_matrix_helper.hpp
-/// \brief This file includes utility functions to convert between flat and hierarchical matrices.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- class CrsMatrixHelper {
- public:
-
- template< typename CrsHierBase >
- static int fillRowViewArray( CrsHierBase & HU );
-
- template<typename CrsFlatBase>
- static int
- filterZeros(CrsFlatBase &flat);
-
- /// \brief Transform a scalar flat matrix to a hierarchical matrix of 1x1 blocks; testing only.
- template<typename CrsFlatBase,
- typename CrsHierBase>
- static int
- flat2hier(CrsFlatBase &flat,
- CrsHierBase &hier);
-
- /// \brief Transform a scalar flat matrix to an upper or lower hierarchical matrix given scotch info.
- template<typename CrsFlatBase,
- typename CrsHierBase,
- typename HostOrdinalTypeArray >
- static int
- flat2hier(int uplo,
- CrsFlatBase &flat,
- CrsHierBase &hier,
- const typename CrsHierBase::ordinal_type nblks,
- const HostOrdinalTypeArray range,
- const HostOrdinalTypeArray tree);
-
- /// \brief Transform a scalar flat matrix to upper hierarchical matrix given scotch info.
- template<typename CrsFlatBase,
- typename CrsHierBase,
- typename HostOrdinalTypeArray >
- static int
- flat2hier_upper(CrsFlatBase &flat,
- CrsHierBase &hier,
- const typename CrsHierBase::ordinal_type nblks,
- const HostOrdinalTypeArray range,
- const HostOrdinalTypeArray tree);
-
- /// \brief Transform a scalar flat matrix to lower hierarchical matrix given scotch info.
- template<typename CrsFlatBase,
- typename CrsHierBase,
- typename HostOrdinalTypeArray >
- static int
- flat2hier_lower(CrsFlatBase &flat,
- CrsHierBase &hier,
- const typename CrsHierBase::ordinal_type nblks,
- const HostOrdinalTypeArray range,
- const HostOrdinalTypeArray tree);
- };
-
-}
-
-#include "crs_matrix_helper_impl.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/crs_matrix_helper_impl.hpp b/lib/kokkos/example/ichol/src/crs_matrix_helper_impl.hpp
deleted file mode 100644
index 0fc4c9f1b..000000000
--- a/lib/kokkos/example/ichol/src/crs_matrix_helper_impl.hpp
+++ /dev/null
@@ -1,364 +0,0 @@
-
-#ifndef __CRS_MATRIX_HELPER_IMPL_HPP__
-#define __CRS_MATRIX_HELPER_IMPL_HPP__
-
-/// \file crs_matrix_helper_impl.hpp
-/// \brief This file includes utility functions to convert between flat and hierarchical matrices.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template< typename CrsHierBase >
- struct FunctorFillRowViewArray {
-
- typedef typename CrsHierBase::ordinal_type ordinal_type ;
- typedef typename CrsHierBase::row_view_type_array row_view_type_array ;
- typedef typename CrsHierBase::value_type_array ax_type ;
-
- typedef ordinal_type value_type ;
-
- row_view_type_array _all_row_views ;
- ax_type _ax ;
-
- FunctorFillRowViewArray( const row_view_type_array & arg_all_row_views
- , const ax_type & arg_ax )
- : _all_row_views( arg_all_row_views )
- , _ax( arg_ax )
- {}
-
- KOKKOS_INLINE_FUNCTION
- void operator()( ordinal_type k , ordinal_type & value ) const
- { value += _ax(k).NumRows(); }
-
- KOKKOS_INLINE_FUNCTION
- void operator()( ordinal_type k , ordinal_type & value , bool final ) const
- {
- if ( final ) {
- const int begin = value ;
- const int end = begin + _ax(k).NumRows();
-
- auto sub = Kokkos::subview( _all_row_views, Kokkos::pair<int,int>(begin,end) );
-
- _ax(k).setRowViewArray( sub );
- }
-
- value += _ax(k).NumRows();
- }
- };
-
- template< typename CrsHierBase >
- int CrsMatrixHelper::fillRowViewArray( CrsHierBase & device_HU )
- {
- typedef typename CrsHierBase::row_view_type_array row_view_type_array ;
- typedef typename CrsHierBase::space_type space_type ;
-
- ordinal_type total_row_view_count = 0 ;
-
- Kokkos::RangePolicy< space_type >
- range_policy( 0 , device_HU.NumNonZeros() );
-
- space_type::fence();
-
- {
- FunctorFillRowViewArray< CrsHierBase >
- functor( row_view_type_array() , device_HU._ax );
-
-
- Kokkos::parallel_reduce( range_policy , functor , total_row_view_count );
- }
-
- device_HU._all_row_views =
- row_view_type_array("RowViews",total_row_view_count);
-
- space_type::fence();
-
- {
- FunctorFillRowViewArray< CrsHierBase >
- functor( device_HU._all_row_views , device_HU._ax );
-
- Kokkos::parallel_scan( range_policy , functor );
- }
-
- space_type::fence();
-
- return 0 ;
- }
-
- template<typename CrsFlatBase>
- int
- CrsMatrixHelper::filterZeros(CrsFlatBase &flat) {
- typedef typename CrsFlatBase::ordinal_type ordinal_type;
- typedef typename CrsFlatBase::size_type size_type;
- typedef typename CrsFlatBase::value_type value_type;
-
- typedef typename CrsFlatBase::ordinal_type_array_ptr ordinal_type_array_ptr;
- typedef typename CrsFlatBase::value_type_array_ptr value_type_array_ptr;
-
- size_type nz = 0;
- const value_type zero(0);
-
- for (ordinal_type k=0;k<flat.NumNonZeros();++k)
- nz += (flat.Value(k) == zero) ;
-
- if (nz) {
- CrsFlatBase resized(flat.Label() + "::ZeroFiltered",
- flat.NumRows(),
- flat.NumCols(),
- flat.NumNonZeros() - nz);
-
- ordinal_type_array_ptr rows = resized.RowPtr(); rows[0] = 0;
- ordinal_type_array_ptr cols = resized.ColPtr();
- value_type_array_ptr vals = resized.ValuePtr();
-
- size_type nnz = 0;
- for (ordinal_type i=0;i<flat.NumRows();++i) {
- const ordinal_type nnz_in_row = flat.NumNonZerosInRow(i);
- const ordinal_type_array_ptr cols_in_row = flat.ColsInRow(i);
- const value_type_array_ptr vals_in_row = flat.ValuesInRow(i);
-
- for (ordinal_type j=0;j<nnz_in_row;++j) {
- if (vals_in_row[j] != zero) {
- cols[nnz] = cols_in_row[j];
- vals[nnz] = vals_in_row[j];
- ++nnz;
- }
- }
- rows[i+1] = nnz;
- }
- flat = resized;
- resized.setNumNonZeros();
- }
-
- return 0;
- }
-
-
- template<typename CrsFlatBase,
- typename CrsHierBase>
- int
- CrsMatrixHelper::flat2hier(CrsFlatBase &flat,
- CrsHierBase &hier) {
- typedef typename CrsHierBase::ordinal_type ordinal_type;
- typedef typename CrsHierBase::size_type size_type;
- typedef typename CrsHierBase::ordinal_type_array_ptr ordinal_type_array_ptr;
-
- size_type nnz = 0;
-
- hier.createInternalArrays(flat.NumRows(), flat.NumCols(), flat.NumNonZeros());
-
- for (ordinal_type i=0;i<flat.NumRows();++i) {
- ordinal_type jsize = flat.NumNonZerosInRow(i);
-
- hier._ap[i] = nnz;
- ordinal_type_array_ptr ci = flat.ColsInRow(i);
- for (ordinal_type j=0;j<jsize;++j,++nnz) {
- hier._aj[nnz] = ci[j];
- hier._ax[nnz].setView( flat, i, 1,
- /**/ ci[j], 1);
- }
- }
-
- hier._ap[flat.NumRows()] = nnz;
- hier._nnz = nnz;
-
- return 0;
- }
-
- template< typename CrsFlatBase ,
- typename CrsHierBase ,
- typename HostOrdinalTypeArray >
- int
- CrsMatrixHelper::flat2hier(int uplo,
- CrsFlatBase &flat,
- CrsHierBase &hier,
- const typename CrsHierBase::ordinal_type nblks,
- const HostOrdinalTypeArray range ,
- const HostOrdinalTypeArray tree) {
- switch(uplo) {
- case Uplo::Upper: return flat2hier_upper(flat, hier, nblks, range, tree);
- case Uplo::Lower: return flat2hier_lower(flat, hier, nblks, range, tree);
- }
- return -1;
- }
-
- template<typename CrsFlatBase,
- typename CrsHierBase,
- typename HostOrdinalTypeArray >
- int
- CrsMatrixHelper::flat2hier_upper(CrsFlatBase & device_flat,
- CrsHierBase & device_hier,
- const typename CrsHierBase::ordinal_type nblks,
- const HostOrdinalTypeArray range,
- const HostOrdinalTypeArray tree) {
- typedef typename CrsHierBase::ordinal_type ordinal_type;
- typedef typename CrsHierBase::size_type size_type;
-
- //typedef typename CrsHierBase::ordinal_type_array ordinal_type_array;
- //typedef typename CrsHierBase::ordinal_type_array_ptr ordinal_type_array_ptr;
- //typedef typename CrsHierBase::value_type_array_ptr value_type_array_ptr;
-
- size_type nnz = 0;
-
- // count the total number of nonzero blocks in the upper triangular hier matrix
- for (ordinal_type i=0;i<nblks;++i)
- for (ordinal_type j=i;j != -1;++nnz,j=tree[j]) ;
-
- // create upper triangular block matrix
- device_hier.createInternalArrays(nblks, nblks, nnz);
-
- typename CrsHierBase::size_type_array::HostMirror
- host_ap = Kokkos::create_mirror_view( device_hier._ap );
-
- typename CrsHierBase::ordinal_type_array::HostMirror
- host_aj = Kokkos::create_mirror_view( device_hier._aj );
-
- typename CrsHierBase::value_type_array::HostMirror
- host_ax = Kokkos::create_mirror_view( device_hier._ax );
-
- nnz = 0;
- for (ordinal_type i=0;i<nblks;++i) {
- host_ap[i] = nnz;
- for (ordinal_type j=i;j != -1;++nnz,j=tree[j]) {
- host_aj[nnz] = j;
- host_ax[nnz].setView( device_flat, range[i], (range[i+1] - range[i]),
- /**/ range[j], (range[j+1] - range[j]));
-
- // this check might be more expensive
- // and attempts to access device memory from the host
- // if (!host_ax[nnz].countNumNonZeros())
- // --nnz;
- }
- }
-
- host_ap[nblks] = nnz;
-
- Kokkos::deep_copy( device_hier._ap , host_ap );
- Kokkos::deep_copy( device_hier._aj , host_aj );
- Kokkos::deep_copy( device_hier._ax , host_ax );
-
- device_hier._nnz = nnz;
-
- return 0;
- }
-
- // template<typename CrsFlatBase,
- // typename CrsHierBase>
- // int
- // CrsMatrixHelper::flat2hier_upper(CrsFlatBase &flat,
- // CrsHierBase &hier,
- // const typename CrsHierBase::ordinal_type nblks,
- // const typename CrsHierBase::ordinal_type_array range,
- // const typename CrsHierBase::ordinal_type_array tree) {
- // typedef typename CrsHierBase::ordinal_type ordinal_type;
- // typedef typename CrsHierBase::size_type size_type;
-
- // typedef typename CrsHierBase::ordinal_type_array ordinal_type_array;
- // //typedef typename CrsHierBase::ordinal_type_array_ptr ordinal_type_array_ptr;
- // //typedef typename CrsHierBase::value_type_array_ptr value_type_array_ptr;
-
- // ordinal_type_array sibling("CrsMatrixHelper::flat2hier_upper::sibling", nblks);
-
- // // check the end of adjacent siblings (if not adjacent, they are separators)
- // ordinal_type p = tree[0];
- // for (ordinal_type i=1;i<nblks;++i) {
- // const ordinal_type j = tree[i];
- // if (p != j) {
- // p = j;
- // sibling[i-1] = -1;
- // }
- // }
- // sibling[nblks-1] = -1;
-
- // size_type nnz = 0;
-
- // // count nnz and nnz in rows for the upper triangular hier matrix
- // for (ordinal_type i=0;i<nblks;++i) { // search for all rows
- // for (ordinal_type j=i;j != -1;j=tree[j]) { // move up
- // ordinal_type k=j;
- // do {
- // ++nnz;
- // } while (sibling[k++] != -1);
- // }
- // }
-
- // // create upper triangular block matrix
- // hier.createInternalArrays(nblks, nblks, nnz);
-
- // nnz = 0;
- // for (ordinal_type i=0;i<nblks;++i) {
- // hier._ap[i] = nnz;
- // for (ordinal_type j=i;j != -1;j=tree[j]) {
- // ordinal_type k=j;
- // do {
- // hier._aj[nnz] = k;
- // hier._ax[nnz].setView( flat, range[i], (range[i+1] - range[i]),
- // /**/ range[k], (range[k+1] - range[k]));
-
- // // this check might be more expensive
- // if (hier._ax[nnz].hasNumNonZeros())
- // ++nnz;
- // } while (sibling[k++] != -1);
- // }
- // }
- // hier._ap[nblks] = nnz;
- // hier._nnz = nnz;
-
- // return 0;
- // }
-
- template<typename CrsFlatBase,
- typename CrsHierBase,
- typename HostOrdinalTypeArray >
- int
- CrsMatrixHelper::flat2hier_lower(CrsFlatBase &flat,
- CrsHierBase &hier,
- const typename CrsHierBase::ordinal_type nblks,
- const HostOrdinalTypeArray range,
- const HostOrdinalTypeArray tree) {
- ERROR(MSG_NOT_YET_IMPLEMENTED);
-
- // typedef typename CrsHierBase::ordinal_type ordinal_type;
- // typedef typename CrsHierBase::size_type size_type;
-
- // typedef typename CrsHierBase::ordinal_type_array ordinal_type_array;
- // //typedef typename CrsHierBase::ordinal_type_array_ptr ordinal_type_array_ptr;
- // //typedef typename CrsHierBase::value_type_array_ptr value_type_array_ptr;
-
- // ordinal_type_array tmp = ordinal_type_array("flat2hier:tmp", nblks+1);
- // size_type nnz = 0;
-
- // // count nnz and nnz in rows for lower triangular matrix
- // for (ordinal_type i=0;i<nblks;++i)
- // for (ordinal_type j=i;j != -1;++nnz) {
- // ++tmp[j];
- // j = tree[j];
- // }
-
- // // count nnz and nnz in rows for lower triangular matrix
- // hier.createInternalArrays(nblks, nblks, nnz);
- // for (ordinal_type i=1;i<(nblks+1);++i)
- // hier._ap[i] = hier._ap[i-1] + tmp[i-1];
-
- // for (ordinal_type i=0;i<(nblks+1);++i)
- // tmp[i] = hier._ap[i];
-
- // for (ordinal_type i=0;i<nblks;++i)
- // for (ordinal_type j=i;j != -1;j=tree[j]) {
- // hier._aj[tmp[j]] = i;
- // hier._ax[tmp[j]].setView( flat, range[j], (range[j+1] - range[j]),
- // /**/ range[i], (range[i+1] - range[i]));
- // ++tmp[j];
- // }
-
- return 0;
- }
-
-}
-
-
-#endif
-
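The deleted flat2hier_upper above derives the block sparsity of the upper factor by walking each block up the elimination tree until a root is reached (tree[j] == -1). A standalone sketch of that counting step (illustrative only):

    #include <vector>

    // Sketch: count the nonzero blocks of the upper hierarchical matrix from an
    // elimination tree, where tree[j] is the parent block of j and -1 marks a root.
    int countUpperBlocks(int nblks, const std::vector<int> &tree,
                         std::vector<int> &blocksPerRow) {
      blocksPerRow.assign(nblks, 0);
      int nnz = 0;
      for (int i = 0; i < nblks; ++i)
        for (int j = i; j != -1; j = tree[j]) {  // walk from block i up to its root
          ++blocksPerRow[i];
          ++nnz;
        }
      return nnz;                                // total number of blocks to allocate
    }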
diff --git a/lib/kokkos/example/ichol/src/crs_matrix_view.hpp b/lib/kokkos/example/ichol/src/crs_matrix_view.hpp
deleted file mode 100644
index 2a55e6fac..000000000
--- a/lib/kokkos/example/ichol/src/crs_matrix_view.hpp
+++ /dev/null
@@ -1,226 +0,0 @@
-#pragma once
-#ifndef __CRS_MATRIX_VIEW_HPP__
-#define __CRS_MATRIX_VIEW_HPP__
-
-/// \file crs_matrix_view.hpp
-/// \brief A CRS matrix view object creates a 2D view to set up a computing region.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename CrsMatBaseType>
- class CrsRowView;
-
- template<typename CrsMatBaseType>
- class CrsMatrixView {
- public:
- typedef typename CrsMatBaseType::space_type space_type;
-
- typedef typename CrsMatBaseType::value_type value_type;
- typedef typename CrsMatBaseType::ordinal_type ordinal_type;
- typedef typename CrsMatBaseType::size_type size_type;
-
- typedef CrsMatBaseType mat_base_type;
- typedef CrsRowView<mat_base_type> row_view_type;
-
- // Be careful: this uses rcp and atomic operations.
- // - use setView to create a view if _rows is not necessary
- // - the copy constructor and assignment operator do a shallow copy of the object
- typedef Kokkos::View<row_view_type*,space_type,Kokkos::MemoryUnmanaged> row_view_type_array;
-
- private:
- CrsMatBaseType _base; // shallow copy of the base object
- ordinal_type _offm; // offset in rows
- ordinal_type _offn; // offset in cols
- ordinal_type _m; // # of rows
- ordinal_type _n; // # of cols
-
- row_view_type_array _rows;
-
- public:
-
- KOKKOS_INLINE_FUNCTION
- void setRowViewArray( const row_view_type_array & arg_rows )
- {
- _rows = arg_rows ;
-
- for (ordinal_type i=0;i<_m;++i) {
- _rows[i].setView(*this, i);
- }
- }
-
- KOKKOS_INLINE_FUNCTION
- row_view_type& RowView(const ordinal_type i) const { return _rows[i]; }
-
- KOKKOS_INLINE_FUNCTION
- void setView(const CrsMatBaseType &base,
- const ordinal_type offm, const ordinal_type m,
- const ordinal_type offn, const ordinal_type n) {
- _base = base;
-
- _offm = offm; _m = m;
- _offn = offn; _n = n;
- }
-
- KOKKOS_INLINE_FUNCTION
- const CrsMatBaseType & BaseObject() const { return _base; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type OffsetRows() const { return _offm; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type OffsetCols() const { return _offn; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumRows() const { return _m; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumCols() const { return _n; }
-
- KOKKOS_INLINE_FUNCTION
- bool hasNumNonZeros() const {
- const ordinal_type m = NumRows();
- for (ordinal_type i=0;i<m;++i) {
- row_view_type row;
- row.setView(*this, i);
- if (row.NumNonZeros()) return true;
- }
- return false;
- }
-
- inline
- size_type countNumNonZeros() const {
- size_type nnz = 0;
- const ordinal_type m = NumRows();
- for (ordinal_type i=0;i<m;++i) {
- row_view_type row;
- row.setView(*this, i);
- nnz += row.NumNonZeros();
- }
- return nnz;
- }
-
- KOKKOS_INLINE_FUNCTION
- CrsMatrixView()
- : _base(),
- _offm(0),
- _offn(0),
- _m(0),
- _n(0),
- _rows()
- { }
-
- KOKKOS_INLINE_FUNCTION
- CrsMatrixView(const CrsMatrixView &b)
- : _base(b._base),
- _offm(b._offm),
- _offn(b._offn),
- _m(b._m),
- _n(b._n),
- _rows(b._rows)
- { }
-
- KOKKOS_INLINE_FUNCTION
- CrsMatrixView(const CrsMatBaseType & b)
- : _base(b),
- _offm(0),
- _offn(0),
- _m(b.NumRows()),
- _n(b.NumCols()),
- _rows()
- { }
-
- CrsMatrixView(const CrsMatBaseType & b,
- const ordinal_type offm, const ordinal_type m,
- const ordinal_type offn, const ordinal_type n)
- : _base(b),
- _offm(offm),
- _offn(offn),
- _m(m),
- _n(n),
- _rows()
- { }
-
- ostream& showMe(ostream &os) const {
- const int w = 4;
- os << "CrsMatrixView, "
- << " Offs ( " << setw(w) << _offm << ", " << setw(w) << _offn << " ); "
- << " Dims ( " << setw(w) << _m << ", " << setw(w) << _n << " ); "
- << " NumNonZeros = " << countNumNonZeros() << ";";
-
- return os;
- }
-
- };
-}
-
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#if ! KOKKOS_USING_EXP_VIEW
-
-namespace Kokkos {
- namespace Impl {
-
- // The Kokkos::View allocation will by default assign each allocated datum to zero.
- // This is not the required initialization behavior when
- // Tacho::CrsRowView and Tacho::CrsMatrixView
- // are used within a Kokkos::View.
- // Create a partial specialization of the Kokkos::Impl::AViewDefaultConstruct
- // to replace the assignment initialization with placement new initialization.
- //
- // This work-around is necessary until a to-be-determined design refactoring of Kokkos::View.
-
- template< class ExecSpace , typename T >
- struct ViewDefaultConstruct< ExecSpace , Tacho::CrsRowView<T> , true >
- {
- typedef Tacho::CrsRowView<T> type ;
- type * const m_ptr ;
-
- KOKKOS_FORCEINLINE_FUNCTION
- void operator()( const typename ExecSpace::size_type& i ) const
- { new(m_ptr+i) type(); }
-
- ViewDefaultConstruct( type * pointer , size_t capacity )
- : m_ptr( pointer )
- {
- Kokkos::RangePolicy< ExecSpace > range( 0 , capacity );
- parallel_for( range , *this );
- ExecSpace::fence();
- }
- };
-
- template< class ExecSpace , typename T >
- struct ViewDefaultConstruct< ExecSpace , Tacho::CrsMatrixView<T> , true >
- {
- typedef Tacho::CrsMatrixView<T> type ;
- type * const m_ptr ;
-
- KOKKOS_FORCEINLINE_FUNCTION
- void operator()( const typename ExecSpace::size_type& i ) const
- { new(m_ptr+i) type(); }
-
- ViewDefaultConstruct( type * pointer , size_t capacity )
- : m_ptr( pointer )
- {
- Kokkos::RangePolicy< ExecSpace > range( 0 , capacity );
- parallel_for( range , *this );
- ExecSpace::fence();
- }
- };
-
- } // namespace Impl
-} // namespace Kokkos
-
-#endif /* #if ! KOKKOS_USING_EXP_VIEW */
-
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/crs_row_view.hpp b/lib/kokkos/example/ichol/src/crs_row_view.hpp
deleted file mode 100644
index 8556bcb9e..000000000
--- a/lib/kokkos/example/ichol/src/crs_row_view.hpp
+++ /dev/null
@@ -1,185 +0,0 @@
-#pragma once
-#ifndef __CRS_ROW_VIEW_HPP__
-#define __CRS_ROW_VIEW_HPP__
-
-/// \file crs_row_view.hpp
-/// \brief A view to a row extracted from CrsMatrixView.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- /// \class CrsRowView
- template<typename CrsMatBaseType>
- class CrsRowView {
- public:
- typedef typename CrsMatBaseType::ordinal_type ordinal_type;
- typedef typename CrsMatBaseType::value_type value_type;
- typedef typename CrsMatBaseType::ordinal_type_array_ptr ordinal_type_array_ptr;
- typedef typename CrsMatBaseType::value_type_array_ptr value_type_array_ptr;
-
- private:
- // row info
- ordinal_type _offn, _n;
-
- // this assumes a contiguous memory buffer
- ordinal_type_array_ptr _aj, _ajn; // column index compressed format in row
- value_type_array_ptr _ax; // values
-
- static KOKKOS_INLINE_FUNCTION
- typename CrsMatBaseType::ordinal_type_array_ptr
- lower_bound( typename CrsMatBaseType::ordinal_type_array_ptr begin ,
- typename CrsMatBaseType::ordinal_type_array_ptr const end ,
- typename CrsMatBaseType::ordinal_type const val )
- {
- typename CrsMatBaseType::ordinal_type_array_ptr it = begin ;
- int count = end - begin ;
- int step = 0 ;
- while (count>0) {
- it = begin ;
- it += ( step = (count >> 1) );
- if (*it<val) {
- begin=++it;
- count-=step+1;
- }
- else { count=step; }
- }
- return begin;
- }
-
- public:
- KOKKOS_INLINE_FUNCTION
- ordinal_type OffsetCols() const { return _offn; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumCols() const { return _n; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type NumNonZeros() const { return _ajn - _aj; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type Col(const ordinal_type j) const { return _aj[j] - _offn; }
-
- KOKKOS_INLINE_FUNCTION
- value_type& Value(const ordinal_type j) { return _ax[j]; }
-
- KOKKOS_INLINE_FUNCTION
- value_type Value(const ordinal_type j) const { return _ax[j]; }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type Index(const ordinal_type col ) const {
- const ordinal_type loc = _offn + col ;
- // binary search
- ordinal_type_array_ptr aj = CrsRowView::lower_bound(_aj, _ajn, loc);
-
- // if found, return index for the location,
- // otherwise return -1 (not found), -2 (end of array)
- return (aj < _ajn ? (*aj == loc ? aj - _aj : -1) : -2);
- }
-
- KOKKOS_INLINE_FUNCTION
- ordinal_type Index(const ordinal_type col,
- const ordinal_type prev ) const {
- const ordinal_type loc = _offn + col;
- ordinal_type_array_ptr aj = _aj + prev;
-
- // binary search
- // aj = lower_bound(aj, _ajn, loc);
-
- // linear search from prev: this is about 45% faster than the binary search
- for ( ;aj < _ajn && *aj<loc; ++aj);
-
- // if found, return index for the location,
- // otherwise return -1 (not found), -2 (end of array)
- return (aj < _ajn ? (*aj == loc ? aj - _aj : -1) : -2);
- }
-
- KOKKOS_INLINE_FUNCTION
- value_type ValueAtColumn(const ordinal_type col) const {
- const ordinal_type j = Index(col);
- return (j < 0 ? value_type(0) : _ax[j]);
- }
-
- KOKKOS_INLINE_FUNCTION
- CrsRowView()
- : _offn(0),
- _n(0),
- _aj(),
- _ajn(),
- _ax()
- { }
-
-
- KOKKOS_INLINE_FUNCTION
- CrsRowView(const ordinal_type offn,
- const ordinal_type n,
- const ordinal_type_array_ptr aj,
- const ordinal_type_array_ptr ajn,
- const value_type_array_ptr ax)
- : _offn(offn),
- _n(n),
- _aj(aj),
- _ajn(ajn),
- _ax(ax)
- { }
-
- KOKKOS_INLINE_FUNCTION
- CrsRowView(const CrsMatrixView<CrsMatBaseType> &A,
- const ordinal_type i) {
- this->setView(A, i);
- }
-
- KOKKOS_INLINE_FUNCTION
- CrsRowView(const CrsMatBaseType &A,
- const ordinal_type i) {
- this->setView(A, i);
- }
-
- KOKKOS_INLINE_FUNCTION
- void setView(const CrsMatrixView<CrsMatBaseType> &A,
- const ordinal_type i) {
- _offn = A.OffsetCols();
- _n = A.NumCols();
-
- const ordinal_type ii = A.OffsetRows() + i;
-
- const typename CrsMatBaseType::ordinal_type_array_ptr cols = A.BaseObject().ColsInRow(ii);
- const typename CrsMatBaseType::ordinal_type_array_ptr next = A.BaseObject().ColsInRow(ii+1);
- const typename CrsMatBaseType::value_type_array_ptr vals = A.BaseObject().ValuesInRow(ii);
-
- // [cols..next) is sorted, so a log(N) search can be performed
- _aj = CrsRowView::lower_bound(cols, next, _offn);
- _ajn = CrsRowView::lower_bound(_aj, next, _offn+_n);
-
- _ax = &vals[_aj - cols];
- }
-
- KOKKOS_INLINE_FUNCTION
- void setView(const CrsMatBaseType &A,
- const ordinal_type i) {
- _offn = 0;
- _n = A.NumCols();
- _aj = A.ColsInRow(i);
- _ajn = A.ColsInRow(i+1);
- _ax = A.ValuesInRow(i);
- }
-
- ostream& showMe(ostream &os) const {
- const ordinal_type nnz = NumNonZeros();
- const ordinal_type offset = OffsetCols();
- os << " offset = " << offset
- << ", nnz = " << nnz
- << endl;
- for (ordinal_type j=0;j<nnz;++j) {
- const value_type val = _ax[j];
- os << "(" << Col(j) << ", "
- << val << ")"
- << endl;
- }
- return os;
- }
- };
-}
-
-#endif
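The deleted CrsRowView::lower_bound above is a hand-rolled binary search over the sorted column indices of one row; Index(col) then reports a hit only when the returned position actually stores that column. The same halving search over a std::vector, as a standalone sketch:

    #include <vector>

    // Sketch: return the first position in a[begin, end) whose value is not
    // less than val, mirroring the deleted lower_bound.
    int lowerBound(const std::vector<int> &a, int begin, int end, int val) {
      int count = end - begin;
      while (count > 0) {
        const int step = count / 2;
        const int mid  = begin + step;
        if (a[mid] < val) { begin = mid + 1; count -= step + 1; }
        else              { count = step; }
      }
      return begin;
    }

A column col is present in the row exactly when the returned position lies inside the range and stores col; otherwise the deleted Index() maps the miss to -1 (not found) or -2 (past the end).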
diff --git a/lib/kokkos/example/ichol/src/dot.hpp b/lib/kokkos/example/ichol/src/dot.hpp
deleted file mode 100644
index acf927e06..000000000
--- a/lib/kokkos/example/ichol/src/dot.hpp
+++ /dev/null
@@ -1,74 +0,0 @@
-#pragma once
-#ifndef __DOT_HPP__
-#define __DOT_HPP__
-
-/// \file dot.hpp
-/// \brief Sparse dot product.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-/// dot_type result = x^H y
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename T> struct DotTraits {
- typedef T dot_type;
-
- static KOKKOS_FORCEINLINE_FUNCTION
- dot_type
- // dot(const T &x, const T &y) { return conj<T>(x)*y; }
- dot(const T &x, const T &y) { return x*y; }
- };
-
- template<typename CrsRowViewType>
- KOKKOS_INLINE_FUNCTION
- typename CrsRowViewType::value_type
- dot(const CrsRowViewType x, const CrsRowViewType y) {
- typedef typename CrsRowViewType::ordinal_type ordinal_type;
- typedef typename CrsRowViewType::value_type value_type;
-
- typedef DotTraits<value_type> dot_traits;
-
- value_type r_val(0);
-
- const ordinal_type nnz_x = x.NumNonZeros();
- const ordinal_type nnz_y = y.NumNonZeros();
-
- for (ordinal_type jx=0, jy=0;jx<nnz_x && jy<nnz_y;) {
- const ordinal_type diff = x.Col(jx) - y.Col(jy);
- const ordinal_type sign = (0 < diff) - (diff < 0);
- switch (sign) {
- case 0:
- r_val += dot_traits::dot(x.Value(jx++), y.Value(jy++));
- break;
- case -1: ++jx; break;
- case 1: ++jy; break;
- }
- }
-
- return r_val;
- }
-
- template<typename CrsRowViewType>
- KOKKOS_INLINE_FUNCTION
- typename CrsRowViewType::value_type
- dot(const CrsRowViewType x) {
- typedef typename CrsRowViewType::ordinal_type ordinal_type;
- typedef typename CrsRowViewType::value_type value_type;
-
- typedef DotTraits<value_type> dot_traits;
-
- value_type r_val(0);
-
- const ordinal_type nnz = x.NumNonZeros();
-
- for (ordinal_type j=0;j<nnz;++j)
- r_val += dot_traits::dot(x.Value(j), x.Value(j));
-
- return r_val;
- }
-
-}
-
-#endif
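The deleted dot(x, y) above merges two sorted column-index lists with two cursors, accumulating a product only when the indices match. A standalone sketch with plain vectors (illustrative names):

    #include <cstddef>
    #include <vector>

    // Sketch: dot product of two sparse vectors stored as sorted (index, value)
    // arrays, using the same two-cursor merge as the deleted dot().
    double sparseDot(const std::vector<int> &ix, const std::vector<double> &vx,
                     const std::vector<int> &iy, const std::vector<double> &vy) {
      double r = 0.0;
      std::size_t jx = 0, jy = 0;
      while (jx < ix.size() && jy < iy.size()) {
        if      (ix[jx] < iy[jy]) ++jx;       // index only in x: no contribution
        else if (ix[jx] > iy[jy]) ++jy;       // index only in y: no contribution
        else    r += vx[jx++] * vy[jy++];     // matching index: accumulate product
      }
      return r;
    }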
diff --git a/lib/kokkos/example/ichol/src/gemm.hpp b/lib/kokkos/example/ichol/src/gemm.hpp
deleted file mode 100644
index 33c6058ec..000000000
--- a/lib/kokkos/example/ichol/src/gemm.hpp
+++ /dev/null
@@ -1,99 +0,0 @@
-#pragma once
-#ifndef __GEMM_HPP__
-#define __GEMM_HPP__
-
-/// \file gemm.hpp
-/// \brief Sparse matrix-matrix multiplication on given sparse patterns.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "control.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<int ArgTransA, int ArgTransB, int ArgAlgo,
- int ArgVariant = Variant::One,
- template<int,int> class ControlType = Control>
- struct Gemm {
-
- // data-parallel interface
- // =======================
- template<typename ScalarType,
- typename ExecViewTypeA,
- typename ExecViewTypeB,
- typename ExecViewTypeC>
- KOKKOS_INLINE_FUNCTION
- static int invoke(typename ExecViewTypeA::policy_type &policy,
- const typename ExecViewTypeA::policy_type::member_type &member,
- const ScalarType alpha,
- typename ExecViewTypeA::matrix_type &A,
- typename ExecViewTypeB::matrix_type &B,
- const ScalarType beta,
- typename ExecViewTypeC::matrix_type &C);
-
- // task-data parallel interface
- // ============================
- template<typename ScalarType,
- typename ExecViewTypeA,
- typename ExecViewTypeB,
- typename ExecViewTypeC>
- class TaskFunctor {
- public:
- typedef typename ExecViewTypeA::policy_type policy_type;
- typedef typename policy_type::member_type member_type;
- typedef int value_type;
-
- private:
- ScalarType _alpha, _beta;
- typename ExecViewTypeA::matrix_type _A;
- typename ExecViewTypeB::matrix_type _B;
- typename ExecViewTypeC::matrix_type _C;
-
- policy_type _policy;
-
- public:
- KOKKOS_INLINE_FUNCTION
- TaskFunctor(const policy_type & P,
- const ScalarType alpha,
- const typename ExecViewTypeA::matrix_type & A,
- const typename ExecViewTypeB::matrix_type & B,
- const ScalarType beta,
- const typename ExecViewTypeC::matrix_type & C)
- : _alpha(alpha),
- _beta(beta),
- _A(A),
- _B(B),
- _C(C),
- _policy(P)
- { }
-
- string Label() const { return "Gemm"; }
-
- // task execution
- KOKKOS_INLINE_FUNCTION
- void apply(value_type &r_val) {
- r_val = Gemm::invoke<ScalarType,ExecViewTypeA,ExecViewTypeB,ExecViewTypeC>(_policy, _policy.member_single(),
- _alpha, _A, _B, _beta, _C);
- }
-
- // task-data execution
- KOKKOS_INLINE_FUNCTION
- void apply(const member_type &member, value_type &r_val) {
- r_val = Gemm::invoke<ScalarType,ExecViewTypeA,ExecViewTypeB,ExecViewTypeC>(_policy, member,
- _alpha, _A, _B, _beta, _C);
- }
-
- };
-
- };
-
-}
-
-
-// #include "gemm_nt_nt.hpp"
-#include "gemm_ct_nt.hpp"
-
-#endif
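The structure above is repeated by the other kernels in this example (Herk, etc.): a static invoke() does the actual work, and a nested TaskFunctor captures the arguments so the same kernel can also be enqueued as a task by a task-DAG scheduler. A stripped-down sketch of the pattern (hypothetical kernel, no Kokkos policy or team handling):

    // Sketch: the "static invoke + task functor" pattern used by Gemm/Herk.
    struct ScaleKernel {
      static int invoke(double alpha, double &x) { x *= alpha; return 0; }

      class TaskFunctor {
        double  _alpha;
        double *_x;
      public:
        typedef int value_type;                // task return code
        TaskFunctor(double alpha, double &x) : _alpha(alpha), _x(&x) {}
        // the scheduler calls apply(); it simply forwards to the static kernel
        void apply(value_type &r_val) { r_val = ScaleKernel::invoke(_alpha, *_x); }
      };
    };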
diff --git a/lib/kokkos/example/ichol/src/gemm_ct_nt.hpp b/lib/kokkos/example/ichol/src/gemm_ct_nt.hpp
deleted file mode 100644
index 13d2518ca..000000000
--- a/lib/kokkos/example/ichol/src/gemm_ct_nt.hpp
+++ /dev/null
@@ -1,12 +0,0 @@
-#pragma once
-#ifndef __GEMM_CT_NT_HPP__
-#define __GEMM_CT_NT_HPP__
-
-/// \file gemm_ct_nt.hpp
-/// \brief Sparse matrix-matrix multiplication on given sparse patterns.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "gemm_ct_nt_for_factor_blocked.hpp"
-// #include "gemm_ct_nt_for_tri_solve_blocked.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/gemm_ct_nt_for_factor_blocked.hpp b/lib/kokkos/example/ichol/src/gemm_ct_nt_for_factor_blocked.hpp
deleted file mode 100644
index 88a465848..000000000
--- a/lib/kokkos/example/ichol/src/gemm_ct_nt_for_factor_blocked.hpp
+++ /dev/null
@@ -1,108 +0,0 @@
-#pragma once
-#ifndef __GEMM_CT_NT_FOR_FACTOR_BLOCKED_HPP__
-#define __GEMM_CT_NT_FOR_FACTOR_BLOCKED_HPP__
-
-/// \file gemm_ct_nt_for_factor_blocked.hpp
-/// \brief Sparse matrix-matrix multiplication on given sparse patterns.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- // Gemm used in the factorization phase
- // ====================================
- template<>
- template<typename ScalarType,
- typename CrsExecViewTypeA,
- typename CrsExecViewTypeB,
- typename CrsExecViewTypeC>
- KOKKOS_INLINE_FUNCTION
- int
- Gemm<Trans::ConjTranspose,Trans::NoTranspose,
- AlgoGemm::ForFactorBlocked>
- ::invoke(typename CrsExecViewTypeA::policy_type &policy,
- const typename CrsExecViewTypeA::policy_type::member_type &member,
- const ScalarType alpha,
- typename CrsExecViewTypeA::matrix_type &A,
- typename CrsExecViewTypeB::matrix_type &B,
- const ScalarType beta,
- typename CrsExecViewTypeC::matrix_type &C) {
- typedef typename CrsExecViewTypeA::ordinal_type ordinal_type;
- typedef typename CrsExecViewTypeA::value_type value_type;
- typedef typename CrsExecViewTypeA::row_view_type row_view_type;
-
-
-if ( false && member.team_rank() == 0 ) {
- printf("Gemm [%d +%d)x[%d +%d)\n"
- , C.OffsetRows()
- , C.NumRows()
- , C.OffsetCols()
- , C.NumCols()
- );
-}
-
- // scale the matrix C with beta
- scaleCrsMatrix<ScalarType,CrsExecViewTypeC>(member, beta, C);
-
- // Sparse matrix-matrix multiply:
- // C(i,j) += alpha*A'(i,k)*B(k,j)
-
- const ordinal_type mA = A.NumRows();
- for (ordinal_type k=0;k<mA;++k) {
- row_view_type &a = A.RowView(k);
- const ordinal_type nnz_a = a.NumNonZeros();
-
- row_view_type &b = B.RowView(k);
- const ordinal_type nnz_b = b.NumNonZeros();
-
- if (nnz_a > 0 && nnz_b > 0 ) {
-#if 0
- Kokkos::parallel_for(
- Kokkos::TeamThreadRange(member, 0, nnz_a),
- [&](const ordinal_type i) {
- const ordinal_type row_at_i = a.Col(i);
- const value_type val_at_ik = a.Value(i);
- // const value_type val_at_ik = conj(a.Value(i));
-
- row_view_type &c = C.RowView(row_at_i);
-
- ordinal_type idx = 0;
- for (ordinal_type j=0;j<nnz_b && (idx > -2);++j) {
- const ordinal_type col_at_j = b.Col(j);
- const value_type val_at_kj = b.Value(j);
-
- idx = c.Index(col_at_j, idx);
- if (idx >= 0)
- c.Value(idx) += alpha*val_at_ik*val_at_kj;
- }
- });
-#else
- Kokkos::parallel_for(
- Kokkos::TeamThreadRange(member, 0, nnz_a * nnz_b ),
- [&](const ordinal_type ii) {
- const ordinal_type i = ii / nnz_b ; // indexes row a (0 .. nnz_a-1)
- const ordinal_type j = ii % nnz_b ; // indexes row b (0 .. nnz_b-1)
-
- row_view_type &c = C.RowView( a.Col(i) );
-
- // Binary search for c's index of b.Col(j)
- const ordinal_type idx = c.Index( b.Col(j) );
-
- if (idx >= 0) {
- // const value_type val_at_ik = conj(a.Value(i));
- c.Value(idx) += alpha * a.Value(i) * b.Value(j);
- }
- });
-#endif
-
- member.team_barrier();
- }
- }
-
- return 0;
- }
-
-}
-
-#endif
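The deleted kernel above computes C += alpha * A^T * B, but only into positions that already exist in C's sparsity pattern; contributions that fall outside the pattern are dropped (Index returns a negative value). A scalar CRS sketch of the same update, without the Kokkos team parallelism (illustrative names):

    #include <algorithm>
    #include <vector>

    // Sketch: C(i,j) += alpha * A(k,i) * B(k,j), restricted to C's existing
    // pattern. All matrices are CRS with sorted column indices; A and B are
    // traversed row by row (row k), as in the deleted kernel.
    void spgemmIntoPattern(double alpha, int mA,
                           const std::vector<int> &apA, const std::vector<int> &ajA,
                           const std::vector<double> &axA,
                           const std::vector<int> &apB, const std::vector<int> &ajB,
                           const std::vector<double> &axB,
                           const std::vector<int> &apC, const std::vector<int> &ajC,
                           std::vector<double> &axC) {
      for (int k = 0; k < mA; ++k)
        for (int ia = apA[k]; ia < apA[k + 1]; ++ia) {
          const int    i   = ajA[ia];          // target row of C
          const double aik = axA[ia];
          for (int ib = apB[k]; ib < apB[k + 1]; ++ib) {
            const int j  = ajB[ib];
            const int lo = apC[i], hi = apC[i + 1];
            // binary search for column j inside row i of C
            const auto it = std::lower_bound(ajC.begin() + lo, ajC.begin() + hi, j);
            if (it != ajC.begin() + hi && *it == j)
              axC[it - ajC.begin()] += alpha * aik * axB[ib];
          }
        }
    }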
diff --git a/lib/kokkos/example/ichol/src/graph_helper_scotch.hpp b/lib/kokkos/example/ichol/src/graph_helper_scotch.hpp
deleted file mode 100644
index d2dd00457..000000000
--- a/lib/kokkos/example/ichol/src/graph_helper_scotch.hpp
+++ /dev/null
@@ -1,427 +0,0 @@
-#pragma once
-#ifndef __GRAPH_HELPER_SCOTCH_HPP__
-#define __GRAPH_HELPER_SCOTCH_HPP__
-
-/// \file graph_helper_scotch.hpp
-/// \brief Interface to scotch reordering
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "scotch.h"
-#include "util.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<class CrsMatBaseType>
- class GraphHelper_Scotch : public Disp {
- public:
- typedef typename CrsMatBaseType::ordinal_type ordinal_type;
- typedef typename CrsMatBaseType::size_type size_type;
-
- typedef typename CrsMatBaseType::ordinal_type_array ordinal_type_array;
- typedef typename CrsMatBaseType::size_type_array size_type_array;
-
- private:
- string _label;
-
- // scotch main data structure
- SCOTCH_Graph _graph;
- SCOTCH_Num _strat;
- int _level;
-
- // scotch input has no diagonal contribution
- ordinal_type _base,_m;
- ordinal_type_array _cidx;
-
- size_type _nnz;
- size_type_array _rptr;
-
- // scotch output
- ordinal_type _cblk;
- ordinal_type_array _perm,_peri,_range,_tree;
-
- // status flag
- bool _is_ordered;
-
- public:
-
- void setLabel(string label) { _label = label; }
- string Label() const { return _label; }
-
- size_type NumNonZeros() const { return _nnz; }
- ordinal_type NumRows() const { return _m; }
-
- size_type_array RowPtrVector() const { return _rptr; }
- ordinal_type_array ColIndexVector() const { return _cidx; }
-
- ordinal_type_array PermVector() const { return _perm; }
- ordinal_type_array InvPermVector() const { return _peri; }
-
- ordinal_type_array RangeVector() const { return _range; }
- ordinal_type_array TreeVector() const { return _tree; }
-
- ordinal_type NumBlocks() const { return _cblk; }
-
- GraphHelper_Scotch() = default;
-
- // convert graph first
- GraphHelper_Scotch(const string label,
- const ordinal_type m,
- const size_type_array rptr,
- const ordinal_type_array cidx,
- const int seed = GraphHelper::DefaultRandomSeed) {
-
- _label = "GraphHelper_Scotch::" + label;
-
- _is_ordered = false;
- _cblk = 0;
-
- // scotch does not allow self-contribution (diagonal term in sparse matrix)
- _base = 0; //A.BaseVal();
- _m = m; // A.NumRows();
- _nnz = rptr[m]; //A.NumNonZeros();
-
- _rptr = rptr; //size_type_array(_label+"::RowPtrArray", _m+1);
- _cidx = cidx; //ordinal_type_array(_label+"::ColIndexArray", _nnz);
-
- _perm = ordinal_type_array(_label+"::PermutationArray", _m);
- _peri = ordinal_type_array(_label+"::InvPermutationArray", _m);
- _range = ordinal_type_array(_label+"::RangeArray", _m);
- _tree = ordinal_type_array(_label+"::TreeArray", _m);
-
- // create a graph structure without diagonals
- _strat = 0;
- _level = 0;
-
- //A.convertGraph(_nnz, _rptr, _cidx);
-
- int ierr = 0;
- ordinal_type *rptr_ptr = reinterpret_cast<ordinal_type*>(_rptr.ptr_on_device());
- ordinal_type *cidx_ptr = reinterpret_cast<ordinal_type*>(_cidx.ptr_on_device());
-
- if (seed != GraphHelper::DefaultRandomSeed) {
- SCOTCH_randomSeed(seed);
- SCOTCH_randomReset();
- }
-
- ierr = SCOTCH_graphInit(&_graph);CHKERR(ierr);
- ierr = SCOTCH_graphBuild(&_graph, // scotch graph
- _base, // base value
- _m, // # of vertices
- rptr_ptr, // column index array pointer begin
- rptr_ptr+1, // column index array pointer end
- NULL, // weights on vertices (optional)
- NULL, // label array on vertices (optional)
- _nnz, // # of nonzeros
- cidx_ptr, // column index array
- NULL);CHKERR(ierr); // edge load array (optional)
- ierr = SCOTCH_graphCheck(&_graph);CHKERR(ierr);
- }
- GraphHelper_Scotch(const GraphHelper_Scotch &b) = default;
-
- virtual~GraphHelper_Scotch() {
- SCOTCH_graphFree(&_graph);
- }
-
- void setStratGraph(const SCOTCH_Num strat = 0) {
- _strat = strat;
- }
-
- void setTreeLevel(const int level = 0) {
- _level = level;
- }
-
- int computeOrdering(const ordinal_type treecut = 0,
- const ordinal_type minblksize = 0) {
- int ierr = 0;
-
- // pointers for global graph ordering
- ordinal_type *perm = _perm.ptr_on_device();
- ordinal_type *peri = _peri.ptr_on_device();
- ordinal_type *range = _range.ptr_on_device();
- ordinal_type *tree = _tree.ptr_on_device();
-
- {
- const int level = (_level ? _level : max(1, int(log2(_m)-treecut))); // level = log2(_nnz)+10;
- SCOTCH_Strat stradat;
- SCOTCH_Num straval = _strat;
- //(SCOTCH_STRATLEVELMAX));// |
- //SCOTCH_STRATLEVELMIN |
- //SCOTCH_STRATLEAFSIMPLE |
- //SCOTCH_STRATSEPASIMPLE);
-
- ierr = SCOTCH_stratInit(&stradat);CHKERR(ierr);
-
- // if both are zero, do not run strategy
- if (_strat || _level) {
- cout << "GraphHelper_Scotch:: User provided a strategy and/or level" << endl
- << " strategy = " << _strat << ", level = " << _level << endl;
- ierr = SCOTCH_stratGraphOrderBuild (&stradat, straval, level, 0.2);CHKERR(ierr);
- }
- ierr = SCOTCH_graphOrder(&_graph,
- &stradat,
- perm,
- peri,
- &_cblk,
- range,
- tree);CHKERR(ierr);
- SCOTCH_stratExit(&stradat);
- }
-
-#if 0
- {
- // assume there are multiple roots
- range[_cblk+1] = range[_cblk]; // dummy range
- tree[_cblk] = -1; // dummy root
- for (ordinal_type i=0;i<_cblk;++i)
- if (tree[i] == -1) // multiple roots become children of the dummy root
- tree[i] = (_cblk+1);
- ++_cblk; // include the dummy root
- }
-#endif
-
- // if the provided blksize is greater than 0, reorder internally
- // if (treecut > 0 && minblksize > 0) {
- // // graph array
- // ordinal_type *rptr_ptr = reinterpret_cast<ordinal_type*>(_rptr.ptr_on_device());
- // ordinal_type *cidx_ptr = reinterpret_cast<ordinal_type*>(_cidx.ptr_on_device());
-
- // // create workspace in
- // size_type_array rptr_work = size_type_array(_label+"::Block::RowPtrArray", _m+1);
- // ordinal_type_array cidx_work = ordinal_type_array(_label+"::Block::ColIndexArray", _nnz);
-
- // // create workspace output
- // ordinal_type_array perm_work = ordinal_type_array(_label+"::Block::PermutationArray", _m);
- // ordinal_type_array peri_work = ordinal_type_array(_label+"::Block::InvPermutationArray", _m);
- // ordinal_type_array range_work = ordinal_type_array(_label+"::Block::RangeArray", _m);
- // ordinal_type_array tree_work = ordinal_type_array(_label+"::Block::TreeArray", _m);
-
- // // scotch input
- // ordinal_type *rptr_blk = reinterpret_cast<ordinal_type*>(rptr_work.ptr_on_device());
- // ordinal_type *cidx_blk = reinterpret_cast<ordinal_type*>(cidx_work.ptr_on_device());
-
- // size_type nnz = 0;
- // rptr_blk[0] = nnz;
-
- // for (ordinal_type iblk=0;iblk<_cblk;++iblk) {
- // // allocate graph
- // SCOTCH_Graph graph;
-
- // ierr = SCOTCH_graphInit(&graph);CHKERR(ierr);
-
- // SCOTCH_Strat stradat;
- // SCOTCH_Num straval = (/*SCOTCH_STRATLEVELMAX |
- // SCOTCH_STRATLEVELMIN |*/
- // SCOTCH_STRATLEAFSIMPLE |
- // SCOTCH_STRATSEPASIMPLE);
-
- // ierr = SCOTCH_stratInit(&stradat);CHKERR(ierr);
- // ierr = SCOTCH_stratGraphOrderBuild(&stradat, straval, 0, 0.2);CHKERR(ierr);
-
- // const ordinal_type ibegin = range[iblk], iend = range[iblk+1], m = iend - ibegin;
-
- // // scotch output
- // ordinal_type cblk_blk = 0;
-
- // ordinal_type *perm_blk = perm_work.ptr_on_device() + ibegin;
- // ordinal_type *peri_blk = peri_work.ptr_on_device() + ibegin;
- // ordinal_type *range_blk = range_work.ptr_on_device() + ibegin;
- // ordinal_type *tree_blk = tree_work.ptr_on_device() + ibegin;
-
- // // if each blk is greater than the given minblksize, reorder internally
- // if (m > minblksize) {
- // for (int i=ibegin;i<iend;++i) {
- // const ordinal_type ii = peri[i];
- // const ordinal_type jbegin = rptr_ptr[ii];
- // const ordinal_type jend = rptr_ptr[ii+1];
-
- // for (int j=jbegin;j<jend;++j) {
- // const ordinal_type jj = perm[cidx_ptr[j]];
- // if (ibegin <= jj && jj < iend)
- // cidx_blk[nnz++] = (jj - ibegin);
- // }
- // rptr_blk[i+1] = nnz;
- // }
- // const size_type nnz_blk = nnz - rptr_blk[ibegin];
-
- // ierr = SCOTCH_graphBuild(&graph, // scotch graph
- // 0, // base value
- // m, // # of vertices
- // &rptr_blk[ibegin], // column index array pointer begin
- // &rptr_blk[ibegin]+1,// column index array pointer end
- // NULL, // weights on vertices (optional)
- // NULL, // label array on vertices (optional)
- // nnz_blk, // # of nonzeros
- // cidx_blk, // column index array
- // NULL);CHKERR(ierr); // edge load array (optional)
- // ierr = SCOTCH_graphCheck(&graph);CHKERR(ierr);
- // ierr = SCOTCH_graphOrder(&graph,
- // &stradat,
- // perm_blk,
- // peri_blk,
- // &cblk_blk,
- // range_blk,
- // tree_blk);CHKERR(ierr);
- // } else {
- // for (ordinal_type i=0;i<m;++i) {
- // perm_blk[i] = i;
- // peri_blk[i] = i;
- // }
- // range_blk[1] = m;
- // tree_blk[0] = -1;
- // }
-
- // SCOTCH_stratExit(&stradat);
- // SCOTCH_graphFree(&graph);
-
- // for (ordinal_type i=0;i<m;++i) {
- // const ordinal_type ii = peri_blk[i] + ibegin;
- // peri_blk[i] = peri[ii];
- // }
- // for (ordinal_type i=0;i<m;++i) {
- // const ordinal_type ii = i + ibegin;
- // peri[ii] = peri_blk[i];
- // }
-
- // }
-
- // for (ordinal_type i=0;i<_m;++i)
- // perm[peri[i]] = i;
- // }
-
- _is_ordered = true;
-
- //cout << "SCOTCH level = " << level << endl;
- //cout << "Range Tree " << endl;
- //for (int i=0;i<_cblk;++i)
- // cout << _range[i] << " :: " << i << " " << _tree[i] << endl;
-
- return 0;
- }
-
- int pruneTree(const ordinal_type cut) {
- if (cut <=0 ) return 0;
-
- ordinal_type_array work = ordinal_type_array(_label+"::WorkArray", _cblk+1);
- for (ordinal_type iter=0;iter<cut && _cblk > 1;++iter) {
- // horizontal merging
- {
- ordinal_type cnt = 0;
- ordinal_type parent = _tree[0];
- work[0] = cnt;
- for (ordinal_type i=1;i<_cblk;++i) {
- const ordinal_type myparent = _tree[i];
- if (myparent == parent) {
- work[i] = cnt;
- } else {
- parent = _tree[i];
- work[i] = ++cnt;
- }
- }
- work[_cblk] = ++cnt;
-
- ordinal_type prev = -2;
- const ordinal_type root = _cblk - 1;
- for (ordinal_type i=0;i<root;++i) {
- const ordinal_type myparent = _tree[i];
- const ordinal_type me = work[i];
-
- _tree[me] = work[myparent];
- if (prev != me) {
- _range[me] = _range[i];
- prev = me;
- }
- }
- {
- const ordinal_type me = work[root];
- _tree[me] = -1;
- _range[me] = _range[root];
-
- _range[work[root+1]] = _range[root+1];
- _cblk = cnt;
- }
- }
-
- // vertical merging
- if (_cblk == 2) {
- _tree[0] = -1;
- _range[0] = 0;
- _range[1] = _range[2];
- _cblk = 1;
- } else {
- ordinal_type cnt = 0;
- for (ordinal_type i=0;i<_cblk;++i) {
- const ordinal_type diff = _tree[i+1] - _tree[i];
- work[i] = (diff == 1 ? cnt : cnt++);
- }
- work[_cblk] = cnt;
-
- ordinal_type prev = -2;
- const ordinal_type root = _cblk - 1;
- for (ordinal_type i=0;i<root;++i) {
- const ordinal_type myparent = _tree[i];
- const ordinal_type me = work[i];
-
- _tree[me] = work[myparent];
- if (prev != me) {
- _range[me] = _range[i];
- prev = me;
- }
- }
- {
- const ordinal_type me = work[root];
- _tree[me] = -1;
- _range[me] = _range[root];
-
- _range[work[root+1]] = _range[root+1];
- _cblk = cnt;
- }
- }
- }
-
- // cleaning
- {
- for (ordinal_type i=(_cblk+1);i<_m;++i) {
- _tree[i] = 0;
- _range[i] = 0;
- }
- _tree[_cblk] = 0;
- }
-
- return 0;
- }
-
- ostream& showMe(ostream &os) const {
- streamsize prec = os.precision();
- os.precision(15);
- os << scientific;
-
- os << " -- Scotch input -- " << endl
- << " Base Value = " << _base << endl
- << " # of Rows = " << _m << endl
- << " # of NonZeros = " << _nnz << endl;
-
- if (_is_ordered)
- os << " -- Ordering -- " << endl
- << " CBLK = " << _cblk << endl
- << " PERM PERI RANG TREE" << endl;
-
- const int w = 6;
- for (ordinal_type i=0;i<_m;++i)
- os << setw(w) << _perm[i] << " "
- << setw(w) << _peri[i] << " "
- << setw(w) << _range[i] << " "
- << setw(w) << _tree[i] << endl;
-
- os.unsetf(ios::scientific);
- os.precision(prec);
-
- return os;
- }
-
- };
-
-}
-
-#endif
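The deleted ordering path above reduces to a short Scotch call sequence: build a graph from a diagonal-free CSR structure, initialize a strategy, and call SCOTCH_graphOrder to obtain the permutation, inverse permutation, separator ranges, and elimination tree. A minimal standalone sketch using only the calls that appear in the deleted code (assumes the Scotch headers and library are available; error checking, which the deleted code wraps in CHKERR, is omitted):

    #include <cstdio>     // scotch.h expects the stdio types to be declared
    #include <vector>
    #include "scotch.h"

    int main() {
      // 3-vertex path graph 0-1-2 in CSR form, with no self loops
      // (Scotch does not allow diagonal entries).
      std::vector<SCOTCH_Num> verttab = {0, 1, 3, 4};
      std::vector<SCOTCH_Num> edgetab = {1, 0, 2, 1};
      const SCOTCH_Num m = 3, nnz = 4;

      SCOTCH_Graph graph;
      SCOTCH_graphInit(&graph);
      SCOTCH_graphBuild(&graph, /*base*/ 0, m,
                        verttab.data(), verttab.data() + 1,
                        NULL, NULL,                    // no vertex weights / labels
                        nnz, edgetab.data(), NULL);    // no edge weights
      SCOTCH_graphCheck(&graph);

      std::vector<SCOTCH_Num> perm(m), peri(m), range(m + 1), tree(m);
      SCOTCH_Num cblk = 0;
      SCOTCH_Strat strat;
      SCOTCH_stratInit(&strat);                        // default ordering strategy
      SCOTCH_graphOrder(&graph, &strat, perm.data(), peri.data(),
                        &cblk, range.data(), tree.data());

      SCOTCH_stratExit(&strat);
      SCOTCH_graphFree(&graph);
      return 0;
    }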
diff --git a/lib/kokkos/example/ichol/src/herk.hpp b/lib/kokkos/example/ichol/src/herk.hpp
deleted file mode 100644
index 548c495c4..000000000
--- a/lib/kokkos/example/ichol/src/herk.hpp
+++ /dev/null
@@ -1,91 +0,0 @@
-#pragma once
-#ifndef __HERK_HPP__
-#define __HERK_HPP__
-
-/// \file herk.hpp
-/// \brief Sparse Hermitian rank-k update on given sparse patterns.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "control.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<int ArgUplo, int ArgTrans, int ArgAlgo,
- int ArgVariant = Variant::One,
- template<int,int> class ControlType = Control>
- struct Herk {
-
- // data-parallel interface
- // =======================
- template<typename ScalarType,
- typename ExecViewTypeA,
- typename ExecViewTypeC>
- KOKKOS_INLINE_FUNCTION
- static int invoke(typename ExecViewTypeA::policy_type &policy,
- const typename ExecViewTypeA::policy_type::member_type &member,
- const ScalarType alpha,
- typename ExecViewTypeA::matrix_type &A,
- const ScalarType beta,
- typename ExecViewTypeC::matrix_type &C);
-
- // task-data parallel interface
- // ============================
- template<typename ScalarType,
- typename ExecViewTypeA,
- typename ExecViewTypeC>
- class TaskFunctor {
- public:
- typedef typename ExecViewTypeA::policy_type policy_type;
- typedef typename policy_type::member_type member_type;
- typedef int value_type;
-
- private:
- ScalarType _alpha, _beta;
- typename ExecViewTypeA::matrix_type _A;
- typename ExecViewTypeC::matrix_type _C;
-
- policy_type _policy;
-
- public:
- KOKKOS_INLINE_FUNCTION
- TaskFunctor(const policy_type & P,
- const ScalarType alpha,
- const typename ExecViewTypeA::matrix_type & A,
- const ScalarType beta,
- const typename ExecViewTypeC::matrix_type & C)
- : _alpha(alpha),
- _beta(beta),
- _A(A),
- _C(C),
- _policy(P)
- { }
-
- string Label() const { return "Herk"; }
-
- // task execution
- KOKKOS_INLINE_FUNCTION
- void apply(value_type &r_val) {
- r_val = Herk::invoke<ScalarType,ExecViewTypeA,ExecViewTypeC>(_policy, _policy.member_single(),
- _alpha, _A, _beta, _C);
- }
-
- // task-data execution
- KOKKOS_INLINE_FUNCTION
- void apply(const member_type &member, value_type &r_val) {
- r_val = Herk::invoke<ScalarType,ExecViewTypeA,ExecViewTypeC>(_policy, member,
- _alpha, _A, _beta, _C);
- }
-
- };
-
- };
-
-}
-
-#include "herk_u_ct.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/herk_u_ct.hpp b/lib/kokkos/example/ichol/src/herk_u_ct.hpp
deleted file mode 100644
index 6de4a2fa5..000000000
--- a/lib/kokkos/example/ichol/src/herk_u_ct.hpp
+++ /dev/null
@@ -1,11 +0,0 @@
-#pragma once
-#ifndef __HERK_U_CT_HPP__
-#define __HERK_U_CT_HPP__
-
-/// \file herk_u_ct.hpp
-/// \brief Sparse Hermitian rank-k update on given sparse patterns.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "herk_u_ct_for_factor_blocked.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/herk_u_ct_for_factor_blocked.hpp b/lib/kokkos/example/ichol/src/herk_u_ct_for_factor_blocked.hpp
deleted file mode 100644
index 58bba2be3..000000000
--- a/lib/kokkos/example/ichol/src/herk_u_ct_for_factor_blocked.hpp
+++ /dev/null
@@ -1,103 +0,0 @@
-#pragma once
-#ifndef __HERK_U_CT_FOR_FACTOR_BLOCKED_HPP__
-#define __HERK_U_CT_FOR_FACTOR_BLOCKED_HPP__
-
-/// \file herk_u_ct_for_factor_blocked.hpp
-/// \brief Sparse hermitian rank one update on given sparse patterns.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
-
- // Herk used in the factorization phase
- // ====================================
- template<>
- template<typename ScalarType,
- typename CrsExecViewTypeA,
- typename CrsExecViewTypeC>
- KOKKOS_INLINE_FUNCTION
- int
- Herk<Uplo::Upper,Trans::ConjTranspose,
- AlgoHerk::ForFactorBlocked>
- ::invoke(typename CrsExecViewTypeA::policy_type &policy,
- const typename CrsExecViewTypeA::policy_type::member_type &member,
- const ScalarType alpha,
- typename CrsExecViewTypeA::matrix_type &A,
- const ScalarType beta,
- typename CrsExecViewTypeC::matrix_type &C) {
- typedef typename CrsExecViewTypeA::ordinal_type ordinal_type;
- typedef typename CrsExecViewTypeA::value_type value_type;
- typedef typename CrsExecViewTypeA::row_view_type row_view_type;
-
-
-if ( false && member.team_rank() == 0 ) {
- printf("Herk [%d +%d)x[%d +%d)\n"
- , C.OffsetRows()
- , C.NumRows()
- , C.OffsetCols()
- , C.NumCols()
- );
-}
-
- // scale the matrix C with beta
- scaleCrsMatrix<ScalarType,CrsExecViewTypeC>(member, beta, C);
-
- // C(i,j) += alpha*A'(i,k)*A(k,j)
- for (ordinal_type k=0;k<A.NumRows();++k) {
- row_view_type &a = A.RowView(k);
- const ordinal_type nnz = a.NumNonZeros();
-
- if (nnz > 0) {
-
-#if 0
-
- Kokkos::parallel_for(
- Kokkos::TeamThreadRange(member, 0, nnz),
- [&](const ordinal_type i) {
- const ordinal_type row_at_i = a.Col(i);
- // const value_type val_at_ik = conj(a.Value(i));
- const value_type val_at_ik = a.Value(i);
-
- row_view_type &c = C.RowView(row_at_i);
-
- ordinal_type idx = 0;
- for (ordinal_type j=i;j<nnz && (idx > -2);++j) {
- const ordinal_type col_at_j = a.Col(j);
- const value_type val_at_kj = a.Value(j);
-
- idx = c.Index(col_at_j, idx);
- if (idx >= 0)
- c.Value(idx) += alpha*val_at_ik*val_at_kj;
- }
- });
-#else
-
- Kokkos::parallel_for(
- Kokkos::TeamThreadRange(member, 0, nnz*nnz),
- [&](const ordinal_type ii) {
- const ordinal_type i = ii / nnz ;
- const ordinal_type j = ii % nnz ;
-
- row_view_type &c = C.RowView( a.Col(i) );
-
- const ordinal_type idx = c.Index( a.Col(j) );
-
- if (idx >= 0) {
- c.Value(idx) += alpha* a.Value(i) * a.Value(j);
- }
- });
-
-#endif
-
- member.team_barrier();
- }
- }
-
- return 0;
- }
-
-}
-
-#endif
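
Note: the blocked HERK kernel deleted here accumulates C += alpha * A^H A restricted to C's stored pattern (the conj is commented out in the example, so effectively A^T A for real data), looping over the nonzeros of each row of A and flattening the (i,j) pair index into a single TeamThreadRange of length nnz*nnz, with ii / nnz and ii % nnz recovering the pair. A serial sketch of the same update on a minimal CSR container (the Csr struct below is illustrative, not the example's CrsMatrixView):

    #include <cstdio>
    #include <vector>

    // Minimal CSR container used only for this sketch.
    struct Csr {
      int m;                      // number of rows
      std::vector<int> ptr, col;  // row pointers and column indices
      std::vector<double> val;    // values

      // position of (i,j) inside row i, or -1 if not in the pattern
      int index(int i, int j) const {
        for (int p = ptr[i]; p < ptr[i + 1]; ++p)
          if (col[p] == j) return p;
        return -1;
      }
    };

    // C += alpha * A^T * A, restricted to the existing pattern of C.
    void herk_on_pattern(double alpha, const Csr &A, Csr &C) {
      for (int k = 0; k < A.m; ++k) {
        const int b = A.ptr[k], e = A.ptr[k + 1], nnz = e - b;
        // flattened pair loop, mirroring the nnz*nnz TeamThreadRange
        for (int ii = 0; ii < nnz * nnz; ++ii) {
          const int i = b + ii / nnz;   // first nonzero of the pair
          const int j = b + ii % nnz;   // second nonzero of the pair
          const int idx = C.index(A.col[i], A.col[j]);
          if (idx >= 0) C.val[idx] += alpha * A.val[i] * A.val[j];
        }
      }
    }

    int main() {
      // A = [1 2; 0 3] in CSR; C starts as the full 2x2 zero pattern.
      Csr A{2, {0, 2, 3}, {0, 1, 1}, {1, 2, 3}};
      Csr C{2, {0, 2, 4}, {0, 1, 0, 1}, {0, 0, 0, 0}};
      herk_on_pattern(1.0, A, C);   // C = A^T A = [1 2; 2 13]
      std::printf("C = [%g %g; %g %g]\n", C.val[0], C.val[1], C.val[2], C.val[3]);
    }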
diff --git a/lib/kokkos/example/ichol/src/norm.hpp b/lib/kokkos/example/ichol/src/norm.hpp
deleted file mode 100644
index be77ee0dc..000000000
--- a/lib/kokkos/example/ichol/src/norm.hpp
+++ /dev/null
@@ -1,82 +0,0 @@
-#pragma once
-#ifndef __NORM_HPP__
-#define __NORM_HPP__
-
-/// \file norm.hpp
-/// \brief Compute norm of sparse or dense matrices.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename DenseExecViewType>
- KOKKOS_INLINE_FUNCTION
- auto
- normOneDenseMatrix(DenseExecViewType &A) -> decltype(real(typename DenseExecViewType::value_type())) {
- typedef typename DenseExecViewType::ordinal_type ordinal_type;
- typedef typename DenseExecViewType::value_type value_type;
- typedef decltype(real(value_type())) norm_type;
-
- const ordinal_type mA = A.NumRows();
- const ordinal_type nA = A.NumCols();
-
- norm_type r_val = 0.0;
-
- for (ordinal_type j=0;j<nA;++j) {
- norm_type col_sum_at_j = 0.0;
- for (ordinal_type i=0;i<mA;++i)
- col_sum_at_j += abs(A.Value(i,j));
- r_val = max(r_val, col_sum_at_j);
- }
- return r_val;
- }
-
- template<typename DenseExecViewType>
- KOKKOS_INLINE_FUNCTION
- auto
- normInfDenseMatrix(DenseExecViewType &A) -> decltype(real(typename DenseExecViewType::value_type())) {
- typedef typename DenseExecViewType::ordinal_type ordinal_type;
- typedef typename DenseExecViewType::value_type value_type;
- typedef decltype(real(value_type())) norm_type;
-
- const ordinal_type mA = A.NumRows();
- const ordinal_type nA = A.NumCols();
-
- norm_type r_val = 0.0;
-
- for (ordinal_type i=0;i<mA;++i) {
- norm_type row_sum_at_i = 0.0;
- for (ordinal_type j=0;j<nA;++j)
- row_sum_at_i += abs(A.Value(i,j));
- r_val = max(r_val, row_sum_at_i);
- }
- return r_val;
- }
-
- template<typename DenseExecViewType>
- KOKKOS_INLINE_FUNCTION
- auto
- normFrobeniusDenseMatrix(DenseExecViewType &A) -> decltype(real(typename DenseExecViewType::value_type())) {
- typedef typename DenseExecViewType::ordinal_type ordinal_type;
- typedef typename DenseExecViewType::value_type value_type;
- typedef decltype(real(value_type())) norm_type;
-
- const ordinal_type mA = A.NumRows();
- const ordinal_type nA = A.NumCols();
-
- norm_type r_val = 0.0;
-
- for (ordinal_type i=0;i<mA;++i)
- for (ordinal_type j=0;j<nA;++j) {
- value_type val = A.Value(i,j);
- // r_val += conj(val)*val;
- r_val += val*val;
- }
- return sqrt(r_val);
- }
-
-}
-
-#endif
-
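
Note: the three dense norms removed above are the maximum column sum (one-norm), the maximum row sum (infinity-norm), and the square root of the sum of squared entries (Frobenius norm); the deleted Frobenius variant accumulates val*val with the conj commented out, so it is only correct for real-valued matrices. A compact stand-alone version on a plain 2-D array instead of the DenseExecView interface:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    int main() {
      const int m = 2, n = 2;
      double A[m][n] = {{1.0, -2.0}, {3.0, 4.0}};

      double one = 0.0, inf = 0.0, frob = 0.0;
      for (int j = 0; j < n; ++j) {           // one-norm: max column sum
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += std::fabs(A[i][j]);
        one = std::max(one, s);
      }
      for (int i = 0; i < m; ++i) {           // infinity-norm: max row sum
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += std::fabs(A[i][j]);
        inf = std::max(inf, s);
      }
      for (int i = 0; i < m; ++i)             // Frobenius: sqrt of sum of squares
        for (int j = 0; j < n; ++j) frob += A[i][j] * A[i][j];
      frob = std::sqrt(frob);

      std::printf("one = %g, inf = %g, frobenius = %g\n", one, inf, frob);
      // expected: one = 6, inf = 7, frobenius = sqrt(30)
    }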
diff --git a/lib/kokkos/example/ichol/src/partition.hpp b/lib/kokkos/example/ichol/src/partition.hpp
deleted file mode 100644
index a3e9f7095..000000000
--- a/lib/kokkos/example/ichol/src/partition.hpp
+++ /dev/null
@@ -1,381 +0,0 @@
-
-#ifndef __PARTITION_HPP__
-#define __PARTITION_HPP__
-
-/// \file partition.hpp
-/// \brief Matrix partitioning utilities.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Part_2x2(const MatView A, MatView &ATL, MatView &ATR,
- /**************/ MatView &ABL, MatView &ABR,
- const typename MatView::ordinal_type bm,
- const typename MatView::ordinal_type bn,
- const int quadrant) {
- typename MatView::ordinal_type bmm, bnn;
-
- switch (quadrant) {
- case Partition::TopLeft:
- bmm = min(bm, A.NumRows());
- bnn = min(bn, A.NumCols());
-
- ATL.setView(A.BaseObject(),
- A.OffsetRows(), bmm,
- A.OffsetCols(), bnn);
- break;
- case Partition::TopRight:
- case Partition::BottomLeft:
- Kokkos::abort("Tacho::Part_2x2 Not yet implemented");
- break;
- case Partition::BottomRight:
- bmm = A.NumRows() - min(bm, A.NumRows());
- bnn = A.NumCols() - min(bn, A.NumCols());
-
- ATL.setView(A.BaseObject(),
- A.OffsetRows(), bmm,
- A.OffsetCols(), bnn);
- break;
- default:
- Kokkos::abort("Tacho::Part_2x2 Invalid Input");
- break;
- }
-
- ATR.setView(A.BaseObject(),
- A.OffsetRows(), ATL.NumRows(),
- A.OffsetCols() + ATL.NumCols(), A.NumCols() - ATL.NumCols());
-
- ABL.setView(A.BaseObject(),
- A.OffsetRows() + ATL.NumRows(), A.NumRows() - ATL.NumRows(),
- A.OffsetCols(), ATL.NumCols());
-
- ABR.setView(A.BaseObject(),
- A.OffsetRows() + ATL.NumRows(), A.NumRows() - ATL.NumRows(),
- A.OffsetCols() + ATL.NumCols(), A.NumCols() - ATL.NumCols());
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Part_1x2(const MatView A, MatView &AL, MatView &AR,
- const typename MatView::ordinal_type bn,
- const int side) {
- typename MatView::ordinal_type bmm, bnn;
-
- switch (side) {
- case Partition::Left:
- bmm = A.NumRows();
- bnn = min(bn, A.NumCols());
-
- AL.setView(A.BaseObject(),
- A.OffsetRows(), bmm,
- A.OffsetCols(), bnn);
- break;
- case Partition::Right:
- bmm = A.NumRows();
- bnn = A.NumCols() - min(bn, A.NumCols());
-
- AL.setView(A.BaseObject(),
- A.OffsetRows(), bmm,
- A.OffsetCols(), bnn);
- break;
- default:
- Kokkos::abort("Tacho::Part_1x2 Invalid Input");
- break;
- }
-
- AR.setView(A.BaseObject(),
- A.OffsetRows(), A.NumRows(),
- A.OffsetCols() + AL.NumCols(), A.NumCols() - AL.NumCols());
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Part_2x1(const MatView A, MatView &AT,
- /*************/ MatView &AB,
- const typename MatView::ordinal_type bm,
- const int side) {
- typename MatView::ordinal_type bmm, bnn;
-
- switch (side) {
- case Partition::Top:
- bmm = min(bm, A.NumRows());
- bnn = A.NumCols();
-
- AT.setView(A.BaseObject(),
- A.OffsetRows(), bmm,
- A.OffsetCols(), bnn);
- break;
- case Partition::Bottom:
- bmm = A.NumRows() - min(bm, A.NumRows());
- bnn = A.NumCols();
-
- AT.setView(A.BaseObject(),
- A.OffsetRows(), bmm,
- A.OffsetCols(), bnn);
- break;
- default:
- Kokkos::abort("Tacho::Part_2x1 Invalid Input");
- break;
- }
-
- AB.setView(A.BaseObject(),
- A.OffsetRows() + AT.NumRows(), A.NumRows() - AT.NumRows(),
- A.OffsetCols(), A.NumCols());
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Part_2x2_to_3x3(const MatView ATL, const MatView ATR, MatView &A00, MatView &A01, MatView &A02,
- /***********************************/ MatView &A10, MatView &A11, MatView &A12,
- const MatView ABL, const MatView ABR, MatView &A20, MatView &A21, MatView &A22,
- const typename MatView::ordinal_type bm,
- const typename MatView::ordinal_type bn,
- const int quadrant) {
- switch (quadrant) {
- case Partition::TopLeft:
- Part_2x2(ATL, A00, A01,
- /**/ A10, A11,
- bm, bn, Partition::BottomRight);
-
- Part_2x1(ATR, A02,
- /**/ A12,
- bm, Partition::Bottom);
-
- Part_1x2(ABL, A20, A21,
- bn, Partition::Right);
-
- A22.setView(ABR.BaseObject(),
- ABR.OffsetRows(), ABR.NumRows(),
- ABR.OffsetCols(), ABR.NumCols());
- break;
- case Partition::TopRight:
- case Partition::BottomLeft:
- Kokkos::abort("Tacho::Part_???");
- break;
- case Partition::BottomRight:
- A00.setView(ATL.BaseObject(),
- ATL.OffsetRows(), ATL.NumRows(),
- ATL.OffsetCols(), ATL.NumCols());
-
- Part_1x2(ATR, A01, A02,
- bn, Partition::Left);
-
- Part_2x1(ABL, A10,
- /**/ A20,
- bm, Partition::Top);
-
- Part_2x2(ABR, A11, A12,
- /**/ A21, A22,
- bm, bn, Partition::TopLeft);
- break;
- default:
- Kokkos::abort("Tacho::Part_???");
- break;
- }
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Part_2x1_to_3x1(const MatView AT, MatView &A0,
- /***************/ MatView &A1,
- const MatView AB, MatView &A2,
- const typename MatView::ordinal_type bm,
- const int side) {
- switch (side) {
- case Partition::Top:
- Part_2x1(AT, A0,
- /**/ A1,
- bm, Partition::Bottom);
-
- A2.setView(AB.BaseObject(),
- AB.OffsetRows(), AB.NumRows(),
- AB.OffsetCols(), AB.NumCols());
- break;
- case Partition::Bottom:
- A0.setView(AT.BaseObject(),
- AT.OffsetRows(), AT.NumRows(),
- AT.OffsetCols(), AT.NumCols());
-
- Part_2x1(AB, A1,
- /**/ A2,
- bm, Partition::Top);
- break;
- default:
- Kokkos::abort("Tacho::Part_???");
- break;
- }
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Part_1x2_to_1x3(const MatView AL, const MatView AR,
- MatView &A0, MatView &A1, MatView &A2,
- const typename MatView::ordinal_type bn,
- const int side) {
- switch (side) {
- case Partition::Left:
- Part_1x2(AL, A0, A1,
- bn, Partition::Right);
-
- A2.setView(AR.BaseObaject(),
- AR.OffsetRows(), AR.NumRows(),
- AR.OffsetCols(), AR.NumCols());
- break;
- case Partition::Right:
- A0.setView(AL.BaseObject(),
- AL.OffsetRows(), AL.NumRows(),
- AL.OffsetCols(), AL.NumCols());
-
- Part_1x2(AR, A1, A2,
- bn, Partition::Left);
- break;
- default:
- Kokkos::abort("Tacho::Part_???");
- break;
- }
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Merge_2x2(const MatView ATL, const MatView ATR,
- const MatView ABL, const MatView ABR, MatView &A) {
- A.setView(ATL.BaseObject(),
- ATL.OffsetRows(), ATL.NumRows() + ABR.NumRows(),
- ATL.OffsetCols(), ATL.NumCols() + ABR.NumCols());
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Merge_1x2(const MatView AL, const MatView AR, MatView &A) {
- A.setView(AL.BaseObject(),
- AL.OffsetRows(), AL.NumRows(),
- AL.OffsetCols(), AL.NumCols() + AR.NumCols());
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Merge_2x1(const MatView AT,
- const MatView AB, MatView &A) {
- A.setView(AT.BaseObject(),
- AT.OffsetRows(), AT.NumRows() + AB.NumRows(),
- AT.OffsetCols(), AT.NumCols());
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Merge_3x3_to_2x2(const MatView A00, const MatView A01, const MatView A02, MatView &ATL, MatView &ATR,
- const MatView A10, const MatView A11, const MatView A12,
- const MatView A20, const MatView A21, const MatView A22, MatView &ABL, MatView &ABR,
- const int quadrant) {
- switch (quadrant) {
- case Partition::TopLeft:
- Merge_2x2(A00, A01,
- A10, A11, ATL);
-
- Merge_2x1(A02,
- A12, ATR);
-
- Merge_1x2(A20, A21, ABL);
-
- ABR.setView(A22.BaseObject(),
- A22.OffsetRows(), A22.NumRows(),
- A22.OffsetCols(), A22.NumCols());
- break;
- case Partition::TopRight:
- case Partition::BottomLeft:
- Kokkos::abort("Tacho::Part_???");
- break;
- case Partition::BottomRight:
- ATL.setView(A00.BaseObject(),
- A00.OffsetRows(), A00.NumRows(),
- A00.OffsetCols(), A00.NumCols());
-
- Merge_1x2(A01, A02, ATR);
-
- Merge_2x1(A10,
- A20, ABL);
-
- Merge_2x2(A11, A12,
- A21, A22, ABR);
- break;
- default:
- Kokkos::abort("Tacho::Part_???");
- break;
- }
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Merge_3x1_to_2x1(const MatView A0, MatView &AT,
- const MatView A1,
- const MatView A2, MatView &AB,
- const int side) {
- switch (side) {
- case Partition::Top:
- Merge_2x1(A0,
- A1, AT);
-
- AB.setView(A2.BaseObject(),
- A2.OffsetRows(), A2.NumRows(),
- A2.OffsetCols(), A2.NumCols());
- break;
- case Partition::Bottom:
- AT.setView(A0.BaseObject(),
- A0.OffsetRows(), A0.NumRows(),
- A0.OffsetCols(), A0.NumCols());
-
- Merge_2x1(A1,
- A2, AB);
- break;
- default:
- Kokkos::abort("Tacho::Part_???");
- break;
- }
- }
-
- template<typename MatView>
- KOKKOS_INLINE_FUNCTION
- void
- Merge_1x3_to_1x2(const MatView A0, const MatView A1, const MatView A2,
- MatView &AL, MatView &AR,
- const int side) {
- switch (side) {
- case Partition::Left:
- Merge_1x2(A0, A1, AL);
-
- AR.setView(A2.BaseObject(),
- A2.OffsetRows(), A2.NumRows(),
- A2.OffsetCols(), A2.NumCols());
- break;
- case Partition::Right:
- AL.setView(A0.BaseObject(),
- A0.OffsetRows(), A0.NumRows(),
- A0.OffsetCols(), A0.NumCols());
-
- Merge_1x2(A1, A2, AR);
- break;
- default:
- Kokkos::abort("Tacho::Part_???");
- break;
- }
- }
-
-
-}
-
-#endif
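
Note: partition.hpp implemented the FLAME-style traversal used throughout the example: a view is repeatedly split into quadrants (Part_2x2_to_3x3), the current block is processed, and the quadrants are merged back (Merge_3x3_to_2x2) so the boundary advances one block per iteration. The same sweep pattern on a 1-D index range, purely for illustration (Range, part_2x1 and merge_2x1 below are stand-ins, not the deleted MatView code):

    #include <algorithm>
    #include <cstdio>

    // 1-D analogue of a MatView: an offset plus a length.
    struct Range { int off, len; };

    // split R into a top part of (at most) b entries and the remainder
    void part_2x1(const Range R, Range &T, Range &B, int b) {
      const int bt = std::min(b, R.len);
      T = {R.off, bt};
      B = {R.off + bt, R.len - bt};
    }

    // grow T by absorbing the leading block B1 (the Merge_* counterpart)
    void merge_2x1(Range &T, const Range B1) { T.len += B1.len; }

    int main() {
      const int n = 10, blk = 3;
      Range A{0, n}, AT{0, 0}, AB = A;
      while (AB.len > 0) {
        Range A1, A2;
        part_2x1(AB, A1, A2, blk);            // peel the next block off AB
        std::printf("processing [%d, %d)\n", A1.off, A1.off + A1.len);
        merge_2x1(AT, A1);                    // move the boundary past it
        AB = A2;
      }
      std::printf("done: AT covers [%d, %d)\n", AT.off, AT.off + AT.len);
    }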
diff --git a/lib/kokkos/example/ichol/src/scale.hpp b/lib/kokkos/example/ichol/src/scale.hpp
deleted file mode 100644
index 315252096..000000000
--- a/lib/kokkos/example/ichol/src/scale.hpp
+++ /dev/null
@@ -1,92 +0,0 @@
-#pragma once
-#ifndef __SCALE_HPP__
-#define __SCALE_HPP__
-
-/// \file scale.hpp
-/// \brief Scaling sparse matrix.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename T> struct ScaleTraits {
- typedef T scale_type;
- // assume built-in types have appropriate type conversion
- static constexpr T one = 1 ;
- static constexpr T zero = 0 ;
- };
-
-
- template<typename ScalarType,
- typename CrsExecViewType>
- KOKKOS_INLINE_FUNCTION
- int
- scaleCrsMatrix(const typename CrsExecViewType::policy_type::member_type &member,
- const ScalarType alpha,
- typename CrsExecViewType::matrix_type &A) {
- typedef typename CrsExecViewType::ordinal_type ordinal_type;
- typedef typename CrsExecViewType::value_type value_type;
- typedef typename CrsExecViewType::row_view_type row_view_type;
-
- if (alpha == ScaleTraits<value_type>::one) {
- // do nothing
- } else {
- const ordinal_type mA = A.NumRows();
- if (mA > 0) {
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 0, mA),
- [&](const ordinal_type i) {
- row_view_type &row = A.RowView(i);
- for (ordinal_type j=0;j<row.NumNonZeros();++j)
- row.Value(j) *= alpha;
- });
- member.team_barrier();
- }
- }
-
- return 0;
- }
-
- template<typename ScalarType,
- typename DenseExecViewType>
- KOKKOS_INLINE_FUNCTION
- int
- scaleDenseMatrix(const typename DenseExecViewType::policy_type::member_type &member,
- const ScalarType alpha,
- DenseExecViewType &A) {
- typedef typename DenseExecViewType::ordinal_type ordinal_type;
- typedef typename DenseExecViewType::value_type value_type;
-
- if (alpha == ScaleTraits<value_type>::one) {
- // do nothing
- } else {
- if (A.BaseObject().ColStride() > A.BaseObject().RowStride()) {
- const ordinal_type nA = A.NumCols();
- if (nA > 0) {
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 0, nA),
- [&](const ordinal_type j) {
- for (ordinal_type i=0;i<A.NumRows();++i)
- A.Value(i, j) *= alpha;
- });
- member.team_barrier();
- }
- } else {
- const ordinal_type mA = A.NumRows();
- if (mA > 0) {
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 0, mA),
- [&](const ordinal_type i) {
- for (ordinal_type j=0;j<A.NumCols();++j)
- A.Value(i, j) *= alpha;
- });
- member.team_barrier();
- }
- }
- }
-
- return 0;
- }
-
-}
-
-#endif
-
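
Note: scaleCrsMatrix simply multiplies every stored value of each row by alpha and skips the work entirely when alpha equals one; the dense variant additionally chooses its loop order from the column/row strides so the inner loop runs along contiguous memory. A serial sketch of the sparse case on a plain CSR triple rather than the example's CrsExecView:

    #include <cstdio>
    #include <vector>

    // Scale all stored values of a CSR matrix by alpha (no-op when alpha == 1).
    void scale_csr(double alpha,
                   const std::vector<int> &ptr,
                   std::vector<double> &val) {
      if (alpha == 1.0) return;                        // nothing to do
      const int m = static_cast<int>(ptr.size()) - 1;  // number of rows
      for (int i = 0; i < m; ++i)                      // row loop (parallel in the example)
        for (int p = ptr[i]; p < ptr[i + 1]; ++p)
          val[p] *= alpha;
    }

    int main() {
      std::vector<int> ptr = {0, 2, 3};
      std::vector<double> val = {1.0, 2.0, 3.0};
      scale_csr(0.5, ptr, val);
      std::printf("%g %g %g\n", val[0], val[1], val[2]);  // 0.5 1 1.5
    }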
diff --git a/lib/kokkos/example/ichol/src/symbolic_factor_helper.hpp b/lib/kokkos/example/ichol/src/symbolic_factor_helper.hpp
deleted file mode 100644
index f6c381a99..000000000
--- a/lib/kokkos/example/ichol/src/symbolic_factor_helper.hpp
+++ /dev/null
@@ -1,379 +0,0 @@
-#pragma once
-#ifndef __SYMBOLIC_FACTOR_HELPER_HPP__
-#define __SYMBOLIC_FACTOR_HELPER_HPP__
-
-/// \file symbolic_factor_helper.hpp
-/// \brief The class compute a nonzero pattern with a given level of fills
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<class CrsMatrixType>
- class SymbolicFactorHelper : public Disp {
- public:
- typedef typename CrsMatrixType::ordinal_type ordinal_type;
- typedef typename CrsMatrixType::size_type size_type;
-
- typedef typename Kokkos::HostSpace::execution_space host_exec_space ;
-
- typedef typename CrsMatrixType::ordinal_type_array ordinal_type_array;
- typedef typename CrsMatrixType::size_type_array size_type_array;
- typedef typename CrsMatrixType::value_type_array value_type_array;
-
- private:
- string _label; // name of this class
-
- // matrix index base
- CrsMatrixType _A; // input matrix
- ordinal_type _m, _n; // matrix dimension
-
- struct crs_graph {
- size_type_array _ap; // row ptr array
- ordinal_type_array _aj; // col index array
- size_type _nnz; // # of nonzeros
- };
- typedef struct crs_graph crs_graph_type;
- crs_graph_type _in, _out;
-
- typedef Kokkos::View<ordinal_type**, Kokkos::LayoutLeft, host_exec_space> league_specific_ordinal_type_array;
- typedef typename league_specific_ordinal_type_array::value_type* league_specific_ordinal_type_array_ptr;
-
- int _lsize;
- league_specific_ordinal_type_array _queue, _visited, _distance;
-
- void createInternalWorkSpace() {
- _queue = league_specific_ordinal_type_array(_label+"::QueueArray", _m, _lsize);
- _visited = league_specific_ordinal_type_array(_label+"::VisitedArray", _m, _lsize);
- _distance = league_specific_ordinal_type_array(_label+"::DistanceArray", _m, _lsize);
- }
-
- void freeInternalWorkSpace() {
- _queue = league_specific_ordinal_type_array();
- _visited = league_specific_ordinal_type_array();
- _distance = league_specific_ordinal_type_array();
- }
-
- public:
-
- void setLabel(string label) { _label = label; }
- string Label() const { return _label; }
-
- SymbolicFactorHelper(const CrsMatrixType &A,
- const int lsize = (host_exec_space::thread_pool_size(0)/
- host_exec_space::thread_pool_size(2))) {
-
- _label = "SymbolicFactorHelper::" ;
-
- // matrix index base and the number of rows
- _A = A;
-
- _m = _A.NumRows();
- _n = _A.NumCols();
-
- // allocate memory for input crs matrix
- _in._nnz = _A.NumNonZeros();
- _in._ap = size_type_array(_label+"::Input::RowPtrArray", _m+1);
- _in._aj = ordinal_type_array(_label+"::Input::ColIndexArray", _in._nnz);
-
- // adjust graph structure; A is assumed to have a graph without its diagonal
- A.convertGraph(_in._ap, _in._aj);
- _in._nnz = _in._ap[_m];
-
- // league size
- _lsize = lsize;
-
- // create workspace per league
- createInternalWorkSpace();
- }
- virtual~SymbolicFactorHelper() {
- freeInternalWorkSpace();
- }
-
- class Queue {
- private:
- league_specific_ordinal_type_array_ptr _q;
- ordinal_type _begin, _end;
-
- public:
- Queue(league_specific_ordinal_type_array_ptr q)
- : _q(q),_begin(0),_end(0) { }
-
- ordinal_type size() const { return _end - _begin; }
- bool empty() const { return !size(); }
-
- void push(const ordinal_type val) { _q[_end++] = val; }
- ordinal_type pop() { return _q[_begin++]; }
- ordinal_type end() { return _end; }
- void reset() { _begin = 0; _end = 0; }
- };
-
- class FunctorComputeNonZeroPatternInRow {
- public:
- typedef Kokkos::TeamPolicy<host_exec_space> policy_type;
-
- private:
- ordinal_type _level, _m;
- crs_graph_type _graph;
-
- league_specific_ordinal_type_array _queue;
- league_specific_ordinal_type_array _visited;
- league_specific_ordinal_type_array _distance;
-
- size_type_array _ap;
- ordinal_type_array _aj;
-
- ordinal_type _phase;
-
- public:
- FunctorComputeNonZeroPatternInRow(const ordinal_type level,
- const ordinal_type m,
- const crs_graph_type &graph,
- league_specific_ordinal_type_array &queue,
- league_specific_ordinal_type_array &visited,
- league_specific_ordinal_type_array &distance,
- size_type_array &ap,
- ordinal_type_array &aj)
- : _level(level), _m(m), _graph(graph),
- _queue(queue), _visited(visited), _distance(distance),
- _ap(ap), _aj(aj), _phase(0)
- { }
-
- void setPhaseCountNumNonZeros() { _phase = 0; }
- void setPhaseComputeColIndex() { _phase = 1; }
-
- inline
- void operator()(const typename policy_type::member_type &member) const {
- const int lrank = member.league_rank();
- const int lsize = member.league_size();
-
- league_specific_ordinal_type_array_ptr queue = &_queue(0, lrank);
- league_specific_ordinal_type_array_ptr distance = &_distance(0, lrank);
- league_specific_ordinal_type_array_ptr visited = &_visited(0, lrank);
-
- for (ordinal_type i=0;i<_m;++i)
- visited[i] = 0;
-
- // shuffle rows to get better load balance;
- // for instance, if ND is applied, more fills are generated in the last seperator.
- for (ordinal_type i=lrank;i<_m;i+=lsize) {
-
- size_type cnt = 0;
-
- // account for the diagonal
- switch (_phase) {
- case 0:
- cnt = 1;
- break;
- case 1:
- cnt = _ap[i];
- _aj[cnt++] = i;
- break;
- }
-
- {
- Queue q(queue); // fixed size queue
-
- // initialize work space
- q.push(i);
- distance[i] = 0;
-
- const ordinal_type id = (i+1);
- visited[i] = id;
-
- // breath first search for i
- while (!q.empty()) {
- const ordinal_type h = q.pop();
- // loop over j adjancy
- const ordinal_type jbegin = _graph._ap[h], jend = _graph._ap[h+1];
- for (ordinal_type j=jbegin;j<jend;++j) {
- const ordinal_type t = _graph._aj[j];
- if (visited[t] != id) {
- visited[t] = id;
-
- if (t < i && (_level < 0 || distance[h] < _level)) {
- q.push(t);
- distance[t] = distance[h] + 1;
- }
- if (t > i) {
- switch (_phase) {
- case 0:
- ++cnt;
- break;
- case 1:
- _aj[cnt++] = t;
- break;
- }
- }
- }
- }
- }
-
- // clear work space
- for (ordinal_type j=0;j<q.end();++j) {
- const ordinal_type jj = queue[j];
- distance[jj] = 0;
- }
- q.reset();
- }
- switch (_phase) {
- case 0:
- _ap[i+1] = cnt;
- break;
- case 1:
- sort(_aj.data() + _ap[i] , _aj.data() + _ap[i+1]);
- break;
- }
- }
- }
- };
-
- class FunctorCountOffsetsInRow {
- public:
- typedef Kokkos::RangePolicy<host_exec_space> policy_type;
- typedef size_type value_type;
-
- private:
- size_type_array _off_in_rows;
-
- public:
- FunctorCountOffsetsInRow(size_type_array &off_in_rows)
- : _off_in_rows(off_in_rows)
- { }
-
- KOKKOS_INLINE_FUNCTION
- void init(value_type &update) const {
- update = 0;
- }
-
- KOKKOS_INLINE_FUNCTION
- void operator()(const typename policy_type::member_type &i, value_type &update, const bool final) const {
- update += _off_in_rows(i);
- if (final)
- _off_in_rows(i) = update;
- }
-
- KOKKOS_INLINE_FUNCTION
- void join(volatile value_type &update,
- volatile const value_type &input) const {
- update += input;
- }
- };
-
- int createNonZeroPattern(const ordinal_type level,
- const int uplo,
- CrsMatrixType &F) {
- // all output array should be local and rcp in Kokkos::View manage memory (de)allocation
- size_type_array ap = size_type_array(_label+"::Output::RowPtrArray", _m+1);
-
- // later determined
- ordinal_type_array aj;
- value_type_array ax;
- size_type nnz = 0;
-
- {
- FunctorComputeNonZeroPatternInRow functor(level, _m, _in,
- _queue,
- _visited,
- _distance,
- ap,
- aj);
-
- functor.setPhaseCountNumNonZeros();
- Kokkos::parallel_for(typename FunctorComputeNonZeroPatternInRow::policy_type(_lsize, 1), functor);
- }
- {
- FunctorCountOffsetsInRow functor(ap);
- Kokkos::parallel_scan(typename FunctorCountOffsetsInRow::policy_type(0, _m+1), functor);
- }
-
- nnz = ap[_m];
- aj = ordinal_type_array(_label+"::Output::ColIndexArray", nnz);
- ax = value_type_array(_label+"::Output::ValueArray", nnz);
-
- {
- FunctorComputeNonZeroPatternInRow functor(level, _m, _in,
- _queue,
- _visited,
- _distance,
- ap,
- aj);
-
- functor.setPhaseComputeColIndex();
- Kokkos::parallel_for(typename FunctorComputeNonZeroPatternInRow::policy_type(_lsize, 1), functor);
- }
-
- {
- F = CrsMatrixType("dummy", _m, _n, nnz, ap, aj, ax);
- F.add(_A);
- }
-
- // record the symbolic factors
- _out._nnz = nnz;
- _out._ap = ap;
- _out._aj = aj;
-
- return 0;
- }
-
- int createNonZeroPattern(const int uplo,
- CrsMatrixType &F) {
- return createNonZeroPattern(-1, uplo, F);
- }
-
- ostream& showMe(ostream &os) const {
- streamsize prec = os.precision();
- os.precision(15);
- os << scientific;
-
- const int w = 6;
-
- os << " -- Matrix Dimension -- " << endl
- << " # of Rows = " << _m << endl
- << " # of Cols = " << _n << endl;
-
- os << endl;
-
- os << " -- Input Graph Without Diagonals -- " << endl
- << " # of NonZeros = " << _in._nnz << endl ;
-
- os << " -- Input Graph :: RowPtr -- " << endl;
- {
- const ordinal_type n0 = _in._ap.dimension_0();
- for (ordinal_type i=0;i<n0;++i)
- os << setw(w) << i
- << setw(w) << _in._ap[i]
- << endl;
- }
-
- os << endl;
-
- os << " -- Output Graph With Diagonals-- " << endl
- << " # of NonZeros = " << _out._nnz << endl ;
-
- os << " -- Output Graph :: RowPtr -- " << endl;
- {
- const ordinal_type n0 = _out._ap.dimension_0();
- for (ordinal_type i=0;i<n0;++i)
- os << setw(w) << i
- << setw(w) << _out._ap[i]
- << endl;
- }
-
- os.unsetf(ios::scientific);
- os.precision(prec);
-
- return os;
- }
-
- };
-
-}
-
-#endif
-
-
-
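
Note: the deleted SymbolicFactorHelper computes the fill pattern in two passes of the same breadth-first search: pass one only counts, per row, how many column indices will appear (the diagonal plus every node t > i reachable from i through nodes smaller than i within the requested level), a parallel scan turns those counts into row pointers, and pass two re-runs the search to write and sort the column indices. A compact serial version of the search itself, returning the upper-triangular pattern of row i for a given level (plain adjacency lists instead of the crs_graph struct; level < 0 means unlimited, as in the original):

    #include <algorithm>
    #include <cstdio>
    #include <queue>
    #include <vector>

    // Column indices (j >= i) of row i in the level-limited fill pattern.
    // graph[v] is the adjacency list of v, without the diagonal.
    std::vector<int> row_fill(const std::vector<std::vector<int>> &graph,
                              int i, int level) {
      const int m = static_cast<int>(graph.size());
      std::vector<int> dist(m, 0), visited(m, 0);
      std::vector<int> cols = {i};                 // always include the diagonal

      std::queue<int> q;
      q.push(i);
      visited[i] = 1;
      while (!q.empty()) {
        const int h = q.front(); q.pop();
        for (int t : graph[h]) {
          if (visited[t]) continue;
          visited[t] = 1;
          if (t < i && (level < 0 || dist[h] < level)) {
            dist[t] = dist[h] + 1;   // keep searching through earlier rows
            q.push(t);
          }
          if (t > i) cols.push_back(t);            // fill entry in the upper part
        }
      }
      std::sort(cols.begin(), cols.end());
      return cols;
    }

    int main() {
      // star graph: node 0 is connected to 1, 2, 3.  Eliminating node 0 couples
      // its neighbours, so row 1 gains fill entries 2 and 3 once level >= 1.
      std::vector<std::vector<int>> g = {{1, 2, 3}, {0}, {0}, {0}};
      for (int c : row_fill(g, 1, 0)) std::printf("%d ", c);   // prints: 1
      std::printf("| ");
      for (int c : row_fill(g, 1, 1)) std::printf("%d ", c);   // prints: 1 2 3
      std::printf("\n");
    }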
diff --git a/lib/kokkos/example/ichol/src/symbolic_task.hpp b/lib/kokkos/example/ichol/src/symbolic_task.hpp
deleted file mode 100644
index f6cdc28ab..000000000
--- a/lib/kokkos/example/ichol/src/symbolic_task.hpp
+++ /dev/null
@@ -1,118 +0,0 @@
-#pragma once
-#ifndef __SYMBOLIC_TASK_HPP__
-#define __SYMBOLIC_TASK_HPP__
-
-/// \file symbolic_task.hpp
-/// \brief Provides tasking interface with graphviz output.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- /// \brief Graphviz color mapping for the generated tasks.
- static map<string,string> g_graphviz_color = {
- { "chol/scalar", "indianred2"},
- { "chol/trsm", "orange2" },
- { "chol/gemm", "lightblue2"} };
-
- class SymbolicTaskQueue;
-
- class SymbolicTask {
- private:
- string _name;
- set<SymbolicTask*> _dep_tasks;
-
- public:
- // at this moment, make the queue global
- // but this should be local and work with
- // multiple queues with separate thread teams
- typedef SymbolicTaskQueue queue;
-
- SymbolicTask()
- : _name("no-name")
- { }
-
- SymbolicTask(const SymbolicTask &b)
- : _name(b._name)
- { }
-
- SymbolicTask(const string name)
- : _name(name)
- { }
-
- int addDependence(SymbolicTask *b) {
- if (b != NULL)
- _dep_tasks.insert(b);
- return 0;
- }
-
- int clearDependence() {
- _dep_tasks.clear();
- return 0;
- }
-
- ostream& showMe(ostream &os) const {
- os << " uid = " << this << " , name = " << _name << ", # of deps = " << _dep_tasks.size() << endl;
- if (_dep_tasks.size()) {
- for (auto it=_dep_tasks.begin();it!=_dep_tasks.end();++it)
- os << " " << (*it) << " , name = " << (*it)->_name << endl;
- }
- return os;
- }
-
- ostream& graphviz(ostream &os) const {
- os << (long)(this)
- << " [label=\"" << _name ;
- auto it = g_graphviz_color.find(_name);
- if (it != g_graphviz_color.end())
- os << "\" ,style=filled,color=\"" << it->second << "\" ";
- os << "];";
- for (auto it=_dep_tasks.begin();it!=_dep_tasks.end();++it)
- os << (long)(*it) << " -> " << (long)this << ";";
- return (os << endl);
- }
-
- };
-
- static vector<SymbolicTask*> g_queue;
-
- class SymbolicTaskQueue {
- public:
- static SymbolicTask* push(SymbolicTask *task) {
- g_queue.push_back(task);
- return g_queue.back();
- }
-
- static int clear() {
- for (auto it=g_queue.begin();it!=g_queue.end();++it)
- delete (*it);
- g_queue.clear();
- return 0;
- }
-
- static ostream& showMe(ostream &os) {
- if (g_queue.size()) {
- os << " -- Symbolic Task Queue -- " << endl;
- for (auto it=g_queue.begin();it!=g_queue.end();++it)
- (*it)->showMe(os);
- } else {
- os << " -- Symbolic Task Queue is empty -- " << endl;
- }
- return os;
- }
-
- static ostream& graphviz(ostream &os,
- const double width = 7.5,
- const double length = 10.0) {
- os << "digraph TaskGraph {" << endl;
- os << "size=\"" << width << "," << length << "\";" << endl;
- for (auto it=g_queue.begin();it!=g_queue.end();++it)
- (*it)->graphviz(os);
- os << "}" << endl;
- return (os << endl);
- }
- };
-
-}
-#endif
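
Note: SymbolicTask records dependences mainly to visualize them: graphviz() emits one DOT node per task (keyed by its pointer value) and one edge per dependence, so the whole queue can be rendered with dot. A stripped-down version of that emission, using plain integer ids instead of pointers:

    #include <cstdio>
    #include <utility>
    #include <vector>

    int main() {
      // tasks 0..3 with edges "dependency -> dependent", as in graphviz()
      std::vector<std::pair<int, int>> deps = {{0, 2}, {1, 2}, {2, 3}};

      std::printf("digraph TaskGraph {\n");
      std::printf("size=\"7.5,10\";\n");
      for (int t = 0; t < 4; ++t)
        std::printf("t%d [label=\"task %d\"];\n", t, t);
      for (const auto &d : deps)
        std::printf("t%d -> t%d;\n", d.first, d.second);
      std::printf("}\n");           // pipe the output into: dot -Tpng ...
    }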
diff --git a/lib/kokkos/example/ichol/src/task_factory.hpp b/lib/kokkos/example/ichol/src/task_factory.hpp
deleted file mode 100644
index b829da673..000000000
--- a/lib/kokkos/example/ichol/src/task_factory.hpp
+++ /dev/null
@@ -1,77 +0,0 @@
-#pragma once
-#ifndef __TASK_FACTORY_HPP__
-#define __TASK_FACTORY_HPP__
-
-/// \file task_factory.hpp
-/// \brief A wrapper for task policy and future with a provided space type.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- /// \class TaskFactory
- /// \brief Minimal interface to Kokkos tasking.
- ///
- /// TaskFactory is attached to blocks as a template argument in order to
- /// create and manage tasking future objects. Note that policy (shared
- /// pointer to the task generator) is not a member object in this class.
- /// This class includes minimum interface for tasking with type decralation
- /// of the task policy and template alias of future so that future objects
- /// generated in this class will match to their policy and its execution space.
- ///
- template<typename PolicyType,
- typename FutureType>
- class TaskFactory {
- private:
- static constexpr int _max_task_dependence = 10 ;
-
- public:
- typedef PolicyType policy_type;
- typedef FutureType future_type;
-
- template<typename TaskFunctorType>
- static KOKKOS_INLINE_FUNCTION
- future_type create(policy_type &policy, const TaskFunctorType &func) {
-
- future_type f ;
- // while ( f.is_null() ) {
- f = policy.task_create_team(func, _max_task_dependence);
- // }
- if ( f.is_null() ) Kokkos::abort("task_create_team FAILED, out of memory");
- return f ;
- }
-
- static KOKKOS_INLINE_FUNCTION
- void spawn(policy_type &policy, const future_type &obj, bool priority = false ) {
- policy.spawn(obj,priority);
- }
-
- static KOKKOS_INLINE_FUNCTION
- void addDependence(policy_type &policy,
- const future_type &after, const future_type &before) {
- policy.add_dependence(after, before);
- }
-
- template<typename TaskFunctorType>
- static KOKKOS_INLINE_FUNCTION
- void addDependence(policy_type &policy,
- TaskFunctorType *after, const future_type &before) {
- policy.add_dependence(after, before);
- }
-
- template<typename TaskFunctorType>
- static KOKKOS_INLINE_FUNCTION
- void clearDependence(policy_type &policy, TaskFunctorType *func) {
- policy.clear_dependence(func);
- }
-
- template<typename TaskFunctorType>
- static KOKKOS_INLINE_FUNCTION
- void respawn(policy_type &policy, TaskFunctorType *func) {
- policy.respawn(func);
- }
- };
-}
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/task_view.hpp b/lib/kokkos/example/ichol/src/task_view.hpp
deleted file mode 100644
index ce280a325..000000000
--- a/lib/kokkos/example/ichol/src/task_view.hpp
+++ /dev/null
@@ -1,104 +0,0 @@
-#pragma once
-#ifndef __TASK_VIEW_HPP__
-#define __TASK_VIEW_HPP__
-
-/// \file task_view.hpp
-/// \brief Task view is inherited from matrix view and have a member for the task handler.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-namespace Tacho {
-
- using namespace std;
-
- template<typename MatrixViewType,
- typename TaskFactoryType>
- class TaskView : public MatrixViewType {
- public:
- typedef MatrixViewType matrix_type ;
- typedef typename MatrixViewType::value_type value_type;
- typedef typename MatrixViewType::ordinal_type ordinal_type;
-
- typedef TaskFactoryType task_factory_type;
- typedef typename task_factory_type::policy_type policy_type;
- typedef typename task_factory_type::future_type future_type;
-
- private:
- future_type _f;
-
- public:
- KOKKOS_INLINE_FUNCTION
- void setFuture(const future_type &f)
- { _f = f; }
-
- KOKKOS_INLINE_FUNCTION
- future_type Future() const { return _f; }
-
- KOKKOS_INLINE_FUNCTION
- ~TaskView() = default ;
-
- KOKKOS_INLINE_FUNCTION
- TaskView()
- : MatrixViewType(), _f()
- { }
-
- TaskView(const TaskView &b) = delete ;
-
- KOKKOS_INLINE_FUNCTION
- TaskView(typename MatrixViewType::mat_base_type const & b)
- : MatrixViewType(b), _f()
- { }
-
- KOKKOS_INLINE_FUNCTION
- TaskView(typename MatrixViewType::mat_base_type const & b,
- const ordinal_type offm, const ordinal_type m,
- const ordinal_type offn, const ordinal_type n)
- : MatrixViewType(b, offm, m, offn, n), _f()
- { }
-
- };
-}
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#if ! KOKKOS_USING_EXP_VIEW
-
-namespace Kokkos {
- namespace Impl {
-
- // The Kokkos::View allocation will by default assign each allocated datum to zero.
- // This is not the required initialization behavior when
- // non-trivial objects are used within a Kokkos::View.
- // Create a partial specialization of the Kokkos::Impl::AViewDefaultConstruct
- // to replace the assignment initialization with placement new initialization.
- //
- // This work-around is necessary until a TBD design refactorization of Kokkos::View.
-
- template< class ExecSpace , typename T1, typename T2 >
- struct ViewDefaultConstruct< ExecSpace , Tacho::TaskView<T1,T2> , true >
- {
- typedef Tacho::TaskView<T1,T2> type ;
- type * const m_ptr ;
-
- KOKKOS_FORCEINLINE_FUNCTION
- void operator()( const typename ExecSpace::size_type& i ) const
- { new(m_ptr+i) type(); }
-
- ViewDefaultConstruct( type * pointer , size_t capacity )
- : m_ptr( pointer )
- {
- Kokkos::RangePolicy< ExecSpace > range( 0 , capacity );
- parallel_for( range , *this );
- ExecSpace::fence();
- }
- };
-
- } // namespace Impl
-} // namespace Kokkos
-
-#endif /* #if ! KOKKOS_USING_EXP_VIEW */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif
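
Note: the #if ! KOKKOS_USING_EXP_VIEW block above works around the fact that a zero-filled allocation is not a valid way to create objects with non-trivial constructors: each slot has to be constructed with placement new (and, symmetrically, destroyed explicitly). The underlying C++ mechanism, independent of Kokkos (the Task type below is only an illustration):

    #include <cstdio>
    #include <new>

    // a type with a non-trivial constructor: zero-filled memory is not a valid state
    struct Task {
      int id;
      explicit Task(int i) : id(i) { std::printf("construct task %d\n", id); }
      ~Task() { std::printf("destroy task %d\n", id); }
    };

    int main() {
      const int n = 3;
      alignas(Task) unsigned char buffer[n * sizeof(Task)];   // raw storage
      Task *p = reinterpret_cast<Task *>(buffer);

      for (int i = 0; i < n; ++i) new (p + i) Task(i);  // placement new per element
      for (int i = 0; i < n; ++i) p[i].~Task();         // explicit destruction
      return 0;
    }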
diff --git a/lib/kokkos/example/ichol/src/trsm.hpp b/lib/kokkos/example/ichol/src/trsm.hpp
deleted file mode 100644
index b4a6a7df4..000000000
--- a/lib/kokkos/example/ichol/src/trsm.hpp
+++ /dev/null
@@ -1,92 +0,0 @@
-#pragma once
-#ifndef __TRSM_HPP__
-#define __TRSM_HPP__
-
-/// \file trsm.hpp
-/// \brief Sparse triangular solve on given sparse patterns and multiple rhs.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-
-#include "util.hpp"
-#include "control.hpp"
-#include "partition.hpp"
-
-namespace Tacho {
-
- using namespace std;
-
- template<int ArgSide,int ArgUplo, int ArgTrans, int ArgAlgo,
- int ArgVariant = Variant::One,
- template<int,int> class ControlType = Control>
- struct Trsm {
-
- // data-parallel interface
- // =======================
- template<typename ScalarType,
- typename ExecViewTypeA,
- typename ExecViewTypeB>
- KOKKOS_INLINE_FUNCTION
- static int invoke(typename ExecViewTypeA::policy_type &policy,
- const typename ExecViewTypeA::policy_type::member_type &member,
- const int diagA,
- const ScalarType alpha,
- typename ExecViewTypeA::matrix_type &A,
- typename ExecViewTypeB::matrix_type &B);
-
- // task-data parallel interface
- // ============================
- template<typename ScalarType,
- typename ExecViewTypeA,
- typename ExecViewTypeB>
- class TaskFunctor {
- public:
- typedef typename ExecViewTypeA::policy_type policy_type;
- typedef typename policy_type::member_type member_type;
- typedef int value_type;
-
- private:
- int _diagA;
- ScalarType _alpha;
- typename ExecViewTypeA::matrix_type _A;
- typename ExecViewTypeB::matrix_type _B;
-
- policy_type _policy;
-
- public:
- KOKKOS_INLINE_FUNCTION
- TaskFunctor(const policy_type & P,
- const int diagA,
- const ScalarType alpha,
- const ExecViewTypeA & A,
- const ExecViewTypeB & B)
- : _diagA(diagA),
- _alpha(alpha),
- _A(A),
- _B(B),
- _policy(P)
- { }
-
- string Label() const { return "Trsm"; }
-
- // task execution
- KOKKOS_INLINE_FUNCTION
- void apply(value_type &r_val) {
- r_val = Trsm::invoke<ScalarType,ExecViewTypeA,ExecViewTypeB>(_policy, _policy.member_single(),
- _diagA, _alpha, _A, _B);
- }
-
- // task-data execution
- KOKKOS_INLINE_FUNCTION
- void apply(const member_type &member, value_type &r_val) {
- r_val = Trsm::invoke<ScalarType,ExecViewTypeA,ExecViewTypeB>(_policy, member,
- _diagA, _alpha, _A, _B);
- }
-
- };
- };
-
-}
-
-// #include "trsm_l_u_nt.hpp"
-#include "trsm_l_u_ct.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/trsm_l_u_ct.hpp b/lib/kokkos/example/ichol/src/trsm_l_u_ct.hpp
deleted file mode 100644
index b6f328947..000000000
--- a/lib/kokkos/example/ichol/src/trsm_l_u_ct.hpp
+++ /dev/null
@@ -1,14 +0,0 @@
-#pragma once
-#ifndef __TRSM_L_U_CT_HPP__
-#define __TRSM_L_U_CT_HPP__
-
-/// \file trsm_l_u_ct.hpp
-/// \brief Sparse triangular solve on given sparse patterns and multiple rhs.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-///
-#include "gemm.hpp"
-
-#include "trsm_l_u_ct_for_factor_blocked.hpp"
-// #include "trsm_l_u_ct_for_tri_solve_blocked.hpp"
-
-#endif
diff --git a/lib/kokkos/example/ichol/src/trsm_l_u_ct_for_factor_blocked.hpp b/lib/kokkos/example/ichol/src/trsm_l_u_ct_for_factor_blocked.hpp
deleted file mode 100644
index 7414e5d80..000000000
--- a/lib/kokkos/example/ichol/src/trsm_l_u_ct_for_factor_blocked.hpp
+++ /dev/null
@@ -1,185 +0,0 @@
-#pragma once
-#ifndef __TRSM_L_U_CT_FOR_FACTOR_BLOCKED_HPP__
-#define __TRSM_L_U_CT_FOR_FACTOR_BLOCKED_HPP__
-
-/// \file trsm_l_u_ct_for_factor_blocked.hpp
-/// \brief Sparse triangular solve on given sparse patterns and multiple rhs.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-///
-
-namespace Tacho {
-
- using namespace std;
-
- // Trsm used in the factorization phase: data parallel on b1t
- // ==========================================================
- template<>
- template<typename ScalarType,
- typename CrsExecViewTypeA,
- typename CrsExecViewTypeB>
- KOKKOS_INLINE_FUNCTION
- int
- Trsm<Side::Left,Uplo::Upper,Trans::ConjTranspose,
- AlgoTrsm::ForFactorBlocked,Variant::One>
- ::invoke(typename CrsExecViewTypeA::policy_type &policy,
- const typename CrsExecViewTypeA::policy_type::member_type &member,
- const int diagA,
- const ScalarType alpha,
- typename CrsExecViewTypeA::matrix_type &A,
- typename CrsExecViewTypeB::matrix_type &B) {
- typedef typename CrsExecViewTypeA::ordinal_type ordinal_type;
- typedef typename CrsExecViewTypeA::value_type value_type;
- typedef typename CrsExecViewTypeA::row_view_type row_view_type;
-
-
-if ( false && member.team_rank() == 0 ) {
- printf("Trsm [%d +%d)x[%d +%d)\n"
- , B.OffsetRows()
- , B.NumRows()
- , B.OffsetCols()
- , B.NumCols()
- );
-}
-
- // scale the matrix B with alpha
- scaleCrsMatrix<ScalarType,CrsExecViewTypeB>(member, alpha, B);
-
- // Solve a system: AX = B -> B := inv(A) B
- const ordinal_type mA = A.NumRows();
- const ordinal_type nB = B.NumCols();
-
- if (nB > 0) {
- for (ordinal_type k=0;k<mA;++k) {
- row_view_type &a = A.RowView(k);
- // const value_type cdiag = std::conj(a.Value(0)); // for complex<T>
- const value_type cdiag = a.Value(0);
-
- // invert
- row_view_type &b1 = B.RowView(k);
- const ordinal_type nnz_b1 = b1.NumNonZeros();
-
- if (diagA != Diag::Unit && nnz_b1 > 0) {
- // b1t = b1t / conj(diag)
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 0, nnz_b1),
- [&](const ordinal_type j) {
- b1.Value(j) /= cdiag;
- });
- }
-
- // update
- const ordinal_type nnz_a = a.NumNonZeros();
- if (nnz_a > 0) {
- // B2 = B2 - trans(conj(a12t)) b1t
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 0, nnz_b1),
- [&](const ordinal_type j) {
- // grab b1
- const ordinal_type col_at_j = b1.Col(j);
- const value_type val_at_j = b1.Value(j);
-
- for (ordinal_type i=1;i<nnz_a;++i) {
- // grab a12t
- const ordinal_type row_at_i = a.Col(i);
- // const value_type val_at_i = conj(a.Value(i));
- const value_type val_at_i = a.Value(i);
-
- // grab b2
- row_view_type &b2 = B.RowView(row_at_i);
-
- // check and update
- ordinal_type idx = 0;
- idx = b2.Index(col_at_j, idx);
- if (idx >= 0)
- b2.Value(idx) -= val_at_i*val_at_j;
- }
- });
- }
- member.team_barrier();
- }
- }
-
- return 0;
- }
-
- // Trsm used in the factorization phase: data parallel on a1t
- // ==========================================================
- template<>
- template<typename ScalarType,
- typename CrsExecViewTypeA,
- typename CrsExecViewTypeB>
- KOKKOS_INLINE_FUNCTION
- int
- Trsm<Side::Left,Uplo::Upper,Trans::ConjTranspose,
- AlgoTrsm::ForFactorBlocked,Variant::Two>
- ::invoke(typename CrsExecViewTypeA::policy_type &policy,
- const typename CrsExecViewTypeA::policy_type::member_type &member,
- const int diagA,
- const ScalarType alpha,
- typename CrsExecViewTypeA::matrix_type &A,
- typename CrsExecViewTypeB::matrix_type &B) {
- typedef typename CrsExecViewTypeA::ordinal_type ordinal_type;
- typedef typename CrsExecViewTypeA::value_type value_type;
- typedef typename CrsExecViewTypeA::row_view_type row_view_type;
-
- // scale the matrix B with alpha
- scaleCrsMatrix<ScalarType,CrsExecViewTypeB>(member, alpha, B);
-
- // Solve a system: AX = B -> B := inv(A) B
- const ordinal_type mA = A.NumRows();
- const ordinal_type nB = B.NumCols();
-
- if (nB > 0) {
- for (ordinal_type k=0;k<mA;++k) {
- row_view_type &a = A.RowView(k);
- // const value_type cdiag = conj(a.Value(0));
- const value_type cdiag = a.Value(0);
-
- // invert
- row_view_type &b1 = B.RowView(k);
- const ordinal_type nnz_b1 = b1.NumNonZeros();
-
- if (diagA != Diag::Unit && nnz_b1 > 0) {
- // b1t = b1t / conj(diag)
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 0, nnz_b1),
- [&](const ordinal_type j) {
- b1.Value(j) /= cdiag;
- });
- member.team_barrier();
- }
-
- // update
- const ordinal_type nnz_a = a.NumNonZeros();
- if (nnz_a > 0) {
- // B2 = B2 - trans(conj(a12t)) b1t
- Kokkos::parallel_for(Kokkos::TeamThreadRange(member, 1, nnz_a),
- [&](const ordinal_type i) {
- // grab a12t
- const ordinal_type row_at_i = a.Col(i);
- // const value_type val_at_i = conj(a.Value(i));
- const value_type val_at_i = a.Value(i);
-
- // grab b2
- row_view_type &b2 = B.RowView(row_at_i);
-
- ordinal_type idx = 0;
- for (ordinal_type j=0;j<nnz_b1 && (idx > -2);++j) {
- // grab b1
- const ordinal_type col_at_j = b1.Col(j);
- const value_type val_at_j = b1.Value(j);
-
- // check and update
- idx = b2.Index(col_at_j, idx);
- if (idx >= 0)
- b2.Value(idx) -= val_at_i*val_at_j;
- }
- });
- member.team_barrier();
- }
- }
- }
-
- return 0;
- }
-
-}
-
-#endif
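
Note: both deleted Trsm variants implement the same recurrence on sparse storage: for k = 0..m-1, divide row k of B by the (conjugated) diagonal of A's row k, then subtract a12t^T * b1t from the later rows of B, i.e. forward substitution for A^H X = B with A upper triangular. Variant One parallelizes over the entries of B's row k, Variant Two over the off-diagonal entries of A's row k. The dense serial equivalent (real case):

    #include <cstdio>

    // Solve U^T X = B in place, U upper triangular, unit or non-unit diagonal.
    // This is the dense recurrence behind the removed sparse Trsm kernels.
    void trsm_left_upper_trans(int m, int n, const double *U, double *B,
                               bool unit_diag) {
      for (int k = 0; k < m; ++k) {
        if (!unit_diag)
          for (int j = 0; j < n; ++j) B[k * n + j] /= U[k * m + k];  // b1t /= diag
        for (int i = k + 1; i < m; ++i)               // update the rows below
          for (int j = 0; j < n; ++j)
            B[i * n + j] -= U[k * m + i] * B[k * n + j];  // B2 -= a12t^T * b1t
      }
    }

    int main() {
      // U = [2 1; 0 4] (row-major), B = one right-hand side b = (2, 5)^T
      double U[4] = {2, 1, 0, 4};
      double B[2] = {2, 5};
      trsm_left_upper_trans(2, 1, U, B, /*unit_diag=*/false);
      // U^T x = b  =>  2*x0 = 2,  1*x0 + 4*x1 = 5  =>  x = (1, 1)
      std::printf("x = (%g, %g)\n", B[0], B[1]);
    }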
diff --git a/lib/kokkos/example/ichol/src/util.cpp b/lib/kokkos/example/ichol/src/util.cpp
deleted file mode 100644
index ef220c48c..000000000
--- a/lib/kokkos/example/ichol/src/util.cpp
+++ /dev/null
@@ -1,4 +0,0 @@
-
-
-static int dummy = 1;
-
diff --git a/lib/kokkos/example/ichol/src/util.hpp b/lib/kokkos/example/ichol/src/util.hpp
deleted file mode 100644
index 020475bc5..000000000
--- a/lib/kokkos/example/ichol/src/util.hpp
+++ /dev/null
@@ -1,237 +0,0 @@
-#pragma once
-#ifndef __UTIL_HPP__
-#define __UTIL_HPP__
-
-#include <stdio.h>
-#include <string.h>
-
-#include <string>
-#include <iostream>
-#include <iomanip>
-#include <fstream>
-#include <vector>
-#include <set>
-#include <map>
-#include <algorithm>
-#include <memory>
-
-#include <cmath>
-#include <complex>
-
-#include <limits>
-
-/// \file util.hpp
-/// \brief Utility functions and constant integer class like an enum class.
-/// \author Kyungjoo Kim (kyukim@sandia.gov)
-///
-/// This provides utility functions for implementing mini-app for incomplete
-/// sparse matrix factorization with task-data parallelism e.g., parameter
-/// classes, error handling, ostream << overloading.
-///
-/// Note: The reference of the "static const int" members in the enum-like
-/// classes should not be used as function arguments but their values only.
-
-
-using namespace std;
-
-namespace Tacho {
-
-#undef CHKERR
-#define CHKERR(ierr) \
- if (ierr != 0) { cout << endl << ">> Error in " << __FILE__ << ", " << __LINE__ << " : " << ierr << endl; }
-
-#define MSG_NOT_YET_IMPLEMENTED ">> Not yet implemented"
-#define MSG_INVALID_INPUT(what) ">> Invaid input argument: " #what
-#define MSG_INVALID_TEMPLATE_ARGS ">> Invaid template arguments"
-#define ERROR(msg) \
- { cout << endl << ">> Error in " << __FILE__ << ", " << __LINE__ << endl << msg << endl; }
-
- // control id
-#undef Ctrl
-#define Ctrl(name,algo,variant) name<algo,variant>
-
- // control leaf
-#undef CtrlComponent
-#define CtrlComponent(name,algo,variant,component,id) \
- Ctrl(name,algo,variant)::component[id]
-
- // control recursion
-#undef CtrlDetail
-#define CtrlDetail(name,algo,variant,component) \
- CtrlComponent(name,algo,variant,component,0),CtrlComponent(name,algo,variant,component,1),name
-
- /// \class GraphHelper
- class GraphHelper {
- public:
- static const int DefaultRandomSeed = -1;
- };
-
-
- /// \class Partition
- /// \brief Matrix partition parameters.
- class Partition {
- public:
- static const int Top = 101;
- static const int Bottom = 102;
-
- static const int Left = 201;
- static const int Right = 202;
-
- static const int TopLeft = 401;
- static const int TopRight = 402;
- static const int BottomLeft = 403;
- static const int BottomRight = 404;
- };
-
- /// \class Uplo
- /// \brief Matrix upper/lower parameters.
- class Uplo {
- public:
- static const int Upper = 501;
- static const int Lower = 502;
- };
-
- /// \class Side
- /// \brief Matrix left/right parameters.
- class Side {
- public:
- static const int Left = 601;
- static const int Right = 602;
- };
-
- /// \class Diag
- /// \brief Matrix unit/non-unit diag parameters.
- class Diag {
- public:
- static const int Unit = 701;
- static const int NonUnit = 702;
- };
-
- /// \class Trans
- /// \brief Matrix upper/lower parameters.
- class Trans {
- public:
- static const int Transpose = 801;
- static const int ConjTranspose = 802;
- static const int NoTranspose = 803;
- };
-
- /// \class Loop
- /// \brief outer/innner parameters
- class Loop {
- public:
- static const int Outer = 901;
- static const int Inner = 902;
- static const int Fused = 903;
- };
-
- class Variant {
- public:
- static const int One = 1;
- static const int Two = 2;
- static const int Three = 3;
- static const int Four = 4;
- };
-
- /// \class AlgoChol
- /// \brief Algorithmic variants in sparse factorization and sparse BLAS operations.
- class AlgoChol {
- public:
- // One side factorization on flat matrices
- static const int Dummy = 1000;
- static const int Unblocked = 1001;
- static const int UnblockedOpt = 1002;
- static const int Blocked = 1101; // testing only
-
- static const int RightLookByBlocks = 1201; // backbone structure is right looking
- static const int ByBlocks = RightLookByBlocks;
-
- static const int NestedDenseBlock = 1211;
- static const int NestedDenseByBlocks = 1212;
-
- static const int RightLookDenseByBlocks = 1221;
- static const int DenseByBlocks = RightLookDenseByBlocks;
-
- static const int ExternalLapack = 1231;
- static const int ExternalPardiso = 1232;
- };
-
- // aliasing name space
- typedef AlgoChol AlgoTriSolve;
-
- class AlgoBlasLeaf {
- public:
- // One side factorization on flat matrices
- static const int ForFactorBlocked = 2001;
-
- // B and C are dense matrices and used for solve phase
- static const int ForTriSolveBlocked = 2011;
-
- static const int ExternalBlas = 2021;
- };
-
- class AlgoGemm : public AlgoBlasLeaf {
- public:
- static const int DenseByBlocks = 2101;
- };
-
- class AlgoTrsm : public AlgoBlasLeaf {
- public:
- static const int DenseByBlocks = 2201;
- };
-
- class AlgoHerk : public AlgoBlasLeaf {
- public:
- static const int DenseByBlocks = 2301;
- };
-
- /// \brief Interface for overloaded stream operators.
- template<typename T>
- inline
- ostream& operator<<(ostream &os, const unique_ptr<T> &p) {
- return p->showMe(os);
- }
-
- /// \class Disp
- /// \brief Interface for the stream operator.
- class Disp {
- friend ostream& operator<<(ostream &os, const Disp &disp);
- public:
- Disp() { }
- virtual ostream& showMe(ostream &os) const {
- return os;
- }
- };
-
- /// \brief Implementation of the overloaded stream operator.
- inline
- ostream& operator<<(ostream &os, const Disp &disp) {
- return disp.showMe(os);
- }
-
- template<typename T> struct NumericTraits {};
-
- template<>
- struct NumericTraits<float> {
- typedef float real_type;
- static real_type epsilon() { return numeric_limits<float>::epsilon(); }
- };
- template<>
- struct NumericTraits<double> {
- typedef double real_type;
- static real_type epsilon() { return numeric_limits<double>::epsilon(); }
- };
- template<>
- struct NumericTraits<complex<float> > {
- typedef float real_type;
- static real_type epsilon() { return numeric_limits<float>::epsilon(); }
- };
- template<>
- struct NumericTraits<complex<double> > {
- typedef double real_type;
- static real_type epsilon() { return numeric_limits<double>::epsilon(); }
- };
-
-}
-
-#endif
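
Note: util.hpp closed out the example's support code: error macros, enum-like parameter classes (Partition, Uplo, Side, Diag, Trans, Loop, Variant, Algo*) whose static const int values are used as template arguments to select kernel variants at compile time, and a small NumericTraits giving the machine epsilon per scalar type. The dispatch idiom those integer tags enable looks like this (the Solve<> template below is a reconstruction for illustration, not the deleted code):

    #include <cstdio>
    #include <limits>

    // enum-like tag class, as in the deleted util.hpp
    struct Uplo { static const int Upper = 501; static const int Lower = 502; };

    // primary template left undefined: only the specializations below exist
    template <int ArgUplo> struct Solve;

    template <> struct Solve<Uplo::Upper> {
      static void invoke() { std::printf("upper-triangular path\n"); }
    };
    template <> struct Solve<Uplo::Lower> {
      static void invoke() { std::printf("lower-triangular path\n"); }
    };

    int main() {
      Solve<Uplo::Upper>::invoke();   // compile-time variant selection
      Solve<Uplo::Lower>::invoke();
      std::printf("double epsilon = %g\n", std::numeric_limits<double>::epsilon());
    }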
diff --git a/lib/kokkos/example/md_skeleton/Makefile b/lib/kokkos/example/md_skeleton/Makefile
index bf8fbea3e..42b376ec7 100644
--- a/lib/kokkos/example/md_skeleton/Makefile
+++ b/lib/kokkos/example/md_skeleton/Makefile
@@ -1,53 +1,46 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
SRC = $(wildcard $(SRC_DIR)/*.cpp)
OBJ = $(SRC:$(SRC_DIR)/%.cpp=%.o)
#SRC = $(wildcard *.cpp)
#OBJ = $(SRC:%.cpp=%.o)
default: build
echo "Start Build"
-# use installed Makefile.kokkos
-include $(KOKKOS_PATH)/Makefile.kokkos
-
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(NVCC_WRAPPER)
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "Cuda,OpenMP"
-#KOKKOS_ARCH = "SNB,Kepler35"
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
+ EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
else
-CXX = g++
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "OpenMP"
-#KOKKOS_ARCH = "SNB"
+ CXX = g++
+ EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
endif
+CXXFLAGS = -O3 -I$(SRC_DIR)
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
DEPFLAGS = -M
LIB =
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean:
rm -f *.a *.o *.cuda *.host
# Compilation rules
%.o:$(SRC_DIR)/%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/multi_fem/Makefile b/lib/kokkos/example/multi_fem/Makefile
index 72e1768fc..4b114b562 100644
--- a/lib/kokkos/example/multi_fem/Makefile
+++ b/lib/kokkos/example/multi_fem/Makefile
@@ -1,53 +1,49 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
SRC = $(wildcard $(SRC_DIR)/*.cpp)
OBJ = $(SRC:$(SRC_DIR)/%.cpp=%.o)
#SRC = $(wildcard *.cpp)
#OBJ = $(SRC:%.cpp=%.o)
default: build
echo "Start Build"
-# use installed Makefile.kokkos
-include $(KOKKOS_PATH)/Makefile.kokkos
+CXXFLAGS = -O3 -I$(SRC_DIR)
+LDFLAGS ?=
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(NVCC_WRAPPER)
-CXXFLAGS = -I$(SRC_DIR) -I$(CUDA_PATH) -O3
-LINK = $(CXX)
-LINKFLAGS = -L$(CUDA_PATH)/lib64 -lcusparse
-EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "Cuda,OpenMP"
-#KOKKOS_ARCH = "SNB,Kepler35"
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
+ EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
+ CXXFLAGS += -I$(SRC_DIR) -I$(CUDA_PATH) -O3
+ LDFLAGS += -L$(CUDA_PATH)/lib64 -lcusparse
else
-CXX = g++
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "OpenMP"
-#KOKKOS_ARCH = "SNB"
+ CXX = g++
+ EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
endif
+LINK ?= $(CXX)
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
DEPFLAGS = -M
LIB =
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean:
rm -f *.a *.o *.cuda *.host
# Compilation rules
%.o:$(SRC_DIR)/%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/query_device/Makefile b/lib/kokkos/example/query_device/Makefile
index bf8fbea3e..42b376ec7 100644
--- a/lib/kokkos/example/query_device/Makefile
+++ b/lib/kokkos/example/query_device/Makefile
@@ -1,53 +1,46 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
SRC = $(wildcard $(SRC_DIR)/*.cpp)
OBJ = $(SRC:$(SRC_DIR)/%.cpp=%.o)
#SRC = $(wildcard *.cpp)
#OBJ = $(SRC:%.cpp=%.o)
default: build
echo "Start Build"
-# use installed Makefile.kokkos
-include $(KOKKOS_PATH)/Makefile.kokkos
-
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(NVCC_WRAPPER)
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "Cuda,OpenMP"
-#KOKKOS_ARCH = "SNB,Kepler35"
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
+ EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
else
-CXX = g++
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "OpenMP"
-#KOKKOS_ARCH = "SNB"
+ CXX = g++
+ EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
endif
+CXXFLAGS = -O3 -I$(SRC_DIR)
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
DEPFLAGS = -M
LIB =
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean:
rm -f *.a *.o *.cuda *.host
# Compilation rules
%.o:$(SRC_DIR)/%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/sort_array/Makefile b/lib/kokkos/example/sort_array/Makefile
index bf8fbea3e..42b376ec7 100644
--- a/lib/kokkos/example/sort_array/Makefile
+++ b/lib/kokkos/example/sort_array/Makefile
@@ -1,53 +1,46 @@
KOKKOS_PATH ?= ../..
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
SRC_DIR := $(dir $(MAKEFILE_PATH))
SRC = $(wildcard $(SRC_DIR)/*.cpp)
OBJ = $(SRC:$(SRC_DIR)/%.cpp=%.o)
#SRC = $(wildcard *.cpp)
#OBJ = $(SRC:%.cpp=%.o)
default: build
echo "Start Build"
-# use installed Makefile.kokkos
-include $(KOKKOS_PATH)/Makefile.kokkos
-
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(NVCC_WRAPPER)
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "Cuda,OpenMP"
-#KOKKOS_ARCH = "SNB,Kepler35"
+ CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
+ EXE = $(addsuffix .cuda, $(shell basename $(SRC_DIR)))
else
-CXX = g++
-CXXFLAGS = -I$(SRC_DIR) -O3
-LINK = $(CXX)
-LINKFLAGS =
-EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
-#KOKKOS_DEVICES = "OpenMP"
-#KOKKOS_ARCH = "SNB"
+ CXX = g++
+ EXE = $(addsuffix .host, $(shell basename $(SRC_DIR)))
endif
+CXXFLAGS = -O3 -I$(SRC_DIR)
+LINK ?= $(CXX)
+LDFLAGS ?=
+
+include $(KOKKOS_PATH)/Makefile.kokkos
+
DEPFLAGS = -M
LIB =
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean:
rm -f *.a *.o *.cuda *.host
# Compilation rules
%.o:$(SRC_DIR)/%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
diff --git a/lib/kokkos/example/sort_array/sort_array.hpp b/lib/kokkos/example/sort_array/sort_array.hpp
index d21f99895..ae17cb7ac 100644
--- a/lib/kokkos/example/sort_array/sort_array.hpp
+++ b/lib/kokkos/example/sort_array/sort_array.hpp
@@ -1,190 +1,190 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef EXAMPLE_SORT_ARRAY
#define EXAMPLE_SORT_ARRAY
#include <stdlib.h>
#include <algorithm>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Timer.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Example {
template< class Device >
struct SortView {
template< typename ValueType >
SortView( const Kokkos::View<ValueType*,Device> v , int begin , int end )
{
std::sort( v.ptr_on_device() + begin , v.ptr_on_device() + end );
}
};
}
#if defined(KOKKOS_HAVE_CUDA)
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
namespace Example {
template<>
struct SortView< Kokkos::Cuda > {
template< typename ValueType >
SortView( const Kokkos::View<ValueType*,Kokkos::Cuda> v , int begin , int end )
{
thrust::sort( thrust::device_ptr<ValueType>( v.ptr_on_device() + begin )
, thrust::device_ptr<ValueType>( v.ptr_on_device() + end ) );
}
};
}
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Example {
template< class Device >
void sort_array( const size_t array_length /* length of spans of array to sort */
, const size_t total_length /* total length of array */
, const int print = 1 )
{
typedef Device execution_space ;
typedef Kokkos::View<int*,Device> device_array_type ;
#if defined( KOKKOS_HAVE_CUDA )
typedef typename
- Kokkos::Impl::if_c< Kokkos::Impl::is_same< Device , Kokkos::Cuda >::value
+ Kokkos::Impl::if_c< std::is_same< Device , Kokkos::Cuda >::value
, Kokkos::View<int*,Kokkos::Cuda::array_layout,Kokkos::CudaHostPinnedSpace>
, typename device_array_type::HostMirror
>::type host_array_type ;
#else
typedef typename device_array_type::HostMirror host_array_type ;
#endif
Kokkos::Timer timer;
const device_array_type work_array("work_array" , array_length );
const host_array_type host_array("host_array" , total_length );
std::cout << "sort_array length( " << total_length << " )"
<< " in chunks( " << array_length << " )"
<< std::endl ;
double sec = timer.seconds();
std::cout << "declaring Views took "
<< sec << " seconds" << std::endl;
timer.reset();
for ( size_t i = 0 ; i < total_length ; ++i ) {
host_array(i) = ( lrand48() * total_length ) >> 31 ;
}
sec = timer.seconds();
std::cout << "initializing " << total_length << " elements on host took "
<< sec << " seconds" << std::endl;
timer.reset();
double sec_copy_in = 0 ;
double sec_sort = 0 ;
double sec_copy_out = 0 ;
double sec_error = 0 ;
size_t error_count = 0 ;
for ( size_t begin = 0 ; begin < total_length ; begin += array_length ) {
const size_t end = begin + array_length < total_length
? begin + array_length : total_length ;
const std::pair<size_t,size_t> host_range(begin,end);
const host_array_type host_subarray = Kokkos::subview( host_array , host_range );
timer.reset();
Kokkos::deep_copy( work_array , host_subarray );
sec_copy_in += timer.seconds(); timer.reset();
SortView< execution_space >( work_array , 0 , end - begin );
sec_sort += timer.seconds(); timer.reset();
Kokkos::deep_copy( host_subarray , work_array );
sec_copy_out += timer.seconds(); timer.reset();
for ( size_t i = begin + 1 ; i < end ; ++i ) {
if ( host_array(i) < host_array(i-1) ) ++error_count ;
}
sec_error += timer.seconds(); timer.reset();
}
std::cout << "copy to device " << sec_copy_in << " seconds" << std::endl
<< "sort on device " << sec_sort << " seconds" << std::endl
<< "copy from device " << sec_copy_out << " seconds" << std::endl
<< "errors " << error_count << " took " << sec_error << " seconds" << std::endl
;
}
} // namespace Example
//----------------------------------------------------------------------------
#endif /* #ifndef EXAMPLE_SORT_ARRAY */
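Note on the hunk above: the only functional edit in sort_array.hpp is replacing the Kokkos-internal Kokkos::Impl::is_same trait with std::is_same inside the compile-time choice of the pinned host buffer type for CUDA builds. A minimal sketch of the same compile-time selection written purely against the C++11 standard library (std::conditional standing in for Kokkos::Impl::if_c; the buffer type names are illustrative placeholders, not taken from the patch):

#include <type_traits>

// Illustrative stand-ins for the two candidate host-side buffer types.
struct PinnedHostBuffer {};
struct PlainHostMirror  {};

template <class Device, class CudaDevice>
struct HostArrayFor {
  // Pick the pinned buffer only when Device is the CUDA device type,
  // mirroring if_c< std::is_same<Device,Kokkos::Cuda>::value, ..., ... >.
  typedef typename std::conditional<
      std::is_same<Device, CudaDevice>::value,
      PinnedHostBuffer,
      PlainHostMirror >::type type;
};

int main () {
  static_assert (std::is_same< HostArrayFor<int, int>::type, PinnedHostBuffer >::value,
                 "matching device type selects the pinned buffer");
  return 0;
}

Both forms resolve the type at compile time, so the swap changes no run-time behavior.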
diff --git a/lib/kokkos/example/tutorial/01_hello_world/Makefile b/lib/kokkos/example/tutorial/01_hello_world/Makefile
index 78a9fed0c..62ab22f17 100644
--- a/lib/kokkos/example/tutorial/01_hello_world/Makefile
+++ b/lib/kokkos/example/tutorial/01_hello_world/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/01_hello_world/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 01_hello_world.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 01_hello_world.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/01_hello_world_lambda/Makefile b/lib/kokkos/example/tutorial/01_hello_world_lambda/Makefile
index 95ee2c47f..52d5fb07c 100644
--- a/lib/kokkos/example/tutorial/01_hello_world_lambda/Makefile
+++ b/lib/kokkos/example/tutorial/01_hello_world_lambda/Makefile
@@ -1,44 +1,49 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/01_hello_world_lambda/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 01_hello_world_lambda.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
-KOKKOS_CUDA_OPTIONS = "enable_lambda"
+KOKKOS_CUDA_OPTIONS += "enable_lambda"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 01_hello_world_lambda.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp b/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
index b6c9cc5e4..4b8b9db62 100644
--- a/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
+++ b/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
@@ -1,109 +1,112 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
#include <typeinfo>
//
// "Hello world" parallel_for example:
// 1. Start up Kokkos
// 2. Execute a parallel for loop in the default execution space,
// using a C++11 lambda to define the loop body
// 3. Shut down Kokkos
//
// This example only builds if C++11 is enabled. Compare this example
// to 01_hello_world, which uses functors (explicitly defined classes)
// to define the loop body of the parallel_for. Both functors and
// lambdas have their places.
//
int main (int argc, char* argv[]) {
// You must call initialize() before you may call anything else in Kokkos.
//
// With no arguments, this initializes the default execution space
// (and potentially its host execution space) with default
// parameters. You may also pass in argc and argv, analogously to
// MPI_Init(). It reads and removes command-line arguments that
// start with "--kokkos-".
Kokkos::initialize (argc, argv);
// Print the name of Kokkos' default execution space. We're using
// typeid here, so the name may be printed in the compiler's mangled form,
// but you should still be able to figure out what it is.
printf ("Hello World on Kokkos execution space %s\n",
typeid (Kokkos::DefaultExecutionSpace).name ());
// Run lambda on the default Kokkos execution space in parallel,
// with a parallel for loop count of 15. The lambda's argument is
// an integer which is the parallel for's loop index. As you learn
// about different kinds of parallelism, you will find out that
// there are other valid argument types as well.
//
// For a single level of parallelism, we prefer that you use the
// KOKKOS_LAMBDA macro. If CUDA is disabled, this just turns into
// [=]. That captures variables from the surrounding scope by
// value. Do NOT capture them by reference! If CUDA is enabled,
// this macro may have a special definition that makes the lambda
// work correctly with CUDA. Compare to the KOKKOS_INLINE_FUNCTION
// macro, which has a special meaning if CUDA is enabled.
//
// The following parallel_for would look like this if we were using
// OpenMP by itself, instead of Kokkos:
//
// #pragma omp parallel for
// for (int i = 0; i < 15; ++i) {
// printf ("Hello from i = %i\n", i);
// }
//
// You may notice that the printed numbers do not print out in
// order. Parallel for loops may execute in any order.
+ // We also need to protect the usage of a lambda against compiling
+ // with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
+#if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_for (15, KOKKOS_LAMBDA (const int i) {
// printf works in a CUDA parallel kernel; std::ostream does not.
printf ("Hello from i = %i\n", i);
});
-
+#endif
// You must call finalize() after you are done using Kokkos.
Kokkos::finalize ();
}
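The new KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA guard above compiles the parallel_for out entirely on back ends without device-lambda support (Cuda 6.5/7.0). On such a build the loop body can still be expressed as a functor, exactly as the sibling 01_hello_world example does; a minimal sketch, assuming an ordinary Kokkos build:

#include <Kokkos_Core.hpp>
#include <cstdio>

// Functor replacement for the lambda; works whether or not the
// back end can dispatch C++11 lambdas to the device.
struct hello_world {
  KOKKOS_INLINE_FUNCTION
  void operator() (const int i) const {
    printf ("Hello from i = %i\n", i);
  }
};

int main (int argc, char* argv[]) {
  Kokkos::initialize (argc, argv);
  Kokkos::parallel_for (15, hello_world ());
  Kokkos::finalize ();
}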
diff --git a/lib/kokkos/example/tutorial/02_simple_reduce/Makefile b/lib/kokkos/example/tutorial/02_simple_reduce/Makefile
index 78a9fed0c..d102af515 100644
--- a/lib/kokkos/example/tutorial/02_simple_reduce/Makefile
+++ b/lib/kokkos/example/tutorial/02_simple_reduce/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/02_simple_reduce/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 02_simple_reduce.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 02_simple_reduce.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/Makefile b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/Makefile
index 95ee2c47f..4545668b7 100644
--- a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/Makefile
+++ b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/Makefile
@@ -1,44 +1,49 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/02_simple_reduce_lambda/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 02_simple_reduce_lambda.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
-KOKKOS_CUDA_OPTIONS = "enable_lambda"
+KOKKOS_CUDA_OPTIONS += "enable_lambda"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 02_simple_reduce_lambda.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
index a403633a8..f44ddce30 100644
--- a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
+++ b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
@@ -1,86 +1,94 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
//
// First reduction (parallel_reduce) example:
// 1. Start up Kokkos
// 2. Execute a parallel_reduce loop in the default execution space,
// using a C++11 lambda to define the loop body
// 3. Shut down Kokkos
//
// This example only builds if C++11 is enabled. Compare this example
// to 02_simple_reduce, which uses a functor to define the loop body
// of the parallel_reduce.
//
int main (int argc, char* argv[]) {
Kokkos::initialize (argc, argv);
const int n = 10;
// Compute the sum of squares of integers from 0 to n-1, in
// parallel, using Kokkos. This time, use a lambda instead of a
// functor. The lambda takes the same arguments as the functor's
// operator().
int sum = 0;
// The KOKKOS_LAMBDA macro replaces the capture-by-value clause [=].
// It also handles any other syntax needed for CUDA.
+ // We also need to protect the usage of a lambda against compiling
+ // with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
+ #if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_reduce (n, KOKKOS_LAMBDA (const int i, int& lsum) {
lsum += i*i;
}, sum);
+ #endif
printf ("Sum of squares of integers from 0 to %i, "
"computed in parallel, is %i\n", n - 1, sum);
// Compare to a sequential loop.
int seqSum = 0;
for (int i = 0; i < n; ++i) {
seqSum += i*i;
}
printf ("Sum of squares of integers from 0 to %i, "
"computed sequentially, is %i\n", n - 1, seqSum);
Kokkos::finalize ();
+#if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
return (sum == seqSum) ? 0 : -1;
+#else
+ return 0;
+#endif
}
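Because the same lambda guard is applied here, sum stays 0 on a back end without device-lambda support, which is why the final return value is also guarded. A functor form of the reduction (the pattern used by 02_simple_reduce) sidesteps the guard entirely; a brief sketch with an illustrative struct name:

#include <Kokkos_Core.hpp>

// Each work item adds i*i to its thread-private lsum; Kokkos combines
// the partial sums into the result passed to parallel_reduce.
struct squaresum {
  KOKKOS_INLINE_FUNCTION
  void operator() (const int i, int& lsum) const {
    lsum += i*i;
  }
};

int main (int argc, char* argv[]) {
  Kokkos::initialize (argc, argv);
  int sum = 0;
  Kokkos::parallel_reduce (10, squaresum (), sum);
  Kokkos::finalize ();
  return (sum == 285) ? 0 : -1;   // 0^2 + 1^2 + ... + 9^2 = 285
}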
diff --git a/lib/kokkos/example/tutorial/03_simple_view/Makefile b/lib/kokkos/example/tutorial/03_simple_view/Makefile
index 78a9fed0c..e716b765e 100644
--- a/lib/kokkos/example/tutorial/03_simple_view/Makefile
+++ b/lib/kokkos/example/tutorial/03_simple_view/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/03_simple_view/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 03_simple_view.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 03_simple_view.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/03_simple_view_lambda/Makefile b/lib/kokkos/example/tutorial/03_simple_view_lambda/Makefile
index 95ee2c47f..b93c14910 100644
--- a/lib/kokkos/example/tutorial/03_simple_view_lambda/Makefile
+++ b/lib/kokkos/example/tutorial/03_simple_view_lambda/Makefile
@@ -1,44 +1,49 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/03_simple_view_lambda/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 03_simple_view_lambda.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
-KOKKOS_CUDA_OPTIONS = "enable_lambda"
+KOKKOS_CUDA_OPTIONS += "enable_lambda"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 03_simple_view_lambda.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp b/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
index 974af7477..e9e7c2370 100644
--- a/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
+++ b/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
@@ -1,116 +1,120 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
//
// First Kokkos::View (multidimensional array) example:
// 1. Start up Kokkos
// 2. Allocate a Kokkos::View
// 3. Execute a parallel_for and a parallel_reduce over that View's data
// 4. Shut down Kokkos
//
// Compare this example to 03_simple_view, which uses functors to
// define the loop bodies of the parallel_for and parallel_reduce.
//
#include <Kokkos_Core.hpp>
#include <cstdio>
// A Kokkos::View is an array of zero or more dimensions. The number
// of dimensions is specified at compile time, as part of the type of
// the View. This array has two dimensions. The first one
// (represented by the asterisk) is a run-time dimension, and the
// second (represented by [3]) is a compile-time dimension. Thus,
// this View type is an N x 3 array of type double, where N is
// specified at run time in the View's constructor.
//
// The first dimension of the View is the dimension over which it is
// efficient for Kokkos to parallelize.
typedef Kokkos::View<double*[3]> view_type;
int main (int argc, char* argv[]) {
Kokkos::initialize (argc, argv);
// Allocate the View. The first dimension is a run-time parameter
// N. We set N = 10 here. The second dimension is a compile-time
// parameter, 3. We don't specify it here because we already set it
// by declaring the type of the View.
//
// Views get initialized to zero by default. This happens in
// parallel, using the View's memory space's default execution
// space. Parallel initialization ensures first-touch allocation.
// There is a way to shut off default initialization.
//
// You may NOT allocate a View inside of a parallel_{for, reduce,
// scan}. Treat View allocation as a "thread collective."
//
// The string "A" is just the label; it only matters for debugging.
// Different Views may have the same label.
view_type a ("A", 10);
// Fill the View with some data. The parallel_for loop will iterate
// over the View's first dimension N.
//
// Note that the View is passed by value into the lambda. The macro
// KOKKOS_LAMBDA includes the "capture by value" clause [=]. This
// tells the lambda to "capture all variables in the enclosing scope
// by value." Views have "view semantics"; they behave like
// pointers, not like std::vector. Passing them by value does a
// shallow copy. A deep copy never happens unless you explicitly
// ask for one.
+ // We also need to protect the usage of a lambda against compiling
+ // with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
+ #if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_for (10, KOKKOS_LAMBDA (const int i) {
// Access the View just like a Fortran array. The layout depends
// on the View's memory space, so don't rely on the View's
// physical memory layout unless you know what you're doing.
a(i,0) = 1.0*i;
a(i,1) = 1.0*i*i;
a(i,2) = 1.0*i*i*i;
});
// Reduction functor that reads the View given to its constructor.
double sum = 0;
Kokkos::parallel_reduce (10, KOKKOS_LAMBDA (const int i, double& lsum) {
lsum += a(i,0)*a(i,1)/(a(i,2)+0.1);
}, sum);
printf ("Result: %f\n", sum);
+ #endif
Kokkos::finalize ();
}
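The comments above stress that Views have shallow-copy, pointer-like semantics and that a deep copy only happens when explicitly requested. A minimal sketch of requesting one, using the standard host-mirror pattern (assuming an ordinary Kokkos build; variable names are illustrative):

#include <Kokkos_Core.hpp>

int main (int argc, char* argv[]) {
  Kokkos::initialize (argc, argv);
  {
    Kokkos::View<double*[3]> a ("A", 10);   // lives in the default memory space
    Kokkos::View<double*[3]> b = a;         // shallow copy: b aliases a's data

    // Explicit deep copy into host-accessible memory:
    Kokkos::View<double*[3]>::HostMirror h_a = Kokkos::create_mirror_view (a);
    Kokkos::deep_copy (h_a, a);             // device -> host copy (no-op if same space)
  }
  Kokkos::finalize ();
}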
diff --git a/lib/kokkos/example/tutorial/04_simple_memoryspaces/Makefile b/lib/kokkos/example/tutorial/04_simple_memoryspaces/Makefile
index 78a9fed0c..8dd7598f0 100644
--- a/lib/kokkos/example/tutorial/04_simple_memoryspaces/Makefile
+++ b/lib/kokkos/example/tutorial/04_simple_memoryspaces/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/04_simple_memoryspaces/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 04_simple_memoryspaces.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 04_simple_memoryspaces.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/05_simple_atomics/Makefile b/lib/kokkos/example/tutorial/05_simple_atomics/Makefile
index 78a9fed0c..d297d4557 100644
--- a/lib/kokkos/example/tutorial/05_simple_atomics/Makefile
+++ b/lib/kokkos/example/tutorial/05_simple_atomics/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/05_simple_atomics/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 05_simple_atomics.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 05_simple_atomics.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/01_data_layouts/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/01_data_layouts/Makefile
index 12ad36b31..956a4d179 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/01_data_layouts/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/01_data_layouts/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/01_data_layouts/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 01_data_layouts.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 01_data_layouts.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/02_memory_traits/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/02_memory_traits/Makefile
index 12ad36b31..41697b073 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/02_memory_traits/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/02_memory_traits/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/02_memory_traits/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 02_memory_traits.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 02_memory_traits.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/03_subviews/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/03_subviews/Makefile
index 12ad36b31..8d0697aa2 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/03_subviews/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/03_subviews/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/03_subviews/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 03_subviews.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 03_subviews.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/Makefile
index 12ad36b31..0a3acd984 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/04_dualviews/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 04_dualviews.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 04_dualviews.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/dual_view.cpp b/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/dual_view.cpp
index 4905e4bf8..26b55eae7 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/dual_view.cpp
+++ b/lib/kokkos/example/tutorial/Advanced_Views/04_dualviews/dual_view.cpp
@@ -1,218 +1,218 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <cstdio>
#include <cstdlib>
// DualView helps you manage data and computations that take place on
// two different memory spaces. Examples include CUDA device memory
// and (CPU) host memory (currently implemented), or Intel Knights
// Landing MCDRAM and DRAM (not yet implemented). For example, if you
// have ported only some parts of your application to run in CUDA,
// DualView can help manage moving data between the parts of your
// application that work best with CUDA, and the parts that work
// better on the CPU.
//
// A DualView takes the same template parameters as a View, but
// contains two Views: One that lives in the DualView's memory space,
// and one that lives in that memory space's host mirror space. If
// both memory spaces are the same, then the two Views just alias one
// another. This means that you can use DualView all the time, even
// when not running in a memory space like CUDA. DualView's
// operations to help you manage memory take almost no time in that
// case. This makes your code even more performance portable.
typedef Kokkos::DualView<double*> view_type;
typedef Kokkos::DualView<int**> idx_type;
template<class ExecutionSpace>
struct localsum {
// If the functor has a public 'execution_space' typedef, that defines
// the functor's execution space (where it runs in parallel). This
// overrides Kokkos' default execution space.
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::Impl::if_c<Kokkos::Impl::is_same<ExecutionSpace,Kokkos::DefaultExecutionSpace>::value ,
+ typedef typename Kokkos::Impl::if_c<std::is_same<ExecutionSpace,Kokkos::DefaultExecutionSpace>::value ,
idx_type::memory_space, idx_type::host_mirror_space>::type memory_space;
// Get the view types on the particular device for which the functor
// is instantiated.
//
// "const_data_type" is a typedef in View (and DualView) which is
// the const version of the first template parameter of the View.
// For example, the const_data_type version of double** is const
// double**.
Kokkos::View<idx_type::const_data_type, idx_type::array_layout, memory_space> idx;
// "scalar_array_type" is a typedef in ViewTraits (and DualView) which is the
// array version of the value(s) stored in the View.
Kokkos::View<view_type::scalar_array_type, view_type::array_layout, memory_space> dest;
Kokkos::View<view_type::const_data_type, view_type::array_layout,
memory_space, Kokkos::MemoryRandomAccess> src;
// Constructor takes DualViews, synchronizes them to the device,
// then marks them as modified on the device.
localsum (idx_type dv_idx, view_type dv_dest, view_type dv_src)
{
// Extract the view on the correct Device (i.e., the correct
// memory space) from the DualView. DualView has a template
// method, view(), which is templated on the memory space. If the
// DualView has a View from that memory space, view() returns the
// View in that space.
idx = dv_idx.view<memory_space> ();
dest = dv_dest.template view<memory_space> ();
src = dv_src.template view<memory_space> ();
// Synchronize the DualView to the correct Device.
//
// DualView's sync() method is templated on a memory space, and
// synchronizes the DualView in a one-way fashion to that memory
// space. "Synchronizing" means copying, from the other memory
// space to the Device memory space. sync() does _nothing_ if the
// Views on the two memory spaces are in sync. DualView
// determines this by the user manually marking one side or the
// other as modified; see the modify() call below.
dv_idx.sync<memory_space> ();
dv_dest.template sync<memory_space> ();
dv_src.template sync<memory_space> ();
// Mark dest as modified on Device.
dv_dest.template modify<memory_space> ();
}
KOKKOS_INLINE_FUNCTION
void operator() (const int i) const {
double tmp = 0.0;
for (int j = 0; j < (int) idx.dimension_1(); ++j) {
const double val = src(idx(i,j));
tmp += val*val + 0.5*(idx.dimension_0()*val -idx.dimension_1()*val);
}
dest(i) += tmp;
}
};
class ParticleType {
public:
double q;
double m;
double q_over_m;
KOKKOS_INLINE_FUNCTION
ParticleType(double q_ = -1, double m_ = 1):
q(q_), m(m_), q_over_m(q/m) {}
protected:
};
typedef Kokkos::DualView<ParticleType[10]> ParticleTypes;
int main (int narg, char* arg[]) {
Kokkos::initialize (narg, arg);
// If the View holds a non-trivially constructible type, add braces so it goes
// out of scope before the Kokkos::finalize() call
{
ParticleTypes test("Test");
Kokkos::fence();
test.h_view(0) = ParticleType(-1e4,1);
Kokkos::fence();
int size = 1000000;
// Create DualViews. This will allocate on both the device and its
// host_mirror_device.
idx_type idx ("Idx",size,64);
view_type dest ("Dest",size);
view_type src ("Src",size);
srand (134231);
// Get a reference to the host view of idx directly (equivalent to
// idx.view<idx_type::host_mirror_space>() )
idx_type::t_host h_idx = idx.h_view;
for (int i = 0; i < size; ++i) {
for (view_type::size_type j = 0; j < h_idx.dimension_1 (); ++j) {
h_idx(i,j) = (size + i + (rand () % 500 - 250)) % size;
}
}
// Mark idx as modified on the host_mirror_space so that a
// sync to the device will actually move data. The sync happens in
// the functor's constructor.
idx.modify<idx_type::host_mirror_space> ();
// Run on the device. This will cause a sync of idx to the device,
// since it was marked as modified on the host.
Kokkos::Timer timer;
Kokkos::parallel_for(size,localsum<view_type::execution_space>(idx,dest,src));
Kokkos::fence();
double sec1_dev = timer.seconds();
timer.reset();
Kokkos::parallel_for(size,localsum<view_type::execution_space>(idx,dest,src));
Kokkos::fence();
double sec2_dev = timer.seconds();
// Run on the host's default execution space (could be the same as device).
// This will cause a sync back to the host of dest. Note that if the Device is CUDA,
// the data layout will not be optimal on host, so performance is
// lower than what it would be for a pure host compilation.
timer.reset();
Kokkos::parallel_for(size,localsum<Kokkos::HostSpace::execution_space>(idx,dest,src));
Kokkos::fence();
double sec1_host = timer.seconds();
timer.reset();
Kokkos::parallel_for(size,localsum<Kokkos::HostSpace::execution_space>(idx,dest,src));
Kokkos::fence();
double sec2_host = timer.seconds();
printf("Device Time with Sync: %f without Sync: %f \n",sec1_dev,sec2_dev);
printf("Host Time with Sync: %f without Sync: %f \n",sec1_host,sec2_host);
}
Kokkos::finalize();
}
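The functor constructor above bundles the whole DualView discipline: view<Space>() picks a side, sync<Space>() brings that side up to date, and modify<Space>() records where new writes happened. Stripped of the functor, the host-to-device round trip looks like the following sketch (same Kokkos 2.x API as the example; names are illustrative):

#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>

int main (int narg, char* arg[]) {
  Kokkos::initialize (narg, arg);
  {
    typedef Kokkos::DualView<double*> dv_type;
    dv_type a ("A", 100);

    a.h_view(0) = 42.0;                         // write on the host side
    a.modify<dv_type::host_mirror_space> ();    // record that the host copy changed
    a.sync<dv_type::memory_space> ();           // copy host -> device only if out of date

    // ... run device kernels on a.d_view, then mark and sync the other way ...
    a.modify<dv_type::memory_space> ();
    a.sync<dv_type::host_mirror_space> ();
  }
  Kokkos::finalize ();
}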
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile
index 12ad36b31..615ee2887 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/05_NVIDIA_UVM/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 05_NVIDIA_UVM.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 05_NVIDIA_UVM.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/uvm_example.cpp b/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/uvm_example.cpp
index cf5326b68..72fd444ab 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/uvm_example.cpp
+++ b/lib/kokkos/example/tutorial/Advanced_Views/05_NVIDIA_UVM/uvm_example.cpp
@@ -1,134 +1,140 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <cstdio>
#include <cstdlib>
-typedef Kokkos::View<double*> view_type;
-typedef Kokkos::View<int**> idx_type;
-
+#ifdef KOKKOS_HAVE_CUDA
+typedef Kokkos::View<double*, Kokkos::CudaUVMSpace> view_type;
+typedef Kokkos::View<int**, Kokkos::CudaUVMSpace> idx_type;
+#else
+typedef Kokkos::View<double*,Kokkos::HostSpace> view_type;
+typedef Kokkos::View<int**,Kokkos::HostSpace> idx_type;
+#endif
template<class Device>
struct localsum {
// Define the execution space for the functor (overrides the DefaultExecutionSpace)
typedef Device execution_space;
// Get the view types on the particular device the functor is instantiated for
idx_type::const_type idx;
view_type dest;
- Kokkos::View<view_type::const_data_type, view_type::array_layout, view_type::execution_space, Kokkos::MemoryRandomAccess > src;
+ Kokkos::View<view_type::const_data_type, view_type::array_layout, view_type::device_type, Kokkos::MemoryRandomAccess > src;
localsum(idx_type idx_, view_type dest_,
view_type src_):idx(idx_),dest(dest_),src(src_) {
}
KOKKOS_INLINE_FUNCTION
void operator() (int i) const {
double tmp = 0.0;
- for(int j = 0; j < idx.dimension_1(); j++) {
+ for(int j = 0; j < int(idx.dimension_1()); j++) {
const double val = src(idx(i,j));
tmp += val*val + 0.5*(idx.dimension_0()*val -idx.dimension_1()*val);
}
dest(i) += tmp;
}
};
int main(int narg, char* arg[]) {
Kokkos::initialize(narg,arg);
int size = 1000000;
// Create Views
idx_type idx("Idx",size,64);
view_type dest("Dest",size);
view_type src("Src",size);
srand(134231);
+ Kokkos::fence();
+
// When using UVM Cuda views can be accessed on the Host directly
for(int i=0; i<size; i++) {
- for(int j=0; j<idx.dimension_1(); j++)
+ for(int j=0; j<int(idx.dimension_1()); j++)
idx(i,j) = (size + i + (rand()%500 - 250))%size;
}
Kokkos::fence();
// Run on the device
// This will cause a sync of idx to the device since it was modified on the host
Kokkos::Timer timer;
Kokkos::parallel_for(size,localsum<view_type::execution_space>(idx,dest,src));
Kokkos::fence();
double sec1_dev = timer.seconds();
// No data transfer will happen now, since nothing is accessed on the host
timer.reset();
Kokkos::parallel_for(size,localsum<view_type::execution_space>(idx,dest,src));
Kokkos::fence();
double sec2_dev = timer.seconds();
// Run on the host
// This will cause a sync back to the host of dest which was changed on the device
// Compare the runtime here with the dual_view example: dest is copied back in 4k pages
// as they are first touched during the parallel_for. Because each small transfer pays the
// memcpy latency, the effective bandwidth is lower than a manual bulk copy via dual views
timer.reset();
Kokkos::parallel_for(size,localsum<Kokkos::HostSpace::execution_space>(idx,dest,src));
Kokkos::fence();
double sec1_host = timer.seconds();
// No data transfers will happen now
timer.reset();
Kokkos::parallel_for(size,localsum<Kokkos::HostSpace::execution_space>(idx,dest,src));
Kokkos::fence();
double sec2_host = timer.seconds();
- printf("Device Time with Sync: %lf without Sync: %lf \n",sec1_dev,sec2_dev);
- printf("Host Time with Sync: %lf without Sync: %lf \n",sec1_host,sec2_host);
+ printf("Device Time with Sync: %e without Sync: %e \n",sec1_dev,sec2_dev);
+ printf("Host Time with Sync: %e without Sync: %e \n",sec1_host,sec2_host);
Kokkos::finalize();
}
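The new typedef block at the top of uvm_example.cpp selects Kokkos::CudaUVMSpace when CUDA is enabled and Kokkos::HostSpace otherwise, which is what lets the host loop fill idx without an explicit copy. One way to keep the two typedefs from drifting apart is to name the space once; a small sketch (the alias name is illustrative, not part of the patch):

#include <Kokkos_Core.hpp>

// Single point of choice for the example's memory space.
#ifdef KOKKOS_HAVE_CUDA
typedef Kokkos::CudaUVMSpace example_memory_space;
#else
typedef Kokkos::HostSpace    example_memory_space;
#endif

typedef Kokkos::View<double*, example_memory_space> view_type;
typedef Kokkos::View<int**,   example_memory_space> idx_type;

int main (int argc, char* argv[]) {
  Kokkos::initialize (argc, argv);
  { view_type dest ("Dest", 1000); idx_type idx ("Idx", 1000, 64); }
  Kokkos::finalize ();
}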
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/06_AtomicViews/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/06_AtomicViews/Makefile
index 12ad36b31..dfb7d6df6 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/06_AtomicViews/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/06_AtomicViews/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/06_AtomicViews/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 06_AtomicViews.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 06_AtomicViews.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile
index 60a514f4d..432a90126 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3 --default-stream per-thread
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 07_Overlapping_DeepCopy.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 07_Overlapping_DeepCopy.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Advanced_Views/Makefile b/lib/kokkos/example/tutorial/Advanced_Views/Makefile
index 19053b61b..bc4012f68 100644
--- a/lib/kokkos/example/tutorial/Advanced_Views/Makefile
+++ b/lib/kokkos/example/tutorial/Advanced_Views/Makefile
@@ -1,84 +1,121 @@
-default:
+ifndef KOKKOS_PATH
+ MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
+ KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../../..
+endif
+
+ifndef KOKKOS_SETTINGS
+ KOKKOS_SETTINGS = "KOKKOS_PATH=${KOKKOS_PATH}"
+ ifdef KOKKOS_ARCH
+ KOKKOS_SETTINGS += "KOKKOS_ARCH=${KOKKOS_ARCH}"
+ endif
+ ifdef KOKKOS_DEVICES
+ KOKKOS_SETTINGS += "KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+ endif
+ ifdef KOKKOS_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}"
+ endif
+ ifdef KOKKOS_CUDA_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPTIONS}"
+ endif
+endif
+
+build:
+ mkdir -p 01_data_layouts
cd ./01_data_layouts; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/01_data_layouts/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 02_memory_traits
cd ./02_memory_traits; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/02_memory_traits/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 03_subviews
cd ./03_subviews; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/03_subviews/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 04_dualviews
cd ./04_dualviews; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/04_dualviews/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 05_NVIDIA_UVM
cd ./05_NVIDIA_UVM; \
- make -j 4
- cd ./06_AtomicViews; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile ${KOKKOS_SETTINGS}
+ #mkdir -p 06_AtomicViews
+ #cd ./06_AtomicViews; \
+ #make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/06_AtomicViews/Makefile ${KOKKOS_SETTINGS}
+ #mkdir -p 07_Overlapping_DeepCopy
+ #cd ./07_Overlapping_DeepCopy; \
+ #make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile ${KOKKOS_SETTINGS}
-openmp:
+build-insource:
cd ./01_data_layouts; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./02_memory_traits; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./03_subviews; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./04_dualviews; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./05_NVIDIA_UVM; \
- make -j 4 KOKKOS_DEVICES=OpenMP
- cd ./06_AtomicViews; \
- make -j 4 KOKKOS_DEVICES=OpenMP
-
-pthreads:
+ make build -j 4 ${KOKKOS_SETTINGS}
+ #cd ./06_AtomicViews; \
+ #make build -j 4 ${KOKKOS_SETTINGS}
+ #cd ./07_Overlapping_DeepCopy; \
+ #make build -j 4 ${KOKKOS_SETTINGS}
+test:
cd ./01_data_layouts; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/01_data_layouts/Makefile ${KOKKOS_SETTINGS}
cd ./02_memory_traits; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/02_memory_traits/Makefile ${KOKKOS_SETTINGS}
cd ./03_subviews; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/03_subviews/Makefile ${KOKKOS_SETTINGS}
cd ./04_dualviews; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/04_dualviews/Makefile ${KOKKOS_SETTINGS}
cd ./05_NVIDIA_UVM; \
- make -j 4 KOKKOS_DEVICES=Pthreads
- cd ./06_AtomicViews; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile ${KOKKOS_SETTINGS}
+ #cd ./06_AtomicViews; \
+ #make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/06_AtomicViews/Makefile ${KOKKOS_SETTINGS}
+ #cd ./07_Overlapping_DeepCopy; \
+ #make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile ${KOKKOS_SETTINGS}
-serial:
+test-insource:
cd ./01_data_layouts; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./02_memory_traits; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./03_subviews; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./04_dualviews; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./05_NVIDIA_UVM; \
- make -j 4 KOKKOS_DEVICES=Serial
- cd ./06_AtomicViews; \
- make -j 4 KOKKOS_DEVICES=Serial
-
-cuda:
+ make test -j 4 ${KOKKOS_SETTINGS}
+ #cd ./06_AtomicViews; \
+ #make test -j 4 ${KOKKOS_SETTINGS}
+ #cd ./07_Overlapping_DeepCopy; \
+ #make test -j 4 ${KOKKOS_SETTINGS}
+clean:
cd ./01_data_layouts; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/01_data_layouts/Makefile ${KOKKOS_SETTINGS}
cd ./02_memory_traits; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/02_memory_traits/Makefile ${KOKKOS_SETTINGS}
cd ./03_subviews; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/03_subviews/Makefile ${KOKKOS_SETTINGS}
cd ./04_dualviews; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/04_dualviews/Makefile ${KOKKOS_SETTINGS}
cd ./05_NVIDIA_UVM; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
- cd ./06_AtomicViews; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/05_NVIDIA_UVM/Makefile ${KOKKOS_SETTINGS}
+ #cd ./06_AtomicViews; \
+ #make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/06_AtomicViews/Makefile ${KOKKOS_SETTINGS}
+ #cd ./07_Overlapping_DeepCopy; \
+ #make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/Makefile ${KOKKOS_SETTINGS}
-clean:
+clean-insource:
cd ./01_data_layouts; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./02_memory_traits; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./03_subviews; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./04_dualviews; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./05_NVIDIA_UVM; \
- make clean
- cd ./06_AtomicViews; \
- make clean
-
+ make clean ${KOKKOS_SETTINGS}
+ #cd ./06_AtomicViews; \
+ #make clean ${KOKKOS_SETTINGS}
+ #cd ./07_Overlapping_DeepCopy; \
+ #make clean ${KOKKOS_SETTINGS}
diff --git a/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/Makefile b/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/Makefile
index 12ad36b31..60f6f94cd 100644
--- a/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/Makefile
+++ b/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Algorithms/01_random_numbers/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 01_random_numbers.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 01_random_numbers.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/random_numbers.cpp b/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/random_numbers.cpp
index 3e6175a75..a5cf40ced 100644
--- a/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/random_numbers.cpp
+++ b/lib/kokkos/example/tutorial/Algorithms/01_random_numbers/random_numbers.cpp
@@ -1,152 +1,154 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <Kokkos_Random.hpp>
#include <Kokkos_DualView.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <cstdlib>
typedef Kokkos::HostSpace::execution_space DefaultHostType;
// Kokkos provides two different random number generators, with a 64 bit and a 1024 bit state.
// These generators are based on Vigna, Sebastiano (2014), "An experimental exploration of Marsaglia's xorshift generators, scrambled".
// See: http://arxiv.org/abs/1402.6246
// The generators can be used fully independently on each thread and have been tested to
// produce good statistics for both inter- and intra-thread numbers.
// Note that within a kernel NO random number operations are (team) collective operations.
// Everything can be called within branches. This is a difference from the curand library, where
// certain operations are required to be called by all threads in a block.
//
// In Kokkos you are required to create a pool of generator states, so that threads can
// grab their own. On CPU architectures the pool size is equal to the number of threads;
// on CUDA about 128k states are generated (enough to give every potentially simultaneously
// running thread its own state). Within a kernel a thread is required to acquire a state from the
// pool and later return it.
// On CPUs the random number generator is deterministic when using the same number of threads.
// On GPUs (i.e. using the CUDA backend) it is not deterministic, because threads acquire states via
// atomics.
// A Functor for generating uint64_t random numbers templated on the GeneratorPool type
template<class GeneratorPool>
struct generate_random {
- // The GeneratorPool
- GeneratorPool rand_pool;
// Output View for the random numbers
Kokkos::View<uint64_t*> vals;
+
+ // The GeneratorPool
+ GeneratorPool rand_pool;
+
int samples;
// Initialize all members
generate_random(Kokkos::View<uint64_t*> vals_,
GeneratorPool rand_pool_,
int samples_):
vals(vals_),rand_pool(rand_pool_),samples(samples_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i) const {
// Get a random number state from the pool for the active thread
typename GeneratorPool::generator_type rand_gen = rand_pool.get_state();
// Draw 'samples' numbers from the pool as urand64, between 0 and rand_pool.MAX_URAND64.
// Note there are function calls to get other types of scalars, and also to specify
// ranges or to get a normally distributed float.
for(int k = 0;k<samples;k++)
vals(i*samples+k) = rand_gen.urand64();
// Give the state back, which will allow another thread to acquire it
rand_pool.free_state(rand_gen);
}
};
int main(int argc, char* args[]) {
if (argc != 3){
printf("Please pass two integers on the command line\n");
}
else {
// Initialize Kokkos
Kokkos::initialize(argc,args);
int size = atoi(args[1]);
int samples = atoi(args[2]);
// Create two random number generator pools, one for 64-bit states and one for 1024-bit states.
// Both take a 64-bit unsigned integer seed to initialize a Random_XorShift64 generator, which
// is used to fill the generators of the pool.
Kokkos::Random_XorShift64_Pool<> rand_pool64(5374857);
Kokkos::Random_XorShift1024_Pool<> rand_pool1024(5374857);
Kokkos::DualView<uint64_t*> vals("Vals",size*samples);
// Run some performance comparisons
Kokkos::Timer timer;
Kokkos::parallel_for(size,generate_random<Kokkos::Random_XorShift64_Pool<> >(vals.d_view,rand_pool64,samples));
Kokkos::fence();
timer.reset();
Kokkos::parallel_for(size,generate_random<Kokkos::Random_XorShift64_Pool<> >(vals.d_view,rand_pool64,samples));
Kokkos::fence();
double time_64 = timer.seconds();
Kokkos::parallel_for(size,generate_random<Kokkos::Random_XorShift1024_Pool<> >(vals.d_view,rand_pool1024,samples));
Kokkos::fence();
timer.reset();
Kokkos::parallel_for(size,generate_random<Kokkos::Random_XorShift1024_Pool<> >(vals.d_view,rand_pool1024,samples));
Kokkos::fence();
double time_1024 = timer.seconds();
- printf("#Time XorShift64*: %lf %lf\n",time_64,1.0e-9*samples*size/time_64 );
- printf("#Time XorShift1024*: %lf %lf\n",time_1024,1.0e-9*samples*size/time_1024 );
+ printf("#Time XorShift64*: %e %e\n",time_64,1.0e-9*samples*size/time_64 );
+ printf("#Time XorShift1024*: %e %e\n",time_1024,1.0e-9*samples*size/time_1024 );
Kokkos::deep_copy(vals.h_view,vals.d_view);
Kokkos::finalize();
}
return 0;
}
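The comments above note that, besides urand64(), the per-thread generators provide calls for other scalar types and for restricted ranges. The small functor below follows the same get_state()/free_state() discipline to draw a uniform double and a bucketed integer per index; it is a hypothetical sketch rather than part of the tutorial, and the drand() and urand(range) calls are assumed to be available from Kokkos_Random.hpp.

#include <Kokkos_Core.hpp>
#include <Kokkos_Random.hpp>

// Illustrative functor: one uniform double in [0,1) and one integer in [0,100)
// per index, acquiring and returning a generator state around the draws.
struct draw_scaled {
  Kokkos::View<double*> uniforms;
  Kokkos::View<int*> buckets;
  Kokkos::Random_XorShift64_Pool<> rand_pool;

  draw_scaled(Kokkos::View<double*> u, Kokkos::View<int*> b,
              Kokkos::Random_XorShift64_Pool<> p)
    : uniforms(u), buckets(b), rand_pool(p) {}

  KOKKOS_INLINE_FUNCTION
  void operator() (const int i) const {
    Kokkos::Random_XorShift64_Pool<>::generator_type gen = rand_pool.get_state();
    uniforms(i) = gen.drand();      // uniform double in [0,1)
    buckets(i)  = gen.urand(100);   // uniform unsigned integer in [0,100)
    rand_pool.free_state(gen);      // return the state so another thread can use it
  }
};

int main(int argc, char* args[]) {
  Kokkos::initialize(argc, args);
  {
    const int n = 1024;
    Kokkos::View<double*> u("u", n);
    Kokkos::View<int*> b("b", n);
    Kokkos::Random_XorShift64_Pool<> pool(5374857);
    Kokkos::parallel_for(n, draw_scaled(u, b, pool));
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}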
diff --git a/lib/kokkos/example/tutorial/Algorithms/Makefile b/lib/kokkos/example/tutorial/Algorithms/Makefile
index edc2a3602..ad0b76f9d 100644
--- a/lib/kokkos/example/tutorial/Algorithms/Makefile
+++ b/lib/kokkos/example/tutorial/Algorithms/Makefile
@@ -1,24 +1,43 @@
-default:
- cd ./01_random_numbers; \
- make -j 4
+ifndef KOKKOS_PATH
+ MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
+ KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../../..
+endif
-openmp:
- cd ./01_random_numbers; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ifndef KOKKOS_SETTINGS
+ KOKKOS_SETTINGS = "KOKKOS_PATH=${KOKKOS_PATH}"
+ ifdef KOKKOS_ARCH
+ KOKKOS_SETTINGS += "KOKKOS_ARCH=${KOKKOS_ARCH}"
+ endif
+ ifdef KOKKOS_DEVICES
+ KOKKOS_SETTINGS += "KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+ endif
+ ifdef KOKKOS_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}"
+ endif
+ ifdef KOKKOS_CUDA_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPTIONS}"
+ endif
+endif
-pthreads:
+build:
+ mkdir -p 01_random_numbers
cd ./01_random_numbers; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Algorithms/01_random_numbers/Makefile ${KOKKOS_SETTINGS}
-serial:
+build-insource:
cd ./01_random_numbers; \
- make -j 4 KOKKOS_DEVICES=Serial
-
-cuda:
+ make build -j 4 ${KOKKOS_SETTINGS}
+test:
cd ./01_random_numbers; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Algorithms/01_random_numbers/Makefile ${KOKKOS_SETTINGS}
+test-insource:
+ cd ./01_random_numbers; \
+ make test -j 4 ${KOKKOS_SETTINGS}
clean:
cd ./01_random_numbers; \
- make clean
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Algorithms/01_random_numbers/Makefile ${KOKKOS_SETTINGS}
+clean-insource:
+ cd ./01_random_numbers; \
+ make clean ${KOKKOS_SETTINGS}
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile
index 12ad36b31..8c50430c3 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 01_thread_teams.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 01_thread_teams.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile
index 965b72b4e..b9b017bf1 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile
@@ -1,44 +1,49 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 01_thread_teams_lambda.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
-KOKKOS_CUDA_OPTIONS = "enable_lambda"
+KOKKOS_CUDA_OPTIONS += "enable_lambda"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 01_thread_teams_lambda.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
index 565dd22e8..c0865cfa6 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
@@ -1,94 +1,97 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
// Demonstrate a parallel reduction using thread teams (TeamPolicy).
//
// A thread team consists of 1 to n threads. The hardware determines
// the maximum value of n. On a dual-socket CPU machine with 8 cores
// per socket, the maximum size of a team is 8. The number of teams
// (the league_size) is not limited by physical constraints (up to
// some reasonable bound, which eventually depends upon the hardware
// and programming model implementation).
int main (int narg, char* args[]) {
using Kokkos::parallel_reduce;
typedef Kokkos::TeamPolicy<> team_policy;
typedef typename team_policy::member_type team_member;
Kokkos::initialize (narg, args);
// Set up a policy that launches 12 teams, with the maximum number
// of threads per team.
const team_policy policy (12, Kokkos::AUTO);
// This is a reduction with a team policy. The team policy changes
// the first argument of the lambda. Rather than an integer index
// (as with RangePolicy), it's now TeamPolicy::member_type. This
// object provides all information to identify a thread uniquely.
// It also provides some team-related function calls such as a team
// barrier (which a subsequent example will use).
//
// Every member of the team contributes to the total sum. It is
// helpful to think of the lambda's body as a "team parallel
// region." That is, every team member is active and will execute
// the body of the lambda.
int sum = 0;
+ // We also need to protect the use of a lambda against compiling
+ // with a backend which doesn't support it (e.g. Cuda 6.5/7.0).
+ #if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
parallel_reduce (policy, KOKKOS_LAMBDA (const team_member& thread, int& lsum) {
lsum += 1;
// TeamPolicy<>::member_type provides functions to query the
// multidimensional index of a thread, as well as the number of
// thread teams and the size of each team.
printf ("Hello World: %i %i // %i %i\n", thread.league_rank (),
thread.team_rank (), thread.league_size (), thread.team_size ());
}, sum);
-
+ #endif
// The result will be 12*team_policy::team_size_max([=]{})
printf ("Result %i\n",sum);
Kokkos::finalize ();
}
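Because the lambda above has to be guarded by KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA, a functor-based form of the same team reduction is sometimes preferable: it compiles on backends without device-lambda support, in the spirit of the separate 01_thread_teams example. The sketch below is illustrative only; the functor name is not part of the tutorial sources.

#include <Kokkos_Core.hpp>
#include <cstdio>

typedef Kokkos::TeamPolicy<> team_policy;
typedef team_policy::member_type team_member;

// Every member of every team contributes one to the reduction.
struct hello_team {
  KOKKOS_INLINE_FUNCTION
  void operator() (const team_member& thread, int& lsum) const {
    lsum += 1;
  }
};

int main(int narg, char* args[]) {
  Kokkos::initialize(narg, args);
  {
    const team_policy policy(12, Kokkos::AUTO);
    int sum = 0;
    Kokkos::parallel_reduce(policy, hello_team(), sum);
    printf("Result %i\n", sum);   // 12 * the team size chosen by Kokkos::AUTO
  }
  Kokkos::finalize();
  return 0;
}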
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile
index 12ad36b31..bae935122 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 02_nested_parallel_for.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 02_nested_parallel_for.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile
index 12ad36b31..a041b69b5 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Hierarchical_Parallelism/03_vectorization/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 03_vectorization.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 03_vectorization.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile
index 12ad36b31..6418875c9 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile
@@ -1,43 +1,48 @@
KOKKOS_PATH = ../../../..
-SRC = $(wildcard *.cpp)
+KOKKOS_SRC_PATH = ${KOKKOS_PATH}
+SRC = $(wildcard ${KOKKOS_SRC_PATH}/example/tutorial/Hierarchical_Parallelism/04_team_scan/*.cpp)
+vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = ../../../../config/nvcc_wrapper
+CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.cuda)
+EXE = 04_team_scan.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS =
-EXE = $(SRC:.cpp=.host)
+EXE = 04_team_scan.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
DEPFLAGS = -M
-OBJ = $(SRC:.cpp=.o)
+OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
+test: $(EXE)
+ ./$(EXE)
+
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/team_scan.cpp b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/team_scan.cpp
index c12b11d04..ebc8578f0 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/team_scan.cpp
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/04_team_scan/team_scan.cpp
@@ -1,141 +1,144 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <cstdio>
#include <cstdlib>
typedef Kokkos::DefaultExecutionSpace Device ;
typedef Kokkos::HostSpace::execution_space Host ;
typedef Kokkos::TeamPolicy< Device > team_policy ;
typedef team_policy::member_type team_member ;
static const int TEAM_SIZE = 16 ;
struct find_2_tuples {
int chunk_size;
Kokkos::View<const int*> data;
Kokkos::View<int**> histogram;
find_2_tuples(int chunk_size_, Kokkos::DualView<int*> data_,
Kokkos::DualView<int**> histogram_):chunk_size(chunk_size_),
data(data_.d_view),histogram(histogram_.d_view) {
data_.sync<Device>();
histogram_.sync<Device>();
histogram_.modify<Device>();
}
KOKKOS_INLINE_FUNCTION
void operator() ( const team_member & dev) const {
Kokkos::View<int**,Kokkos::MemoryUnmanaged> l_histogram(dev.team_shmem(),TEAM_SIZE,TEAM_SIZE);
Kokkos::View<int*,Kokkos::MemoryUnmanaged> l_data(dev.team_shmem(),chunk_size+1);
const int i = dev.league_rank() * chunk_size;
for(int j = dev.team_rank(); j<chunk_size+1; j+=dev.team_size())
l_data(j) = data(i+j);
for(int k = dev.team_rank(); k < TEAM_SIZE; k+=dev.team_size())
for(int l = 0; l < TEAM_SIZE; l++)
l_histogram(k,l) = 0;
dev.team_barrier();
for(int j = 0; j<chunk_size; j++) {
for(int k = dev.team_rank(); k < TEAM_SIZE; k+=dev.team_size())
for(int l = 0; l < TEAM_SIZE; l++) {
if((l_data(j) == k) && (l_data(j+1)==l))
l_histogram(k,l)++;
}
}
for(int k = dev.team_rank(); k < TEAM_SIZE; k+=dev.team_size())
for(int l = 0; l < TEAM_SIZE; l++) {
Kokkos::atomic_fetch_add(&histogram(k,l),l_histogram(k,l));
}
dev.team_barrier();
}
- size_t team_shmem_size( int team_size ) const { return sizeof(int)*(chunk_size+2 + team_size * team_size ); }
+ size_t team_shmem_size( int team_size ) const {
+ return Kokkos::View<int**,Kokkos::MemoryUnmanaged>::shmem_size(TEAM_SIZE,TEAM_SIZE) +
+ Kokkos::View<int*,Kokkos::MemoryUnmanaged>::shmem_size(chunk_size+1);
+ }
};
int main(int narg, char* args[]) {
Kokkos::initialize(narg,args);
int chunk_size = 1024;
int nchunks = 100000; //1024*1024;
Kokkos::DualView<int*> data("data",nchunks*chunk_size+1);
srand(1231093);
for(int i = 0; i < (int) data.dimension_0(); i++) {
data.h_view(i) = rand()%TEAM_SIZE;
}
data.modify<Host>();
data.sync<Device>();
Kokkos::DualView<int**> histogram("histogram",TEAM_SIZE,TEAM_SIZE);
Kokkos::Timer timer;
// threads/team is automatically limited to the maximum supported by the device.
Kokkos::parallel_for( team_policy( nchunks , TEAM_SIZE )
, find_2_tuples(chunk_size,data,histogram) );
Kokkos::fence();
double time = timer.seconds();
histogram.sync<Host>();
printf("Time: %f \n\n",time);
int sum = 0;
for(int k=0; k<TEAM_SIZE; k++) {
for(int l=0; l<TEAM_SIZE; l++) {
printf("%i ",histogram.h_view(k,l));
sum += histogram.h_view(k,l);
}
printf("\n");
}
printf("Result: %i %i\n",sum,chunk_size*nchunks);
Kokkos::finalize();
}
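The revised team_shmem_size() above reports exactly the scratch the two unmanaged Views will consume, via View::shmem_size(), instead of a hand-computed sizeof() estimate. The minimal sketch below isolates that pattern; the functor and its names are illustrative, not part of the tutorial sources.

#include <Kokkos_Core.hpp>

// A functor that allocates an unmanaged View in team scratch memory should
// report, via team_shmem_size(), the sum of View::shmem_size(...) for exactly
// the Views it creates, so padding and alignment are accounted for.
struct uses_team_scratch {
  typedef Kokkos::TeamPolicy<>::member_type team_member;
  int n;

  uses_team_scratch(int n_) : n(n_) {}

  KOKKOS_INLINE_FUNCTION
  void operator() (const team_member& dev) const {
    // Per-team scratch allocation; layout is handled by Kokkos.
    Kokkos::View<int*, Kokkos::MemoryUnmanaged> buf(dev.team_shmem(), n);
    buf(dev.team_rank() % n) = dev.league_rank();
    dev.team_barrier();
  }

  // Request exactly as much scratch as the View above will consume.
  size_t team_shmem_size(int /*team_size*/) const {
    return Kokkos::View<int*, Kokkos::MemoryUnmanaged>::shmem_size(n);
  }
};

int main(int narg, char* args[]) {
  Kokkos::initialize(narg, args);
  Kokkos::parallel_for(Kokkos::TeamPolicy<>(100, Kokkos::AUTO),
                       uses_team_scratch(16));
  Kokkos::fence();
  Kokkos::finalize();
  return 0;
}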
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/Makefile b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/Makefile
index 9d6fff798..44fdf90f8 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/Makefile
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/Makefile
@@ -1,72 +1,95 @@
-default:
+ifndef KOKKOS_PATH
+ MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
+ KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../../..
+endif
+
+ifndef KOKKOS_SETTINGS
+ KOKKOS_SETTINGS = "KOKKOS_PATH=${KOKKOS_PATH}"
+ ifdef KOKKOS_ARCH
+ KOKKOS_SETTINGS += "KOKKOS_ARCH=${KOKKOS_ARCH}"
+ endif
+ ifdef KOKKOS_DEVICES
+ KOKKOS_SETTINGS += "KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+ endif
+ ifdef KOKKOS_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}"
+ endif
+ ifdef KOKKOS_CUDA_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPTIONS}"
+ endif
+endif
+
+build:
+ mkdir -p 01_thread_teams
cd ./01_thread_teams; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 01_thread_teams_lambda
cd ./01_thread_teams_lambda; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 02_nested_parallel_for
cd ./02_nested_parallel_for; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 03_vectorization
cd ./03_vectorization; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 04_team_scan
cd ./04_team_scan; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile ${KOKKOS_SETTINGS}
-openmp:
+build-insource:
cd ./01_thread_teams; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./01_thread_teams_lambda; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./02_nested_parallel_for; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./03_vectorization; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./04_team_scan; \
- make -j 4 KOKKOS_DEVICES=OpenMP
-
-pthreads:
+ make build -j 4 ${KOKKOS_SETTINGS}
+test:
cd ./01_thread_teams; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile ${KOKKOS_SETTINGS}
cd ./01_thread_teams_lambda; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./02_nested_parallel_for; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile ${KOKKOS_SETTINGS}
cd ./03_vectorization; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile ${KOKKOS_SETTINGS}
cd ./04_team_scan; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile ${KOKKOS_SETTINGS}
-serial:
+test-insource:
cd ./01_thread_teams; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./01_thread_teams_lambda; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./02_nested_parallel_for; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./03_vectorization; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./04_team_scan; \
- make -j 4 KOKKOS_DEVICES=Serial
-
-cuda:
+ make test -j 4 ${KOKKOS_SETTINGS}
+clean:
cd ./01_thread_teams; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams/Makefile ${KOKKOS_SETTINGS}
cd ./01_thread_teams_lambda; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./02_nested_parallel_for; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/02_nested_parallel_for/Makefile ${KOKKOS_SETTINGS}
cd ./03_vectorization; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/03_vectorization/Makefile ${KOKKOS_SETTINGS}
cd ./04_team_scan; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/04_team_scan/Makefile ${KOKKOS_SETTINGS}
-clean:
+clean-insource:
cd ./01_thread_teams; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./01_thread_teams_lambda; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./02_nested_parallel_for; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./03_vectorization; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./04_team_scan; \
- make clean
-
+ make clean ${KOKKOS_SETTINGS}
diff --git a/lib/kokkos/example/tutorial/Makefile b/lib/kokkos/example/tutorial/Makefile
index 300d98ab4..063ace8aa 100644
--- a/lib/kokkos/example/tutorial/Makefile
+++ b/lib/kokkos/example/tutorial/Makefile
@@ -1,144 +1,174 @@
-default:
+
+ifndef KOKKOS_PATH
+ MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
+ KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../..
+endif
+
+ifndef KOKKOS_SETTINGS
+ KOKKOS_SETTINGS = "KOKKOS_PATH=${KOKKOS_PATH}"
+ ifdef KOKKOS_ARCH
+ KOKKOS_SETTINGS += "KOKKOS_ARCH=${KOKKOS_ARCH}"
+ endif
+ ifdef KOKKOS_DEVICES
+ KOKKOS_SETTINGS += "KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+ endif
+ ifdef KOKKOS_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}"
+ endif
+ ifdef KOKKOS_CUDA_OPTIONS
+ KOKKOS_SETTINGS += "KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPTIONS}"
+ endif
+endif
+
+build:
+ mkdir -p 01_hello_world
cd ./01_hello_world; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/01_hello_world/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 01_hello_world_lambda
cd ./01_hello_world_lambda; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/01_hello_world_lambda/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 02_simple_reduce
cd ./02_simple_reduce; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/02_simple_reduce/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 02_simple_reduce_lambda
cd ./02_simple_reduce_lambda; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/02_simple_reduce_lambda/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 03_simple_view
cd ./03_simple_view; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/03_simple_view/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 03_simple_view_lambda
cd ./03_simple_view_lambda; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/03_simple_view_lambda/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 04_simple_memoryspaces
cd ./04_simple_memoryspaces; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/04_simple_memoryspaces/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p 05_simple_atomics
cd ./05_simple_atomics; \
- make -j 4
+ make build -j 4 -f ${KOKKOS_PATH}/example/tutorial/05_simple_atomics/Makefile ${KOKKOS_SETTINGS}
+ mkdir -p Advanced_Views
cd ./Advanced_Views; \
- make -j 4
+ make build -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
+ mkdir -p Algorithms
cd ./Algorithms; \
- make -j 4
+ make build -f ${KOKKOS_PATH}/example/tutorial/Algorithms/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
+ mkdir -p Hierarchical_Parallelism
cd ./Hierarchical_Parallelism; \
- make -j 4
+ make build -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
-openmp:
+build-insource:
cd ./01_hello_world; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./01_hello_world_lambda; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./02_simple_reduce; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./02_simple_reduce_lambda; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./03_simple_view; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./03_simple_view_lambda; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./04_simple_memoryspaces; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./05_simple_atomics; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build -j 4 ${KOKKOS_SETTINGS}
cd ./Advanced_Views; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Algorithms; \
- make -j 4 KOKKOS_DEVICES=OpenMP
+ make build KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Hierarchical_Parallelism; \
- make -j 4 KOKKOS_DEVICES=OpenMP
-
-pthreads:
+ make build KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
+test:
cd ./01_hello_world; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/01_hello_world/Makefile ${KOKKOS_SETTINGS}
cd ./01_hello_world_lambda; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/01_hello_world_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./02_simple_reduce; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/02_simple_reduce/Makefile ${KOKKOS_SETTINGS}
cd ./02_simple_reduce_lambda; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/02_simple_reduce_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./03_simple_view; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/03_simple_view/Makefile ${KOKKOS_SETTINGS}
cd ./03_simple_view_lambda; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/03_simple_view_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./04_simple_memoryspaces; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/04_simple_memoryspaces/Makefile ${KOKKOS_SETTINGS}
cd ./05_simple_atomics; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -j 4 -f ${KOKKOS_PATH}/example/tutorial/05_simple_atomics/Makefile ${KOKKOS_SETTINGS}
cd ./Advanced_Views; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Algorithms; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -f ${KOKKOS_PATH}/example/tutorial/Algorithms/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Hierarchical_Parallelism; \
- make -j 4 KOKKOS_DEVICES=Pthreads
+ make test -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
-serial:
+test-insource:
cd ./01_hello_world; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./01_hello_world_lambda; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./02_simple_reduce; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./02_simple_reduce_lambda; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./03_simple_view; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./03_simple_view_lambda; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./04_simple_memoryspaces; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./05_simple_atomics; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test -j 4 ${KOKKOS_SETTINGS}
cd ./Advanced_Views; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Algorithms; \
- make -j 4 KOKKOS_DEVICES=Serial
+ make test KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Hierarchical_Parallelism; \
- make -j 4 KOKKOS_DEVICES=Serial
-
-cuda:
+ make test KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
+clean:
cd ./01_hello_world; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/01_hello_world/Makefile ${KOKKOS_SETTINGS}
cd ./01_hello_world_lambda; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/01_hello_world_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./02_simple_reduce; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/02_simple_reduce/Makefile ${KOKKOS_SETTINGS}
cd ./02_simple_reduce_lambda; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/02_simple_reduce_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./03_simple_view; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/03_simple_view/Makefile ${KOKKOS_SETTINGS}
cd ./03_simple_view_lambda; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/03_simple_view_lambda/Makefile ${KOKKOS_SETTINGS}
cd ./04_simple_memoryspaces; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/04_simple_memoryspaces/Makefile ${KOKKOS_SETTINGS}
cd ./05_simple_atomics; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/05_simple_atomics/Makefile ${KOKKOS_SETTINGS}
cd ./Advanced_Views; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Advanced_Views/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Algorithms; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Algorithms/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Hierarchical_Parallelism; \
- make -j 4 KOKKOS_DEVICES=Cuda,Serial
+ make clean -f ${KOKKOS_PATH}/example/tutorial/Hierarchical_Parallelism/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
-clean:
+clean-insource:
cd ./01_hello_world; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./01_hello_world_lambda; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./02_simple_reduce; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./02_simple_reduce_lambda; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./03_simple_view; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./03_simple_view_lambda; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./04_simple_memoryspaces; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./05_simple_atomics; \
- make clean
+ make clean ${KOKKOS_SETTINGS}
cd ./Advanced_Views; \
- make clean
+ make clean KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Algorithms; \
- make clean
+ make clean KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
cd ./Hierarchical_Parallelism; \
- make clean
-
+ make clean KOKKOS_SETTINGS='${KOKKOS_SETTINGS}'
diff --git a/lib/kokkos/generate_makefile.bash b/lib/kokkos/generate_makefile.bash
index 86f136da9..6fa03ebb3 100755
--- a/lib/kokkos/generate_makefile.bash
+++ b/lib/kokkos/generate_makefile.bash
@@ -1,336 +1,407 @@
#!/bin/bash
KOKKOS_DEVICES=""
while [[ $# > 0 ]]
do
key="$1"
case $key in
--kokkos-path*)
KOKKOS_PATH="${key#*=}"
;;
--prefix*)
PREFIX="${key#*=}"
;;
--with-cuda)
KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
CUDA_PATH_NVCC=`which nvcc`
CUDA_PATH=${CUDA_PATH_NVCC%/bin/nvcc}
;;
# Catch this before '--with-cuda*'
--with-cuda-options*)
KOKKOS_CUDA_OPT="${key#*=}"
;;
--with-cuda*)
KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
CUDA_PATH="${key#*=}"
;;
--with-openmp)
KOKKOS_DEVICES="${KOKKOS_DEVICES},OpenMP"
;;
--with-pthread)
KOKKOS_DEVICES="${KOKKOS_DEVICES},Pthread"
;;
--with-serial)
KOKKOS_DEVICES="${KOKKOS_DEVICES},Serial"
;;
--with-qthread*)
KOKKOS_DEVICES="${KOKKOS_DEVICES},Qthread"
QTHREAD_PATH="${key#*=}"
;;
--with-devices*)
DEVICES="${key#*=}"
KOKKOS_DEVICES="${KOKKOS_DEVICES},${DEVICES}"
;;
--with-gtest*)
GTEST_PATH="${key#*=}"
;;
--with-hwloc*)
HWLOC_PATH="${key#*=}"
;;
--arch*)
KOKKOS_ARCH="${key#*=}"
;;
--cxxflags*)
CXXFLAGS="${key#*=}"
;;
--ldflags*)
LDFLAGS="${key#*=}"
;;
--debug|-dbg)
KOKKOS_DEBUG=yes
;;
--compiler*)
COMPILER="${key#*=}"
+ CNUM=`which ${COMPILER} 2>&1 >/dev/null | grep "no ${COMPILER}" | wc -l`
+ if [ ${CNUM} -gt 0 ]; then
+ echo "Invalid compiler by --compiler command: '${COMPILER}'"
+ exit
+ fi
+ if [[ ! -n ${COMPILER} ]]; then
+ echo "Empty compiler specified by --compiler command."
+ exit
+ fi
+ CNUM=`which ${COMPILER} | grep ${COMPILER} | wc -l`
+ if [ ${CNUM} -eq 0 ]; then
+ echo "Invalid compiler by --compiler command: '${COMPILER}'"
+ exit
+ fi
;;
--with-options*)
KOKKOS_OPT="${key#*=}"
;;
--help)
echo "Kokkos configure options:"
echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
echo "--prefix=/Install/Path: Path to where the Kokkos library should be installed"
echo ""
echo "--with-cuda[=/Path/To/Cuda]: enable Cuda and set path to Cuda Toolkit"
echo "--with-openmp: enable OpenMP backend"
echo "--with-pthread: enable Pthreads backend"
echo "--with-serial: enable Serial backend"
echo "--with-qthread=/Path/To/Qthread: enable Qthread backend"
echo "--with-devices: explicitly add a set of backends"
echo ""
echo "--arch=[OPTIONS]: set target architectures. Options are:"
- echo " SNB = Intel Sandy/Ivy Bridge CPUs"
- echo " HSW = Intel Haswell CPUs"
- echo " KNC = Intel Knights Corner Xeon Phi"
- echo " KNL = Intel Knights Landing Xeon Phi"
- echo " Kepler30 = NVIDIA Kepler generation CC 3.0"
- echo " Kepler35 = NVIDIA Kepler generation CC 3.5"
- echo " Kepler37 = NVIDIA Kepler generation CC 3.7"
- echo " Maxwell50 = NVIDIA Maxwell generation CC 5.0"
- echo " Power8 = IBM Power 8 CPUs"
+ echo " ARMv80 = ARMv8.0 Compatible CPU"
+ echo " ARMv81 = ARMv8.1 Compatible CPU"
+ echo " ARMv8-ThunderX = ARMv8 Cavium ThunderX CPU"
+ echo " SNB = Intel Sandy/Ivy Bridge CPUs"
+ echo " HSW = Intel Haswell CPUs"
+ echo " BDW = Intel Broadwell Xeon E-class CPUs"
+ echo " SKX = Intel Sky Lake Xeon E-class HPC CPUs (AVX512)"
+ echo " KNC = Intel Knights Corner Xeon Phi"
+ echo " KNL = Intel Knights Landing Xeon Phi"
+ echo " Kepler30 = NVIDIA Kepler generation CC 3.0"
+ echo " Kepler35 = NVIDIA Kepler generation CC 3.5"
+ echo " Kepler37 = NVIDIA Kepler generation CC 3.7"
+ echo " Pascal60 = NVIDIA Pascal generation CC 6.0"
+ echo " Pascal61 = NVIDIA Pascal generation CC 6.1"
+ echo " Maxwell50 = NVIDIA Maxwell generation CC 5.0"
+ echo " Power8 = IBM POWER8 CPUs"
echo ""
echo "--compiler=/Path/To/Compiler set the compiler"
echo "--debug,-dbg: enable Debugging"
echo "--cxxflags=[FLAGS] overwrite CXXFLAGS for library build and test build"
echo " This will still set certain required flags via"
echo " KOKKOS_CXXFLAGS (such as -fopenmp, --std=c++11, etc.)"
echo "--ldflags=[FLAGS] overwrite LDFLAGS for library build and test build"
echo " This will still set certain required flags via"
echo " KOKKOS_LDFLAGS (such as -fopenmp, -lpthread, etc.)"
echo "--with-gtest=/Path/To/Gtest: set path to gtest (used in unit and performance tests"
echo "--with-hwloc=/Path/To/Hwloc: set path to hwloc"
echo "--with-options=[OPTIONS]: additional options to Kokkos:"
echo " aggressive_vectorization = add ivdep on loops"
echo "--with-cuda-options=[OPTIONS]: additional options to CUDA:"
echo " force_uvm, use_ldg, enable_lambda, rdc"
exit 0
;;
*)
echo "warning: ignoring unknown option $key"
;;
esac
shift
done
# If KOKKOS_PATH undefined, assume parent dir of this
# script is the KOKKOS_PATH
if [ -z "$KOKKOS_PATH" ]; then
KOKKOS_PATH=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
else
# Ensure KOKKOS_PATH is abs path
KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
fi
if [ "${KOKKOS_PATH}" = "${PWD}" ] || [ "${KOKKOS_PATH}" = "${PWD}/" ]; then
echo "Running generate_makefile.sh in the Kokkos root directory is not allowed"
exit
fi
-KOKKOS_OPTIONS="KOKKOS_PATH=${KOKKOS_PATH}"
+KOKKOS_SRC_PATH=${KOKKOS_PATH}
+
+KOKKOS_SETTINGS="KOKKOS_SRC_PATH=${KOKKOS_SRC_PATH}"
+#KOKKOS_SETTINGS="KOKKOS_PATH=${KOKKOS_PATH}"
if [ ${#COMPILER} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} CXX=${COMPILER}"
-fi
-if [ ${#PREFIX} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} PREFIX=${PREFIX}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXX=${COMPILER}"
fi
if [ ${#KOKKOS_DEVICES} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEVICES=${KOKKOS_DEVICES}"
fi
if [ ${#KOKKOS_ARCH} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} KOKKOS_ARCH=${KOKKOS_ARCH}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_ARCH=${KOKKOS_ARCH}"
fi
if [ ${#KOKKOS_DEBUG} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} KOKKOS_DEBUG=${KOKKOS_DEBUG}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEBUG=${KOKKOS_DEBUG}"
fi
if [ ${#CUDA_PATH} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} CUDA_PATH=${CUDA_PATH}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CUDA_PATH=${CUDA_PATH}"
fi
if [ ${#CXXFLAGS} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} CXXFLAGS=\"${CXXFLAGS}\""
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXXFLAGS=\"${CXXFLAGS}\""
fi
if [ ${#LDFLAGS} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} LDFLAGS=\"${LDFLAGS}\""
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} LDFLAGS=\"${LDFLAGS}\""
fi
if [ ${#GTEST_PATH} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} GTEST_PATH=${GTEST_PATH}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
else
GTEST_PATH=${KOKKOS_PATH}/tpls/gtest
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} GTEST_PATH=${GTEST_PATH}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
fi
if [ ${#HWLOC_PATH} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} HWLOC_PATH=${HWLOC_PATH} KOKKOS_USE_TPLS=hwloc"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} HWLOC_PATH=${HWLOC_PATH} KOKKOS_USE_TPLS=hwloc"
fi
if [ ${#QTHREAD_PATH} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} QTHREAD_PATH=${QTHREAD_PATH}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} QTHREAD_PATH=${QTHREAD_PATH}"
fi
if [ ${#KOKKOS_OPT} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} KOKKOS_OPTIONS=${KOKKOS_OPT}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_OPTIONS=${KOKKOS_OPT}"
fi
if [ ${#KOKKOS_CUDA_OPT} -gt 0 ]; then
-KOKKOS_OPTIONS="${KOKKOS_OPTIONS} KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPT}"
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPT}"
+fi
+
+KOKKOS_SETTINGS_NO_KOKKOS_PATH="${KOKKOS_SETTINGS}"
+
+KOKKOS_TEST_INSTALL_PATH="${PWD}/install"
+if [ ${#PREFIX} -gt 0 ]; then
+KOKKOS_INSTALL_PATH="${PREFIX}"
+else
+KOKKOS_INSTALL_PATH=${KOKKOS_TEST_INSTALL_PATH}
fi
+
+
+mkdir install
+echo "#Makefile to satisfy existens of target kokkos-clean before installing the library" > install/Makefile.kokkos
+echo "kokkos-clean:" >> install/Makefile.kokkos
+echo "" >> install/Makefile.kokkos
mkdir core
mkdir core/unit_test
mkdir core/perf_test
mkdir containers
mkdir containers/unit_tests
mkdir containers/performance_tests
mkdir algorithms
mkdir algorithms/unit_tests
mkdir algorithms/performance_tests
mkdir example
mkdir example/fixture
mkdir example/feint
mkdir example/fenl
+mkdir example/tutorial
if [ ${#KOKKOS_ENABLE_EXAMPLE_ICHOL} -gt 0 ]; then
mkdir example/ichol
fi
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_PATH}"
+
# Generate subdirectory makefiles.
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > core/unit_test/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "all:" >> core/unit_test/Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_OPTIONS}" >> core/unit_test/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS}" >> core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "test: all" >> core/unit_test/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_OPTIONS} test" >> core/unit_test/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS} test" >> core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "clean:" >> core/unit_test/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_OPTIONS} clean" >> core/unit_test/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS} clean" >> core/unit_test/Makefile
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > core/perf_test/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "all:" >> core/perf_test/Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_OPTIONS}" >> core/perf_test/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS}" >> core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "test: all" >> core/perf_test/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_OPTIONS} test" >> core/perf_test/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS} test" >> core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "clean:" >> core/perf_test/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_OPTIONS} clean" >> core/perf_test/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS} clean" >> core/perf_test/Makefile
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > containers/unit_tests/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "all:" >> containers/unit_tests/Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_OPTIONS}" >> containers/unit_tests/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS}" >> containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "test: all" >> containers/unit_tests/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_OPTIONS} test" >> containers/unit_tests/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS} test" >> containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "clean:" >> containers/unit_tests/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_OPTIONS} clean" >> containers/unit_tests/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS} clean" >> containers/unit_tests/Makefile
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > containers/performance_tests/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "all:" >> containers/performance_tests/Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_OPTIONS}" >> containers/performance_tests/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS}" >> containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "test: all" >> containers/performance_tests/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_OPTIONS} test" >> containers/performance_tests/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS} test" >> containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "clean:" >> containers/performance_tests/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_OPTIONS} clean" >> containers/performance_tests/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS} clean" >> containers/performance_tests/Makefile
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > algorithms/unit_tests/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "all:" >> algorithms/unit_tests/Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_OPTIONS}" >> algorithms/unit_tests/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS}" >> algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "test: all" >> algorithms/unit_tests/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_OPTIONS} test" >> algorithms/unit_tests/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS} test" >> algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "clean:" >> algorithms/unit_tests/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_OPTIONS} clean" >> algorithms/unit_tests/Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS} clean" >> algorithms/unit_tests/Makefile
+
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_TEST_INSTALL_PATH}"
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > example/fixture/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "all:" >> example/fixture/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_OPTIONS}" >> example/fixture/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS}" >> example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "test: all" >> example/fixture/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_OPTIONS} test" >> example/fixture/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS} test" >> example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "clean:" >> example/fixture/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_OPTIONS} clean" >> example/fixture/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS} clean" >> example/fixture/Makefile
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > example/feint/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/feint/Makefile
echo "" >> example/feint/Makefile
echo "all:" >> example/feint/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_OPTIONS}" >> example/feint/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS}" >> example/feint/Makefile
echo "" >> example/feint/Makefile
echo "test: all" >> example/feint/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_OPTIONS} test" >> example/feint/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS} test" >> example/feint/Makefile
echo "" >> example/feint/Makefile
echo "clean:" >> example/feint/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_OPTIONS} clean" >> example/feint/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS} clean" >> example/feint/Makefile
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > example/fenl/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "all:" >> example/fenl/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_OPTIONS}" >> example/fenl/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS}" >> example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "test: all" >> example/fenl/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_OPTIONS} test" >> example/fenl/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS} test" >> example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "clean:" >> example/fenl/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_OPTIONS} clean" >> example/fenl/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS} clean" >> example/fenl/Makefile
+
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/tutorial/Makefile
+echo "" >> example/tutorial/Makefile
+echo "build:" >> example/tutorial/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} build">> example/tutorial/Makefile
+echo "" >> example/tutorial/Makefile
+echo "test: build" >> example/tutorial/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} test" >> example/tutorial/Makefile
+echo "" >> example/tutorial/Makefile
+echo "clean:" >> example/tutorial/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} clean" >> example/tutorial/Makefile
+
if [ ${#KOKKOS_ENABLE_EXAMPLE_ICHOL} -gt 0 ]; then
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > example/ichol/Makefile
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "all:" >> example/ichol/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_OPTIONS}" >> example/ichol/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS}" >> example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "test: all" >> example/ichol/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_OPTIONS} test" >> example/ichol/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS} test" >> example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "clean:" >> example/ichol/Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_OPTIONS} clean" >> example/ichol/Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS} clean" >> example/ichol/Makefile
fi
+KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_PATH}"
+
# Generate top level directory makefile.
-echo "Generating Makefiles with options " ${KOKKOS_OPTIONS}
-echo "KOKKOS_OPTIONS=${KOKKOS_OPTIONS}" > Makefile
+echo "Generating Makefiles with options " ${KOKKOS_SETTINGS}
+echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > Makefile
echo "" >> Makefile
-echo "lib:" >> Makefile
+echo "kokkoslib:" >> Makefile
echo -e "\tcd core; \\" >> Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_OPTIONS}" >> Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_INSTALL_PATH} build-lib" >> Makefile
echo "" >> Makefile
-echo "install: lib" >> Makefile
+echo "install: kokkoslib" >> Makefile
echo -e "\tcd core; \\" >> Makefile
-echo -e "\tmake -j -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_OPTIONS} install" >> Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_INSTALL_PATH} install" >> Makefile
echo "" >> Makefile
-echo "build-test:" >> Makefile
+echo "kokkoslib-test:" >> Makefile
+echo -e "\tcd core; \\" >> Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_TEST_INSTALL_PATH} build-lib" >> Makefile
+echo "" >> Makefile
+echo "install-test: kokkoslib-test" >> Makefile
+echo -e "\tcd core; \\" >> Makefile
+echo -e "\tmake -j -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_TEST_INSTALL_PATH} install" >> Makefile
+echo "" >> Makefile
+echo "build-test: install-test" >> Makefile
echo -e "\tmake -C core/unit_test" >> Makefile
echo -e "\tmake -C core/perf_test" >> Makefile
echo -e "\tmake -C containers/unit_tests" >> Makefile
echo -e "\tmake -C containers/performance_tests" >> Makefile
echo -e "\tmake -C algorithms/unit_tests" >> Makefile
echo -e "\tmake -C example/fixture" >> Makefile
echo -e "\tmake -C example/feint" >> Makefile
echo -e "\tmake -C example/fenl" >> Makefile
+echo -e "\tmake -C example/tutorial build" >> Makefile
echo "" >> Makefile
echo "test: build-test" >> Makefile
echo -e "\tmake -C core/unit_test test" >> Makefile
echo -e "\tmake -C core/perf_test test" >> Makefile
echo -e "\tmake -C containers/unit_tests test" >> Makefile
echo -e "\tmake -C containers/performance_tests test" >> Makefile
echo -e "\tmake -C algorithms/unit_tests test" >> Makefile
echo -e "\tmake -C example/fixture test" >> Makefile
echo -e "\tmake -C example/feint test" >> Makefile
echo -e "\tmake -C example/fenl test" >> Makefile
+echo -e "\tmake -C example/tutorial test" >> Makefile
+echo "" >> Makefile
+echo "unit-tests-only:" >> Makefile
+echo -e "\tmake -C core/unit_test test" >> Makefile
+echo -e "\tmake -C containers/unit_tests test" >> Makefile
+echo -e "\tmake -C algorithms/unit_tests test" >> Makefile
echo "" >> Makefile
echo "clean:" >> Makefile
echo -e "\tmake -C core/unit_test clean" >> Makefile
echo -e "\tmake -C core/perf_test clean" >> Makefile
echo -e "\tmake -C containers/unit_tests clean" >> Makefile
echo -e "\tmake -C containers/performance_tests clean" >> Makefile
echo -e "\tmake -C algorithms/unit_tests clean" >> Makefile
echo -e "\tmake -C example/fixture clean" >> Makefile
echo -e "\tmake -C example/feint clean" >> Makefile
echo -e "\tmake -C example/fenl clean" >> Makefile
+echo -e "\tmake -C example/tutorial clean" >> Makefile
echo -e "\tcd core; \\" >> Makefile
-echo -e "\tmake -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_OPTIONS} clean" >> Makefile
+echo -e "\tmake -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} clean" >> Makefile
diff --git a/src/.gitignore b/src/.gitignore
index 6290aa925..ff139216b 100644
--- a/src/.gitignore
+++ b/src/.gitignore
@@ -1,1040 +1,1044 @@
/Makefile.package
/Makefile.package.settings
/MAKE/MINE
/Make.py.last
/lmp_*
/style_*.h
/*_gpu.h
/*_gpu.cpp
/*_intel.h
/*_intel.cpp
/*_kokkos.h
/*_kokkos.cpp
/*_omp.h
/*_omp.cpp
/*_tally.h
/*_tally.cpp
/*_rx.h
/*_rx.cpp
/*_ssa.h
/*_ssa.cpp
/kokkos.cpp
/kokkos.h
/kokkos_type.h
/kokkos_few.h
/manifold*.cpp
/manifold*.h
/fix_*manifold*.cpp
/fix_*manifold*.h
/fix_qeq*.cpp
/fix_qeq*.h
/compute_test_nbl.cpp
/compute_test_nbl.h
/pair_multi_lucy.cpp
/pair_multi_lucy.h
/colvarproxy_lammps.cpp
/colvarproxy_lammps.h
/fix_colvars.cpp
/fix_colvars.h
/dump_molfile.cpp
/dump_molfile.h
/molfile_interface.cpp
/molfile_interface.h
/molfile_plugin.h
/vmdplugin.h
/type_detector.h
/intel_buffers.cpp
/intel_buffers.h
/intel_intrinsics.h
/intel_preprocess.h
/intel_simd.h
/compute_sna_atom.cpp
/compute_sna_atom.h
/compute_snad_atom.cpp
/compute_snad_atom.h
/compute_snav_atom.cpp
/compute_snav_atom.h
/openmp_snap.h
/pair_snap.cpp
/pair_snap.h
/sna.cpp
/sna.h
/atom_vec_wavepacket.cpp
/atom_vec_wavepacket.h
/fix_nve_awpmd.cpp
/fix_nve_awpmd.h
/pair_awpmd_cut.cpp
/pair_awpmd_cut.h
/angle_cg_cmm.cpp
/angle_cg_cmm.h
/angle_charmm.cpp
/angle_charmm.h
/angle_class2.cpp
/angle_class2.h
/angle_cosine.cpp
/angle_cosine.h
/angle_cosine_delta.cpp
/angle_cosine_delta.h
/angle_cosine_periodic.cpp
/angle_cosine_periodic.h
/angle_cosine_shift.cpp
/angle_cosine_shift.h
/angle_cosine_shift_exp.cpp
/angle_cosine_shift_exp.h
/angle_cosine_squared.cpp
/angle_cosine_squared.h
/angle_dipole.cpp
/angle_dipole.h
/angle_fourier.cpp
/angle_fourier.h
/angle_fourier_simple.cpp
/angle_fourier_simple.h
/angle_harmonic.cpp
/angle_harmonic.h
/angle_quartic.cpp
/angle_quartic.h
/angle_sdk.cpp
/angle_sdk.h
/angle_table.cpp
/angle_table.h
/atom_vec_angle.cpp
/atom_vec_angle.h
/atom_vec_bond.cpp
/atom_vec_bond.h
/atom_vec_colloid.cpp
/atom_vec_colloid.h
/atom_vec_dipole.cpp
/atom_vec_dipole.h
/atom_vec_dpd.cpp
/atom_vec_dpd.h
/atom_vec_electron.cpp
/atom_vec_electron.h
/atom_vec_ellipsoid.cpp
/atom_vec_ellipsoid.h
/atom_vec_full.cpp
/atom_vec_full.h
/atom_vec_full_hars.cpp
/atom_vec_full_hars.h
/atom_vec_granular.cpp
/atom_vec_granular.h
/atom_vec_meso.cpp
/atom_vec_meso.h
/atom_vec_molecular.cpp
/atom_vec_molecular.h
/atom_vec_peri.cpp
/atom_vec_peri.h
/atom_vec_template.cpp
/atom_vec_template.h
/body_nparticle.cpp
/body_nparticle.h
/bond_class2.cpp
/bond_class2.h
/bond_fene.cpp
/bond_fene.h
/bond_fene_expand.cpp
/bond_fene_expand.h
/bond_harmonic.cpp
/bond_harmonic.h
/bond_harmonic_shift.cpp
/bond_harmonic_shift.h
/bond_harmonic_shift_cut.cpp
/bond_harmonic_shift_cut.h
/bond_morse.cpp
/bond_morse.h
/bond_nonlinear.cpp
/bond_nonlinear.h
/bond_quartic.cpp
/bond_quartic.h
/bond_table.cpp
/bond_table.h
/cg_cmm_parms.cpp
/cg_cmm_parms.h
/commgrid.cpp
/commgrid.h
/compute_ackland_atom.cpp
/compute_ackland_atom.h
/compute_basal_atom.cpp
/compute_basal_atom.h
/compute_body_local.cpp
/compute_body_local.h
/compute_cna_atom2.cpp
/compute_cna_atom2.h
/compute_damage_atom.cpp
/compute_damage_atom.h
/compute_dilatation_atom.cpp
/compute_dilatation_atom.h
/compute_dpd.cpp
/compute_dpd.h
/compute_dpd_atom.cpp
/compute_dpd_atom.h
/compute_erotate_asphere.cpp
/compute_erotate_asphere.h
/compute_erotate_rigid.cpp
/compute_erotate_rigid.h
/compute_event_displace.cpp
/compute_event_displace.h
/compute_fep.cpp
/compute_fep.h
/compute_force_tally.cpp
/compute_force_tally.h
/compute_heat_flux_tally.cpp
/compute_heat_flux_tally.h
/compute_ke_atom_eff.cpp
/compute_ke_atom_eff.h
/compute_ke_eff.cpp
/compute_ke_eff.h
/compute_ke_rigid.cpp
/compute_ke_rigid.h
/compute_meso_e_atom.cpp
/compute_meso_e_atom.h
/compute_meso_rho_atom.cpp
/compute_meso_rho_atom.h
/compute_meso_t_atom.cpp
/compute_meso_t_atom.h
/compute_msd_nongauss.cpp
/compute_msd_nongauss.h
/compute_pe_tally.cpp
/compute_pe_tally.h
/compute_plasticity_atom.cpp
/compute_plasticity_atom.h
/compute_pressure_grem.cpp
/compute_pressure_grem.h
/compute_rigid_local.cpp
/compute_rigid_local.h
/compute_spec_atom.cpp
/compute_spec_atom.h
/compute_stress_tally.cpp
/compute_stress_tally.h
/compute_temp_asphere.cpp
/compute_temp_asphere.h
/compute_temp_body.cpp
/compute_temp_body.h
/compute_temp_deform_eff.cpp
/compute_temp_deform_eff.h
/compute_temp_eff.cpp
/compute_temp_eff.h
/compute_temp_region_eff.cpp
/compute_temp_region_eff.h
/compute_temp_rotate.cpp
/compute_temp_rotate.h
/compute_ti.cpp
/compute_ti.h
/compute_voronoi_atom.cpp
/compute_voronoi_atom.h
/dihedral_charmm.cpp
/dihedral_charmm.h
/dihedral_class2.cpp
/dihedral_class2.h
/dihedral_cosine_shift_exp.cpp
/dihedral_cosine_shift_exp.h
/dihedral_fourier.cpp
/dihedral_fourier.h
/dihedral_harmonic.cpp
/dihedral_harmonic.h
/dihedral_helix.cpp
/dihedral_helix.h
/dihedral_hybrid.cpp
/dihedral_hybrid.h
/dihedral_multi_harmonic.cpp
/dihedral_multi_harmonic.h
/dihedral_nharmonic.cpp
/dihedral_nharmonic.h
/dihedral_opls.cpp
/dihedral_opls.h
/dihedral_quadratic.cpp
/dihedral_quadratic.h
/dihedral_spherical.cpp
/dihedral_spherical.h
/dihedral_table.cpp
/dihedral_table.h
/dump_atom_gz.cpp
/dump_atom_gz.h
/dump_xyz_gz.cpp
/dump_xyz_gz.h
/dump_atom_mpiio.cpp
/dump_atom_mpiio.h
/dump_cfg_gz.cpp
/dump_cfg_gz.h
/dump_cfg_mpiio.cpp
/dump_cfg_mpiio.h
/dump_custom_gz.cpp
/dump_custom_gz.h
/dump_custom_mpiio.cpp
/dump_custom_mpiio.h
/dump_custom_vtk.cpp
/dump_custom_vtk.h
/dump_h5md.cpp
/dump_h5md.h
/dump_nc.cpp
/dump_nc.h
/dump_nc_mpiio.cpp
/dump_nc_mpiio.h
/dump_xtc.cpp
/dump_xtc.h
/dump_xyz_mpiio.cpp
/dump_xyz_mpiio.h
/ewald.cpp
/ewald.h
/ewald_cg.cpp
/ewald_cg.h
/ewald_disp.cpp
/ewald_disp.h
/ewald_n.cpp
/ewald_n.h
/fft3d.cpp
/fft3d.h
/fft3d_wrap.cpp
/fft3d_wrap.h
/fix_adapt_fep.cpp
/fix_adapt_fep.h
/fix_addtorque.cpp
/fix_addtorque.h
/fix_append_atoms.cpp
/fix_append_atoms.h
/fix_atc.cpp
/fix_atc.h
/fix_ave_correlate_long.cpp
/fix_ave_correlate_long.h
/fix_bond_break.cpp
/fix_bond_break.h
/fix_bond_create.cpp
/fix_bond_create.h
/fix_bond_swap.cpp
/fix_bond_swap.h
/fix_cmap.cpp
/fix_cmap.h
/fix_deposit.cpp
/fix_deposit.h
/fix_dpd_energy.cpp
/fix_dpd_energy.h
/fix_efield.cpp
/fix_efield.h
/fix_eos_cv.cpp
/fix_eos_cv.h
/fix_eos_table.cpp
/fix_eos_table.h
/fix_evaporate.cpp
/fix_evaporate.h
/fix_viscosity.cpp
/fix_viscosity.h
/fix_ehex.cpp
/fix_ehex.h
/fix_event.cpp
/fix_event.h
/fix_event_prd.cpp
/fix_event_prd.h
/fix_event_tad.cpp
/fix_event_tad.h
/fix_flow_gauss.cpp
/fix_flow_gauss.h
/fix_freeze.cpp
/fix_freeze.h
/fix_gcmc.cpp
/fix_gcmc.h
/fix_gld.cpp
/fix_gld.h
/fix_gle.cpp
/fix_gle.h
/fix_gpu.cpp
/fix_gpu.h
/fix_grem.cpp
/fix_grem.h
/fix_imd.cpp
/fix_imd.h
/fix_ipi.cpp
/fix_ipi.h
/fix_lambdah_calc.cpp
/fix_lambdah_calc.h
/fix_langevin_eff.cpp
/fix_langevin_eff.h
/fix_lb_fluid.cpp
/fix_lb_fluid.h
/fix_lb_momentum.cpp
/fix_lb_momentum.h
/fix_lb_pc.cpp
/fix_lb_pc.h
/fix_lb_rigid_pc_sphere.cpp
/fix_lb_rigid_pc_sphere.h
/fix_lb_viscous.cpp
/fix_lb_viscous.h
/fix_load_report.cpp
/fix_load_report.h
/fix_meso.cpp
/fix_meso.h
/fix_meso_stationary.cpp
/fix_meso_stationary.h
+/fix_mscg.cpp
+/fix_mscg.h
/fix_msst.cpp
/fix_msst.h
/fix_neb.cpp
/fix_neb.h
/fix_nh_asphere.cpp
/fix_nh_asphere.h
/fix_nph_asphere.cpp
/fix_nph_asphere.h
/fix_npt_asphere.cpp
/fix_npt_asphere.h
/fix_nve_asphere.cpp
/fix_nve_asphere.h
/fix_nve_asphere_noforce.cpp
/fix_nve_asphere_noforce.h
/fix_nh_body.cpp
/fix_nh_body.h
/fix_nph_body.cpp
/fix_nph_body.h
/fix_npt_body.cpp
/fix_npt_body.h
+/fix_nvk.cpp
+/fix_nvk.h
/fix_nvt_body.cpp
/fix_nvt_body.h
/fix_nve_body.cpp
/fix_nve_body.h
/fix_nvt_asphere.cpp
/fix_nvt_asphere.h
/fix_nh_eff.cpp
/fix_nh_eff.h
/fix_nph_eff.cpp
/fix_nph_eff.h
/fix_nphug.cpp
/fix_nphug.h
/fix_npt_eff.cpp
/fix_npt_eff.h
/fix_nve_eff.cpp
/fix_nve_eff.h
/fix_nve_line.cpp
/fix_nve_line.h
/fix_nvt_eff.cpp
/fix_nvt_eff.h
/fix_nvt_sllod_eff.cpp
/fix_nvt_sllod_eff.h
/fix_nve_tri.cpp
/fix_nve_tri.h
/fix_oneway.cpp
/fix_oneway.h
/fix_orient_bcc.cpp
/fix_orient_bcc.h
/fix_orient_fcc.cpp
/fix_orient_fcc.h
/fix_peri_neigh.cpp
/fix_peri_neigh.h
/fix_phonon.cpp
/fix_phonon.h
/fix_poems.cpp
/fix_poems.h
/fix_pour.cpp
/fix_pour.h
/fix_qeq_comb.cpp
/fix_qeq_comb.h
/fix_qeq_reax.cpp
/fix_qeq_fire.cpp
/fix_qeq_fire.h
/fix_qeq_reax.h
/fix_qmmm.cpp
/fix_qmmm.h
/fix_reax_bonds.cpp
/fix_reax_bonds.h
/fix_reax_c.cpp
/fix_reax_c.h
/fix_reaxc_bonds.cpp
/fix_reaxc_bonds.h
/fix_reaxc_species.cpp
/fix_reaxc_species.h
/fix_rigid.cpp
/fix_rigid.h
/fix_rigid_nh.cpp
/fix_rigid_nh.h
/fix_rigid_nph.cpp
/fix_rigid_nph.h
/fix_rigid_npt.cpp
/fix_rigid_npt.h
/fix_rigid_nve.cpp
/fix_rigid_nve.h
/fix_rigid_nvt.cpp
/fix_rigid_nvt.h
/fix_rigid_nh_small.cpp
/fix_rigid_nh_small.h
/fix_rigid_nph_small.cpp
/fix_rigid_nph_small.h
/fix_rigid_npt_small.cpp
/fix_rigid_npt_small.h
/fix_rigid_nve_small.cpp
/fix_rigid_nve_small.h
/fix_rigid_nvt_small.cpp
/fix_rigid_nvt_small.h
/fix_rigid_small.cpp
/fix_rigid_small.h
/fix_shake.cpp
/fix_shake.h
/fix_shardlow.cpp
/fix_shardlow.h
/fix_smd.cpp
/fix_smd.h
/fix_species.cpp
/fix_species.h
/fix_spring_pull.cpp
/fix_spring_pull.h
/fix_srd.cpp
/fix_srd.h
/fix_temp_rescale_eff.cpp
/fix_temp_rescale_eff.h
/fix_thermal_conductivity.cpp
/fix_thermal_conductivity.h
/fix_ti_rs.cpp
/fix_ti_rs.h
/fix_ti_spring.cpp
/fix_ti_spring.h
/fix_ttm.cpp
/fix_ttm.h
/fix_tune_kspace.cpp
/fix_tune_kspace.h
/fix_wall_colloid.cpp
/fix_wall_colloid.h
/fix_wall_gran.cpp
/fix_wall_gran.h
/fix_wall_gran_region.cpp
/fix_wall_gran_region.h
/fix_wall_piston.cpp
/fix_wall_piston.h
/fix_wall_srd.cpp
/fix_wall_srd.h
/gpu_extra.h
/gridcomm.cpp
/gridcomm.h
/group_ndx.cpp
/group_ndx.h
/ndx_group.cpp
/ndx_group.h
/improper_class2.cpp
/improper_class2.h
/improper_cossq.cpp
/improper_cossq.h
/improper_cvff.cpp
/improper_cvff.h
/improper_distance.cpp
/improper_distance.h
/improper_fourier.cpp
/improper_fourier.h
/improper_harmonic.cpp
/improper_harmonic.h
/improper_hybrid.cpp
/improper_hybrid.h
/improper_ring.cpp
/improper_ring.h
/improper_umbrella.cpp
/improper_umbrella.h
/kissfft.h
/lj_sdk_common.h
/math_complex.h
/math_vector.h
/mgpt_*.cpp
/mgpt_*.h
/msm.cpp
/msm.h
/msm_cg.cpp
/msm_cg.h
/neb.cpp
/neb.h
/pair_adp.cpp
/pair_adp.h
/pair_agni.cpp
/pair_agni.h
/pair_airebo.cpp
/pair_airebo.h
/pair_airebo_morse.cpp
/pair_airebo_morse.h
/pair_body.cpp
/pair_body.h
/pair_bop.cpp
/pair_bop.h
/pair_born_coul_long.cpp
/pair_born_coul_long.h
/pair_born_coul_msm.cpp
/pair_born_coul_msm.h
/pair_brownian.cpp
/pair_brownian.h
/pair_brownian_poly.cpp
/pair_brownian_poly.h
/pair_buck_coul_long.cpp
/pair_buck_coul_long.h
/pair_buck_coul_msm.cpp
/pair_buck_coul_msm.h
/pair_buck_coul.cpp
/pair_buck_coul.h
/pair_buck_long_coul_long.cpp
/pair_buck_long_coul_long.h
/pair_cdeam.cpp
/pair_cdeam.h
/pair_cg_cmm.cpp
/pair_cg_cmm.h
/pair_cg_cmm_coul_cut.cpp
/pair_cg_cmm_coul_cut.h
/pair_cg_cmm_coul_long.cpp
/pair_cg_cmm_coul_long.h
/pair_cmm_common.cpp
/pair_cmm_common.h
/pair_cg_cmm_coul_msm.cpp
/pair_cg_cmm_coul_msm.h
/pair_comb.cpp
/pair_comb.h
/pair_comb3.cpp
/pair_comb3.h
/pair_colloid.cpp
/pair_colloid.h
/pair_coul_diel.cpp
/pair_coul_diel.h
/pair_coul_long.cpp
/pair_coul_long.h
/pair_coul_msm.cpp
/pair_coul_msm.h
/pair_dipole_cut.cpp
/pair_dipole_cut.h
/pair_dipole_sf.cpp
/pair_dipole_sf.h
/pair_dpd_mt.cpp
/pair_dpd_mt.h
/pair_dsmc.cpp
/pair_dsmc.h
/pair_eam.cpp
/pair_eam.h
/pair_eam_opt.cpp
/pair_eam_opt.h
/pair_eam_alloy.cpp
/pair_eam_alloy.h
/pair_eam_alloy_opt.cpp
/pair_eam_alloy_opt.h
/pair_eam_fs.cpp
/pair_eam_fs.h
/pair_eam_fs_opt.cpp
/pair_eam_fs_opt.h
/pair_edip.cpp
/pair_edip.h
/pair_eff_cut.cpp
/pair_eff_cut.h
/pair_eff_inline.h
/pair_eim.cpp
/pair_eim.h
/pair_gauss_cut.cpp
/pair_gauss_cut.h
/pair_gayberne.cpp
/pair_gayberne.h
/pair_gran_easy.cpp
/pair_gran_easy.h
/pair_gran_hertz_history.cpp
/pair_gran_hertz_history.h
/pair_gran_hooke.cpp
/pair_gran_hooke.h
/pair_gran_hooke_history.cpp
/pair_gran_hooke_history.h
/pair_gw.cpp
/pair_gw.h
/pair_gw_zbl.cpp
/pair_gw_zbl.h
/pair_hbond_dreiding_lj.cpp
/pair_hbond_dreiding_lj.h
/pair_hbond_dreiding_morse.cpp
/pair_hbond_dreiding_morse.h
/pair_lcbop.cpp
/pair_lcbop.h
/pair_line_lj.cpp
/pair_line_lj.h
/pair_list.cpp
/pair_list.h
/pair_lj_charmm_coul_charmm.cpp
/pair_lj_charmm_coul_charmm.h
/pair_lj_charmm_coul_charmm_implicit.cpp
/pair_lj_charmm_coul_charmm_implicit.h
/pair_lj_charmm_coul_long.cpp
/pair_lj_charmm_coul_long.h
/pair_lj_charmm_coul_long_opt.cpp
/pair_lj_charmm_coul_long_opt.h
/pair_lj_charmm_coul_long_soft.cpp
/pair_lj_charmm_coul_long_soft.h
/pair_lj_charmm_coul_msm.cpp
/pair_lj_charmm_coul_msm.h
/pair_lj_class2.cpp
/pair_lj_class2.h
/pair_lj_class2_coul_cut.cpp
/pair_lj_class2_coul_cut.h
/pair_lj_class2_coul_long.cpp
/pair_lj_class2_coul_long.h
/pair_lj_coul.cpp
/pair_lj_coul.h
/pair_coul_cut_soft.cpp
/pair_coul_cut_soft.h
/pair_coul_long_soft.cpp
/pair_coul_long_soft.h
/pair_lj_cut_coul_cut_soft.cpp
/pair_lj_cut_coul_cut_soft.h
/pair_lj_cut_tip4p_cut.cpp
/pair_lj_cut_tip4p_cut.h
/pair_lj_cut_coul_long.cpp
/pair_lj_cut_coul_long.h
/pair_lj_cut_coul_long_opt.cpp
/pair_lj_cut_coul_long_opt.h
/pair_lj_cut_coul_long_soft.cpp
/pair_lj_cut_coul_long_soft.h
/pair_lj_cut_coul_msm.cpp
/pair_lj_cut_coul_msm.h
/pair_lj_cut_dipole_cut.cpp
/pair_lj_cut_dipole_cut.h
/pair_lj_cut_dipole_long.cpp
/pair_lj_cut_dipole_long.h
/pair_lj_cut_*hars_*.cpp
/pair_lj_cut_*hars_*.h
/pair_lj_cut_soft.cpp
/pair_lj_cut_soft.h
/pair_lj_cut_tip4p_long.cpp
/pair_lj_cut_tip4p_long.h
/pair_lj_cut_tip4p_long_opt.cpp
/pair_lj_cut_tip4p_long_opt.h
/pair_lj_cut_tip4p_long_soft.cpp
/pair_lj_cut_tip4p_long_soft.h
/pair_lj_long_coul_long.cpp
/pair_lj_long_coul_long.h
/pair_lj_long_coul_long_opt.cpp
/pair_lj_long_coul_long_opt.h
/pair_lj_long_dipole_long.cpp
/pair_lj_long_dipole_long.h
/pair_lj_long_tip4p_long.cpp
/pair_lj_long_tip4p_long.h
/pair_lj_cut_opt.cpp
/pair_lj_cut_opt.h
/pair_lj_cut_tgpu.cpp
/pair_lj_cut_tgpu.h
/pair_lj_sdk.cpp
/pair_lj_sdk.h
/pair_lj_sdk_coul_long.cpp
/pair_lj_sdk_coul_long.h
/pair_lj_sdk_coul_msm.cpp
/pair_lj_sdk_coul_msm.h
/pair_lj_sf.cpp
/pair_lj_sf.h
/pair_lj_sf_dipole_sf.cpp
/pair_lj_sf_dipole_sf.h
/pair_lubricateU.cpp
/pair_lubricateU.h
/pair_lubricateU_poly.cpp
/pair_lubricateU_poly.h
/pair_lubricate_poly.cpp
/pair_lubricate_poly.h
/pair_lubricate.cpp
/pair_lubricate.h
/pair_meam.cpp
/pair_meam.h
/pair_meam_spline.cpp
/pair_meam_spline.h
/pair_meam_sw_spline.cpp
/pair_meam_sw_spline.h
/pair_morse_opt.cpp
/pair_morse_opt.h
/pair_morse_soft.cpp
/pair_morse_soft.h
/pair_nb3b_harmonic.cpp
/pair_nb3b_harmonic.h
/pair_nm_cut.cpp
/pair_nm_cut.h
/pair_nm_cut_coul_cut.cpp
/pair_nm_cut_coul_cut.h
/pair_nm_cut_coul_long.cpp
/pair_nm_cut_coul_long.h
/pair_peri_eps.cpp
/pair_peri_eps.h
/pair_peri_lps.cpp
/pair_peri_lps.h
/pair_peri_pmb.cpp
/pair_peri_pmb.h
/pair_peri_ves.cpp
/pair_peri_ves.h
/pair_reax.cpp
/pair_reax.h
/pair_reax_fortran.h
/pair_reax_c.cpp
/pair_reax_c.h
/pair_rebo.cpp
/pair_rebo.h
/pair_resquared.cpp
/pair_resquared.h
/pair_sph_heatconduction.cpp
/pair_sph_heatconduction.h
/pair_sph_idealgas.cpp
/pair_sph_idealgas.h
/pair_sph_lj.cpp
/pair_sph_lj.h
/pair_sph_rhosum.cpp
/pair_sph_rhosum.h
/pair_sph_taitwater.cpp
/pair_sph_taitwater.h
/pair_sph_taitwater_morris.cpp
/pair_sph_taitwater_morris.h
/pair_sw.cpp
/pair_sw.h
/pair_tersoff.cpp
/pair_tersoff.h
/pair_tersoff_mod.cpp
/pair_tersoff_mod.h
/pair_tersoff_mod_c.cpp
/pair_tersoff_mod_c.h
/pair_tersoff_table.cpp
/pair_tersoff_table.h
/pair_tersoff_zbl.cpp
/pair_tersoff_zbl.h
/pair_tip4p_cut.cpp
/pair_tip4p_cut.h
/pair_tip4p_long.cpp
/pair_tip4p_long.h
/pair_tip4p_long_soft.cpp
/pair_tip4p_long_soft.h
/pair_tri_lj.cpp
/pair_tri_lj.h
/pair_yukawa_colloid.cpp
/pair_yukawa_colloid.h
/pppm.cpp
/pppm.h
/pppm_cg.cpp
/pppm_cg.h
/pppm_disp.cpp
/pppm_disp.h
/pppm_disp_tip4p.cpp
/pppm_disp_tip4p.h
/pppm_old.cpp
/pppm_old.h
/pppm_proxy.cpp
/pppm_proxy.h
/pppm_stagger.cpp
/pppm_stagger.h
/pppm_tip4p.cpp
/pppm_tip4p.h
/pppm_tip4p_proxy.cpp
/pppm_tip4p_proxy.h
/pppm_tip4p_cg.cpp
/pppm_tip4p_cg.h
/prd.cpp
/prd.h
/python.cpp
/python.h
/reader_molfile.cpp
/reader_molfile.h
/reaxc_allocate.cpp
/reaxc_allocate.h
/reaxc_basic_comm.cpp
/reaxc_basic_comm.h
/reaxc_bond_orders.cpp
/reaxc_bond_orders.h
/reaxc_bonds.cpp
/reaxc_bonds.h
/reaxc_control.cpp
/reaxc_control.h
/reaxc_defs.h
/reaxc_ffield.cpp
/reaxc_ffield.h
/reaxc_forces.cpp
/reaxc_forces.h
/reaxc_hydrogen_bonds.cpp
/reaxc_hydrogen_bonds.h
/reaxc_init_md.cpp
/reaxc_init_md.h
/reaxc_io_tools.cpp
/reaxc_io_tools.h
/reaxc_list.cpp
/reaxc_list.h
/reaxc_lookup.cpp
/reaxc_lookup.h
/reaxc_multi_body.cpp
/reaxc_multi_body.h
/reaxc_nonbonded.cpp
/reaxc_nonbonded.h
/reaxc_reset_tools.cpp
/reaxc_reset_tools.h
/reaxc_system_props.cpp
/reaxc_system_props.h
/reaxc_tool_box.cpp
/reaxc_tool_box.h
/reaxc_torsion_angles.cpp
/reaxc_torsion_angles.h
/reaxc_traj.cpp
/reaxc_traj.h
/reaxc_types.h
/reaxc_valence_angles.cpp
/reaxc_valence_angles.h
/reaxc_vector.cpp
/reaxc_vector.h
/remap.cpp
/remap.h
/remap_wrap.cpp
/remap_wrap.h
/restart_mpiio.cpp
/restart_mpiio.h
/smd_kernels.h
/smd_material_models.cpp
/smd_material_models.h
/smd_math.h
/tad.cpp
/tad.h
/temper.cpp
/temper.h
/temper_grem.cpp
/temper_grem.h
/thr_data.cpp
/thr_data.h
/verlet_split.cpp
/verlet_split.h
/write_dump.cpp
/write_dump.h
/xdr_compat.cpp
/xdr_compat.h
/atom_vec_smd.cpp
/atom_vec_smd.h
/compute_saed.cpp
/compute_saed.h
/compute_saed_consts.h
/compute_smd_contact_radius.cpp
/compute_smd_contact_radius.h
/compute_smd_damage.cpp
/compute_smd_damage.h
/compute_smd_hourglass_error.cpp
/compute_smd_hourglass_error.h
/compute_smd_internal_energy.cpp
/compute_smd_internal_energy.h
/compute_smd_plastic_strain.cpp
/compute_smd_plastic_strain.h
/compute_smd_plastic_strain_rate.cpp
/compute_smd_plastic_strain_rate.h
/compute_smd_rho.cpp
/compute_smd_rho.h
/compute_smd_tlsph_defgrad.cpp
/compute_smd_tlsph_defgrad.h
/compute_smd_tlsph_dt.cpp
/compute_smd_tlsph_dt.h
/compute_smd_tlsph_num_neighs.cpp
/compute_smd_tlsph_num_neighs.h
/compute_smd_tlsph_shape.cpp
/compute_smd_tlsph_shape.h
/compute_smd_tlsph_strain.cpp
/compute_smd_tlsph_strain.h
/compute_smd_tlsph_strain_rate.cpp
/compute_smd_tlsph_strain_rate.h
/compute_smd_tlsph_stress.cpp
/compute_smd_tlsph_stress.h
/compute_smd_triangle_mesh_vertices.cpp
/compute_smd_triangle_mesh_vertices.h
/compute_smd_ulsph_effm.cpp
/compute_smd_ulsph_effm.h
/compute_smd_ulsph_num_neighs.cpp
/compute_smd_ulsph_num_neighs.h
/compute_smd_ulsph_strain.cpp
/compute_smd_ulsph_strain.h
/compute_smd_ulsph_strain_rate.cpp
/compute_smd_ulsph_strain_rate.h
/compute_smd_ulsph_stress.cpp
/compute_smd_ulsph_stress.h
/compute_smd_vol.cpp
/compute_smd_vol.h
/compute_temp_cs.cpp
/compute_temp_cs.h
/compute_temp_drude.cpp
/compute_temp_drude.h
/compute_xrd.cpp
/compute_xrd.h
/compute_xrd_consts.h
/fix_atom_swap.cpp
/fix_atom_swap.h
/fix_ave_spatial_sphere.cpp
/fix_ave_spatial_sphere.h
/fix_drude.cpp
/fix_drude.h
/fix_drude_transform.cpp
/fix_drude_transform.h
/fix_langevin_drude.cpp
/fix_langevin_drude.h
/fix_pimd.cpp
/fix_pimd.h
/fix_qbmsst.cpp
/fix_qbmsst.h
/fix_qtb.cpp
/fix_qtb.h
/fix_rattle.cpp
/fix_rattle.h
/fix_saed_vtk.cpp
/fix_saed_vtk.h
/fix_smd_adjust_dt.cpp
/fix_smd_adjust_dt.h
/fix_smd_integrate_tlsph.cpp
/fix_smd_integrate_tlsph.h
/fix_smd_integrate_ulsph.cpp
/fix_smd_integrate_ulsph.h
/fix_smd_move_triangulated_surface.cpp
/fix_smd_move_triangulated_surface.h
/fix_smd_setvel.cpp
/fix_smd_setvel.h
/fix_smd_tlsph_reference_configuration.cpp
/fix_smd_tlsph_reference_configuration.h
/fix_smd_wall_surface.cpp
/fix_smd_wall_surface.h
/fix_srp.cpp
/fix_srp.h
/fix_tfmc.cpp
/fix_tfmc.h
/fix_ttm_mod.cpp
/fix_ttm_mod.h
/pair_born_coul_long_cs.cpp
/pair_born_coul_long_cs.h
/pair_born_coul_dsf_cs.cpp
/pair_born_coul_dsf_cs.h
/pair_buck_coul_long_cs.cpp
/pair_buck_coul_long_cs.h
/pair_coul_long_cs.cpp
/pair_coul_long_cs.h
/pair_lj_cut_thole_long.cpp
/pair_lj_cut_thole_long.h
/pair_plum_hb.cpp
/pair_plum_hb.h
/pair_plum_hp.cpp
/pair_plum_hp.h
/pair_polymorphic.cpp
/pair_polymorphic.h
/pair_smd_hertz.cpp
/pair_smd_hertz.h
/pair_smd_tlsph.cpp
/pair_smd_tlsph.h
/pair_smd_triangulated_surface.cpp
/pair_smd_triangulated_surface.h
/pair_smd_ulsph.cpp
/pair_smd_ulsph.h
/pair_srp.cpp
/pair_srp.h
/pair_thole.cpp
/pair_thole.h
/pair_buck_mdf.cpp
/pair_buck_mdf.h
/pair_dpd_conservative.cpp
/pair_dpd_conservative.h
/pair_dpd_fdt.cpp
/pair_dpd_fdt.h
/pair_dpd_fdt_energy.cpp
/pair_dpd_fdt_energy.h
/pair_lennard_mdf.cpp
/pair_lennard_mdf.h
/pair_lj_cut_coul_long_cs.cpp
/pair_lj_cut_coul_long_cs.h
/pair_lj_mdf.cpp
/pair_lj_mdf.h
/pair_mgpt.cpp
/pair_mgpt.h
/pair_morse_smooth_linear.cpp
/pair_morse_smooth_linear.h
/pair_smtbq.cpp
/pair_smtbq.h
/pair_vashishta*.cpp
/pair_vashishta*.h
diff --git a/src/GRANULAR/fix_pour.cpp b/src/GRANULAR/fix_pour.cpp
index e8baca759..c4e03a24a 100644
--- a/src/GRANULAR/fix_pour.cpp
+++ b/src/GRANULAR/fix_pour.cpp
@@ -1,1071 +1,1071 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "fix_pour.h"
#include "atom.h"
#include "atom_vec.h"
#include "force.h"
#include "update.h"
#include "comm.h"
#include "molecule.h"
#include "modify.h"
#include "fix_gravity.h"
#include "domain.h"
#include "region.h"
#include "region_block.h"
#include "region_cylinder.h"
#include "random_park.h"
#include "math_extra.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
enum{ATOM,MOLECULE};
enum{ONE,RANGE,POLY};
enum{LAYOUT_UNIFORM,LAYOUT_NONUNIFORM,LAYOUT_TILED}; // several files
#define EPSILON 0.001
#define SMALL 1.0e-10
/* ---------------------------------------------------------------------- */
FixPour::FixPour(LAMMPS *lmp, int narg, char **arg) :
- Fix(lmp, narg, arg), radius_poly(NULL), frac_poly(NULL),
- idrigid(NULL), idshake(NULL), onemols(NULL), molfrac(NULL), coords(NULL),
- imageflags(NULL), fixrigid(NULL), fixshake(NULL), recvcounts(NULL),
+ Fix(lmp, narg, arg), radius_poly(NULL), frac_poly(NULL),
+ idrigid(NULL), idshake(NULL), onemols(NULL), molfrac(NULL), coords(NULL),
+ imageflags(NULL), fixrigid(NULL), fixshake(NULL), recvcounts(NULL),
displs(NULL), random(NULL), random2(NULL)
{
if (narg < 6) error->all(FLERR,"Illegal fix pour command");
time_depend = 1;
if (!atom->radius_flag || !atom->rmass_flag)
error->all(FLERR,"Fix pour requires atom attributes radius, rmass");
// required args
ninsert = force->inumeric(FLERR,arg[3]);
ntype = force->inumeric(FLERR,arg[4]);
seed = force->inumeric(FLERR,arg[5]);
if (seed <= 0) error->all(FLERR,"Illegal fix pour command");
// read options from end of input line
options(narg-6,&arg[6]);
// error check on type
if (mode == ATOM && (ntype <= 0 || ntype > atom->ntypes))
error->all(FLERR,"Invalid atom type in fix pour command");
// error checks on region and its extent being inside simulation box
if (iregion == -1) error->all(FLERR,"Must specify a region in fix pour");
if (domain->regions[iregion]->bboxflag == 0)
error->all(FLERR,"Fix pour region does not support a bounding box");
if (domain->regions[iregion]->dynamic_check())
error->all(FLERR,"Fix pour region cannot be dynamic");
if (strcmp(domain->regions[iregion]->style,"block") == 0) {
region_style = 1;
xlo = ((RegBlock *) domain->regions[iregion])->xlo;
xhi = ((RegBlock *) domain->regions[iregion])->xhi;
ylo = ((RegBlock *) domain->regions[iregion])->ylo;
yhi = ((RegBlock *) domain->regions[iregion])->yhi;
zlo = ((RegBlock *) domain->regions[iregion])->zlo;
zhi = ((RegBlock *) domain->regions[iregion])->zhi;
if (xlo < domain->boxlo[0] || xhi > domain->boxhi[0] ||
ylo < domain->boxlo[1] || yhi > domain->boxhi[1] ||
zlo < domain->boxlo[2] || zhi > domain->boxhi[2])
error->all(FLERR,"Insertion region extends outside simulation box");
} else if (strcmp(domain->regions[iregion]->style,"cylinder") == 0) {
region_style = 2;
char axis = ((RegCylinder *) domain->regions[iregion])->axis;
xc = ((RegCylinder *) domain->regions[iregion])->c1;
yc = ((RegCylinder *) domain->regions[iregion])->c2;
rc = ((RegCylinder *) domain->regions[iregion])->radius;
zlo = ((RegCylinder *) domain->regions[iregion])->lo;
zhi = ((RegCylinder *) domain->regions[iregion])->hi;
if (axis != 'z')
error->all(FLERR,"Must use a z-axis cylinder region with fix pour");
if (xc-rc < domain->boxlo[0] || xc+rc > domain->boxhi[0] ||
yc-rc < domain->boxlo[1] || yc+rc > domain->boxhi[1] ||
zlo < domain->boxlo[2] || zhi > domain->boxhi[2])
error->all(FLERR,"Insertion region extends outside simulation box");
} else error->all(FLERR,"Must use a block or cylinder region with fix pour");
if (region_style == 2 && domain->dimension == 2)
error->all(FLERR,
"Must use a block region with fix pour for 2d simulations");
// error check and further setup for mode = MOLECULE
if (atom->tag_enable == 0)
error->all(FLERR,"Cannot use fix_pour unless atoms have IDs");
if (mode == MOLECULE) {
for (int i = 0; i < nmol; i++) {
if (onemols[i]->xflag == 0)
error->all(FLERR,"Fix pour molecule must have coordinates");
if (onemols[i]->typeflag == 0)
error->all(FLERR,"Fix pour molecule must have atom types");
if (ntype+onemols[i]->ntypes <= 0 ||
ntype+onemols[i]->ntypes > atom->ntypes)
error->all(FLERR,"Invalid atom type in fix pour mol command");
if (atom->molecular == 2 && onemols != atom->avec->onemols)
error->all(FLERR,"Fix pour molecule template ID must be same "
"as atom style template ID");
onemols[i]->check_attributes(0);
// fix pour uses geometric center of molecule for insertion
onemols[i]->compute_center();
}
}
if (rigidflag && mode == ATOM)
error->all(FLERR,"Cannot use fix pour rigid and not molecule");
if (shakeflag && mode == ATOM)
error->all(FLERR,"Cannot use fix pour shake and not molecule");
if (rigidflag && shakeflag)
error->all(FLERR,"Cannot use fix pour rigid and shake");
// setup of coords and imageflags array
if (mode == ATOM) natom_max = 1;
else {
natom_max = 0;
for (int i = 0; i < nmol; i++)
natom_max = MAX(natom_max,onemols[i]->natoms);
}
memory->create(coords,natom_max,4,"pour:coords");
memory->create(imageflags,natom_max,"pour:imageflags");
// find max atom and molecule IDs just once
if (idnext) find_maxid();
// random number generator, same for all procs
random = new RanPark(lmp,seed);
// allgather arrays
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
recvcounts = new int[nprocs];
displs = new int[nprocs];
// grav = gravity in distance/time^2 units
// assume grav = -magnitude at this point, enforce in init()
int ifix;
for (ifix = 0; ifix < modify->nfix; ifix++) {
if (strcmp(modify->fix[ifix]->style,"gravity") == 0) break;
if (strcmp(modify->fix[ifix]->style,"gravity/omp") == 0) break;
}
if (ifix == modify->nfix)
error->all(FLERR,"No fix gravity defined for fix pour");
grav = - ((FixGravity *) modify->fix[ifix])->magnitude * force->ftm2v;
// nfreq = timesteps between insertions
// should be time for a particle to fall from top of insertion region
// to bottom, taking into account that the region may be moving
// set these 2 eqs equal to each other, solve for smallest positive t
// x = zhi + vz*t + 1/2 grav t^2
// x = zlo + rate*t
// gives t = [-(vz-rate) - sqrt((vz-rate)^2 - 2*grav*(zhi-zlo))] / grav
// where zhi-zlo > 0, grav < 0, and vz & rate can be either > 0 or < 0
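// equivalently, setting the two equations equal gives the quadratic
//   1/2 grav t^2 + (vz-rate) t + (zhi-zlo) = 0
// with roots t = [-(vz-rate) +/- sqrt((vz-rate)^2 - 2*grav*(zhi-zlo))] / grav
// because grav < 0 and zhi-zlo > 0, the discriminant exceeds (vz-rate)^2,
// so the root with the minus sign is the only positive solution, used below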
double v_relative,delta;
if (domain->dimension == 3) {
v_relative = vz - rate;
delta = zhi - zlo;
} else {
v_relative = vy - rate;
delta = yhi - ylo;
}
double t =
(-v_relative - sqrt(v_relative*v_relative - 2.0*grav*delta)) / grav;
nfreq = static_cast<int> (t/update->dt + 0.5);
// 1st insertion on next timestep
force_reneighbor = 1;
next_reneighbor = update->ntimestep + 1;
nfirst = next_reneighbor;
ninserted = 0;
// nper = # to insert each time
// depends on specified volume fraction
// volume = volume of insertion region
// volume_one = volume of inserted particle (with max possible radius)
// in 3d, insure dy >= 1, for quasi-2d simulations
double volume,volume_one=1.0;
molradius_max = 0.0;
if (mode == MOLECULE) {
for (int i = 0; i < nmol; i++)
molradius_max = MAX(molradius_max,onemols[i]->molradius);
}
if (domain->dimension == 3) {
if (region_style == 1) {
double dy = yhi - ylo;
if (dy < 1.0) dy = 1.0;
volume = (xhi-xlo) * dy * (zhi-zlo);
} else volume = MY_PI*rc*rc * (zhi-zlo);
if (mode == MOLECULE) {
volume_one = 4.0/3.0 * MY_PI * molradius_max*molradius_max*molradius_max;
} else if (dstyle == ONE || dstyle == RANGE) {
volume_one = 4.0/3.0 * MY_PI * radius_max*radius_max*radius_max;
} else if (dstyle == POLY) {
volume_one = 0.0;
for (int i = 0; i < npoly; i++)
volume_one += (4.0/3.0 * MY_PI *
radius_poly[i]*radius_poly[i]*radius_poly[i]) * frac_poly[i];
}
} else {
volume = (xhi-xlo) * (yhi-ylo);
if (mode == MOLECULE) {
volume_one = MY_PI * molradius_max*molradius_max;
} else if (dstyle == ONE || dstyle == RANGE) {
volume_one = MY_PI * radius_max*radius_max;
} else if (dstyle == POLY) {
volume_one = 0.0;
for (int i = 0; i < npoly; i++)
volume_one += (MY_PI * radius_poly[i]*radius_poly[i]) * frac_poly[i];
}
}
nper = static_cast<int> (volfrac*volume/volume_one);
if (nper == 0) error->all(FLERR,"Fix pour insertion count per timestep is 0");
int nfinal = update->ntimestep + 1 + (ninsert-1)/nper * nfreq;
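// illustrative numbers only (not from an actual run): with ninsert = 1000,
// nper = 200, and nfreq = 50, insertions occur on steps ntimestep+1, +51,
// +101, +151, +201, and nfinal = ntimestep + 1 + (999/200)*50 = ntimestep + 201
// since the integer division (ninsert-1)/nper yields 4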
// print stats
if (me == 0) {
if (screen)
fprintf(screen,
"Particle insertion: %d every %d steps, %d by step %d\n",
nper,nfreq,ninsert,nfinal);
if (logfile)
fprintf(logfile,
"Particle insertion: %d every %d steps, %d by step %d\n",
nper,nfreq,ninsert,nfinal);
}
}
/* ---------------------------------------------------------------------- */
FixPour::~FixPour()
{
delete random;
delete [] molfrac;
delete [] idrigid;
delete [] idshake;
delete [] radius_poly;
delete [] frac_poly;
memory->destroy(coords);
memory->destroy(imageflags);
delete [] recvcounts;
delete [] displs;
}
/* ---------------------------------------------------------------------- */
int FixPour::setmask()
{
int mask = 0;
mask |= PRE_EXCHANGE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixPour::init()
{
if (domain->triclinic)
error->all(FLERR,"Cannot use fix pour with triclinic box");
// insure gravity fix exists
// for 3d must point in -z, for 2d must point in -y
// else insertion cannot work
int ifix;
for (ifix = 0; ifix < modify->nfix; ifix++) {
if (strcmp(modify->fix[ifix]->style,"gravity") == 0) break;
if (strcmp(modify->fix[ifix]->style,"gravity/omp") == 0) break;
}
if (ifix == modify->nfix)
error->all(FLERR,"No fix gravity defined for fix pour");
double xgrav = ((FixGravity *) modify->fix[ifix])->xgrav;
double ygrav = ((FixGravity *) modify->fix[ifix])->ygrav;
double zgrav = ((FixGravity *) modify->fix[ifix])->zgrav;
if (domain->dimension == 3) {
if (fabs(xgrav) > EPSILON || fabs(ygrav) > EPSILON ||
fabs(zgrav+1.0) > EPSILON)
error->all(FLERR,"Gravity must point in -z to use with fix pour in 3d");
} else {
if (fabs(xgrav) > EPSILON || fabs(ygrav+1.0) > EPSILON ||
fabs(zgrav) > EPSILON)
error->all(FLERR,"Gravity must point in -y to use with fix pour in 2d");
}
double gnew = - ((FixGravity *) modify->fix[ifix])->magnitude * force->ftm2v;
if (gnew != grav)
error->all(FLERR,"Gravity changed since fix pour was created");
// if rigidflag defined, check for rigid/small fix
// its molecule template must be same as this one
fixrigid = NULL;
if (rigidflag) {
int ifix = modify->find_fix(idrigid);
if (ifix < 0) error->all(FLERR,"Fix pour rigid fix does not exist");
fixrigid = modify->fix[ifix];
int tmp;
if (onemols != (Molecule **) fixrigid->extract("onemol",tmp))
error->all(FLERR,
"Fix pour and fix rigid/small not using "
"same molecule template ID");
}
// if shakeflag defined, check for SHAKE fix
// its molecule template must be same as this one
fixshake = NULL;
if (shakeflag) {
int ifix = modify->find_fix(idshake);
if (ifix < 0) error->all(FLERR,"Fix pour shake fix does not exist");
fixshake = modify->fix[ifix];
int tmp;
if (onemols != (Molecule **) fixshake->extract("onemol",tmp))
error->all(FLERR,"Fix pour and fix shake not using "
"same molecule template ID");
}
}
/* ----------------------------------------------------------------------
perform particle insertion
------------------------------------------------------------------------- */
void FixPour::pre_exchange()
{
int i,m,flag,nlocalprev,imol,natom;
double r[3],rotmat[3][3],quat[4],vnew[3];
double *newcoord;
// just return if should not be called on this timestep
if (next_reneighbor != update->ntimestep) return;
// clear ghost count and any ghost bonus data internal to AtomVec
// same logic as beginning of Comm::exchange()
// do it now b/c inserting atoms will overwrite ghost atoms
atom->nghost = 0;
atom->avec->clear_bonus();
// find current max atom and molecule IDs on every insertion step
if (!idnext) find_maxid();
// nnew = # of particles (atoms or molecules) to insert this timestep
int nnew = nper;
if (ninserted + nnew > ninsert) nnew = ninsert - ninserted;
// lo/hi current = z (or y) bounds of insertion region this timestep
int dimension = domain->dimension;
if (dimension == 3) {
lo_current = zlo + (update->ntimestep - nfirst) * update->dt * rate;
hi_current = zhi + (update->ntimestep - nfirst) * update->dt * rate;
} else {
lo_current = ylo + (update->ntimestep - nfirst) * update->dt * rate;
hi_current = yhi + (update->ntimestep - nfirst) * update->dt * rate;
}
// ncount = # of my atoms that overlap the insertion region
// nprevious = total of ncount across all procs
int ncount = 0;
for (i = 0; i < atom->nlocal; i++)
if (overlap(i)) ncount++;
int nprevious;
MPI_Allreduce(&ncount,&nprevious,1,MPI_INT,MPI_SUM,world);
// xmine is for my atoms
// xnear is for atoms from all procs + atoms to be inserted
double **xmine,**xnear;
memory->create(xmine,ncount,4,"fix_pour:xmine");
memory->create(xnear,nprevious+nnew*natom_max,4,"fix_pour:xnear");
int nnear = nprevious;
// setup for allgatherv
int n = 4*ncount;
MPI_Allgather(&n,1,MPI_INT,recvcounts,1,MPI_INT,world);
displs[0] = 0;
for (int iproc = 1; iproc < nprocs; iproc++)
displs[iproc] = displs[iproc-1] + recvcounts[iproc-1];
// load up xmine array
double **x = atom->x;
double *radius = atom->radius;
ncount = 0;
for (i = 0; i < atom->nlocal; i++)
if (overlap(i)) {
xmine[ncount][0] = x[i][0];
xmine[ncount][1] = x[i][1];
xmine[ncount][2] = x[i][2];
xmine[ncount][3] = radius[i];
ncount++;
}
// perform allgatherv to acquire list of nearby particles on all procs
double *ptr = NULL;
if (ncount) ptr = xmine[0];
MPI_Allgatherv(ptr,4*ncount,MPI_DOUBLE,
xnear[0],recvcounts,displs,MPI_DOUBLE,world);
// insert new particles into xnear list, one by one
// check against all nearby atoms and previously inserted ones
// if there is an overlap then try again at same z (3d) or y (2d) coord
// else insert by adding to xnear list
// max = maximum # of insertion attempts for all particles
// h = height, biased to give uniform distribution in time of insertion
// for MOLECULE mode:
// coords = coords of all atoms in particle
// perform random rotation around center pt
// apply PBC so final coords are inside box
// store image flag modified due to PBC
int success;
double radtmp,delx,dely,delz,rsq,radsum,rn,h;
double coord[3];
double denstmp;
double *sublo = domain->sublo;
double *subhi = domain->subhi;
int nsuccess = 0;
int attempt = 0;
int maxiter = nnew * maxattempt;
while (nsuccess < nnew) {
rn = random->uniform();
h = hi_current - rn*rn * (hi_current-lo_current);
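// interpretation of the "biased" comment above: for free fall from rest the
// drop time scales as sqrt(drop distance), so a drop distance proportional to
// rn^2 makes the fall time proportional to rn, i.e. roughly uniform in time;
// this is only approximate when vz is non-zero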
if (mode == ATOM) radtmp = radius_sample();
success = 0;
while (attempt < maxiter) {
attempt++;
xyz_random(h,coord);
if (mode == ATOM) {
natom = 1;
coords[0][0] = coord[0];
coords[0][1] = coord[1];
coords[0][2] = coord[2];
coords[0][3] = radtmp;
imageflags[0] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
} else {
double rng = random->uniform();
imol = 0;
while (rng > molfrac[imol]) imol++;
natom = onemols[imol]->natoms;
if (dimension == 3) {
r[0] = random->uniform() - 0.5;
r[1] = random->uniform() - 0.5;
r[2] = random->uniform() - 0.5;
} else {
r[0] = r[1] = 0.0;
r[2] = 1.0;
}
double theta = random->uniform() * MY_2PI;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
for (i = 0; i < natom; i++) {
MathExtra::matvec(rotmat,onemols[imol]->dx[i],coords[i]);
coords[i][0] += coord[0];
coords[i][1] += coord[1];
coords[i][2] += coord[2];
// coords[3] = particle radius
// default to 0.5, if radii not defined in Molecule
// same as atom->avec->create_atom(), invoked below
if (onemols[imol]->radiusflag)
coords[i][3] = onemols[imol]->radius[i];
else coords[i][3] = 0.5;
imageflags[i] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
domain->remap(coords[i],imageflags[i]);
}
}
// if any pair of atoms overlap, try again
// use minimum_image() to account for PBC
for (m = 0; m < natom; m++) {
for (i = 0; i < nnear; i++) {
delx = coords[m][0] - xnear[i][0];
dely = coords[m][1] - xnear[i][1];
delz = coords[m][2] - xnear[i][2];
- domain->minimum_image(delx,dely,delz);
+ domain->minimum_image(delx,dely,delz);
rsq = delx*delx + dely*dely + delz*delz;
radsum = coords[m][3] + xnear[i][3];
if (rsq <= radsum*radsum) break;
}
if (i < nnear) break;
}
if (m == natom) {
success = 1;
break;
}
}
if (!success) break;
// proceed with insertion
nsuccess++;
nlocalprev = atom->nlocal;
// add all atoms in particle to xnear
for (m = 0; m < natom; m++) {
xnear[nnear][0] = coords[m][0];
xnear[nnear][1] = coords[m][1];
xnear[nnear][2] = coords[m][2];
xnear[nnear][3] = coords[m][3];
nnear++;
}
// choose random velocity for new particle
// used for every atom in molecule
// z velocity set to what velocity would be if particle
// had fallen from top of insertion region
// this gives continuous stream of atoms
// solution for v from these 2 eqs, after eliminate t:
// v = vz + grav*t
// coord[2] = hi_current + vz*t + 1/2 grav t^2
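// eliminating t via t = (v - vz)/grav gives
//   coord[2] - hi_current = (v^2 - vz^2) / (2*grav)
// hence v^2 = vz^2 + 2*grav*(coord[2]-hi_current); the negative square
// root is taken below since the inserted particle moves downward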
if (dimension == 3) {
vnew[0] = vxlo + random->uniform() * (vxhi-vxlo);
vnew[1] = vylo + random->uniform() * (vyhi-vylo);
vnew[2] = -sqrt(vz*vz + 2.0*grav*(coord[2]-hi_current));
} else {
vnew[0] = vxlo + random->uniform() * (vxhi-vxlo);
vnew[1] = -sqrt(vy*vy + 2.0*grav*(coord[1]-hi_current));
vnew[2] = 0.0;
}
// check if new atoms are in my sub-box or above it if I am highest proc
// if so, add atom to my list via create_atom()
// initialize additional info about the atoms
// set group mask to "all" plus fix group
for (m = 0; m < natom; m++) {
if (mode == ATOM)
denstmp = density_lo + random->uniform() * (density_hi-density_lo);
newcoord = coords[m];
flag = 0;
if (newcoord[0] >= sublo[0] && newcoord[0] < subhi[0] &&
newcoord[1] >= sublo[1] && newcoord[1] < subhi[1] &&
newcoord[2] >= sublo[2] && newcoord[2] < subhi[2]) flag = 1;
else if (dimension == 3 && newcoord[2] >= domain->boxhi[2]) {
if (comm->layout != LAYOUT_TILED) {
if (comm->myloc[2] == comm->procgrid[2]-1 &&
newcoord[0] >= sublo[0] && newcoord[0] < subhi[0] &&
newcoord[1] >= sublo[1] && newcoord[1] < subhi[1]) flag = 1;
} else {
if (comm->mysplit[2][1] == 1.0 &&
newcoord[0] >= sublo[0] && newcoord[0] < subhi[0] &&
newcoord[1] >= sublo[1] && newcoord[1] < subhi[1]) flag = 1;
}
} else if (dimension == 2 && newcoord[1] >= domain->boxhi[1]) {
if (comm->layout != LAYOUT_TILED) {
if (comm->myloc[1] == comm->procgrid[1]-1 &&
newcoord[0] >= sublo[0] && newcoord[0] < subhi[0]) flag = 1;
} else {
if (comm->mysplit[1][1] == 1.0 &&
newcoord[0] >= sublo[0] && newcoord[0] < subhi[0]) flag = 1;
}
}
if (flag) {
if (mode == ATOM) atom->avec->create_atom(ntype,coords[m]);
else atom->avec->create_atom(ntype+onemols[imol]->type[m],coords[m]);
int n = atom->nlocal - 1;
atom->tag[n] = maxtag_all + m+1;
if (mode == MOLECULE) {
if (atom->molecule_flag) atom->molecule[n] = maxmol_all+1;
if (atom->molecular == 2) {
atom->molindex[n] = 0;
atom->molatom[n] = m;
}
}
atom->mask[n] = 1 | groupbit;
atom->image[n] = imageflags[m];
atom->v[n][0] = vnew[0];
atom->v[n][1] = vnew[1];
atom->v[n][2] = vnew[2];
if (mode == ATOM) {
radtmp = newcoord[3];
atom->radius[n] = radtmp;
atom->rmass[n] = 4.0*MY_PI/3.0 * radtmp*radtmp*radtmp * denstmp;
} else {
- onemols[imol]->quat_external = quat;
- atom->add_molecule_atom(onemols[imol],m,n,maxtag_all);
- }
-
+ onemols[imol]->quat_external = quat;
+ atom->add_molecule_atom(onemols[imol],m,n,maxtag_all);
+ }
+
modify->create_attribute(n);
}
}
// FixRigidSmall::set_molecule stores rigid body attributes
// coord is new position of geometric center of mol, not COM
// FixShake::set_molecule stores shake info for molecule
if (rigidflag)
fixrigid->set_molecule(nlocalprev,maxtag_all,imol,coord,vnew,quat);
else if (shakeflag)
fixshake->set_molecule(nlocalprev,maxtag_all,imol,coord,vnew,quat);
maxtag_all += natom;
if (mode == MOLECULE && atom->molecule_flag) maxmol_all++;
}
// warn if not successful with all insertions b/c too many attempts
int ninserted_atoms = nnear - nprevious;
int ninserted_mols = ninserted_atoms / natom;
ninserted += ninserted_mols;
if (ninserted_mols < nnew && me == 0)
error->warning(FLERR,"Less insertions than requested",0);
// reset global natoms,nbonds,etc
// increment maxtag_all and maxmol_all if necessary
// if global map exists, reset it now instead of waiting for comm
// since other pre-exchange fixes may use it
// invoke map_init() b/c atom count has grown
if (ninserted_atoms) {
atom->natoms += ninserted_atoms;
if (atom->natoms < 0)
error->all(FLERR,"Too many total atoms");
if (mode == MOLECULE) {
atom->nbonds += onemols[imol]->nbonds * ninserted_mols;
atom->nangles += onemols[imol]->nangles * ninserted_mols;
atom->ndihedrals += onemols[imol]->ndihedrals * ninserted_mols;
atom->nimpropers += onemols[imol]->nimpropers * ninserted_mols;
}
if (maxtag_all >= MAXTAGINT)
error->all(FLERR,"New atom IDs exceed maximum allowed ID");
if (atom->map_style) {
atom->map_init();
atom->map_set();
}
}
// free local memory
memory->destroy(xmine);
memory->destroy(xnear);
// next timestep to insert
if (ninserted < ninsert) next_reneighbor += nfreq;
else next_reneighbor = 0;
}
/* ----------------------------------------------------------------------
maxtag_all = current max atom ID for all atoms
maxmol_all = current max molecule ID for all atoms
------------------------------------------------------------------------- */
void FixPour::find_maxid()
{
tagint *tag = atom->tag;
tagint *molecule = atom->molecule;
int nlocal = atom->nlocal;
tagint max = 0;
for (int i = 0; i < nlocal; i++) max = MAX(max,tag[i]);
MPI_Allreduce(&max,&maxtag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
if (mode == MOLECULE && molecule) {
max = 0;
for (int i = 0; i < nlocal; i++) max = MAX(max,molecule[i]);
MPI_Allreduce(&max,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
}
}
/* ----------------------------------------------------------------------
check if particle i could overlap with a particle inserted into region
return 1 if yes, 0 if no
for ATOM mode, use delta with maximum size for inserted atoms
for MOLECULE mode, use delta with max radius of inserted molecules
if ignore line/tri set, ignore line or tri particles
account for PBC in overlap decision via outside() and minimum_image()
------------------------------------------------------------------------- */
int FixPour::overlap(int i)
{
double delta;
if (ignoreflag) {
if (ignoreline && atom->line[i] >= 0) return 0;
if (ignoretri && atom->tri[i] >= 0) return 0;
}
if (mode == ATOM) delta = atom->radius[i] + radius_max;
else delta = atom->radius[i] + molradius_max;
double *x = atom->x[i];
if (domain->dimension == 3) {
if (region_style == 1) {
if (outside(0,x[0],xlo-delta,xhi+delta)) return 0;
if (outside(1,x[1],ylo-delta,yhi+delta)) return 0;
if (outside(2,x[2],lo_current-delta,hi_current+delta)) return 0;
} else {
double delx = x[0] - xc;
double dely = x[1] - yc;
double delz = 0.0;
domain->minimum_image(delx,dely,delz);
double rsq = delx*delx + dely*dely;
double r = rc + delta;
if (rsq > r*r) return 0;
if (outside(2,x[2],lo_current-delta,hi_current+delta)) return 0;
}
} else {
if (outside(0,x[0],xlo-delta,xhi+delta)) return 0;
if (outside(1,x[1],lo_current-delta,hi_current+delta)) return 0;
}
return 1;
}
/* ----------------------------------------------------------------------
check if value is inside/outside lo/hi bounds in dimension
account for PBC if needed
return 1 if value is outside, 0 if inside
------------------------------------------------------------------------- */
int FixPour::outside(int dim, double value, double lo, double hi)
{
double boxlo = domain->boxlo[dim];
double boxhi = domain->boxhi[dim];
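// wrapped-slab example (hypothetical numbers): box [0,10) with lo = -1, hi = 2
// covers [9,10) U [0,2], so a value is "outside" only if 2 < value < 9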
if (domain->periodicity[dim]) {
if (lo < boxlo && hi > boxhi) {
return 0;
} else if (lo < boxlo) {
if (value > hi && value < lo + domain->prd[dim]) return 1;
} else if (hi > boxhi) {
if (value > hi - domain->prd[dim] && value < lo) return 1;
} else {
if (value < lo || value > hi) return 1;
}
// if periodic, the wrapped-slab checks above already flagged outside values
return 0;
}
if (value < lo || value > hi) return 1;
return 0;
}
/* ---------------------------------------------------------------------- */
void FixPour::xyz_random(double h, double *coord)
{
if (domain->dimension == 3) {
if (region_style == 1) {
coord[0] = xlo + random->uniform() * (xhi-xlo);
coord[1] = ylo + random->uniform() * (yhi-ylo);
coord[2] = h;
} else {
double r1,r2;
while (1) {
r1 = random->uniform() - 0.5;
r2 = random->uniform() - 0.5;
if (r1*r1 + r2*r2 < 0.25) break;
}
coord[0] = xc + 2.0*r1*rc;
coord[1] = yc + 2.0*r2*rc;
coord[2] = h;
}
} else {
coord[0] = xlo + random->uniform() * (xhi-xlo);
coord[1] = h;
coord[2] = 0.0;
}
}
/* ---------------------------------------------------------------------- */
double FixPour::radius_sample()
{
if (dstyle == ONE) return radius_one;
if (dstyle == RANGE) return radius_lo +
random->uniform()*(radius_hi-radius_lo);
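// POLY: walk the cumulative distribution of frac_poly
// e.g. (hypothetical) frac_poly = {0.2,0.5,0.3}, value = 0.6: sum passes 0.6
// after adding 0.5 (i = 2), so radius_poly[1] is returned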
double value = random->uniform();
int i = 0;
double sum = 0.0;
while (sum < value) {
sum += frac_poly[i];
i++;
}
return radius_poly[i-1];
}
/* ----------------------------------------------------------------------
parse optional parameters at end of input line
------------------------------------------------------------------------- */
void FixPour::options(int narg, char **arg)
{
// defaults
iregion = -1;
mode = ATOM;
molfrac = NULL;
rigidflag = 0;
idrigid = NULL;
shakeflag = 0;
idshake = NULL;
idnext = 0;
ignoreflag = ignoreline = ignoretri = 0;
dstyle = ONE;
radius_max = radius_one = 0.5;
radius_poly = frac_poly = NULL;
density_lo = density_hi = 1.0;
volfrac = 0.25;
maxattempt = 50;
rate = 0.0;
vxlo = vxhi = vylo = vyhi = vy = vz = 0.0;
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"region") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
iregion = domain->find_region(arg[iarg+1]);
if (iregion == -1) error->all(FLERR,"Fix pour region ID does not exist");
iarg += 2;
} else if (strcmp(arg[iarg],"mol") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
int imol = atom->find_molecule(arg[iarg+1]);
if (imol == -1)
error->all(FLERR,"Molecule template ID for fix pour does not exist");
mode = MOLECULE;
onemols = &atom->molecules[imol];
nmol = onemols[0]->nset;
delete [] molfrac;
molfrac = new double[nmol];
molfrac[0] = 1.0/nmol;
for (int i = 1; i < nmol-1; i++) molfrac[i] = molfrac[i-1] + 1.0/nmol;
molfrac[nmol-1] = 1.0;
iarg += 2;
} else if (strcmp(arg[iarg],"molfrac") == 0) {
if (mode != MOLECULE) error->all(FLERR,"Illegal fix pour command");
if (iarg+nmol+1 > narg) error->all(FLERR,"Illegal fix pour command");
molfrac[0] = force->numeric(FLERR,arg[iarg+1]);
for (int i = 1; i < nmol; i++)
molfrac[i] = molfrac[i-1] + force->numeric(FLERR,arg[iarg+i+1]);
if (molfrac[nmol-1] < 1.0-EPSILON || molfrac[nmol-1] > 1.0+EPSILON)
error->all(FLERR,"Illegal fix pour command");
molfrac[nmol-1] = 1.0;
iarg += nmol+1;
} else if (strcmp(arg[iarg],"rigid") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
int n = strlen(arg[iarg+1]) + 1;
delete [] idrigid;
idrigid = new char[n];
strcpy(idrigid,arg[iarg+1]);
rigidflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"shake") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
int n = strlen(arg[iarg+1]) + 1;
delete [] idshake;
idshake = new char[n];
strcpy(idshake,arg[iarg+1]);
shakeflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"id") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
if (strcmp(arg[iarg+1],"max") == 0) idnext = 0;
else if (strcmp(arg[iarg+1],"next") == 0) idnext = 1;
else error->all(FLERR,"Illegal fix pour command");
iarg += 2;
} else if (strcmp(arg[iarg],"ignore") == 0) {
if (atom->line_flag) ignoreline = 1;
if (atom->tri_flag) ignoretri = 1;
if (ignoreline || ignoretri) ignoreflag = 1;
iarg += 1;
} else if (strcmp(arg[iarg],"diam") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
if (strcmp(arg[iarg+1],"one") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix pour command");
dstyle = ONE;
radius_one = 0.5 * force->numeric(FLERR,arg[iarg+2]);
radius_max = radius_one;
iarg += 3;
} else if (strcmp(arg[iarg+1],"range") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal fix pour command");
dstyle = RANGE;
radius_lo = 0.5 * force->numeric(FLERR,arg[iarg+2]);
radius_hi = 0.5 * force->numeric(FLERR,arg[iarg+3]);
if (radius_lo > radius_hi) error->all(FLERR,"Illegal fix pour command");
radius_max = radius_hi;
iarg += 4;
} else if (strcmp(arg[iarg+1],"poly") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix pour command");
dstyle = POLY;
npoly = force->inumeric(FLERR,arg[iarg+2]);
if (npoly <= 0) error->all(FLERR,"Illegal fix pour command");
if (iarg+3 + 2*npoly > narg)
error->all(FLERR,"Illegal fix pour command");
radius_poly = new double[npoly];
frac_poly = new double[npoly];
iarg += 3;
radius_max = 0.0;
for (int i = 0; i < npoly; i++) {
radius_poly[i] = 0.5 * force->numeric(FLERR,arg[iarg++]);
frac_poly[i] = force->numeric(FLERR,arg[iarg++]);
if (radius_poly[i] <= 0.0 || frac_poly[i] < 0.0)
error->all(FLERR,"Illegal fix pour command");
radius_max = MAX(radius_max,radius_poly[i]);
}
double sum = 0.0;
for (int i = 0; i < npoly; i++) sum += frac_poly[i];
if (fabs(sum - 1.0) > SMALL)
error->all(FLERR,"Fix pour polydisperse fractions do not sum to 1.0");
} else error->all(FLERR,"Illegal fix pour command");
} else if (strcmp(arg[iarg],"dens") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix pour command");
density_lo = force->numeric(FLERR,arg[iarg+1]);
density_hi = force->numeric(FLERR,arg[iarg+2]);
if (density_lo > density_hi) error->all(FLERR,"Illegal fix pour command");
iarg += 3;
} else if (strcmp(arg[iarg],"vol") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix pour command");
volfrac = force->numeric(FLERR,arg[iarg+1]);
maxattempt = force->inumeric(FLERR,arg[iarg+2]);
iarg += 3;
} else if (strcmp(arg[iarg],"rate") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix pour command");
rate = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"vel") == 0) {
if (domain->dimension == 3) {
if (iarg+6 > narg) error->all(FLERR,"Illegal fix pour command");
vxlo = force->numeric(FLERR,arg[iarg+1]);
vxhi = force->numeric(FLERR,arg[iarg+2]);
vylo = force->numeric(FLERR,arg[iarg+3]);
vyhi = force->numeric(FLERR,arg[iarg+4]);
- if (vxlo > vxhi || vylo > vyhi)
+ if (vxlo > vxhi || vylo > vyhi)
error->all(FLERR,"Illegal fix pour command");
vz = force->numeric(FLERR,arg[iarg+5]);
iarg += 6;
} else {
if (iarg+4 > narg) error->all(FLERR,"Illegal fix pour command");
vxlo = force->numeric(FLERR,arg[iarg+1]);
vxhi = force->numeric(FLERR,arg[iarg+2]);
vy = force->numeric(FLERR,arg[iarg+3]);
vz = 0.0;
if (vxlo > vxhi) error->all(FLERR,"Illegal fix pour command");
iarg += 4;
}
} else error->all(FLERR,"Illegal fix pour command");
}
}
/* ---------------------------------------------------------------------- */
void FixPour::reset_dt()
{
error->all(FLERR,"Cannot change timestep with fix pour");
}
/* ----------------------------------------------------------------------
extract particle radius for atom type = itype
------------------------------------------------------------------------- */
void *FixPour::extract(const char *str, int &itype)
{
if (strcmp(str,"radius") == 0) {
if (mode == ATOM) {
if (itype == ntype) oneradius = radius_max;
else oneradius = 0.0;
} else {
// loop over onemols molecules
// skip a molecule with no atoms as large as itype
oneradius = 0.0;
for (int i = 0; i < nmol; i++) {
if (itype > ntype+onemols[i]->ntypes) continue;
double *radius = onemols[i]->radius;
int *type = onemols[i]->type;
int natoms = onemols[i]->natoms;
// check radii of atoms in Molecule with matching types
// default to 0.5, if radii not defined in Molecule
// same as atom->avec->create_atom(), invoked in pre_exchange()
for (int j = 0; j < natoms; j++)
if (type[j]+ntype == itype) {
if (radius) oneradius = MAX(oneradius,radius[j]);
else oneradius = MAX(oneradius,0.5);
}
}
}
itype = 0;
return &oneradius;
}
return NULL;
}
diff --git a/src/GRANULAR/fix_wall_gran.cpp b/src/GRANULAR/fix_wall_gran.cpp
index 05933fd57..eeec94fdf 100644
--- a/src/GRANULAR/fix_wall_gran.cpp
+++ b/src/GRANULAR/fix_wall_gran.cpp
@@ -1,1149 +1,1149 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Leo Silbert (SNL), Gary Grest (SNL),
Dan Bolintineanu (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "fix_wall_gran.h"
#include "atom.h"
#include "domain.h"
#include "update.h"
#include "force.h"
#include "pair.h"
#include "modify.h"
#include "respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
// XPLANE, YPLANE, ZPLANE need to be 0,1,2
enum{XPLANE=0,YPLANE=1,ZPLANE=2,ZCYLINDER,REGION};
enum{HOOKE,HOOKE_HISTORY,HERTZ_HISTORY,BONDED_HISTORY};
enum{NONE,CONSTANT,EQUAL};
#define BIG 1.0e20
/* ---------------------------------------------------------------------- */
FixWallGran::FixWallGran(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg), idregion(NULL), shearone(NULL), fix_rigid(NULL), mass_rigid(NULL)
{
if (narg < 4) error->all(FLERR,"Illegal fix wall/gran command");
if (!atom->sphere_flag)
error->all(FLERR,"Fix wall/gran requires atom style sphere");
create_attribute = 1;
// set interaction style
// disable bonded/history option for now
if (strcmp(arg[3],"hooke") == 0) pairstyle = HOOKE;
else if (strcmp(arg[3],"hooke/history") == 0) pairstyle = HOOKE_HISTORY;
else if (strcmp(arg[3],"hertz/history") == 0) pairstyle = HERTZ_HISTORY;
//else if (strcmp(arg[3],"bonded/history") == 0) pairstyle = BONDED_HISTORY;
else error->all(FLERR,"Invalid fix wall/gran interaction style");
history = restart_peratom = 1;
if (pairstyle == HOOKE) history = restart_peratom = 0;
// wall/particle coefficients
int iarg;
if (pairstyle != BONDED_HISTORY) {
if (narg < 11) error->all(FLERR,"Illegal fix wall/gran command");
kn = force->numeric(FLERR,arg[4]);
if (strcmp(arg[5],"NULL") == 0) kt = kn * 2.0/7.0;
else kt = force->numeric(FLERR,arg[5]);
gamman = force->numeric(FLERR,arg[6]);
if (strcmp(arg[7],"NULL") == 0) gammat = 0.5 * gamman;
else gammat = force->numeric(FLERR,arg[7]);
xmu = force->numeric(FLERR,arg[8]);
int dampflag = force->inumeric(FLERR,arg[9]);
if (dampflag == 0) gammat = 0.0;
if (kn < 0.0 || kt < 0.0 || gamman < 0.0 || gammat < 0.0 ||
xmu < 0.0 || xmu > 10000.0 || dampflag < 0 || dampflag > 1)
error->all(FLERR,"Illegal fix wall/gran command");
// convert Kn and Kt from pressure units to force/distance^2 if Hertzian
if (pairstyle == HERTZ_HISTORY) {
kn /= force->nktv2p;
kt /= force->nktv2p;
}
iarg = 10;
}
else {
if (narg < 10) error->all(FLERR,"Illegal fix wall/gran command");
E = force->numeric(FLERR,arg[4]);
G = force->numeric(FLERR,arg[5]);
SurfEnergy = force->numeric(FLERR,arg[6]);
// Note: this doesn't get used, check w/ Jeremy?
gamman = force->numeric(FLERR,arg[7]);
xmu = force->numeric(FLERR,arg[8]);
// pois = E/(2.0*G) - 1.0;
// kn = 2.0*E/(3.0*(1.0+pois)*(1.0-pois));
// gammat=0.5*gamman;
iarg = 9;
}
// wallstyle args
idregion = NULL;
if (strcmp(arg[iarg],"xplane") == 0) {
if (narg < iarg+3) error->all(FLERR,"Illegal fix wall/gran command");
wallstyle = XPLANE;
if (strcmp(arg[iarg+1],"NULL") == 0) lo = -BIG;
else lo = force->numeric(FLERR,arg[iarg+1]);
if (strcmp(arg[iarg+2],"NULL") == 0) hi = BIG;
else hi = force->numeric(FLERR,arg[iarg+2]);
iarg += 3;
} else if (strcmp(arg[iarg],"yplane") == 0) {
if (narg < iarg+3) error->all(FLERR,"Illegal fix wall/gran command");
wallstyle = YPLANE;
if (strcmp(arg[iarg+1],"NULL") == 0) lo = -BIG;
else lo = force->numeric(FLERR,arg[iarg+1]);
if (strcmp(arg[iarg+2],"NULL") == 0) hi = BIG;
else hi = force->numeric(FLERR,arg[iarg+2]);
iarg += 3;
} else if (strcmp(arg[iarg],"zplane") == 0) {
if (narg < iarg+3) error->all(FLERR,"Illegal fix wall/gran command");
wallstyle = ZPLANE;
if (strcmp(arg[iarg+1],"NULL") == 0) lo = -BIG;
else lo = force->numeric(FLERR,arg[iarg+1]);
if (strcmp(arg[iarg+2],"NULL") == 0) hi = BIG;
else hi = force->numeric(FLERR,arg[iarg+2]);
iarg += 3;
} else if (strcmp(arg[iarg],"zcylinder") == 0) {
if (narg < iarg+2) error->all(FLERR,"Illegal fix wall/gran command");
wallstyle = ZCYLINDER;
lo = hi = 0.0;
- cylradius = force->numeric(FLERR,arg[iarg+3]);
+ cylradius = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"region") == 0) {
if (narg < iarg+2) error->all(FLERR,"Illegal fix wall/gran command");
wallstyle = REGION;
int n = strlen(arg[iarg+1]) + 1;
idregion = new char[n];
strcpy(idregion,arg[iarg+1]);
iarg += 2;
}
// optional args
wiggle = 0;
wshear = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"wiggle") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal fix wall/gran command");
if (strcmp(arg[iarg+1],"x") == 0) axis = 0;
else if (strcmp(arg[iarg+1],"y") == 0) axis = 1;
else if (strcmp(arg[iarg+1],"z") == 0) axis = 2;
else error->all(FLERR,"Illegal fix wall/gran command");
amplitude = force->numeric(FLERR,arg[iarg+2]);
period = force->numeric(FLERR,arg[iarg+3]);
wiggle = 1;
iarg += 4;
} else if (strcmp(arg[iarg],"shear") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix wall/gran command");
if (strcmp(arg[iarg+1],"x") == 0) axis = 0;
else if (strcmp(arg[iarg+1],"y") == 0) axis = 1;
else if (strcmp(arg[iarg+1],"z") == 0) axis = 2;
else error->all(FLERR,"Illegal fix wall/gran command");
vshear = force->numeric(FLERR,arg[iarg+2]);
wshear = 1;
iarg += 3;
} else error->all(FLERR,"Illegal fix wall/gran command");
}
if (wallstyle == XPLANE && domain->xperiodic)
error->all(FLERR,"Cannot use wall in periodic dimension");
if (wallstyle == YPLANE && domain->yperiodic)
error->all(FLERR,"Cannot use wall in periodic dimension");
if (wallstyle == ZPLANE && domain->zperiodic)
error->all(FLERR,"Cannot use wall in periodic dimension");
if (wallstyle == ZCYLINDER && (domain->xperiodic || domain->yperiodic))
error->all(FLERR,"Cannot use wall in periodic dimension");
if (wiggle && wshear)
error->all(FLERR,"Cannot wiggle and shear fix wall/gran");
if (wiggle && wallstyle == ZCYLINDER && axis != 2)
error->all(FLERR,"Invalid wiggle direction for fix wall/gran");
if (wshear && wallstyle == XPLANE && axis == 0)
error->all(FLERR,"Invalid shear direction for fix wall/gran");
if (wshear && wallstyle == YPLANE && axis == 1)
error->all(FLERR,"Invalid shear direction for fix wall/gran");
if (wshear && wallstyle == ZPLANE && axis == 2)
error->all(FLERR,"Invalid shear direction for fix wall/gran");
if ((wiggle || wshear) && wallstyle == REGION)
error->all(FLERR,"Cannot wiggle or shear with fix wall/gran/region");
// setup oscillations
if (wiggle) omega = 2.0*MY_PI / period;
// perform initial allocation of atom-based arrays
// register with Atom class
if (pairstyle == BONDED_HISTORY) sheardim = 7;
else sheardim = 3;
shearone = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
atom->add_callback(1);
nmax = 0;
mass_rigid = NULL;
// initialize shear history as if particle is not touching region
// shearone will be NULL for wallstyle = REGION
if (history && shearone) {
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++)
for (int j = 0; j < sheardim; j++)
shearone[i][j] = 0.0;
}
time_origin = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
FixWallGran::~FixWallGran()
{
// unregister callbacks to this fix from Atom class
atom->delete_callback(id,0);
atom->delete_callback(id,1);
// delete local storage
delete [] idregion;
memory->destroy(shearone);
memory->destroy(mass_rigid);
}
/* ---------------------------------------------------------------------- */
int FixWallGran::setmask()
{
int mask = 0;
mask |= POST_FORCE;
mask |= POST_FORCE_RESPA;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixWallGran::init()
{
int i;
dt = update->dt;
if (strstr(update->integrate_style,"respa"))
nlevels_respa = ((Respa *) update->integrate)->nlevels;
// check for FixRigid so can extract rigid body masses
fix_rigid = NULL;
for (i = 0; i < modify->nfix; i++)
if (modify->fix[i]->rigid_flag) break;
if (i < modify->nfix) fix_rigid = modify->fix[i];
}
/* ---------------------------------------------------------------------- */
void FixWallGran::setup(int vflag)
{
if (strstr(update->integrate_style,"verlet"))
post_force(vflag);
else {
((Respa *) update->integrate)->copy_flevel_f(nlevels_respa-1);
post_force_respa(vflag,nlevels_respa-1,0);
((Respa *) update->integrate)->copy_f_flevel(nlevels_respa-1);
}
}
/* ---------------------------------------------------------------------- */
void FixWallGran::post_force(int vflag)
{
int i,j;
double dx,dy,dz,del1,del2,delxy,delr,rsq,rwall,meff;
double vwall[3];
// do not update shear history during setup
shearupdate = 1;
if (update->setupflag) shearupdate = 0;
// if just reneighbored:
// update rigid body masses for owned atoms if using FixRigid
// body[i] = which body atom I is in, -1 if none
// mass_body = mass of each rigid body
if (neighbor->ago == 0 && fix_rigid) {
int tmp;
int *body = (int *) fix_rigid->extract("body",tmp);
double *mass_body = (double *) fix_rigid->extract("masstotal",tmp);
if (atom->nmax > nmax) {
memory->destroy(mass_rigid);
nmax = atom->nmax;
memory->create(mass_rigid,nmax,"wall/gran:mass_rigid");
}
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
if (body[i] >= 0) mass_rigid[i] = mass_body[body[i]];
else mass_rigid[i] = 0.0;
}
}
// set position of wall to initial settings and velocity to 0.0
// if wiggle or shear, set wall position and velocity accordingly
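// wiggle (when the wiggle axis matches the wall normal) moves the wall as
// lo + A*(1 - cos(omega*(t - t0))), i.e. it starts at rest at its initial position,
// with velocity component A*omega*sin(omega*(t - t0)) along the wiggle axis;
// shear leaves the wall in place and only assigns the tangential velocity vshear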
double wlo = lo;
double whi = hi;
vwall[0] = vwall[1] = vwall[2] = 0.0;
if (wiggle) {
double arg = omega * (update->ntimestep - time_origin) * dt;
if (wallstyle == axis) {
wlo = lo + amplitude - amplitude*cos(arg);
whi = hi + amplitude - amplitude*cos(arg);
}
vwall[axis] = amplitude*omega*sin(arg);
} else if (wshear) vwall[axis] = vshear;
// loop over all my atoms
// rsq = distance from wall
// dx,dy,dz = signed distance from wall
// for rotating cylinder, reset vwall based on particle position
// skip atom if not close enough to wall
// if wall was set to NULL, it's skipped since lo/hi are infinity
// compute force and torque on atom if close enough to wall
// via wall potential matched to pair potential
// set shear if pair potential stores history
double **x = atom->x;
double **v = atom->v;
double **f = atom->f;
double **omega = atom->omega;
double **torque = atom->torque;
double *radius = atom->radius;
double *rmass = atom->rmass;
int *mask = atom->mask;
int nlocal = atom->nlocal;
rwall = 0.0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
dx = dy = dz = 0.0;
if (wallstyle == XPLANE) {
del1 = x[i][0] - wlo;
del2 = whi - x[i][0];
if (del1 < del2) dx = del1;
else dx = -del2;
} else if (wallstyle == YPLANE) {
del1 = x[i][1] - wlo;
del2 = whi - x[i][1];
if (del1 < del2) dy = del1;
else dy = -del2;
} else if (wallstyle == ZPLANE) {
del1 = x[i][2] - wlo;
del2 = whi - x[i][2];
if (del1 < del2) dz = del1;
else dz = -del2;
} else if (wallstyle == ZCYLINDER) {
delxy = sqrt(x[i][0]*x[i][0] + x[i][1]*x[i][1]);
delr = cylradius - delxy;
if (delr > radius[i]) {
dz = cylradius;
rwall = 0.0;
} else {
dx = -delr/delxy * x[i][0];
dy = -delr/delxy * x[i][1];
// rwall = -2r_c if inside cylinder, 2r_c outside
rwall = (delxy < cylradius) ? -2*cylradius : 2*cylradius;
if (wshear && axis != 2) {
vwall[0] += vshear * x[i][1]/delxy;
vwall[1] += -vshear * x[i][0]/delxy;
vwall[2] = 0.0;
}
}
}
rsq = dx*dx + dy*dy + dz*dz;
if (rsq > radius[i]*radius[i]) {
if (history)
for (j = 0; j < sheardim; j++)
shearone[i][j] = 0.0;
} else {
// meff = effective mass of sphere
// if I is part of rigid body, use body mass
meff = rmass[i];
if (fix_rigid && mass_rigid[i] > 0.0) meff = mass_rigid[i];
// invoke sphere/wall interaction
if (pairstyle == HOOKE)
hooke(rsq,dx,dy,dz,vwall,v[i],f[i],
omega[i],torque[i],radius[i],meff);
else if (pairstyle == HOOKE_HISTORY)
hooke_history(rsq,dx,dy,dz,vwall,v[i],f[i],
omega[i],torque[i],radius[i],meff,shearone[i]);
else if (pairstyle == HERTZ_HISTORY)
hertz_history(rsq,dx,dy,dz,vwall,rwall,v[i],f[i],
omega[i],torque[i],radius[i],meff,shearone[i]);
else if (pairstyle == BONDED_HISTORY)
bonded_history(rsq,dx,dy,dz,vwall,rwall,v[i],f[i],
omega[i],torque[i],radius[i],meff,shearone[i]);
}
}
}
}
/* ---------------------------------------------------------------------- */
void FixWallGran::post_force_respa(int vflag, int ilevel, int iloop)
{
if (ilevel == nlevels_respa-1) post_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixWallGran::hooke(double rsq, double dx, double dy, double dz,
double *vwall, double *v,
double *f, double *omega, double *torque,
double radius, double meff)
{
double r,vr1,vr2,vr3,vnnr,vn1,vn2,vn3,vt1,vt2,vt3;
double wr1,wr2,wr3,damp,ccel,vtr1,vtr2,vtr3,vrel;
double fn,fs,ft,fs1,fs2,fs3,fx,fy,fz,tor1,tor2,tor3,rinv,rsqinv;
r = sqrt(rsq);
rinv = 1.0/r;
rsqinv = 1.0/rsq;
// relative translational velocity
vr1 = v[0] - vwall[0];
vr2 = v[1] - vwall[1];
vr3 = v[2] - vwall[2];
// normal component
vnnr = vr1*dx + vr2*dy + vr3*dz;
vn1 = dx*vnnr * rsqinv;
vn2 = dy*vnnr * rsqinv;
vn3 = dz*vnnr * rsqinv;
// tangential component
vt1 = vr1 - vn1;
vt2 = vr2 - vn2;
vt3 = vr3 - vn3;
// relative rotational velocity
wr1 = radius*omega[0] * rinv;
wr2 = radius*omega[1] * rinv;
wr3 = radius*omega[2] * rinv;
// normal forces = Hookean contact + normal velocity damping
damp = meff*gamman*vnnr*rsqinv;
ccel = kn*(radius-r)*rinv - damp;
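// i.e. normal force F_n = kn*(radius - r) - meff*gamman*v_n, with overlap
// delta = radius - r and v_n = vnnr/r the normal relative speed; ccel = F_n/r,
// so fx = dx*ccel etc. below give its Cartesian components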
// relative velocities
vtr1 = vt1 - (dz*wr2-dy*wr3);
vtr2 = vt2 - (dx*wr3-dz*wr1);
vtr3 = vt3 - (dy*wr1-dx*wr2);
vrel = vtr1*vtr1 + vtr2*vtr2 + vtr3*vtr3;
vrel = sqrt(vrel);
// force normalization
fn = xmu * fabs(ccel*r);
fs = meff*gammat*vrel;
if (vrel != 0.0) ft = MIN(fn,fs) / vrel;
else ft = 0.0;
// tangential force due to tangential velocity damping
fs1 = -ft*vtr1;
fs2 = -ft*vtr2;
fs3 = -ft*vtr3;
// forces & torques
fx = dx*ccel + fs1;
fy = dy*ccel + fs2;
fz = dz*ccel + fs3;
f[0] += fx;
f[1] += fy;
f[2] += fz;
tor1 = rinv * (dy*fs3 - dz*fs2);
tor2 = rinv * (dz*fs1 - dx*fs3);
tor3 = rinv * (dx*fs2 - dy*fs1);
torque[0] -= radius*tor1;
torque[1] -= radius*tor2;
torque[2] -= radius*tor3;
}
/* ---------------------------------------------------------------------- */
void FixWallGran::hooke_history(double rsq, double dx, double dy, double dz,
double *vwall, double *v,
double *f, double *omega, double *torque,
double radius, double meff, double *shear)
{
double r,vr1,vr2,vr3,vnnr,vn1,vn2,vn3,vt1,vt2,vt3;
double wr1,wr2,wr3,damp,ccel,vtr1,vtr2,vtr3,vrel;
double fn,fs,fs1,fs2,fs3,fx,fy,fz,tor1,tor2,tor3;
double shrmag,rsht,rinv,rsqinv;
r = sqrt(rsq);
rinv = 1.0/r;
rsqinv = 1.0/rsq;
// relative translational velocity
vr1 = v[0] - vwall[0];
vr2 = v[1] - vwall[1];
vr3 = v[2] - vwall[2];
// normal component
vnnr = vr1*dx + vr2*dy + vr3*dz;
vn1 = dx*vnnr * rsqinv;
vn2 = dy*vnnr * rsqinv;
vn3 = dz*vnnr * rsqinv;
// tangential component
vt1 = vr1 - vn1;
vt2 = vr2 - vn2;
vt3 = vr3 - vn3;
// relative rotational velocity
wr1 = radius*omega[0] * rinv;
wr2 = radius*omega[1] * rinv;
wr3 = radius*omega[2] * rinv;
// normal forces = Hookean contact + normal velocity damping
damp = meff*gamman*vnnr*rsqinv;
ccel = kn*(radius-r)*rinv - damp;
// relative velocities
vtr1 = vt1 - (dz*wr2-dy*wr3);
vtr2 = vt2 - (dx*wr3-dz*wr1);
vtr3 = vt3 - (dy*wr1-dx*wr2);
vrel = vtr1*vtr1 + vtr2*vtr2 + vtr3*vtr3;
vrel = sqrt(vrel);
// shear history effects
if (shearupdate) {
shear[0] += vtr1*dt;
shear[1] += vtr2*dt;
shear[2] += vtr3*dt;
}
shrmag = sqrt(shear[0]*shear[0] + shear[1]*shear[1] + shear[2]*shear[2]);
// rotate shear displacements
rsht = shear[0]*dx + shear[1]*dy + shear[2]*dz;
rsht = rsht*rsqinv;
if (shearupdate) {
shear[0] -= rsht*dx;
shear[1] -= rsht*dy;
shear[2] -= rsht*dz;
}
// tangential forces = shear + tangential velocity damping
fs1 = - (kt*shear[0] + meff*gammat*vtr1);
fs2 = - (kt*shear[1] + meff*gammat*vtr2);
fs3 = - (kt*shear[2] + meff*gammat*vtr3);
// rescale frictional displacements and forces if needed
fs = sqrt(fs1*fs1 + fs2*fs2 + fs3*fs3);
fn = xmu * fabs(ccel*r);
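// fn is the Coulomb limit xmu*|F_n| (ccel*r is the normal force magnitude);
// if the spring+damping force exceeds it, the stored shear displacement is
// rescaled so the resulting tangential force sits exactly at that limit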
if (fs > fn) {
if (shrmag != 0.0) {
shear[0] = (fn/fs) * (shear[0] + meff*gammat*vtr1/kt) -
meff*gammat*vtr1/kt;
shear[1] = (fn/fs) * (shear[1] + meff*gammat*vtr2/kt) -
meff*gammat*vtr2/kt;
shear[2] = (fn/fs) * (shear[2] + meff*gammat*vtr3/kt) -
meff*gammat*vtr3/kt;
fs1 *= fn/fs ;
fs2 *= fn/fs;
fs3 *= fn/fs;
} else fs1 = fs2 = fs3 = 0.0;
}
// forces & torques
fx = dx*ccel + fs1;
fy = dy*ccel + fs2;
fz = dz*ccel + fs3;
f[0] += fx;
f[1] += fy;
f[2] += fz;
tor1 = rinv * (dy*fs3 - dz*fs2);
tor2 = rinv * (dz*fs1 - dx*fs3);
tor3 = rinv * (dx*fs2 - dy*fs1);
torque[0] -= radius*tor1;
torque[1] -= radius*tor2;
torque[2] -= radius*tor3;
}
/* ---------------------------------------------------------------------- */
void FixWallGran::hertz_history(double rsq, double dx, double dy, double dz,
double *vwall, double rwall, double *v,
double *f, double *omega, double *torque,
double radius, double meff, double *shear)
{
double r,vr1,vr2,vr3,vnnr,vn1,vn2,vn3,vt1,vt2,vt3;
double wr1,wr2,wr3,damp,ccel,vtr1,vtr2,vtr3,vrel;
double fn,fs,fs1,fs2,fs3,fx,fy,fz,tor1,tor2,tor3;
double shrmag,rsht,polyhertz,rinv,rsqinv;
r = sqrt(rsq);
rinv = 1.0/r;
rsqinv = 1.0/rsq;
// relative translational velocity
vr1 = v[0] - vwall[0];
vr2 = v[1] - vwall[1];
vr3 = v[2] - vwall[2];
// normal component
vnnr = vr1*dx + vr2*dy + vr3*dz;
vn1 = dx*vnnr / rsq;
vn2 = dy*vnnr / rsq;
vn3 = dz*vnnr / rsq;
// tangential component
vt1 = vr1 - vn1;
vt2 = vr2 - vn2;
vt3 = vr3 - vn3;
// relative rotational velocity
wr1 = radius*omega[0] * rinv;
wr2 = radius*omega[1] * rinv;
wr3 = radius*omega[2] * rinv;
// normal forces = Hertzian contact + normal velocity damping
// rwall = 0 is flat wall case
// rwall positive or negative is curved wall
// will break (as it should) if rwall is negative and
// its absolute value < radius of particle
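// equivalently polyhertz = sqrt(delta*R_eff) with delta = radius - r and
// R_eff = radius*rwall/(radius + rwall); rwall = 0 is the flat-wall sentinel,
// for which R_eff reduces to the particle radius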
damp = meff*gamman*vnnr*rsqinv;
ccel = kn*(radius-r)*rinv - damp;
if (rwall == 0.0) polyhertz = sqrt((radius-r)*radius);
else polyhertz = sqrt((radius-r)*radius*rwall/(rwall+radius));
ccel *= polyhertz;
// relative velocities
vtr1 = vt1 - (dz*wr2-dy*wr3);
vtr2 = vt2 - (dx*wr3-dz*wr1);
vtr3 = vt3 - (dy*wr1-dx*wr2);
vrel = vtr1*vtr1 + vtr2*vtr2 + vtr3*vtr3;
vrel = sqrt(vrel);
// shear history effects
if (shearupdate) {
shear[0] += vtr1*dt;
shear[1] += vtr2*dt;
shear[2] += vtr3*dt;
}
shrmag = sqrt(shear[0]*shear[0] + shear[1]*shear[1] + shear[2]*shear[2]);
// rotate shear displacements
rsht = shear[0]*dx + shear[1]*dy + shear[2]*dz;
rsht = rsht*rsqinv;
if (shearupdate) {
shear[0] -= rsht*dx;
shear[1] -= rsht*dy;
shear[2] -= rsht*dz;
}
// tangential forces = shear + tangential velocity damping
fs1 = -polyhertz * (kt*shear[0] + meff*gammat*vtr1);
fs2 = -polyhertz * (kt*shear[1] + meff*gammat*vtr2);
fs3 = -polyhertz * (kt*shear[2] + meff*gammat*vtr3);
// rescale frictional displacements and forces if needed
fs = sqrt(fs1*fs1 + fs2*fs2 + fs3*fs3);
fn = xmu * fabs(ccel*r);
if (fs > fn) {
if (shrmag != 0.0) {
shear[0] = (fn/fs) * (shear[0] + meff*gammat*vtr1/kt) -
meff*gammat*vtr1/kt;
shear[1] = (fn/fs) * (shear[1] + meff*gammat*vtr2/kt) -
meff*gammat*vtr2/kt;
shear[2] = (fn/fs) * (shear[2] + meff*gammat*vtr3/kt) -
meff*gammat*vtr3/kt;
fs1 *= fn/fs ;
fs2 *= fn/fs;
fs3 *= fn/fs;
} else fs1 = fs2 = fs3 = 0.0;
}
// forces & torques
fx = dx*ccel + fs1;
fy = dy*ccel + fs2;
fz = dz*ccel + fs3;
f[0] += fx;
f[1] += fy;
f[2] += fz;
tor1 = rinv * (dy*fs3 - dz*fs2);
tor2 = rinv * (dz*fs1 - dx*fs3);
tor3 = rinv * (dx*fs2 - dy*fs1);
torque[0] -= radius*tor1;
torque[1] -= radius*tor2;
torque[2] -= radius*tor3;
}
/* ---------------------------------------------------------------------- */
void FixWallGran::bonded_history(double rsq, double dx, double dy, double dz,
double *vwall, double rwall, double *v,
double *f, double *omega, double *torque,
double radius, double meff, double *shear)
{
double r,vr1,vr2,vr3,vnnr,vn1,vn2,vn3,vt1,vt2,vt3;
double wr1,wr2,wr3,damp,ccel,vtr1,vtr2,vtr3,vrel;
double fn,fs,fs1,fs2,fs3,fx,fy,fz,tor1,tor2,tor3;
double shrmag,rsht,polyhertz,rinv,rsqinv;
double pois,E_eff,G_eff,rad_eff;
double a0,Fcrit,delcrit,delcritinv;
double overlap,olapsq,olapcubed,sqrtterm,tmp,keyterm,keyterm2,keyterm3;
double aovera0,foverFc;
double gammatsuji;
double ktwist,kroll,twistcrit,rollcrit;
double relrot1,relrot2,relrot3,vrl1,vrl2,vrl3,vrlmag,vrlmaginv;
double magtwist,magtortwist;
double magrollsq,magroll,magrollinv,magtorroll;
r = sqrt(rsq);
rinv = 1.0/r;
rsqinv = 1.0/rsq;
// relative translational velocity
vr1 = v[0] - vwall[0];
vr2 = v[1] - vwall[1];
vr3 = v[2] - vwall[2];
// normal component
vnnr = vr1*dx + vr2*dy + vr3*dz;
vn1 = dx*vnnr / rsq;
vn2 = dy*vnnr / rsq;
vn3 = dz*vnnr / rsq;
// tangential component
vt1 = vr1 - vn1;
vt2 = vr2 - vn2;
vt3 = vr3 - vn3;
// relative rotational velocity
wr1 = radius*omega[0] * rinv;
wr2 = radius*omega[1] * rinv;
wr3 = radius*omega[2] * rinv;
// normal forces = Hertzian contact + normal velocity damping
// material properties: currently assumes identical materials
pois = E/(2.0*G) - 1.0;
E_eff=0.5*E/(1.0-pois*pois);
G_eff=G/(4.0-2.0*pois);
// rwall = 0 is infinite wall radius of curvature (flat wall)
if (rwall == 0) rad_eff = radius;
else rad_eff = radius*rwall/(radius+rwall);
Fcrit = rad_eff * (3.0 * M_PI * SurfEnergy);
a0=pow(9.0*M_PI*SurfEnergy*rad_eff*rad_eff/E_eff,1.0/3.0);
delcrit = 1.0/rad_eff*(0.5 * a0*a0/pow(6.0,1.0/3.0));
delcritinv = 1.0/delcrit;
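// these appear to follow a JKR-style adhesive contact: assuming SurfEnergy is the
// per-surface energy (work of adhesion w = 2*SurfEnergy), Fcrit matches the JKR
// pull-off force (3/2)*pi*w*rad_eff and a0 the zero-load contact radius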
overlap = (radius-r) * delcritinv;
olapsq = overlap*overlap;
olapcubed = olapsq*overlap;
sqrtterm = sqrt(1.0 + olapcubed);
tmp = 2.0 + olapcubed + 2.0*sqrtterm;
keyterm = pow(tmp,THIRD);
keyterm2 = olapsq/keyterm;
keyterm3 = sqrt(overlap + keyterm2 + keyterm);
aovera0 = pow(6.0,-TWOTHIRDS) * (keyterm3 +
sqrt(2.0*overlap - keyterm2 - keyterm + 4.0/keyterm3));
foverFc = 4.0*((aovera0*aovera0*aovera0) - pow(aovera0,1.5));
ccel = Fcrit*foverFc*rinv;
// damp = meff*gamman*vnnr*rsqinv;
// ccel = kn*(radius-r)*rinv - damp;
// polyhertz = sqrt((radius-r)*radius);
// ccel *= polyhertz;
// use Tsuji et al form
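// the hard-coded polynomial below appears to be the Tsuji et al. damping-factor
// fit alpha(e) evaluated at a fixed restitution coefficient e = 0.9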
polyhertz = 1.2728- 4.2783*0.9 + 11.087*0.9*0.9 - 22.348*0.9*0.9*0.9 +
27.467*0.9*0.9*0.9*0.9 - 18.022*0.9*0.9*0.9*0.9*0.9 +
4.8218*0.9*0.9*0.9*0.9*0.9*0.9;
gammatsuji = 0.2*sqrt(meff*kn);
damp = gammatsuji*vnnr/rsq;
ccel = ccel - polyhertz * damp;
// relative velocities
vtr1 = vt1 - (dz*wr2-dy*wr3);
vtr2 = vt2 - (dx*wr3-dz*wr1);
vtr3 = vt3 - (dy*wr1-dx*wr2);
vrel = vtr1*vtr1 + vtr2*vtr2 + vtr3*vtr3;
vrel = sqrt(vrel);
// shear history effects
if (shearupdate) {
shear[0] += vtr1*dt;
shear[1] += vtr2*dt;
shear[2] += vtr3*dt;
}
shrmag = sqrt(shear[0]*shear[0] + shear[1]*shear[1] + shear[2]*shear[2]);
// rotate shear displacements
rsht = shear[0]*dx + shear[1]*dy + shear[2]*dz;
rsht = rsht*rsqinv;
if (shearupdate) {
shear[0] -= rsht*dx;
shear[1] -= rsht*dy;
shear[2] -= rsht*dz;
}
// tangential forces = shear + tangential velocity damping
fs1 = -polyhertz * (kt*shear[0] + meff*gammat*vtr1);
fs2 = -polyhertz * (kt*shear[1] + meff*gammat*vtr2);
fs3 = -polyhertz * (kt*shear[2] + meff*gammat*vtr3);
kt=8.0*G_eff*a0*aovera0;
// shear damping uses Tsuji et al form also
fs1 = -kt*shear[0] - polyhertz*gammatsuji*vtr1;
fs2 = -kt*shear[1] - polyhertz*gammatsuji*vtr2;
fs3 = -kt*shear[2] - polyhertz*gammatsuji*vtr3;
// rescale frictional displacements and forces if needed
fs = sqrt(fs1*fs1 + fs2*fs2 + fs3*fs3);
fn = xmu * fabs(ccel*r + 2.0*Fcrit);
if (fs > fn) {
if (shrmag != 0.0) {
shear[0] = (fn/fs) * (shear[0] + polyhertz*gammatsuji*vtr1/kt) -
polyhertz*gammatsuji*vtr1/kt;
shear[1] = (fn/fs) * (shear[1] + polyhertz*gammatsuji*vtr2/kt) -
polyhertz*gammatsuji*vtr2/kt;
shear[2] = (fn/fs) * (shear[2] + polyhertz*gammatsuji*vtr3/kt) -
polyhertz*gammatsuji*vtr3/kt;
fs1 *= fn/fs ;
fs2 *= fn/fs;
fs3 *= fn/fs;
} else fs1 = fs2 = fs3 = 0.0;
}
// calculate twisting and rolling components of torque
// NOTE: this assumes spheres!
relrot1 = omega[0];
relrot2 = omega[1];
relrot3 = omega[2];
// rolling velocity
// NOTE: this assumes monodisperse spheres!
vrl1 = -rad_eff*rinv * (relrot2*dz - relrot3*dy);
vrl2 = -rad_eff*rinv * (relrot3*dx - relrot1*dz);
vrl3 = -rad_eff*rinv * (relrot1*dy - relrot2*dx);
vrlmag = sqrt(vrl1*vrl1+vrl2*vrl2+vrl3*vrl3);
if (vrlmag != 0.0) vrlmaginv = 1.0/vrlmag;
else vrlmaginv = 0.0;
// bond history effects
shear[3] += vrl1*dt;
shear[4] += vrl2*dt;
shear[5] += vrl3*dt;
// rotate bonded displacements correctly
double rlt = shear[3]*dx + shear[4]*dy + shear[5]*dz;
rlt /= rsq;
shear[3] -= rlt*dx;
shear[4] -= rlt*dy;
shear[5] -= rlt*dz;
// twisting torque
magtwist = rinv*(relrot1*dx + relrot2*dy + relrot3*dz);
shear[6] += magtwist*dt;
ktwist = 0.5*kt*(a0*aovera0)*(a0*aovera0);
magtortwist = -ktwist*shear[6] -
0.5*polyhertz*gammatsuji*(a0*aovera0)*(a0*aovera0)*magtwist;
twistcrit=TWOTHIRDS*a0*aovera0*Fcrit;
if (fabs(magtortwist) > twistcrit)
magtortwist = -twistcrit * magtwist/fabs(magtwist);
// rolling torque
magrollsq = shear[3]*shear[3] + shear[4]*shear[4] + shear[5]*shear[5];
magroll = sqrt(magrollsq);
if (magroll != 0.0) magrollinv = 1.0/magroll;
else magrollinv = 0.0;
kroll = 1.0*4.0*Fcrit*pow(aovera0,1.5);
magtorroll = -kroll*magroll - 0.1*gammat*vrlmag;
rollcrit = 0.01;
if (magroll > rollcrit) magtorroll = -kroll*rollcrit;
// forces & torques
fx = dx*ccel + fs1;
fy = dy*ccel + fs2;
fz = dz*ccel + fs3;
f[0] += fx;
f[1] += fy;
f[2] += fz;
tor1 = rinv * (dy*fs3 - dz*fs2);
tor2 = rinv * (dz*fs1 - dx*fs3);
tor3 = rinv * (dx*fs2 - dy*fs1);
torque[0] -= radius*tor1;
torque[1] -= radius*tor2;
torque[2] -= radius*tor3;
torque[0] += magtortwist * dx*rinv;
torque[1] += magtortwist * dy*rinv;
torque[2] += magtortwist * dz*rinv;
torque[0] += magtorroll * (shear[4]*dz - shear[5]*dy)*rinv*magrollinv;
torque[1] += magtorroll * (shear[5]*dx - shear[3]*dz)*rinv*magrollinv;
torque[2] += magtorroll * (shear[3]*dy - shear[4]*dx)*rinv*magrollinv;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixWallGran::memory_usage()
{
int nmax = atom->nmax;
double bytes = 0.0;
if (history) bytes += nmax*sheardim * sizeof(double); // shear history
if (fix_rigid) bytes += nmax * sizeof(int); // mass_rigid
return bytes;
}
/* ----------------------------------------------------------------------
allocate local atom-based arrays
------------------------------------------------------------------------- */
void FixWallGran::grow_arrays(int nmax)
{
if (history) memory->grow(shearone,nmax,sheardim,"fix_wall_gran:shearone");
}
/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */
void FixWallGran::copy_arrays(int i, int j, int delflag)
{
if (history)
for (int m = 0; m < sheardim; m++)
shearone[j][m] = shearone[i][m];
}
/* ----------------------------------------------------------------------
initialize one atom's array values, called when atom is created
------------------------------------------------------------------------- */
void FixWallGran::set_arrays(int i)
{
if (history)
for (int m = 0; m < sheardim; m++)
shearone[i][m] = 0;
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */
int FixWallGran::pack_exchange(int i, double *buf)
{
if (!history) return 0;
int n = 0;
for (int m = 0; m < sheardim; m++)
buf[n++] = shearone[i][m];
return n;
}
/* ----------------------------------------------------------------------
unpack values into local atom-based arrays after exchange
------------------------------------------------------------------------- */
int FixWallGran::unpack_exchange(int nlocal, double *buf)
{
if (!history) return 0;
int n = 0;
for (int m = 0; m < sheardim; m++)
shearone[nlocal][m] = buf[n++];
return n;
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for restart file
------------------------------------------------------------------------- */
int FixWallGran::pack_restart(int i, double *buf)
{
if (!history) return 0;
int n = 0;
buf[n++] = sheardim + 1;
for (int m = 0; m < sheardim; m++)
buf[n++] = shearone[i][m];
return n;
}
/* ----------------------------------------------------------------------
unpack values from atom->extra array to restart the fix
------------------------------------------------------------------------- */
void FixWallGran::unpack_restart(int nlocal, int nth)
{
if (!history) return;
double **extra = atom->extra;
// skip to Nth set of extra values
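// atom->extra stores each fix's per-atom restart values back to back, each block
// prefixed by its own length, so this loop hops over the first nth blocks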
int m = 0;
for (int i = 0; i < nth; i++) m += static_cast<int> (extra[nlocal][m]);
m++;
for (int i = 0; i < sheardim; i++)
shearone[nlocal][i] = extra[nlocal][m++];
}
/* ----------------------------------------------------------------------
maxsize of any atom's restart data
------------------------------------------------------------------------- */
int FixWallGran::maxsize_restart()
{
if (!history) return 0;
return 1 + sheardim;
}
/* ----------------------------------------------------------------------
size of atom nlocal's restart data
------------------------------------------------------------------------- */
int FixWallGran::size_restart(int nlocal)
{
if (!history) return 0;
return 1 + sheardim;
}
/* ---------------------------------------------------------------------- */
void FixWallGran::reset_dt()
{
dt = update->dt;
}
diff --git a/src/GRANULAR/fix_wall_gran_region.cpp b/src/GRANULAR/fix_wall_gran_region.cpp
index 8283c0911..d56715230 100644
--- a/src/GRANULAR/fix_wall_gran_region.cpp
+++ b/src/GRANULAR/fix_wall_gran_region.cpp
@@ -1,510 +1,510 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Dan Bolintineanu (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "fix_wall_gran_region.h"
#include "region.h"
#include "atom.h"
#include "domain.h"
#include "update.h"
#include "force.h"
#include "pair.h"
#include "modify.h"
#include "respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
// same as FixWallGran
enum{HOOKE,HOOKE_HISTORY,HERTZ_HISTORY,BONDED_HISTORY};
#define BIG 1.0e20
/* ---------------------------------------------------------------------- */
FixWallGranRegion::FixWallGranRegion(LAMMPS *lmp, int narg, char **arg) :
- FixWallGran(lmp, narg, arg), region(NULL), region_style(NULL), ncontact(NULL),
+ FixWallGran(lmp, narg, arg), region(NULL), region_style(NULL), ncontact(NULL),
walls(NULL), shearmany(NULL), c2r(NULL)
{
restart_global = 1;
motion_resetflag = 0;
int iregion = domain->find_region(idregion);
if (iregion == -1)
error->all(FLERR,"Region ID for fix wall/gran/region does not exist");
region = domain->regions[iregion];
region_style = new char[strlen(region->style)+1];
strcpy(region_style,region->style);
nregion = region->nregion;
-
+
tmax = domain->regions[iregion]->tmax;
c2r = new int[tmax];
// re-allocate atom-based arrays with nshear
// do not register with Atom class, since parent class did that
-
+
memory->destroy(shearone);
shearone = NULL;
ncontact = NULL;
walls = NULL;
shearmany = NULL;
grow_arrays(atom->nmax);
-
+
// initialize shear history as if particle is not touching region
if (history) {
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++)
ncontact[i] = 0;
}
}
/* ---------------------------------------------------------------------- */
FixWallGranRegion::~FixWallGranRegion()
{
delete [] c2r;
delete [] region_style;
memory->destroy(ncontact);
memory->destroy(walls);
memory->destroy(shearmany);
}
/* ---------------------------------------------------------------------- */
void FixWallGranRegion::init()
{
FixWallGran::init();
int iregion = domain->find_region(idregion);
if (iregion == -1)
error->all(FLERR,"Region ID for fix wall/gran/region does not exist");
region = domain->regions[iregion];
// check if region properties changed between runs
// reset if restart info was inconsistent
if (strcmp(idregion,region->id) != 0 ||
strcmp(region_style,region->style) != 0 ||
nregion != region->nregion) {
char str[256];
sprintf(str,"Region properties for region %s changed between runs, "
"resetting its motion",idregion);
error->warning(FLERR,str);
region->reset_vel();
}
- if (motion_resetflag){
+ if (motion_resetflag){
char str[256];
sprintf(str,"Region properties for region %s are inconsistent "
"with restart file, resetting its motion",idregion);
error->warning(FLERR,str);
region->reset_vel();
}
}
/* ---------------------------------------------------------------------- */
void FixWallGranRegion::post_force(int vflag)
{
int i,m,nc,iwall;
double rinv,fx,fy,fz,tooclose;
double dx,dy,dz,rsq,meff;
double xc[3],vwall[3];
// do not update shear history during setup
shearupdate = 1;
if (update->setupflag) shearupdate = 0;
// if just reneighbored:
// update rigid body masses for owned atoms if using FixRigid
// body[i] = which body atom I is in, -1 if none
// mass_body = mass of each rigid body
if (neighbor->ago == 0 && fix_rigid) {
int tmp;
int *body = (int *) fix_rigid->extract("body",tmp);
double *mass_body = (double *) fix_rigid->extract("masstotal",tmp);
if (atom->nmax > nmax) {
memory->destroy(mass_rigid);
nmax = atom->nmax;
memory->create(mass_rigid,nmax,"wall/gran:mass_rigid");
}
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
if (body[i] >= 0) mass_rigid[i] = mass_body[body[i]];
else mass_rigid[i] = 0.0;
}
}
int regiondynamic = region->dynamic_check();
if (!regiondynamic) vwall[0] = vwall[1] = vwall[2] = 0.0;
double **x = atom->x;
double **v = atom->v;
double **f = atom->f;
double **omega = atom->omega;
double **torque = atom->torque;
double *radius = atom->radius;
double *rmass = atom->rmass;
int *mask = atom->mask;
int nlocal = atom->nlocal;
// set current motion attributes of region
// set_velocity() also updates prev to current step
if (regiondynamic) {
region->prematch();
- region->set_velocity();
+ region->set_velocity();
}
for (i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
if (!region->match(x[i][0],x[i][1],x[i][2])) continue;
-
+
nc = region->surface(x[i][0],x[i][1],x[i][2],radius[i]);
- if (nc > tmax)
+ if (nc > tmax)
error->one(FLERR,"Too many wall/gran/region contacts for one particle");
// shear history maintenance
// update ncontact,walls,shearmany for particle I
// to reflect new and persistent shear history values
// also set c2r[] = indices into region->contact[] for each of N contacts
// process zero or one contact here, otherwise invoke update_contacts()
if (history) {
if (nc == 0) {
ncontact[i] = 0;
continue;
}
if (nc == 1) {
c2r[0] = 0;
iwall = region->contact[0].iwall;
if (ncontact[i] == 0) {
ncontact[i] = 1;
walls[i][0] = iwall;
for (m = 0; m < sheardim; m++)
shearmany[i][0][m] = 0.0;
- } else if (ncontact[i] > 1 || iwall != walls[i][0])
+ } else if (ncontact[i] > 1 || iwall != walls[i][0])
update_contacts(i,nc);
} else update_contacts(i,nc);
}
// process current contacts
for (int ic = 0; ic < nc; ic++) {
// rsq = squared contact distance
// xc = contact point
rsq = region->contact[ic].r*region->contact[ic].r;
dx = region->contact[ic].delx;
dy = region->contact[ic].dely;
dz = region->contact[ic].delz;
if (regiondynamic) region->velocity_contact(vwall, x[i], ic);
// meff = effective mass of sphere
// if I is part of rigid body, use body mass
-
+
meff = rmass[i];
if (fix_rigid && mass_rigid[i] > 0.0) meff = mass_rigid[i];
// invoke sphere/wall interaction
-
+
if (pairstyle == HOOKE)
hooke(rsq,dx,dy,dz,vwall,v[i],f[i],
omega[i],torque[i],radius[i],meff);
else if (pairstyle == HOOKE_HISTORY)
hooke_history(rsq,dx,dy,dz,vwall,v[i],f[i],
omega[i],torque[i],radius[i],meff,
shearmany[i][c2r[ic]]);
else if (pairstyle == HERTZ_HISTORY)
hertz_history(rsq,dx,dy,dz,vwall,region->contact[ic].radius,
v[i],f[i],omega[i],torque[i],
radius[i],meff,shearmany[i][c2r[ic]]);
else if (pairstyle == BONDED_HISTORY)
bonded_history(rsq,dx,dy,dz,vwall,region->contact[ic].radius,
v[i],f[i],omega[i],torque[i],
radius[i],meff,shearmany[i][c2r[ic]]);
}
}
}
}
/* ----------------------------------------------------------------------
update contact info in ncontact, walls, shearmany for particle I
based on ncontacts[i] old contacts and N new contacts
matched via their associated walls
delete/zero shear history for broken/new contacts
also set c2r[i] = index of Ith contact in region list of contacts
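hypothetical example: old walls[i] = {2,5}, new contacts touch walls {5,7}:
the wall-2 entry is deleted (wall-5 history is copied over it), wall 7 is
added with zeroed shear history, and c2r[] links each current contact to
its shear history slot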
------------------------------------------------------------------------- */
void FixWallGranRegion::update_contacts(int i, int nc)
{
int j,m,iold,nold,ilast,inew,iadd,iwall;
// loop over old contacts
// if not in new contact list:
// delete old contact by copying last contact over it
iold = 0;
while (iold < ncontact[i]) {
for (m = 0; m < nc; m++)
if (region->contact[m].iwall == walls[i][iold]) break;
if (m >= nc) {
ilast = ncontact[i]-1;
for (j = 0; j < sheardim; j++)
shearmany[i][iold][j] = shearmany[i][ilast][j];
walls[i][iold] = walls[i][ilast];
ncontact[i]--;
} else iold++;
}
// loop over new contacts
// if not in newly compressed contact list of length nold:
// add it with zeroed shear history
// set all values in c2r
nold = ncontact[i];
for (inew = 0; inew < nc; inew++) {
iwall = region->contact[inew].iwall;
for (m = 0; m < nold; m++)
if (walls[i][m] == iwall) break;
if (m < nold) c2r[m] = inew;
else {
iadd = ncontact[i];
c2r[iadd] = inew;
for (j = 0; j < sheardim; j++)
shearmany[i][iadd][j] = 0.0;
walls[i][iadd] = iwall;
ncontact[i]++;
}
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixWallGranRegion::memory_usage()
{
int nmax = atom->nmax;
double bytes = 0.0;
if (history) { // shear history
bytes += nmax * sizeof(int); // ncontact
bytes += nmax*tmax * sizeof(int); // walls
bytes += nmax*tmax*sheardim * sizeof(double); // shearmany
}
if (fix_rigid) bytes += nmax * sizeof(int); // mass_rigid
return bytes;
}
/* ----------------------------------------------------------------------
allocate local atom-based arrays
------------------------------------------------------------------------- */
void FixWallGranRegion::grow_arrays(int nmax)
{
if (history) {
memory->grow(ncontact,nmax,"fix_wall_gran:ncontact");
memory->grow(walls,nmax,tmax,"fix_wall_gran:walls");
memory->grow(shearmany,nmax,tmax,sheardim,"fix_wall_gran:shearmany");
}
}
/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */
void FixWallGranRegion::copy_arrays(int i, int j, int delflag)
{
int m,n,iwall;
if (!history) return;
n = ncontact[i];
for (iwall = 0; iwall < n; iwall++) {
walls[j][iwall] = walls[i][iwall];
for (m = 0; m < sheardim; m++)
shearmany[j][iwall][m] = shearmany[i][iwall][m];
}
ncontact[j] = ncontact[i];
}
/* ----------------------------------------------------------------------
initialize one atom's array values, called when atom is created
------------------------------------------------------------------------- */
void FixWallGranRegion::set_arrays(int i)
{
if (!history) return;
ncontact[i] = 0;
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */
int FixWallGranRegion::pack_exchange(int i, double *buf)
{
int m;
if (!history) return 0;
int n = 0;
int count = ncontact[i];
buf[n++] = ubuf(count).d;
for (int iwall = 0; iwall < count; iwall++) {
buf[n++] = ubuf(walls[i][iwall]).d;
for (m = 0; m < sheardim; m++)
buf[n++] = shearmany[i][iwall][m];
}
return n;
}
/* ----------------------------------------------------------------------
unpack values into local atom-based arrays after exchange
------------------------------------------------------------------------- */
int FixWallGranRegion::unpack_exchange(int nlocal, double *buf)
{
int m;
if (!history) return 0;
int n = 0;
int count = ncontact[nlocal] = (int) ubuf(buf[n++]).i;
for (int iwall = 0; iwall < count; iwall++) {
walls[nlocal][iwall] = (int) ubuf(buf[n++]).i;
for (m = 0; m < sheardim; m++)
shearmany[nlocal][iwall][m] = buf[n++];
}
return n;
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for restart file
------------------------------------------------------------------------- */
int FixWallGranRegion::pack_restart(int i, double *buf)
{
int m;
if (!history) return 0;
int n = 1;
int count = ncontact[i];
buf[n++] = ubuf(count).d;
for (int iwall = 0; iwall < count; iwall++) {
buf[n++] = ubuf(walls[i][iwall]).d;
for (m = 0; m < sheardim; m++)
buf[n++] = shearmany[i][iwall][m];
}
buf[0] = n;
return n;
}
/* ----------------------------------------------------------------------
unpack values from atom->extra array to restart the fix
------------------------------------------------------------------------- */
void FixWallGranRegion::unpack_restart(int nlocal, int nth)
{
int k;
if (!history) return;
double **extra = atom->extra;
// skip to Nth set of extra values
-
+
int m = 0;
for (int i = 0; i < nth; i++) m += static_cast<int> (extra[nlocal][m]);
m++;
-
+
int count = ncontact[nlocal] = (int) ubuf(extra[nlocal][m++]).i;
for (int iwall = 0; iwall < count; iwall++) {
walls[nlocal][iwall] = (int) ubuf(extra[nlocal][m++]).i;
for (k = 0; k < sheardim; k++)
shearmany[nlocal][iwall][k] = extra[nlocal][m++];
}
}
/* ----------------------------------------------------------------------
maxsize of any atom's restart data
------------------------------------------------------------------------- */
int FixWallGranRegion::maxsize_restart()
{
if (!history) return 0;
return 2 + tmax*(sheardim+1);
}
/* ----------------------------------------------------------------------
size of atom nlocal's restart data
------------------------------------------------------------------------- */
int FixWallGranRegion::size_restart(int nlocal)
{
if (!history) return 0;
return 2 + ncontact[nlocal]*(sheardim+1);
}
/* ----------------------------------------------------------------------
pack entire state of Fix into one write
------------------------------------------------------------------------- */
void FixWallGranRegion::write_restart(FILE *fp)
{
if (comm->me) return;
int len = 0;
region->length_restart_string(len);
fwrite(&len, sizeof(int),1,fp);
region->write_restart(fp);
}
/* ----------------------------------------------------------------------
use state info from restart file to restart the Fix
------------------------------------------------------------------------- */
void FixWallGranRegion::restart(char *buf)
{
int n = 0;
- if (!region->restart(buf,n)) motion_resetflag = 1;
+ if (!region->restart(buf,n)) motion_resetflag = 1;
}
diff --git a/src/KOKKOS/fix_qeq_reax_kokkos.cpp b/src/KOKKOS/fix_qeq_reax_kokkos.cpp
index 844d48dae..ab25e557e 100644
--- a/src/KOKKOS/fix_qeq_reax_kokkos.cpp
+++ b/src/KOKKOS/fix_qeq_reax_kokkos.cpp
@@ -1,1263 +1,1230 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_reax_kokkos.h"
#include "kokkos.h"
#include "atom.h"
#include "atom_masks.h"
#include "atom_kokkos.h"
#include "comm.h"
#include "force.h"
#include "group.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list_kokkos.h"
#include "neigh_request.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
#include "pair_reax_c_kokkos.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define SMALL 0.0001
#define EV_TO_KCAL_PER_MOL 14.4
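// 14.4 is approximately e^2/(4*pi*eps0) in eV*Angstrom, the Coulomb
// prefactor in the shielded QEq interaction; the macro name appears to be
// carried over from the non-Kokkos fix qeq/reax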
#define TEAMSIZE 128
/* ---------------------------------------------------------------------- */
template<class DeviceType>
FixQEqReaxKokkos<DeviceType>::FixQEqReaxKokkos(LAMMPS *lmp, int narg, char **arg) :
FixQEqReax(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
datamask_read = X_MASK | V_MASK | F_MASK | MASK_MASK | Q_MASK | TYPE_MASK;
datamask_modify = Q_MASK | X_MASK;
nmax = m_cap = 0;
allocated_flag = 0;
-
- reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
- use_pair_list = 0;
- if (reaxc->execution_space == this->execution_space)
- use_pair_list = 1;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
FixQEqReaxKokkos<DeviceType>::~FixQEqReaxKokkos()
{
if (copymode) return;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init()
{
atomKK->k_q.modify<LMPHostType>();
atomKK->k_q.sync<LMPDeviceType>();
- //FixQEqReax::init();
- {
- if (!atom->q_flag) error->all(FLERR,"Fix qeq/reax requires atom attribute q");
-
- ngroup = group->count(igroup);
- if (ngroup == 0) error->all(FLERR,"Fix qeq/reax group has no atoms");
-
- // need a half neighbor list w/ Newton off and ghost neighbors
- // built whenever re-neighboring occurs
-
- if (!use_pair_list) {
- int irequest = neighbor->request(this,instance_me);
- neighbor->requests[irequest]->pair = 0;
- neighbor->requests[irequest]->fix = 1;
- neighbor->requests[irequest]->newton = 2;
- neighbor->requests[irequest]->ghost = 1;
- }
-
- init_shielding();
- init_taper();
-
- if (strstr(update->integrate_style,"respa"))
- nlevels_respa = ((Respa *) update->integrate)->nlevels;
- }
+ FixQEqReax::init();
neighflag = lmp->kokkos->neighflag;
-
- if (!use_pair_list) {
- int irequest = neighbor->nrequest - 1;
- neighbor->requests[irequest]->
- kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
- !Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
- neighbor->requests[irequest]->
- kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
+ int irequest = neighbor->nrequest - 1;
+
+ neighbor->requests[irequest]->
+ kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
+ !Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
+ neighbor->requests[irequest]->
+ kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
- if (neighflag == FULL) {
- neighbor->requests[irequest]->fix = 1;
- neighbor->requests[irequest]->pair = 0;
- neighbor->requests[irequest]->full = 1;
- neighbor->requests[irequest]->half = 0;
- } else { //if (neighflag == HALF || neighflag == HALFTHREAD)
- neighbor->requests[irequest]->fix = 1;
- neighbor->requests[irequest]->full = 0;
- neighbor->requests[irequest]->half = 1;
- neighbor->requests[irequest]->ghost = 1;
- }
+ if (neighflag == FULL) {
+ neighbor->requests[irequest]->fix = 1;
+ neighbor->requests[irequest]->pair = 0;
+ neighbor->requests[irequest]->full = 1;
+ neighbor->requests[irequest]->half = 0;
+ } else { //if (neighflag == HALF || neighflag == HALFTHREAD)
+ neighbor->requests[irequest]->fix = 1;
+ neighbor->requests[irequest]->full = 0;
+ neighbor->requests[irequest]->half = 1;
+ neighbor->requests[irequest]->ghost = 1;
}
int ntypes = atom->ntypes;
k_params = Kokkos::DualView<params_qeq*,Kokkos::LayoutRight,DeviceType>
("FixQEqReax::params",ntypes+1);
params = k_params.template view<DeviceType>();
for (n = 1; n <= ntypes; n++) {
k_params.h_view(n).chi = chi[n];
k_params.h_view(n).eta = eta[n];
k_params.h_view(n).gamma = gamma[n];
}
k_params.template modify<LMPHostType>();
cutsq = swb * swb;
init_shielding_k();
init_hist();
-
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init_shielding_k()
{
int i,j;
int ntypes = atom->ntypes;
k_shield = DAT::tdual_ffloat_2d("qeq/kk:shield",ntypes+1,ntypes+1);
d_shield = k_shield.template view<DeviceType>();
for( i = 1; i <= ntypes; ++i )
for( j = 1; j <= ntypes; ++j )
k_shield.h_view(i,j) = pow( gamma[i] * gamma[j], -1.5 );
k_shield.template modify<LMPHostType>();
k_shield.template sync<DeviceType>();
k_tap = DAT::tdual_ffloat_1d("qeq/kk:tap",8);
d_tap = k_tap.template view<DeviceType>();
for (i = 0; i < 8; i ++)
k_tap.h_view(i) = Tap[i];
k_tap.template modify<LMPHostType>();
k_tap.template sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init_hist()
{
int i,j;
k_s_hist = DAT::tdual_ffloat_2d("qeq/kk:s_hist",atom->nmax,5);
d_s_hist = k_s_hist.template view<DeviceType>();
h_s_hist = k_s_hist.h_view;
k_t_hist = DAT::tdual_ffloat_2d("qeq/kk:t_hist",atom->nmax,5);
d_t_hist = k_t_hist.template view<DeviceType>();
h_t_hist = k_t_hist.h_view;
for( i = 0; i < atom->nmax; i++ )
for( j = 0; j < 5; j++ )
k_s_hist.h_view(i,j) = k_t_hist.h_view(i,j) = 0.0;
k_s_hist.template modify<LMPHostType>();
k_s_hist.template sync<DeviceType>();
k_t_hist.template modify<LMPHostType>();
k_t_hist.template sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::setup_pre_force(int vflag)
{
//neighbor->build_one(list);
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::pre_force(int vflag)
{
if (update->ntimestep % nevery) return;
atomKK->sync(execution_space,datamask_read);
atomKK->modified(execution_space,datamask_modify);
x = atomKK->k_x.view<DeviceType>();
v = atomKK->k_v.view<DeviceType>();
f = atomKK->k_f.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
mask = atomKK->k_mask.view<DeviceType>();
nlocal = atomKK->nlocal;
nall = atom->nlocal + atom->nghost;
newton_pair = force->newton_pair;
k_params.template sync<DeviceType>();
k_shield.template sync<DeviceType>();
k_tap.template sync<DeviceType>();
- if (use_pair_list)
- list = reaxc->list;
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_numneigh = k_list->d_numneigh;
d_neighbors = k_list->d_neighbors;
d_ilist = k_list->d_ilist;
inum = list->inum;
k_list->clean_copy();
//cleanup_copy();
copymode = 1;
int teamsize = TEAMSIZE;
// allocate
allocate_array();
// get max number of neighbors
if (!allocated_flag || update->ntimestep == neighbor->lastcall)
allocate_matrix();
// compute_H
FixQEqReaxKokkosComputeHFunctor<DeviceType> computeH_functor(this);
Kokkos::parallel_scan(inum,computeH_functor);
DeviceType::fence();
// init_matvec
FixQEqReaxKokkosMatVecFunctor<DeviceType> matvec_functor(this);
Kokkos::parallel_for(inum,matvec_functor);
DeviceType::fence();
// comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 2;
k_s.template modify<DeviceType>();
k_s.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_s.template modify<LMPHostType>();
k_s.template sync<DeviceType>();
// comm->forward_comm_fix(this); //Dist_vector( t );
pack_flag = 3;
k_t.template modify<DeviceType>();
k_t.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_t.template modify<LMPHostType>();
k_t.template sync<DeviceType>();
// 1st cg solve over b_s, s
cg_solve1();
DeviceType::fence();
// 2nd cg solve over b_t, t
cg_solve2();
DeviceType::fence();
// calculate_Q();
calculate_q();
DeviceType::fence();
copymode = 0;
if (!allocated_flag)
allocated_flag = 1;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::num_neigh_item(int ii, int &maxneigh) const
{
const int i = d_ilist[ii];
maxneigh += d_numneigh[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::allocate_matrix()
{
int i,ii,m;
const int inum = list->inum;
nmax = atom->nmax;
// determine the total space for the H matrix
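// m_cap = total neighbor count summed over owned atoms, an upper bound
// on the number of nonzeros stored for H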
m_cap = 0;
FixQEqReaxKokkosNumNeighFunctor<DeviceType> neigh_functor(this);
Kokkos::parallel_reduce(inum,neigh_functor,m_cap);
d_firstnbr = typename AT::t_int_1d("qeq/kk:firstnbr",nmax);
d_numnbrs = typename AT::t_int_1d("qeq/kk:numnbrs",nmax);
d_jlist = typename AT::t_int_1d("qeq/kk:jlist",m_cap);
d_val = typename AT::t_ffloat_1d("qeq/kk:val",m_cap);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::allocate_array()
{
if (atom->nmax > nmax) {
nmax = atom->nmax;
k_o = DAT::tdual_ffloat_1d("qeq/kk:h_o",nmax);
d_o = k_o.template view<DeviceType>();
h_o = k_o.h_view;
d_Hdia_inv = typename AT::t_ffloat_1d("qeq/kk:h_Hdia_inv",nmax);
d_b_s = typename AT::t_ffloat_1d("qeq/kk:h_b_s",nmax);
d_b_t = typename AT::t_ffloat_1d("qeq/kk:h_b_t",nmax);
k_s = DAT::tdual_ffloat_1d("qeq/kk:h_s",nmax);
d_s = k_s.template view<DeviceType>();
h_s = k_s.h_view;
k_t = DAT::tdual_ffloat_1d("qeq/kk:h_t",nmax);
d_t = k_t.template view<DeviceType>();
h_t = k_t.h_view;
d_p = typename AT::t_ffloat_1d("qeq/kk:h_p",nmax);
d_r = typename AT::t_ffloat_1d("qeq/kk:h_r",nmax);
k_d = DAT::tdual_ffloat_1d("qeq/kk:h_d",nmax);
d_d = k_d.template view<DeviceType>();
h_d = k_d.h_view;
k_s_hist = DAT::tdual_ffloat_2d("qeq/kk:s_hist",nmax,5);
d_s_hist = k_s_hist.template view<DeviceType>();
h_s_hist = k_s_hist.h_view;
k_t_hist = DAT::tdual_ffloat_2d("qeq/kk:t_hist",nmax,5);
d_t_hist = k_t_hist.template view<DeviceType>();
h_t_hist = k_t_hist.h_view;
}
// init_storage
const int ignum = atom->nlocal + atom->nghost;
FixQEqReaxKokkosZeroFunctor<DeviceType> zero_functor(this);
Kokkos::parallel_for(ignum,zero_functor);
DeviceType::fence();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::zero_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_Hdia_inv[i] = 1.0 / params(itype).eta;
d_b_s[i] = -params(itype).chi;
d_b_t[i] = -1.0;
d_s[i] = 0.0;
d_t[i] = 0.0;
d_p[i] = 0.0;
d_o[i] = 0.0;
d_r[i] = 0.0;
d_d[i] = 0.0;
//for( int j = 0; j < 5; j++ )
//d_s_hist(i,j) = d_t_hist(i,j) = 0.0;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::compute_h_item(int ii, int &m_fill, const bool &final) const
{
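// build one row of the sparse QEq matrix H in CSR-like form; this runs
// inside Kokkos::parallel_scan, so non-final passes only count entries
// and the final pass writes d_jlist / d_val at the scanned offsets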
const int i = d_ilist[ii];
int j,jj,jtag,jtype,flag;
if (mask[i] & groupbit) {
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const int itag = tag(i);
const int jnum = d_numneigh[i];
if (final)
d_firstnbr[i] = m_fill;
for (jj = 0; jj < jnum; jj++) {
j = d_neighbors(i,jj);
j &= NEIGHMASK;
jtype = type(j);
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
if (neighflag != FULL) {
flag = 0;
if (j < nlocal) flag = 1;
else if (tag[i] < tag[j]) flag = 1;
else if (tag[i] == tag[j]) {
if (delz > SMALL) flag = 1;
else if (fabs(delz) < SMALL) {
if (dely > SMALL) flag = 1;
else if (fabs(dely) < SMALL && delx > SMALL)
flag = 1;
}
}
if (!flag) continue;
}
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cutsq) continue;
if (final) {
const F_FLOAT r = sqrt(rsq);
d_jlist(m_fill) = j;
const F_FLOAT shldij = d_shield(itype,jtype);
d_val(m_fill) = calculate_H_k(r,shldij);
}
m_fill++;
}
if (final)
d_numnbrs[i] = m_fill - d_firstnbr[i];
}
}
/* ---------------------------------------------------------------------- */
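// shielded Coulomb matrix element: Tap(r) / (r^3 + gamma_ij^-3)^(1/3),
// where shld = (gamma_i*gamma_j)^(-3/2); the 7th-order taper polynomial
// is evaluated by Horner's rule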
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::calculate_H_k(const F_FLOAT &r, const F_FLOAT &shld) const
{
F_FLOAT taper, denom;
taper = d_tap[7] * r + d_tap[6];
taper = taper * r + d_tap[5];
taper = taper * r + d_tap[4];
taper = taper * r + d_tap[3];
taper = taper * r + d_tap[2];
taper = taper * r + d_tap[1];
taper = taper * r + d_tap[0];
denom = r * r * r + shld;
denom = pow(denom,0.3333333333333);
return taper * EV_TO_KCAL_PER_MOL / denom;
}
/* ---------------------------------------------------------------------- */
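// init_matvec: set the right-hand sides and diagonal preconditioner and
// extrapolate initial guesses for t and s from the stored solution
// history (3 previous values for t, 4 for s)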
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::mat_vec_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_Hdia_inv[i] = 1.0 / params(itype).eta;
d_b_s[i] = -params(itype).chi;
d_b_t[i] = -1.0;
d_t[i] = d_t_hist(i,2) + 3*(d_t_hist(i,0) - d_t_hist(i,1));
d_s[i] = 4*(d_s_hist(i,0)+d_s_hist(i,2))-(6*d_s_hist(i,1)+d_s_hist(i,3));
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cg_solve1()
// b = b_s, x = s;
{
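// diagonally preconditioned conjugate gradient on H s = b_s; iterates
// until sqrt(sig_new)/b_norm drops below tolerance or loopmax is reached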
const int inum = list->inum;
const int ignum = inum + list->gnum;
F_FLOAT tmp, sig_old, b_norm;
const int teamsize = TEAMSIZE;
// sparse_matvec( &H, x, q );
FixQEqReaxKokkosSparse12Functor<DeviceType> sparse12_functor(this);
Kokkos::parallel_for(inum,sparse12_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse13Functor<DeviceType,HALF> sparse13_functor(this);
Kokkos::parallel_for(inum,sparse13_functor);
} else {
FixQEqReaxKokkosSparse13Functor<DeviceType,HALFTHREAD> sparse13_functor(this);
Kokkos::parallel_for(inum,sparse13_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec1> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// vector_sum( r , 1., b, -1., q, nn );
// preconditioning: d[j] = r[j] * Hdia_inv[j];
// b_norm = parallel_norm( b, nn );
F_FLOAT my_norm = 0.0;
FixQEqReaxKokkosNorm1Functor<DeviceType> norm1_functor(this);
Kokkos::parallel_reduce(inum,norm1_functor,my_norm);
DeviceType::fence();
F_FLOAT norm_sqr = 0.0;
MPI_Allreduce( &my_norm, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
b_norm = sqrt(norm_sqr);
DeviceType::fence();
// sig_new = parallel_dot( r, d, nn);
F_FLOAT my_dot = 0.0;
FixQEqReaxKokkosDot1Functor<DeviceType> dot1_functor(this);
Kokkos::parallel_reduce(inum,dot1_functor,my_dot);
DeviceType::fence();
F_FLOAT dot_sqr = 0.0;
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
F_FLOAT sig_new = dot_sqr;
DeviceType::fence();
int loop;
const int loopmax = 200;
for (loop = 1; loop < loopmax && sqrt(sig_new)/b_norm > tolerance; loop++) {
// comm->forward_comm_fix(this); //Dist_vector( d );
pack_flag = 1;
k_d.template modify<DeviceType>();
k_d.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_d.template modify<LMPHostType>();
k_d.template sync<DeviceType>();
// sparse_matvec( &H, d, q );
FixQEqReaxKokkosSparse22Functor<DeviceType> sparse22_functor(this);
Kokkos::parallel_for(inum,sparse22_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALF> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
} else {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALFTHREAD> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// tmp = parallel_dot( d, q, nn);
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosDot2Functor<DeviceType> dot2_functor(this);
Kokkos::parallel_reduce(inum,dot2_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
tmp = dot_sqr;
alpha = sig_new / tmp;
sig_old = sig_new;
// vector_add( s, alpha, d, nn );
// vector_add( r, -alpha, q, nn );
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosPrecon1Functor<DeviceType> precon1_functor(this);
Kokkos::parallel_for(inum,precon1_functor);
DeviceType::fence();
// preconditioning: p[j] = r[j] * Hdia_inv[j];
// sig_new = parallel_dot( r, p, nn);
FixQEqReaxKokkosPreconFunctor<DeviceType> precon_functor(this);
Kokkos::parallel_reduce(inum,precon_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
sig_new = dot_sqr;
beta = sig_new / sig_old;
// vector_sum( d, 1., p, beta, d, nn );
FixQEqReaxKokkosVecSum2Functor<DeviceType> vecsum2_functor(this);
Kokkos::parallel_for(inum,vecsum2_functor);
DeviceType::fence();
}
if (loop >= loopmax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax cg_solve1 convergence failed after %d iterations "
"at " BIGINT_FORMAT " step: %f",loop,update->ntimestep,sqrt(sig_new)/b_norm);
error->warning(FLERR,str);
//error->all(FLERR,str);
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cg_solve2()
// b = b_t, x = t;
{
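// same preconditioned conjugate gradient as cg_solve1(), applied to H t = b_t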
const int inum = list->inum;
const int ignum = inum + list->gnum;
F_FLOAT tmp, sig_old, b_norm;
const int teamsize = TEAMSIZE;
// sparse_matvec( &H, x, q );
FixQEqReaxKokkosSparse32Functor<DeviceType> sparse32_functor(this);
Kokkos::parallel_for(inum,sparse32_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse33Functor<DeviceType,HALF> sparse33_functor(this);
Kokkos::parallel_for(inum,sparse33_functor);
} else {
FixQEqReaxKokkosSparse33Functor<DeviceType,HALFTHREAD> sparse33_functor(this);
Kokkos::parallel_for(inum,sparse33_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec3> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// vector_sum( r , 1., b, -1., q, nn );
// preconditioning: d[j] = r[j] * Hdia_inv[j];
// b_norm = parallel_norm( b, nn );
F_FLOAT my_norm = 0.0;
FixQEqReaxKokkosNorm2Functor<DeviceType> norm2_functor(this);
Kokkos::parallel_reduce(inum,norm2_functor,my_norm);
DeviceType::fence();
F_FLOAT norm_sqr = 0.0;
MPI_Allreduce( &my_norm, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
b_norm = sqrt(norm_sqr);
DeviceType::fence();
// sig_new = parallel_dot( r, d, nn);
F_FLOAT my_dot = 0.0;
FixQEqReaxKokkosDot1Functor<DeviceType> dot1_functor(this);
Kokkos::parallel_reduce(inum,dot1_functor,my_dot);
DeviceType::fence();
F_FLOAT dot_sqr = 0.0;
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
F_FLOAT sig_new = dot_sqr;
DeviceType::fence();
int loop;
const int loopmax = 200;
for (loop = 1; loop < loopmax && sqrt(sig_new)/b_norm > tolerance; loop++) {
// comm->forward_comm_fix(this); //Dist_vector( d );
pack_flag = 1;
k_d.template modify<DeviceType>();
k_d.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_d.template modify<LMPHostType>();
k_d.template sync<DeviceType>();
// sparse_matvec( &H, d, q );
FixQEqReaxKokkosSparse22Functor<DeviceType> sparse22_functor(this);
Kokkos::parallel_for(inum,sparse22_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALF> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
} else {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALFTHREAD> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// tmp = parallel_dot( d, q, nn);
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosDot2Functor<DeviceType> dot2_functor(this);
Kokkos::parallel_reduce(inum,dot2_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
tmp = dot_sqr;
DeviceType::fence();
alpha = sig_new / tmp;
sig_old = sig_new;
// vector_add( t, alpha, d, nn );
// vector_add( r, -alpha, q, nn );
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosPrecon2Functor<DeviceType> precon2_functor(this);
Kokkos::parallel_for(inum,precon2_functor);
DeviceType::fence();
// preconditioning: p[j] = r[j] * Hdia_inv[j];
// sig_new = parallel_dot( r, p, nn);
FixQEqReaxKokkosPreconFunctor<DeviceType> precon_functor(this);
Kokkos::parallel_reduce(inum,precon_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
sig_new = dot_sqr;
beta = sig_new / sig_old;
// vector_sum( d, 1., p, beta, d, nn );
FixQEqReaxKokkosVecSum2Functor<DeviceType> vecsum2_functor(this);
Kokkos::parallel_for(inum,vecsum2_functor);
DeviceType::fence();
}
if (loop >= loopmax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax cg_solve2 convergence failed after %d iterations "
"at " BIGINT_FORMAT " step: %f",loop,update->ntimestep,sqrt(sig_new)/b_norm);
error->warning(FLERR,str);
//error->all(FLERR,str);
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::calculate_q()
{
F_FLOAT sum, sum_all;
const int inum = list->inum;
// s_sum = parallel_vector_acc( s, nn );
sum = sum_all = 0.0;
FixQEqReaxKokkosVecAcc1Functor<DeviceType> vecacc1_functor(this);
Kokkos::parallel_reduce(inum,vecacc1_functor,sum);
DeviceType::fence();
MPI_Allreduce(&sum, &sum_all, 1, MPI_DOUBLE, MPI_SUM, world );
const F_FLOAT s_sum = sum_all;
// t_sum = parallel_vector_acc( t, nn);
sum = sum_all = 0.0;
FixQEqReaxKokkosVecAcc2Functor<DeviceType> vecacc2_functor(this);
Kokkos::parallel_reduce(inum,vecacc2_functor,sum);
DeviceType::fence();
MPI_Allreduce(&sum, &sum_all, 1, MPI_DOUBLE, MPI_SUM, world );
const F_FLOAT t_sum = sum_all;
// u = s_sum / t_sum;
delta = s_sum/t_sum;
// q[i] = s[i] - u * t[i];
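// with u = s_sum/t_sum this projection makes the group's total charge sum to zero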
FixQEqReaxKokkosCalculateQFunctor<DeviceType> calculateQ_functor(this);
Kokkos::parallel_for(inum,calculateQ_functor);
DeviceType::fence();
pack_flag = 4;
//comm->forward_comm_fix( this ); //Dist_vector( atom->q );
atomKK->k_q.modify<DeviceType>();
atomKK->k_q.sync<LMPHostType>();
comm->forward_comm_fix(this);
atomKK->k_q.modify<LMPHostType>();
atomKK->k_q.sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse12_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_o[i] = params(itype).eta * d_s[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse13_item(int ii) const
{
// The q array is atomic for Half/Thread neighbor style
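// a half list stores each pair once, so both the i->j and j->i
// contributions are accumulated here; atomic updates are needed for
// the HALFTHREAD case to avoid write conflicts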
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_s[j];
a_o[j] += d_val(jj) * d_s[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec1, const membertype1 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_s[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp;});
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse22_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_o[i] = params(itype).eta * d_d[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse23_item(int ii) const
{
// The q array is atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_d[j];
a_o[j] += d_val(jj) * d_d[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec2, const membertype2 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_d[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp; });
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagZeroQGhosts, const int &i) const
{
if (mask[i] & groupbit)
d_o[i] = 0.0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse32_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit)
d_o[i] = params(itype).eta * d_t[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse33_item(int ii) const
{
// The q array is atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_t[j];
a_o[j] += d_val(jj) * d_t[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec3, const membertype3 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_t[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp;});
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::vecsum2_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit)
d_d[i] = 1.0 * d_p[i] + beta * d_d[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::norm1_item(int ii) const
{
F_FLOAT tmp = 0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_r[i] = 1.0*d_b_s[i] + -1.0*d_o[i];
d_d[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_b_s[i] * d_b_s[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::norm2_item(int ii) const
{
F_FLOAT tmp = 0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_r[i] = 1.0*d_b_t[i] + -1.0*d_o[i];
d_d[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_b_t[i] * d_b_t[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::dot1_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit)
tmp = d_r[i] * d_d[i];
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::dot2_item(int ii) const
{
double tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
tmp = d_d[i] * d_o[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::precon1_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_s[i] += alpha * d_d[i];
d_r[i] += -alpha * d_o[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::precon2_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_t[i] += alpha * d_d[i];
d_r[i] += -alpha * d_o[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::precon_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_p[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_r[i] * d_p[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::vecacc1_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit)
tmp = d_s[i];
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::vecacc2_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
tmp = d_t[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::calculate_q_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
q(i) = d_s[i] - delta * d_t[i];
for (int k = 4; k > 0; --k) {
d_s_hist(i,k) = d_s_hist(i,k-1);
d_t_hist(i,k) = d_t_hist(i,k-1);
}
d_s_hist(i,0) = d_s[i];
d_t_hist(i,0) = d_t[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int FixQEqReaxKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int m;
if( pack_flag == 1)
for(m = 0; m < n; m++) buf[m] = h_d[list[m]];
else if( pack_flag == 2 )
for(m = 0; m < n; m++) buf[m] = h_s[list[m]];
else if( pack_flag == 3 )
for(m = 0; m < n; m++) buf[m] = h_t[list[m]];
else if( pack_flag == 4 )
for(m = 0; m < n; m++) buf[m] = atom->q[list[m]];
return n;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *buf)
{
int i, m;
if( pack_flag == 1)
for(m = 0, i = first; m < n; m++, i++) h_d[i] = buf[m];
else if( pack_flag == 2)
for(m = 0, i = first; m < n; m++, i++) h_s[i] = buf[m];
else if( pack_flag == 3)
for(m = 0, i = first; m < n; m++, i++) h_t[i] = buf[m];
else if( pack_flag == 4)
for(m = 0, i = first; m < n; m++, i++) atom->q[i] = buf[m];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int FixQEqReaxKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
{
int i, m;
for(m = 0, i = first; m < n; m++, i++) {
buf[m] = h_o[i];
}
return n;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *buf)
{
for(int m = 0; m < n; m++) {
h_o[list[m]] += buf[m];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cleanup_copy()
{
id = style = NULL;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
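// rough per-process estimate: charge history arrays, CG work vectors,
// and the CSR-style matrix storage sized by m_cap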
template<class DeviceType>
double FixQEqReaxKokkos<DeviceType>::memory_usage()
{
double bytes;
bytes = atom->nmax*5*2 * sizeof(F_FLOAT); // s_hist & t_hist
bytes += atom->nmax*8 * sizeof(F_FLOAT); // storage
bytes += n_cap*2 * sizeof(int); // matrix...
bytes += m_cap * sizeof(int);
bytes += m_cap * sizeof(F_FLOAT);
return bytes;
}
/* ---------------------------------------------------------------------- */
namespace LAMMPS_NS {
template class FixQEqReaxKokkos<LMPDeviceType>;
#ifdef KOKKOS_HAVE_CUDA
template class FixQEqReaxKokkos<LMPHostType>;
#endif
-}
\ No newline at end of file
+}
diff --git a/src/KOKKOS/fix_qeq_reax_kokkos.h b/src/KOKKOS/fix_qeq_reax_kokkos.h
index fcfc28fa7..eca0d761b 100644
--- a/src/KOKKOS/fix_qeq_reax_kokkos.h
+++ b/src/KOKKOS/fix_qeq_reax_kokkos.h
@@ -1,498 +1,498 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(qeq/reax/kk,FixQEqReaxKokkos<LMPDeviceType>)
FixStyle(qeq/reax/kk/device,FixQEqReaxKokkos<LMPDeviceType>)
FixStyle(qeq/reax/kk/host,FixQEqReaxKokkos<LMPHostType>)
#else
#ifndef LMP_FIX_QEQ_REAX_KOKKOS_H
#define LMP_FIX_QEQ_REAX_KOKKOS_H
#include "fix_qeq_reax.h"
#include "kokkos_type.h"
#include "neigh_list.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
struct TagSparseMatvec1 {};
struct TagSparseMatvec2 {};
struct TagSparseMatvec3 {};
struct TagZeroQGhosts{};
template<class DeviceType>
class FixQEqReaxKokkos : public FixQEqReax {
public:
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
FixQEqReaxKokkos(class LAMMPS *, int, char **);
~FixQEqReaxKokkos();
void cleanup_copy();
void init();
void setup_pre_force(int);
void pre_force(int);
KOKKOS_INLINE_FUNCTION
void num_neigh_item(int, int&) const;
KOKKOS_INLINE_FUNCTION
void zero_item(int) const;
KOKKOS_INLINE_FUNCTION
void compute_h_item(int, int &, const bool &) const;
KOKKOS_INLINE_FUNCTION
void mat_vec_item(int) const;
KOKKOS_INLINE_FUNCTION
void sparse12_item(int) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void sparse13_item(int) const;
KOKKOS_INLINE_FUNCTION
void sparse22_item(int) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void sparse23_item(int) const;
KOKKOS_INLINE_FUNCTION
void sparse32_item(int) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void sparse33_item(int) const;
typedef typename Kokkos::TeamPolicy <DeviceType, TagSparseMatvec1> ::member_type membertype1;
KOKKOS_INLINE_FUNCTION
void operator() (TagSparseMatvec1, const membertype1 &team) const;
typedef typename Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> ::member_type membertype2;
KOKKOS_INLINE_FUNCTION
void operator() (TagSparseMatvec2, const membertype2 &team) const;
typedef typename Kokkos::TeamPolicy <DeviceType, TagSparseMatvec3> ::member_type membertype3;
KOKKOS_INLINE_FUNCTION
void operator() (TagSparseMatvec3, const membertype3 &team) const;
KOKKOS_INLINE_FUNCTION
void operator()(TagZeroQGhosts, const int&) const;
KOKKOS_INLINE_FUNCTION
void vecsum2_item(int) const;
KOKKOS_INLINE_FUNCTION
double norm1_item(int) const;
KOKKOS_INLINE_FUNCTION
double norm2_item(int) const;
KOKKOS_INLINE_FUNCTION
double dot1_item(int) const;
KOKKOS_INLINE_FUNCTION
double dot2_item(int) const;
KOKKOS_INLINE_FUNCTION
void precon1_item(int) const;
KOKKOS_INLINE_FUNCTION
void precon2_item(int) const;
KOKKOS_INLINE_FUNCTION
double precon_item(int) const;
KOKKOS_INLINE_FUNCTION
double vecacc1_item(int) const;
KOKKOS_INLINE_FUNCTION
double vecacc2_item(int) const;
KOKKOS_INLINE_FUNCTION
void calculate_q_item(int) const;
KOKKOS_INLINE_FUNCTION
double calculate_H_k(const F_FLOAT &r, const F_FLOAT &shld) const;
struct params_qeq{
KOKKOS_INLINE_FUNCTION
params_qeq(){chi=0;eta=0;gamma=0;};
KOKKOS_INLINE_FUNCTION
params_qeq(int i){chi=0;eta=0;gamma=0;};
F_FLOAT chi, eta, gamma;
};
virtual int pack_forward_comm(int, int *, double *, int, int *);
virtual void unpack_forward_comm(int, int, double *);
int pack_reverse_comm(int, int, double *);
void unpack_reverse_comm(int, int *, double *);
double memory_usage();
protected:
- int inum,use_pair_list;
+ int inum;
int allocated_flag;
typedef Kokkos::DualView<int***,DeviceType> tdual_int_1d;
Kokkos::DualView<params_qeq*,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_qeq*, Kokkos::LayoutRight,DeviceType>::t_dev_const params;
typename ArrayTypes<DeviceType>::t_x_array x;
typename ArrayTypes<DeviceType>::t_v_array v;
typename ArrayTypes<DeviceType>::t_f_array_const f;
//typename ArrayTypes<DeviceType>::t_float_1d_randomread mass, q;
typename ArrayTypes<DeviceType>::t_float_1d_randomread mass;
typename ArrayTypes<DeviceType>::t_float_1d q;
typename ArrayTypes<DeviceType>::t_int_1d type, tag, mask;
DAT::tdual_float_1d k_q;
typename AT::t_float_1d d_q;
HAT::t_float_1d h_q;
typename ArrayTypes<DeviceType>::t_neighbors_2d d_neighbors;
typename ArrayTypes<DeviceType>::t_int_1d_randomread d_ilist, d_numneigh;
DAT::tdual_ffloat_1d k_tap;
typename AT::t_ffloat_1d d_tap;
typename AT::t_int_1d d_firstnbr;
typename AT::t_int_1d d_numnbrs;
typename AT::t_int_1d d_jlist;
typename AT::t_ffloat_1d d_val;
DAT::tdual_ffloat_1d k_t, k_s;
typename AT::t_ffloat_1d d_Hdia_inv, d_b_s, d_b_t, d_t, d_s;
HAT::t_ffloat_1d h_t, h_s;
typename AT::t_ffloat_1d_randomread r_b_s, r_b_t, r_t, r_s;
DAT::tdual_ffloat_1d k_o, k_d;
typename AT::t_ffloat_1d d_p, d_o, d_r, d_d;
HAT::t_ffloat_1d h_o, h_d;
typename AT::t_ffloat_1d_randomread r_p, r_o, r_r, r_d;
DAT::tdual_ffloat_2d k_shield, k_s_hist, k_t_hist;
typename AT::t_ffloat_2d d_shield, d_s_hist, d_t_hist;
HAT::t_ffloat_2d h_s_hist, h_t_hist;
typename AT::t_ffloat_2d_randomread r_s_hist, r_t_hist;
void init_shielding_k();
void init_hist();
void allocate_matrix();
void allocate_array();
void cg_solve1();
void cg_solve2();
void calculate_q();
int neighflag, pack_flag;
int nlocal,nall,nmax,newton_pair;
int count, isuccess;
double alpha, beta, delta, cutsq;
int iswap;
int first;
typename AT::t_int_2d d_sendlist;
typename AT::t_xfloat_1d_um v_buf;
};
template <class DeviceType>
struct FixQEqReaxKokkosNumNeighFunctor {
typedef DeviceType device_type ;
typedef int value_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosNumNeighFunctor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, int &maxneigh) const {
c.num_neigh_item(ii, maxneigh);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosMatVecFunctor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosMatVecFunctor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.mat_vec_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosComputeHFunctor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosComputeHFunctor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, int &m_fill, const bool &final) const {
c.compute_h_item(ii,m_fill,final);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosZeroFunctor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosZeroFunctor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.zero_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosSparse12Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosSparse12Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.sparse12_item(ii);
}
};
template <class DeviceType,int NEIGHFLAG>
struct FixQEqReaxKokkosSparse13Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosSparse13Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.template sparse13_item<NEIGHFLAG>(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosSparse22Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosSparse22Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.sparse22_item(ii);
}
};
template <class DeviceType,int NEIGHFLAG>
struct FixQEqReaxKokkosSparse23Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosSparse23Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.template sparse23_item<NEIGHFLAG>(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosSparse32Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosSparse32Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.sparse32_item(ii);
}
};
template <class DeviceType,int NEIGHFLAG>
struct FixQEqReaxKokkosSparse33Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosSparse33Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.template sparse33_item<NEIGHFLAG>(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosVecSum2Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosVecSum2Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.vecsum2_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosNorm1Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosNorm1Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.norm1_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosNorm2Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosNorm2Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.norm2_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosDot1Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosDot1Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.dot1_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosDot2Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosDot2Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.dot2_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosPrecon1Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosPrecon1Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.precon1_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosPrecon2Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosPrecon2Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.precon2_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosPreconFunctor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosPreconFunctor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.precon_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosVecAcc1Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosVecAcc1Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.vecacc1_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosVecAcc2Functor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
typedef double value_type;
FixQEqReaxKokkosVecAcc2Functor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, value_type &tmp) const {
tmp += c.vecacc2_item(ii);
}
};
template <class DeviceType>
struct FixQEqReaxKokkosCalculateQFunctor {
typedef DeviceType device_type ;
FixQEqReaxKokkos<DeviceType> c;
FixQEqReaxKokkosCalculateQFunctor(FixQEqReaxKokkos<DeviceType>* c_ptr):c(*c_ptr) {
c.cleanup_copy();
};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii) const {
c.calculate_q_item(ii);
}
};
}
#endif
#endif
diff --git a/src/KOKKOS/kokkos_type.h b/src/KOKKOS/kokkos_type.h
index cc096058e..5b53b8ed0 100644
--- a/src/KOKKOS/kokkos_type.h
+++ b/src/KOKKOS/kokkos_type.h
@@ -1,929 +1,937 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_LMPTYPE_KOKKOS_H
#define LMP_LMPTYPE_KOKKOS_H
#include "lmptype.h"
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <Kokkos_Vectorization.hpp>
#if defined(KOKKOS_HAVE_CXX11)
#undef ISFINITE
#define ISFINITE(x) std::isfinite(x)
#endif
// User-settable FFT precision
// FFT_PRECISION = 1 is single-precision complex (4-byte real, 4-byte imag)
// FFT_PRECISION = 2 is double-precision complex (8-byte real, 8-byte imag)
#ifdef FFT_SINGLE
#define FFT_PRECISION 1
#define MPI_FFT_SCALAR MPI_FLOAT
typedef float FFT_SCALAR;
#else
#define FFT_PRECISION 2
#define MPI_FFT_SCALAR MPI_DOUBLE
typedef double FFT_SCALAR;
#endif
#define MAX_TYPES_STACKPARAMS 12
#define NeighClusterSize 8
struct lmp_float3 {
float x,y,z;
KOKKOS_INLINE_FUNCTION
lmp_float3():x(0.0f),y(0.0f),z(0.0f) {}
KOKKOS_INLINE_FUNCTION
void operator += (const lmp_float3& tmp) {
x+=tmp.x;
y+=tmp.y;
z+=tmp.z;
}
KOKKOS_INLINE_FUNCTION
void operator += (const lmp_float3& tmp) volatile {
x+=tmp.x;
y+=tmp.y;
z+=tmp.z;
}
KOKKOS_INLINE_FUNCTION
void operator = (const lmp_float3& tmp) {
x=tmp.x;
y=tmp.y;
z=tmp.z;
}
KOKKOS_INLINE_FUNCTION
void operator = (const lmp_float3& tmp) volatile {
x=tmp.x;
y=tmp.y;
z=tmp.z;
}
};
struct lmp_double3 {
double x,y,z;
KOKKOS_INLINE_FUNCTION
lmp_double3():x(0.0),y(0.0),z(0.0) {}
KOKKOS_INLINE_FUNCTION
void operator += (const lmp_double3& tmp) {
x+=tmp.x;
y+=tmp.y;
z+=tmp.z;
}
KOKKOS_INLINE_FUNCTION
void operator += (const lmp_double3& tmp) volatile {
x+=tmp.x;
y+=tmp.y;
z+=tmp.z;
}
KOKKOS_INLINE_FUNCTION
void operator = (const lmp_double3& tmp) {
x=tmp.x;
y=tmp.y;
z=tmp.z;
}
KOKKOS_INLINE_FUNCTION
void operator = (const lmp_double3& tmp) volatile {
x=tmp.x;
y=tmp.y;
z=tmp.z;
}
};
#if !defined(__CUDACC__) && !defined(__VECTOR_TYPES_H__)
struct double2 {
double x, y;
};
struct float2 {
float x, y;
};
struct float4 {
float x, y, z, w;
};
struct double4 {
double x, y, z, w;
};
#endif
// set LMPHostype and LMPDeviceType from Kokkos Default Types
typedef Kokkos::DefaultExecutionSpace LMPDeviceType;
typedef Kokkos::HostSpace::execution_space LMPHostType;
// set ExecutionSpace stuct with variable "space"
template<class Device>
struct ExecutionSpaceFromDevice;
template<>
struct ExecutionSpaceFromDevice<LMPHostType> {
static const LAMMPS_NS::ExecutionSpace space = LAMMPS_NS::Host;
};
#ifdef KOKKOS_HAVE_CUDA
template<>
struct ExecutionSpaceFromDevice<Kokkos::Cuda> {
static const LAMMPS_NS::ExecutionSpace space = LAMMPS_NS::Device;
};
#endif
// define precision
// handle global precision, force, energy, positions, kspace separately
#ifndef PRECISION
#define PRECISION 2
#endif
#if PRECISION==1
typedef float LMP_FLOAT;
typedef float2 LMP_FLOAT2;
typedef lmp_float3 LMP_FLOAT3;
typedef float4 LMP_FLOAT4;
#else
typedef double LMP_FLOAT;
typedef double2 LMP_FLOAT2;
typedef lmp_double3 LMP_FLOAT3;
typedef double4 LMP_FLOAT4;
#endif
#ifndef PREC_FORCE
#define PREC_FORCE PRECISION
#endif
#if PREC_FORCE==1
typedef float F_FLOAT;
typedef float2 F_FLOAT2;
typedef lmp_float3 F_FLOAT3;
typedef float4 F_FLOAT4;
#else
typedef double F_FLOAT;
typedef double2 F_FLOAT2;
typedef lmp_double3 F_FLOAT3;
typedef double4 F_FLOAT4;
#endif
#ifndef PREC_ENERGY
#define PREC_ENERGY PRECISION
#endif
#if PREC_ENERGY==1
typedef float E_FLOAT;
typedef float2 E_FLOAT2;
typedef float4 E_FLOAT4;
#else
typedef double E_FLOAT;
typedef double2 E_FLOAT2;
typedef double4 E_FLOAT4;
#endif
struct s_EV_FLOAT {
E_FLOAT evdwl;
E_FLOAT ecoul;
E_FLOAT v[6];
KOKKOS_INLINE_FUNCTION
s_EV_FLOAT() {
evdwl = 0;
ecoul = 0;
v[0] = 0; v[1] = 0; v[2] = 0;
v[3] = 0; v[4] = 0; v[5] = 0;
}
KOKKOS_INLINE_FUNCTION
void operator+=(const s_EV_FLOAT &rhs) {
evdwl += rhs.evdwl;
ecoul += rhs.ecoul;
v[0] += rhs.v[0];
v[1] += rhs.v[1];
v[2] += rhs.v[2];
v[3] += rhs.v[3];
v[4] += rhs.v[4];
v[5] += rhs.v[5];
}
KOKKOS_INLINE_FUNCTION
void operator+=(const volatile s_EV_FLOAT &rhs) volatile {
evdwl += rhs.evdwl;
ecoul += rhs.ecoul;
v[0] += rhs.v[0];
v[1] += rhs.v[1];
v[2] += rhs.v[2];
v[3] += rhs.v[3];
v[4] += rhs.v[4];
v[5] += rhs.v[5];
}
};
typedef struct s_EV_FLOAT EV_FLOAT;
struct s_EV_FLOAT_REAX {
E_FLOAT evdwl;
E_FLOAT ecoul;
E_FLOAT v[6];
E_FLOAT ereax[10];
KOKKOS_INLINE_FUNCTION
s_EV_FLOAT_REAX() {
evdwl = 0;
ecoul = 0;
v[0] = 0; v[1] = 0; v[2] = 0;
v[3] = 0; v[4] = 0; v[5] = 0;
ereax[0] = 0; ereax[1] = 0; ereax[2] = 0;
ereax[3] = 0; ereax[4] = 0; ereax[5] = 0;
ereax[6] = 0; ereax[7] = 0; ereax[8] = 0;
}
KOKKOS_INLINE_FUNCTION
void operator+=(const s_EV_FLOAT_REAX &rhs) {
evdwl += rhs.evdwl;
ecoul += rhs.ecoul;
v[0] += rhs.v[0];
v[1] += rhs.v[1];
v[2] += rhs.v[2];
v[3] += rhs.v[3];
v[4] += rhs.v[4];
v[5] += rhs.v[5];
ereax[0] += rhs.ereax[0];
ereax[1] += rhs.ereax[1];
ereax[2] += rhs.ereax[2];
ereax[3] += rhs.ereax[3];
ereax[4] += rhs.ereax[4];
ereax[5] += rhs.ereax[5];
ereax[6] += rhs.ereax[6];
ereax[7] += rhs.ereax[7];
ereax[8] += rhs.ereax[8];
}
KOKKOS_INLINE_FUNCTION
void operator+=(const volatile s_EV_FLOAT_REAX &rhs) volatile {
evdwl += rhs.evdwl;
ecoul += rhs.ecoul;
v[0] += rhs.v[0];
v[1] += rhs.v[1];
v[2] += rhs.v[2];
v[3] += rhs.v[3];
v[4] += rhs.v[4];
v[5] += rhs.v[5];
ereax[0] += rhs.ereax[0];
ereax[1] += rhs.ereax[1];
ereax[2] += rhs.ereax[2];
ereax[3] += rhs.ereax[3];
ereax[4] += rhs.ereax[4];
ereax[5] += rhs.ereax[5];
ereax[6] += rhs.ereax[6];
ereax[7] += rhs.ereax[7];
ereax[8] += rhs.ereax[8];
}
};
typedef struct s_EV_FLOAT_REAX EV_FLOAT_REAX;
#ifndef PREC_POS
#define PREC_POS PRECISION
#endif
#if PREC_POS==1
typedef float X_FLOAT;
typedef float2 X_FLOAT2;
typedef float4 X_FLOAT4;
#else
typedef double X_FLOAT;
typedef double2 X_FLOAT2;
typedef double4 X_FLOAT4;
#endif
#ifndef PREC_VELOCITIES
#define PREC_VELOCITIES PRECISION
#endif
#if PREC_VELOCITIES==1
typedef float V_FLOAT;
typedef float2 V_FLOAT2;
typedef float4 V_FLOAT4;
#else
typedef double V_FLOAT;
typedef double2 V_FLOAT2;
typedef double4 V_FLOAT4;
#endif
#if PREC_KSPACE==1
typedef float K_FLOAT;
typedef float2 K_FLOAT2;
typedef float4 K_FLOAT4;
#else
typedef double K_FLOAT;
typedef double2 K_FLOAT2;
typedef double4 K_FLOAT4;
#endif
// ------------------------------------------------------------------------
// LAMMPS types
template <class DeviceType>
struct ArrayTypes;
template <>
struct ArrayTypes<LMPDeviceType> {
// scalar types
typedef Kokkos::
DualView<int, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_scalar;
typedef tdual_int_scalar::t_dev t_int_scalar;
typedef tdual_int_scalar::t_dev_const t_int_scalar_const;
typedef tdual_int_scalar::t_dev_um t_int_scalar_um;
typedef tdual_int_scalar::t_dev_const_um t_int_scalar_const_um;
typedef Kokkos::
DualView<LMP_FLOAT, LMPDeviceType::array_layout, LMPDeviceType>
tdual_float_scalar;
typedef tdual_float_scalar::t_dev t_float_scalar;
typedef tdual_float_scalar::t_dev_const t_float_scalar_const;
typedef tdual_float_scalar::t_dev_um t_float_scalar_um;
typedef tdual_float_scalar::t_dev_const_um t_float_scalar_const_um;
// generic array types
typedef Kokkos::
DualView<int*, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_1d;
typedef tdual_int_1d::t_dev t_int_1d;
typedef tdual_int_1d::t_dev_const t_int_1d_const;
typedef tdual_int_1d::t_dev_um t_int_1d_um;
typedef tdual_int_1d::t_dev_const_um t_int_1d_const_um;
typedef tdual_int_1d::t_dev_const_randomread t_int_1d_randomread;
typedef Kokkos::
DualView<int*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_int_1d_3;
typedef tdual_int_1d_3::t_dev t_int_1d_3;
typedef tdual_int_1d_3::t_dev_const t_int_1d_3_const;
typedef tdual_int_1d_3::t_dev_um t_int_1d_3_um;
typedef tdual_int_1d_3::t_dev_const_um t_int_1d_3_const_um;
typedef tdual_int_1d_3::t_dev_const_randomread t_int_1d_3_randomread;
typedef Kokkos::
DualView<int**, Kokkos::LayoutRight, LMPDeviceType> tdual_int_2d;
typedef tdual_int_2d::t_dev t_int_2d;
typedef tdual_int_2d::t_dev_const t_int_2d_const;
typedef tdual_int_2d::t_dev_um t_int_2d_um;
typedef tdual_int_2d::t_dev_const_um t_int_2d_const_um;
typedef tdual_int_2d::t_dev_const_randomread t_int_2d_randomread;
typedef Kokkos::
DualView<int**, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_2d_dl;
typedef tdual_int_2d_dl::t_dev t_int_2d_dl;
typedef tdual_int_2d_dl::t_dev_const t_int_2d_const_dl;
typedef tdual_int_2d_dl::t_dev_um t_int_2d_um_dl;
typedef tdual_int_2d_dl::t_dev_const_um t_int_2d_const_um_dl;
typedef tdual_int_2d_dl::t_dev_const_randomread t_int_2d_randomread_dl;
typedef Kokkos::
DualView<LAMMPS_NS::tagint*, LMPDeviceType::array_layout, LMPDeviceType>
tdual_tagint_1d;
typedef tdual_tagint_1d::t_dev t_tagint_1d;
typedef tdual_tagint_1d::t_dev_const t_tagint_1d_const;
typedef tdual_tagint_1d::t_dev_um t_tagint_1d_um;
typedef tdual_tagint_1d::t_dev_const_um t_tagint_1d_const_um;
typedef tdual_tagint_1d::t_dev_const_randomread t_tagint_1d_randomread;
typedef Kokkos::
DualView<LAMMPS_NS::tagint**, Kokkos::LayoutRight, LMPDeviceType>
tdual_tagint_2d;
typedef tdual_tagint_2d::t_dev t_tagint_2d;
typedef tdual_tagint_2d::t_dev_const t_tagint_2d_const;
typedef tdual_tagint_2d::t_dev_um t_tagint_2d_um;
typedef tdual_tagint_2d::t_dev_const_um t_tagint_2d_const_um;
typedef tdual_tagint_2d::t_dev_const_randomread t_tagint_2d_randomread;
typedef Kokkos::
DualView<LAMMPS_NS::imageint*, LMPDeviceType::array_layout, LMPDeviceType>
tdual_imageint_1d;
typedef tdual_imageint_1d::t_dev t_imageint_1d;
typedef tdual_imageint_1d::t_dev_const t_imageint_1d_const;
typedef tdual_imageint_1d::t_dev_um t_imageint_1d_um;
typedef tdual_imageint_1d::t_dev_const_um t_imageint_1d_const_um;
typedef tdual_imageint_1d::t_dev_const_randomread t_imageint_1d_randomread;
typedef Kokkos::
DualView<double*, Kokkos::LayoutRight, LMPDeviceType> tdual_double_1d;
typedef tdual_double_1d::t_dev t_double_1d;
typedef tdual_double_1d::t_dev_const t_double_1d_const;
typedef tdual_double_1d::t_dev_um t_double_1d_um;
typedef tdual_double_1d::t_dev_const_um t_double_1d_const_um;
typedef tdual_double_1d::t_dev_const_randomread t_double_1d_randomread;
typedef Kokkos::
DualView<double**, Kokkos::LayoutRight, LMPDeviceType> tdual_double_2d;
typedef tdual_double_2d::t_dev t_double_2d;
typedef tdual_double_2d::t_dev_const t_double_2d_const;
typedef tdual_double_2d::t_dev_um t_double_2d_um;
typedef tdual_double_2d::t_dev_const_um t_double_2d_const_um;
typedef tdual_double_2d::t_dev_const_randomread t_double_2d_randomread;
// 1d float array n
typedef Kokkos::DualView<LMP_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_float_1d;
typedef tdual_float_1d::t_dev t_float_1d;
typedef tdual_float_1d::t_dev_const t_float_1d_const;
typedef tdual_float_1d::t_dev_um t_float_1d_um;
typedef tdual_float_1d::t_dev_const_um t_float_1d_const_um;
typedef tdual_float_1d::t_dev_const_randomread t_float_1d_randomread;
//2d float array n*m
typedef Kokkos::DualView<LMP_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_float_2d;
typedef tdual_float_2d::t_dev t_float_2d;
typedef tdual_float_2d::t_dev_const t_float_2d_const;
typedef tdual_float_2d::t_dev_um t_float_2d_um;
typedef tdual_float_2d::t_dev_const_um t_float_2d_const_um;
typedef tdual_float_2d::t_dev_const_randomread t_float_2d_randomread;
//Position Types
//1d X_FLOAT array n
typedef Kokkos::DualView<X_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_xfloat_1d;
typedef tdual_xfloat_1d::t_dev t_xfloat_1d;
typedef tdual_xfloat_1d::t_dev_const t_xfloat_1d_const;
typedef tdual_xfloat_1d::t_dev_um t_xfloat_1d_um;
typedef tdual_xfloat_1d::t_dev_const_um t_xfloat_1d_const_um;
typedef tdual_xfloat_1d::t_dev_const_randomread t_xfloat_1d_randomread;
//2d X_FLOAT array n*m
typedef Kokkos::DualView<X_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_xfloat_2d;
typedef tdual_xfloat_2d::t_dev t_xfloat_2d;
typedef tdual_xfloat_2d::t_dev_const t_xfloat_2d_const;
typedef tdual_xfloat_2d::t_dev_um t_xfloat_2d_um;
typedef tdual_xfloat_2d::t_dev_const_um t_xfloat_2d_const_um;
typedef tdual_xfloat_2d::t_dev_const_randomread t_xfloat_2d_randomread;
//2d X_FLOAT array n*3
#ifdef LMP_KOKKOS_NO_LEGACY
typedef Kokkos::DualView<X_FLOAT*[3], Kokkos::LayoutLeft, LMPDeviceType> tdual_x_array;
#else
typedef Kokkos::DualView<X_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_x_array;
#endif
typedef tdual_x_array::t_dev t_x_array;
typedef tdual_x_array::t_dev_const t_x_array_const;
typedef tdual_x_array::t_dev_um t_x_array_um;
typedef tdual_x_array::t_dev_const_um t_x_array_const_um;
typedef tdual_x_array::t_dev_const_randomread t_x_array_randomread;
//Velocity Types
//1d V_FLOAT array n
typedef Kokkos::DualView<V_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_vfloat_1d;
typedef tdual_vfloat_1d::t_dev t_vfloat_1d;
typedef tdual_vfloat_1d::t_dev_const t_vfloat_1d_const;
typedef tdual_vfloat_1d::t_dev_um t_vfloat_1d_um;
typedef tdual_vfloat_1d::t_dev_const_um t_vfloat_1d_const_um;
typedef tdual_vfloat_1d::t_dev_const_randomread t_vfloat_1d_randomread;
//2d V_FLOAT array n*m
typedef Kokkos::DualView<V_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_vfloat_2d;
typedef tdual_vfloat_2d::t_dev t_vfloat_2d;
typedef tdual_vfloat_2d::t_dev_const t_vfloat_2d_const;
typedef tdual_vfloat_2d::t_dev_um t_vfloat_2d_um;
typedef tdual_vfloat_2d::t_dev_const_um t_vfloat_2d_const_um;
typedef tdual_vfloat_2d::t_dev_const_randomread t_vfloat_2d_randomread;
//2d V_FLOAT array n*3
typedef Kokkos::DualView<V_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_v_array;
//typedef Kokkos::DualView<V_FLOAT*[3], LMPDeviceType::array_layout, LMPDeviceType> tdual_v_array;
typedef tdual_v_array::t_dev t_v_array;
typedef tdual_v_array::t_dev_const t_v_array_const;
typedef tdual_v_array::t_dev_um t_v_array_um;
typedef tdual_v_array::t_dev_const_um t_v_array_const_um;
typedef tdual_v_array::t_dev_const_randomread t_v_array_randomread;
//Force Types
//1d F_FLOAT array n
typedef Kokkos::DualView<F_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_ffloat_1d;
typedef tdual_ffloat_1d::t_dev t_ffloat_1d;
typedef tdual_ffloat_1d::t_dev_const t_ffloat_1d_const;
typedef tdual_ffloat_1d::t_dev_um t_ffloat_1d_um;
typedef tdual_ffloat_1d::t_dev_const_um t_ffloat_1d_const_um;
typedef tdual_ffloat_1d::t_dev_const_randomread t_ffloat_1d_randomread;
//2d F_FLOAT array n*m
typedef Kokkos::DualView<F_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_ffloat_2d;
typedef tdual_ffloat_2d::t_dev t_ffloat_2d;
typedef tdual_ffloat_2d::t_dev_const t_ffloat_2d_const;
typedef tdual_ffloat_2d::t_dev_um t_ffloat_2d_um;
typedef tdual_ffloat_2d::t_dev_const_um t_ffloat_2d_const_um;
typedef tdual_ffloat_2d::t_dev_const_randomread t_ffloat_2d_randomread;
//2d F_FLOAT array n*m, device layout
typedef Kokkos::DualView<F_FLOAT**, LMPDeviceType::array_layout, LMPDeviceType> tdual_ffloat_2d_dl;
typedef tdual_ffloat_2d_dl::t_dev t_ffloat_2d_dl;
typedef tdual_ffloat_2d_dl::t_dev_const t_ffloat_2d_const_dl;
typedef tdual_ffloat_2d_dl::t_dev_um t_ffloat_2d_um_dl;
typedef tdual_ffloat_2d_dl::t_dev_const_um t_ffloat_2d_const_um_dl;
typedef tdual_ffloat_2d_dl::t_dev_const_randomread t_ffloat_2d_randomread_dl;
//2d F_FLOAT array n*3
typedef Kokkos::DualView<F_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_f_array;
//typedef Kokkos::DualView<F_FLOAT*[3], LMPDeviceType::array_layout, LMPDeviceType> tdual_f_array;
typedef tdual_f_array::t_dev t_f_array;
typedef tdual_f_array::t_dev_const t_f_array_const;
typedef tdual_f_array::t_dev_um t_f_array_um;
typedef tdual_f_array::t_dev_const_um t_f_array_const_um;
typedef tdual_f_array::t_dev_const_randomread t_f_array_randomread;
//2d F_FLOAT array n*6 (for virial)
typedef Kokkos::DualView<F_FLOAT*[6], Kokkos::LayoutRight, LMPDeviceType> tdual_virial_array;
typedef tdual_virial_array::t_dev t_virial_array;
typedef tdual_virial_array::t_dev_const t_virial_array_const;
typedef tdual_virial_array::t_dev_um t_virial_array_um;
typedef tdual_virial_array::t_dev_const_um t_virial_array_const_um;
typedef tdual_virial_array::t_dev_const_randomread t_virial_array_randomread;
//Energy Types
//1d E_FLOAT array n
typedef Kokkos::DualView<E_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_efloat_1d;
typedef tdual_efloat_1d::t_dev t_efloat_1d;
typedef tdual_efloat_1d::t_dev_const t_efloat_1d_const;
typedef tdual_efloat_1d::t_dev_um t_efloat_1d_um;
typedef tdual_efloat_1d::t_dev_const_um t_efloat_1d_const_um;
typedef tdual_efloat_1d::t_dev_const_randomread t_efloat_1d_randomread;
//2d E_FLOAT array n*m
typedef Kokkos::DualView<E_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_efloat_2d;
typedef tdual_efloat_2d::t_dev t_efloat_2d;
typedef tdual_efloat_2d::t_dev_const t_efloat_2d_const;
typedef tdual_efloat_2d::t_dev_um t_efloat_2d_um;
typedef tdual_efloat_2d::t_dev_const_um t_efloat_2d_const_um;
typedef tdual_efloat_2d::t_dev_const_randomread t_efloat_2d_randomread;
//2d E_FLOAT array n*3
typedef Kokkos::DualView<E_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_e_array;
typedef tdual_e_array::t_dev t_e_array;
typedef tdual_e_array::t_dev_const t_e_array_const;
typedef tdual_e_array::t_dev_um t_e_array_um;
typedef tdual_e_array::t_dev_const_um t_e_array_const_um;
typedef tdual_e_array::t_dev_const_randomread t_e_array_randomread;
//Neighbor Types
typedef Kokkos::DualView<int**, LMPDeviceType::array_layout, LMPDeviceType> tdual_neighbors_2d;
typedef tdual_neighbors_2d::t_dev t_neighbors_2d;
typedef tdual_neighbors_2d::t_dev_const t_neighbors_2d_const;
typedef tdual_neighbors_2d::t_dev_um t_neighbors_2d_um;
typedef tdual_neighbors_2d::t_dev_const_um t_neighbors_2d_const_um;
typedef tdual_neighbors_2d::t_dev_const_randomread t_neighbors_2d_randomread;
//Kspace
typedef Kokkos::
DualView<FFT_SCALAR*, Kokkos::LayoutRight, LMPDeviceType> tdual_FFT_SCALAR_1d;
typedef tdual_FFT_SCALAR_1d::t_dev t_FFT_SCALAR_1d;
typedef tdual_FFT_SCALAR_1d::t_dev_um t_FFT_SCALAR_1d_um;
typedef Kokkos::DualView<FFT_SCALAR**,Kokkos::LayoutRight,LMPDeviceType> tdual_FFT_SCALAR_2d;
typedef tdual_FFT_SCALAR_2d::t_dev t_FFT_SCALAR_2d;
typedef Kokkos::DualView<FFT_SCALAR**[3],Kokkos::LayoutRight,LMPDeviceType> tdual_FFT_SCALAR_2d_3;
typedef tdual_FFT_SCALAR_2d_3::t_dev t_FFT_SCALAR_2d_3;
typedef Kokkos::DualView<FFT_SCALAR***,Kokkos::LayoutRight,LMPDeviceType> tdual_FFT_SCALAR_3d;
typedef tdual_FFT_SCALAR_3d::t_dev t_FFT_SCALAR_3d;
typedef Kokkos::
DualView<FFT_SCALAR*[2], Kokkos::LayoutRight, LMPDeviceType> tdual_FFT_DATA_1d;
typedef tdual_FFT_DATA_1d::t_dev t_FFT_DATA_1d;
typedef tdual_FFT_DATA_1d::t_dev_um t_FFT_DATA_1d_um;
typedef Kokkos::
DualView<int*, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_64;
typedef tdual_int_64::t_dev t_int_64;
typedef tdual_int_64::t_dev_um t_int_64_um;
};
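// Naming convention in this specialization: tdual_* is the Kokkos::DualView
// holding synchronized host and device copies of an array, t_* is its
// device-side view, and the _const, _um and _randomread suffixes select the
// const, unmanaged and random-access (texture-fetch on CUDA) variants that
// DualView exposes as nested typedefs. The LMPHostType specialization below
// provides the same names backed by the host-side views; it is only needed
// when host and device differ, hence the KOKKOS_HAVE_CUDA guard.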
#ifdef KOKKOS_HAVE_CUDA
template <>
struct ArrayTypes<LMPHostType> {
//Scalar Types
typedef Kokkos::DualView<int, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_scalar;
typedef tdual_int_scalar::t_host t_int_scalar;
typedef tdual_int_scalar::t_host_const t_int_scalar_const;
typedef tdual_int_scalar::t_host_um t_int_scalar_um;
typedef tdual_int_scalar::t_host_const_um t_int_scalar_const_um;
typedef Kokkos::DualView<LMP_FLOAT, LMPDeviceType::array_layout, LMPDeviceType> tdual_float_scalar;
typedef tdual_float_scalar::t_host t_float_scalar;
typedef tdual_float_scalar::t_host_const t_float_scalar_const;
typedef tdual_float_scalar::t_host_um t_float_scalar_um;
typedef tdual_float_scalar::t_host_const_um t_float_scalar_const_um;
//Generic ArrayTypes
typedef Kokkos::DualView<int*, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_1d;
typedef tdual_int_1d::t_host t_int_1d;
typedef tdual_int_1d::t_host_const t_int_1d_const;
typedef tdual_int_1d::t_host_um t_int_1d_um;
typedef tdual_int_1d::t_host_const_um t_int_1d_const_um;
typedef tdual_int_1d::t_host_const_randomread t_int_1d_randomread;
typedef Kokkos::DualView<int*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_int_1d_3;
typedef tdual_int_1d_3::t_host t_int_1d_3;
typedef tdual_int_1d_3::t_host_const t_int_1d_3_const;
typedef tdual_int_1d_3::t_host_um t_int_1d_3_um;
typedef tdual_int_1d_3::t_host_const_um t_int_1d_3_const_um;
typedef tdual_int_1d_3::t_host_const_randomread t_int_1d_3_randomread;
typedef Kokkos::DualView<int**, Kokkos::LayoutRight, LMPDeviceType> tdual_int_2d;
typedef tdual_int_2d::t_host t_int_2d;
typedef tdual_int_2d::t_host_const t_int_2d_const;
typedef tdual_int_2d::t_host_um t_int_2d_um;
typedef tdual_int_2d::t_host_const_um t_int_2d_const_um;
typedef tdual_int_2d::t_host_const_randomread t_int_2d_randomread;
typedef Kokkos::DualView<int**, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_2d_dl;
typedef tdual_int_2d_dl::t_host t_int_2d_dl;
typedef tdual_int_2d_dl::t_host_const t_int_2d_const_dl;
typedef tdual_int_2d_dl::t_host_um t_int_2d_um_dl;
typedef tdual_int_2d_dl::t_host_const_um t_int_2d_const_um_dl;
typedef tdual_int_2d_dl::t_host_const_randomread t_int_2d_randomread_dl;
typedef Kokkos::DualView<LAMMPS_NS::tagint*, LMPDeviceType::array_layout, LMPDeviceType> tdual_tagint_1d;
typedef tdual_tagint_1d::t_host t_tagint_1d;
typedef tdual_tagint_1d::t_host_const t_tagint_1d_const;
typedef tdual_tagint_1d::t_host_um t_tagint_1d_um;
typedef tdual_tagint_1d::t_host_const_um t_tagint_1d_const_um;
typedef tdual_tagint_1d::t_host_const_randomread t_tagint_1d_randomread;
typedef Kokkos::
DualView<LAMMPS_NS::tagint**, Kokkos::LayoutRight, LMPDeviceType>
tdual_tagint_2d;
typedef tdual_tagint_2d::t_host t_tagint_2d;
typedef tdual_tagint_2d::t_host_const t_tagint_2d_const;
typedef tdual_tagint_2d::t_host_um t_tagint_2d_um;
typedef tdual_tagint_2d::t_host_const_um t_tagint_2d_const_um;
typedef tdual_tagint_2d::t_host_const_randomread t_tagint_2d_randomread;
typedef Kokkos::
DualView<LAMMPS_NS::imageint*, LMPDeviceType::array_layout, LMPDeviceType>
tdual_imageint_1d;
typedef tdual_imageint_1d::t_host t_imageint_1d;
typedef tdual_imageint_1d::t_host_const t_imageint_1d_const;
typedef tdual_imageint_1d::t_host_um t_imageint_1d_um;
typedef tdual_imageint_1d::t_host_const_um t_imageint_1d_const_um;
typedef tdual_imageint_1d::t_host_const_randomread t_imageint_1d_randomread;
typedef Kokkos::
DualView<double*, Kokkos::LayoutRight, LMPDeviceType> tdual_double_1d;
typedef tdual_double_1d::t_host t_double_1d;
typedef tdual_double_1d::t_host_const t_double_1d_const;
typedef tdual_double_1d::t_host_um t_double_1d_um;
typedef tdual_double_1d::t_host_const_um t_double_1d_const_um;
typedef tdual_double_1d::t_host_const_randomread t_double_1d_randomread;
typedef Kokkos::
DualView<double**, Kokkos::LayoutRight, LMPDeviceType> tdual_double_2d;
typedef tdual_double_2d::t_host t_double_2d;
typedef tdual_double_2d::t_host_const t_double_2d_const;
typedef tdual_double_2d::t_host_um t_double_2d_um;
typedef tdual_double_2d::t_host_const_um t_double_2d_const_um;
typedef tdual_double_2d::t_host_const_randomread t_double_2d_randomread;
//1d float array n
typedef Kokkos::DualView<LMP_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_float_1d;
typedef tdual_float_1d::t_host t_float_1d;
typedef tdual_float_1d::t_host_const t_float_1d_const;
typedef tdual_float_1d::t_host_um t_float_1d_um;
typedef tdual_float_1d::t_host_const_um t_float_1d_const_um;
typedef tdual_float_1d::t_host_const_randomread t_float_1d_randomread;
//2d float array n*m
typedef Kokkos::DualView<LMP_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_float_2d;
typedef tdual_float_2d::t_host t_float_2d;
typedef tdual_float_2d::t_host_const t_float_2d_const;
typedef tdual_float_2d::t_host_um t_float_2d_um;
typedef tdual_float_2d::t_host_const_um t_float_2d_const_um;
typedef tdual_float_2d::t_host_const_randomread t_float_2d_randomread;
//Position Types
//1d X_FLOAT array n
typedef Kokkos::DualView<X_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_xfloat_1d;
typedef tdual_xfloat_1d::t_host t_xfloat_1d;
typedef tdual_xfloat_1d::t_host_const t_xfloat_1d_const;
typedef tdual_xfloat_1d::t_host_um t_xfloat_1d_um;
typedef tdual_xfloat_1d::t_host_const_um t_xfloat_1d_const_um;
typedef tdual_xfloat_1d::t_host_const_randomread t_xfloat_1d_randomread;
//2d X_FLOAT array n*m
typedef Kokkos::DualView<X_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_xfloat_2d;
typedef tdual_xfloat_2d::t_host t_xfloat_2d;
typedef tdual_xfloat_2d::t_host_const t_xfloat_2d_const;
typedef tdual_xfloat_2d::t_host_um t_xfloat_2d_um;
typedef tdual_xfloat_2d::t_host_const_um t_xfloat_2d_const_um;
typedef tdual_xfloat_2d::t_host_const_randomread t_xfloat_2d_randomread;
//2d X_FLOAT array n*3
typedef Kokkos::DualView<X_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_x_array;
typedef tdual_x_array::t_host t_x_array;
typedef tdual_x_array::t_host_const t_x_array_const;
typedef tdual_x_array::t_host_um t_x_array_um;
typedef tdual_x_array::t_host_const_um t_x_array_const_um;
typedef tdual_x_array::t_host_const_randomread t_x_array_randomread;
//Velocity Types
//1d V_FLOAT array n
typedef Kokkos::DualView<V_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_vfloat_1d;
typedef tdual_vfloat_1d::t_host t_vfloat_1d;
typedef tdual_vfloat_1d::t_host_const t_vfloat_1d_const;
typedef tdual_vfloat_1d::t_host_um t_vfloat_1d_um;
typedef tdual_vfloat_1d::t_host_const_um t_vfloat_1d_const_um;
typedef tdual_vfloat_1d::t_host_const_randomread t_vfloat_1d_randomread;
//2d V_FLOAT array n*m
typedef Kokkos::DualView<V_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_vfloat_2d;
typedef tdual_vfloat_2d::t_host t_vfloat_2d;
typedef tdual_vfloat_2d::t_host_const t_vfloat_2d_const;
typedef tdual_vfloat_2d::t_host_um t_vfloat_2d_um;
typedef tdual_vfloat_2d::t_host_const_um t_vfloat_2d_const_um;
typedef tdual_vfloat_2d::t_host_const_randomread t_vfloat_2d_randomread;
//2d V_FLOAT array n*3
typedef Kokkos::DualView<V_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_v_array;
//typedef Kokkos::DualView<V_FLOAT*[3], LMPDeviceType::array_layout, LMPDeviceType> tdual_v_array;
typedef tdual_v_array::t_host t_v_array;
typedef tdual_v_array::t_host_const t_v_array_const;
typedef tdual_v_array::t_host_um t_v_array_um;
typedef tdual_v_array::t_host_const_um t_v_array_const_um;
typedef tdual_v_array::t_host_const_randomread t_v_array_randomread;
//Force Types
//1d F_FLOAT array n
typedef Kokkos::DualView<F_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_ffloat_1d;
typedef tdual_ffloat_1d::t_host t_ffloat_1d;
typedef tdual_ffloat_1d::t_host_const t_ffloat_1d_const;
typedef tdual_ffloat_1d::t_host_um t_ffloat_1d_um;
typedef tdual_ffloat_1d::t_host_const_um t_ffloat_1d_const_um;
typedef tdual_ffloat_1d::t_host_const_randomread t_ffloat_1d_randomread;
//2d F_FLOAT array n*m
typedef Kokkos::DualView<F_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_ffloat_2d;
typedef tdual_ffloat_2d::t_host t_ffloat_2d;
typedef tdual_ffloat_2d::t_host_const t_ffloat_2d_const;
typedef tdual_ffloat_2d::t_host_um t_ffloat_2d_um;
typedef tdual_ffloat_2d::t_host_const_um t_ffloat_2d_const_um;
typedef tdual_ffloat_2d::t_host_const_randomread t_ffloat_2d_randomread;
//2d F_FLOAT array n*m, device layout
typedef Kokkos::DualView<F_FLOAT**, LMPDeviceType::array_layout, LMPDeviceType> tdual_ffloat_2d_dl;
typedef tdual_ffloat_2d_dl::t_host t_ffloat_2d_dl;
typedef tdual_ffloat_2d_dl::t_host_const t_ffloat_2d_const_dl;
typedef tdual_ffloat_2d_dl::t_host_um t_ffloat_2d_um_dl;
typedef tdual_ffloat_2d_dl::t_host_const_um t_ffloat_2d_const_um_dl;
typedef tdual_ffloat_2d_dl::t_host_const_randomread t_ffloat_2d_randomread_dl;
//2d F_FLOAT array n*3
typedef Kokkos::DualView<F_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_f_array;
//typedef Kokkos::DualView<F_FLOAT*[3], LMPDeviceType::array_layout, LMPDeviceType> tdual_f_array;
typedef tdual_f_array::t_host t_f_array;
typedef tdual_f_array::t_host_const t_f_array_const;
typedef tdual_f_array::t_host_um t_f_array_um;
typedef tdual_f_array::t_host_const_um t_f_array_const_um;
typedef tdual_f_array::t_host_const_randomread t_f_array_randomread;
//2d F_FLOAT array n*6 (for virial)
typedef Kokkos::DualView<F_FLOAT*[6], Kokkos::LayoutRight, LMPDeviceType> tdual_virial_array;
typedef tdual_virial_array::t_host t_virial_array;
typedef tdual_virial_array::t_host_const t_virial_array_const;
typedef tdual_virial_array::t_host_um t_virial_array_um;
typedef tdual_virial_array::t_host_const_um t_virial_array_const_um;
typedef tdual_virial_array::t_host_const_randomread t_virial_array_randomread;
//Energy Types
//1d E_FLOAT array n
typedef Kokkos::DualView<E_FLOAT*, LMPDeviceType::array_layout, LMPDeviceType> tdual_efloat_1d;
typedef tdual_efloat_1d::t_host t_efloat_1d;
typedef tdual_efloat_1d::t_host_const t_efloat_1d_const;
typedef tdual_efloat_1d::t_host_um t_efloat_1d_um;
typedef tdual_efloat_1d::t_host_const_um t_efloat_1d_const_um;
typedef tdual_efloat_1d::t_host_const_randomread t_efloat_1d_randomread;
//2d E_FLOAT array n*m
typedef Kokkos::DualView<E_FLOAT**, Kokkos::LayoutRight, LMPDeviceType> tdual_efloat_2d;
typedef tdual_efloat_2d::t_host t_efloat_2d;
typedef tdual_efloat_2d::t_host_const t_efloat_2d_const;
typedef tdual_efloat_2d::t_host_um t_efloat_2d_um;
typedef tdual_efloat_2d::t_host_const_um t_efloat_2d_const_um;
typedef tdual_efloat_2d::t_host_const_randomread t_efloat_2d_randomread;
//2d E_FLOAT array n*3
typedef Kokkos::DualView<E_FLOAT*[3], Kokkos::LayoutRight, LMPDeviceType> tdual_e_array;
typedef tdual_e_array::t_host t_e_array;
typedef tdual_e_array::t_host_const t_e_array_const;
typedef tdual_e_array::t_host_um t_e_array_um;
typedef tdual_e_array::t_host_const_um t_e_array_const_um;
typedef tdual_e_array::t_host_const_randomread t_e_array_randomread;
//Neighbor Types
typedef Kokkos::DualView<int**, LMPDeviceType::array_layout, LMPDeviceType> tdual_neighbors_2d;
typedef tdual_neighbors_2d::t_host t_neighbors_2d;
typedef tdual_neighbors_2d::t_host_const t_neighbors_2d_const;
typedef tdual_neighbors_2d::t_host_um t_neighbors_2d_um;
typedef tdual_neighbors_2d::t_host_const_um t_neighbors_2d_const_um;
typedef tdual_neighbors_2d::t_host_const_randomread t_neighbors_2d_randomread;
//Kspace
typedef Kokkos::
DualView<FFT_SCALAR*, Kokkos::LayoutRight, LMPDeviceType> tdual_FFT_SCALAR_1d;
typedef tdual_FFT_SCALAR_1d::t_host t_FFT_SCALAR_1d;
typedef tdual_FFT_SCALAR_1d::t_host_um t_FFT_SCALAR_1d_um;
typedef Kokkos::DualView<FFT_SCALAR**,Kokkos::LayoutRight,LMPDeviceType> tdual_FFT_SCALAR_2d;
typedef tdual_FFT_SCALAR_2d::t_host t_FFT_SCALAR_2d;
typedef Kokkos::DualView<FFT_SCALAR**[3],Kokkos::LayoutRight,LMPDeviceType> tdual_FFT_SCALAR_2d_3;
typedef tdual_FFT_SCALAR_2d_3::t_host t_FFT_SCALAR_2d_3;
typedef Kokkos::DualView<FFT_SCALAR***,Kokkos::LayoutRight,LMPDeviceType> tdual_FFT_SCALAR_3d;
typedef tdual_FFT_SCALAR_3d::t_host t_FFT_SCALAR_3d;
typedef Kokkos::
DualView<FFT_SCALAR*[2], Kokkos::LayoutRight, LMPDeviceType> tdual_FFT_DATA_1d;
typedef tdual_FFT_DATA_1d::t_host t_FFT_DATA_1d;
typedef tdual_FFT_DATA_1d::t_host_um t_FFT_DATA_1d_um;
typedef Kokkos::
DualView<int*, LMPDeviceType::array_layout, LMPDeviceType> tdual_int_64;
typedef tdual_int_64::t_host t_int_64;
typedef tdual_int_64::t_host_um t_int_64_um;
};
#endif
//default LAMMPS Types
typedef struct ArrayTypes<LMPDeviceType> DAT;
typedef struct ArrayTypes<LMPHostType> HAT;
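// DAT and HAT are the shorthand the KOKKOS package uses to pick the device-
// or host-side typedefs. Sketch, assuming a DualView member k_x of type
// DAT::tdual_x_array:
//   DAT::t_x_array d_x = k_x.view<LMPDeviceType>();  // device view
//   HAT::t_x_array h_x = k_x.view<LMPHostType>();    // host view
//   k_x.sync<LMPDeviceType>();  // copy host -> device if the device copy is stale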
template<class DeviceType, class BufferView, class DualView>
void buffer_view(BufferView &buf, DualView &view,
const size_t n0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0,
const size_t n7 = 0) {
buf = BufferView(
view.template view<DeviceType>().ptr_on_device(),
n0,n1,n2,n3,n4,n5,n6,n7);
}
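// buffer_view() re-interprets the raw pointer of an existing DualView (on the
// requested DeviceType side) as an unmanaged view "buf" with the given
// extents; nothing is allocated or copied, so buf aliases the DualView's
// storage and must not outlive it. ptr_on_device() is this Kokkos version's
// raw-pointer accessor (renamed data() in later releases).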
template<class DeviceType>
struct MemsetZeroFunctor {
typedef DeviceType execution_space ;
void* ptr;
KOKKOS_INLINE_FUNCTION void operator()(const int i) const {
((int*)ptr)[i] = 0;
}
};
template<class ViewType>
void memset_kokkos (ViewType &view) {
static MemsetZeroFunctor<typename ViewType::execution_space> f;
f.ptr = view.ptr_on_device();
#ifndef KOKKOS_USING_DEPRECATED_VIEW
Kokkos::parallel_for(view.span()*sizeof(typename ViewType::value_type)/4, f);
#else
Kokkos::parallel_for(view.capacity()*sizeof(typename ViewType::value_type)/4, f);
#endif
ViewType::execution_space::fence();
}
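// memset_kokkos() zeroes a view's memory on its execution space by writing
// int-sized words in parallel (hence the byte count divided by 4), which
// assumes the element size is a multiple of 4 bytes; that holds for the int,
// float and double based types above. The trailing fence guarantees the fill
// has finished before the caller reuses the data.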
+struct params_lj_coul {
+ KOKKOS_INLINE_FUNCTION
+ params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
+ KOKKOS_INLINE_FUNCTION
+ params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
+ F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
+};
+
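// params_lj_coul bundles the per-type-pair coefficients shared by the
// Lennard-Jones/Coulomb Kokkos pair styles: squared LJ and Coulomb cutoffs,
// the precomputed LJ force (lj1, lj2) and energy (lj3, lj4) prefactors, and
// the energy shift at the cutoff (offset). Defining it once here lets the
// pair-style headers below drop their identical nested copies. Both
// constructors simply zero the fields; the int-argument overload appears to
// exist only so the type can also be value-initialized from an integer.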
#if defined(KOKKOS_HAVE_CXX11)
#undef ISFINITE
#define ISFINITE(x) std::isfinite(x)
#endif
#ifdef KOKKOS_HAVE_CUDA
#define LAMMPS_LAMBDA [=] __device__
#else
#define LAMMPS_LAMBDA [=]
#endif
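// LAMMPS_LAMBDA expands to a by-value capture; under CUDA it adds __device__
// so lambdas passed to Kokkos::parallel_for / parallel_reduce can execute in
// GPU kernels (this relies on nvcc's extended-lambda support). Sketch,
// assuming f is a device view visible in the enclosing scope:
//   Kokkos::parallel_for(n, LAMMPS_LAMBDA(const int i) { f(i,0) = 0.0; });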
#endif
diff --git a/src/KOKKOS/pair_lj_charmm_coul_charmm_implicit_kokkos.h b/src/KOKKOS/pair_lj_charmm_coul_charmm_implicit_kokkos.h
index 3c0b7d46a..048a7dab6 100644
--- a/src/KOKKOS/pair_lj_charmm_coul_charmm_implicit_kokkos.h
+++ b/src/KOKKOS/pair_lj_charmm_coul_charmm_implicit_kokkos.h
@@ -1,165 +1,160 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/charmm/coul/charmm/implicit/kk,PairLJCharmmCoulCharmmImplicitKokkos<LMPDeviceType>)
PairStyle(lj/charmm/coul/charmm/implicit/kk/device,PairLJCharmmCoulCharmmImplicitKokkos<LMPDeviceType>)
PairStyle(lj/charmm/coul/charmm/implicit/kk/host,PairLJCharmmCoulCharmmImplicitKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CHARMM_COUL_CHARMM_IMPLICIT_KOKKOS_H
#define LMP_PAIR_LJ_CHARMM_COUL_CHARMM_IMPLICIT_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_charmm_coul_charmm_implicit.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCharmmCoulCharmmImplicitKokkos : public PairLJCharmmCoulCharmmImplicit {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCharmmCoulCharmmImplicitKokkos(class LAMMPS *);
~PairLJCharmmCoulCharmmImplicitKokkos();
void compute(int, int);
void settings(int, char **);
void init_tables(double cut_coul, double *cut_respa);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j, const int& itype,
const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
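// Coefficient storage: k_params is the DualView of per-type-pair
// params_lj_coul entries and "params" is its unmanaged const device view.
// When the number of atom types fits within MAX_TYPES_STACKPARAMS, the same
// data is also copied into the fixed-size m_params / m_cut*sq arrays so the
// STACKPARAMS-templated kernels can read coefficients from the value-captured
// functor instead of global device memory.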
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
DAT::tdual_efloat_1d k_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_1d_randomread
d_rtable, d_drtable, d_ftable, d_dftable,
d_ctable, d_dctable, d_etable, d_detable;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,FULL,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALF,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALFTHREAD,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,FULL,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALF,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALFTHREAD,false,CoulLongTable<1> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmImplicitKokkos,FULL,CoulLongTable<1> >(PairLJCharmmCoulCharmmImplicitKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmImplicitKokkos,HALF,CoulLongTable<1> >(PairLJCharmmCoulCharmmImplicitKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmImplicitKokkos,HALFTHREAD,CoulLongTable<1> >(PairLJCharmmCoulCharmmImplicitKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCharmmCoulCharmmImplicitKokkos,CoulLongTable<1> >(PairLJCharmmCoulCharmmImplicitKokkos*,
NeighListKokkos<DeviceType>*);
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,FULL,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALF,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALFTHREAD,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,FULL,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALF,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmImplicitKokkos,HALFTHREAD,false,CoulLongTable<0> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmImplicitKokkos,FULL,CoulLongTable<0> >(PairLJCharmmCoulCharmmImplicitKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmImplicitKokkos,HALF,CoulLongTable<0> >(PairLJCharmmCoulCharmmImplicitKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmImplicitKokkos,HALFTHREAD,CoulLongTable<0> >(PairLJCharmmCoulCharmmImplicitKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCharmmCoulCharmmImplicitKokkos,CoulLongTable<0> >(PairLJCharmmCoulCharmmImplicitKokkos*,
NeighListKokkos<DeviceType>*);
friend void pair_virial_fdotr_compute<PairLJCharmmCoulCharmmImplicitKokkos>(PairLJCharmmCoulCharmmImplicitKokkos*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/charmm/coul/charmm/implicit/kk
Self-explanatory.
*/
diff --git a/src/KOKKOS/pair_lj_charmm_coul_charmm_kokkos.h b/src/KOKKOS/pair_lj_charmm_coul_charmm_kokkos.h
index 202cda68b..db0b14a84 100644
--- a/src/KOKKOS/pair_lj_charmm_coul_charmm_kokkos.h
+++ b/src/KOKKOS/pair_lj_charmm_coul_charmm_kokkos.h
@@ -1,165 +1,160 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/charmm/coul/charmm/kk,PairLJCharmmCoulCharmmKokkos<LMPDeviceType>)
PairStyle(lj/charmm/coul/charmm/kk/device,PairLJCharmmCoulCharmmKokkos<LMPDeviceType>)
PairStyle(lj/charmm/coul/charmm/kk/host,PairLJCharmmCoulCharmmKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CHARMM_COUL_CHARMM_KOKKOS_H
#define LMP_PAIR_LJ_CHARMM_COUL_CHARMM_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_charmm_coul_charmm.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCharmmCoulCharmmKokkos : public PairLJCharmmCoulCharmm {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCharmmCoulCharmmKokkos(class LAMMPS *);
~PairLJCharmmCoulCharmmKokkos();
void compute(int, int);
void settings(int, char **);
void init_tables(double cut_coul, double *cut_respa);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j, const int& itype,
const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
DAT::tdual_efloat_1d k_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_1d_randomread
d_rtable, d_drtable, d_ftable, d_dftable,
d_ctable, d_dctable, d_etable, d_detable;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,FULL,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALF,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALFTHREAD,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,FULL,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALF,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALFTHREAD,false,CoulLongTable<1> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmKokkos,FULL,CoulLongTable<1> >(PairLJCharmmCoulCharmmKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmKokkos,HALF,CoulLongTable<1> >(PairLJCharmmCoulCharmmKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmKokkos,HALFTHREAD,CoulLongTable<1> >(PairLJCharmmCoulCharmmKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCharmmCoulCharmmKokkos,CoulLongTable<1> >(PairLJCharmmCoulCharmmKokkos*,
NeighListKokkos<DeviceType>*);
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,FULL,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALF,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALFTHREAD,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,FULL,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALF,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulCharmmKokkos,HALFTHREAD,false,CoulLongTable<0> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmKokkos,FULL,CoulLongTable<0> >(PairLJCharmmCoulCharmmKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmKokkos,HALF,CoulLongTable<0> >(PairLJCharmmCoulCharmmKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulCharmmKokkos,HALFTHREAD,CoulLongTable<0> >(PairLJCharmmCoulCharmmKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCharmmCoulCharmmKokkos,CoulLongTable<0> >(PairLJCharmmCoulCharmmKokkos*,
NeighListKokkos<DeviceType>*);
friend void pair_virial_fdotr_compute<PairLJCharmmCoulCharmmKokkos>(PairLJCharmmCoulCharmmKokkos*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/charmm/coul/charmm/kk
Self-explanatory.
*/
diff --git a/src/KOKKOS/pair_lj_charmm_coul_long_kokkos.h b/src/KOKKOS/pair_lj_charmm_coul_long_kokkos.h
index fcdab7ddb..0969d11b0 100644
--- a/src/KOKKOS/pair_lj_charmm_coul_long_kokkos.h
+++ b/src/KOKKOS/pair_lj_charmm_coul_long_kokkos.h
@@ -1,158 +1,152 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/charmm/coul/long/kk,PairLJCharmmCoulLongKokkos<LMPDeviceType>)
PairStyle(lj/charmm/coul/long/kk/device,PairLJCharmmCoulLongKokkos<LMPDeviceType>)
PairStyle(lj/charmm/coul/long/kk/host,PairLJCharmmCoulLongKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CHARMM_COUL_LONG_KOKKOS_H
#define LMP_PAIR_LJ_CHARMM_COUL_LONG_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_charmm_coul_long.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCharmmCoulLongKokkos : public PairLJCharmmCoulLong {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCharmmCoulLongKokkos(class LAMMPS *);
~PairLJCharmmCoulLongKokkos();
void compute(int, int);
void init_tables(double cut_coul, double *cut_respa);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
-
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j, const int& itype,
const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
DAT::tdual_efloat_1d k_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_1d_randomread
d_rtable, d_drtable, d_ftable, d_dftable,
d_ctable, d_dctable, d_etable, d_detable;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,FULL,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALF,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALFTHREAD,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,FULL,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALF,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALFTHREAD,false,CoulLongTable<1> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulLongKokkos,FULL,CoulLongTable<1> >(PairLJCharmmCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulLongKokkos,HALF,CoulLongTable<1> >(PairLJCharmmCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulLongKokkos,HALFTHREAD,CoulLongTable<1> >(PairLJCharmmCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCharmmCoulLongKokkos,CoulLongTable<1> >(PairLJCharmmCoulLongKokkos*,
NeighListKokkos<DeviceType>*);
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,FULL,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALF,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALFTHREAD,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,FULL,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALF,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCharmmCoulLongKokkos,HALFTHREAD,false,CoulLongTable<0> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulLongKokkos,FULL,CoulLongTable<0> >(PairLJCharmmCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulLongKokkos,HALF,CoulLongTable<0> >(PairLJCharmmCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCharmmCoulLongKokkos,HALFTHREAD,CoulLongTable<0> >(PairLJCharmmCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCharmmCoulLongKokkos,CoulLongTable<0> >(PairLJCharmmCoulLongKokkos*,
NeighListKokkos<DeviceType>*);
friend void pair_virial_fdotr_compute<PairLJCharmmCoulLongKokkos>(PairLJCharmmCoulLongKokkos*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/charmm/coul/long/kk
Self-explanatory.
*/
diff --git a/src/KOKKOS/pair_lj_class2_coul_cut_kokkos.h b/src/KOKKOS/pair_lj_class2_coul_cut_kokkos.h
index 1ea5bc69b..c3492666d 100644
--- a/src/KOKKOS/pair_lj_class2_coul_cut_kokkos.h
+++ b/src/KOKKOS/pair_lj_class2_coul_cut_kokkos.h
@@ -1,144 +1,139 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/class2/coul/cut/kk,PairLJClass2CoulCutKokkos<LMPDeviceType>)
PairStyle(lj/class2/coul/cut/kk/device,PairLJClass2CoulCutKokkos<LMPDeviceType>)
PairStyle(lj/class2/coul/cut/kk/host,PairLJClass2CoulCutKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CLASS2_COUL_CUT_KOKKOS_H
#define LMP_PAIR_LJ_CLASS2_COUL_CUT_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_class2_coul_cut.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJClass2CoulCutKokkos : public PairLJClass2CoulCut {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJClass2CoulCutKokkos(class LAMMPS *);
~PairLJClass2CoulCutKokkos();
void compute(int, int);
void settings(int, char **);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJClass2CoulCutKokkos,FULL,true>;
friend class PairComputeFunctor<PairLJClass2CoulCutKokkos,HALF,true>;
friend class PairComputeFunctor<PairLJClass2CoulCutKokkos,HALFTHREAD,true>;
friend class PairComputeFunctor<PairLJClass2CoulCutKokkos,FULL,false>;
friend class PairComputeFunctor<PairLJClass2CoulCutKokkos,HALF,false>;
friend class PairComputeFunctor<PairLJClass2CoulCutKokkos,HALFTHREAD,false>;
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulCutKokkos,FULL,void>(PairLJClass2CoulCutKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulCutKokkos,HALF,void>(PairLJClass2CoulCutKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulCutKokkos,HALFTHREAD,void>(PairLJClass2CoulCutKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJClass2CoulCutKokkos,void>(PairLJClass2CoulCutKokkos*,
NeighListKokkos<DeviceType>*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/class2/coul/cut/kk
Self-explanatory.
*/
diff --git a/src/KOKKOS/pair_lj_class2_coul_long_kokkos.h b/src/KOKKOS/pair_lj_class2_coul_long_kokkos.h
index 0b1b2dc90..c5c46ed2d 100644
--- a/src/KOKKOS/pair_lj_class2_coul_long_kokkos.h
+++ b/src/KOKKOS/pair_lj_class2_coul_long_kokkos.h
@@ -1,161 +1,155 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/class2/coul/long/kk,PairLJClass2CoulLongKokkos<LMPDeviceType>)
PairStyle(lj/class2/coul/long/kk/device,PairLJClass2CoulLongKokkos<LMPDeviceType>)
PairStyle(lj/class2/coul/long/kk/host,PairLJClass2CoulLongKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CLASS2_COUL_LONG_KOKKOS_H
#define LMP_PAIR_LJ_CLASS2_COUL_LONG_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_class2_coul_long.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJClass2CoulLongKokkos : public PairLJClass2CoulLong {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJClass2CoulLongKokkos(class LAMMPS *);
~PairLJClass2CoulLongKokkos();
void compute(int, int);
void settings(int, char **);
void init_tables(double cut_coul, double *cut_respa);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
-
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j, const int& itype,
const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_1d_randomread
d_rtable, d_drtable, d_ftable, d_dftable,
d_ctable, d_dctable, d_etable, d_detable;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,FULL,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALF,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALFTHREAD,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,FULL,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALF,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALFTHREAD,false,CoulLongTable<1> >;
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulLongKokkos,FULL,CoulLongTable<1> >(PairLJClass2CoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulLongKokkos,HALF,CoulLongTable<1> >(PairLJClass2CoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulLongKokkos,HALFTHREAD,CoulLongTable<1> >(PairLJClass2CoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJClass2CoulLongKokkos,CoulLongTable<1> >(PairLJClass2CoulLongKokkos*,
NeighListKokkos<DeviceType>*);
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,FULL,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALF,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALFTHREAD,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,FULL,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALF,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJClass2CoulLongKokkos,HALFTHREAD,false,CoulLongTable<0> >;
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulLongKokkos,FULL,CoulLongTable<0> >(PairLJClass2CoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulLongKokkos,HALF,CoulLongTable<0> >(PairLJClass2CoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJClass2CoulLongKokkos,HALFTHREAD,CoulLongTable<0> >(PairLJClass2CoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJClass2CoulLongKokkos,CoulLongTable<0> >(PairLJClass2CoulLongKokkos*,
NeighListKokkos<DeviceType>*);
friend void pair_virial_fdotr_compute<PairLJClass2CoulLongKokkos>(PairLJClass2CoulLongKokkos*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/class2/coul/long/kk
Self-explanatory.
*/
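The class2, lj/cut/coul/cut, lj/cut/coul/debye, lj/cut/coul/dsf and lj/cut/coul/long headers in this patch all drop their local params_lj_coul definition; the hunks shown here do not reveal where (or whether) a shared definition is introduced, so the consolidation target is an assumption. The struct itself is only a plain coefficient record; the dummy constructors appear to exist so it can serve as the element type of a Kokkos view and be value-initialized in device code, which is why the lj/cut/coul/cut variant marked them KOKKOS_INLINE_FUNCTION. A minimal sketch of the shape being removed:

  struct params_lj_coul {
    // default and int constructors so Kokkos can value-initialize view elements
    KOKKOS_INLINE_FUNCTION
    params_lj_coul() {cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;}
    KOKKOS_INLINE_FUNCTION
    params_lj_coul(int) {cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;}
    F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
  };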
diff --git a/src/KOKKOS/pair_lj_cut_coul_cut_kokkos.h b/src/KOKKOS/pair_lj_cut_coul_cut_kokkos.h
index 36f31d176..5891371d1 100644
--- a/src/KOKKOS/pair_lj_cut_coul_cut_kokkos.h
+++ b/src/KOKKOS/pair_lj_cut_coul_cut_kokkos.h
@@ -1,147 +1,139 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/cut/coul/cut/kk,PairLJCutCoulCutKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/cut/kk/device,PairLJCutCoulCutKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/cut/kk/host,PairLJCutCoulCutKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CUT_COUL_CUT_KOKKOS_H
#define LMP_PAIR_LJ_CUT_COUL_CUT_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_cut_coul_cut.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCutCoulCutKokkos : public PairLJCutCoulCut {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCutCoulCutKokkos(class LAMMPS *);
~PairLJCutCoulCutKokkos();
void compute(int, int);
void settings(int, char **);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- KOKKOS_INLINE_FUNCTION
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- KOKKOS_INLINE_FUNCTION
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
-
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCutCoulCutKokkos,FULL,true>;
friend class PairComputeFunctor<PairLJCutCoulCutKokkos,HALF,true>;
friend class PairComputeFunctor<PairLJCutCoulCutKokkos,HALFTHREAD,true>;
friend class PairComputeFunctor<PairLJCutCoulCutKokkos,FULL,false>;
friend class PairComputeFunctor<PairLJCutCoulCutKokkos,HALF,false>;
friend class PairComputeFunctor<PairLJCutCoulCutKokkos,HALFTHREAD,false>;
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulCutKokkos,FULL,void>(PairLJCutCoulCutKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulCutKokkos,HALF,void>(PairLJCutCoulCutKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulCutKokkos,HALFTHREAD,void>(PairLJCutCoulCutKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCutCoulCutKokkos,void>(PairLJCutCoulCutKokkos*,
NeighListKokkos<DeviceType>*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/cut/coul/cut/kk
That style is not supported by Kokkos.
*/
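The m_params/m_cutsq/m_cut_ljsq/m_cut_coulsq members sized [MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1] ("hardwired to space for 12 atom types") duplicate the DualView data in fixed-size arrays. The apparent intent is that, when the number of atom types fits, the coefficients travel by value with the pair-style object copied into the compute functor instead of being fetched through the device view; the kernels choose between the two paths with the compile-time STACKPARAMS flag, as in the compute_fpair() bodies further below:

  // access pattern used by the kernels (STACKPARAMS is a bool template parameter)
  const F_FLOAT lj1 = STACKPARAMS ? m_params[itype][jtype].lj1
                                  : params(itype,jtype).lj1;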
diff --git a/src/KOKKOS/pair_lj_cut_coul_debye_kokkos.h b/src/KOKKOS/pair_lj_cut_coul_debye_kokkos.h
index 9e1e30aba..d507f76a3 100644
--- a/src/KOKKOS/pair_lj_cut_coul_debye_kokkos.h
+++ b/src/KOKKOS/pair_lj_cut_coul_debye_kokkos.h
@@ -1,145 +1,139 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/cut/coul/debye/kk,PairLJCutCoulDebyeKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/debye/kk/device,PairLJCutCoulDebyeKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/debye/kk/host,PairLJCutCoulDebyeKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CUT_COUL_DEBYE_KOKKOS_H
#define LMP_PAIR_LJ_CUT_COUL_DEBYE_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_cut_coul_debye.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCutCoulDebyeKokkos : public PairLJCutCoulDebye {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCutCoulDebyeKokkos(class LAMMPS *);
~PairLJCutCoulDebyeKokkos();
void compute(int, int);
void settings(int, char **);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
-
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCutCoulDebyeKokkos,FULL,true>;
friend class PairComputeFunctor<PairLJCutCoulDebyeKokkos,HALF,true>;
friend class PairComputeFunctor<PairLJCutCoulDebyeKokkos,HALFTHREAD,true>;
friend class PairComputeFunctor<PairLJCutCoulDebyeKokkos,FULL,false>;
friend class PairComputeFunctor<PairLJCutCoulDebyeKokkos,HALF,false>;
friend class PairComputeFunctor<PairLJCutCoulDebyeKokkos,HALFTHREAD,false>;
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulDebyeKokkos,FULL,void>(PairLJCutCoulDebyeKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulDebyeKokkos,HALF,void>(PairLJCutCoulDebyeKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulDebyeKokkos,HALFTHREAD,void>(PairLJCutCoulDebyeKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCutCoulDebyeKokkos,void>(PairLJCutCoulDebyeKokkos*,
NeighListKokkos<DeviceType>*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/cut/coul/debye/kk
Self-explanatory.
*/
diff --git a/src/KOKKOS/pair_lj_cut_coul_dsf_kokkos.h b/src/KOKKOS/pair_lj_cut_coul_dsf_kokkos.h
index b1f578ec0..3e378757c 100644
--- a/src/KOKKOS/pair_lj_cut_coul_dsf_kokkos.h
+++ b/src/KOKKOS/pair_lj_cut_coul_dsf_kokkos.h
@@ -1,138 +1,132 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/cut/coul/dsf/kk,PairLJCutCoulDSFKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/dsf/kk/device,PairLJCutCoulDSFKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/dsf/kk/host,PairLJCutCoulDSFKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CUT_COUL_DSF_KOKKOS_H
#define LMP_PAIR_LJ_CUT_COUL_DSF_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_cut_coul_dsf.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCutCoulDSFKokkos : public PairLJCutCoulDSF {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCutCoulDSFKokkos(class LAMMPS *);
~PairLJCutCoulDSFKokkos();
void compute(int, int);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
-
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCutCoulDSFKokkos,FULL,true>;
friend class PairComputeFunctor<PairLJCutCoulDSFKokkos,HALF,true>;
friend class PairComputeFunctor<PairLJCutCoulDSFKokkos,HALFTHREAD,true>;
friend class PairComputeFunctor<PairLJCutCoulDSFKokkos,FULL,false>;
friend class PairComputeFunctor<PairLJCutCoulDSFKokkos,HALF,false>;
friend class PairComputeFunctor<PairLJCutCoulDSFKokkos,HALFTHREAD,false>;
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulDSFKokkos,FULL,void>(PairLJCutCoulDSFKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulDSFKokkos,HALF,void>(PairLJCutCoulDSFKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulDSFKokkos,HALFTHREAD,void>(PairLJCutCoulDSFKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCutCoulDSFKokkos,void>(PairLJCutCoulDSFKokkos*,
NeighListKokkos<DeviceType>*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/cut/coul/dsf/kk
That style is not supported by Kokkos.
*/
diff --git a/src/KOKKOS/pair_lj_cut_coul_long_kokkos.h b/src/KOKKOS/pair_lj_cut_coul_long_kokkos.h
index 5bdaaf96c..732832923 100644
--- a/src/KOKKOS/pair_lj_cut_coul_long_kokkos.h
+++ b/src/KOKKOS/pair_lj_cut_coul_long_kokkos.h
@@ -1,164 +1,158 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/cut/coul/long/kk,PairLJCutCoulLongKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/long/kk/device,PairLJCutCoulLongKokkos<LMPDeviceType>)
PairStyle(lj/cut/coul/long/kk/host,PairLJCutCoulLongKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_CUT_COUL_LONG_KOKKOS_H
#define LMP_PAIR_LJ_CUT_COUL_LONG_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_cut_coul_long.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJCutCoulLongKokkos : public PairLJCutCoulLong {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJCutCoulLongKokkos(class LAMMPS *);
~PairLJCutCoulLongKokkos();
void compute(int, int);
void settings(int, char **);
void init_tables(double cut_coul, double *cut_respa);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;};
- F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset;
- };
-
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j, const int& itype,
const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
typename Kokkos::DualView<params_lj_coul**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
DAT::tdual_efloat_1d k_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_1d_randomread
d_rtable, d_drtable, d_ftable, d_dftable,
d_ctable, d_dctable, d_etable, d_detable;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,FULL,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALF,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALFTHREAD,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,FULL,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALF,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALFTHREAD,false,CoulLongTable<1> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulLongKokkos,FULL,CoulLongTable<1> >(PairLJCutCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulLongKokkos,HALF,CoulLongTable<1> >(PairLJCutCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulLongKokkos,HALFTHREAD,CoulLongTable<1> >(PairLJCutCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCutCoulLongKokkos,CoulLongTable<1> >(PairLJCutCoulLongKokkos*,
NeighListKokkos<DeviceType>*);
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,FULL,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALF,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALFTHREAD,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,FULL,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALF,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJCutCoulLongKokkos,HALFTHREAD,false,CoulLongTable<0> >;
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulLongKokkos,FULL,CoulLongTable<0> >(PairLJCutCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulLongKokkos,HALF,CoulLongTable<0> >(PairLJCutCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJCutCoulLongKokkos,HALFTHREAD,CoulLongTable<0> >(PairLJCutCoulLongKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJCutCoulLongKokkos,CoulLongTable<0> >(PairLJCutCoulLongKokkos*,
NeighListKokkos<DeviceType>*);
friend void pair_virial_fdotr_compute<PairLJCutCoulLongKokkos>(PairLJCutCoulLongKokkos*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/cut/coul/long/kk
That style is not supported by Kokkos.
*/
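The doubled set of friend declarations in this header (CoulLongTable<1> and CoulLongTable<0>) covers the tabulated and the analytic evaluation of the long-range Coulomb term; which one runs is a run-time choice driven by ncoultablebits, and the d_rtable ... d_detable views hold the tables filled in init_tables(). The GROMACS variant's compute() below does this dispatch explicitly; the lj/cut/coul/long .cpp is not part of this excerpt but presumably follows the same shape:

  // Style is a placeholder for the concrete Kokkos pair class
  EV_FLOAT ev;
  if (ncoultablebits)
    ev = pair_compute<Style,CoulLongTable<1> >(this,(NeighListKokkos<DeviceType>*)list);
  else
    ev = pair_compute<Style,CoulLongTable<0> >(this,(NeighListKokkos<DeviceType>*)list);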
diff --git a/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.cpp b/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.cpp
index 499a82667..b636f3649 100644
--- a/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.cpp
+++ b/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.cpp
@@ -1,500 +1,500 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pair_lj_gromacs_coul_gromacs_kokkos.h"
#include "kokkos.h"
#include "atom_kokkos.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
#include "atom_masks.h"
using namespace LAMMPS_NS;
using namespace MathConst;
#define KOKKOS_CUDA_MAX_THREADS 256
#define KOKKOS_CUDA_MIN_BLOCKS 8
/* ---------------------------------------------------------------------- */
template<class DeviceType>
PairLJGromacsCoulGromacsKokkos<DeviceType>::PairLJGromacsCoulGromacsKokkos(LAMMPS *lmp):PairLJGromacsCoulGromacs(lmp)
{
respa_enable = 0;
atomKK = (AtomKokkos *) atom;
execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
datamask_read = X_MASK | F_MASK | TYPE_MASK | Q_MASK | ENERGY_MASK | VIRIAL_MASK;
datamask_modify = F_MASK | ENERGY_MASK | VIRIAL_MASK;
cutsq = NULL;
cut_ljsq = 0.0;
cut_coulsq = 0.0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
PairLJGromacsCoulGromacsKokkos<DeviceType>::~PairLJGromacsCoulGromacsKokkos()
{
if (!copymode) {
memory->destroy_kokkos(k_eatom,eatom);
memory->destroy_kokkos(k_vatom,vatom);
k_cutsq = DAT::tdual_ffloat_2d();
k_cut_ljsq = DAT::tdual_ffloat_2d();
k_cut_coulsq = DAT::tdual_ffloat_2d();
memory->sfree(cutsq);
//memory->sfree(cut_ljsq);
//memory->sfree(cut_coulsq);
eatom = NULL;
vatom = NULL;
cutsq = NULL;
cut_ljsq = 0.0;
cut_coulsq = 0.0;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairLJGromacsCoulGromacsKokkos<DeviceType>::cleanup_copy() {
// WHY needed: prevents the parent-class copy from deallocating any arrays
allocated = 0;
cutsq = NULL;
cut_ljsq = 0.0;
eatom = NULL;
vatom = NULL;
ftable = NULL;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairLJGromacsCoulGromacsKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
{
eflag = eflag_in;
vflag = vflag_in;
if (neighflag == FULL) no_virial_fdotr_compute = 1;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
atomKK->sync(execution_space,datamask_read);
k_cutsq.template sync<DeviceType>();
k_cut_ljsq.template sync<DeviceType>();
k_cut_coulsq.template sync<DeviceType>();
k_params.template sync<DeviceType>();
if (eflag || vflag) atomKK->modified(execution_space,datamask_modify);
else atomKK->modified(execution_space,F_MASK);
x = atomKK->k_x.view<DeviceType>();
c_x = atomKK->k_x.view<DeviceType>();
f = atomKK->k_f.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
nlocal = atom->nlocal;
nall = atom->nlocal + atom->nghost;
special_lj[0] = force->special_lj[0];
special_lj[1] = force->special_lj[1];
special_lj[2] = force->special_lj[2];
special_lj[3] = force->special_lj[3];
special_coul[0] = force->special_coul[0];
special_coul[1] = force->special_coul[1];
special_coul[2] = force->special_coul[2];
special_coul[3] = force->special_coul[3];
qqrd2e = force->qqrd2e;
newton_pair = force->newton_pair;
// loop over neighbors of my atoms
copymode = 1;
EV_FLOAT ev;
if(ncoultablebits)
ev = pair_compute<PairLJGromacsCoulGromacsKokkos<DeviceType>,CoulLongTable<1> >
(this,(NeighListKokkos<DeviceType>*)list);
else
ev = pair_compute<PairLJGromacsCoulGromacsKokkos<DeviceType>,CoulLongTable<0> >
(this,(NeighListKokkos<DeviceType>*)list);
if (eflag) {
eng_vdwl += ev.evdwl;
eng_coul += ev.ecoul;
}
if (vflag_global) {
virial[0] += ev.v[0];
virial[1] += ev.v[1];
virial[2] += ev.v[2];
virial[3] += ev.v[3];
virial[4] += ev.v[4];
virial[5] += ev.v[5];
}
if (vflag_fdotr) pair_virial_fdotr_compute(this);
copymode = 0;
}
/* ----------------------------------------------------------------------
compute LJ GROMACS pair force between atoms i and j
---------------------------------------------------------------------- */
template<class DeviceType>
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT PairLJGromacsCoulGromacsKokkos<DeviceType>::
compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const {
const F_FLOAT r2inv = 1.0/rsq;
const F_FLOAT r6inv = r2inv*r2inv*r2inv;
F_FLOAT forcelj = r6inv *
((STACKPARAMS?m_params[itype][jtype].lj1:params(itype,jtype).lj1)*r6inv -
(STACKPARAMS?m_params[itype][jtype].lj2:params(itype,jtype).lj2));
if (rsq > cut_lj_innersq) {
const F_FLOAT r = sqrt(rsq);
const F_FLOAT tlj = r - cut_lj_inner;
const F_FLOAT fswitch = r*tlj*tlj*
((STACKPARAMS?m_params[itype][jtype].ljsw1:params(itype,jtype).ljsw1) +
(STACKPARAMS?m_params[itype][jtype].ljsw2:params(itype,jtype).ljsw2)*tlj);
forcelj += fswitch;
}
return forcelj*r2inv;
}
/* ----------------------------------------------------------------------
compute LJ GROMACS pair potential energy between atoms i and j
---------------------------------------------------------------------- */
template<class DeviceType>
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT PairLJGromacsCoulGromacsKokkos<DeviceType>::
compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const {
const F_FLOAT r2inv = 1.0/rsq;
const F_FLOAT r6inv = r2inv*r2inv*r2inv;
F_FLOAT englj = r6inv *
((STACKPARAMS?m_params[itype][jtype].lj3:params(itype,jtype).lj3)*r6inv -
(STACKPARAMS?m_params[itype][jtype].lj4:params(itype,jtype).lj4));
englj += (STACKPARAMS?m_params[itype][jtype].ljsw5:params(itype,jtype).ljsw5);
if (rsq > cut_lj_innersq) {
const F_FLOAT r = sqrt(rsq);
const F_FLOAT tlj = r - cut_lj_inner;
const F_FLOAT eswitch = tlj*tlj*tlj *
((STACKPARAMS?m_params[itype][jtype].ljsw3:params(itype,jtype).ljsw3) +
(STACKPARAMS?m_params[itype][jtype].ljsw4:params(itype,jtype).ljsw4)*tlj);
englj += eswitch;
}
return englj;
}
/* ----------------------------------------------------------------------
compute coulomb pair force between atoms i and j
---------------------------------------------------------------------- */
template<class DeviceType>
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT PairLJGromacsCoulGromacsKokkos<DeviceType>::
compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const {
const F_FLOAT r2inv = 1.0/rsq;
const F_FLOAT rinv = sqrt(r2inv);
F_FLOAT forcecoul = qqrd2e*qtmp*q(j) *rinv;
if (rsq > cut_coul_innersq) {
const F_FLOAT r = 1.0/rinv;
const F_FLOAT tc = r - cut_coul_inner;
const F_FLOAT fcoulswitch = qqrd2e * qtmp*q(j)*r*tc*tc*(coulsw1 + coulsw2*tc);
forcecoul += fcoulswitch;
}
return forcecoul * r2inv * factor_coul;
}
/* ----------------------------------------------------------------------
compute coulomb pair potential energy between atoms i and j
---------------------------------------------------------------------- */
template<class DeviceType>
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT PairLJGromacsCoulGromacsKokkos<DeviceType>::
compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const {
const F_FLOAT r2inv = 1.0/rsq;
const F_FLOAT rinv = sqrt(r2inv);
F_FLOAT ecoul = qqrd2e * qtmp * q(j) * (rinv-coulsw5);
if (rsq > cut_coul_innersq) {
const F_FLOAT r = 1.0/rinv;
const F_FLOAT tc = r - cut_coul_inner;
const F_FLOAT ecoulswitch = tc*tc*tc * (coulsw3 + coulsw4*tc);
ecoul += qqrd2e*qtmp*q(j)*ecoulswitch;
}
return ecoul * factor_coul;
}
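/* ----------------------------------------------------------------------
   switching used by the four kernels above, restated: inside the inner
   cutoff the plain LJ / 1/r forms are returned unchanged; beyond it, with
   t = r - r_inner, the kernels add
     fswitch = r * t^2 * (sw1 + sw2*t)   to the force accumulator and
     eswitch = t^3 * (sw3 + sw4*t)       to the energy
   (the Coulomb versions carry the usual qqrd2e*qi*qj prefactor), while
   sw5 shifts the unswitched energy (ljsw5 added for LJ, coulsw5 subtracted
   from 1/r for Coulomb).  ljsw1..ljsw5 act between cut_lj_inner and the LJ
   cutoff, coulsw1..coulsw5 between cut_coul_inner and the Coulomb cutoff.
------------------------------------------------------------------------- */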
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
template<class DeviceType>
void PairLJGromacsCoulGromacsKokkos<DeviceType>::allocate()
{
PairLJGromacsCoulGromacs::allocate();
int n = atom->ntypes;
memory->destroy(cutsq);
memory->create_kokkos(k_cutsq,cutsq,n+1,n+1,"pair:cutsq");
d_cutsq = k_cutsq.template view<DeviceType>();
//memory->destroy(cut_ljsq);
memory->create_kokkos(k_cut_ljsq,n+1,n+1,"pair:cut_ljsq");
d_cut_ljsq = k_cut_ljsq.template view<DeviceType>();
memory->create_kokkos(k_cut_coulsq,n+1,n+1,"pair:cut_coulsq");
d_cut_coulsq = k_cut_coulsq.template view<DeviceType>();
- k_params = Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType>("PairLJGromacsCoulGromacs::params",n+1,n+1);
+ k_params = Kokkos::DualView<params_lj_coul_gromacs**,Kokkos::LayoutRight,DeviceType>("PairLJGromacsCoulGromacs::params",n+1,n+1);
params = k_params.d_view;
}
template<class DeviceType>
void PairLJGromacsCoulGromacsKokkos<DeviceType>::init_tables(double cut_coul, double *cut_respa)
{
Pair::init_tables(cut_coul,cut_respa);
typedef typename ArrayTypes<DeviceType>::t_ffloat_1d table_type;
typedef typename ArrayTypes<LMPHostType>::t_ffloat_1d host_table_type;
int ntable = 1;
for (int i = 0; i < ncoultablebits; i++) ntable *= 2;
// Copy rtable and drtable
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
for(int i = 0; i < ntable; i++) {
h_table(i) = rtable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_rtable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
for(int i = 0; i < ntable; i++) {
h_table(i) = drtable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_drtable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
// Copy ftable and dftable
for(int i = 0; i < ntable; i++) {
h_table(i) = ftable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_ftable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
for(int i = 0; i < ntable; i++) {
h_table(i) = dftable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_dftable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
// Copy ctable and dctable
for(int i = 0; i < ntable; i++) {
h_table(i) = ctable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_ctable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
for(int i = 0; i < ntable; i++) {
h_table(i) = dctable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_dctable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
// Copy etable and detable
for(int i = 0; i < ntable; i++) {
h_table(i) = etable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_etable = d_table;
}
{
host_table_type h_table("HostTable",ntable);
table_type d_table("DeviceTable",ntable);
for(int i = 0; i < ntable; i++) {
h_table(i) = detable[i];
}
Kokkos::deep_copy(d_table,h_table);
d_detable = d_table;
}
}
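/* ----------------------------------------------------------------------
   NOTE: the eight copy blocks above all follow the same pattern: build a
   host mirror, fill it from the Pair-provided table, deep_copy it to the
   device, and keep the device view.  copy_table() below is an illustrative
   sketch of that pattern only -- it is not part of LAMMPS:

     template<class DeviceType>
     static typename ArrayTypes<DeviceType>::t_ffloat_1d
     copy_table(const double *src, int ntable, const char *name)
     {
       typename ArrayTypes<LMPHostType>::t_ffloat_1d h_table(name,ntable);
       typename ArrayTypes<DeviceType>::t_ffloat_1d d_table(name,ntable);
       for (int i = 0; i < ntable; i++) h_table(i) = src[i];
       Kokkos::deep_copy(d_table,h_table);
       return d_table;
     }

   used as, e.g., d_rtable = copy_table<DeviceType>(rtable,ntable,"DeviceTable");
------------------------------------------------------------------------- */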
/* ----------------------------------------------------------------------
global settings
------------------------------------------------------------------------- */
template<class DeviceType>
void PairLJGromacsCoulGromacsKokkos<DeviceType>::settings(int narg, char **arg)
{
if (narg > 2) error->all(FLERR,"Illegal pair_style command");
PairLJGromacsCoulGromacs::settings(narg,arg);
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
template<class DeviceType>
void PairLJGromacsCoulGromacsKokkos<DeviceType>::init_style()
{
PairLJGromacsCoulGromacs::init_style();
// error if rRESPA with inner levels
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
int respa = 0;
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
if (respa)
error->all(FLERR,"Cannot use Kokkos pair style with rRESPA inner/middle");
}
// irequest = neigh request made by parent class
neighflag = lmp->kokkos->neighflag;
int irequest = neighbor->nrequest - 1;
neighbor->requests[irequest]->
kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
!Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
neighbor->requests[irequest]->
kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
if (neighflag == FULL) {
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->half = 0;
} else if (neighflag == HALF || neighflag == HALFTHREAD) {
neighbor->requests[irequest]->full = 0;
neighbor->requests[irequest]->half = 1;
} else {
error->all(FLERR,"Cannot use chosen neighbor list style with lj/gromacs/coul/gromacs/kk");
}
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
template<class DeviceType>
double PairLJGromacsCoulGromacsKokkos<DeviceType>::init_one(int i, int j)
{
double cutone = PairLJGromacsCoulGromacs::init_one(i,j);
double cut_ljsqm = cut_ljsq;
double cut_coulsqm = cut_coulsq;
k_params.h_view(i,j).lj1 = lj1[i][j];
k_params.h_view(i,j).lj2 = lj2[i][j];
k_params.h_view(i,j).lj3 = lj3[i][j];
k_params.h_view(i,j).lj4 = lj4[i][j];
k_params.h_view(i,j).ljsw1 = ljsw1[i][j];
k_params.h_view(i,j).ljsw2 = ljsw2[i][j];
k_params.h_view(i,j).ljsw3 = ljsw3[i][j];
k_params.h_view(i,j).ljsw4 = ljsw4[i][j];
k_params.h_view(i,j).ljsw5 = ljsw5[i][j];
k_params.h_view(i,j).cut_ljsq = cut_ljsqm;
k_params.h_view(i,j).cut_coulsq = cut_coulsqm;
k_params.h_view(j,i) = k_params.h_view(i,j);
if(i<MAX_TYPES_STACKPARAMS+1 && j<MAX_TYPES_STACKPARAMS+1) {
m_params[i][j] = m_params[j][i] = k_params.h_view(i,j);
m_cutsq[j][i] = m_cutsq[i][j] = cutone*cutone;
m_cut_ljsq[j][i] = m_cut_ljsq[i][j] = cut_ljsqm;
m_cut_coulsq[j][i] = m_cut_coulsq[i][j] = cut_coulsqm;
}
k_cutsq.h_view(i,j) = k_cutsq.h_view(j,i) = cutone*cutone;
k_cutsq.template modify<LMPHostType>();
k_cut_ljsq.h_view(i,j) = k_cut_ljsq.h_view(j,i) = cut_ljsqm;
k_cut_ljsq.template modify<LMPHostType>();
k_cut_coulsq.h_view(i,j) = k_cut_coulsq.h_view(j,i) = cut_coulsqm;
k_cut_coulsq.template modify<LMPHostType>();
k_params.template modify<LMPHostType>();
return cutone;
}
namespace LAMMPS_NS {
template class PairLJGromacsCoulGromacsKokkos<LMPDeviceType>;
#ifdef KOKKOS_HAVE_CUDA
template class PairLJGromacsCoulGromacsKokkos<LMPHostType>;
#endif
}
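The coefficient flow in the file above is the standard Kokkos DualView round trip: allocate() creates k_params/k_cutsq/k_cut_ljsq/k_cut_coulsq, init_one() fills the host views and marks them modified on the host, and compute() syncs them to the device before launching the kernels. A generic sketch of that round trip (not LAMMPS code; n, i, j, value and the LMPHostType/DeviceType aliases are assumed to be in scope as in these files):

  Kokkos::DualView<double**,Kokkos::LayoutRight,DeviceType> k_c("coeffs",n+1,n+1);
  k_c.h_view(i,j) = value;                     // fill on the host   (cf. init_one)
  k_c.template modify<LMPHostType>();          // host copy is now the newest
  k_c.template sync<DeviceType>();             // copy to the device (cf. compute)
  auto d_c = k_c.template view<DeviceType>();  // device view read by the kernels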
diff --git a/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.h b/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.h
index 8b10eb71a..bbf5c50a6 100644
--- a/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.h
+++ b/src/KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.h
@@ -1,165 +1,167 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/gromacs/coul/gromacs/kk,PairLJGromacsCoulGromacsKokkos<LMPDeviceType>)
PairStyle(lj/gromacs/coul/gromacs/kk/device,PairLJGromacsCoulGromacsKokkos<LMPDeviceType>)
PairStyle(lj/gromacs/coul/gromacs/kk/host,PairLJGromacsCoulGromacsKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_LJ_GROMACS_COUL_GROMACS_KOKKOS_H
#define LMP_PAIR_LJ_GROMACS_COUL_GROMACS_KOKKOS_H
#include "pair_kokkos.h"
#include "pair_lj_gromacs_coul_gromacs.h"
#include "neigh_list_kokkos.h"
namespace LAMMPS_NS {
template<class DeviceType>
class PairLJGromacsCoulGromacsKokkos : public PairLJGromacsCoulGromacs {
public:
enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
PairLJGromacsCoulGromacsKokkos(class LAMMPS *);
~PairLJGromacsCoulGromacsKokkos();
void compute(int, int);
void settings(int, char **);
void init_tables(double cut_coul, double *cut_respa);
void init_style();
double init_one(int, int);
- struct params_lj_coul{
- params_lj_coul(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;ljsw1=0;ljsw2=0;ljsw3=0;ljsw4=0;ljsw5=0;};
- params_lj_coul(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;ljsw1=0;ljsw2=0;ljsw3=0;ljsw4=0;ljsw5=0;};
+ struct params_lj_coul_gromacs{
+ KOKKOS_INLINE_FUNCTION
+ params_lj_coul_gromacs(){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;ljsw1=0;ljsw2=0;ljsw3=0;ljsw4=0;ljsw5=0;};
+ KOKKOS_INLINE_FUNCTION
+ params_lj_coul_gromacs(int i){cut_ljsq=0;cut_coulsq=0;lj1=0;lj2=0;lj3=0;lj4=0;offset=0;ljsw1=0;ljsw2=0;ljsw3=0;ljsw4=0;ljsw5=0;};
F_FLOAT cut_ljsq,cut_coulsq,lj1,lj2,lj3,lj4,offset,ljsw1,ljsw2,ljsw3,ljsw4,ljsw5;
};
protected:
void cleanup_copy();
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_fcoul(const F_FLOAT& rsq, const int& i, const int&j, const int& itype,
const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype) const;
template<bool STACKPARAMS, class Specialisation>
KOKKOS_INLINE_FUNCTION
F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
const int& itype, const int& jtype, const F_FLOAT& factor_coul, const F_FLOAT& qtmp) const;
- Kokkos::DualView<params_lj_coul**,Kokkos::LayoutRight,DeviceType> k_params;
- typename Kokkos::DualView<params_lj_coul**,
+ Kokkos::DualView<params_lj_coul_gromacs**,Kokkos::LayoutRight,DeviceType> k_params;
+ typename Kokkos::DualView<params_lj_coul_gromacs**,
Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
// hardwired to space for 12 atom types
- params_lj_coul m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
+ params_lj_coul_gromacs m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_ljsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
F_FLOAT m_cut_coulsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
typename ArrayTypes<DeviceType>::t_x_array_randomread x;
typename ArrayTypes<DeviceType>::t_x_array c_x;
typename ArrayTypes<DeviceType>::t_f_array f;
typename ArrayTypes<DeviceType>::t_int_1d_randomread type;
typename ArrayTypes<DeviceType>::t_float_1d_randomread q;
DAT::tdual_efloat_1d k_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
int newton_pair;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_ljsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_ljsq;
typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_cut_coulsq;
typename ArrayTypes<DeviceType>::t_ffloat_1d_randomread
d_rtable, d_drtable, d_ftable, d_dftable,
d_ctable, d_dctable, d_etable, d_detable;
int neighflag;
int nlocal,nall,eflag,vflag;
double special_coul[4];
double special_lj[4];
double qqrd2e;
void allocate();
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,FULL,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALF,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALFTHREAD,true,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,FULL,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALF,false,CoulLongTable<1> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALFTHREAD,false,CoulLongTable<1> >;
friend EV_FLOAT pair_compute_neighlist<PairLJGromacsCoulGromacsKokkos,FULL,CoulLongTable<1> >(PairLJGromacsCoulGromacsKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJGromacsCoulGromacsKokkos,HALF,CoulLongTable<1> >(PairLJGromacsCoulGromacsKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJGromacsCoulGromacsKokkos,HALFTHREAD,CoulLongTable<1> >(PairLJGromacsCoulGromacsKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJGromacsCoulGromacsKokkos,CoulLongTable<1> >(PairLJGromacsCoulGromacsKokkos*,
NeighListKokkos<DeviceType>*);
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,FULL,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALF,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALFTHREAD,true,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,FULL,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALF,false,CoulLongTable<0> >;
friend class PairComputeFunctor<PairLJGromacsCoulGromacsKokkos,HALFTHREAD,false,CoulLongTable<0> >;
friend EV_FLOAT pair_compute_neighlist<PairLJGromacsCoulGromacsKokkos,FULL,CoulLongTable<0> >(PairLJGromacsCoulGromacsKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJGromacsCoulGromacsKokkos,HALF,CoulLongTable<0> >(PairLJGromacsCoulGromacsKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute_neighlist<PairLJGromacsCoulGromacsKokkos,HALFTHREAD,CoulLongTable<0> >(PairLJGromacsCoulGromacsKokkos*,NeighListKokkos<DeviceType>*);
friend EV_FLOAT pair_compute<PairLJGromacsCoulGromacsKokkos,CoulLongTable<0> >(PairLJGromacsCoulGromacsKokkos*,
NeighListKokkos<DeviceType>*);
friend void pair_virial_fdotr_compute<PairLJGromacsCoulGromacsKokkos>(PairLJGromacsCoulGromacsKokkos*);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use Kokkos pair style with rRESPA inner/middle
Self-explanatory.
E: Cannot use chosen neighbor list style with lj/gromacs/coul/gromacs/kk
Self-explanatory.
*/
diff --git a/src/KSPACE/pair_buck_long_coul_long.cpp b/src/KSPACE/pair_buck_long_coul_long.cpp
index 6504af57d..eb311fe3d 100644
--- a/src/KSPACE/pair_buck_long_coul_long.cpp
+++ b/src/KSPACE/pair_buck_long_coul_long.cpp
@@ -1,1052 +1,1056 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Pieter J. in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "math_vector.h"
#include "pair_buck_long_coul_long.h"
#include "atom.h"
#include "comm.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define EWALD_F 1.12837917
#define EWALD_P 0.3275911
#define A1 0.254829592
#define A2 -0.284496736
#define A3 1.421413741
#define A4 -1.453152027
#define A5 1.061405429
/* ---------------------------------------------------------------------- */
PairBuckLongCoulLong::PairBuckLongCoulLong(LAMMPS *lmp) : Pair(lmp)
{
dispersionflag = ewaldflag = pppmflag = 1;
respa_enable = 1;
writedata = 1;
+ ftable = NULL;
+ fdisptable = NULL;
}
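// The two NULL initializations added above presumably pair with the destructor
// below, which frees the force tables only conditionally:
//   if (ftable) free_tables();
//   if (fdisptable) free_disp_tables();
// so a pair style that never builds its tables can be destroyed safely.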
/* ----------------------------------------------------------------------
global settings
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::options(char **arg, int order)
{
const char *option[] = {"long", "cut", "off", NULL};
int i;
if (!*arg) error->all(FLERR,"Illegal pair_style buck/long/coul/long command");
for (i=0; option[i]&&strcmp(arg[0], option[i]); ++i);
switch (i) {
default: error->all(FLERR,"Illegal pair_style buck/long/coul/long command");
case 0: ewald_order |= 1<<order; break;
case 2: ewald_off |= 1<<order;
case 1: break;
}
}
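/* ----------------------------------------------------------------------
   worked example of the flag arithmetic in options() above: settings()
   below calls options(arg,6) for the dispersion term and options(++arg,1)
   for the Coulomb term, so the argument pair "long long" yields
     ewald_order = (1<<6) | (1<<1) = 0x42,
   exactly the mask tested later by (ewald_order & 0x42) == 0x42;
   "cut" leaves the bit unset, and "off" records it in ewald_off instead.
------------------------------------------------------------------------- */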
/* ---------------------------------------------------------------------- */
void PairBuckLongCoulLong::settings(int narg, char **arg)
{
if (narg != 3 && narg != 4) error->all(FLERR,"Illegal pair_style command");
ewald_order = 0;
ewald_off = 0;
options(arg,6);
options(++arg,1);
if (!comm->me && ewald_order == ((1<<1) | (1<<6)))
error->warning(FLERR,"Using largest cutoff for buck/long/coul/long");
if (!*(++arg))
error->all(FLERR,"Cutoffs missing in pair_style buck/long/coul/long");
if (ewald_off & (1<<6))
error->all(FLERR,"LJ6 off not supported in pair_style buck/long/coul/long");
if (!((ewald_order^ewald_off) & (1<<1)))
error->all(FLERR,
"Coulomb cut not supported in pair_style buck/long/coul/coul");
cut_buck_global = force->numeric(FLERR,*(arg++));
if (narg == 4 && ((ewald_order & 0x42) == 0x42))
error->all(FLERR,"Only one cutoff allowed when requesting all long");
if (narg == 4) cut_coul = force->numeric(FLERR,*arg);
else cut_coul = cut_buck_global;
if (allocated) {
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i+1; j <= atom->ntypes; j++)
if (setflag[i][j]) cut_buck[i][j] = cut_buck_global;
}
}
/* ----------------------------------------------------------------------
free all arrays
------------------------------------------------------------------------- */
PairBuckLongCoulLong::~PairBuckLongCoulLong()
{
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cut_buck_read);
memory->destroy(cut_buck);
memory->destroy(cut_bucksq);
memory->destroy(buck_a_read);
memory->destroy(buck_a);
memory->destroy(buck_c_read);
memory->destroy(buck_c);
memory->destroy(buck_rho_read);
memory->destroy(buck_rho);
memory->destroy(buck1);
memory->destroy(buck2);
memory->destroy(rhoinv);
memory->destroy(offset);
}
if (ftable) free_tables();
if (fdisptable) free_disp_tables();
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(cut_buck_read,n+1,n+1,"pair:cut_buck_read");
memory->create(cut_buck,n+1,n+1,"pair:cut_buck");
memory->create(cut_bucksq,n+1,n+1,"pair:cut_bucksq");
memory->create(buck_a_read,n+1,n+1,"pair:buck_a_read");
memory->create(buck_a,n+1,n+1,"pair:buck_a");
memory->create(buck_c_read,n+1,n+1,"pair:buck_c_read");
memory->create(buck_c,n+1,n+1,"pair:buck_c");
memory->create(buck_rho_read,n+1,n+1,"pair:buck_rho_read");
memory->create(buck_rho,n+1,n+1,"pair:buck_rho");
memory->create(buck1,n+1,n+1,"pair:buck1");
memory->create(buck2,n+1,n+1,"pair:buck2");
memory->create(rhoinv,n+1,n+1,"pair:rhoinv");
memory->create(offset,n+1,n+1,"pair:offset");
}
/* ----------------------------------------------------------------------
extract protected data from object
------------------------------------------------------------------------- */
void *PairBuckLongCoulLong::extract(const char *id, int &dim)
{
const char *ids[] = {
"B", "ewald_order", "ewald_cut", "ewald_mix", "cut_coul", "cut_LJ", NULL};
void *ptrs[] = {
buck_c, &ewald_order, &cut_coul, &mix_flag, &cut_coul, &cut_buck_global,
NULL};
int i;
for (i=0; ids[i]&&strcmp(ids[i], id); ++i);
if (i == 0) dim = 2;
else dim = 0;
return ptrs[i];
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::coeff(int narg, char **arg)
{
if (narg < 5 || narg > 6)
error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,*(arg++),atom->ntypes,ilo,ihi);
force->bounds(FLERR,*(arg++),atom->ntypes,jlo,jhi);
double buck_a_one = force->numeric(FLERR,*(arg++));
double buck_rho_one = force->numeric(FLERR,*(arg++));
double buck_c_one = force->numeric(FLERR,*(arg++));
double cut_buck_one = cut_buck_global;
if (narg == 6) cut_buck_one = force->numeric(FLERR,*(arg++));
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
buck_a_read[i][j] = buck_a_one;
buck_c_read[i][j] = buck_c_one;
buck_rho_read[i][j] = buck_rho_one;
cut_buck_read[i][j] = cut_buck_one;
setflag[i][j] = 1;
count++;
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::init_style()
{
// require an atom style with charge defined
if (!atom->q_flag && (ewald_order&(1<<1)))
- error->all(FLERR,"Pair style buck/long/coul/long requires atom attribute q");
+ error->all(FLERR,
+ "Invoking coulombic in pair style buck/long/coul/long requires atom attribute q");
+
+ // ensure use of KSpace long-range solver, set two g_ewalds
+
+ if (force->kspace == NULL)
+ error->all(FLERR,"Pair style requires a KSpace style");
+ if (ewald_order&(1<<1)) g_ewald = force->kspace->g_ewald;
+ if (ewald_order&(1<<6)) g_ewald_6 = force->kspace->g_ewald_6;
+
+ // set rRESPA cutoffs
+
+ if (strstr(update->integrate_style,"respa") &&
+ ((Respa *) update->integrate)->level_inner >= 0)
+ cut_respa = ((Respa *) update->integrate)->cutoff;
+ else cut_respa = NULL;
+
+ // setup force tables
+
+ if (ncoultablebits && (ewald_order&(1<<1))) init_tables(cut_coul,cut_respa);
+ if (ndisptablebits && (ewald_order&(1<<6))) init_tables_disp(cut_buck_global);
// request regular or rRESPA neighbor lists if neighrequest_flag != 0
if (force->kspace->neighrequest_flag) {
int irequest;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
int respa = 0;
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
if (respa == 0) irequest = neighbor->request(this,instance_me);
else if (respa == 1) {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
} else {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 2;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respamiddle = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
}
} else irequest = neighbor->request(this,instance_me);
}
cut_coulsq = cut_coul * cut_coul;
-
- // set rRESPA cutoffs
-
- if (strstr(update->integrate_style,"respa") &&
- ((Respa *) update->integrate)->level_inner >= 0)
- cut_respa = ((Respa *) update->integrate)->cutoff;
- else cut_respa = NULL;
-
- // ensure use of KSpace long-range solver, set two g_ewalds
-
- if (force->kspace == NULL)
- error->all(FLERR,"Pair style requires a KSpace style");
- if (ewald_order&(1<<1)) g_ewald = force->kspace->g_ewald;
- if (ewald_order&(1<<6)) g_ewald_6 = force->kspace->g_ewald_6;
- // setup force tables
-
- if (ncoultablebits && (ewald_order&(1<<1))) init_tables(cut_coul,cut_respa);
- if (ndisptablebits && (ewald_order&(1<<6))) init_tables_disp(cut_buck_global);
}
/* ----------------------------------------------------------------------
neighbor callback to inform pair style of neighbor list to use
regular or rRESPA
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::init_list(int id, NeighList *ptr)
{
if (id == 0) list = ptr;
else if (id == 1) listinner = ptr;
else if (id == 2) listmiddle = ptr;
else if (id == 3) listouter = ptr;
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairBuckLongCoulLong::init_one(int i, int j)
{
if (setflag[i][j] == 0) error->all(FLERR,"All pair coeffs are not set");
if (ewald_order&(1<<6)) cut_buck[i][j] = cut_buck_global;
else cut_buck[i][j] = cut_buck_read[i][j];
buck_a[i][j] = buck_a_read[i][j];
buck_c[i][j] = buck_c_read[i][j];
buck_rho[i][j] = buck_rho_read[i][j];
double cut = MAX(cut_buck[i][j],cut_coul);
cutsq[i][j] = cut*cut;
cut_bucksq[i][j] = cut_buck[i][j] * cut_buck[i][j];
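// precomputed prefactors: buck1 = A/rho and buck2 = 6*C enter the force, rhoinv = 1/rho the exponential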
buck1[i][j] = buck_a[i][j]/buck_rho[i][j];
buck2[i][j] = 6.0*buck_c[i][j];
rhoinv[i][j] = 1.0/buck_rho[i][j];
// check interior rRESPA cutoff
if (cut_respa && MIN(cut_buck[i][j],cut_coul) < cut_respa[3])
error->all(FLERR,"Pair cutoff < Respa interior cutoff");
if (offset_flag) {
double rexp = exp(-cut_buck[i][j]/buck_rho[i][j]);
offset[i][j] = buck_a[i][j]*rexp - buck_c[i][j]/pow(cut_buck[i][j],6.0);
} else offset[i][j] = 0.0;
cutsq[j][i] = cutsq[i][j];
cut_bucksq[j][i] = cut_bucksq[i][j];
buck_a[j][i] = buck_a[i][j];
buck_c[j][i] = buck_c[i][j];
rhoinv[j][i] = rhoinv[i][j];
buck1[j][i] = buck1[i][j];
buck2[j][i] = buck2[i][j];
offset[j][i] = offset[i][j];
return cut;
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::write_restart(FILE *fp)
{
write_restart_settings(fp);
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
fwrite(&setflag[i][j],sizeof(int),1,fp);
if (setflag[i][j]) {
fwrite(&buck_a_read[i][j],sizeof(double),1,fp);
fwrite(&buck_rho_read[i][j],sizeof(double),1,fp);
fwrite(&buck_c_read[i][j],sizeof(double),1,fp);
fwrite(&cut_buck_read[i][j],sizeof(double),1,fp);
}
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::read_restart(FILE *fp)
{
read_restart_settings(fp);
allocate();
int i,j;
int me = comm->me;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
if (me == 0) fread(&setflag[i][j],sizeof(int),1,fp);
MPI_Bcast(&setflag[i][j],1,MPI_INT,0,world);
if (setflag[i][j]) {
if (me == 0) {
fread(&buck_a_read[i][j],sizeof(double),1,fp);
fread(&buck_rho_read[i][j],sizeof(double),1,fp);
fread(&buck_c_read[i][j],sizeof(double),1,fp);
fread(&cut_buck_read[i][j],sizeof(double),1,fp);
}
MPI_Bcast(&buck_a_read[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&buck_rho_read[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&buck_c_read[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_buck_read[i][j],1,MPI_DOUBLE,0,world);
}
}
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::write_restart_settings(FILE *fp)
{
fwrite(&cut_buck_global,sizeof(double),1,fp);
fwrite(&cut_coul,sizeof(double),1,fp);
fwrite(&offset_flag,sizeof(int),1,fp);
fwrite(&mix_flag,sizeof(int),1,fp);
fwrite(&ncoultablebits,sizeof(int),1,fp);
fwrite(&tabinner,sizeof(double),1,fp);
fwrite(&ewald_order,sizeof(int),1,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::read_restart_settings(FILE *fp)
{
if (comm->me == 0) {
fread(&cut_buck_global,sizeof(double),1,fp);
fread(&cut_coul,sizeof(double),1,fp);
fread(&offset_flag,sizeof(int),1,fp);
fread(&mix_flag,sizeof(int),1,fp);
fread(&ncoultablebits,sizeof(int),1,fp);
fread(&tabinner,sizeof(double),1,fp);
fread(&ewald_order,sizeof(int),1,fp);
}
MPI_Bcast(&cut_buck_global,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_coul,1,MPI_DOUBLE,0,world);
MPI_Bcast(&offset_flag,1,MPI_INT,0,world);
MPI_Bcast(&mix_flag,1,MPI_INT,0,world);
MPI_Bcast(&ncoultablebits,1,MPI_INT,0,world);
MPI_Bcast(&tabinner,1,MPI_DOUBLE,0,world);
MPI_Bcast(&ewald_order,1,MPI_INT,0,world);
}
/* ----------------------------------------------------------------------
proc 0 writes to data file
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::write_data(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
fprintf(fp,"%d %g %g %g\n",i,
buck_a_read[i][i],buck_rho_read[i][i],buck_c_read[i][i]);
}
/* ----------------------------------------------------------------------
proc 0 writes all pairs to data file
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::write_data_all(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
fprintf(fp,"%d %d %g %g %g\n",i,j,
buck_a_read[i][j],buck_rho_read[i][j],buck_c_read[i][j]);
}
/* ----------------------------------------------------------------------
compute pair interactions
------------------------------------------------------------------------- */
void PairBuckLongCoulLong::compute(int eflag, int vflag)
{
double evdwl,ecoul,fpair;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x, *x0 = x[0];
double **f = atom->f, *f0 = f[0], *fi = f0;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
int i, j, order1 = ewald_order&(1<<1), order6 = ewald_order&(1<<6);
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni;
double qi = 0.0, qri = 0.0, *cutsqi, *cut_bucksqi,
*buck1i, *buck2i, *buckai, *buckci, *rhoinvi, *offseti;
double r, rsq, r2inv, force_coul, force_buck;
double g2 = g_ewald_6*g_ewald_6, g6 = g2*g2*g2, g8 = g6*g2;
vector xi, d;
ineighn = (ineigh = list->ilist)+list->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = (qi = q[i])*qqrd2e; // initialize constants
offseti = offset[typei = type[i]];
buck1i = buck1[typei]; buck2i = buck2[typei];
buckai = buck_a[typei]; buckci = buck_c[typei]; rhoinvi = rhoinv[typei];
cutsqi = cutsq[typei]; cut_bucksqi = cut_bucksq[typei];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
jneighn = (jneigh = list->firstneigh[i])+list->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cutsqi[typej = type[j]]) continue;
r2inv = 1.0/rsq;
r = sqrt(rsq);
if (order1 && (rsq < cut_coulsq)) { // coulombic
if (!ncoultablebits || rsq <= tabinnersq) { // series real space
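// erfc(g_ewald*r) is evaluated with the A1..A5/EWALD_P polynomial approximation;
// after the update t holds the real-space coulomb energy and EWALD_F*s the gaussian force term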
register double x = g_ewald*r;
register double s = qri*q[j], t = 1.0/(1.0+EWALD_P*x);
if (ni == 0) {
s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s;
if (eflag) ecoul = t;
}
else { // special case
register double f = s*(1.0-special_coul[ni])/r;
s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-f;
if (eflag) ecoul = t-f;
}
} // table real space
else {
register union_int_float_t t;
t.f = rsq;
register const int k = (t.i & ncoulmask) >> ncoulshiftbits;
register double f = (rsq-rtable[k])*drtable[k], qiqj = qi*q[j];
if (ni == 0) {
force_coul = qiqj*(ftable[k]+f*dftable[k]);
if (eflag) ecoul = qiqj*(etable[k]+f*detable[k]);
}
else { // special case
t.f = (1.0-special_coul[ni])*(ctable[k]+f*dctable[k]);
force_coul = qiqj*(ftable[k]+f*dftable[k]-t.f);
if (eflag) ecoul = qiqj*(etable[k]+f*detable[k]-t.f);
}
}
}
else force_coul = ecoul = 0.0;
if (rsq < cut_bucksqi[typej]) { // buckingham
register double rn = r2inv*r2inv*r2inv,
expr = exp(-r*rhoinvi[typej]);
if (order6) { // long-range
if (!ndisptablebits || rsq <= tabinnerdispsq) {
register double x2 = g2*rsq, a2 = 1.0/x2;
x2 = a2*exp(-x2)*buckci[typej];
if (ni == 0) {
force_buck =
r*expr*buck1i[typej]-g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq;
if (eflag) evdwl = expr*buckai[typej]-g6*((a2+1.0)*a2+0.5)*x2;
}
else { // special case
register double f = special_lj[ni], t = rn*(1.0-f);
force_buck = f*r*expr*buck1i[typej]-
g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq+t*buck2i[typej];
if (eflag) evdwl = f*expr*buckai[typej] -
g6*((a2+1.0)*a2+0.5)*x2+t*buckci[typej];
}
}
else { // table real space
register union_int_float_t disp_t;
disp_t.f = rsq;
register const int disp_k = (disp_t.i & ndispmask)>>ndispshiftbits;
register double f_disp = (rsq-rdisptable[disp_k])*drdisptable[disp_k];
if (ni == 0) {
force_buck = r*expr*buck1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*buckci[typej];
if (eflag) evdwl = expr*buckai[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*buckci[typej];
}
else { // special case
register double f = special_lj[ni], t = rn*(1.0-f);
force_buck = f*r*expr*buck1i[typej] -(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*buckci[typej] +t*buck2i[typej];
if (eflag) evdwl = f*expr*buckai[typej] -(edisptable[disp_k]+f_disp*dedisptable[disp_k])*buckci[typej]+t*buckci[typej];
}
}
}
else { // cut
if (ni == 0) {
force_buck = r*expr*buck1i[typej]-rn*buck2i[typej];
if (eflag) evdwl = expr*buckai[typej] -
rn*buckci[typej]-offseti[typej];
}
else { // special case
register double f = special_lj[ni];
force_buck = f*(r*expr*buck1i[typej]-rn*buck2i[typej]);
if (eflag)
evdwl = f*(expr*buckai[typej]-rn*buckci[typej]-offseti[typej]);
}
}
}
else force_buck = evdwl = 0.0;
fpair = (force_coul+force_buck)*r2inv;
if (newton_pair || j < nlocal) {
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,d[0],d[1],d[2]);
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ---------------------------------------------------------------------- */
void PairBuckLongCoulLong::compute_inner()
{
double r, rsq, r2inv, force_coul = 0.0, force_buck, fpair;
int *type = atom->type;
int nlocal = atom->nlocal;
double *x0 = atom->x[0], *f0 = atom->f[0], *fi = f0, *q = atom->q;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
double cut_out_on = cut_respa[0];
double cut_out_off = cut_respa[1];
double cut_out_diff = cut_out_off - cut_out_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
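// innermost rRESPA level: pairs beyond cut_respa[1] are skipped and the force
// is switched smoothly to zero between cut_respa[0] and cut_respa[1]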
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni;
int i, j, order1 = (ewald_order|(ewald_off^-1))&(1<<1);
double qri, *cut_bucksqi, *buck1i, *buck2i, *rhoinvi;
vector xi, d;
ineighn = (ineigh = listinner->ilist) + listinner->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = qqrd2e*q[i];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
cut_bucksqi = cut_bucksq[typei = type[i]];
buck1i = buck1[typei]; buck2i = buck2[typei]; rhoinvi = rhoinv[typei];
jneighn = (jneigh = listinner->firstneigh[i])+listinner->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cut_out_off_sq) continue;
r2inv = 1.0/rsq;
r = sqrt(rsq);
if (order1 && (rsq < cut_coulsq)) // coulombic
force_coul = ni == 0 ?
qri*q[j]/r : qri*q[j]/r*special_coul[ni];
if (rsq < cut_bucksqi[typej = type[j]]) { // buckingham
register double rn = r2inv*r2inv*r2inv,
expr = exp(-r*rhoinvi[typej]);
force_buck = ni == 0 ?
(r*expr*buck1i[typej]-rn*buck2i[typej]) :
(r*expr*buck1i[typej]-rn*buck2i[typej])*special_lj[ni];
}
else force_buck = 0.0;
fpair = (force_coul + force_buck) * r2inv;
if (rsq > cut_out_on_sq) { // switching
register double rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw-3.0);
}
if (newton_pair || j < nlocal) { // force update
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairBuckLongCoulLong::compute_middle()
{
double r, rsq, r2inv, force_coul = 0.0, force_buck, fpair;
int *type = atom->type;
int nlocal = atom->nlocal;
double *x0 = atom->x[0], *f0 = atom->f[0], *fi = f0, *q = atom->q;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
double cut_in_off = cut_respa[0];
double cut_in_on = cut_respa[1];
double cut_out_on = cut_respa[2];
double cut_out_off = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_out_diff = cut_out_off - cut_out_on;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
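// middle rRESPA level: only pairs between cut_respa[0] and cut_respa[3] contribute,
// switched on between cut_respa[0..1] and off between cut_respa[2..3]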
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni;
int i, j, order1 = (ewald_order|(ewald_off^-1))&(1<<1);
double qri, *cut_bucksqi, *buck1i, *buck2i, *rhoinvi;
vector xi, d;
ineighn = (ineigh = listmiddle->ilist)+listmiddle->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = qqrd2e*q[i];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
cut_bucksqi = cut_bucksq[typei = type[i]];
buck1i = buck1[typei]; buck2i = buck2[typei]; rhoinvi = rhoinv[typei];
jneighn = (jneigh = listmiddle->firstneigh[i])+listmiddle->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cut_out_off_sq) continue;
if (rsq <= cut_in_off_sq) continue;
r2inv = 1.0/rsq;
r = sqrt(rsq);
if (order1 && (rsq < cut_coulsq)) // coulombic
force_coul = ni == 0 ?
qri*q[j]/r : qri*q[j]/r*special_coul[ni];
if (rsq < cut_bucksqi[typej = type[j]]) { // buckingham
register double rn = r2inv*r2inv*r2inv,
expr = exp(-r*rhoinvi[typej]);
force_buck = ni == 0 ?
(r*expr*buck1i[typej]-rn*buck2i[typej]) :
(r*expr*buck1i[typej]-rn*buck2i[typej])*special_lj[ni];
}
else force_buck = 0.0;
fpair = (force_coul + force_buck) * r2inv;
if (rsq < cut_in_on_sq) { // switching
register double rsw = (sqrt(rsq) - cut_in_off)/cut_in_diff;
fpair *= rsw*rsw*(3.0 - 2.0*rsw);
}
if (rsq > cut_out_on_sq) {
register double rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw-3.0);
}
if (newton_pair || j < nlocal) { // force update
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairBuckLongCoulLong::compute_outer(int eflag, int vflag)
{
double evdwl,ecoul,fpair,fvirial;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = 0;
double **x = atom->x, *x0 = x[0];
double **f = atom->f, *f0 = f[0], *fi = f0;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
int i, j, order1 = ewald_order&(1<<1), order6 = ewald_order&(1<<6);
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni, respa_flag;
double qi = 0.0, qri = 0.0, *cutsqi, *cut_bucksqi,
*buck1i, *buck2i, *buckai, *buckci, *rhoinvi, *offseti;
double r, rsq, r2inv, force_coul, force_buck;
double g2 = g_ewald_6*g_ewald_6, g6 = g2*g2*g2, g8 = g6*g2;
double respa_buck = 0.0, respa_coul = 0.0, frespa = 0.0;
vector xi, d;
double cut_in_off = cut_respa[2];
double cut_in_on = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
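// outermost rRESPA level: full forces are computed and the short-range part
// already handled by the inner/middle levels (respa_coul, respa_buck) is
// subtracted for pairs inside cut_respa[3]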
ineighn = (ineigh = listouter->ilist)+listouter->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = (qi = q[i])*qqrd2e; // initialize constants
offseti = offset[typei = type[i]];
buck1i = buck1[typei]; buck2i = buck2[typei];
buckai = buck_a[typei]; buckci = buck_c[typei]; rhoinvi = rhoinv[typei];
cutsqi = cutsq[typei]; cut_bucksqi = cut_bucksq[typei];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
jneighn = (jneigh = listouter->firstneigh[i])+listouter->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cutsqi[typej = type[j]]) continue;
r2inv = 1.0/rsq;
r = sqrt(rsq);
frespa = 1.0; // check whether and how to compute respa corrections
respa_coul = 0.0;
respa_buck = 0.0;
respa_flag = rsq < cut_in_on_sq ? 1 : 0;
if (respa_flag && (rsq > cut_in_off_sq)) {
register double rsw = (r-cut_in_off)/cut_in_diff;
frespa = 1-rsw*rsw*(3.0-2.0*rsw);
}
if (order1 && (rsq < cut_coulsq)) { // coulombic
if (!ncoultablebits || rsq <= tabinnersq) { // series real space
register double s = qri*q[j];
if (respa_flag) // correct for respa
respa_coul = ni == 0 ? frespa*s/r : frespa*s/r*special_coul[ni];
register double x = g_ewald*r, t = 1.0/(1.0+EWALD_P*x);
if (ni == 0) {
s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-respa_coul;
if (eflag) ecoul = t;
}
else { // correct for special
register double ri = s*(1.0-special_coul[ni])/r; s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-ri-respa_coul;
if (eflag) ecoul = t-ri;
}
} // table real space
else {
if (respa_flag) {
register double s = qri*q[j];
respa_coul = ni == 0 ? frespa*s/r : frespa*s/r*special_coul[ni];
}
register union_int_float_t t;
t.f = rsq;
register const int k = (t.i & ncoulmask) >> ncoulshiftbits;
register double f = (rsq-rtable[k])*drtable[k], qiqj = qi*q[j];
if (ni == 0) {
force_coul = qiqj*(ftable[k]+f*dftable[k]);
if (eflag) ecoul = qiqj*(etable[k]+f*detable[k]);
}
else { // correct for special
t.f = (1.0-special_coul[ni])*(ctable[k]+f*dctable[k]);
force_coul = qiqj*(ftable[k]+f*dftable[k]-t.f);
if (eflag) {
t.f = (1.0-special_coul[ni])*(ptable[k]+f*dptable[k]);
ecoul = qiqj*(etable[k]+f*detable[k]-t.f);
}
}
}
}
else force_coul = respa_coul = ecoul = 0.0;
if (rsq < cut_bucksqi[typej]) { // buckingham
register double rn = r2inv*r2inv*r2inv,
expr = exp(-r*rhoinvi[typej]);
if (respa_flag) respa_buck = ni == 0 ? // correct for respa
frespa*(r*expr*buck1i[typej]-rn*buck2i[typej]) :
frespa*(r*expr*buck1i[typej]-rn*buck2i[typej])*special_lj[ni];
if (order6) { // long-range form
if (!ndisptablebits || rsq <= tabinnerdispsq) {
register double x2 = g2*rsq, a2 = 1.0/x2;
x2 = a2*exp(-x2)*buckci[typej];
if (ni == 0) {
force_buck =
r*expr*buck1i[typej]-g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq-respa_buck;
if (eflag) evdwl = expr*buckai[typej]-g6*((a2+1.0)*a2+0.5)*x2;
}
else { // correct for special
register double f = special_lj[ni], t = rn*(1.0-f);
force_buck = f*r*expr*buck1i[typej]-
g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq+t*buck2i[typej]-respa_buck;
if (eflag) evdwl = f*expr*buckai[typej] -
g6*((a2+1.0)*a2+0.5)*x2+t*buckci[typej];
}
}
else { // table real space
register union_int_float_t disp_t;
disp_t.f = rsq;
register const int disp_k = (disp_t.i & ndispmask)>>ndispshiftbits;
register double f_disp = (rsq-rdisptable[disp_k])*drdisptable[disp_k];
register double rn = r2inv*r2inv*r2inv;
if (ni == 0) {
force_buck = r*expr*buck1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*buckci[typej]-respa_buck;
if (eflag) evdwl = expr*buckai[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*buckci[typej];
}
else { // special case
register double f = special_lj[ni], t = rn*(1.0-f);
force_buck = f*r*expr*buck1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*buckci[typej]+t*buck2i[typej]-respa_buck;
if (eflag) evdwl = f*expr*buckai[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*buckci[typej]+t*buckci[typej];
}
}
}
else { // cut form
if (ni == 0) {
force_buck = r*expr*buck1i[typej]-rn*buck2i[typej]-respa_buck;
if (eflag)
evdwl = expr*buckai[typej]-rn*buckci[typej]-offseti[typej];
}
else { // correct for special
register double f = special_lj[ni];
force_buck = f*(r*expr*buck1i[typej]-rn*buck2i[typej])-respa_buck;
if (eflag)
evdwl = f*(expr*buckai[typej]-rn*buckci[typej]-offseti[typej]);
}
}
}
else force_buck = respa_buck = evdwl = 0.0;
fpair = (force_coul+force_buck)*r2inv;
if (newton_pair || j < nlocal) {
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
if (evflag) {
fvirial = (force_coul + force_buck + respa_coul + respa_buck)*r2inv;
ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fvirial,d[0],d[1],d[2]);
}
}
}
}
/* ---------------------------------------------------------------------- */
double PairBuckLongCoulLong::single(int i, int j, int itype, int jtype,
double rsq, double factor_coul, double factor_buck,
double &fforce)
{
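// single-pair evaluation (used e.g. by compute pair/local and pair_write):
// returns the pair energy and stores the force magnitude divided by r in fforce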
double f, r, r2inv, r6inv, force_coul, force_buck;
double g2 = g_ewald_6*g_ewald_6, g6 = g2*g2*g2, g8 = g6*g2, *q = atom->q;
r = sqrt(rsq);
r2inv = 1.0/rsq;
double eng = 0.0;
if ((ewald_order&2) && (rsq < cut_coulsq)) { // coulombic
if (!ncoultablebits || rsq <= tabinnersq) { // series real space
register double x = g_ewald*r;
register double s = force->qqrd2e*q[i]*q[j], t = 1.0/(1.0+EWALD_P*x);
f = s*(1.0-factor_coul)/r; s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-f;
eng += t-f;
}
else { // table real space
register union_int_float_t t;
t.f = rsq;
register const int k = (t.i & ncoulmask) >> ncoulshiftbits;
register double f = (rsq-rtable[k])*drtable[k], qiqj = q[i]*q[j];
t.f = (1.0-factor_coul)*(ctable[k]+f*dctable[k]);
force_coul = qiqj*(ftable[k]+f*dftable[k]-t.f);
eng += qiqj*(etable[k]+f*detable[k]-t.f);
}
} else force_coul = 0.0;
if (rsq < cut_bucksq[itype][jtype]) { // buckingham
register double expr = factor_buck*exp(-sqrt(rsq)*rhoinv[itype][jtype]);
r6inv = r2inv*r2inv*r2inv;
if (ewald_order&64) { // long-range
register double x2 = g2*rsq, a2 = 1.0/x2, t = r6inv*(1.0-factor_buck);
x2 = a2*exp(-x2)*buck_c[itype][jtype];
force_buck = buck1[itype][jtype]*r*expr-
g8*(((6.0*a2+6.0)*a2+3.0)*a2+a2)*x2*rsq+t*buck2[itype][jtype];
eng += buck_a[itype][jtype]*expr-
g6*((a2+1.0)*a2+0.5)*x2+t*buck_c[itype][jtype];
}
else { // cut
force_buck =
buck1[itype][jtype]*r*expr-factor_buck*buck_c[itype][jtype]*r6inv;
eng += buck_a[itype][jtype]*expr-
factor_buck*(buck_c[itype][jtype]*r6inv-offset[itype][jtype]);
}
} else force_buck = 0.0;
fforce = (force_coul+force_buck)*r2inv;
return eng;
}
diff --git a/src/KSPACE/pair_lj_long_coul_long.cpp b/src/KSPACE/pair_lj_long_coul_long.cpp
index 5ae607a2f..90d517ca2 100644
--- a/src/KSPACE/pair_lj_long_coul_long.cpp
+++ b/src/KSPACE/pair_lj_long_coul_long.cpp
@@ -1,1042 +1,1044 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Pieter J. in 't Veld (SNL)
Tabulation for long-range dispersion added by Wayne Mitchell (Loyola
University New Orleans)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "math_vector.h"
#include "pair_lj_long_coul_long.h"
#include "atom.h"
#include "comm.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define EWALD_F 1.12837917
#define EWALD_P 0.3275911
#define A1 0.254829592
#define A2 -0.284496736
#define A3 1.421413741
#define A4 -1.453152027
#define A5 1.061405429
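// EWALD_F = 2/sqrt(pi); EWALD_P and A1..A5 are the coefficients of the erfc()
// polynomial approximation (Abramowitz & Stegun, eq. 7.1.26)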
/* ---------------------------------------------------------------------- */
PairLJLongCoulLong::PairLJLongCoulLong(LAMMPS *lmp) : Pair(lmp)
{
dispersionflag = ewaldflag = pppmflag = 1;
respa_enable = 1;
writedata = 1;
ftable = NULL;
fdisptable = NULL;
qdist = 0.0;
}
/* ----------------------------------------------------------------------
global settings
------------------------------------------------------------------------- */
void PairLJLongCoulLong::options(char **arg, int order)
{
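// map the keyword given for one interaction ("long", "cut", or "off") onto
// bit <order> of ewald_order / ewald_off; "cut" sets neither bit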
const char *option[] = {"long", "cut", "off", NULL};
int i;
if (!*arg) error->all(FLERR,"Illegal pair_style lj/long/coul/long command");
for (i=0; option[i]&&strcmp(arg[0], option[i]); ++i);
switch (i) {
default: error->all(FLERR,"Illegal pair_style lj/long/coul/long command");
case 0: ewald_order |= 1<<order; break;
case 2: ewald_off |= 1<<order;
case 1: break;
}
}
void PairLJLongCoulLong::settings(int narg, char **arg)
{
if (narg != 3 && narg != 4) error->all(FLERR,"Illegal pair_style command");
- ewald_off = 0;
ewald_order = 0;
- options(arg, 6);
- options(++arg, 1);
+ ewald_off = 0;
+
+ options(arg,6);
+ options(++arg,1);
+
if (!comm->me && ewald_order == ((1<<1) | (1<<6)))
error->warning(FLERR,"Using largest cutoff for lj/long/coul/long");
if (!*(++arg))
error->all(FLERR,"Cutoffs missing in pair_style lj/long/coul/long");
if (!((ewald_order^ewald_off) & (1<<1)))
error->all(FLERR,
"Coulomb cut not supported in pair_style lj/long/coul/long");
cut_lj_global = force->numeric(FLERR,*(arg++));
if (narg == 4 && ((ewald_order & 0x42) == 0x42))
error->all(FLERR,"Only one cutoff allowed when requesting all long");
if (narg == 4) cut_coul = force->numeric(FLERR,*arg);
else cut_coul = cut_lj_global;
if (allocated) {
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i+1; j <= atom->ntypes; j++)
if (setflag[i][j]) cut_lj[i][j] = cut_lj_global;
}
}
/* ----------------------------------------------------------------------
free all arrays
------------------------------------------------------------------------- */
PairLJLongCoulLong::~PairLJLongCoulLong()
{
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cut_lj_read);
memory->destroy(cut_lj);
memory->destroy(cut_ljsq);
memory->destroy(epsilon_read);
memory->destroy(epsilon);
memory->destroy(sigma_read);
memory->destroy(sigma);
memory->destroy(lj1);
memory->destroy(lj2);
memory->destroy(lj3);
memory->destroy(lj4);
memory->destroy(offset);
}
if (ftable) free_tables();
if (fdisptable) free_disp_tables();
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairLJLongCoulLong::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(cut_lj_read,n+1,n+1,"pair:cut_lj_read");
memory->create(cut_lj,n+1,n+1,"pair:cut_lj");
memory->create(cut_ljsq,n+1,n+1,"pair:cut_ljsq");
memory->create(epsilon_read,n+1,n+1,"pair:epsilon_read");
memory->create(epsilon,n+1,n+1,"pair:epsilon");
memory->create(sigma_read,n+1,n+1,"pair:sigma_read");
memory->create(sigma,n+1,n+1,"pair:sigma");
memory->create(lj1,n+1,n+1,"pair:lj1");
memory->create(lj2,n+1,n+1,"pair:lj2");
memory->create(lj3,n+1,n+1,"pair:lj3");
memory->create(lj4,n+1,n+1,"pair:lj4");
memory->create(offset,n+1,n+1,"pair:offset");
}
/* ----------------------------------------------------------------------
extract protected data from object
------------------------------------------------------------------------- */
void *PairLJLongCoulLong::extract(const char *id, int &dim)
{
const char *ids[] = {
"B", "sigma", "epsilon", "ewald_order", "ewald_cut", "ewald_mix",
"cut_coul", "cut_LJ", NULL};
void *ptrs[] = {
lj4, sigma, epsilon, &ewald_order, &cut_coul, &mix_flag,
&cut_coul, &cut_lj_global, NULL};
int i;
for (i=0; ids[i]&&strcmp(ids[i], id); ++i);
if (i <= 2) dim = 2;
else dim = 0;
return ptrs[i];
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairLJLongCoulLong::coeff(int narg, char **arg)
{
if (narg < 4 || narg > 5) error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
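// arguments: i j epsilon sigma [cutoff]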
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
double epsilon_one = force->numeric(FLERR,arg[2]);
double sigma_one = force->numeric(FLERR,arg[3]);
double cut_lj_one = cut_lj_global;
if (narg == 5) cut_lj_one = force->numeric(FLERR,arg[4]);
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
epsilon_read[i][j] = epsilon_one;
sigma_read[i][j] = sigma_one;
cut_lj_read[i][j] = cut_lj_one;
setflag[i][j] = 1;
count++;
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJLongCoulLong::init_style()
{
// require an atom style with charge defined
if (!atom->q_flag && (ewald_order&(1<<1)))
error->all(FLERR,
- "Invoking coulombic in pair style lj/coul requires atom attribute q");
+ "Invoking coulombic in pair style lj/long/coul/long requires atom attribute q");
+
+ // ensure use of KSpace long-range solver, set two g_ewalds
+
+ if (force->kspace == NULL)
+ error->all(FLERR,"Pair style requires a KSpace style");
+ if (ewald_order&(1<<1)) g_ewald = force->kspace->g_ewald;
+ if (ewald_order&(1<<6)) g_ewald_6 = force->kspace->g_ewald_6;
+
+ // set rRESPA cutoffs
+
+ if (strstr(update->integrate_style,"respa") &&
+ ((Respa *) update->integrate)->level_inner >= 0)
+ cut_respa = ((Respa *) update->integrate)->cutoff;
+ else cut_respa = NULL;
+
+ // setup force tables
+
+ if (ncoultablebits && (ewald_order&(1<<1))) init_tables(cut_coul,cut_respa);
+ if (ndisptablebits && (ewald_order&(1<<6))) init_tables_disp(cut_lj_global);
// request regular or rRESPA neighbor lists if neighrequest_flag != 0
if (force->kspace->neighrequest_flag) {
int irequest;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
int respa = 0;
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
if (respa == 0) irequest = neighbor->request(this,instance_me);
else if (respa == 1) {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
} else {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 2;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respamiddle = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
}
} else irequest = neighbor->request(this,instance_me);
}
- cut_coulsq = cut_coul * cut_coul;
-
- // set rRESPA cutoffs
-
- if (strstr(update->integrate_style,"respa") &&
- ((Respa *) update->integrate)->level_inner >= 0)
- cut_respa = ((Respa *) update->integrate)->cutoff;
- else cut_respa = NULL;
-
- // ensure use of KSpace long-range solver, set g_ewald
-
- if (force->kspace == NULL)
- error->all(FLERR,"Pair style requires a KSpace style");
- if (force->kspace) g_ewald = force->kspace->g_ewald;
- if (force->kspace) g_ewald_6 = force->kspace->g_ewald_6;
-
- // setup force tables
-
- if (ncoultablebits && (ewald_order&(1<<1))) init_tables(cut_coul,cut_respa);
- if (ndisptablebits && (ewald_order&(1<<6))) init_tables_disp(cut_lj_global);
+ cut_coulsq = cut_coul * cut_coul;
}
/* ----------------------------------------------------------------------
neighbor callback to inform pair style of neighbor list to use
regular or rRESPA
------------------------------------------------------------------------- */
void PairLJLongCoulLong::init_list(int id, NeighList *ptr)
{
if (id == 0) list = ptr;
else if (id == 1) listinner = ptr;
else if (id == 2) listmiddle = ptr;
else if (id == 3) listouter = ptr;
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairLJLongCoulLong::init_one(int i, int j)
{
if (setflag[i][j] == 0) {
epsilon[i][j] = mix_energy(epsilon_read[i][i],epsilon_read[j][j],
sigma_read[i][i],sigma_read[j][j]);
sigma[i][j] = mix_distance(sigma_read[i][i],sigma_read[j][j]);
if (ewald_order&(1<<6))
cut_lj[i][j] = cut_lj_global;
else
cut_lj[i][j] = mix_distance(cut_lj_read[i][i],cut_lj_read[j][j]);
}
else {
sigma[i][j] = sigma_read[i][j];
epsilon[i][j] = epsilon_read[i][j];
cut_lj[i][j] = cut_lj_read[i][j];
}
double cut = MAX(cut_lj[i][j], cut_coul + 2.0*qdist);
cutsq[i][j] = cut*cut;
cut_ljsq[i][j] = cut_lj[i][j] * cut_lj[i][j];
lj1[i][j] = 48.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj2[i][j] = 24.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj3[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj4[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
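// lj1/lj2 are the 12-6 force prefactors (48*eps*sig^12, 24*eps*sig^6),
// lj3/lj4 the corresponding energy prefactors (4*eps*sig^12, 4*eps*sig^6)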
// check interior rRESPA cutoff
if (cut_respa && MIN(cut_lj[i][j],cut_coul) < cut_respa[3])
error->all(FLERR,"Pair cutoff < Respa interior cutoff");
if (offset_flag) {
double ratio = sigma[i][j] / cut_lj[i][j];
offset[i][j] = 4.0 * epsilon[i][j] * (pow(ratio,12.0) - pow(ratio,6.0));
} else offset[i][j] = 0.0;
cutsq[j][i] = cutsq[i][j];
cut_ljsq[j][i] = cut_ljsq[i][j];
lj1[j][i] = lj1[i][j];
lj2[j][i] = lj2[i][j];
lj3[j][i] = lj3[i][j];
lj4[j][i] = lj4[i][j];
offset[j][i] = offset[i][j];
return cut;
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJLongCoulLong::write_restart(FILE *fp)
{
write_restart_settings(fp);
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
fwrite(&setflag[i][j],sizeof(int),1,fp);
if (setflag[i][j]) {
fwrite(&epsilon_read[i][j],sizeof(double),1,fp);
fwrite(&sigma_read[i][j],sizeof(double),1,fp);
fwrite(&cut_lj_read[i][j],sizeof(double),1,fp);
}
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJLongCoulLong::read_restart(FILE *fp)
{
read_restart_settings(fp);
allocate();
int i,j;
int me = comm->me;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
if (me == 0) fread(&setflag[i][j],sizeof(int),1,fp);
MPI_Bcast(&setflag[i][j],1,MPI_INT,0,world);
if (setflag[i][j]) {
if (me == 0) {
fread(&epsilon_read[i][j],sizeof(double),1,fp);
fread(&sigma_read[i][j],sizeof(double),1,fp);
fread(&cut_lj_read[i][j],sizeof(double),1,fp);
}
MPI_Bcast(&epsilon_read[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma_read[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_lj_read[i][j],1,MPI_DOUBLE,0,world);
}
}
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJLongCoulLong::write_restart_settings(FILE *fp)
{
fwrite(&cut_lj_global,sizeof(double),1,fp);
fwrite(&cut_coul,sizeof(double),1,fp);
fwrite(&offset_flag,sizeof(int),1,fp);
fwrite(&mix_flag,sizeof(int),1,fp);
fwrite(&ncoultablebits,sizeof(int),1,fp);
fwrite(&tabinner,sizeof(double),1,fp);
fwrite(&ewald_order,sizeof(int),1,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJLongCoulLong::read_restart_settings(FILE *fp)
{
if (comm->me == 0) {
fread(&cut_lj_global,sizeof(double),1,fp);
fread(&cut_coul,sizeof(double),1,fp);
fread(&offset_flag,sizeof(int),1,fp);
fread(&mix_flag,sizeof(int),1,fp);
fread(&ncoultablebits,sizeof(int),1,fp);
fread(&tabinner,sizeof(double),1,fp);
fread(&ewald_order,sizeof(int),1,fp);
}
MPI_Bcast(&cut_lj_global,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_coul,1,MPI_DOUBLE,0,world);
MPI_Bcast(&offset_flag,1,MPI_INT,0,world);
MPI_Bcast(&mix_flag,1,MPI_INT,0,world);
MPI_Bcast(&ncoultablebits,1,MPI_INT,0,world);
MPI_Bcast(&tabinner,1,MPI_DOUBLE,0,world);
MPI_Bcast(&ewald_order,1,MPI_INT,0,world);
}
/* ----------------------------------------------------------------------
proc 0 writes to data file
------------------------------------------------------------------------- */
void PairLJLongCoulLong::write_data(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
fprintf(fp,"%d %g %g\n",i,epsilon_read[i][i],sigma_read[i][i]);
}
/* ----------------------------------------------------------------------
proc 0 writes all pairs to data file
------------------------------------------------------------------------- */
void PairLJLongCoulLong::write_data_all(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
fprintf(fp,"%d %d %g %g %g\n",i,j,
epsilon_read[i][j],sigma_read[i][j],cut_lj_read[i][j]);
}
/* ----------------------------------------------------------------------
compute pair interactions
------------------------------------------------------------------------- */
void PairLJLongCoulLong::compute(int eflag, int vflag)
{
double evdwl,ecoul,fpair;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x, *x0 = x[0];
double **f = atom->f, *f0 = f[0], *fi = f0;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
int i, j, order1 = ewald_order&(1<<1), order6 = ewald_order&(1<<6);
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni;
double qi = 0.0, qri = 0.0;
double *cutsqi, *cut_ljsqi, *lj1i, *lj2i, *lj3i, *lj4i, *offseti;
double rsq, r2inv, force_coul, force_lj;
double g2 = g_ewald_6*g_ewald_6, g6 = g2*g2*g2, g8 = g6*g2;
vector xi, d;
ineighn = (ineigh = list->ilist)+list->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = (qi = q[i])*qqrd2e; // initialize constants
offseti = offset[typei = type[i]];
lj1i = lj1[typei]; lj2i = lj2[typei]; lj3i = lj3[typei]; lj4i = lj4[typei];
cutsqi = cutsq[typei]; cut_ljsqi = cut_ljsq[typei];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
jneighn = (jneigh = list->firstneigh[i])+list->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cutsqi[typej = type[j]]) continue;
r2inv = 1.0/rsq;
if (order1 && (rsq < cut_coulsq)) { // coulombic
if (!ncoultablebits || rsq <= tabinnersq) { // series real space
register double r = sqrt(rsq), x = g_ewald*r;
register double s = qri*q[j], t = 1.0/(1.0+EWALD_P*x);
if (ni == 0) {
s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s;
if (eflag) ecoul = t;
}
else { // special case
r = s*(1.0-special_coul[ni])/r; s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-r;
if (eflag) ecoul = t-r;
}
} // table real space
else {
register union_int_float_t t;
t.f = rsq;
register const int k = (t.i & ncoulmask)>>ncoulshiftbits;
register double f = (rsq-rtable[k])*drtable[k], qiqj = qi*q[j];
if (ni == 0) {
force_coul = qiqj*(ftable[k]+f*dftable[k]);
if (eflag) ecoul = qiqj*(etable[k]+f*detable[k]);
}
else { // special case
t.f = (1.0-special_coul[ni])*(ctable[k]+f*dctable[k]);
force_coul = qiqj*(ftable[k]+f*dftable[k]-t.f);
if (eflag) ecoul = qiqj*(etable[k]+f*detable[k]-t.f);
}
}
}
else force_coul = ecoul = 0.0;
if (rsq < cut_ljsqi[typej]) { // lj
if (order6) { // long-range lj
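// real-space part of the long-range r^-6 dispersion solver; g2/g6/g8 are
// powers of the dispersion ewald parameter g_ewald_6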
if (!ndisptablebits || rsq <= tabinnerdispsq) { // series real space
register double rn = r2inv*r2inv*r2inv;
register double x2 = g2*rsq, a2 = 1.0/x2;
x2 = a2*exp(-x2)*lj4i[typej];
if (ni == 0) {
force_lj =
(rn*=rn)*lj1i[typej]-g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq;
if (eflag)
evdwl = rn*lj3i[typej]-g6*((a2+1.0)*a2+0.5)*x2;
}
else { // special case
register double f = special_lj[ni], t = rn*(1.0-f);
force_lj = f*(rn *= rn)*lj1i[typej]-
g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq+t*lj2i[typej];
if (eflag)
evdwl = f*rn*lj3i[typej]-g6*((a2+1.0)*a2+0.5)*x2+t*lj4i[typej];
}
}
else { // table real space
register union_int_float_t disp_t;
disp_t.f = rsq;
register const int disp_k = (disp_t.i & ndispmask)>>ndispshiftbits;
register double f_disp = (rsq-rdisptable[disp_k])*drdisptable[disp_k];
register double rn = r2inv*r2inv*r2inv;
if (ni == 0) {
force_lj = (rn*=rn)*lj1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*lj4i[typej];
if (eflag) evdwl = rn*lj3i[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*lj4i[typej];
}
else { // special case
register double f = special_lj[ni], t = rn*(1.0-f);
force_lj = f*(rn *= rn)*lj1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*lj4i[typej]+t*lj2i[typej];
if (eflag) evdwl = f*rn*lj3i[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*lj4i[typej]+t*lj4i[typej];
}
}
}
else { // cut lj
register double rn = r2inv*r2inv*r2inv;
if (ni == 0) {
force_lj = rn*(rn*lj1i[typej]-lj2i[typej]);
if (eflag) evdwl = rn*(rn*lj3i[typej]-lj4i[typej])-offseti[typej];
}
else { // special case
register double f = special_lj[ni];
force_lj = f*rn*(rn*lj1i[typej]-lj2i[typej]);
if (eflag)
evdwl = f * (rn*(rn*lj3i[typej]-lj4i[typej])-offseti[typej]);
}
}
}
else force_lj = evdwl = 0.0;
fpair = (force_coul+force_lj)*r2inv;
if (newton_pair || j < nlocal) {
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,d[0],d[1],d[2]);
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ---------------------------------------------------------------------- */
void PairLJLongCoulLong::compute_inner()
{
double rsq, r2inv, force_coul = 0.0, force_lj, fpair;
int *type = atom->type;
int nlocal = atom->nlocal;
double *x0 = atom->x[0], *f0 = atom->f[0], *fi = f0, *q = atom->q;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
double cut_out_on = cut_respa[0];
double cut_out_off = cut_respa[1];
double cut_out_diff = cut_out_off - cut_out_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni;
int i, j, order1 = (ewald_order|(ewald_off^-1))&(1<<1);
double qri, *cut_ljsqi, *lj1i, *lj2i;
vector xi, d;
ineighn = (ineigh = listinner->ilist)+listinner->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
cut_ljsqi = cut_ljsq[typei = type[i]];
lj1i = lj1[typei]; lj2i = lj2[typei];
jneighn = (jneigh = listinner->firstneigh[i])+listinner->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cut_out_off_sq) continue;
r2inv = 1.0/rsq;
if (order1 && (rsq < cut_coulsq)) { // coulombic
qri = qqrd2e*q[i];
force_coul = ni == 0 ?
qri*q[j]*sqrt(r2inv) : qri*q[j]*sqrt(r2inv)*special_coul[ni];
}
if (rsq < cut_ljsqi[typej = type[j]]) { // lennard-jones
register double rn = r2inv*r2inv*r2inv;
force_lj = ni == 0 ?
rn*(rn*lj1i[typej]-lj2i[typej]) :
rn*(rn*lj1i[typej]-lj2i[typej])*special_lj[ni];
}
else force_lj = 0.0;
fpair = (force_coul + force_lj) * r2inv;
if (rsq > cut_out_on_sq) { // switching
register double rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw-3.0);
}
if (newton_pair || j < nlocal) { // force update
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJLongCoulLong::compute_middle()
{
double rsq, r2inv, force_coul = 0.0, force_lj, fpair;
int *type = atom->type;
int nlocal = atom->nlocal;
double *x0 = atom->x[0], *f0 = atom->f[0], *fi = f0, *q = atom->q;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
double cut_in_off = cut_respa[0];
double cut_in_on = cut_respa[1];
double cut_out_on = cut_respa[2];
double cut_out_off = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_out_diff = cut_out_off - cut_out_on;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni;
int i, j, order1 = (ewald_order|(ewald_off^-1))&(1<<1);
double qri, *cut_ljsqi, *lj1i, *lj2i;
vector xi, d;
ineighn = (ineigh = listmiddle->ilist)+listmiddle->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = qqrd2e*q[i];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
cut_ljsqi = cut_ljsq[typei = type[i]];
lj1i = lj1[typei]; lj2i = lj2[typei];
jneighn = (jneigh = listmiddle->firstneigh[i])+listmiddle->numneigh[i];
for (; jneigh<jneighn; ++jneigh) {
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cut_out_off_sq) continue;
if (rsq <= cut_in_off_sq) continue;
r2inv = 1.0/rsq;
if (order1 && (rsq < cut_coulsq)) // coulombic
force_coul = ni == 0 ?
qri*q[j]*sqrt(r2inv) : qri*q[j]*sqrt(r2inv)*special_coul[ni];
if (rsq < cut_ljsqi[typej = type[j]]) { // lennard-jones
register double rn = r2inv*r2inv*r2inv;
force_lj = ni == 0 ?
rn*(rn*lj1i[typej]-lj2i[typej]) :
rn*(rn*lj1i[typej]-lj2i[typej])*special_lj[ni];
}
else force_lj = 0.0;
fpair = (force_coul + force_lj) * r2inv;
if (rsq < cut_in_on_sq) { // switching
register double rsw = (sqrt(rsq) - cut_in_off)/cut_in_diff;
fpair *= rsw*rsw*(3.0 - 2.0*rsw);
}
if (rsq > cut_out_on_sq) {
register double rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw-3.0);
}
if (newton_pair || j < nlocal) { // force update
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJLongCoulLong::compute_outer(int eflag, int vflag)
{
double evdwl,ecoul,fvirial,fpair;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = 0;
double **x = atom->x, *x0 = x[0];
double **f = atom->f, *f0 = f[0], *fi = f0;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
int i, j, order1 = ewald_order&(1<<1), order6 = ewald_order&(1<<6);
int *ineigh, *ineighn, *jneigh, *jneighn, typei, typej, ni, respa_flag;
double qi = 0.0, qri = 0.0;
double *cutsqi, *cut_ljsqi, *lj1i, *lj2i, *lj3i, *lj4i, *offseti;
double rsq, r2inv, force_coul, force_lj;
double g2 = g_ewald_6*g_ewald_6, g6 = g2*g2*g2, g8 = g6*g2;
double respa_lj = 0.0, respa_coul = 0.0, frespa = 0.0;
vector xi, d;
double cut_in_off = cut_respa[2];
double cut_in_on = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
ineighn = (ineigh = listouter->ilist)+listouter->inum;
for (; ineigh<ineighn; ++ineigh) { // loop over my atoms
i = *ineigh; fi = f0+3*i;
if (order1) qri = (qi = q[i])*qqrd2e; // initialize constants
offseti = offset[typei = type[i]];
lj1i = lj1[typei]; lj2i = lj2[typei]; lj3i = lj3[typei]; lj4i = lj4[typei];
cutsqi = cutsq[typei]; cut_ljsqi = cut_ljsq[typei];
memcpy(xi, x0+(i+(i<<1)), sizeof(vector));
jneighn = (jneigh = listouter->firstneigh[i])+listouter->numneigh[i];
for (; jneigh<jneighn; ++jneigh) { // loop over neighbors
j = *jneigh;
ni = sbmask(j);
j &= NEIGHMASK;
{ register double *xj = x0+(j+(j<<1));
d[0] = xi[0] - xj[0]; // pair vector
d[1] = xi[1] - xj[1];
d[2] = xi[2] - xj[2]; }
if ((rsq = vec_dot(d, d)) >= cutsqi[typej = type[j]]) continue;
r2inv = 1.0/rsq;
frespa = 1.0; // check whether and how to compute respa corrections
respa_coul = 0;
respa_lj = 0;
respa_flag = rsq < cut_in_on_sq ? 1 : 0;
if (respa_flag && (rsq > cut_in_off_sq)) {
register double rsw = (sqrt(rsq)-cut_in_off)/cut_in_diff;
frespa = 1-rsw*rsw*(3.0-2.0*rsw);
}
if (order1 && (rsq < cut_coulsq)) { // coulombic
if (!ncoultablebits || rsq <= tabinnersq) { // series real space
register double r = sqrt(rsq), s = qri*q[j];
if (respa_flag) // correct for respa
respa_coul = ni == 0 ? frespa*s/r : frespa*s/r*special_coul[ni];
register double x = g_ewald*r, t = 1.0/(1.0+EWALD_P*x);
if (ni == 0) {
s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-respa_coul;
if (eflag) ecoul = t;
}
else { // correct for special
r = s*(1.0-special_coul[ni])/r; s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-r-respa_coul;
if (eflag) ecoul = t-r;
}
} // table real space
else {
if (respa_flag) {
register double r = sqrt(rsq), s = qri*q[j];
respa_coul = ni == 0 ? frespa*s/r : frespa*s/r*special_coul[ni];
}
register union_int_float_t t;
t.f = rsq;
register const int k = (t.i & ncoulmask) >> ncoulshiftbits;
register double f = (rsq-rtable[k])*drtable[k], qiqj = qi*q[j];
if (ni == 0) {
force_coul = qiqj*(ftable[k]+f*dftable[k]);
if (eflag) ecoul = qiqj*(etable[k]+f*detable[k]);
}
else { // correct for special
t.f = (1.0-special_coul[ni])*(ctable[k]+f*dctable[k]);
force_coul = qiqj*(ftable[k]+f*dftable[k]-t.f);
if (eflag) {
t.f = (1.0-special_coul[ni])*(ptable[k]+f*dptable[k]);
ecoul = qiqj*(etable[k]+f*detable[k]-t.f);
}
}
}
}
else force_coul = respa_coul = ecoul = 0.0;
if (rsq < cut_ljsqi[typej]) { // lennard-jones
register double rn = r2inv*r2inv*r2inv;
if (respa_flag) respa_lj = ni == 0 ? // correct for respa
frespa*rn*(rn*lj1i[typej]-lj2i[typej]) :
frespa*rn*(rn*lj1i[typej]-lj2i[typej])*special_lj[ni];
if (order6) { // long-range form
if (!ndisptablebits || rsq <= tabinnerdispsq) {
register double x2 = g2*rsq, a2 = 1.0/x2;
x2 = a2*exp(-x2)*lj4i[typej];
if (ni == 0) {
force_lj =
(rn*=rn)*lj1i[typej]-g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq-respa_lj;
if (eflag) evdwl = rn*lj3i[typej]-g6*((a2+1.0)*a2+0.5)*x2;
}
else { // correct for special
register double f = special_lj[ni], t = rn*(1.0-f);
force_lj = f*(rn *= rn)*lj1i[typej]-
g8*(((6.0*a2+6.0)*a2+3.0)*a2+1.0)*x2*rsq+t*lj2i[typej]-respa_lj;
if (eflag)
evdwl = f*rn*lj3i[typej]-g6*((a2+1.0)*a2+0.5)*x2+t*lj4i[typej];
}
}
else { // table real space
register union_int_float_t disp_t;
disp_t.f = rsq;
register const int disp_k = (disp_t.i & ndispmask)>>ndispshiftbits;
register double f_disp = (rsq-rdisptable[disp_k])*drdisptable[disp_k];
register double rn = r2inv*r2inv*r2inv;
if (ni == 0) {
force_lj = (rn*=rn)*lj1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*lj4i[typej]-respa_lj;
if (eflag) evdwl = rn*lj3i[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*lj4i[typej];
}
else { // special case
register double f = special_lj[ni], t = rn*(1.0-f);
force_lj = f*(rn *= rn)*lj1i[typej]-(fdisptable[disp_k]+f_disp*dfdisptable[disp_k])*lj4i[typej]+t*lj2i[typej]-respa_lj;
if (eflag) evdwl = f*rn*lj3i[typej]-(edisptable[disp_k]+f_disp*dedisptable[disp_k])*lj4i[typej]+t*lj4i[typej];
}
}
}
else { // cut form
if (ni == 0) {
force_lj = rn*(rn*lj1i[typej]-lj2i[typej])-respa_lj;
if (eflag) evdwl = rn*(rn*lj3i[typej]-lj4i[typej])-offseti[typej];
}
else { // correct for special
register double f = special_lj[ni];
force_lj = f*rn*(rn*lj1i[typej]-lj2i[typej])-respa_lj;
if (eflag)
evdwl = f*(rn*(rn*lj3i[typej]-lj4i[typej])-offseti[typej]);
}
}
}
else force_lj = respa_lj = evdwl = 0.0;
fpair = (force_coul+force_lj)*r2inv;
if (newton_pair || j < nlocal) {
register double *fj = f0+(j+(j<<1)), f;
fi[0] += f = d[0]*fpair; fj[0] -= f;
fi[1] += f = d[1]*fpair; fj[1] -= f;
fi[2] += f = d[2]*fpair; fj[2] -= f;
}
else {
fi[0] += d[0]*fpair;
fi[1] += d[1]*fpair;
fi[2] += d[2]*fpair;
}
if (evflag) {
fvirial = (force_coul + force_lj + respa_coul + respa_lj)*r2inv;
ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fvirial,d[0],d[1],d[2]);
}
}
}
}
/* ---------------------------------------------------------------------- */
double PairLJLongCoulLong::single(int i, int j, int itype, int jtype,
double rsq, double factor_coul, double factor_lj,
double &fforce)
{
double r2inv, r6inv, force_coul, force_lj;
double g2 = g_ewald_6*g_ewald_6, g6 = g2*g2*g2, g8 = g6*g2, *q = atom->q;
double eng = 0.0;
r2inv = 1.0/rsq;
if ((ewald_order&2) && (rsq < cut_coulsq)) { // coulombic
if (!ncoultablebits || rsq <= tabinnersq) { // series real space
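// Descriptive sketch (inferred from the EWALD_P/A1..A5 constants used here): the
// t-polynomial below is the usual rational approximation of erfc(g_ewald*r), so the
// energy term reduces to ~ qqrd2e*q[i]*q[j]*erfc(g_ewald*r)/r, with the
// (1-factor_coul) piece subtracted for excluded pairs.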
register double r = sqrt(rsq), x = g_ewald*r;
register double s = force->qqrd2e*q[i]*q[j], t = 1.0/(1.0+EWALD_P*x);
r = s*(1.0-factor_coul)/r; s *= g_ewald*exp(-x*x);
force_coul = (t *= ((((t*A5+A4)*t+A3)*t+A2)*t+A1)*s/x)+EWALD_F*s-r;
eng += t-r;
}
else { // table real space
register union_int_float_t t;
t.f = rsq;
register const int k = (t.i & ncoulmask) >> ncoulshiftbits;
register double f = (rsq-rtable[k])*drtable[k], qiqj = q[i]*q[j];
t.f = (1.0-factor_coul)*(ctable[k]+f*dctable[k]);
force_coul = qiqj*(ftable[k]+f*dftable[k]-t.f);
eng += qiqj*(etable[k]+f*detable[k]-t.f);
}
} else force_coul = 0.0;
if (rsq < cut_ljsq[itype][jtype]) { // lennard-jones
r6inv = r2inv*r2inv*r2inv;
if (ewald_order&64) { // long-range
register double x2 = g2*rsq, a2 = 1.0/x2, t = r6inv*(1.0-factor_lj);
x2 = a2*exp(-x2)*lj4[itype][jtype];
force_lj = factor_lj*(r6inv *= r6inv)*lj1[itype][jtype]-
g8*(((6.0*a2+6.0)*a2+3.0)*a2+a2)*x2*rsq+t*lj2[itype][jtype];
eng += factor_lj*r6inv*lj3[itype][jtype]-
g6*((a2+1.0)*a2+0.5)*x2+t*lj4[itype][jtype];
}
else { // cut
force_lj = factor_lj*r6inv*(lj1[itype][jtype]*r6inv-lj2[itype][jtype]);
eng += factor_lj*(r6inv*(r6inv*lj3[itype][jtype]-
lj4[itype][jtype])-offset[itype][jtype]);
}
} else force_lj = 0.0;
fforce = (force_coul+force_lj)*r2inv;
return eng;
}
diff --git a/src/MAKE/MACHINES/Makefile.white b/src/MAKE/MACHINES/Makefile.white
index ae31664b0..53de76e73 100644
--- a/src/MAKE/MACHINES/Makefile.white
+++ b/src/MAKE/MACHINES/Makefile.white
@@ -1,125 +1,124 @@
# kokkos_cuda = KOKKOS/CUDA package, OpenMPI with nvcc compiler, Kepler GPU
SHELL = /bin/sh
# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler
KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
export OMPI_CXX = $(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
CC = mpicxx
CCFLAGS = -g -O3
SHFLAGS = -fPIC
DEPFLAGS = -M
LINK = mpicxx
LINKFLAGS = -g -O3
LIB =
SIZE = size
ARCHIVE = ar
ARFLAGS = -rc
SHLIBFLAGS = -shared
KOKKOS_DEVICES = Cuda, OpenMP
KOKKOS_ARCH = Kepler35
-KOKKOS_CUDA_OPTIONS = enable_lambda
# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"
# LAMMPS ifdef settings
# see possible settings in Section 2.2 (step 4) of manual
LMP_INC = -DLAMMPS_GZIP
# MPI library
# see discussion in Section 2.2 (step 5) of manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library
MPI_INC = -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1
MPI_PATH =
MPI_LIB =
# FFT library
# see discussion in Section 2.2 (step 6) of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library
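# illustrative example only (hypothetical paths, adjust to your FFTW install):
#   FFT_INC = -DFFT_FFTW3
#   FFT_PATH = -L/usr/local/lib
#   FFT_LIB = -lfftw3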
FFT_INC =
FFT_PATH =
FFT_LIB =
# JPEG and/or PNG library
# see discussion in Section 2.2 (step 7) of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library
JPG_INC =
JPG_PATH =
JPG_LIB =
# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section
include Makefile.package.settings
include Makefile.package
EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB)
EXTRA_CPP_DEPENDS = $(PKG_CPP_DEPENDS)
EXTRA_LINK_DEPENDS = $(PKG_LINK_DEPENDS)
# Path to src files
vpath %.cpp ..
vpath %.h ..
# Link target
$(EXE): $(OBJ) $(EXTRA_LINK_DEPENDS)
$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
$(SIZE) $(EXE)
# Library targets
lib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
shlib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
$(OBJ) $(EXTRA_LIB) $(LIB)
# Compilation rules
%.o:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
%.d:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@
%.o:%.cu $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
# Individual dependencies
depend : fastdep.exe $(SRC)
@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1
fastdep.exe: ../DEPEND/fastdep.c
gcc -O -o $@ $<
sinclude .depend
diff --git a/src/MAKE/OPTIONS/Makefile.intel_cpu b/src/MAKE/OPTIONS/Makefile.intel_cpu
index b34ff4776..b7db06457 100755
--- a/src/MAKE/OPTIONS/Makefile.intel_cpu
+++ b/src/MAKE/OPTIONS/Makefile.intel_cpu
@@ -1,123 +1,123 @@
# intel_cpu_intelmpi = USER-INTEL package, Intel MPI, MKL FFT
SHELL = /bin/sh
# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler
CC = mpiicpc
OPTFLAGS = -xHost -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
-fno-alias -ansi-alias -restrict $(OPTFLAGS)
SHFLAGS = -fPIC
DEPFLAGS = -M
LINK = mpiicpc
LINKFLAGS = -g -qopenmp $(OPTFLAGS)
LIB = -ltbbmalloc -ltbbmalloc_proxy
SIZE = size
ARCHIVE = ar
ARFLAGS = -rc
SHLIBFLAGS = -shared
# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"
# LAMMPS ifdef settings
# see possible settings in Section 2.2 (step 4) of manual
LMP_INC = -DLAMMPS_GZIP -DLAMMPS_JPEG
# MPI library
# see discussion in Section 2.2 (step 5) of manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library
MPI_INC = -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1
MPI_PATH =
MPI_LIB =
# FFT library
# see discussion in Section 2.2 (step 6) of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library
FFT_INC = -DFFT_MKL -DFFT_SINGLE
FFT_PATH =
-FFT_LIB = -L$MKLROOT/lib/intel64/ -lmkl_intel_ilp64 \
+FFT_LIB = -L$(MKLROOT)/lib/intel64/ -lmkl_intel_ilp64 \
-lmkl_sequential -lmkl_core
# JPEG and/or PNG library
# see discussion in Section 2.2 (step 7) of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library
JPG_INC =
JPG_PATH =
JPG_LIB = -ljpeg
# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section
include Makefile.package.settings
include Makefile.package
EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB)
EXTRA_CPP_DEPENDS = $(PKG_CPP_DEPENDS)
EXTRA_LINK_DEPENDS = $(PKG_LINK_DEPENDS)
# Path to src files
vpath %.cpp ..
vpath %.h ..
# Link target
$(EXE): $(OBJ) $(EXTRA_LINK_DEPENDS)
$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
$(SIZE) $(EXE)
# Library targets
lib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
shlib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
$(OBJ) $(EXTRA_LIB) $(LIB)
# Compilation rules
%.o:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
%.d:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@
%.o:%.cu $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
# Individual dependencies
depend : fastdep.exe $(SRC)
@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1
fastdep.exe: ../DEPEND/fastdep.c
cc -O -o $@ $<
sinclude .depend
diff --git a/src/MAKE/OPTIONS/Makefile.intel_cpu_intelmpi b/src/MAKE/OPTIONS/Makefile.intel_cpu_intelmpi
index 74ff65d0c..2cb37ed9f 100644
--- a/src/MAKE/OPTIONS/Makefile.intel_cpu_intelmpi
+++ b/src/MAKE/OPTIONS/Makefile.intel_cpu_intelmpi
@@ -1,123 +1,123 @@
# intel_cpu_intelmpi = USER-INTEL package, Intel MPI, MKL FFT
SHELL = /bin/sh
# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler
CC = mpiicpc
OPTFLAGS = -xHost -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
-fno-alias -ansi-alias -restrict $(OPTFLAGS)
SHFLAGS = -fPIC
DEPFLAGS = -M
LINK = mpiicpc
LINKFLAGS = -g -qopenmp $(OPTFLAGS)
LIB = -ltbbmalloc
SIZE = size
ARCHIVE = ar
ARFLAGS = -rc
SHLIBFLAGS = -shared
# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"
# LAMMPS ifdef settings
# see possible settings in Section 2.2 (step 4) of manual
LMP_INC = -DLAMMPS_GZIP
# MPI library
# see discussion in Section 2.2 (step 5) of manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library
MPI_INC = -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1
MPI_PATH =
MPI_LIB =
# FFT library
# see discussion in Section 2.2 (step 6) of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library
FFT_INC = -DFFT_MKL -DFFT_SINGLE
FFT_PATH =
-FFT_LIB = -L$MKLROOT/lib/intel64/ -lmkl_intel_ilp64 \
+FFT_LIB = -L$(MKLROOT)/lib/intel64/ -lmkl_intel_ilp64 \
-lmkl_sequential -lmkl_core
# JPEG and/or PNG library
# see discussion in Section 2.2 (step 7) of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library
JPG_INC =
JPG_PATH =
JPG_LIB =
# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section
include Makefile.package.settings
include Makefile.package
EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB)
EXTRA_CPP_DEPENDS = $(PKG_CPP_DEPENDS)
EXTRA_LINK_DEPENDS = $(PKG_LINK_DEPENDS)
# Path to src files
vpath %.cpp ..
vpath %.h ..
# Link target
$(EXE): $(OBJ) $(EXTRA_LINK_DEPENDS)
$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
$(SIZE) $(EXE)
# Library targets
lib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
shlib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
$(OBJ) $(EXTRA_LIB) $(LIB)
# Compilation rules
%.o:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
%.d:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@
%.o:%.cu $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
# Individual dependencies
depend : fastdep.exe $(SRC)
@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1
fastdep.exe: ../DEPEND/fastdep.c
cc -O -o $@ $<
sinclude .depend
diff --git a/src/MAKE/OPTIONS/Makefile.knl b/src/MAKE/OPTIONS/Makefile.knl
index 1260a27a1..3bc777592 100644
--- a/src/MAKE/OPTIONS/Makefile.knl
+++ b/src/MAKE/OPTIONS/Makefile.knl
@@ -1,121 +1,121 @@
# knl = Flags for Knights Landing Xeon Phi Processor, Intel Compiler/MPI, MKL FFT
SHELL = /bin/sh
# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler
CC = mpiicpc
OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
-fno-alias -ansi-alias -restrict $(OPTFLAGS)
SHFLAGS = -fPIC
DEPFLAGS = -M
LINK = mpiicpc
LINKFLAGS = -g -qopenmp $(OPTFLAGS)
LIB = -ltbbmalloc
SIZE = size
ARCHIVE = ar
ARFLAGS = -rc
SHLIBFLAGS = -shared
# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"
# LAMMPS ifdef settings
# see possible settings in Section 2.2 (step 4) of manual
LMP_INC = -DLAMMPS_GZIP -DLAMMPS_JPEG
# MPI library
# see discussion in Section 2.2 (step 5) of manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library
MPI_INC = -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1
MPI_PATH =
MPI_LIB =
# FFT library
# see discussion in Section 2.2 (step 6) of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library
FFT_INC = -DFFT_MKL -DFFT_SINGLE
FFT_PATH =
-FFT_LIB = -L$MKLROOT/lib/intel64/ -lmkl_intel_ilp64 \
+FFT_LIB = -L$(MKLROOT)/lib/intel64/ -lmkl_intel_ilp64 \
-lmkl_sequential -lmkl_core
# JPEG and/or PNG library
# see discussion in Section 2.2 (step 7) of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library
JPG_INC =
JPG_PATH =
JPG_LIB = -ljpeg
# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section
include Makefile.package.settings
include Makefile.package
EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB)
# Path to src files
vpath %.cpp ..
vpath %.h ..
# Link target
$(EXE): $(OBJ)
$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
$(SIZE) $(EXE)
# Library targets
lib: $(OBJ)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
shlib: $(OBJ)
$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
$(OBJ) $(EXTRA_LIB) $(LIB)
# Compilation rules
%.o:%.cpp
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
%.d:%.cpp
$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@
%.o:%.cu
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
# Individual dependencies
depend : fastdep.exe $(SRC)
@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1
fastdep.exe: ../DEPEND/fastdep.c
cc -O -o $@ $<
sinclude .depend
diff --git a/src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpich b/src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpich
index efdc728bd..be0c2d191 100644
--- a/src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpich
+++ b/src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpich
@@ -1,123 +1,124 @@
# kokkos_cuda = KOKKOS/CUDA package, MPICH with nvcc compiler, Kepler GPU
SHELL = /bin/sh
# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler
KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
-CC = mpicxx -cxx=$(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
+export MPICH_CXX = $(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
+CC = mpicxx
CCFLAGS = -g -O3
SHFLAGS = -fPIC
DEPFLAGS = -M
-LINK = mpicxx -cxx=$(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
-LINKFLAGS = -g -O
+LINK = mpicxx
+LINKFLAGS = -g -O3
LIB =
SIZE = size
ARCHIVE = ar
ARFLAGS = -rc
SHLIBFLAGS = -shared
KOKKOS_DEVICES = Cuda, OpenMP
KOKKOS_ARCH = Kepler35
# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"
# LAMMPS ifdef settings
# see possible settings in Section 2.2 (step 4) of manual
LMP_INC = -DLAMMPS_GZIP
# MPI library
# see discussion in Section 2.2 (step 5) of manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library
MPI_INC = -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1
MPI_PATH =
MPI_LIB =
# FFT library
# see discussion in Section 2.2 (step 6) of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library
FFT_INC =
FFT_PATH =
FFT_LIB =
# JPEG and/or PNG library
# see discussion in Section 2.2 (step 7) of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library
JPG_INC =
JPG_PATH =
JPG_LIB =
# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section
include Makefile.package.settings
include Makefile.package
EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB)
EXTRA_CPP_DEPENDS = $(PKG_CPP_DEPENDS)
EXTRA_LINK_DEPENDS = $(PKG_LINK_DEPENDS)
# Path to src files
vpath %.cpp ..
vpath %.h ..
# Link target
$(EXE): $(OBJ) $(EXTRA_LINK_DEPENDS)
$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
$(SIZE) $(EXE)
# Library targets
lib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
shlib: $(OBJ) $(EXTRA_LINK_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
$(OBJ) $(EXTRA_LIB) $(LIB)
# Compilation rules
%.o:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
%.d:%.cpp $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@
%.o:%.cu $(EXTRA_CPP_DEPENDS)
$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<
# Individual dependencies
depend : fastdep.exe $(SRC)
@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1
fastdep.exe: ../DEPEND/fastdep.c
cc -O -o $@ $<
sinclude .depend
diff --git a/src/USER-DPD/fix_rx.cpp b/src/USER-DPD/fix_rx.cpp
index b7330ba1e..a55ae7811 100644
--- a/src/USER-DPD/fix_rx.cpp
+++ b/src/USER-DPD/fix_rx.cpp
@@ -1,1825 +1,1840 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "fix_rx.h"
#include "atom.h"
#include "error.h"
#include "group.h"
#include "modify.h"
#include "force.h"
#include "memory.h"
#include "comm.h"
#include "update.h"
#include "domain.h"
#include "neighbor.h"
#include "neigh_list.h"
+#include "neigh_request.h"
#include "math_special.h"
#include "pair_dpd_fdt_energy.h"
#include <float.h> // DBL_EPSILON
#include <vector> // std::vector<>
#include <algorithm> // std::max
#include <cmath> // std::fmod
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathSpecial;
enum{NONE,HARMONIC};
enum{LUCY};
#define MAXLINE 1024
#define DELTA 4
#ifdef DBL_EPSILON
#define MY_EPSILON (10.0*DBL_EPSILON)
#else
#define MY_EPSILON (10.0*2.220446049250313e-16)
#endif
#define SparseKinetics_enableIntegralReactions (true)
#define SparseKinetics_invalidIndex (-1)
namespace /* anonymous */
{
typedef double TimerType;
TimerType getTimeStamp(void) { return MPI_Wtime(); }
double getElapsedTime( const TimerType &t0, const TimerType &t1) { return t1-t0; }
} // end namespace
/* ---------------------------------------------------------------------- */
FixRX::FixRX(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg), mol2param(NULL), nreactions(0),
params(NULL), Arr(NULL), nArr(NULL), Ea(NULL), tempExp(NULL),
stoich(NULL), stoichReactants(NULL), stoichProducts(NULL), kR(NULL),
pairDPDE(NULL), dpdThetaLocal(NULL), sumWeights(NULL), sparseKinetics_nu(NULL),
sparseKinetics_nuk(NULL), sparseKinetics_inu(NULL), sparseKinetics_isIntegralReaction(NULL),
kineticsFile(NULL), id_fix_species(NULL),
id_fix_species_old(NULL), fix_species(NULL), fix_species_old(NULL)
{
if (narg < 7 || narg > 12) error->all(FLERR,"Illegal fix rx command");
nevery = 1;
nreactions = maxparam = 0;
params = NULL;
mol2param = NULL;
pairDPDE = NULL;
id_fix_species = NULL;
id_fix_species_old = NULL;
const int Verbosity = 1;
// Keep track of the argument list.
int iarg = 3;
// Read the kinetic file in arg[3].
kineticsFile = arg[iarg++];
// Determine the local temperature averaging method in arg[4].
wtFlag = 0;
localTempFlag = NONE;
{
char *word = arg[iarg++];
if (strcmp(word,"none") == 0){
wtFlag = 0;
localTempFlag = NONE;
}
else if (strcmp(word,"lucy") == 0){
wtFlag = LUCY;
localTempFlag = HARMONIC;
}
else
error->all(FLERR,"Illegal fix rx local temperature weighting technique");
}
// Select either the sparse or the dense matrix
// representation of the stoichiometric matrix.
useSparseKinetics = true;
{
char *word = arg[iarg++];
if (strcmp(word,"sparse") == 0)
useSparseKinetics = true;
else if (strcmp(word,"dense") == 0)
useSparseKinetics = false;
else {
std::string errmsg = "Illegal command " + std::string(word)
+ " expected \"sparse\" or \"dense\"\n";
error->all(FLERR, errmsg.c_str());
}
if (comm->me == 0 and Verbosity > 1){
std::string msg = "FixRX: matrix format is ";
if (useSparseKinetics)
msg += std::string("sparse");
else
msg += std::string("dense");
error->message(FLERR, msg.c_str());
}
}
// Determine the ODE solver/stepper strategy in arg[6].
odeIntegrationFlag = ODE_LAMMPS_RK4;
{
char *word = arg[iarg++];
if (strcmp(word,"lammps_rk4") == 0 || strcmp(word,"rk4") == 0)
odeIntegrationFlag = ODE_LAMMPS_RK4;
else if (strcmp(word,"lammps_rkf45") == 0 || strcmp(word,"rkf45") == 0)
odeIntegrationFlag = ODE_LAMMPS_RKF45;
else {
std::string errmsg = "Illegal ODE integration type: " + std::string(word);
error->all(FLERR, errmsg.c_str());
}
}
/// Set the default ODE parameters here. Modify with arg[].
/// 'minSteps' has a different meaning for RK4 and RKF45.
/// RK4: This is the # of steps that will be taken with h = dt_dpd / minSteps;
/// RKF45: This sets h0 = dt_dpd / minSteps. If minSteps == 0, RKF45 will
/// estimate h0 internally. h will be adjusted as needed on subsequent steps.
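/// Illustrative only (inferred from the argument parsing in this constructor,
/// not quoted from the docs): an RKF45 invocation would look roughly like
///   fix myrx all rx kinetics.file none sparse rkf45 1 100 1.0e-6 1.0e-8
/// with the trailing values mapping to minSteps, maxIters, relTol, absTol.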
minSteps = 1;
maxIters = 100;
relTol = 1.0e-6;
absTol = 1.0e-8;
diagnosticFrequency = 0;
for (int i = 0; i < numDiagnosticCounters; ++i){
diagnosticCounter[i] = 0;
diagnosticCounterPerODE[i] = NULL;
}
if (odeIntegrationFlag == ODE_LAMMPS_RK4 && narg==8){
char *word = arg[iarg++];
minSteps = atoi( word );
if (comm->me == 0 and Verbosity > 1){
char msg[128];
sprintf(msg, "FixRX: RK4 numSteps= %d", minSteps);
error->message(FLERR, msg);
}
}
else if (odeIntegrationFlag == ODE_LAMMPS_RK4 && narg>8){
error->all(FLERR,"Illegal fix rx command. Too many arguments for RK4 solver.");
}
else if (odeIntegrationFlag == ODE_LAMMPS_RKF45){
// Must have four options.
if (narg < 11)
error->all(FLERR,"Illegal fix rx command. Too few arguments for RKF45 solver.");
minSteps = atoi( arg[iarg++] );
maxIters = atoi( arg[iarg++] );
relTol = strtod( arg[iarg++], NULL);
absTol = strtod( arg[iarg++], NULL);
if (iarg < narg)
diagnosticFrequency = atoi( arg[iarg++] );
// maxIters must be at least minSteps.
maxIters = std::max( minSteps, maxIters );
if (comm->me == 0 and Verbosity > 1){
//printf("FixRX: RKF45 minSteps= %d maxIters= %d absTol= %e relTol= %e\n", minSteps, maxIters, absTol, relTol);
char msg[128];
sprintf(msg, "FixRX: RKF45 minSteps= %d maxIters= %d relTol= %.1e absTol= %.1e diagnosticFrequency= %d", minSteps, maxIters, relTol, absTol, diagnosticFrequency);
error->message(FLERR, msg);
}
}
// Initialize/Create the sparse matrix database.
sparseKinetics_nu = NULL;
sparseKinetics_nuk = NULL;
sparseKinetics_inu = NULL;
sparseKinetics_isIntegralReaction = NULL;
sparseKinetics_maxReactants = 0;
sparseKinetics_maxProducts = 0;
sparseKinetics_maxSpecies = 0;
}
/* ---------------------------------------------------------------------- */
FixRX::~FixRX()
{
// Deallocate memory to prevent memory leaks
for (int ii = 0; ii < nreactions; ii++){
delete [] stoich[ii];
delete [] stoichReactants[ii];
delete [] stoichProducts[ii];
}
delete [] Arr;
delete [] nArr;
delete [] Ea;
delete [] tempExp;
delete [] stoich;
delete [] stoichReactants;
delete [] stoichProducts;
delete [] kR;
delete [] id_fix_species;
delete [] id_fix_species_old;
if (useSparseKinetics){
memory->destroy( sparseKinetics_nu );
memory->destroy( sparseKinetics_nuk );
memory->destroy( sparseKinetics_inu );
memory->destroy( sparseKinetics_isIntegralReaction );
}
}
/* ---------------------------------------------------------------------- */
void FixRX::post_constructor()
{
int maxspecies = 1000;
int nUniqueSpecies = 0;
bool match;
for (int i = 0; i < modify->nfix; i++)
if (strncmp(modify->fix[i]->style,"property/atom",13) == 0)
error->all(FLERR,"fix rx cannot be combined with fix property/atom");
char **tmpspecies = new char*[maxspecies];
for(int jj=0; jj < maxspecies; jj++)
tmpspecies[jj] = NULL;
// open file on proc 0
FILE *fp;
fp = NULL;
if (comm->me == 0) {
fp = force->open_potential(kineticsFile);
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open rx file %s",kineticsFile);
error->one(FLERR,str);
}
}
// Assign species names to tmpspecies array and determine the number of unique species
int n,nwords;
char line[MAXLINE],*ptr;
int eof = 0;
char * word;
while (1) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fp);
if (ptr == NULL) {
eof = 1;
fclose(fp);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
if (nwords == 0) continue;
// words = ptrs to all words in line
nwords = 0;
word = strtok(line," \t\n\r\f");
while (word != NULL){
word = strtok(NULL, " \t\n\r\f");
match=false;
for(int jj=0;jj<nUniqueSpecies;jj++){
if(strcmp(word,tmpspecies[jj])==0){
match=true;
break;
}
}
if(!match){
if(nUniqueSpecies+1>=maxspecies)
error->all(FLERR,"Exceeded the maximum number of species permitted in fix rx.");
tmpspecies[nUniqueSpecies] = new char[strlen(word)+1];
strcpy(tmpspecies[nUniqueSpecies],word);
nUniqueSpecies++;
}
word = strtok(NULL, " \t\n\r\f");
if(strcmp(word,"+") != 0 && strcmp(word,"=") != 0) break;
word = strtok(NULL, " \t\n\r\f");
}
}
atom->nspecies_dpd = nUniqueSpecies;
nspecies = atom->nspecies_dpd;
// new id = fix-ID + FIX_STORE_ATTRIBUTE
// new fix group = group for this fix
id_fix_species = NULL;
id_fix_species_old = NULL;
n = strlen(id) + strlen("_SPECIES") + 1;
id_fix_species = new char[n];
n = strlen(id) + strlen("_SPECIES_OLD") + 1;
id_fix_species_old = new char[n];
strcpy(id_fix_species,id);
strcat(id_fix_species,"_SPECIES");
strcpy(id_fix_species_old,id);
strcat(id_fix_species_old,"_SPECIES_OLD");
char **newarg = new char*[nspecies+5];
char **newarg2 = new char*[nspecies+5];
newarg[0] = id_fix_species;
newarg[1] = group->names[igroup];
newarg[2] = (char *) "property/atom";
newarg2[0] = id_fix_species_old;
newarg2[1] = group->names[igroup];
newarg2[2] = (char *) "property/atom";
for(int ii=0; ii<nspecies; ii++){
char str1[2+strlen(tmpspecies[ii])+1];
char str2[2+strlen(tmpspecies[ii])+4];
strcpy(str1,"d_");
strcpy(str2,"d_");
strncat(str1,tmpspecies[ii],strlen(tmpspecies[ii]));
strncat(str2,tmpspecies[ii],strlen(tmpspecies[ii]));
strncat(str2,"Old",3);
newarg[ii+3] = new char[strlen(str1)+1];
newarg2[ii+3] = new char[strlen(str2)+1];
strcpy(newarg[ii+3],str1);
strcpy(newarg2[ii+3],str2);
}
newarg[nspecies+3] = (char *) "ghost";
newarg[nspecies+4] = (char *) "yes";
newarg2[nspecies+3] = (char *) "ghost";
newarg2[nspecies+4] = (char *) "yes";
modify->add_fix(nspecies+5,newarg,1);
fix_species = (FixPropertyAtom *) modify->fix[modify->nfix-1];
restartFlag = modify->fix[modify->nfix-1]->restart_reset;
modify->add_fix(nspecies+5,newarg2,1);
fix_species_old = (FixPropertyAtom *) modify->fix[modify->nfix-1];
if(nspecies==0) error->all(FLERR,"There are no rx species specified.");
for(int jj=0;jj<nspecies;jj++) {
delete[] tmpspecies[jj];
delete[] newarg[jj+3];
delete[] newarg2[jj+3];
}
delete[] newarg;
delete[] newarg2;
delete[] tmpspecies;
read_file( kineticsFile );
if (useSparseKinetics)
this->initSparse();
// set comm size needed by this Pair
comm_forward = nspecies*2;
comm_reverse = 2;
}
/* ---------------------------------------------------------------------- */
void FixRX::initSparse()
{
const int Verbosity = 1;
if (comm->me == 0 and Verbosity > 1){
for (int k = 0; k < nspecies; ++k)
printf("atom->dname[%d]= %s\n", k, atom->dname[k]);
printf("stoich[][]\n");
for (int i = 0; i < nreactions; ++i){
int nreac_i = 0, nprod_i = 0;
printf("%d: ", i);
for (int k = 0; k < nspecies; ++k){
printf(" %g", stoich[i][k]);
if (stoich[i][k] < 0.0) nreac_i++;
else if (stoich[i][k] > 0.0) nprod_i++;
}
printf(" : %d %d\n", nreac_i, nprod_i);
}
printf("stoichReactants[][]\n");
for (int i = 0; i < nreactions; ++i){
int nreac_i = 0;
printf("%d: ", i);
for (int k = 0; k < nspecies; ++k){
printf(" %g", stoichReactants[i][k]);
if (stoichReactants[i][k] > 0.0) nreac_i++;
}
printf(" : %d\n", nreac_i);
}
printf("stoichProducts[][]\n");
for (int i = 0; i < nreactions; ++i){
int nprod_i = 0;
printf("%d: ", i);
for (int k = 0; k < nspecies; ++k){
printf(" %g", stoichProducts[i][k]);
if (stoichProducts[i][k] > 0.0) nprod_i++;
}
printf(" : %d\n", nprod_i);
}
} // if (Verbose)
// 1) Measure the sparsity of stoich[][]
int nzeros = 0;
int mxprod = 0;
int mxreac = 0;
int mxspec = 0;
int nIntegral = 0;
for (int i = 0; i < nreactions; ++i){
int nreac_i = 0, nprod_i = 0;
std::string pstr, rstr;
bool allAreIntegral = true;
for (int k = 0; k < nspecies; ++k){
if (stoichReactants[i][k] == 0 and stoichProducts[i][k] == 0)
nzeros++;
if (stoichReactants[i][k] > 0.0){
allAreIntegral &= (std::fmod( stoichReactants[i][k], 1.0 ) == 0.0);
nreac_i++;
if (rstr.length() > 0)
rstr += " + ";
char digit[6];
sprintf(digit, "%4.1f ", stoichReactants[i][k]); rstr += digit;
rstr += atom->dname[k];
}
if (stoichProducts[i][k] > 0.0){
allAreIntegral &= (std::fmod( stoichProducts[i][k], 1.0 ) == 0.0);
nprod_i++;
if (pstr.length() > 0)
pstr += " + ";
char digit[6];
sprintf(digit, "%4.1f ", stoichProducts[i][k]); pstr += digit;
pstr += atom->dname[k];
}
}
if (comm->me == 0 and Verbosity > 1)
printf("rx%3d: %d %d %d ... %s %s %s\n", i, nreac_i, nprod_i, allAreIntegral, rstr.c_str(), /*reversible[i]*/ (false) ? "<=>" : "=", pstr.c_str());
mxreac = std::max( mxreac, nreac_i );
mxprod = std::max( mxprod, nprod_i );
mxspec = std::max( mxspec, nreac_i + nprod_i );
if (allAreIntegral) nIntegral++;
}
if (comm->me == 0 and Verbosity > 1){
char msg[256];
sprintf(msg, "FixRX: Sparsity of Stoichiometric Matrix= %.1f%% non-zeros= %d nspecies= %d nreactions= %d maxReactants= %d maxProducts= %d maxSpecies= %d integralReactions= %d", 100*(double(nzeros) / (nspecies * nreactions)), nzeros, nspecies, nreactions, mxreac, mxprod, (mxreac + mxprod), SparseKinetics_enableIntegralReactions);
error->message(FLERR, msg);
}
// Allocate the sparse matrix data.
{
sparseKinetics_maxSpecies = (mxreac + mxprod);
sparseKinetics_maxReactants = mxreac;
sparseKinetics_maxProducts = mxprod;
memory->create( sparseKinetics_nu , nreactions, sparseKinetics_maxSpecies, "sparseKinetics_nu");
memory->create( sparseKinetics_nuk, nreactions, sparseKinetics_maxSpecies, "sparseKinetics_nuk");
for (int i = 0; i < nreactions; ++i)
for (int k = 0; k < sparseKinetics_maxSpecies; ++k){
sparseKinetics_nu [i][k] = 0.0;
sparseKinetics_nuk[i][k] = SparseKinetics_invalidIndex; // Initialize with an invalid index.
}
if (SparseKinetics_enableIntegralReactions){
memory->create( sparseKinetics_inu, nreactions, sparseKinetics_maxSpecies, "sparseKinetics_inu");
memory->create( sparseKinetics_isIntegralReaction, nreactions, "sparseKinetics_isIntegralReaction");
for (int i = 0; i < nreactions; ++i){
sparseKinetics_isIntegralReaction[i] = false;
for (int k = 0; k < sparseKinetics_maxSpecies; ++k)
sparseKinetics_inu[i][k] = 0;
}
}
}
// Measure the distribution of the # of moles for the ::fastpowi function.
std::vector<int> nu_bin(10);
for (int i = 0; i < nreactions; ++i){
int nreac_i = 0, nprod_i = 0;
bool isIntegral_i = true;
for (int k = 0; k < nspecies; ++k){
if (stoichReactants[i][k] > 0.0){
const int idx = nreac_i;
sparseKinetics_nu [i][idx] = stoichReactants[i][k];
sparseKinetics_nuk[i][idx] = k;
isIntegral_i &= (std::fmod( stoichReactants[i][k], 1.0 ) == 0.0);
if (SparseKinetics_enableIntegralReactions){
sparseKinetics_inu[i][idx] = (int)sparseKinetics_nu[i][idx];
if (isIntegral_i){
if (sparseKinetics_inu[i][idx] >= nu_bin.size())
nu_bin.resize( sparseKinetics_inu[i][idx] + 1 );
nu_bin[ sparseKinetics_inu[i][idx] ] ++;
}
}
nreac_i++;
}
if (stoichProducts[i][k] > 0.0){
const int idx = sparseKinetics_maxReactants + nprod_i;
sparseKinetics_nu [i][idx] = stoichProducts[i][k];
sparseKinetics_nuk[i][idx] = k;
isIntegral_i &= (std::fmod( sparseKinetics_nu[i][idx], 1.0 ) == 0.0);
if (SparseKinetics_enableIntegralReactions){
sparseKinetics_inu[i][idx] = (int) sparseKinetics_nu[i][idx];
if (isIntegral_i){
if (sparseKinetics_inu[i][idx] >= nu_bin.size())
nu_bin.resize( sparseKinetics_inu[i][idx] + 1 );
nu_bin[ sparseKinetics_inu[i][idx] ] ++;
}
}
nprod_i++;
}
}
if (SparseKinetics_enableIntegralReactions)
sparseKinetics_isIntegralReaction[i] = isIntegral_i;
}
if (comm->me == 0 and Verbosity > 1){
for (int i = 1; i < nu_bin.size(); ++i)
if (nu_bin[i] > 0)
printf("nu_bin[%d] = %d\n", i, nu_bin[i]);
for (int i = 0; i < nreactions; ++i){
std::string pstr, rstr;
for (int kk = 0; kk < sparseKinetics_maxReactants; kk++){
const int k = sparseKinetics_nuk[i][kk];
if (k != SparseKinetics_invalidIndex){
if (rstr.length() > 0)
rstr += " + ";
char digit[6];
if (SparseKinetics_enableIntegralReactions and sparseKinetics_isIntegralReaction[i])
sprintf(digit,"%d ", sparseKinetics_inu[i][kk]);
else
sprintf(digit,"%4.1f ", sparseKinetics_nu[i][kk]);
rstr += digit;
rstr += atom->dname[k];
}
}
for (int kk = sparseKinetics_maxReactants; kk < sparseKinetics_maxSpecies; kk++){
const int k = sparseKinetics_nuk[i][kk];
if (k != SparseKinetics_invalidIndex){
if (pstr.length() > 0)
pstr += " + ";
char digit[6];
if (SparseKinetics_enableIntegralReactions and sparseKinetics_isIntegralReaction[i])
sprintf(digit,"%d ", sparseKinetics_inu[i][kk]);
else
sprintf(digit,"%4.1f ", sparseKinetics_nu[i][kk]);
pstr += digit;
pstr += atom->dname[k];
}
}
if (comm->me == 0 and Verbosity > 1)
printf("rx%3d: %s %s %s\n", i, rstr.c_str(), /*reversible[i]*/ (false) ? "<=>" : "=", pstr.c_str());
}
// end for nreactions
}
// end if Verbose
}
/* ---------------------------------------------------------------------- */
int FixRX::setmask()
{
int mask = 0;
mask |= PRE_FORCE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixRX::init()
{
pairDPDE = (PairDPDfdtEnergy *) force->pair_match("dpd/fdt/energy",1);
if (pairDPDE == NULL)
pairDPDE = (PairDPDfdtEnergy *) force->pair_match("dpd/fdt/energy/kk",1);
if (pairDPDE == NULL)
error->all(FLERR,"Must use pair_style dpd/fdt/energy with fix rx");
bool eos_flag = false;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"eos/table/rx") == 0) eos_flag = true;
if(!eos_flag) error->all(FLERR,"fix rx requires fix eos/table/rx to be specified");
+
+ // need a half neighbor list
+ // built whenever re-neighboring occurs
+
+ int irequest = neighbor->request(this,instance_me);
+ neighbor->requests[irequest]->pair = 0;
+ neighbor->requests[irequest]->fix = 1;
+}
+
+/* ---------------------------------------------------------------------- */
+
+void FixRX::init_list(int, class NeighList* ptr)
+{
+ this->list = ptr;
}
/* ---------------------------------------------------------------------- */
void FixRX::setup_pre_force(int vflag)
{
int nlocal = atom->nlocal;
int nghost = atom->nghost;
int *mask = atom->mask;
int newton_pair = force->newton_pair;
double tmp;
int ii;
if(restartFlag){
restartFlag = 0;
} else {
if(localTempFlag){
int count = nlocal + (newton_pair ? nghost : 0);
dpdThetaLocal = new double[count];
memset(dpdThetaLocal, 0, sizeof(double)*count);
computeLocalTemperature();
}
for (int id = 0; id < nlocal; id++)
for (int ispecies=0; ispecies<nspecies; ispecies++){
tmp = atom->dvector[ispecies][id];
atom->dvector[ispecies+nspecies][id] = tmp;
}
for (int i = 0; i < nlocal; i++)
if (mask[i] & groupbit){
// Set the reaction rate constants to zero: no reactions occur at step 0
for(int irxn=0;irxn<nreactions;irxn++)
kR[irxn] = 0.0;
if (odeIntegrationFlag == ODE_LAMMPS_RK4)
rk4(i,NULL);
else if (odeIntegrationFlag == ODE_LAMMPS_RKF45)
rkf45(i,NULL);
}
// Communicate the updated species concentrations to all nodes
comm->forward_comm_fix(this);
if(localTempFlag) delete [] dpdThetaLocal;
}
}
/* ---------------------------------------------------------------------- */
void FixRX::pre_force(int vflag)
{
int nlocal = atom->nlocal;
int nghost = atom->nghost;
int *mask = atom->mask;
double *dpdTheta = atom->dpdTheta;
int newton_pair = force->newton_pair;
double theta;
if(localTempFlag){
int count = nlocal + (newton_pair ? nghost : 0);
dpdThetaLocal = new double[count];
memset(dpdThetaLocal, 0, sizeof(double)*count);
computeLocalTemperature();
}
TimerType timer_localTemperature = getTimeStamp();
// Zero the counters for the ODE solvers.
this->nSteps = this->nIters = this->nFuncs = this->nFails = 0;
if (odeIntegrationFlag == ODE_LAMMPS_RKF45 && diagnosticFrequency == 1)
{
memory->create( diagnosticCounterPerODE[StepSum], nlocal, "FixRX::diagnosticCounterPerODE");
memory->create( diagnosticCounterPerODE[FuncSum], nlocal, "FixRX::diagnosticCounterPerODE");
}
double *rwork = new double[8*nspecies + nreactions];
for (int i = 0; i < nlocal; i++)
if (mask[i] & groupbit){
if (localTempFlag)
theta = dpdThetaLocal[i];
else
theta = dpdTheta[i];
//Compute the reaction rate constants
for (int irxn = 0; irxn < nreactions; irxn++)
kR[irxn] = Arr[irxn]*pow(theta,nArr[irxn])*exp(-Ea[irxn]/force->boltz/theta);
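// i.e. the modified Arrhenius form k = A * theta^n * exp(-Ea/(kB*theta)),
// evaluated with the (optionally locally averaged) DPD temperature theta.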
if (odeIntegrationFlag == ODE_LAMMPS_RK4)
rk4(i,rwork);
else if (odeIntegrationFlag == ODE_LAMMPS_RKF45)
rkf45(i,rwork);
}
TimerType timer_ODE = getTimeStamp();
delete [] rwork;
// Communicate the updated species concentrations to all nodes
comm->forward_comm_fix(this);
if(localTempFlag) delete [] dpdThetaLocal;
double time_ODE = getElapsedTime(timer_localTemperature, timer_ODE);
// Warn the user if a failure was detected in the ODE solver.
if (nFails > 0){
char sbuf[128];
sprintf(sbuf,"in FixRX::pre_force, ODE solver failed for %d atoms.", nFails);
error->warning(FLERR, sbuf);
}
// Compute and report ODE diagnostics, if requested.
if (odeIntegrationFlag == ODE_LAMMPS_RKF45 && diagnosticFrequency != 0){
// Update the counters.
diagnosticCounter[StepSum] += nSteps;
diagnosticCounter[FuncSum] += nFuncs;
diagnosticCounter[TimeSum] += time_ODE;
diagnosticCounter[AtomSum] += nlocal;
diagnosticCounter[numDiagnosticCounters-1] ++;
if ( (diagnosticFrequency > 0 &&
((update->ntimestep - update->firststep) % diagnosticFrequency) == 0) ||
(diagnosticFrequency < 0 && update->ntimestep == update->laststep) )
this->odeDiagnostics();
for (int i = 0; i < numDiagnosticCounters; ++i)
if (diagnosticCounterPerODE[i])
memory->destroy( diagnosticCounterPerODE[i] );
}
}
/* ---------------------------------------------------------------------- */
void FixRX::read_file(char *file)
{
nreactions = 0;
// open file on proc 0
FILE *fp;
fp = NULL;
if (comm->me == 0) {
fp = force->open_potential(file);
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open rx file %s",file);
error->one(FLERR,str);
}
}
// Count the number of reactions from kinetics file
int n,nwords,ispecies;
char line[MAXLINE],*ptr;
int eof = 0;
while (1) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fp);
if (ptr == NULL) {
eof = 1;
fclose(fp);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
if (nwords == 0) continue;
nreactions++;
}
// open file on proc 0
if (comm->me == 0) fp = force->open_potential(file);
// read each reaction from kinetics file
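// Illustrative only (format inferred from the tokenizing below): a reaction
// line looks roughly like
//   1.0 speciesA + 1.0 speciesB = 2.0 speciesC A n Ea
// i.e. coefficient/species pairs joined by '+', an '=' separating reactants
// from products, then the Arrhenius prefactor, temperature exponent, and
// activation energy.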
eof=0;
char * word;
double tmpStoich;
double sign;
Arr = new double[nreactions];
nArr = new double[nreactions];
Ea = new double[nreactions];
tempExp = new double[nreactions];
stoich = new double*[nreactions];
stoichReactants = new double*[nreactions];
stoichProducts = new double*[nreactions];
for (int ii=0;ii<nreactions;ii++){
stoich[ii] = new double[nspecies];
stoichReactants[ii] = new double[nspecies];
stoichProducts[ii] = new double[nspecies];
}
kR = new double[nreactions];
for (int ii=0;ii<nreactions;ii++){
for (int jj=0;jj<nspecies;jj++){
stoich[ii][jj] = 0.0;
stoichReactants[ii][jj] = 0.0;
stoichProducts[ii][jj] = 0.0;
}
}
nreactions=0;
sign = -1.0;
while (1) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fp);
if (ptr == NULL) {
eof = 1;
fclose(fp);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
if (nwords == 0) continue;
// words = ptrs to all words in line
nwords = 0;
word = strtok(line," \t\n\r\f");
while (word != NULL){
tmpStoich = atof(word);
word = strtok(NULL, " \t\n\r\f");
for (ispecies = 0; ispecies < nspecies; ispecies++){
if (strcmp(word,&atom->dname[ispecies][0]) == 0){
stoich[nreactions][ispecies] += sign*tmpStoich;
if(sign<0.0)
stoichReactants[nreactions][ispecies] += tmpStoich;
else stoichProducts[nreactions][ispecies] += tmpStoich;
break;
}
}
if(ispecies==nspecies){
if (comm->me) {
fprintf(stderr,"%s mol fraction is not found in data file\n",word);
fprintf(stderr,"nspecies=%d ispecies=%d\n",nspecies,ispecies);
}
error->all(FLERR,"Illegal fix rx command");
}
word = strtok(NULL, " \t\n\r\f");
if(word==NULL) error->all(FLERR,"Missing parameters in reaction kinetic equation");
if(strcmp(word,"=") == 0) sign = 1.0;
if(strcmp(word,"+") != 0 && strcmp(word,"=") != 0){
if(word==NULL) error->all(FLERR,"Missing parameters in reaction kinetic equation");
Arr[nreactions] = atof(word);
word = strtok(NULL, " \t\n\r\f");
if(word==NULL) error->all(FLERR,"Missing parameters in reaction kinetic equation");
nArr[nreactions] = atof(word);
word = strtok(NULL, " \t\n\r\f");
if(word==NULL) error->all(FLERR,"Missing parameters in reaction kinetic equation");
Ea[nreactions] = atof(word);
sign = -1.0;
break;
}
word = strtok(NULL, " \t\n\r\f");
}
nreactions++;
}
}
/* ---------------------------------------------------------------------- */
void FixRX::setupParams()
{
int i,j,n;
// set mol2param for all combinations
// must be a single exact match to lines read from file
memory->destroy(mol2param);
memory->create(mol2param,nspecies,"pair:mol2param");
for (i = 0; i < nspecies; i++) {
n = -1;
for (j = 0; j < nreactions; j++) {
if (i == params[j].ispecies) {
if (n >= 0) error->all(FLERR,"Potential file has duplicate entry");
n = j;
}
}
mol2param[i] = n;
}
}
/* ---------------------------------------------------------------------- */
void FixRX::rk4(int id, double *rwork)
{
double *k1 = NULL;
if (rwork == NULL)
k1 = new double[6*nspecies + nreactions];
else
k1 = rwork;
double *k2 = k1 + nspecies;
double *k3 = k2 + nspecies;
double *k4 = k3 + nspecies;
double *y = k4 + nspecies;
double *yp = y + nspecies;
double *dummyArray = yp + nspecies; // Passed to the rhs function.
const int numSteps = minSteps;
const double h = update->dt / double(numSteps);
// Update ConcOld
for (int ispecies = 0; ispecies < nspecies; ispecies++)
{
const double tmp = atom->dvector[ispecies][id];
atom->dvector[ispecies+nspecies][id] = tmp;
y[ispecies] = tmp;
}
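// Classical 4th-order Runge-Kutta: each step below advances
//   y <- y + h*(k1 + 2*k2 + 2*k3 + k4)/6
// with k2, k3 evaluated at the half step and k4 at the full step.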
// Run the requested steps with h.
for (int step = 0; step < numSteps; step++)
{
// k1
rhs(0.0,y,k1,dummyArray);
// k2
for (int ispecies = 0; ispecies < nspecies; ispecies++)
yp[ispecies] = y[ispecies] + 0.5*h*k1[ispecies];
rhs(0.0,yp,k2,dummyArray);
// k3
for (int ispecies = 0; ispecies < nspecies; ispecies++)
yp[ispecies] = y[ispecies] + 0.5*h*k2[ispecies];
rhs(0.0,yp,k3,dummyArray);
// k4
for (int ispecies = 0; ispecies < nspecies; ispecies++)
yp[ispecies] = y[ispecies] + h*k3[ispecies];
rhs(0.0,yp,k4,dummyArray);
for (int ispecies = 0; ispecies < nspecies; ispecies++)
y[ispecies] += h*(k1[ispecies]/6.0 + k2[ispecies]/3.0 + k3[ispecies]/3.0 + k4[ispecies]/6.0);
} // end for (int step...
// Store the solution back in atom->dvector.
for (int ispecies = 0; ispecies < nspecies; ispecies++){
if(y[ispecies] < -MY_EPSILON)
error->one(FLERR,"Computed concentration in RK4 solver is < -10*DBL_EPSILON");
else if(y[ispecies] < MY_EPSILON)
y[ispecies] = 0.0;
atom->dvector[ispecies][id] = y[ispecies];
}
if (rwork == NULL)
delete [] k1;
}
/* ---------------------------------------------------------------------- */
// f1 = dt*f(t,x)
// f2 = dt*f(t+ c20*dt,x + c21*f1)
// f3 = dt*f(t+ c30*dt,x + c31*f1 + c32*f2)
// f4 = dt*f(t+ c40*dt,x + c41*f1 + c42*f2 + c43*f3)
// f5 = dt*f(t+dt,x + c51*f1 + c52*f2 + c53*f3 + c54*f4)
// f6 = dt*f(t+ c60*dt,x + c61*f1 + c62*f2 + c63*f3 + c64*f4 + c65*f5)
//
// fifth-order runge-kutta integration
// x5 = x + b1*f1 + b3*f3 + b4*f4 + b5*f5 + b6*f6
// fourth-order runge-kutta integration
// x = x + a1*f1 + a3*f3 + a4*f4 + a5*f5
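// The per-component truncation error returned through rwk[] below is the
// difference of the embedded solutions, err_k = |x5_k - x4_k|; the accepted
// update is the 4th-order solution x4.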
void FixRX::rkf45_step (const int neq, const double h, double y[], double y_out[], double rwk[], void* v_param)
{
const double c21=0.25;
const double c31=0.09375;
const double c32=0.28125;
const double c41=0.87938097405553;
const double c42=-3.2771961766045;
const double c43=3.3208921256258;
const double c51=2.0324074074074;
const double c52=-8.0;
const double c53=7.1734892787524;
const double c54=-0.20589668615984;
const double c61=-0.2962962962963;
const double c62=2.0;
const double c63=-1.3816764132554;
const double c64=0.45297270955166;
const double c65=-0.275;
const double a1=0.11574074074074;
const double a3=0.54892787524366;
const double a4=0.5353313840156;
const double a5=-0.2;
const double b1=0.11851851851852;
const double b3=0.51898635477583;
const double b4=0.50613149034201;
const double b5=-0.18;
const double b6=0.036363636363636;
// local dependent variables (5 total)
double* f1 = &rwk[ 0];
double* f2 = &rwk[ neq];
double* f3 = &rwk[2*neq];
double* f4 = &rwk[3*neq];
double* f5 = &rwk[4*neq];
double* f6 = &rwk[5*neq];
// scratch for the intermediate solution.
//double* ytmp = &rwk[6*neq];
double* ytmp = y_out;
// 1)
rhs (0.0, y, f1, v_param);
for (int k = 0; k < neq; k++){
f1[k] *= h;
ytmp[k] = y[k] + c21 * f1[k];
}
// 2)
rhs(0.0, ytmp, f2, v_param);
for (int k = 0; k < neq; k++){
f2[k] *= h;
ytmp[k] = y[k] + c31 * f1[k] + c32 * f2[k];
}
// 3)
rhs(0.0, ytmp, f3, v_param);
for (int k = 0; k < neq; k++) {
f3[k] *= h;
ytmp[k] = y[k] + c41 * f1[k] + c42 * f2[k] + c43 * f3[k];
}
// 4)
rhs(0.0, ytmp, f4, v_param);
for (int k = 0; k < neq; k++) {
f4[k] *= h;
ytmp[k] = y[k] + c51 * f1[k] + c52 * f2[k] + c53 * f3[k] + c54 * f4[k];
}
// 5)
rhs(0.0, ytmp, f5, v_param);
for (int k = 0; k < neq; k++) {
f5[k] *= h;
ytmp[k] = y[k] + c61*f1[k] + c62*f2[k] + c63*f3[k] + c64*f4[k] + c65*f5[k];
}
// 6)
rhs(0.0, ytmp, f6, v_param);
for (int k = 0; k < neq; k++)
{
//const double f6 = h * ydot[k];
f6[k] *= h;
// 5th-order solution.
const double r5 = b1*f1[k] + b3*f3[k] + b4*f4[k] + b5*f5[k] + b6*f6[k];
// 4th-order solution.
const double r4 = a1*f1[k] + a3*f3[k] + a4*f4[k] + a5*f5[k];
// Truncation error: difference between 4th and 5th-order solutions.
rwk[k] = fabs(r5 - r4);
// Update solution.
//y_out[k] = y[k] + r5; // Local extrapolation
y_out[k] = y[k] + r4;
}
return;
}
int FixRX::rkf45_h0 (const int neq, const double t, const double t_stop,
const double hmin, const double hmax,
double& h0, double y[], double rwk[], void* v_params)
{
// Set lower and upper bounds on h0, and take geometric mean as first trial value.
// Exit with this value if the bounds cross each other.
// Adjust upper bound based on ydot ...
double hg = sqrt(hmin*hmax);
//if (hmax < hmin)
//{
// h0 = hg;
// return;
//}
// Start iteration to find solution to ... {WRMS norm of (h0^2 y'' / 2)} = 1
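// i.e. iterate on hg until ||(hg^2/2) * y''||_WRMS ~ 1, where each component
// of y'' is weighted by 1/(relTol*|y_k| + absTol) as in the loop below.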
double *ydot = rwk;
double *y1 = ydot + neq;
double *ydot1 = y1 + neq;
const int max_iters = 10;
bool hnew_is_ok = false;
double hnew = hg;
int iter = 0;
// compute ydot at t=t0
rhs (t, y, ydot, v_params);
while(1)
{
// Estimate y'' with finite-difference ...
for (int k = 0; k < neq; k++)
y1[k] = y[k] + hg * ydot[k];
// compute y' at t1
rhs (t + hg, y1, ydot1, v_params);
// Compute WRMS norm of y''
double yddnrm = 0.0;
for (int k = 0; k < neq; k++){
double ydd = (ydot1[k] - ydot[k]) / hg;
double wterr = ydd / (relTol * fabs( y[k] ) + absTol);
yddnrm += wterr * wterr;
}
yddnrm = sqrt( yddnrm / double(neq) );
//std::cout << "iter " << _iter << " hg " << hg << " y'' " << yddnrm << std::endl;
//std::cout << "ydot " << ydot[neq-1] << std::endl;
// should we accept this?
if (hnew_is_ok || iter == max_iters){
hnew = hg;
if (iter == max_iters)
fprintf(stderr, "ERROR_HIN_MAX_ITERS\n");
break;
}
// Get the new value of h ...
hnew = (yddnrm*hmax*hmax > 2.0) ? sqrt(2.0 / yddnrm) : sqrt(hg * hmax);
// test the stopping conditions.
double hrat = hnew / hg;
// Accept this value ... the bias factor should bring it within range.
if ( (hrat > 0.5) && (hrat < 2.0) )
hnew_is_ok = true;
// If y'' is still bad after a few iterations, just accept h and give up.
if ( (iter > 1) && hrat > 2.0 ) {
hnew = hg;
hnew_is_ok = true;
}
//printf("iter=%d, yddnrw=%e, hnew=%e, hmin=%e, hmax=%e\n", iter, yddnrm, hnew, hmin, hmax);
hg = hnew;
iter ++;
}
// bound and bias estimate
h0 = hnew * 0.5;
h0 = fmax(h0, hmin);
h0 = fmin(h0, hmax);
//printf("h0=%e, hmin=%e, hmax=%e\n", h0, hmin, hmax);
return (iter + 1);
}
void FixRX::odeDiagnostics(void)
{
TimerType timer_start = getTimeStamp();
// Compute:
// 1) Average # of ODE integrator steps and RHS evaluations per atom globally.
// 2) RMS # of ...
// 3) Average # of ODE steps and RHS evaluations per MPI task.
// 4) RMS # of ODE steps and RHS evaluations per MPI task.
// 5) MAX # of ODE steps and RHS evaluations per MPI task.
//
// ... 1,2 are for ODE control diagnostics.
// ... 3-5 are for load balancing diagnostics.
//
// To do this, we'll need to
// a) Allreduce (sum) the sum of nSteps / nFuncs. Dividing by atom->natoms
// gives the avg # of steps/funcs per atom globally.
// b) Reduce (sum) to root the sum of squares of the differences.
// i) Sum_i (steps_i - avg_steps_global)^2
// ii) Sum_i (funcs_i - avg_funcs_global)^2
// iii) (avg_steps_local - avg_steps_global)^2
// iv) (avg_funcs_local - avg_funcs_global)^2
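// In other words, the per-proc RMS reported below is
//   rms_i = sqrt( Sum_p (vals_{i,p} - avg_per_proc_i)^2 / nprocs ),
// built from the reduced sums of squared differences.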
const int numCounters = numDiagnosticCounters-1;
// # of time-steps for averaging.
const int nTimes = this->diagnosticCounter[numDiagnosticCounters-1];
// # of ODE's per time-step (on average).
//const int nODEs = this->diagnosticCounter[AtomSum] / nTimes;
// Sum up the sums from each task.
double sums[numCounters];
double my_vals[numCounters];
double max_per_proc[numCounters];
double min_per_proc[numCounters];
// Compute counters per dpd time-step.
for (int i = 0; i < numCounters; ++i){
my_vals[i] = this->diagnosticCounter[i] / nTimes;
//printf("my sum[%d] = %f %d\n", i, my_vals[i], comm->me);
}
MPI_Allreduce (my_vals, sums, numCounters, MPI_DOUBLE, MPI_SUM, world);
MPI_Reduce (my_vals, max_per_proc, numCounters, MPI_DOUBLE, MPI_MAX, 0, world);
MPI_Reduce (my_vals, min_per_proc, numCounters, MPI_DOUBLE, MPI_MIN, 0, world);
const double nODEs = sums[numCounters-1];
double avg_per_atom[numCounters], avg_per_proc[numCounters];
// Averages per-ODE and per-proc per time-step.
for (int i = 0; i < numCounters; ++i){
avg_per_atom[i] = sums[i] / nODEs;
avg_per_proc[i] = sums[i] / comm->nprocs;
}
// Sum up the differences from each task.
double sum_sq[2*numCounters];
double my_sum_sq[2*numCounters];
for (int i = 0; i < numCounters; ++i){
double diff_i = my_vals[i] - avg_per_proc[i];
my_sum_sq[i] = diff_i * diff_i;
}
double max_per_ODE[numCounters], min_per_ODE[numCounters];
// Process the per-ODE RMS of the # of steps/funcs
if (diagnosticFrequency == 1){
double my_max[numCounters], my_min[numCounters];
const int nlocal = atom->nlocal;
const int *mask = atom->mask;
for (int i = 0; i < numCounters; ++i){
my_sum_sq[i+numCounters] = 0;
my_max[i] = 0;
my_min[i] = DBL_MAX;
if (diagnosticCounterPerODE[i] != NULL){
for (int j = 0; j < nlocal; ++j)
if (mask[j] & groupbit){
double diff = double(diagnosticCounterPerODE[i][j]) - avg_per_atom[i];
my_sum_sq[i+numCounters] += diff*diff;
my_max[i] = std::max( my_max[i], (double)diagnosticCounterPerODE[i][j] );
my_min[i] = std::min( my_min[i], (double)diagnosticCounterPerODE[i][j] );
}
}
}
MPI_Reduce (my_sum_sq, sum_sq, 2*numCounters, MPI_DOUBLE, MPI_SUM, 0, world);
MPI_Reduce (my_max, max_per_ODE, numCounters, MPI_DOUBLE, MPI_MAX, 0, world);
MPI_Reduce (my_min, min_per_ODE, numCounters, MPI_DOUBLE, MPI_MIN, 0, world);
}
else
MPI_Reduce (my_sum_sq, sum_sq, numCounters, MPI_DOUBLE, MPI_SUM, 0, world);
TimerType timer_stop = getTimeStamp();
double time_local = getElapsedTime( timer_start, timer_stop );
if (comm->me == 0){
char smesg[128];
#define print_mesg(smesg) {\
if (screen) fprintf(screen,"%s\n", smesg); \
if (logfile) fprintf(logfile,"%s\n", smesg); }
sprintf(smesg, "FixRX::ODE Diagnostics: # of steps |# of rhs evals| run-time (sec)");
print_mesg(smesg);
sprintf(smesg, " AVG per ODE : %-12.5g | %-12.5g | %-12.5g", avg_per_atom[0], avg_per_atom[1], avg_per_atom[2]);
print_mesg(smesg);
// only valid for single time-step!
if (diagnosticFrequency == 1){
double rms_per_ODE[numCounters];
for (int i = 0; i < numCounters; ++i)
rms_per_ODE[i] = sqrt( sum_sq[i+numCounters] / nODEs );
sprintf(smesg, " RMS per ODE : %-12.5g | %-12.5g ", rms_per_ODE[0], rms_per_ODE[1]);
print_mesg(smesg);
sprintf(smesg, " MAX per ODE : %-12.5g | %-12.5g ", max_per_ODE[0], max_per_ODE[1]);
print_mesg(smesg);
sprintf(smesg, " MIN per ODE : %-12.5g | %-12.5g ", min_per_ODE[0], min_per_ODE[1]);
print_mesg(smesg);
}
sprintf(smesg, " AVG per Proc : %-12.5g | %-12.5g | %-12.5g", avg_per_proc[0], avg_per_proc[1], avg_per_proc[2]);
print_mesg(smesg);
if (comm->nprocs > 1){
double rms_per_proc[numCounters];
for (int i = 0; i < numCounters; ++i)
rms_per_proc[i] = sqrt( sum_sq[i] / comm->nprocs );
sprintf(smesg, " RMS per Proc : %-12.5g | %-12.5g | %-12.5g", rms_per_proc[0], rms_per_proc[1], rms_per_proc[2]);
print_mesg(smesg);
sprintf(smesg, " MAX per Proc : %-12.5g | %-12.5g | %-12.5g", max_per_proc[0], max_per_proc[1], max_per_proc[2]);
print_mesg(smesg);
sprintf(smesg, " MIN per Proc : %-12.5g | %-12.5g | %-12.5g", min_per_proc[0], min_per_proc[1], min_per_proc[2]);
print_mesg(smesg);
}
sprintf(smesg, " AVG'd over %d time-steps", nTimes);
print_mesg(smesg);
sprintf(smesg, " AVG'ing took %g sec", time_local);
print_mesg(smesg);
#undef print_mesg
}
// Reset the counters.
for (int i = 0; i < numDiagnosticCounters; ++i)
diagnosticCounter[i] = 0;
return;
}
void FixRX::rkf45(int id, double *rwork)
{
// Rounding coefficient.
const double uround = DBL_EPSILON;
// Adaptation limit (shrink or grow)
const double adaption_limit = 4.0;
//double *y = new double[8*nspecies + nreactions];
double *y = NULL;
if (rwork == NULL)
y = new double[8*nspecies + nreactions];
else
y = rwork;
double *rhstmp = y + 8*nspecies;
const int neq = nspecies;
// Update ConcOld and initialize the ODE solution vector y[].
for (int ispecies = 0; ispecies < nspecies; ispecies++){
const double tmp = atom->dvector[ispecies][id];
atom->dvector[ispecies+nspecies][id] = tmp;
y[ispecies] = tmp;
}
// Integration length.
const double t_stop = update->dt; // DPD time-step.
// Safety factor on the step-size adaptation: 2^(-1/4) here; the exact value is not critical (0.9 is common).
const double hsafe = 0.840896415;
// Time rounding factor.
const double tround = t_stop * uround;
// Counters for diagnostics.
int nst = 0; // # of steps (accepted)
int nit = 0; // # of iterations total
int nfe = 0; // # of RHS evaluations
// Min/Max step-size limits.
const double h_min = 100.0 * tround;
const double h_max = (minSteps > 0) ? t_stop / double(minSteps) : t_stop;
// Set the initial step-size; zero forces an internal estimate (a stable Euler step size).
double h = (minSteps > 0) ? t_stop / double(minSteps) : 0.0;
double t = 0.0;
if (h < h_min){
//fprintf(stderr,"hin not implemented yet\n");
//exit(-1);
nfe = rkf45_h0 (neq, t, t_stop, h_min, h_max, h, y, y + neq, rhstmp);
}
//printf("t= %e t_stop= %e h= %e\n", t, t_stop, h);
// Integrate until we reach the end time.
while (fabs(t - t_stop) > tround){
double *yout = y + neq;
double *eout = yout + neq;
// Take a trial step.
rkf45_step (neq, h, y, yout, eout, rhstmp);
// Estimate the solution error.
// ... weighted 2-norm of the error.
double err2 = 0.0;
for (int k = 0; k < neq; k++){
const double wterr = eout[k] / (relTol * fabs( y[k] ) + absTol);
err2 += wterr * wterr;
}
double err = fmax( uround, sqrt( err2 / double(nspecies) ));
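// i.e. err = sqrt( (1/neq) * Sum_k ( e_k / (relTol*|y_k| + absTol) )^2 );
// a step is accepted below when err <= 1, meaning the estimated local error
// stays within the mixed relative/absolute tolerance in the RMS sense.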
// Accept the solution?
if (err <= 1.0 || h <= h_min){
t += h;
nst++;
for (int k = 0; k < neq; k++)
y[k] = yout[k];
}
// Adjust h for the next step.
double hfac = hsafe * sqrt( sqrt( 1.0 / err ) );
// Limit the adaptation.
hfac = fmax( hfac, 1.0 / adaption_limit );
hfac = fmin( hfac, adaption_limit );
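// hfac = hsafe * err^(-1/4) is the usual power law for controlling the
// 4th-order solution of an embedded RKF45 pair; clamping it to
// [1/adaption_limit, adaption_limit] keeps any single step from shrinking
// or growing h by more than a factor of 4.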
// Apply the adaptation factor...
h *= hfac;
// Limit h.
h = fmin( h, h_max );
h = fmax( h, h_min );
// Stretch h if we're within 5% ... and we didn't just fail.
if (err <= 1.0 && (t + 1.05*h) > t_stop)
h = t_stop - t;
// And don't overshoot the end.
if (t + h > t_stop)
h = t_stop - t;
nit++;
nfe += 6;
if (maxIters && nit > maxIters){
//fprintf(stderr,"atom[%d] took too many iterations in rkf45 %d %e %e\n", id, nit, t, t_stop);
nFails ++;
break;
// We should set an error here so that the solution is not used!
}
} // end while
nSteps += nst;
nIters += nit;
nFuncs += nfe;
//if (diagnosticFrequency == 1 && diagnosticCounterPerODE[StepSum] != NULL)
if (diagnosticCounterPerODE[StepSum] != NULL){
diagnosticCounterPerODE[StepSum][id] = nst;
diagnosticCounterPerODE[FuncSum][id] = nfe;
}
//printf("id= %d nst= %d nit= %d\n", id, nst, nit);
// Store the solution back in atom->dvector.
for (int ispecies = 0; ispecies < nspecies; ispecies++){
if(y[ispecies] < -1.0e-10)
error->one(FLERR,"Computed concentration in RKF45 solver is < -1.0e-10");
else if(y[ispecies] < MY_EPSILON)
y[ispecies] = 0.0;
atom->dvector[ispecies][id] = y[ispecies];
}
if (rwork == NULL)
delete [] y;
}
/* ---------------------------------------------------------------------- */
int FixRX::rhs(double t, const double *y, double *dydt, void *params)
{
// Use the sparse format instead.
if (useSparseKinetics)
return this->rhs_sparse( t, y, dydt, params);
else
return this->rhs_dense ( t, y, dydt, params);
}
/* ---------------------------------------------------------------------- */
int FixRX::rhs_dense(double t, const double *y, double *dydt, void *params)
{
double rxnRateLawForward;
double *rxnRateLaw = (double *) params;
double VDPD = domain->xprd * domain->yprd * domain->zprd / atom->natoms;
double concentration;
int nspecies = atom->nspecies_dpd;
for(int ispecies=0; ispecies<nspecies; ispecies++)
dydt[ispecies] = 0.0;
// Construct the reaction rate laws
for(int jrxn=0; jrxn<nreactions; jrxn++){
rxnRateLawForward = kR[jrxn];
for(int ispecies=0; ispecies<nspecies; ispecies++){
concentration = y[ispecies]/VDPD;
rxnRateLawForward *= pow(concentration,stoichReactants[jrxn][ispecies]);
}
rxnRateLaw[jrxn] = rxnRateLawForward;
}
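// Each forward rate law is the mass-action expression
//   r_j = kR[j] * prod_i ( y_i / VDPD )^stoichReactants[j][i],
// with VDPD the average per-atom volume computed above; the species rates
// assembled below are then dy_i/dt = VDPD * Sum_j stoich[j][i] * r_j.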
// Construct the reaction rates for each species
for(int ispecies=0; ispecies<nspecies; ispecies++)
for(int jrxn=0; jrxn<nreactions; jrxn++)
dydt[ispecies] += stoich[jrxn][ispecies]*VDPD*rxnRateLaw[jrxn];
return 0;
}
/* ---------------------------------------------------------------------- */
int FixRX::rhs_sparse(double t, const double *y, double *dydt, void *v_params) const
{
double *_rxnRateLaw = (double *) v_params;
const double VDPD = domain->xprd * domain->yprd * domain->zprd / atom->natoms;
#define kFor (this->kR)
#define kRev (NULL)
#define rxnRateLaw (_rxnRateLaw)
#define conc (dydt)
#define maxReactants (this->sparseKinetics_maxReactants)
#define maxSpecies (this->sparseKinetics_maxSpecies)
#define nuk (this->sparseKinetics_nuk)
#define nu (this->sparseKinetics_nu)
#define inu (this->sparseKinetics_inu)
#define isIntegral(idx) (SparseKinetics_enableIntegralReactions \
&& this->sparseKinetics_isIntegralReaction[idx])
for (int k = 0; k < nspecies; ++k)
conc[k] = y[k] / VDPD;
// Construct the reaction rate laws
for (int i = 0; i < nreactions; ++i)
{
double rxnRateLawForward;
if (isIntegral(i)){
rxnRateLawForward = kFor[i] * powint( conc[ nuk[i][0] ], inu[i][0]);
for (int kk = 1; kk < maxReactants; ++kk){
const int k = nuk[i][kk];
if (k == SparseKinetics_invalidIndex) break;
//if (k != SparseKinetics_invalidIndex)
rxnRateLawForward *= powint( conc[k], inu[i][kk] );
}
} else {
rxnRateLawForward = kFor[i] * pow( conc[ nuk[i][0] ], nu[i][0]);
for (int kk = 1; kk < maxReactants; ++kk){
const int k = nuk[i][kk];
if (k == SparseKinetics_invalidIndex) break;
//if (k != SparseKinetics_invalidIndex)
rxnRateLawForward *= pow( conc[k], nu[i][kk] );
}
}
rxnRateLaw[i] = rxnRateLawForward;
}
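// In the sparse form only the species that actually participate in reaction
// i are visited, through the column-index array nuk[i][*] (terminated by
// SparseKinetics_invalidIndex); powint() is used when every stoichiometric
// coefficient of the reaction is integral, pow() otherwise.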
// Construct the reaction rates for each species from the
// Stoichiometric matrix and ROP vector.
for (int k = 0; k < nspecies; ++k)
dydt[k] = 0.0;
for (int i = 0; i < nreactions; ++i){
// Reactants ...
dydt[ nuk[i][0] ] -= nu[i][0] * rxnRateLaw[i];
for (int kk = 1; kk < maxReactants; ++kk){
const int k = nuk[i][kk];
if (k == SparseKinetics_invalidIndex) break;
//if (k != SparseKinetics_invalidIndex)
dydt[k] -= nu[i][kk] * rxnRateLaw[i];
}
// Products ...
dydt[ nuk[i][maxReactants] ] += nu[i][maxReactants] * rxnRateLaw[i];
for (int kk = maxReactants+1; kk < maxSpecies; ++kk){
const int k = nuk[i][kk];
if (k == SparseKinetics_invalidIndex) break;
//if (k != SparseKinetics_invalidIndex)
dydt[k] += nu[i][kk] * rxnRateLaw[i];
}
}
// Add in the volume factor to convert to the proper units.
for (int k = 0; k < nspecies; ++k)
dydt[k] *= VDPD;
#undef kFor
#undef kRev
#undef rxnRateLaw
#undef conc
#undef maxReactants
#undef maxSpecies
#undef nuk
#undef nu
#undef inu
#undef isIntegral
//#undef invalidIndex
return 0;
}
/* ---------------------------------------------------------------------- */
void FixRX::computeLocalTemperature()
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double xtmp,ytmp,ztmp,delx,dely,delz;
double rsq;
int *ilist,*jlist,*numneigh,**firstneigh;
double **x = atom->x;
int *type = atom->type;
int nlocal = atom->nlocal;
int nghost = atom->nghost;
int newton_pair = force->newton_pair;
// local temperature variables
double wij=0.0;
double *dpdTheta = atom->dpdTheta;
// Initialize the local temperature weight array
int sumWeightsCt = nlocal + (newton_pair ? nghost : 0);
sumWeights = new double[sumWeightsCt];
memset(sumWeights, 0, sizeof(double)*sumWeightsCt);
- inum = pairDPDE->list->inum;
- ilist = pairDPDE->list->ilist;
- numneigh = pairDPDE->list->numneigh;
- firstneigh = pairDPDE->list->firstneigh;
+ inum = list->inum;
+ ilist = list->ilist;
+ numneigh = list->numneigh;
+ firstneigh = list->firstneigh;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
jtype = type[j];
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < pairDPDE->cutsq[itype][jtype]) {
double rcut = sqrt(pairDPDE->cutsq[itype][jtype]);
double rij = sqrt(rsq);
double ratio = rij/rcut;
// Lucy's Weight Function
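// w(r) = (1 + 3 r/r_c) (1 - r/r_c)^3, which decays smoothly to zero at the
// cutoff r_c (here ratio = r_ij / r_cut).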
if(wtFlag==LUCY){
wij = (1.0+3.0*ratio) * (1.0-ratio)*(1.0-ratio)*(1.0-ratio);
dpdThetaLocal[i] += wij/dpdTheta[j];
if (newton_pair || j < nlocal)
dpdThetaLocal[j] += wij/dpdTheta[i];
}
sumWeights[i] += wij;
if (newton_pair || j < nlocal)
sumWeights[j] += wij;
}
}
}
if (newton_pair) comm->reverse_comm_fix(this);
// self-interaction for local temperature
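// dpdThetaLocal[i] has accumulated Sum_j w_ij / theta_j; after adding the
// self term and dividing by Sum_j w_ij it holds a weighted average of
// 1/theta, so the HARMONIC option below inverts it to give
//   theta_local,i = Sum_j w_ij / Sum_j ( w_ij / theta_j ).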
for (i = 0; i < nlocal; i++){
// Lucy Weight Function
if(wtFlag==LUCY){
wij = 1.0;
dpdThetaLocal[i] += wij / dpdTheta[i];
}
sumWeights[i] += wij;
// Normalized local temperature
dpdThetaLocal[i] = dpdThetaLocal[i] / sumWeights[i];
if(localTempFlag == HARMONIC)
dpdThetaLocal[i] = 1.0 / dpdThetaLocal[i];
}
delete [] sumWeights;
}
/* ---------------------------------------------------------------------- */
int FixRX::pack_forward_comm(int n, int *list, double *buf, int pbc_flag, int *pbc)
{
int ii,jj,m;
double tmp;
m = 0;
for (ii = 0; ii < n; ii++) {
jj = list[ii];
for(int ispecies=0;ispecies<nspecies;ispecies++){
tmp = atom->dvector[ispecies][jj];
buf[m++] = tmp;
tmp = atom->dvector[ispecies+nspecies][jj];
buf[m++] = tmp;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixRX::unpack_forward_comm(int n, int first, double *buf)
{
int ii,m,last;
double tmp;
m = 0;
last = first + n ;
for (ii = first; ii < last; ii++){
for(int ispecies=0;ispecies<nspecies;ispecies++){
tmp = buf[m++];
atom->dvector[ispecies][ii] = tmp;
tmp = buf[m++];
atom->dvector[ispecies+nspecies][ii] = tmp;
}
}
}
/* ---------------------------------------------------------------------- */
int FixRX::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = dpdThetaLocal[i];
buf[m++] = sumWeights[i];
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixRX::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
dpdThetaLocal[j] += buf[m++];
sumWeights[j] += buf[m++];
}
}
diff --git a/src/USER-DPD/fix_rx.h b/src/USER-DPD/fix_rx.h
index ca3938f06..c35c9afab 100644
--- a/src/USER-DPD/fix_rx.h
+++ b/src/USER-DPD/fix_rx.h
@@ -1,192 +1,195 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(rx,FixRX)
#else
#ifndef LMP_FIX_RX_H
#define LMP_FIX_RX_H
#include "fix.h"
typedef int (*fnptr)(double, const double *, double *, void *);
namespace LAMMPS_NS {
enum { ODE_LAMMPS_RK4, ODE_LAMMPS_RKF45 };
class FixRX : public Fix {
public:
FixRX(class LAMMPS *, int, char **);
~FixRX();
int setmask();
void post_constructor();
virtual void init();
+ void init_list(int, class NeighList *);
virtual void setup_pre_force(int);
virtual void pre_force(int);
protected:
int pack_reverse_comm(int, int, double *);
void unpack_reverse_comm(int, int *, double *);
int pack_forward_comm(int , int *, double *, int, int *);
void unpack_forward_comm(int , int , double *);
+ class NeighList *list;
+
double tmpArg;
int *mol2param; // mapping from molecule to parameters
int nreactions; // # of stored parameter sets
int maxparam; // max # of parameter sets
struct Param {
double cp;
int ispecies;
char *name; // names of unique molecules and interaction type
};
Param *params; // parameter set for an I-J-K interaction
int nspecies;
void read_file(char *);
void setupParams();
double *Arr, *nArr, *Ea, *tempExp;
double **stoich, **stoichReactants, **stoichProducts;
double *kR;
//!< Classic Runge-Kutta 4th-order stepper.
void rk4(int,double*);
//!< Runge-Kutta-Fehlberg ODE Solver.
void rkf45(int,double*);
//!< Runge-Kutta-Fehlberg ODE stepper function.
void rkf45_step (const int neq, const double h, double y[], double y_out[],
double rwk[], void* v_param);
//!< Initial step size estimation for the Runge-Kutta-Fehlberg ODE solver.
int rkf45_h0 (const int neq, const double t, const double t_stop,
const double hmin, const double hmax,
double& h0, double y[], double rwk[], void* v_params);
class PairDPDfdtEnergy *pairDPDE;
double *dpdThetaLocal;
double *sumWeights;
void computeLocalTemperature();
int localTempFlag,wtFlag,odeIntegrationFlag;
double sigFactor;
int rhs(double, const double *, double *, void *);
int rhs_dense (double, const double *, double *, void *);
// Sparse stoichiometric matrix storage format and methods.
bool useSparseKinetics;
//SparseKinetics sparseKinetics;
void initSparse(void);
int rhs_sparse(double, const double *, double *, void *) const;
int sparseKinetics_maxReactants; //!< Max # of reactant species in any reaction
int sparseKinetics_maxProducts; //!< Max # of product species in any reaction
int sparseKinetics_maxSpecies; //!< Max # of species (maxReactants + maxProducts) in any reaction
//! Objects to hold the stoichiometric coefficients in a sparse matrix
//! format. Enables a sparse formulation for the reaction rates:
//! \f${\omega}_i = K^{f}_i \prod_{j=1}^{NS_i} [x_j]^{\nu^{'}_{ij}} -
//!    K^{r}_i \prod_{j=1}^{NS_i} [x_j]^{\nu^{''}_{ij}}\f$.
double **sparseKinetics_nu; //!< Stoichiometric matrix with floating-point values.
int **sparseKinetics_nuk; //!< Index (base-0) of the species, i.e. the column indices of the sparse matrix.
int **sparseKinetics_inu; //!< Stoichiometric matrix with integral values.
bool *sparseKinetics_isIntegralReaction; //!< Flag indicating if a reaction has integer stoichiometric values.
// ODE Parameters
int minSteps; //!< Minimum # of steps for the ODE solver(s).
int maxIters; //!< Maximum # of iterations for the ODE solver(s).
double relTol, absTol; //!< Relative and absolute tolerances for the ODE solver(s).
// ODE Diagnostics
int nSteps; //!< # of accepted steps taken over all atoms.
int nIters; //!< # of attempted steps for all atoms.
int nFuncs; //!< # of RHS evaluations for all atoms.
int nFails; //!< # of ODE systems that failed (for some reason).
int diagnosticFrequency; //!< Frequency (LMP steps) that run-time diagnostics will be printed to the log.
enum { numDiagnosticCounters = 5 };
enum { StepSum=0, FuncSum, TimeSum, AtomSum, CountSum };
double diagnosticCounter[ numDiagnosticCounters ];
int *diagnosticCounterPerODE[ numDiagnosticCounters ];
//!< ODE Solver diagnostics.
void odeDiagnostics(void);
private:
char *kineticsFile;
char *id_fix_species,*id_fix_species_old;
class FixPropertyAtom *fix_species,*fix_species_old;
int restartFlag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: fix rx cannot be combined with fix property/atom
Self-explanatory
E: Cannot open rx file %s
Self-explanatory
E: Exceeded the maximum number of species permitted in fix rx
Reduce the number of species in the fix rx reaction kinetics file
E: There are no rx species specified.
Self-explanatory
E: Must use pair_style dpd/fdt/energy with fix rx.
Self-explanatory
E: fix rx requires fix eos/table/rx to be specified.
Self-explanatory
W: in FixRX::pre_force, ODE solver failed for %d atoms.
Self-explanatory
E: Missing parameters in reaction kinetic equation.
Self-explanatory
E: Potential file has duplicate entry.
Self-explanatory
E: Computed concentration in RK4 (RKF45) solver is < -1.0e-10.
Self-explanatory: Adjust settings for the RK4 solver.
*/
diff --git a/src/USER-MISC/fix_flow_gauss.cpp b/src/USER-MISC/fix_flow_gauss.cpp
index ad4c78f87..681717001 100644
--- a/src/USER-MISC/fix_flow_gauss.cpp
+++ b/src/USER-MISC/fix_flow_gauss.cpp
@@ -1,246 +1,245 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Steven E. Strong and Joel D. Eaves
Joel.Eaves@Colorado.edu
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <string.h>
#include "fix_flow_gauss.h"
#include "atom.h"
#include "force.h"
#include "group.h"
#include "comm.h"
#include "update.h"
#include "domain.h"
#include "error.h"
#include "citeme.h"
#include "respa.h"
using namespace LAMMPS_NS;
using namespace FixConst;
static const char cite_flow_gauss[] =
"Gaussian dynamics package:\n\n"
- "@Article{strong_atomistic_2016,\n"
- "title = {Atomistic Hydrodynamics and the Dynamical Hydrophobic Effect in Porous Graphene},\n"
- "volume = {7},\n"
- "number = {10},\n"
- "issn = {1948-7185},\n"
- "url = {http://dx.doi.org/10.1021/acs.jpclett.6b00748},\n"
- "doi = {10.1021/acs.jpclett.6b00748},\n"
- "urldate = {2016-05-10},\n"
- "journal = {J. Phys. Chem. Lett.},\n"
+ "@Article{strong_water_2017,\n"
+ "title = {The Dynamics of Water in Porous Two-Dimensional Crystals},\n"
+ "volume = {121},\n"
+ "number = {1},\n"
+ "url = {http://dx.doi.org/10.1021/acs.jpcb.6b09387},\n"
+ "doi = {10.1021/acs.jpcb.6b09387},\n"
+ "urldate = {2016-12-07},\n"
+ "journal = {J. Phys. Chem. B},\n"
"author = {Strong, Steven E. and Eaves, Joel D.},\n"
- "year = {2016},\n"
- "pages = {1907--1912}\n"
+ "year = {2017},\n"
+ "pages = {189--207}\n"
"}\n\n";
FixFlowGauss::FixFlowGauss(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (lmp->citeme) lmp->citeme->add(cite_flow_gauss);
if (narg < 6) error->all(FLERR,"Not enough input arguments");
// a group which conserves momentum must also conserve particle number
dynamic_group_allow = 0;
scalar_flag = 1;
vector_flag = 1;
extscalar = 1;
extvector = 1;
size_vector = 3;
global_freq = 1; //data available every timestep
respa_level_support = 1;
//default respa level=outermost level is set in init()
dimension = domain->dimension;
//get inputs
int tmpFlag;
for (int ii=0; ii<3; ii++)
{
tmpFlag=force->inumeric(FLERR,arg[3+ii]);
if (tmpFlag==1 || tmpFlag==0)
flow[ii]=tmpFlag;
else
error->all(FLERR,"Constraint flags must be 1 or 0");
}
// by default, do not compute work done
workflag=0;
// process optional keyword
int iarg = 6;
while (iarg < narg) {
if ( strcmp(arg[iarg],"energy") == 0 ) {
if ( iarg+2 > narg ) error->all(FLERR,"Illegal energy keyword");
if ( strcmp(arg[iarg+1],"yes") == 0 ) workflag = 1;
else if ( strcmp(arg[iarg+1],"no") != 0 )
error->all(FLERR,"Illegal energy keyword");
iarg += 2;
} else error->all(FLERR,"Illegal fix flow/gauss command");
}
//error checking
if (dimension == 2) {
if (flow[2])
error->all(FLERR,"Can't constrain z flow in 2d simulation");
}
dt=update->dt;
pe_tot=0.0;
}
/* ---------------------------------------------------------------------- */
int FixFlowGauss::setmask()
{
int mask = 0;
mask |= POST_FORCE;
mask |= THERMO_ENERGY;
mask |= POST_FORCE_RESPA;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixFlowGauss::init()
{
//if respa level specified by fix_modify, then override default (outermost)
//if specified level too high, set to max level
if (strstr(update->integrate_style,"respa")) {
ilevel_respa = ((Respa *) update->integrate)->nlevels-1;
if (respa_level >= 0)
ilevel_respa = MIN(respa_level,ilevel_respa);
}
}
/* ----------------------------------------------------------------------
setup is called after the initial evaluation of forces before a run, so we
must remove the total force here too
------------------------------------------------------------------------- */
void FixFlowGauss::setup(int vflag)
{
//need to compute work done if set fix_modify energy yes
if (thermo_energy)
workflag=1;
//get total mass of group
mTot=group->mass(igroup);
if (mTot <= 0.0)
error->all(FLERR,"Invalid group mass in fix flow/gauss");
if (strstr(update->integrate_style,"respa")) {
((Respa *) update->integrate)->copy_flevel_f(ilevel_respa);
post_force_respa(vflag,ilevel_respa,0);
((Respa *) update->integrate)->copy_f_flevel(ilevel_respa);
}
else
post_force(vflag);
}
/* ----------------------------------------------------------------------
this is where Gaussian dynamics constraint is applied
------------------------------------------------------------------------- */
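/* In brief: the net force F_tot on the group along each constrained
   direction is summed over all tasks, and every group atom then receives an
   extra force m_i * a_app with a_app = -F_tot / M_tot, so the total force on
   the group vanishes and its center-of-mass momentum is conserved along
   those directions. */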
void FixFlowGauss::post_force(int vflag)
{
double **f = atom->f;
double **v = atom->v;
int *mask = atom->mask;
int *type = atom->type;
double *mass = atom->mass;
double *rmass = atom->rmass;
int nlocal = atom->nlocal;
int ii,jj;
//find the total force on all atoms
//initialize to zero
double f_thisProc[3];
for (ii=0; ii<3; ii++)
f_thisProc[ii]=0.0;
//add all forces on each processor
for(ii=0; ii<nlocal; ii++)
if (mask[ii] & groupbit)
for (jj=0; jj<3; jj++)
if (flow[jj])
f_thisProc[jj] += f[ii][jj];
//add the processor sums together
MPI_Allreduce(f_thisProc, f_tot, 3, MPI_DOUBLE, MPI_SUM, world);
//compute applied acceleration
for (ii=0; ii<3; ii++)
a_app[ii] = -f_tot[ii] / mTot;
//apply the added acceleration to each atom
double f_app[3];
double peAdded=0.0;
for( ii = 0; ii<nlocal; ii++)
if (mask[ii] & groupbit) {
if (rmass) {
f_app[0] = a_app[0]*rmass[ii];
f_app[1] = a_app[1]*rmass[ii];
f_app[2] = a_app[2]*rmass[ii];
} else {
f_app[0] = a_app[0]*mass[type[ii]];
f_app[1] = a_app[1]*mass[type[ii]];
f_app[2] = a_app[2]*mass[type[ii]];
}
f[ii][0] += f_app[0]; //f_app[jj] is 0 if flow[jj] is false
f[ii][1] += f_app[1];
f[ii][2] += f_app[2];
//calculate the added energy; since this is more costly, only do it if requested
if (workflag)
peAdded += f_app[0]*v[ii][0] + f_app[1]*v[ii][1] + f_app[2]*v[ii][2];
}
//finish calculation of work done, sum over all procs
if (workflag) {
double pe_tmp=0.0;
MPI_Allreduce(&peAdded,&pe_tmp,1,MPI_DOUBLE,MPI_SUM,world);
pe_tot += pe_tmp;
}
}
void FixFlowGauss::post_force_respa(int vflag, int ilevel, int iloop)
{
if (ilevel == ilevel_respa) post_force(vflag);
}
/* ----------------------------------------------------------------------
negative of work done by this fix
This is only computed if requested, either with fix_modify energy yes, or with the energy keyword. Otherwise returns 0.
------------------------------------------------------------------------- */
double FixFlowGauss::compute_scalar()
{
return -pe_tot*dt;
}
/* ----------------------------------------------------------------------
return components of applied force
------------------------------------------------------------------------- */
double FixFlowGauss::compute_vector(int n)
{
return -f_tot[n];
}
diff --git a/src/USER-MISC/fix_ipi.cpp b/src/USER-MISC/fix_ipi.cpp
index 67c9cc8ee..271574613 100644
--- a/src/USER-MISC/fix_ipi.cpp
+++ b/src/USER-MISC/fix_ipi.cpp
@@ -1,481 +1,487 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Michele Ceriotti (EPFL), Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "fix_ipi.h"
#include "atom.h"
#include "force.h"
#include "update.h"
#include "respa.h"
#include "error.h"
#include "kspace.h"
#include "modify.h"
#include "compute.h"
#include "comm.h"
#include "neighbor.h"
#include "irregular.h"
#include "domain.h"
#include "compute_pressure.h"
#include <errno.h>
using namespace LAMMPS_NS;
using namespace FixConst;
/******************************************************************************************
* A fix to interface LAMMPS with i-PI - A Python interface for path integral molecular dynamics
* Michele Ceriotti, EPFL (2014)
* Please cite:
* Ceriotti, M., More, J., & Manolopoulos, D. E. (2014).
* i-PI: A Python interface for ab initio path integral molecular dynamics simulations.
* Computer Physics Communications, 185, 1019–1026. doi:10.1016/j.cpc.2013.10.027
* And see [http://github.com/i-pi/i-pi] to download a version of i-PI
******************************************************************************************/
// socket interface
#ifndef _WIN32
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/un.h>
#include <netdb.h>
#endif
#define MSGLEN 12
/* Utility functions to simplify the interface with POSIX sockets */
static void open_socket(int &sockfd, int inet, int port, char* host,
Error *error)
/* Opens a socket.
Args:
sockfd: The id of the socket that will be created.
inet: An integer that determines whether the socket will be an inet or unix
domain socket. Gives unix if 0, inet otherwise.
port: The port number for the socket to be created. Low numbers are often
reserved for important channels, so use of numbers of 4 or more digits is
recommended.
host: The name of the host server.
error: pointer to a LAMMPS Error object
*/
{
int ai_err;
#ifdef _WIN32
error->one(FLERR,"i-PI socket implementation requires UNIX environment");
#else
if (inet>0) { // creates an internet socket
// fetches information on the host
struct addrinfo hints, *res;
char service[256];
memset(&hints, 0, sizeof(hints));
hints.ai_socktype = SOCK_STREAM;
hints.ai_family = AF_UNSPEC;
hints.ai_flags = AI_PASSIVE;
sprintf(service,"%d",port); // convert the port number to a string
ai_err = getaddrinfo(host, service, &hints, &res);
if (ai_err!=0)
error->one(FLERR,"Error fetching host data. Wrong host name?");
// creates socket
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
if (sockfd < 0)
error->one(FLERR,"Error opening socket");
// makes connection
if (connect(sockfd, res->ai_addr, res->ai_addrlen) < 0)
error->one(FLERR,"Error opening INET socket: wrong port or server unreachable");
freeaddrinfo(res);
} else { // creates a unix socket
struct sockaddr_un serv_addr;
// fill in the details of the socket address
memset(&serv_addr, 0, sizeof(serv_addr));
serv_addr.sun_family = AF_UNIX;
strcpy(serv_addr.sun_path, "/tmp/ipi_");
strcpy(serv_addr.sun_path+9, host);
// creates the socket
sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
// connects
if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
error->one(FLERR,"Error opening UNIX socket: server may not be running "
"or the path to the socket unavailable");
}
#endif
}
static void writebuffer(int sockfd, const char *data, int len, Error* error)
/* Writes to a socket.
Args:
sockfd: The id of the socket that will be written to.
data: The data to be written to the socket.
len: The length of the data in bytes.
*/
{
int n;
n = write(sockfd,data,len);
if (n < 0)
error->one(FLERR,"Error writing to socket: broken connection");
}
static void readbuffer(int sockfd, char *data, int len, Error* error)
/* Reads from a socket.
Args:
sockfd: The id of the socket that will be read from.
data: The storage array for data read from the socket.
len: The length of the data in bytes.
*/
{
int n, nr;
n = nr = read(sockfd,data,len);
while (nr>0 && n<len ) {
nr=read(sockfd,&data[n],len-n);
n+=nr;
}
if (n == 0)
error->one(FLERR,"Error reading from socket: broken connection");
}
/* ---------------------------------------------------------------------- */
FixIPI::FixIPI(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg), irregular(NULL)
{
/* format for fix:
* fix num group_id ipi host port [unix]
*/
if (strcmp(style,"ipi") != 0 && narg < 5)
error->all(FLERR,"Illegal fix ipi command");
if (atom->tag_enable == 0)
error->all(FLERR,"Cannot use fix ipi without atom IDs");
if (atom->tag_consecutive() == 0)
error->all(FLERR,"Fix ipi requires consecutive atom IDs");
if (strcmp(arg[1],"all"))
error->warning(FLERR,"Fix ipi always uses group all");
host = strdup(arg[3]);
port = force->inumeric(FLERR,arg[4]);
inet = ((narg > 5) && (strcmp(arg[5],"unix") == 0) ) ? 0 : 1;
master = (comm->me==0) ? 1 : 0;
// check if forces should be reinitialized and set flag
reset_flag = ((narg > 6 && (strcmp(arg[6],"reset") == 0)) || ((narg > 5) && (strcmp(arg[5],"reset") == 0))) ? 1 : 0; // "reset" may follow the optional "unix" keyword
hasdata = bsize = 0;
// creates a temperature compute for all atoms
char** newarg = new char*[3];
newarg[0] = (char *) "IPI_TEMP";
newarg[1] = (char *) "all";
newarg[2] = (char *) "temp";
modify->add_compute(3,newarg);
delete [] newarg;
// creates a pressure compute to extract the virial
newarg = new char*[5];
newarg[0] = (char *) "IPI_PRESS";
newarg[1] = (char *) "all";
newarg[2] = (char *) "pressure";
newarg[3] = (char *) "IPI_TEMP";
newarg[4] = (char *) "virial";
modify->add_compute(5,newarg);
delete [] newarg;
// create instance of Irregular class
irregular = new Irregular(lmp);
+
+ // no socket has been assigned yet
+ socketflag = 0;
}
/* ---------------------------------------------------------------------- */
FixIPI::~FixIPI()
{
if (bsize) delete[] buffer;
free(host);
modify->delete_compute("IPI_TEMP");
modify->delete_compute("IPI_PRESS");
delete irregular;
}
/* ---------------------------------------------------------------------- */
int FixIPI::setmask()
{
int mask = 0;
mask |= INITIAL_INTEGRATE;
mask |= FINAL_INTEGRATE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixIPI::init()
{
//only opens socket on master process
- if (master) open_socket(ipisock, inet, port, host, error);
- else ipisock=0;
+ if (master) {
+ if (!socketflag) open_socket(ipisock, inet, port, host, error);
+ } else ipisock=0;
//! should check for success in socket opening -- but the current open_socket routine dies brutally if unsuccessful
+ // record that a socket has been assigned
+ socketflag = 1;
// asks for evaluation of PE at first step
modify->compute[modify->find_compute("thermo_pe")]->invoked_scalar = -1;
modify->addstep_compute_all(update->ntimestep + 1);
kspace_flag = (force->kspace) ? 1 : 0;
// makes sure that neighbor lists are re-built at each step (cannot make assumptions when cycling over beads!)
neighbor->delay = 0;
neighbor->every = 1;
}
void FixIPI::initial_integrate(int vflag)
{
/* This is called at the beginning of the integration loop,
* and will be used to read positions from the socket. Then,
* everything should be updated, since there is no guarantee
* that successive snapshots will be close together (think
* of parallel tempering for instance) */
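/* Wire protocol as implemented below (driver side): the wrapper repeatedly
   sends "STATUS" and we answer "READY" until a different header arrives.
   "POSDATA" is followed by the 3x3 cell matrix, its inverse, the atom count
   and 3*nat coordinates (all in atomic units); "EXIT" terminates the run. */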
char header[MSGLEN+1];
if (hasdata)
error->all(FLERR, "i-PI got out of sync in initial_integrate and will die!");
double cellh[9], cellih[9];
int nat;
if (master) { // only read positions on master
// wait until something happens
while (true) {
// while i-PI just asks for status, signal we are ready and wait
readbuffer(ipisock, header, MSGLEN, error); header[MSGLEN]=0;
if (strcmp(header,"STATUS ") == 0 )
writebuffer(ipisock,"READY ",MSGLEN, error);
else break;
}
if (strcmp(header,"EXIT ") == 0 )
error->one(FLERR, "Got EXIT message from i-PI. Now leaving!");
// when i-PI signals it has positions to evaluate new forces,
// read positions and cell data
if (strcmp(header,"POSDATA ") == 0 ) {
readbuffer(ipisock, (char*) cellh, 9*8, error);
readbuffer(ipisock, (char*) cellih, 9*8, error);
readbuffer(ipisock, (char*) &nat, 4, error);
// allocate buffer, but only do this once.
if (bsize==0) {
bsize=3*nat;
buffer = new double[bsize];
} else if (bsize != 3*nat)
error->one(FLERR, "Number of atoms changed along the way.");
// finally read position data into buffer
readbuffer(ipisock, (char*) buffer, 8*bsize, error);
} else
error->one(FLERR, "Wrapper did not send positions, I will now die!");
}
// shares the atomic coordinates with everyone
MPI_Bcast(&nat,1,MPI_INT,0,world);
// must also allocate the buffer on the non-head nodes
if (bsize==0) {
bsize=3*nat;
buffer = new double[bsize];
}
MPI_Bcast(cellh,9,MPI_DOUBLE,0,world);
MPI_Bcast(cellih,9,MPI_DOUBLE,0,world);
MPI_Bcast(buffer,bsize,MPI_DOUBLE,0,world);
//updates atomic coordinates and cell based on the data received
double *boxhi = domain->boxhi;
double *boxlo = domain->boxlo;
double posconv;
posconv=0.52917721*force->angstrom;
boxlo[0] = -0.5*cellh[0]*posconv;
boxlo[1] = -0.5*cellh[4]*posconv;
boxlo[2] = -0.5*cellh[8]*posconv;
boxhi[0] = -boxlo[0];
boxhi[1] = -boxlo[1];
boxhi[2] = -boxlo[2];
domain->xy = cellh[1]*posconv;
domain->xz = cellh[2]*posconv;
domain->yz = cellh[5]*posconv;
// do error checks on simulation box and set small for triclinic boxes
domain->set_initial_box();
// reset global and local box using the new box dimensions
domain->reset_box();
// signal that the box has (or may have) changed
domain->box_change = 1;
// picks local atoms from the buffer
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
if (igroup == atom->firstgroup) nlocal = atom->nfirst;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
x[i][0]=buffer[3*(atom->tag[i]-1)+0]*posconv;
x[i][1]=buffer[3*(atom->tag[i]-1)+1]*posconv;
x[i][2]=buffer[3*(atom->tag[i]-1)+2]*posconv;
}
}
// ensure atoms are in the current box & update box via shrink-wrap
// has to be done before invoking Irregular::migrate_atoms()
// since it requires atoms to be inside the simulation box
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
domain->reset_box();
if (domain->triclinic) domain->lamda2x(atom->nlocal);
// move atoms to new processors via irregular()
// only needed if migrate_check() says an atom moved too far
if (domain->triclinic) domain->x2lamda(atom->nlocal);
if (irregular->migrate_check()) irregular->migrate_atoms();
if (domain->triclinic) domain->lamda2x(atom->nlocal);
// check if kspace solver is used
if (reset_flag && kspace_flag) {
// reset kspace, pair, angles, ... b/c simulation box might have changed.
// kspace->setup() is in some cases not enough since, e.g., g_ewald needs
// to be reestimated due to changes in box dimensions.
force->init();
// setup_grid() is necessary for pppm since init() is not calling
// setup() nor setup_grid() upon calling init().
if (force->kspace->pppmflag) force->kspace->setup_grid();
// other kspace styles might also need another setup()?
} else if (!reset_flag && kspace_flag) {
// original version
force->kspace->setup();
}
// compute PE. makes sure that it will be evaluated at next step
modify->compute[modify->find_compute("thermo_pe")]->invoked_scalar = -1;
modify->addstep_compute_all(update->ntimestep+1);
hasdata=1;
}
void FixIPI::final_integrate()
{
/* This is called after forces and energy have been computed. Now we only need to
* communicate them back to i-PI so that the integration can continue. */
char header[MSGLEN+1];
double vir[9], pot=0.0;
double forceconv, potconv, posconv, pressconv, posconv3;
char retstr[1024];
// conversions from LAMMPS units to atomic units, which are used by i-PI
potconv=3.1668152e-06/force->boltz;
posconv=0.52917721*force->angstrom;
posconv3=posconv*posconv*posconv;
forceconv=potconv*posconv;
pressconv=1/force->nktv2p*potconv*posconv3;
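// (0.52917721 is the Bohr radius in Angstrom and 3.1668152e-06 is k_B in
// Hartree/K, so potconv, posconv, forceconv and pressconv convert LAMMPS
// energies, positions, forces and the virial to the Hartree atomic units
// used by i-PI.)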
// compute for potential energy
pot=modify->compute[modify->find_compute("thermo_pe")]->compute_scalar();
pot*=potconv;
// probably useless check
if (!hasdata)
error->all(FLERR, "i-PI got out of sync in final_integrate and will die!");
int nat=bsize/3;
double **f= atom->f;
double lbuf[bsize];
// reassembles the force vector from the local arrays
int nlocal = atom->nlocal;
if (igroup == atom->firstgroup) nlocal = atom->nfirst;
for (int i = 0; i < bsize; ++i) lbuf[i]=0.0;
for (int i = 0; i < nlocal; i++) {
lbuf[3*(atom->tag[i]-1)+0]=f[i][0]*forceconv;
lbuf[3*(atom->tag[i]-1)+1]=f[i][1]*forceconv;
lbuf[3*(atom->tag[i]-1)+2]=f[i][2]*forceconv;
}
MPI_Allreduce(lbuf,buffer,bsize,MPI_DOUBLE,MPI_SUM,world);
for (int i = 0; i < 9; ++i) vir[i]=0.0;
int press_id = modify->find_compute("IPI_PRESS");
Compute* comp_p = modify->compute[press_id];
comp_p->compute_vector();
double myvol = domain->xprd*domain->yprd*domain->zprd/posconv3;
vir[0] = comp_p->vector[0]*pressconv*myvol;
vir[4] = comp_p->vector[1]*pressconv*myvol;
vir[8] = comp_p->vector[2]*pressconv*myvol;
vir[1] = comp_p->vector[3]*pressconv*myvol;
vir[2] = comp_p->vector[4]*pressconv*myvol;
vir[5] = comp_p->vector[5]*pressconv*myvol;
retstr[0]=0;
if (master) {
while (true) {
readbuffer(ipisock, header, MSGLEN, error); header[MSGLEN]=0;
if (strcmp(header,"STATUS ") == 0 )
writebuffer(ipisock,"HAVEDATA ",MSGLEN, error);
else break;
}
if (strcmp(header,"EXIT ") == 0 )
error->one(FLERR, "Got EXIT message from i-PI. Now leaving!");
if (strcmp(header,"GETFORCE ") == 0 ) {
writebuffer(ipisock,"FORCEREADY ",MSGLEN, error);
writebuffer(ipisock,(char*) &pot,8, error);
writebuffer(ipisock,(char*) &nat,4, error);
writebuffer(ipisock,(char*) buffer, bsize*8, error);
writebuffer(ipisock,(char*) vir,9*8, error);
nat=strlen(retstr); writebuffer(ipisock,(char*) &nat,4, error);
writebuffer(ipisock,(char*) retstr, nat, error);
}
else
error->one(FLERR, "Wrapper did not ask for forces, I will now die!");
}
hasdata=0;
}
diff --git a/src/USER-MISC/fix_ipi.h b/src/USER-MISC/fix_ipi.h
index 0bb3717de..191b6c280 100644
--- a/src/USER-MISC/fix_ipi.h
+++ b/src/USER-MISC/fix_ipi.h
@@ -1,49 +1,49 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(ipi,FixIPI)
#else
#ifndef LMP_FIX_IPI_H
#define LMP_FIX_IPI_H
#include "fix.h"
namespace LAMMPS_NS {
class FixIPI : public Fix {
public:
FixIPI(class LAMMPS *, int, char **);
virtual ~FixIPI();
int setmask();
virtual void init();
virtual void initial_integrate(int);
virtual void final_integrate();
protected:
char *host; int port; int inet, master, hasdata;
- int ipisock, me; double *buffer; long bsize;
+ int ipisock, me, socketflag; double *buffer; long bsize;
int kspace_flag;
int reset_flag;
private:
class Irregular *irregular;
};
}
#endif
#endif
diff --git a/src/USER-MISC/fix_srp.cpp b/src/USER-MISC/fix_srp.cpp
index 88f18e9a7..fbd8473cb 100644
--- a/src/USER-MISC/fix_srp.cpp
+++ b/src/USER-MISC/fix_srp.cpp
@@ -1,634 +1,631 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Timothy Sirk (ARL), Pieter in't Veld (BASF)
------------------------------------------------------------------------- */
#include <string.h>
#include <stdlib.h>
#include "fix_srp.h"
#include "atom.h"
#include "force.h"
#include "domain.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "neighbor.h"
#include "atom_vec.h"
#include "modify.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixSRP::FixSRP(LAMMPS *lmp, int narg, char **arg) : Fix(lmp, narg, arg)
{
// settings
nevery=1;
peratom_freq = 1;
time_integrate = 0;
create_attribute = 0;
comm_border = 2;
// restart settings
restart_global = 1;
restart_peratom = 1;
restart_pbc = 1;
// per-atom array width 2
peratom_flag = 1;
size_peratom_cols = 2;
// initial allocation of atom-based array
// register with Atom class
array = NULL;
grow_arrays(atom->nmax);
// extends pack_exchange()
atom->add_callback(0);
atom->add_callback(1); // restart
atom->add_callback(2);
// initialize to illegal values so we capture
btype = -1;
bptype = -1;
// zero
for (int i = 0; i < atom->nmax; i++)
- for (int m = 0; m < 3; m++)
+ for (int m = 0; m < 2; m++)
array[i][m] = 0.0;
}
/* ---------------------------------------------------------------------- */
FixSRP::~FixSRP()
{
// unregister callbacks to this fix from Atom class
atom->delete_callback(id,0);
atom->delete_callback(id,1);
atom->delete_callback(id,2);
memory->destroy(array);
}
/* ---------------------------------------------------------------------- */
int FixSRP::setmask()
{
int mask = 0;
mask |= PRE_FORCE;
mask |= PRE_EXCHANGE;
mask |= POST_RUN;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixSRP::init()
{
if (force->pair_match("hybrid",1) == NULL)
error->all(FLERR,"Cannot use pair srp without pair_style hybrid");
if ((bptype < 1) || (bptype > atom->ntypes))
error->all(FLERR,"Illegal bond particle type");
// fix SRP should be the first fix running at the PRE_EXCHANGE step.
// Otherwise it might conflict with, e.g. fix deform
if (modify->n_pre_exchange > 1) {
char *first = modify->fix[modify->list_pre_exchange[0]]->id;
if ((comm->me == 0) && (strcmp(id,first) != 0))
error->warning(FLERR,"Internal fix for pair srp defined too late."
" May lead to incorrect behavior.");
}
// setup neigh exclusions for diff atom types
// bond particles do not interact with other types
// type bptype only interacts with itself
char* arg1[4];
arg1[0] = (char *) "exclude";
arg1[1] = (char *) "type";
char c0[20];
char c1[20];
for(int z = 1; z < atom->ntypes; z++) {
if(z == bptype)
continue;
sprintf(c0, "%d", z);
arg1[2] = c0;
sprintf(c1, "%d", bptype);
arg1[3] = c1;
neighbor->modify_params(4, arg1);
}
}
/* ----------------------------------------------------------------------
insert bond particles
------------------------------------------------------------------------- */
void FixSRP::setup_pre_force(int zz)
{
double **x = atom->x;
double **xold;
tagint *tag = atom->tag;
tagint *tagold;
int *type = atom->type;
int* dlist;
AtomVec *avec = atom->avec;
int **bondlist = neighbor->bondlist;
int nlocal, nlocal_old;
nlocal = nlocal_old = atom->nlocal;
bigint nall = atom->nlocal + atom->nghost;
int nbondlist = neighbor->nbondlist;
int i,j,n;
// make a copy of all coordinates and tags
// that is consistent with the bond list as
// atom->x will be affected by creating/deleting atoms.
// also compile list of local atoms to be deleted.
memory->create(xold,nall,3,"fix_srp:xold");
memory->create(tagold,nall,"fix_srp:tagold");
memory->create(dlist,nall,"fix_srp:dlist");
for (i = 0; i < nall; i++){
xold[i][0] = x[i][0];
xold[i][1] = x[i][1];
xold[i][2] = x[i][2];
tagold[i]=tag[i];
dlist[i] = (type[i] == bptype) ? 1 : 0;
- for (n = 0; n < 3; n++)
+ for (n = 0; n < 2; n++)
array[i][n] = 0.0;
}
// delete local atoms flagged in dlist
i = 0;
int ndel = 0;
while (i < nlocal) {
if (dlist[i]) {
avec->copy(nlocal-1,i,1);
dlist[i] = dlist[nlocal-1];
nlocal--;
ndel++;
} else i++;
}
atom->nlocal = nlocal;
memory->destroy(dlist);
int nadd = 0;
double rsqold = 0.0;
double delx, dely, delz, rmax, rsq, rsqmax;
double xone[3];
for (n = 0; n < nbondlist; n++) {
// consider only the user defined bond type
// btype of zero considers all bonds
if(btype > 0 && bondlist[n][2] != btype)
continue;
i = bondlist[n][0];
j = bondlist[n][1];
// position of bond i
xone[0] = (xold[i][0] + xold[j][0])*0.5;
xone[1] = (xold[i][1] + xold[j][1])*0.5;
xone[2] = (xold[i][2] + xold[j][2])*0.5;
// record the longest bond;
// this is used to check the ghost communication cutoff
delx = xold[j][0] - xold[i][0];
dely = xold[j][1] - xold[i][1];
delz = xold[j][2] - xold[i][2];
rsq = delx*delx + dely*dely + delz*delz;
if(rsq > rsqold) rsqold = rsq;
// make one particle for each bond
// i is local
// if newton bond, always make particle
// if j is local, always make particle
// if j is ghost, decide from tag
if ((force->newton_bond) || (j < nlocal_old) || (tagold[i] > tagold[j])) {
atom->natoms++;
avec->create_atom(bptype,xone);
// pack tag i/j into buffer for comm
array[atom->nlocal-1][0] = static_cast<double>(tagold[i]);
array[atom->nlocal-1][1] = static_cast<double>(tagold[j]);
nadd++;
}
}
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
// free temporary storage
memory->destroy(xold);
memory->destroy(tagold);
char str[128];
int nadd_all = 0, ndel_all = 0;
MPI_Allreduce(&ndel,&ndel_all,1,MPI_INT,MPI_SUM,world);
MPI_Allreduce(&nadd,&nadd_all,1,MPI_INT,MPI_SUM,world);
if(comm->me == 0){
sprintf(str, "Removed/inserted %d/%d bond particles.", ndel_all,nadd_all);
error->message(FLERR,str);
}
// check ghost comm distances
// error out if it is shorter than the estimate
// ghost atoms must be present for bonds on edge of neighbor cutoff
// extend cutghost slightly more than half of the longest bond
MPI_Allreduce(&rsqold,&rsqmax,1,MPI_DOUBLE,MPI_MAX,world);
rmax = sqrt(rsqmax);
double cutneighmax_srp = neighbor->cutneighmax + 0.51*rmax;
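// (Bond particles sit at bond midpoints, so half of the longest bond is the
// worst-case offset of a particle from its parent atoms; the 0.51 factor
// adds a small safety margin.)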
// find smallest cutghost
double cutghostmin = comm->cutghost[0];
if (cutghostmin > comm->cutghost[1])
cutghostmin = comm->cutghost[1];
if (cutghostmin > comm->cutghost[2])
cutghostmin = comm->cutghost[2];
- // reset cutghost if needed
+ // stop if cutghost is insufficient
if (cutneighmax_srp > cutghostmin){
- if(comm->me == 0){
- sprintf(str, "Extending ghost comm cutoff. New %f, old %f.", cutneighmax_srp, cutghostmin);
- error->message(FLERR,str);
- }
- // cutghost updated by comm->setup
- comm->cutghostuser = cutneighmax_srp;
+ sprintf(str, "Communication cutoff too small for fix srp. "
+ "Need %f, current %f.", cutneighmax_srp, cutghostmin);
+ error->all(FLERR,str);
}
// assign tags for new atoms, update map
atom->tag_extend();
if (atom->map_style) {
atom->nghost = 0;
atom->map_init();
atom->map_set();
}
// put new particles in the box before exchange
// move owned to new procs
// get ghosts
// build neigh lists again
// if triclinic, lambda coords needed for pbc, exchange, borders
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
// back to box coords
if (domain->triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
// new atom counts
nlocal = atom->nlocal;
nall = atom->nlocal + atom->nghost;
// zero all forces
for(i = 0; i < nall; i++)
atom->f[i][0] = atom->f[i][1] = atom->f[i][2] = 0.0;
// do not include bond particles in thermo output
// remove them from all groups. set their velocity to zero.
for(i=0; i< nlocal; i++)
if(atom->type[i] == bptype) {
atom->mask[i] = 0;
atom->v[i][0] = atom->v[i][1] = atom->v[i][2] = 0.0;
}
}
/* ----------------------------------------------------------------------
set position of bond particles
------------------------------------------------------------------------- */
void FixSRP::pre_exchange()
{
// update ghosts
comm->forward_comm();
// reassign bond particle coordinates to midpoint of bonds
// only need to do this before neigh rebuild
double **x=atom->x;
int i,j;
int nlocal = atom->nlocal;
for(int ii = 0; ii < nlocal; ii++){
if(atom->type[ii] != bptype) continue;
i = atom->map(static_cast<tagint>(array[ii][0]));
if(i < 0) error->all(FLERR,"Fix SRP failed to map atom");
i = domain->closest_image(ii,i);
j = atom->map(static_cast<tagint>(array[ii][1]));
if(j < 0) error->all(FLERR,"Fix SRP failed to map atom");
j = domain->closest_image(ii,j);
// position of bond particle ii
atom->x[ii][0] = (x[i][0] + x[j][0])*0.5;
atom->x[ii][1] = (x[i][1] + x[j][1])*0.5;
atom->x[ii][2] = (x[i][2] + x[j][2])*0.5;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double FixSRP::memory_usage()
{
double bytes = atom->nmax*2 * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
allocate atom-based array
------------------------------------------------------------------------- */
void FixSRP::grow_arrays(int nmax)
{
memory->grow(array,nmax,2,"fix_srp:array");
array_atom = array;
}
/* ----------------------------------------------------------------------
copy values within local atom-based array
called when move to new proc
------------------------------------------------------------------------- */
void FixSRP::copy_arrays(int i, int j, int delflag)
{
for (int m = 0; m < 2; m++)
array[j][m] = array[i][m];
}
/* ----------------------------------------------------------------------
initialize one atom's array values
called when atom is created
------------------------------------------------------------------------- */
void FixSRP::set_arrays(int i)
{
array[i][0] = -1;
array[i][1] = -1;
}
/* ----------------------------------------------------------------------
pack values in local atom-based array for exchange with another proc
------------------------------------------------------------------------- */
int FixSRP::pack_exchange(int i, double *buf)
{
for (int m = 0; m < 2; m++) buf[m] = array[i][m];
return 2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based array from exchange with another proc
------------------------------------------------------------------------- */
int FixSRP::unpack_exchange(int nlocal, double *buf)
{
for (int m = 0; m < 2; m++) array[nlocal][m] = buf[m];
return 2;
}
/* ----------------------------------------------------------------------
pack values for border communication at re-neighboring
------------------------------------------------------------------------- */
int FixSRP::pack_border(int n, int *list, double *buf)
{
// pack buf for border comm
int i,j;
int m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = array[j][0];
buf[m++] = array[j][1];
}
return m;
}
/* ----------------------------------------------------------------------
unpack values for border communication at re-neighboring
------------------------------------------------------------------------- */
int FixSRP::unpack_border(int n, int first, double *buf)
{
// unpack buf into array
int i,last;
int m = 0;
last = first + n;
for (i = first; i < last; i++){
array[i][0] = buf[m++];
array[i][1] = buf[m++];
}
return m;
}
/* ----------------------------------------------------------------------
remove particles after run
------------------------------------------------------------------------- */
void FixSRP::post_run()
{
// all bond particles are removed after each run
// useful for write_data and write_restart commands
// since those commands occur between runs
bigint natoms_previous = atom->natoms;
int nlocal = atom->nlocal;
int* dlist;
memory->create(dlist,nlocal,"fix_srp:dlist");
for (int i = 0; i < nlocal; i++){
if(atom->type[i] == bptype)
dlist[i] = 1;
else
dlist[i] = 0;
}
// delete local atoms flagged in dlist
// reset nlocal
AtomVec *avec = atom->avec;
int i = 0;
while (i < nlocal) {
if (dlist[i]) {
avec->copy(nlocal-1,i,1);
dlist[i] = dlist[nlocal-1];
nlocal--;
} else i++;
}
atom->nlocal = nlocal;
memory->destroy(dlist);
// reset atom->natoms
// reset atom->map if it exists
// set nghost to 0 so old ghosts won't be mapped
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
if (atom->map_style) {
atom->nghost = 0;
atom->map_init();
atom->map_set();
}
// print before and after atom count
bigint ndelete = natoms_previous - atom->natoms;
if (comm->me == 0) {
if (screen) fprintf(screen,"Deleted " BIGINT_FORMAT
" atoms, new total = " BIGINT_FORMAT "\n",
ndelete,atom->natoms);
if (logfile) fprintf(logfile,"Deleted " BIGINT_FORMAT
" atoms, new total = " BIGINT_FORMAT "\n",
ndelete,atom->natoms);
}
// verlet calls box_too_small_check() in post_run
// this check maps all bond partners
// therefore need ghosts
// need to convert to lambda coords before apply pbc
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->setup();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
// change back to box coordinates
if (domain->triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for restart file
------------------------------------------------------------------------- */
int FixSRP::pack_restart(int i, double *buf)
{
int m = 0;
buf[m++] = 3;
buf[m++] = array[i][0];
buf[m++] = array[i][1];
return m;
}
/* ----------------------------------------------------------------------
unpack values from atom->extra array to restart the fix
------------------------------------------------------------------------- */
void FixSRP::unpack_restart(int nlocal, int nth)
{
double **extra = atom->extra;
// skip to Nth set of extra values
int m = 0;
for (int i = 0; i < nth; i++){
m += extra[nlocal][m];
}
m++;
array[nlocal][0] = extra[nlocal][m++];
array[nlocal][1] = extra[nlocal][m++];
}
/* ----------------------------------------------------------------------
maxsize of any atom's restart data
------------------------------------------------------------------------- */
int FixSRP::maxsize_restart()
{
return 3;
}
/* ----------------------------------------------------------------------
size of atom nlocal's restart data
------------------------------------------------------------------------- */
int FixSRP::size_restart(int nlocal)
{
return 3;
}
/* ----------------------------------------------------------------------
pack global state of Fix
------------------------------------------------------------------------- */
void FixSRP::write_restart(FILE *fp)
{
int n = 0;
double list[3];
list[n++] = comm->cutghostuser;
list[n++] = btype;
list[n++] = bptype;
if (comm->me == 0) {
int size = n * sizeof(double);
fwrite(&size,sizeof(int),1,fp);
fwrite(list,sizeof(double),n,fp);
}
}
/* ----------------------------------------------------------------------
use info from restart file to restart the Fix
------------------------------------------------------------------------- */
void FixSRP::restart(char *buf)
{
int n = 0;
double *list = (double *) buf;
comm->cutghostuser = static_cast<double> (list[n++]);
btype = static_cast<int> (list[n++]);
bptype = static_cast<int> (list[n++]);
}
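/* ----------------------------------------------------------------------
illustrative sketch, not part of the LAMMPS sources: write_restart() and
restart() above follow the usual Fix global-state protocol -- only rank 0
writes, the record is an int byte count followed by that many bytes of
doubles, and on restart the raw bytes are handed back and reinterpreted as
doubles in the same order they were written.  The helper names below are
assumptions.
------------------------------------------------------------------------- */
static void write_state_sketch(FILE *fp, const double *state, int n)
{
  int size = n * sizeof(double);               // byte count header
  fwrite(&size,sizeof(int),1,fp);
  fwrite(state,sizeof(double),n,fp);
}

static void read_state_sketch(const char *raw, double *state, int n)
{
  const double *list = (const double *) raw;   // bytes in the order written
  for (int i = 0; i < n; i++) state[i] = list[i];
}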
/* ----------------------------------------------------------------------
interface with pair class
pair srp sets the bond type in this fix
------------------------------------------------------------------------- */
int FixSRP::modify_param(int narg, char **arg)
{
if (strcmp(arg[0],"btype") == 0) {
btype = atoi(arg[1]);
return 2;
}
if (strcmp(arg[0],"bptype") == 0) {
bptype = atoi(arg[1]);
return 2;
}
return 0;
}
diff --git a/src/atom_vec_atomic.cpp b/src/atom_vec_atomic.cpp
index c29e04ea8..eda1a3315 100644
--- a/src/atom_vec_atomic.cpp
+++ b/src/atom_vec_atomic.cpp
@@ -1,683 +1,683 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_atomic.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
AtomVecAtomic::AtomVecAtomic(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 6;
size_velocity = 3;
size_data_atom = 5;
size_data_vel = 4;
xcol_data = 3;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecAtomic::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecAtomic::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecAtomic::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomic::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomic::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomic::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomic::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomic::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomic::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomic::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
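/* ----------------------------------------------------------------------
illustrative sketch, not part of the LAMMPS sources: the ubuf(...).d and
ubuf(...).i calls above move integer per-atom data (tag, type, mask)
through the double-valued comm buffer bit-for-bit, so large 64-bit tags
survive without a lossy int-to-double conversion.  LAMMPS defines the real
ubuf union in its headers; the stand-in name and 64-bit width below are
assumptions.
------------------------------------------------------------------------- */
union ubuf_sketch {
  double d;
  long long i;
  ubuf_sketch(double arg) : d(arg) {}
  ubuf_sketch(long long arg) : i(arg) {}
};

static long long tag_roundtrip_sketch(long long tag)
{
  double slot = ubuf_sketch(tag).d;   // pack: store the integer's bits in a double
  return ubuf_sketch(slot).i;         // unpack: recover the exact integer value
}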
/* ---------------------------------------------------------------------- */
int AtomVecAtomic::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomic::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomic::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecAtomic::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomic::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
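/* ----------------------------------------------------------------------
illustrative sketch, not part of the LAMMPS sources: pack_exchange() and
unpack_exchange() above use a self-describing record -- slot 0 is reserved,
values start at m = 1, and the final count is written back into buf[0] so
the receiver knows where this atom's record (core values plus any fix data
appended after them) ends.  The helper names below are assumptions.
------------------------------------------------------------------------- */
static int pack_record_sketch(const double *vals, int nvals, double *buf)
{
  int m = 1;                            // slot 0 reserved for the record length
  for (int k = 0; k < nvals; k++) buf[m++] = vals[k];
  buf[0] = m;                           // total doubles used, including slot 0
  return m;
}

static int unpack_record_sketch(const double *buf, double *vals)
{
  int n = static_cast<int>(buf[0]);     // length written by the packer
  for (int m = 1; m < n; m++) vals[m-1] = buf[m];
  return n;
}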
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecAtomic::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 11 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecAtomic::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecAtomic::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecAtomic::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecAtomic::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAtomic::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(type[i]).d;
buf[i][2] = x[i][0];
buf[i][3] = x[i][1];
buf[i][4] = x[i][2];
buf[i][5] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][6] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][7] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
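/* ----------------------------------------------------------------------
illustrative sketch, not part of the LAMMPS sources: pack_data() above
decodes the three periodic image flags that are bit-packed into a single
image integer.  The constants below assume the default 10-bit-per-flag
layout (IMGMAX 512, IMGMASK 1023, IMGBITS 10, IMG2BITS 20); the real
values depend on the integer sizes configured in the LAMMPS headers.
------------------------------------------------------------------------- */
static void decode_image_sketch(unsigned int image, int &ix, int &iy, int &iz)
{
  const unsigned int SK_IMGMASK = 1023;
  const int SK_IMGMAX = 512, SK_IMGBITS = 10, SK_IMG2BITS = 20;
  ix = (int)(image & SK_IMGMASK) - SK_IMGMAX;                   // x image flag
  iy = (int)((image >> SK_IMGBITS) & SK_IMGMASK) - SK_IMGMAX;   // y image flag
  iz = (int)((image >> SK_IMG2BITS) & SK_IMGMASK) - SK_IMGMAX;  // z image flag
}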
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAtomic::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT " %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(int) ubuf(buf[i][1]).i,
buf[i][2],buf[i][3],buf[i][4],
(int) ubuf(buf[i][5]).i,(int) ubuf(buf[i][6]).i,
(int) ubuf(buf[i][7]).i);
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecAtomic::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
return bytes;
}
diff --git a/src/atom_vec_body.cpp b/src/atom_vec_body.cpp
index 86d3ed872..ca080ff0b 100644
--- a/src/atom_vec_body.cpp
+++ b/src/atom_vec_body.cpp
@@ -1,1585 +1,1585 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "atom_vec_body.h"
#include "style_body.h"
#include "body.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "modify.h"
#include "force.h"
#include "fix.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
AtomVecBody::AtomVecBody(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
// size_forward and size_border set in process_args(), via Body class
comm_x_only = comm_f_only = 0;
size_forward = 0;
size_reverse = 6;
size_border = 0;
size_velocity = 6;
size_data_atom = 7;
size_data_vel = 7;
xcol_data = 5;
atom->body_flag = 1;
atom->rmass_flag = 1;
atom->angmom_flag = atom->torque_flag = 1;
atom->radius_flag = 1;
nlocal_bonus = nghost_bonus = nmax_bonus = 0;
bonus = NULL;
bptr = NULL;
if (sizeof(double) == sizeof(int)) intdoubleratio = 1;
else if (sizeof(double) == 2*sizeof(int)) intdoubleratio = 2;
else error->all(FLERR,"Internal error in atom_style body");
}
/* ---------------------------------------------------------------------- */
AtomVecBody::~AtomVecBody()
{
int nall = nlocal_bonus + nghost_bonus;
for (int i = 0; i < nall; i++) {
icp->put(bonus[i].iindex);
dcp->put(bonus[i].dindex);
}
memory->sfree(bonus);
delete bptr;
}
/* ----------------------------------------------------------------------
process additional args
instantiate Body class
set size_forward and size_border to max sizes
------------------------------------------------------------------------- */
void AtomVecBody::process_args(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Invalid atom_style body command");
if (0) bptr = NULL;
#define BODY_CLASS
#define BodyStyle(key,Class) \
else if (strcmp(arg[0],#key) == 0) bptr = new Class(lmp,narg,arg);
#include "style_body.h"
#undef BodyStyle
#undef BODY_CLASS
else error->all(FLERR,"Unknown body style");
bptr->avec = this;
icp = bptr->icp;
dcp = bptr->dcp;
// max size of forward/border comm
// 7,18 are packed in pack_comm/pack_border
// bptr values = max number of additional ivalues/dvalues from Body class
size_forward = 7 + bptr->size_forward;
size_border = 18 + bptr->size_border;
}
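/* ----------------------------------------------------------------------
illustrative sketch, not part of the LAMMPS sources: the BodyStyle macro
and #include "style_body.h" in process_args() above form an include-based
factory -- the generated header holds one BodyStyle(key,Class) line per
installed body style, and re-expanding it with a locally defined macro
builds the else-if chain that instantiates the matching class.  Below the
same trick is shown with an inline list instead of a generated header;
every name is a placeholder, not a real LAMMPS style list.
------------------------------------------------------------------------- */
#define SKETCH_BODY_STYLES(X) X(style_one) X(style_two)

static const char *sketch_match_style(const char *key)
{
  const char *matched = NULL;
  if (0) matched = NULL;              // anchor branch, mirrors "if (0) bptr = NULL;"
#define SketchStyle(name) \
  else if (strcmp(key,#name) == 0) matched = #name;
  SKETCH_BODY_STYLES(SketchStyle)
#undef SketchStyle
  else matched = "unknown";           // mirrors the trailing error->all() branch
  return matched;
}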
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecBody::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
radius = memory->grow(atom->radius,nmax,"atom:radius");
rmass = memory->grow(atom->rmass,nmax,"atom:rmass");
angmom = memory->grow(atom->angmom,nmax,3,"atom:angmom");
torque = memory->grow(atom->torque,nmax*comm->nthreads,3,"atom:torque");
body = memory->grow(atom->body,nmax,"atom:body");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecBody::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
radius = atom->radius; rmass = atom->rmass;
angmom = atom->angmom; torque = atom->torque;
body = atom->body;
}
/* ----------------------------------------------------------------------
grow bonus data structure
------------------------------------------------------------------------- */
void AtomVecBody::grow_bonus()
{
nmax_bonus = grow_nmax_bonus(nmax_bonus);
if (nmax_bonus < 0)
error->one(FLERR,"Per-processor system is too big");
bonus = (Bonus *) memory->srealloc(bonus,nmax_bonus*sizeof(Bonus),
"atom:bonus");
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
if delflag and atom J has bonus data, then delete it
------------------------------------------------------------------------- */
void AtomVecBody::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
radius[j] = radius[i];
rmass[j] = rmass[i];
angmom[j][0] = angmom[i][0];
angmom[j][1] = angmom[i][1];
angmom[j][2] = angmom[i][2];
// if deleting atom J via delflag and J has bonus data, then delete it
if (delflag && body[j] >= 0) {
int k = body[j];
icp->put(bonus[k].iindex);
dcp->put(bonus[k].dindex);
copy_bonus(nlocal_bonus-1,k);
nlocal_bonus--;
}
// if atom I has bonus data, reset I's bonus.ilocal to loc J
// do NOT do this if self-copy (I=J) since I's bonus data is already deleted
if (body[i] >= 0 && i != j) bonus[body[i]].ilocal = j;
body[j] = body[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ----------------------------------------------------------------------
copy bonus data from I to J, effectively deleting the J entry
also reset body that points to I to now point to J
------------------------------------------------------------------------- */
void AtomVecBody::copy_bonus(int i, int j)
{
body[bonus[i].ilocal] = j;
memcpy(&bonus[j],&bonus[i],sizeof(Bonus));
}
/* ----------------------------------------------------------------------
clear ghost info in bonus data
called before ghosts are recommunicated in comm and irregular
------------------------------------------------------------------------- */
void AtomVecBody::clear_bonus()
{
int nall = nlocal_bonus + nghost_bonus;
for (int i = nlocal_bonus; i < nall; i++) {
icp->put(bonus[i].iindex);
dcp->put(bonus[i].dindex);
}
nghost_bonus = 0;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
double *quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (body[j] >= 0) {
quat = bonus[body[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
m += bptr->pack_comm_body(&bonus[body[j]],&buf[m]);
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (body[j] >= 0) {
quat = bonus[body[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
m += bptr->pack_comm_body(&bonus[body[j]],&buf[m]);
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
double *quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (body[j] >= 0) {
quat = bonus[body[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
m += bptr->pack_comm_body(&bonus[body[j]],&buf[m]);
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (body[j] >= 0) {
quat = bonus[body[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
m += bptr->pack_comm_body(&bonus[body[j]],&buf[m]);
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (body[j] >= 0) {
quat = bonus[body[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
m += bptr->pack_comm_body(&bonus[body[j]],&buf[m]);
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_comm_hybrid(int n, int *list, double *buf)
{
int i,j,m;
double *quat;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
if (body[j] >= 0) {
quat = bonus[body[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
m += bptr->pack_comm_body(&bonus[body[j]],&buf[m]);
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBody::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (body[i] >= 0) {
quat = bonus[body[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
m += bptr->unpack_comm_body(&bonus[body[i]],&buf[m]);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBody::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (body[i] >= 0) {
quat = bonus[body[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
m += bptr->unpack_comm_body(&bonus[body[i]],&buf[m]);
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::unpack_comm_hybrid(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++)
if (body[i] >= 0) {
quat = bonus[body[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
m += bptr->unpack_comm_body(&bonus[body[i]],&buf[m]);
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_reverse_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBody::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::unpack_reverse_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
double *quat,*inertia;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (body[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[body[j]].quat;
inertia = bonus[body[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[body[j]].ninteger).d;
buf[m++] = ubuf(bonus[body[j]].ndouble).d;
m += bptr->pack_border_body(&bonus[body[j]],&buf[m]);
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (body[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[body[j]].quat;
inertia = bonus[body[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[body[j]].ninteger).d;
buf[m++] = ubuf(bonus[body[j]].ndouble).d;
m += bptr->pack_border_body(&bonus[body[j]],&buf[m]);
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
double *quat,*inertia;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (body[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[body[j]].quat;
inertia = bonus[body[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[body[j]].ninteger).d;
buf[m++] = ubuf(bonus[body[j]].ndouble).d;
m += bptr->pack_border_body(&bonus[body[j]],&buf[m]);
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (body[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[body[j]].quat;
inertia = bonus[body[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[body[j]].ninteger).d;
buf[m++] = ubuf(bonus[body[j]].ndouble).d;
m += bptr->pack_border_body(&bonus[body[j]],&buf[m]);
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (body[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[body[j]].quat;
inertia = bonus[body[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[body[j]].ninteger).d;
buf[m++] = ubuf(bonus[body[j]].ndouble).d;
m += bptr->pack_border_body(&bonus[body[j]],&buf[m]);
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
double *quat,*inertia;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (body[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[body[j]].quat;
inertia = bonus[body[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[body[j]].ninteger).d;
buf[m++] = ubuf(bonus[body[j]].ndouble).d;
m += bptr->pack_border_body(&bonus[body[j]],&buf[m]);
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBody::unpack_border(int n, int first, double *buf)
{
int i,j,m,last;
double *quat,*inertia;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
body[i] = (int) ubuf(buf[m++]).i;
if (body[i] == 0) body[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
quat = bonus[j].quat;
inertia = bonus[j].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[j].ninteger = (int) ubuf(buf[m++]).i;
bonus[j].ndouble = (int) ubuf(buf[m++]).i;
// corresponding put() calls are in clear_bonus()
bonus[j].ivalue = icp->get(bonus[j].ninteger,bonus[j].iindex);
bonus[j].dvalue = dcp->get(bonus[j].ndouble,bonus[j].dindex);
m += bptr->unpack_border_body(&bonus[j],&buf[m]);
bonus[j].ilocal = i;
body[i] = j;
nghost_bonus++;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecBody::unpack_border_vel(int n, int first, double *buf)
{
int i,j,m,last;
double *quat,*inertia;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
body[i] = (int) ubuf(buf[m++]).i;
if (body[i] == 0) body[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
quat = bonus[j].quat;
inertia = bonus[j].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[j].ninteger = (int) ubuf(buf[m++]).i;
bonus[j].ndouble = (int) ubuf(buf[m++]).i;
// corresponding put() calls are in clear_bonus()
bonus[j].ivalue = icp->get(bonus[j].ninteger,bonus[j].iindex);
bonus[j].dvalue = dcp->get(bonus[j].ndouble,bonus[j].dindex);
m += bptr->unpack_border_body(&bonus[j],&buf[m]);
bonus[j].ilocal = i;
body[i] = j;
nghost_bonus++;
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::unpack_border_hybrid(int n, int first, double *buf)
{
int i,j,m,last;
double *quat,*inertia;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
radius[i] = buf[m++];
rmass[i] = buf[m++];
body[i] = (int) ubuf(buf[m++]).i;
if (body[i] == 0) body[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
quat = bonus[j].quat;
inertia = bonus[j].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[j].ninteger = (int) ubuf(buf[m++]).i;
bonus[j].ndouble = (int) ubuf(buf[m++]).i;
// corresponding put() calls are in clear_bonus()
bonus[j].ivalue = icp->get(bonus[j].ninteger,bonus[j].iindex);
bonus[j].dvalue = dcp->get(bonus[j].ndouble,bonus[j].dindex);
m += bptr->unpack_border_body(&bonus[j],&buf[m]);
bonus[j].ilocal = i;
body[i] = j;
nghost_bonus++;
}
}
return m;
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecBody::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = radius[i];
buf[m++] = rmass[i];
buf[m++] = angmom[i][0];
buf[m++] = angmom[i][1];
buf[m++] = angmom[i][2];
if (body[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = body[i];
double *quat = bonus[j].quat;
double *inertia = bonus[j].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[j].ninteger).d;
buf[m++] = ubuf(bonus[j].ndouble).d;
memcpy(&buf[m],bonus[j].ivalue,bonus[j].ninteger*sizeof(int));
if (intdoubleratio == 1) m += bonus[j].ninteger;
else m += (bonus[j].ninteger+1)/2;
memcpy(&buf[m],bonus[j].dvalue,bonus[j].ndouble*sizeof(double));
m += bonus[j].ndouble;
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBody::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
radius[nlocal] = buf[m++];
rmass[nlocal] = buf[m++];
angmom[nlocal][0] = buf[m++];
angmom[nlocal][1] = buf[m++];
angmom[nlocal][2] = buf[m++];
body[nlocal] = (int) ubuf(buf[m++]).i;
if (body[nlocal] == 0) body[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *quat = bonus[nlocal_bonus].quat;
double *inertia = bonus[nlocal_bonus].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[nlocal_bonus].ninteger = (int) ubuf(buf[m++]).i;
bonus[nlocal_bonus].ndouble = (int) ubuf(buf[m++]).i;
// corresponding put() calls are in copy()
bonus[nlocal_bonus].ivalue = icp->get(bonus[nlocal_bonus].ninteger,
bonus[nlocal_bonus].iindex);
bonus[nlocal_bonus].dvalue = dcp->get(bonus[nlocal_bonus].ndouble,
bonus[nlocal_bonus].dindex);
memcpy(bonus[nlocal_bonus].ivalue,&buf[m],
bonus[nlocal_bonus].ninteger*sizeof(int));
if (intdoubleratio == 1) m += bonus[nlocal_bonus].ninteger;
else m += (bonus[nlocal_bonus].ninteger+1)/2;
memcpy(bonus[nlocal_bonus].dvalue,&buf[m],
bonus[nlocal_bonus].ndouble*sizeof(double));
m += bonus[nlocal_bonus].ndouble;
bonus[nlocal_bonus].ilocal = nlocal;
body[nlocal] = nlocal_bonus++;
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
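/* ----------------------------------------------------------------------
illustrative sketch, not part of the LAMMPS sources: the memcpy and
intdoubleratio logic in pack_exchange()/unpack_exchange() above piggybacks
a variable-length int array on the double-valued buffer.  The ints are
copied bit-for-bit into the next slots and the cursor advances by however
many doubles those ints occupy: one int per double when the sizes match,
otherwise two per double, rounded up.  The helper name is an assumption.
------------------------------------------------------------------------- */
static int pack_ints_sketch(double *buf, int m, const int *ivalue, int ninteger,
                            int intdoubleratio)
{
  memcpy(&buf[m],ivalue,ninteger*sizeof(int));   // raw copy of the int bytes
  if (intdoubleratio == 1) m += ninteger;        // one int fits per double slot
  else m += (ninteger+1)/2;                      // two ints per double slot
  return m;                                      // advanced buffer cursor
}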
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecBody::size_restart()
{
int i;
int n = 0;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
if (body[i] >= 0) {
n += 26;
if (intdoubleratio == 1) n += bonus[body[i]].ninteger;
else n += (bonus[body[i]].ninteger+1)/2;
n += bonus[body[i]].ndouble;
} else n += 17;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecBody::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = radius[i];
buf[m++] = rmass[i];
buf[m++] = angmom[i][0];
buf[m++] = angmom[i][1];
buf[m++] = angmom[i][2];
if (body[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = body[i];
double *quat = bonus[j].quat;
double *inertia = bonus[j].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
buf[m++] = ubuf(bonus[j].ninteger).d;
buf[m++] = ubuf(bonus[j].ndouble).d;
memcpy(&buf[m],bonus[j].ivalue,bonus[j].ninteger*sizeof(int));
if (intdoubleratio == 1) m += bonus[j].ninteger;
else m += (bonus[j].ninteger+1)/2;
memcpy(&buf[m],bonus[j].dvalue,bonus[j].ndouble*sizeof(double));
m += bonus[j].ndouble;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecBody::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
radius[nlocal] = buf[m++];
rmass[nlocal] = buf[m++];
angmom[nlocal][0] = buf[m++];
angmom[nlocal][1] = buf[m++];
angmom[nlocal][2] = buf[m++];
body[nlocal] = (int) ubuf(buf[m++]).i;
if (body[nlocal] == 0) body[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *quat = bonus[nlocal_bonus].quat;
double *inertia = bonus[nlocal_bonus].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[nlocal_bonus].ninteger = (int) ubuf(buf[m++]).i;
bonus[nlocal_bonus].ndouble = (int) ubuf(buf[m++]).i;
bonus[nlocal_bonus].ivalue = icp->get(bonus[nlocal_bonus].ninteger,
bonus[nlocal_bonus].iindex);
bonus[nlocal_bonus].dvalue = dcp->get(bonus[nlocal_bonus].ndouble,
bonus[nlocal_bonus].dindex);
memcpy(bonus[nlocal_bonus].ivalue,&buf[m],
bonus[nlocal_bonus].ninteger*sizeof(int));
if (intdoubleratio == 1) m += bonus[nlocal_bonus].ninteger;
else m += (bonus[nlocal_bonus].ninteger+1)/2;
memcpy(bonus[nlocal_bonus].dvalue,&buf[m],
bonus[nlocal_bonus].ndouble*sizeof(double));
m += bonus[nlocal_bonus].ndouble;
bonus[nlocal_bonus].ilocal = nlocal;
body[nlocal] = nlocal_bonus++;
}
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecBody::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
radius[nlocal] = 0.5;
rmass[nlocal] = 1.0;
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
body[nlocal] = -1;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecBody::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
body[nlocal] = atoi(values[2]);
if (body[nlocal] == 0) body[nlocal] = -1;
else if (body[nlocal] == 1) body[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[3]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
radius[nlocal] = 0.5;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecBody::data_atom_hybrid(int nlocal, char **values)
{
body[nlocal] = atoi(values[0]);
if (body[nlocal] == 0) body[nlocal] = -1;
else if (body[nlocal] == 1) body[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[1]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
return 2;
}
/* ----------------------------------------------------------------------
unpack one body from Bodies section of data file
------------------------------------------------------------------------- */
void AtomVecBody::data_body(int m, int ninteger, int ndouble,
int *ivalues, double *dvalues)
{
if (body[m]) error->one(FLERR,"Assigning body parameters to non-body atom");
if (nlocal_bonus == nmax_bonus) grow_bonus();
bonus[nlocal_bonus].ilocal = m;
bptr->data_body(nlocal_bonus,ninteger,ndouble,ivalues,dvalues);
body[m] = nlocal_bonus++;
}
/* ----------------------------------------------------------------------
unpack one body from Velocities section of data file
------------------------------------------------------------------------- */
void AtomVecBody::data_vel(int m, char **values)
{
v[m][0] = atof(values[0]);
v[m][1] = atof(values[1]);
v[m][2] = atof(values[2]);
angmom[m][0] = atof(values[3]);
angmom[m][1] = atof(values[4]);
angmom[m][2] = atof(values[5]);
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one body in Velocities section of data file
------------------------------------------------------------------------- */
int AtomVecBody::data_vel_hybrid(int m, char **values)
{
angmom[m][0] = atof(values[0]);
angmom[m][1] = atof(values[1]);
angmom[m][2] = atof(values[2]);
return 3;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecBody::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(type[i]).d;
if (body[i] < 0) buf[i][2] = ubuf(0).d;
else buf[i][2] = ubuf(1).d;
buf[i][3] = rmass[i];
buf[i][4] = x[i][0];
buf[i][5] = x[i][1];
buf[i][6] = x[i][2];
buf[i][7] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][8] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][9] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecBody::pack_data_hybrid(int i, double *buf)
{
if (body[i] < 0) buf[0] = ubuf(0).d;
else buf[0] = ubuf(1).d;
buf[1] = rmass[i];
return 2;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecBody::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT " %d %d %g %g %g %g %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(int) ubuf(buf[i][1]).i,
(int) ubuf(buf[i][2]).i,
buf[i][3],buf[i][4],buf[i][5],buf[i][6],
(int) ubuf(buf[i][7]).i,(int) ubuf(buf[i][8]).i,
(int) ubuf(buf[i][9]).i);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecBody::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %d %g",(int) ubuf(buf[0]).i,buf[1]);
return 2;
}
/* ----------------------------------------------------------------------
pack velocity info for data file
------------------------------------------------------------------------- */
void AtomVecBody::pack_vel(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = v[i][0];
buf[i][2] = v[i][1];
buf[i][3] = v[i][2];
buf[i][4] = angmom[i][0];
buf[i][5] = angmom[i][1];
buf[i][6] = angmom[i][2];
}
}
/* ----------------------------------------------------------------------
pack hybrid velocity info for data file
------------------------------------------------------------------------- */
int AtomVecBody::pack_vel_hybrid(int i, double *buf)
{
buf[0] = angmom[i][0];
buf[1] = angmom[i][1];
buf[2] = angmom[i][2];
return 3;
}
/* ----------------------------------------------------------------------
write velocity info to data file
------------------------------------------------------------------------- */
void AtomVecBody::write_vel(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT " %g %g %g %g %g %g\n",
(tagint) ubuf(buf[i][0]).i,buf[i][1],buf[i][2],buf[i][3],
buf[i][4],buf[i][5],buf[i][6]);
}
/* ----------------------------------------------------------------------
write hybrid velocity info to data file
------------------------------------------------------------------------- */
int AtomVecBody::write_vel_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %g %g %g",buf[0],buf[1],buf[2]);
return 3;
}
/* ----------------------------------------------------------------------
body computes its size based on ivalues/dvalues and returns it
------------------------------------------------------------------------- */
double AtomVecBody::radius_body(int ninteger, int ndouble,
int *ivalues, double *dvalues)
{
return bptr->radius_body(ninteger,ndouble,ivalues,dvalues);
}
/* ----------------------------------------------------------------------
reset quat orientation for atom M to quat_external
called by Atom::add_molecule_atom()
------------------------------------------------------------------------- */
void AtomVecBody::set_quat(int m, double *quat_external)
{
if (body[m] < 0) error->one(FLERR,"Assigning quat to non-body atom");
double *quat = bonus[body[m]].quat;
quat[0] = quat_external[0]; quat[1] = quat_external[1];
quat[2] = quat_external[2]; quat[3] = quat_external[3];
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecBody::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
if (atom->memcheck("radius")) bytes += memory->usage(radius,nmax);
if (atom->memcheck("rmass")) bytes += memory->usage(rmass,nmax);
if (atom->memcheck("angmom")) bytes += memory->usage(angmom,nmax,3);
if (atom->memcheck("torque")) bytes +=
memory->usage(torque,nmax*comm->nthreads,3);
if (atom->memcheck("body")) bytes += memory->usage(body,nmax);
bytes += nmax_bonus*sizeof(Bonus);
bytes += icp->size + dcp->size;
int nall = nlocal_bonus + nghost_bonus;
for (int i = 0; i < nall; i++) {
bytes += bonus[i].ninteger * sizeof(int);
bytes += bonus[i].ndouble * sizeof(double);
}
return bytes;
}
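/* illustrative note: besides the per-atom arrays, the body style also counts
   nmax_bonus Bonus structs, the int/double pools held by the icp/dcp
   allocators, and the per-body ivalues/dvalues of every owned and ghost
   bonus entry (the loop over nall above). */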
/* ----------------------------------------------------------------------
debug method for sanity checking of own/bonus data pointers
------------------------------------------------------------------------- */
/*
void AtomVecBody::check(int flag)
{
for (int i = 0; i < atom->nlocal; i++) {
if (atom->body[i] >= 0 && atom->body[i] >= nlocal_bonus) {
printf("Proc %d, step %ld, flag %d\n",comm->me,update->ntimestep,flag);
errorx->one(FLERR,"BAD AAA");
}
}
for (int i = atom->nlocal; i < atom->nlocal+atom->nghost; i++) {
if (atom->body[i] >= 0 &&
(atom->body[i] < nlocal_bonus ||
atom->body[i] >= nlocal_bonus+nghost_bonus)) {
printf("Proc %d, step %ld, flag %d\n",comm->me,update->ntimestep,flag);
errorx->one(FLERR,"BAD BBB");
}
}
for (int i = 0; i < nlocal_bonus; i++) {
if (bonus[i].ilocal < 0 || bonus[i].ilocal >= atom->nlocal) {
printf("Proc %d, step %ld, flag %d\n",comm->me,update->ntimestep,flag);
errorx->one(FLERR,"BAD CCC");
}
}
for (int i = 0; i < nlocal_bonus; i++) {
if (atom->body[bonus[i].ilocal] != i) {
printf("Proc %d, step %ld, flag %d\n",comm->me,update->ntimestep,flag);
errorx->one(FLERR,"BAD DDD");
}
}
for (int i = nlocal_bonus; i < nlocal_bonus+nghost_bonus; i++) {
if (bonus[i].ilocal < atom->nlocal ||
bonus[i].ilocal >= atom->nlocal+atom->nghost) {
printf("Proc %d, step %ld, flag %d\n",comm->me,update->ntimestep,flag);
errorx->one(FLERR,"BAD EEE");
}
}
for (int i = nlocal_bonus; i < nlocal_bonus+nghost_bonus; i++) {
if (atom->body[bonus[i].ilocal] != i) {
printf("Proc %d, step %ld, flag %d\n",comm->me,update->ntimestep,flag);
errorx->one(FLERR,"BAD FFF");
}
}
}
*/
diff --git a/src/atom_vec_charge.cpp b/src/atom_vec_charge.cpp
index 08c3186a4..a93a29662 100644
--- a/src/atom_vec_charge.cpp
+++ b/src/atom_vec_charge.cpp
@@ -1,771 +1,771 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_charge.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
AtomVecCharge::AtomVecCharge(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->q_flag = 1;
}
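/* illustrative note: the counters set above describe per-atom buffer and
   data-file layouts for this style: size_border = 7 is x,y,z + tag + type +
   mask + q (see pack_border below), size_data_atom = 6 matches an Atoms line
   of "id type q x y z", and xcol_data = 4 says the x coordinate is the 4th
   column of that line.  A hypothetical Atoms line for atom_style charge
   would therefore look like

     1 1 -1.0 0.0 0.0 0.0

   i.e. atom 1, type 1, charge -1, at the origin (example values only). */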
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecCharge::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
q = memory->grow(atom->q,nmax,"atom:q");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecCharge::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
q = atom->q;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecCharge::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
q[j] = q[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecCharge::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecCharge::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecCharge::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
}
}
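/* illustrative note: pack_reverse/unpack_reverse are the reverse of the
   forward ghost exchange: forces accumulated on ghost copies (indices
   first..first+n-1) are shipped back and summed (+=) into the owning atoms
   listed in list[].  That is why size_reverse = 3 (fx,fy,fz) for this
   style. */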
/* ---------------------------------------------------------------------- */
int AtomVecCharge::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = q[j];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = q[j];
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = q[j];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = q[j];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = q[j];
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = q[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecCharge::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
q[i] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecCharge::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
q[i] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
q[i] = buf[m++];
return m;
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecCharge::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = q[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecCharge::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
q[nlocal] = buf[m++];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
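/* illustrative note: exchange buffers are self-describing: pack_exchange
   writes the per-atom length into buf[0] after packing (including anything
   appended by fixes) and returns it, so a caller can walk a stream of packed
   atoms without knowing their sizes in advance.  A rough sketch of such a
   receive loop, assuming nrecv doubles arrived in buf and avec is the atom
   style instance:

     int m = 0;
     while (m < nrecv) m += avec->unpack_exchange(&buf[m]);
*/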
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecCharge::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 12 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecCharge::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = q[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecCharge::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
q[nlocal] = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecCharge::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
q[nlocal] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecCharge::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
q[nlocal] = atof(values[2]);
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecCharge::data_atom_hybrid(int nlocal, char **values)
{
q[nlocal] = atof(values[0]);
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecCharge::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(type[i]).d;
buf[i][2] = q[i];
buf[i][3] = x[i][0];
buf[i][4] = x[i][1];
buf[i][5] = x[i][2];
buf[i][6] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][7] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][8] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecCharge::pack_data_hybrid(int i, double *buf)
{
buf[0] = q[i];
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecCharge::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT " %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(int) ubuf(buf[i][1]).i,
buf[i][2],buf[i][3],buf[i][4],buf[i][5],
(int) ubuf(buf[i][6]).i,(int) ubuf(buf[i][7]).i,
(int) ubuf(buf[i][8]).i);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecCharge::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e",buf[0]);
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecCharge::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
if (atom->memcheck("q")) bytes += memory->usage(q,nmax);
return bytes;
}
diff --git a/src/atom_vec_ellipsoid.cpp b/src/atom_vec_ellipsoid.cpp
index 4d1dc01c0..858b89d62 100644
--- a/src/atom_vec_ellipsoid.cpp
+++ b/src/atom_vec_ellipsoid.cpp
@@ -1,1401 +1,1401 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Mike Brown (SNL)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_ellipsoid.h"
#include "math_extra.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
/* ---------------------------------------------------------------------- */
AtomVecEllipsoid::AtomVecEllipsoid(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
comm_x_only = comm_f_only = 0;
size_forward = 7;
size_reverse = 6;
size_border = 14;
size_velocity = 6;
size_data_atom = 7;
size_data_vel = 7;
size_data_bonus = 8;
xcol_data = 5;
atom->ellipsoid_flag = 1;
atom->rmass_flag = atom->angmom_flag = atom->torque_flag = 1;
nlocal_bonus = nghost_bonus = nmax_bonus = 0;
bonus = NULL;
}
/* ---------------------------------------------------------------------- */
AtomVecEllipsoid::~AtomVecEllipsoid()
{
memory->sfree(bonus);
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecEllipsoid::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
rmass = memory->grow(atom->rmass,nmax,"atom:rmass");
angmom = memory->grow(atom->angmom,nmax,3,"atom:angmom");
torque = memory->grow(atom->torque,nmax*comm->nthreads,3,"atom:torque");
ellipsoid = memory->grow(atom->ellipsoid,nmax,"atom:ellipsoid");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecEllipsoid::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
rmass = atom->rmass; angmom = atom->angmom; torque = atom->torque;
ellipsoid = atom->ellipsoid;
}
/* ----------------------------------------------------------------------
grow bonus data structure
------------------------------------------------------------------------- */
void AtomVecEllipsoid::grow_bonus()
{
nmax_bonus = grow_nmax_bonus(nmax_bonus);
if (nmax_bonus < 0)
error->one(FLERR,"Per-processor system is too big");
bonus = (Bonus *) memory->srealloc(bonus,nmax_bonus*sizeof(Bonus),
"atom:bonus");
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecEllipsoid::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
rmass[j] = rmass[i];
angmom[j][0] = angmom[i][0];
angmom[j][1] = angmom[i][1];
angmom[j][2] = angmom[i][2];
// if deleting atom J via delflag and J has bonus data, then delete it
if (delflag && ellipsoid[j] >= 0) {
copy_bonus(nlocal_bonus-1,ellipsoid[j]);
nlocal_bonus--;
}
// if atom I has bonus data, reset I's bonus.ilocal to loc J
// do NOT do this if self-copy (I=J) since I's bonus data is already deleted
if (ellipsoid[i] >= 0 && i != j) bonus[ellipsoid[i]].ilocal = j;
ellipsoid[j] = ellipsoid[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ----------------------------------------------------------------------
copy bonus data from I to J, effectively deleting the J entry
also reset ellipsoid that points to I to now point to J
------------------------------------------------------------------------- */
void AtomVecEllipsoid::copy_bonus(int i, int j)
{
ellipsoid[bonus[i].ilocal] = j;
memcpy(&bonus[j],&bonus[i],sizeof(Bonus));
}
/* ----------------------------------------------------------------------
clear ghost info in bonus data
called before ghosts are recommunicated in comm and irregular
------------------------------------------------------------------------- */
void AtomVecEllipsoid::clear_bonus()
{
nghost_bonus = 0;
}
/* ----------------------------------------------------------------------
set shape values in bonus data for particle I
oriented aligned with xyz axes
this may create or delete entry in bonus data
------------------------------------------------------------------------- */
void AtomVecEllipsoid::set_shape(int i,
double shapex, double shapey, double shapez)
{
if (ellipsoid[i] < 0) {
if (shapex == 0.0 && shapey == 0.0 && shapez == 0.0) return;
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *shape = bonus[nlocal_bonus].shape;
double *quat = bonus[nlocal_bonus].quat;
shape[0] = shapex;
shape[1] = shapey;
shape[2] = shapez;
quat[0] = 1.0;
quat[1] = 0.0;
quat[2] = 0.0;
quat[3] = 0.0;
bonus[nlocal_bonus].ilocal = i;
ellipsoid[i] = nlocal_bonus++;
} else if (shapex == 0.0 && shapey == 0.0 && shapez == 0.0) {
copy_bonus(nlocal_bonus-1,ellipsoid[i]);
nlocal_bonus--;
ellipsoid[i] = -1;
} else {
double *shape = bonus[ellipsoid[i]].shape;
shape[0] = shapex;
shape[1] = shapey;
shape[2] = shapez;
}
}
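/* illustrative note: ellipsoid[i] is an index into the bonus[] array, or -1
   for a point particle with no shape/orientation.  set_shape() creates a
   bonus entry on demand (an all-zero shape means "no entry") and deletes one
   by copying the last bonus entry into the freed slot (copy_bonus above), so
   the bonus array stays densely packed. */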
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
double *quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (ellipsoid[j] >= 0) {
quat = bonus[ellipsoid[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (ellipsoid[j] >= 0) {
quat = bonus[ellipsoid[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
double *quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (ellipsoid[j] >= 0) {
quat = bonus[ellipsoid[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (ellipsoid[j] >= 0) {
quat = bonus[ellipsoid[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (ellipsoid[j] >= 0) {
quat = bonus[ellipsoid[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_comm_hybrid(int n, int *list, double *buf)
{
int i,j,m;
double *quat;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
if (ellipsoid[j] >= 0) {
quat = bonus[ellipsoid[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecEllipsoid::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (ellipsoid[i] >= 0) {
quat = bonus[ellipsoid[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecEllipsoid::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (ellipsoid[i] >= 0) {
quat = bonus[ellipsoid[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::unpack_comm_hybrid(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (ellipsoid[i] >= 0) {
quat = bonus[ellipsoid[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_reverse_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecEllipsoid::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::unpack_reverse_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
double *shape,*quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (ellipsoid[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
shape = bonus[ellipsoid[j]].shape;
quat = bonus[ellipsoid[j]].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (ellipsoid[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
shape = bonus[ellipsoid[j]].shape;
quat = bonus[ellipsoid[j]].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
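/* illustrative note: border messages for this style are variable length: a
   0/1 flag says whether shape+quat follow, so only true ellipsoids pay for
   the 7 extra doubles.  size_border = 14 set in the constructor
   (3 coords + tag + type + mask + flag + 3 shape + 4 quat) is therefore an
   upper bound used for buffer sizing, and comm_x_only = 0 because ghosts
   also need their quaternions forwarded each step. */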
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
double *shape,*quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (ellipsoid[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
shape = bonus[ellipsoid[j]].shape;
quat = bonus[ellipsoid[j]].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (ellipsoid[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
shape = bonus[ellipsoid[j]].shape;
quat = bonus[ellipsoid[j]].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (ellipsoid[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
shape = bonus[ellipsoid[j]].shape;
quat = bonus[ellipsoid[j]].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
double *shape,*quat;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
if (ellipsoid[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
shape = bonus[ellipsoid[j]].shape;
quat = bonus[ellipsoid[j]].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecEllipsoid::unpack_border(int n, int first, double *buf)
{
int i,j,m,last;
double *shape,*quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
ellipsoid[i] = (int) ubuf(buf[m++]).i;
if (ellipsoid[i] == 0) ellipsoid[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
shape = bonus[j].shape;
quat = bonus[j].quat;
shape[0] = buf[m++];
shape[1] = buf[m++];
shape[2] = buf[m++];
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
bonus[j].ilocal = i;
ellipsoid[i] = j;
nghost_bonus++;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecEllipsoid::unpack_border_vel(int n, int first, double *buf)
{
int i,j,m,last;
double *shape,*quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
ellipsoid[i] = (int) ubuf(buf[m++]).i;
if (ellipsoid[i] == 0) ellipsoid[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
shape = bonus[j].shape;
quat = bonus[j].quat;
shape[0] = buf[m++];
shape[1] = buf[m++];
shape[2] = buf[m++];
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
bonus[j].ilocal = i;
ellipsoid[i] = j;
nghost_bonus++;
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::unpack_border_hybrid(int n, int first, double *buf)
{
int i,j,m,last;
double *shape,*quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
ellipsoid[i] = (int) ubuf(buf[m++]).i;
if (ellipsoid[i] == 0) ellipsoid[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
shape = bonus[j].shape;
quat = bonus[j].quat;
shape[0] = buf[m++];
shape[1] = buf[m++];
shape[2] = buf[m++];
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
bonus[j].ilocal = i;
ellipsoid[i] = j;
nghost_bonus++;
}
}
return m;
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = rmass[i];
buf[m++] = angmom[i][0];
buf[m++] = angmom[i][1];
buf[m++] = angmom[i][2];
if (ellipsoid[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = ellipsoid[i];
double *shape = bonus[j].shape;
double *quat = bonus[j].quat;
buf[m++] = shape[0];
buf[m++] = shape[1];
buf[m++] = shape[2];
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecEllipsoid::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
rmass[nlocal] = buf[m++];
angmom[nlocal][0] = buf[m++];
angmom[nlocal][1] = buf[m++];
angmom[nlocal][2] = buf[m++];
ellipsoid[nlocal] = (int) ubuf(buf[m++]).i;
if (ellipsoid[nlocal] == 0) ellipsoid[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *shape = bonus[nlocal_bonus].shape;
double *quat = bonus[nlocal_bonus].quat;
shape[0] = buf[m++];
shape[1] = buf[m++];
shape[2] = buf[m++];
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
bonus[nlocal_bonus].ilocal = nlocal;
ellipsoid[nlocal] = nlocal_bonus++;
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecEllipsoid::size_restart()
{
int i;
int n = 0;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
if (ellipsoid[i] >= 0) n += 23;
else n += 16;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
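/* illustrative note: the per-atom restart counts above break down as
   1 (length slot) + 3 x + tag + type + mask + image + 3 v + rmass +
   3 angmom + 1 bonus flag = 16 doubles, plus 3 shape + 4 quat = 23 when the
   atom carries an ellipsoid bonus entry (cf. pack_restart below). */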
/* ----------------------------------------------------------------------
pack atom I's data for restart file including bonus data
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = rmass[i];
buf[m++] = angmom[i][0];
buf[m++] = angmom[i][1];
buf[m++] = angmom[i][2];
if (ellipsoid[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = ellipsoid[i];
buf[m++] = bonus[j].shape[0];
buf[m++] = bonus[j].shape[1];
buf[m++] = bonus[j].shape[2];
buf[m++] = bonus[j].quat[0];
buf[m++] = bonus[j].quat[1];
buf[m++] = bonus[j].quat[2];
buf[m++] = bonus[j].quat[3];
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including bonus data
------------------------------------------------------------------------- */
int AtomVecEllipsoid::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
rmass[nlocal] = buf[m++];
angmom[nlocal][0] = buf[m++];
angmom[nlocal][1] = buf[m++];
angmom[nlocal][2] = buf[m++];
ellipsoid[nlocal] = (int) ubuf(buf[m++]).i;
if (ellipsoid[nlocal] == 0) ellipsoid[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *shape = bonus[nlocal_bonus].shape;
double *quat = bonus[nlocal_bonus].quat;
shape[0] = buf[m++];
shape[1] = buf[m++];
shape[2] = buf[m++];
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
bonus[nlocal_bonus].ilocal = nlocal;
ellipsoid[nlocal] = nlocal_bonus++;
}
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecEllipsoid::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
rmass[nlocal] = 1.0;
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
ellipsoid[nlocal] = -1;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecEllipsoid::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
ellipsoid[nlocal] = atoi(values[2]);
if (ellipsoid[nlocal] == 0) ellipsoid[nlocal] = -1;
else if (ellipsoid[nlocal] == 1) ellipsoid[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[3]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecEllipsoid::data_atom_hybrid(int nlocal, char **values)
{
ellipsoid[nlocal] = atoi(values[0]);
if (ellipsoid[nlocal] == 0) ellipsoid[nlocal] = -1;
else if (ellipsoid[nlocal] == 1) ellipsoid[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[1]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
return 2;
}
/* ----------------------------------------------------------------------
unpack one line from Ellipsoids section of data file
------------------------------------------------------------------------- */
void AtomVecEllipsoid::data_atom_bonus(int m, char **values)
{
if (ellipsoid[m])
error->one(FLERR,"Assigning ellipsoid parameters to non-ellipsoid atom");
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *shape = bonus[nlocal_bonus].shape;
shape[0] = 0.5 * atof(values[0]);
shape[1] = 0.5 * atof(values[1]);
shape[2] = 0.5 * atof(values[2]);
if (shape[0] <= 0.0 || shape[1] <= 0.0 || shape[2] <= 0.0)
error->one(FLERR,"Invalid shape in Ellipsoids section of data file");
double *quat = bonus[nlocal_bonus].quat;
quat[0] = atof(values[3]);
quat[1] = atof(values[4]);
quat[2] = atof(values[5]);
quat[3] = atof(values[6]);
MathExtra::qnormalize(quat);
// reset ellipsoid mass
// previously stored density in rmass
rmass[m] *= 4.0*MY_PI/3.0 * shape[0]*shape[1]*shape[2];
bonus[nlocal_bonus].ilocal = m;
ellipsoid[m] = nlocal_bonus++;
}
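/* illustrative note: the Ellipsoids section stores full diameters and the
   Atoms section a density; the code above halves the diameters into
   semi-axes a,b,c, normalizes the quaternion, and converts density to mass
   via m = rho * 4/3 * pi * a*b*c.  For example (hypothetical values),
   diameters 1 1 1 with density 1.0 give semi-axes 0.5 and mass
   4/3 * pi * 0.125 ~ 0.5236.  pack_data()/pack_data_hybrid() invert this on
   output so the data file again holds a density. */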
/* ----------------------------------------------------------------------
unpack one line from Velocities section of data file
------------------------------------------------------------------------- */
void AtomVecEllipsoid::data_vel(int m, char **values)
{
v[m][0] = atof(values[0]);
v[m][1] = atof(values[1]);
v[m][2] = atof(values[2]);
angmom[m][0] = atof(values[3]);
angmom[m][1] = atof(values[4]);
angmom[m][2] = atof(values[5]);
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Velocities section of data file
------------------------------------------------------------------------- */
int AtomVecEllipsoid::data_vel_hybrid(int m, char **values)
{
angmom[m][0] = atof(values[0]);
angmom[m][1] = atof(values[1]);
angmom[m][2] = atof(values[2]);
return 3;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecEllipsoid::pack_data(double **buf)
{
double *shape;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(type[i]).d;
if (ellipsoid[i] < 0) buf[i][2] = ubuf(0).d;
else buf[i][2] = ubuf(1).d;
if (ellipsoid[i] < 0) buf[i][3] = rmass[i];
else {
shape = bonus[ellipsoid[i]].shape;
buf[i][3] = rmass[i] / (4.0*MY_PI/3.0 * shape[0]*shape[1]*shape[2]);
}
buf[i][4] = x[i][0];
buf[i][5] = x[i][1];
buf[i][6] = x[i][2];
buf[i][7] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][8] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][9] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_data_hybrid(int i, double *buf)
{
if (ellipsoid[i] < 0) buf[0] = ubuf(0).d;
else buf[0] = ubuf(1).d;
if (ellipsoid[i] < 0) buf[1] = rmass[i];
else {
double *shape = bonus[ellipsoid[i]].shape;
buf[1] = rmass[i] / (4.0*MY_PI/3.0 * shape[0]*shape[1]*shape[2]);
}
return 2;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecEllipsoid::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT
" %d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(int) ubuf(buf[i][1]).i,
(int) ubuf(buf[i][2]).i,
buf[i][3],buf[i][4],buf[i][5],buf[i][6],
(int) ubuf(buf[i][7]).i,(int) ubuf(buf[i][8]).i,
(int) ubuf(buf[i][9]).i);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecEllipsoid::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %d %-1.16e",(int) ubuf(buf[0]).i,buf[1]);
return 2;
}
/* ----------------------------------------------------------------------
pack velocity info for data file
------------------------------------------------------------------------- */
void AtomVecEllipsoid::pack_vel(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = v[i][0];
buf[i][2] = v[i][1];
buf[i][3] = v[i][2];
buf[i][4] = angmom[i][0];
buf[i][5] = angmom[i][1];
buf[i][6] = angmom[i][2];
}
}
/* ----------------------------------------------------------------------
pack hybrid velocity info for data file
------------------------------------------------------------------------- */
int AtomVecEllipsoid::pack_vel_hybrid(int i, double *buf)
{
buf[0] = angmom[i][0];
buf[1] = angmom[i][1];
buf[2] = angmom[i][2];
return 3;
}
/* ----------------------------------------------------------------------
write velocity info to data file
------------------------------------------------------------------------- */
void AtomVecEllipsoid::write_vel(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT
" %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e\n",
(tagint) ubuf(buf[i][0]).i,buf[i][1],buf[i][2],buf[i][3],
buf[i][4],buf[i][5],buf[i][6]);
}
/* ----------------------------------------------------------------------
write hybrid velocity info to data file
------------------------------------------------------------------------- */
int AtomVecEllipsoid::write_vel_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e %-1.16e %-1.16e",buf[0],buf[1],buf[2]);
return 3;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecEllipsoid::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
if (atom->memcheck("rmass")) bytes += memory->usage(rmass,nmax);
if (atom->memcheck("angmom")) bytes += memory->usage(angmom,nmax,3);
if (atom->memcheck("torque"))
bytes += memory->usage(torque,nmax*comm->nthreads,3);
if (atom->memcheck("ellipsoid")) bytes += memory->usage(ellipsoid,nmax);
bytes += nmax_bonus*sizeof(Bonus);
return bytes;
}
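
pack_data() above unpacks the three periodic image flags from the single image word before they are written to the data file. A minimal standalone sketch (not part of the LAMMPS sources), assuming the default image-flag widths of 10 bits per dimension (IMGMAX = 512, IMGMASK = 1023, IMGBITS = 10, IMG2BITS = 20):

#include <cstdio>

enum { IMGMASK = 1023, IMGMAX = 512, IMGBITS = 10, IMG2BITS = 20 };

// pack three signed image flags into one word by offsetting each by IMGMAX
static int pack_image(int ix, int iy, int iz)
{
  return ((iz + IMGMAX) << IMG2BITS) |
         ((iy + IMGMAX) << IMGBITS) |
          (ix + IMGMAX);
}

int main()
{
  int image = pack_image(-1, 0, 2);
  // same unpacking expressions as in pack_data()
  int ix = (image & IMGMASK) - IMGMAX;
  int iy = (image >> IMGBITS & IMGMASK) - IMGMAX;
  int iz = (image >> IMG2BITS) - IMGMAX;
  printf("%d %d %d\n", ix, iy, iz);            // prints: -1 0 2
  return 0;
}
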
diff --git a/src/atom_vec_hybrid.cpp b/src/atom_vec_hybrid.cpp
index 7d34931b4..54bd78a83 100644
--- a/src/atom_vec_hybrid.cpp
+++ b/src/atom_vec_hybrid.cpp
@@ -1,1083 +1,1083 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <string.h>
#include "atom_vec_hybrid.h"
#include "atom.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
AtomVecHybrid::AtomVecHybrid(LAMMPS *lmp) : AtomVec(lmp) {}
/* ---------------------------------------------------------------------- */
AtomVecHybrid::~AtomVecHybrid()
{
for (int k = 0; k < nstyles; k++) delete styles[k];
delete [] styles;
for (int k = 0; k < nstyles; k++) delete [] keywords[k];
delete [] keywords;
}
/* ----------------------------------------------------------------------
process sub-style args
------------------------------------------------------------------------- */
void AtomVecHybrid::process_args(int narg, char **arg)
{
// build list of all known atom styles
build_styles();
// allocate list of sub-styles as large as could possibly be needed if there are no extra args
styles = new AtomVec*[narg];
keywords = new char*[narg];
// allocate each sub-style
// call process_args() with set of args that are not atom style names
// use known_style() to determine which args these are
int i,jarg,dummy;
int iarg = 0;
nstyles = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"hybrid") == 0)
error->all(FLERR,"Atom style hybrid cannot have hybrid as an argument");
for (i = 0; i < nstyles; i++)
if (strcmp(arg[iarg],keywords[i]) == 0)
error->all(FLERR,"Atom style hybrid cannot use same atom style twice");
styles[nstyles] = atom->new_avec(arg[iarg],1,dummy);
keywords[nstyles] = new char[strlen(arg[iarg])+1];
strcpy(keywords[nstyles],arg[iarg]);
jarg = iarg + 1;
while (jarg < narg && !known_style(arg[jarg])) jarg++;
styles[nstyles]->process_args(jarg-iarg-1,&arg[iarg+1]);
iarg = jarg;
nstyles++;
}
// free allstyles created by build_styles()
for (int i = 0; i < nallstyles; i++) delete [] allstyles[i];
delete [] allstyles;
// hybrid settings are MAX or MIN of sub-style settings
// hybrid sizes are minimal values plus extra values for each sub-style
molecular = 0;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 6;
size_data_atom = 5;
size_data_vel = 4;
xcol_data = 3;
for (int k = 0; k < nstyles; k++) {
if ((styles[k]->molecular == 1 && molecular == 2) ||
(styles[k]->molecular == 2 && molecular == 1))
error->all(FLERR,"Cannot mix molecular and molecule template "
"atom styles");
molecular = MAX(molecular,styles[k]->molecular);
bonds_allow = MAX(bonds_allow,styles[k]->bonds_allow);
angles_allow = MAX(angles_allow,styles[k]->angles_allow);
dihedrals_allow = MAX(dihedrals_allow,styles[k]->dihedrals_allow);
impropers_allow = MAX(impropers_allow,styles[k]->impropers_allow);
mass_type = MAX(mass_type,styles[k]->mass_type);
dipole_type = MAX(dipole_type,styles[k]->dipole_type);
forceclearflag = MAX(forceclearflag,styles[k]->forceclearflag);
if (styles[k]->molecular == 2) onemols = styles[k]->onemols;
comm_x_only = MIN(comm_x_only,styles[k]->comm_x_only);
comm_f_only = MIN(comm_f_only,styles[k]->comm_f_only);
size_forward += styles[k]->size_forward - 3;
size_reverse += styles[k]->size_reverse - 3;
size_border += styles[k]->size_border - 6;
size_data_atom += styles[k]->size_data_atom - 5;
size_data_vel += styles[k]->size_data_vel - 4;
}
size_velocity = 3;
if (atom->omega_flag) size_velocity += 3;
if (atom->angmom_flag) size_velocity += 3;
}
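
Each sub-style's size_* counts already include the base per-atom fields (3 values for forward communication, 6 for border communication, and so on), so the loop above adds only each sub-style's surplus over that base. A minimal standalone sketch with made-up sub-style numbers (not part of the LAMMPS sources):

#include <cstdio>

int main()
{
  // hypothetical sub-style border sizes; both include the 6 base values
  // (x[3], tag, type, mask) packed by AtomVecHybrid::pack_border() itself
  int sub_size_border[2] = {6, 12};

  int size_border = 6;                         // base fields
  for (int k = 0; k < 2; k++)
    size_border += sub_size_border[k] - 6;     // add only the surplus

  printf("hybrid size_border = %d\n", size_border);   // prints: 12
  return 0;
}
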
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::init()
{
AtomVec::init();
for (int k = 0; k < nstyles; k++) styles[k]->init();
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecHybrid::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
// sub-styles perform all reallocation
// turn off nextra_grow so hybrid can do that once below
int tmp = atom->nextra_grow;
atom->nextra_grow = 0;
for (int k = 0; k < nstyles; k++) styles[k]->grow(nmax);
atom->nextra_grow = tmp;
// ensure hybrid local ptrs and sub-style ptrs are up to date
// for sub-styles, do this in case
// multiple sub-style reallocs of same array occurred
grow_reset();
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecHybrid::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
omega = atom->omega; angmom = atom->angmom;
for (int k = 0; k < nstyles; k++) styles[k]->grow_reset();
}
/* ----------------------------------------------------------------------
copy atom I info to atom J for all sub-styles
------------------------------------------------------------------------- */
void AtomVecHybrid::copy(int i, int j, int delflag)
{
int tmp = atom->nextra_grow;
atom->nextra_grow = 0;
for (int k = 0; k < nstyles; k++) styles[k]->copy(i,j,delflag);
atom->nextra_grow = tmp;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::clear_bonus()
{
for (int k = 0; k < nstyles; k++) styles[k]->clear_bonus();
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::force_clear(int n, size_t nbytes)
{
for (int k = 0; k < nstyles; k++)
if (styles[k]->forceclearflag) styles[k]->force_clear(n,nbytes);
}
/* ---------------------------------------------------------------------- */
int AtomVecHybrid::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,k,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
}
}
// pack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_comm_hybrid(n,list,&buf[m]);
return m;
}
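
pack_comm() and its unpack counterpart rely on a fixed buffer layout: the base coordinates of all n atoms come first, then every sub-style appends its own per-atom extras as one contiguous chunk, with m carrying the running offset on both the sending and receiving side. A minimal standalone sketch (not part of the LAMMPS sources) with a single hypothetical sub-style quantity:

#include <cstdio>

// hypothetical sub-style packer: appends one value per atom, returns count
static int pack_extra_theta(int n, const double *theta, double *buf)
{
  for (int i = 0; i < n; i++) buf[i] = theta[i];
  return n;
}

int main()
{
  const int n = 2;
  double x[n][3] = {{0.0,0.0,0.0},{1.0,2.0,3.0}};
  double theta[n] = {0.5, 1.5};
  double buf[16];

  int m = 0;
  for (int i = 0; i < n; i++) {                // base chunk: coordinates
    buf[m++] = x[i][0];
    buf[m++] = x[i][1];
    buf[m++] = x[i][2];
  }
  m += pack_extra_theta(n, theta, &buf[m]);    // sub-style chunk appended

  printf("packed %d values; sub-style chunk starts at offset %d\n", m, 3*n);
  return 0;
}
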
/* ---------------------------------------------------------------------- */
int AtomVecHybrid::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,k,m;
double dx,dy,dz,dvx,dvy,dvz;
int omega_flag = atom->omega_flag;
int angmom_flag = atom->angmom_flag;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
if (omega_flag) {
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
if (angmom_flag) {
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
if (omega_flag) {
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
if (angmom_flag) {
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
if (omega_flag) {
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
if (angmom_flag) {
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
}
// pack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_comm_hybrid(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::unpack_comm(int n, int first, double *buf)
{
int i,k,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
}
// unpack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->unpack_comm_hybrid(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::unpack_comm_vel(int n, int first, double *buf)
{
int i,k,m,last;
int omega_flag = atom->omega_flag;
int angmom_flag = atom->angmom_flag;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
if (omega_flag) {
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
if (angmom_flag) {
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
}
// unpack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->unpack_comm_hybrid(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecHybrid::pack_reverse(int n, int first, double *buf)
{
int i,k,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
}
// pack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_reverse_hybrid(n,first,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::unpack_reverse(int n, int *list, double *buf)
{
int i,j,k,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
}
// unpack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->unpack_reverse_hybrid(n,list,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecHybrid::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,k,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
}
}
// pack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_border_hybrid(n,list,&buf[m]);
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecHybrid::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,k,m;
double dx,dy,dz,dvx,dvy,dvz;
int omega_flag = atom->omega_flag;
int angmom_flag = atom->angmom_flag;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
if (omega_flag) {
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
if (angmom_flag) {
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
if (omega_flag) {
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
if (angmom_flag) {
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
if (omega_flag) {
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
if (angmom_flag) {
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
}
// pack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_border_hybrid(n,list,&buf[m]);
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::unpack_border(int n, int first, double *buf)
{
int i,k,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
}
// unpack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->unpack_border_hybrid(n,first,&buf[m]);
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecHybrid::unpack_border_vel(int n, int first, double *buf)
{
int i,k,m,last;
int omega_flag = atom->omega_flag;
int angmom_flag = atom->angmom_flag;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
if (omega_flag) {
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
if (angmom_flag) {
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
}
// unpack sub-style contributions as contiguous chunks
for (k = 0; k < nstyles; k++)
m += styles[k]->unpack_border_hybrid(n,first,&buf[m]);
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
pack each sub-style one after the other
------------------------------------------------------------------------- */
int AtomVecHybrid::pack_exchange(int i, double *buf)
{
int k,m;
int tmp = atom->nextra_grow;
atom->nextra_grow = 0;
m = 0;
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_exchange(i,&buf[m]);
atom->nextra_grow = tmp;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
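
buf[0] is reserved for the record length: the sub-styles (and the non-hybrid pack_exchange() routines) begin packing at slot 1, and the total length is written into slot 0 last, so a receiver can hop through a stream of variable-length per-atom records. A minimal standalone sketch of that layout (not part of the LAMMPS sources):

#include <cstdio>

// pack one variable-length record: slot 0 is skipped while packing, then
// overwritten with the total record length (including slot 0 itself)
static int pack_record(double *buf, double value, int nextra)
{
  int m = 1;                                   // reserve slot 0 for the length
  buf[m++] = value;
  for (int k = 0; k < nextra; k++) buf[m++] = 0.0;   // stand-in extra data
  buf[0] = m;
  return m;
}

int main()
{
  double buf[32];
  int m = 0;
  m += pack_record(&buf[m], 1.0, 2);           // record of length 4
  m += pack_record(&buf[m], 2.0, 0);           // record of length 2

  // receiver side: hop from record to record using the stored lengths
  for (int i = 0; i < m; i += (int) buf[i])
    printf("record at %d, length %d, first value %g\n", i, (int) buf[i], buf[i+1]);
  return 0;
}
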
/* ----------------------------------------------------------------------
unpack data for single atom received from another proc
unpack each sub-style one after the other
grow() occurs here so arrays for all sub-styles are grown
------------------------------------------------------------------------- */
int AtomVecHybrid::unpack_exchange(double *buf)
{
int k,m;
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int tmp = atom->nextra_grow;
atom->nextra_grow = 0;
m = 0;
// each sub-style's unpack_exchange() increments atom->nlocal,
// so undo that after every sub-style and bump nlocal once at the end
for (k = 0; k < nstyles; k++) {
m += styles[k]->unpack_exchange(&buf[m]);
atom->nlocal--;
}
atom->nextra_grow = tmp;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecHybrid::size_restart()
{
int tmp = atom->nextra_restart;
atom->nextra_restart = 0;
int n = 0;
for (int k = 0; k < nstyles; k++)
n += styles[k]->size_restart();
atom->nextra_restart = tmp;
int nlocal = atom->nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (int i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
pack each sub-style one after the other
------------------------------------------------------------------------- */
int AtomVecHybrid::pack_restart(int i, double *buf)
{
int tmp = atom->nextra_restart;
atom->nextra_restart = 0;
int m = 0;
for (int k = 0; k < nstyles; k++)
m += styles[k]->pack_restart(i,&buf[m]);
atom->nextra_restart = tmp;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
unpack each sub-style one after the other
grow() occurs here so arrays for all sub-styles are grown
------------------------------------------------------------------------- */
int AtomVecHybrid::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int tmp = atom->nextra_store;
atom->nextra_store = 0;
int m = 0;
// each sub-style's unpack_restart() increments atom->nlocal; undo it here
// and bump nlocal once below, as in unpack_exchange()
for (int k = 0; k < nstyles; k++) {
m += styles[k]->unpack_restart(&buf[m]);
atom->nlocal--;
}
atom->nextra_store = tmp;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
create each sub-style one after the other
grow() occurs here so arrays for all sub-styles are grown
------------------------------------------------------------------------- */
void AtomVecHybrid::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
for (int k = 0; k < nstyles; k++) {
styles[k]->create_atom(itype,coord);
atom->nlocal--;
}
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
grow() occurs here so arrays for all sub-styles are grown
------------------------------------------------------------------------- */
void AtomVecHybrid::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
if (atom->omega_flag) {
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
}
if (atom->angmom_flag) {
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
}
// each sub-style parses sub-style specific values
int m = 5;
for (int k = 0; k < nstyles; k++)
m += styles[k]->data_atom_hybrid(nlocal,&values[m]);
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Velocities section of data file
------------------------------------------------------------------------- */
void AtomVecHybrid::data_vel(int m, char **values)
{
v[m][0] = atof(values[0]);
v[m][1] = atof(values[1]);
v[m][2] = atof(values[2]);
// each sub-style parses sub-style specific values
int n = 3;
for (int k = 0; k < nstyles; k++)
n += styles[k]->data_vel_hybrid(m,&values[n]);
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecHybrid::pack_data(double **buf)
{
int k,m;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(type[i]).d;
buf[i][2] = x[i][0];
buf[i][3] = x[i][1];
buf[i][4] = x[i][2];
m = 5;
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_data_hybrid(i,&buf[i][m]);
buf[i][m] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][m+1] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][m+2] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecHybrid::write_data(FILE *fp, int n, double **buf)
{
int k,m;
for (int i = 0; i < n; i++) {
fprintf(fp,TAGINT_FORMAT " %d %-1.16e %-1.16e %-1.16e",
(tagint) ubuf(buf[i][0]).i,(int) ubuf(buf[i][1]).i,
buf[i][2],buf[i][3],buf[i][4]);
m = 5;
for (k = 0; k < nstyles; k++)
m += styles[k]->write_data_hybrid(fp,&buf[i][m]);
fprintf(fp," %d %d %d\n",
(int) ubuf(buf[i][m]).i,(int) ubuf(buf[i][m+1]).i,
(int) ubuf(buf[i][m+2]).i);
}
}
/* ----------------------------------------------------------------------
pack velocity info for data file
------------------------------------------------------------------------- */
void AtomVecHybrid::pack_vel(double **buf)
{
int k,m;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = v[i][0];
buf[i][2] = v[i][1];
buf[i][3] = v[i][2];
m = 4;
for (k = 0; k < nstyles; k++)
m += styles[k]->pack_vel_hybrid(i,&buf[i][m]);
}
}
/* ----------------------------------------------------------------------
write velocity info to data file
------------------------------------------------------------------------- */
void AtomVecHybrid::write_vel(FILE *fp, int n, double **buf)
{
int k,m;
for (int i = 0; i < n; i++) {
fprintf(fp,TAGINT_FORMAT " %g %g %g",
(tagint) ubuf(buf[i][0]).i,buf[i][1],buf[i][2],buf[i][3]);
m = 4;
for (k = 0; k < nstyles; k++)
m += styles[k]->write_vel_hybrid(fp,&buf[i][m]);
fprintf(fp,"\n");
}
}
/* ----------------------------------------------------------------------
assign an index to named atom property and return index
returned value encodes which sub-style and index returned by sub-style
return -1 if name is unknown to any sub-styles
------------------------------------------------------------------------- */
int AtomVecHybrid::property_atom(char *name)
{
for (int k = 0; k < nstyles; k++) {
int index = styles[k]->property_atom(name);
if (index >= 0) return index*nstyles + k;
}
return -1;
}
/* ----------------------------------------------------------------------
pack per-atom data into buf for ComputePropertyAtom
index maps to data specific to this atom style
------------------------------------------------------------------------- */
void AtomVecHybrid::pack_property_atom(int multiindex, double *buf,
int nvalues, int groupbit)
{
int k = multiindex % nstyles;
int index = multiindex/nstyles;
styles[k]->pack_property_atom(index,buf,nvalues,groupbit);
}
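
property_atom() folds the sub-style number k and that sub-style's own property index into a single integer as index*nstyles + k; pack_property_atom() recovers both with % and /. A minimal standalone round-trip (not part of the LAMMPS sources), using hypothetical values:

#include <cstdio>

int main()
{
  const int nstyles = 3;            // hypothetical number of sub-styles
  int k = 2, index = 5;             // sub-style 2, its own property index 5

  int multiindex = index*nstyles + k;          // encode, as in property_atom()

  int k2 = multiindex % nstyles;               // decode, as in pack_property_atom()
  int index2 = multiindex / nstyles;
  printf("multiindex %d -> k %d, index %d\n", multiindex, k2, index2);   // 17 -> 2, 5
  return 0;
}
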
/* ----------------------------------------------------------------------
allstyles = list of all atom styles in this LAMMPS executable
------------------------------------------------------------------------- */
void AtomVecHybrid::build_styles()
{
nallstyles = 0;
#define ATOM_CLASS
#define AtomStyle(key,Class) nallstyles++;
#include "style_atom.h"
#undef AtomStyle
#undef ATOM_CLASS
allstyles = new char*[nallstyles];
int n;
nallstyles = 0;
#define ATOM_CLASS
#define AtomStyle(key,Class) \
n = strlen(#key) + 1; \
allstyles[nallstyles] = new char[n]; \
strcpy(allstyles[nallstyles],#key); \
nallstyles++;
#include "style_atom.h"
#undef AtomStyle
#undef ATOM_CLASS
}
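
build_styles() uses a two-pass preprocessor trick: style_atom.h is included twice, first with an AtomStyle() macro that only counts the entries, then with one that fills the freshly allocated list. A minimal standalone sketch of the same idiom (not part of the LAMMPS sources); STYLE_LIST is a hypothetical stand-in for the included header:

#include <cstdio>
#include <cstring>

// hypothetical stand-in for style_atom.h: an X-macro list of style names
#define STYLE_LIST \
  AtomStyle(atomic) \
  AtomStyle(sphere) \
  AtomStyle(ellipsoid)

int main()
{
  // pass 1: count entries
  int nallstyles = 0;
#define AtomStyle(key) nallstyles++;
  STYLE_LIST
#undef AtomStyle

  char **allstyles = new char*[nallstyles];

  // pass 2: fill the list with the stringified names
  nallstyles = 0;
#define AtomStyle(key) \
  allstyles[nallstyles] = new char[strlen(#key)+1]; \
  strcpy(allstyles[nallstyles],#key); \
  nallstyles++;
  STYLE_LIST
#undef AtomStyle

  for (int i = 0; i < nallstyles; i++) printf("%s\n", allstyles[i]);
  for (int i = 0; i < nallstyles; i++) delete [] allstyles[i];
  delete [] allstyles;
  return 0;
}
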
/* ----------------------------------------------------------------------
return 1 if str matches any entry in allstyles, else 0
------------------------------------------------------------------------- */
int AtomVecHybrid::known_style(char *str)
{
for (int i = 0; i < nallstyles; i++)
if (strcmp(str,allstyles[i]) == 0) return 1;
return 0;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecHybrid::memory_usage()
{
bigint bytes = 0;
for (int k = 0; k < nstyles; k++) bytes += styles[k]->memory_usage();
return bytes;
}
diff --git a/src/atom_vec_line.cpp b/src/atom_vec_line.cpp
index 0e534577f..345128537 100644
--- a/src/atom_vec_line.cpp
+++ b/src/atom_vec_line.cpp
@@ -1,1357 +1,1357 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "atom_vec_line.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "modify.h"
#include "force.h"
#include "fix.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
#define EPSILON 0.001
/* ---------------------------------------------------------------------- */
AtomVecLine::AtomVecLine(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
comm_x_only = comm_f_only = 0;
size_forward = 4;
size_reverse = 6;
size_border = 12;
size_velocity = 6;
size_data_atom = 8;
size_data_vel = 7;
size_data_bonus = 5;
xcol_data = 6;
atom->line_flag = 1;
atom->molecule_flag = atom->rmass_flag = 1;
atom->radius_flag = atom->omega_flag = atom->torque_flag = 1;
atom->sphere_flag = 1;
nlocal_bonus = nghost_bonus = nmax_bonus = 0;
bonus = NULL;
}
/* ---------------------------------------------------------------------- */
AtomVecLine::~AtomVecLine()
{
memory->sfree(bonus);
}
/* ---------------------------------------------------------------------- */
void AtomVecLine::init()
{
AtomVec::init();
if (domain->dimension != 2)
error->all(FLERR,"Atom_style line can only be used in 2d simulations");
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecLine::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
molecule = memory->grow(atom->molecule,nmax,"atom:molecule");
rmass = memory->grow(atom->rmass,nmax,"atom:rmass");
radius = memory->grow(atom->radius,nmax,"atom:radius");
omega = memory->grow(atom->omega,nmax,3,"atom:omega");
torque = memory->grow(atom->torque,nmax*comm->nthreads,3,"atom:torque");
line = memory->grow(atom->line,nmax,"atom:line");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecLine::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
molecule = atom->molecule; rmass = atom->rmass;
radius = atom->radius; omega = atom->omega; torque = atom->torque;
line = atom->line;
}
/* ----------------------------------------------------------------------
grow bonus data structure
------------------------------------------------------------------------- */
void AtomVecLine::grow_bonus()
{
nmax_bonus = grow_nmax_bonus(nmax_bonus);
if (nmax_bonus < 0)
error->one(FLERR,"Per-processor system is too big");
bonus = (Bonus *) memory->srealloc(bonus,nmax_bonus*sizeof(Bonus),
"atom:bonus");
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecLine::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
molecule[j] = molecule[i];
rmass[j] = rmass[i];
radius[j] = radius[i];
omega[j][0] = omega[i][0];
omega[j][1] = omega[i][1];
omega[j][2] = omega[i][2];
// if deleting atom J via delflag and J has bonus data, then delete it
if (delflag && line[j] >= 0) {
copy_bonus(nlocal_bonus-1,line[j]);
nlocal_bonus--;
}
// if atom I has bonus data, reset I's bonus.ilocal to loc J
// do NOT do this if self-copy (I=J) since I's bonus data is already deleted
if (line[i] >= 0 && i != j) bonus[line[i]].ilocal = j;
line[j] = line[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ----------------------------------------------------------------------
copy bonus data from I to J, effectively deleting the J entry
also reset line that points to I to now point to J
------------------------------------------------------------------------- */
void AtomVecLine::copy_bonus(int i, int j)
{
line[bonus[i].ilocal] = j;
memcpy(&bonus[j],&bonus[i],sizeof(Bonus));
}
/* ----------------------------------------------------------------------
clear ghost info in bonus data
called before ghosts are recommunicated in comm and irregular
------------------------------------------------------------------------- */
void AtomVecLine::clear_bonus()
{
nghost_bonus = 0;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->clear_bonus();
}
/* ----------------------------------------------------------------------
set length value in bonus data for particle I
oriented along x axis
this may create or delete entry in bonus data
------------------------------------------------------------------------- */
void AtomVecLine::set_length(int i, double value)
{
if (line[i] < 0) {
if (value == 0.0) return;
if (nlocal_bonus == nmax_bonus) grow_bonus();
bonus[nlocal_bonus].length = value;
bonus[nlocal_bonus].theta = 0.0;
bonus[nlocal_bonus].ilocal = i;
line[i] = nlocal_bonus++;
} else if (value == 0.0) {
copy_bonus(nlocal_bonus-1,line[i]);
nlocal_bonus--;
line[i] = -1;
} else bonus[line[i]].length = value;
// also set radius = half of length
// unless value = 0.0, then set diameter = 1.0
radius[i] = 0.5 * value;
if (value == 0.0) radius[i] = 0.5;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (line[j] >= 0) buf[m++] = bonus[line[j]].theta;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (line[j] >= 0) buf[m++] = bonus[line[j]].theta;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (line[j] >= 0) buf[m++] = bonus[line[j]].theta;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (line[j] >= 0) buf[m++] = bonus[line[j]].theta;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (line[j] >= 0) buf[m++] = bonus[line[j]].theta;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_comm_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
if (line[j] >= 0) buf[m++] = bonus[line[j]].theta;
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecLine::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (line[i] >= 0) bonus[line[i]].theta = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecLine::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (line[i] >= 0) bonus[line[i]].theta = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::unpack_comm_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
if (line[i] >= 0) bonus[line[i]].theta = buf[m++];
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_reverse_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecLine::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::unpack_reverse_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (line[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
buf[m++] = bonus[line[j]].length;
buf[m++] = bonus[line[j]].theta;
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (line[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
buf[m++] = bonus[line[j]].length;
buf[m++] = bonus[line[j]].theta;
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (line[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
buf[m++] = bonus[line[j]].length;
buf[m++] = bonus[line[j]].theta;
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (line[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
buf[m++] = bonus[line[j]].length;
buf[m++] = bonus[line[j]].theta;
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (line[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
buf[m++] = bonus[line[j]].length;
buf[m++] = bonus[line[j]].theta;
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (line[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
buf[m++] = bonus[line[j]].length;
buf[m++] = bonus[line[j]].theta;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecLine::unpack_border(int n, int first, double *buf)
{
int i,j,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
molecule[i] = (tagint) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
line[i] = (int) ubuf(buf[m++]).i;
if (line[i] == 0) line[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
bonus[j].length = buf[m++];
bonus[j].theta = buf[m++];
bonus[j].ilocal = i;
line[i] = j;
nghost_bonus++;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecLine::unpack_border_vel(int n, int first, double *buf)
{
int i,j,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
molecule[i] = (tagint) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
line[i] = (int) ubuf(buf[m++]).i;
if (line[i] == 0) line[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
bonus[j].length = buf[m++];
bonus[j].theta = buf[m++];
bonus[j].ilocal = i;
line[i] = j;
nghost_bonus++;
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::unpack_border_hybrid(int n, int first, double *buf)
{
int i,j,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
molecule[i] = (tagint) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
line[i] = (int) ubuf(buf[m++]).i;
if (line[i] == 0) line[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
bonus[j].length = buf[m++];
bonus[j].theta = buf[m++];
bonus[j].ilocal = i;
line[i] = j;
nghost_bonus++;
}
}
return m;
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecLine::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = ubuf(molecule[i]).d;
buf[m++] = rmass[i];
buf[m++] = radius[i];
buf[m++] = omega[i][0];
buf[m++] = omega[i][1];
buf[m++] = omega[i][2];
if (line[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = line[i];
buf[m++] = bonus[j].length;
buf[m++] = bonus[j].theta;
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecLine::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
molecule[nlocal] = (tagint) ubuf(buf[m++]).i;
rmass[nlocal] = buf[m++];
radius[nlocal] = buf[m++];
omega[nlocal][0] = buf[m++];
omega[nlocal][1] = buf[m++];
omega[nlocal][2] = buf[m++];
line[nlocal] = (int) ubuf(buf[m++]).i;
if (line[nlocal] == 0) line[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
bonus[nlocal_bonus].length = buf[m++];
bonus[nlocal_bonus].theta = buf[m++];
bonus[nlocal_bonus].ilocal = nlocal;
line[nlocal] = nlocal_bonus++;
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecLine::size_restart()
{
int i;
int n = 0;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
if (line[i] >= 0) n += 20;
else n += 18;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecLine::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(molecule[i]).d;
buf[m++] = rmass[i];
buf[m++] = radius[i];
buf[m++] = omega[i][0];
buf[m++] = omega[i][1];
buf[m++] = omega[i][2];
if (line[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = line[i];
buf[m++] = bonus[j].length;
buf[m++] = bonus[j].theta;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecLine::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
molecule[nlocal] = (tagint) ubuf(buf[m++]).i;
rmass[nlocal] = buf[m++];
radius[nlocal] = buf[m++];
omega[nlocal][0] = buf[m++];
omega[nlocal][1] = buf[m++];
omega[nlocal][2] = buf[m++];
line[nlocal] = (int) ubuf(buf[m++]).i;
if (line[nlocal] == 0) line[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
bonus[nlocal_bonus].length = buf[m++];
bonus[nlocal_bonus].theta = buf[m++];
bonus[nlocal_bonus].ilocal = nlocal;
line[nlocal] = nlocal_bonus++;
}
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecLine::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
molecule[nlocal] = 0;
radius[nlocal] = 0.5;
rmass[nlocal] = 4.0*MY_PI/3.0 * radius[nlocal]*radius[nlocal]*radius[nlocal];
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
line[nlocal] = -1;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecLine::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
molecule[nlocal] = ATOTAGINT(values[1]);
type[nlocal] = atoi(values[2]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
// lineflag: 0 = point particle, 1 = line segment
// store 0 as a placeholder; data_atom_bonus() later replaces it with the bonus index
line[nlocal] = atoi(values[3]);
if (line[nlocal] == 0) line[nlocal] = -1;
else if (line[nlocal] == 1) line[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[4]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
if (line[nlocal] < 0) {
radius[nlocal] = 0.5;
rmass[nlocal] *= 4.0*MY_PI/3.0 *
radius[nlocal]*radius[nlocal]*radius[nlocal];
} else radius[nlocal] = 0.0;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecLine::data_atom_hybrid(int nlocal, char **values)
{
molecule[nlocal] = ATOTAGINT(values[0]);
line[nlocal] = atoi(values[1]);
if (line[nlocal] == 0) line[nlocal] = -1;
else if (line[nlocal] == 1) line[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[2]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
if (line[nlocal] < 0) {
radius[nlocal] = 0.5;
rmass[nlocal] *= 4.0*MY_PI/3.0 *
radius[nlocal]*radius[nlocal]*radius[nlocal];
} else radius[nlocal] = 0.0;
return 3;
}
/* ----------------------------------------------------------------------
unpack one line from Lines section of data file
------------------------------------------------------------------------- */
void AtomVecLine::data_atom_bonus(int m, char **values)
{
if (line[m]) error->one(FLERR,"Assigning line parameters to non-line atom");
if (nlocal_bonus == nmax_bonus) grow_bonus();
double x1 = atof(values[0]);
double y1 = atof(values[1]);
double x2 = atof(values[2]);
double y2 = atof(values[3]);
double dx = x2 - x1;
double dy = y2 - y1;
double length = sqrt(dx*dx + dy*dy);
bonus[nlocal_bonus].length = length;
if (dy >= 0.0) bonus[nlocal_bonus].theta = acos(dx/length);
else bonus[nlocal_bonus].theta = -acos(dx/length);
double xc = 0.5*(x1+x2);
double yc = 0.5*(y1+y2);
dx = xc - x[m][0];
dy = yc - x[m][1];
double delta = sqrt(dx*dx + dy*dy);
if (delta/length > EPSILON)
error->one(FLERR,"Inconsistent line segment in data file");
x[m][0] = xc;
x[m][1] = yc;
// reset line radius and mass
// rmass currently holds density
radius[m] = 0.5 * length;
rmass[m] *= length;
bonus[nlocal_bonus].ilocal = m;
line[m] = nlocal_bonus++;
}
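
data_atom_bonus() above reduces the two endpoints read from the Lines section to a center, a length, and an orientation angle theta (measured from the +x axis, with the sign taken from dy, i.e. equivalent to atan2). A minimal standalone sketch of that geometry (not part of the LAMMPS sources), with hypothetical endpoints:

#include <cstdio>
#include <cmath>

int main()
{
  double x1 = 0.0, y1 = 0.0, x2 = 1.0, y2 = 1.0;   // hypothetical endpoints

  double dx = x2 - x1, dy = y2 - y1;
  double length = sqrt(dx*dx + dy*dy);
  double theta = (dy >= 0.0) ? acos(dx/length) : -acos(dx/length);
  double xc = 0.5*(x1 + x2), yc = 0.5*(y1 + y2);   // segment midpoint = particle center

  printf("center (%g,%g) length %g theta %g rad\n", xc, yc, length, theta);
  // prints: center (0.5,0.5) length 1.41421 theta 0.785398 rad
  return 0;
}
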
/* ----------------------------------------------------------------------
unpack one line from Velocities section of data file
------------------------------------------------------------------------- */
void AtomVecLine::data_vel(int m, char **values)
{
v[m][0] = atof(values[0]);
v[m][1] = atof(values[1]);
v[m][2] = atof(values[2]);
omega[m][0] = atof(values[3]);
omega[m][1] = atof(values[4]);
omega[m][2] = atof(values[5]);
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Velocities section of data file
------------------------------------------------------------------------- */
int AtomVecLine::data_vel_hybrid(int m, char **values)
{
omega[m][0] = atof(values[0]);
omega[m][1] = atof(values[1]);
omega[m][2] = atof(values[2]);
return 3;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecLine::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(molecule[i]).d;
buf[i][2] = ubuf(type[i]).d;
if (line[i] < 0) buf[i][3] = ubuf(0).d;
else buf[i][3] = ubuf(1).d;
if (line[i] < 0)
buf[i][4] = rmass[i] / (4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i]);
else buf[i][4] = rmass[i]/bonus[line[i]].length;
buf[i][5] = x[i][0];
buf[i][6] = x[i][1];
buf[i][7] = x[i][2];
buf[i][8] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][9] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][10] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
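/* ----------------------------------------------------------------------
   illustrative sketch (hypothetical helper, not part of the upstream
   source): how the three periodic image flags packed above can be
   decoded from a single imageint, assuming the default layout in which
   each dimension is an unsigned bit field offset by IMGMAX
------------------------------------------------------------------------- */
static inline void decode_image_flags(imageint img, int &ix, int &iy, int &iz)
{
  ix = (int) (img & IMGMASK) - IMGMAX;             // x flag: lowest bits
  iy = (int) (img >> IMGBITS & IMGMASK) - IMGMAX;  // y flag: middle bits
  iz = (int) (img >> IMG2BITS) - IMGMAX;           // z flag: highest bits
}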
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecLine::pack_data_hybrid(int i, double *buf)
{
buf[0] = ubuf(molecule[i]).d;
if (line[i] < 0) buf[1] = ubuf(0).d;
else buf[1] = ubuf(1).d;
if (line[i] < 0)
buf[2] = rmass[i] / (4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i]);
else buf[2] = rmass[i]/bonus[line[i]].length;
return 3;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecLine::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT " " TAGINT_FORMAT
" %d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(tagint) ubuf(buf[i][1]).i,
(int) ubuf(buf[i][2]).i,(int) ubuf(buf[i][3]).i,
buf[i][4],buf[i][5],buf[i][6],buf[i][7],
(int) ubuf(buf[i][8]).i,(int) ubuf(buf[i][9]).i,
(int) ubuf(buf[i][10]).i);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecLine::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT " %d %-1.16e",
(tagint) ubuf(buf[0]).i,(int) ubuf(buf[1]).i,buf[2]);
return 3;
}
/* ----------------------------------------------------------------------
pack velocity info for data file
------------------------------------------------------------------------- */
void AtomVecLine::pack_vel(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = v[i][0];
buf[i][2] = v[i][1];
buf[i][3] = v[i][2];
buf[i][4] = omega[i][0];
buf[i][5] = omega[i][1];
buf[i][6] = omega[i][2];
}
}
/* ----------------------------------------------------------------------
pack hybrid velocity info for data file
------------------------------------------------------------------------- */
int AtomVecLine::pack_vel_hybrid(int i, double *buf)
{
buf[0] = omega[i][0];
buf[1] = omega[i][1];
buf[2] = omega[i][2];
return 3;
}
/* ----------------------------------------------------------------------
write velocity info to data file
------------------------------------------------------------------------- */
void AtomVecLine::write_vel(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT
" %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e\n",
(tagint) ubuf(buf[i][0]).i,buf[i][1],buf[i][2],buf[i][3],
buf[i][4],buf[i][5],buf[i][6]);
}
/* ----------------------------------------------------------------------
write hybrid velocity info to data file
------------------------------------------------------------------------- */
int AtomVecLine::write_vel_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e %-1.16e %-1.16e",buf[0],buf[1],buf[2]);
return 3;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecLine::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("rmass")) bytes += memory->usage(rmass,nmax);
if (atom->memcheck("radius")) bytes += memory->usage(radius,nmax);
if (atom->memcheck("omega")) bytes += memory->usage(omega,nmax,3);
if (atom->memcheck("torque"))
bytes += memory->usage(torque,nmax*comm->nthreads,3);
if (atom->memcheck("line")) bytes += memory->usage(line,nmax);
bytes += nmax_bonus*sizeof(Bonus);
return bytes;
}
/* ----------------------------------------------------------------------
check consistency of internal Bonus data structure
n = # of atoms in regular structure to check against
------------------------------------------------------------------------- */
/*
void AtomVecLine::consistency_check(int n, char *str)
{
int iflag = 0;
int count = 0;
for (int i = 0; i < n; i++) {
if (line[i] >= 0) {
count++;
if (line[i] >= nlocal_bonus) iflag++;
if (bonus[line[i]].ilocal != i) iflag++;
//if (comm->me == 1 && update->ntimestep == 873)
// printf("CCHK %s: %d %d: %d %d: %d %d\n",
// str,i,n,line[i],nlocal_bonus,bonus[line[i]].ilocal,iflag);
}
}
if (iflag) {
printf("BAD vecline ptrs: %s: %d %d: %d\n",str,comm->me,
update->ntimestep,iflag);
MPI_Abort(world,1);
}
if (count != nlocal_bonus) {
char msg[128];
printf("BAD vecline count: %s: %d %d: %d %d\n",
str,comm->me,update->ntimestep,count,nlocal_bonus);
MPI_Abort(world,1);
}
}
*/
diff --git a/src/atom_vec_sphere.cpp b/src/atom_vec_sphere.cpp
index 7bf4d4082..a72704b4c 100644
--- a/src/atom_vec_sphere.cpp
+++ b/src/atom_vec_sphere.cpp
@@ -1,1188 +1,1188 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "atom_vec_sphere.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "modify.h"
#include "force.h"
#include "fix.h"
#include "fix_adapt.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
/* ---------------------------------------------------------------------- */
AtomVecSphere::AtomVecSphere(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
comm_x_only = 1;
comm_f_only = 0;
size_forward = 3;
size_reverse = 6;
size_border = 8;
size_velocity = 6;
size_data_atom = 7;
size_data_vel = 7;
xcol_data = 5;
atom->sphere_flag = 1;
atom->radius_flag = atom->rmass_flag = atom->omega_flag =
atom->torque_flag = 1;
}
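/* ----------------------------------------------------------------------
   annotation (not part of the upstream source): the per-atom counts set
   above correspond to the pack/unpack routines below --
     size_forward   = 3 : x
     size_reverse   = 6 : f + torque
     size_border    = 8 : x, tag, type, mask, radius, rmass
     size_velocity  = 6 : v + omega
     size_data_atom = 7 : atom-ID, type, diameter, density, x, y, z
     size_data_vel  = 7 : atom-ID, v, omega
     xcol_data      = 5 : coords start in column 5 of an Atoms line
------------------------------------------------------------------------- */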
/* ---------------------------------------------------------------------- */
void AtomVecSphere::init()
{
AtomVec::init();
// set radvary if particle diameters are time-varying due to fix adapt
radvary = 0;
comm_x_only = 1;
size_forward = 3;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"adapt") == 0) {
FixAdapt *fix = (FixAdapt *) modify->fix[i];
if (fix->diamflag) {
radvary = 1;
comm_x_only = 0;
size_forward = 5;
}
}
}
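/* ----------------------------------------------------------------------
   annotation (not part of the upstream source): when a "fix adapt" with
   diamflag is present, particle diameters (and thus masses) change
   during the run, so radius and rmass must be forward-communicated to
   ghosts along with x each step -- hence comm_x_only = 0 and
   size_forward grows from 3 to 5 (see the radvary branches below)
------------------------------------------------------------------------- */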
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecSphere::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
radius = memory->grow(atom->radius,nmax,"atom:radius");
rmass = memory->grow(atom->rmass,nmax,"atom:rmass");
omega = memory->grow(atom->omega,nmax,3,"atom:omega");
torque = memory->grow(atom->torque,nmax*comm->nthreads,3,"atom:torque");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecSphere::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
radius = atom->radius; rmass = atom->rmass;
omega = atom->omega; torque = atom->torque;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecSphere::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
radius[j] = radius[i];
rmass[j] = rmass[i];
omega[j][0] = omega[i][0];
omega[j][1] = omega[i][1];
omega[j][2] = omega[i][2];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
if (radvary == 0) {
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
}
}
} else {
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = radius[j];
buf[m++] = rmass[j];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = radius[j];
buf[m++] = rmass[j];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
if (radvary == 0) {
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
}
}
} else {
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = radius[j];
buf[m++] = rmass[j];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = radius[j];
buf[m++] = rmass[j];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_comm_hybrid(int n, int *list, double *buf)
{
int i,j,m;
if (radvary == 0) return 0;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = radius[j];
buf[m++] = rmass[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecSphere::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
if (radvary == 0) {
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
}
} else {
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
radius[i] = buf[m++];
rmass[i] = buf[m++];
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecSphere::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
if (radvary == 0) {
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
} else {
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
radius[i] = buf[m++];
rmass[i] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
}
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::unpack_comm_hybrid(int n, int first, double *buf)
{
int i,m,last;
if (radvary == 0) return 0;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
radius[i] = buf[m++];
rmass[i] = buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_reverse_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecSphere::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::unpack_reverse_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = radius[j];
buf[m++] = rmass[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecSphere::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecSphere::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecSphere::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
radius[i] = buf[m++];
rmass[i] = buf[m++];
}
return m;
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecSphere::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = radius[i];
buf[m++] = rmass[i];
buf[m++] = omega[i][0];
buf[m++] = omega[i][1];
buf[m++] = omega[i][2];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
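/* ----------------------------------------------------------------------
   annotation (not part of the upstream source): packing starts at m = 1
   and buf[0] is set to the final count, so each per-atom exchange
   message is self-describing -- unpack_exchange() and the comm code can
   step through a buffer of variable-length records (fixes may append
   extra data) without knowing their sizes in advance
------------------------------------------------------------------------- */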
/* ---------------------------------------------------------------------- */
int AtomVecSphere::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
radius[nlocal] = buf[m++];
rmass[nlocal] = buf[m++];
omega[nlocal][0] = buf[m++];
omega[nlocal][1] = buf[m++];
omega[nlocal][2] = buf[m++];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecSphere::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 16 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecSphere::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = radius[i];
buf[m++] = rmass[i];
buf[m++] = omega[i][0];
buf[m++] = omega[i][1];
buf[m++] = omega[i][2];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecSphere::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
radius[nlocal] = buf[m++];
rmass[nlocal] = buf[m++];
omega[nlocal][0] = buf[m++];
omega[nlocal][1] = buf[m++];
omega[nlocal][2] = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecSphere::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
radius[nlocal] = 0.5;
rmass[nlocal] = 4.0*MY_PI/3.0 * radius[nlocal]*radius[nlocal]*radius[nlocal];
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
atom->nlocal++;
}
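/* ----------------------------------------------------------------------
   annotation (not part of the upstream source): the defaults above make
   a unit-diameter, unit-density sphere, i.e.
     radius = 0.5,  rmass = 4*pi/3 * 0.5^3 = pi/6 ~= 0.5236
------------------------------------------------------------------------- */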
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecSphere::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
radius[nlocal] = 0.5 * atof(values[2]);
if (radius[nlocal] < 0.0)
error->one(FLERR,"Invalid radius in Atoms section of data file");
double density = atof(values[3]);
if (density <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
if (radius[nlocal] == 0.0) rmass[nlocal] = density;
else
rmass[nlocal] = 4.0*MY_PI/3.0 *
radius[nlocal]*radius[nlocal]*radius[nlocal] * density;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
atom->nlocal++;
}
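/* ----------------------------------------------------------------------
   annotation (not part of the upstream source): worked example of the
   diameter/density -> radius/mass conversion above, for an Atoms line
   with diameter 2.0 and density 1.5:
     radius = 0.5*2.0 = 1.0
     rmass  = 4*pi/3 * 1.0^3 * 1.5 = 2*pi ~= 6.2832
   special case: diameter 0.0 marks a point particle and the "density"
   column is then stored directly as the mass
------------------------------------------------------------------------- */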
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecSphere::data_atom_hybrid(int nlocal, char **values)
{
radius[nlocal] = 0.5 * atof(values[0]);
if (radius[nlocal] < 0.0)
error->one(FLERR,"Invalid radius in Atoms section of data file");
double density = atof(values[1]);
if (density <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
if (radius[nlocal] == 0.0) rmass[nlocal] = density;
else
rmass[nlocal] = 4.0*MY_PI/3.0 *
radius[nlocal]*radius[nlocal]*radius[nlocal] * density;
return 2;
}
/* ----------------------------------------------------------------------
unpack one line from Velocities section of data file
------------------------------------------------------------------------- */
void AtomVecSphere::data_vel(int m, char **values)
{
v[m][0] = atof(values[0]);
v[m][1] = atof(values[1]);
v[m][2] = atof(values[2]);
omega[m][0] = atof(values[3]);
omega[m][1] = atof(values[4]);
omega[m][2] = atof(values[5]);
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Velocities section of data file
------------------------------------------------------------------------- */
int AtomVecSphere::data_vel_hybrid(int m, char **values)
{
omega[m][0] = atof(values[0]);
omega[m][1] = atof(values[1]);
omega[m][2] = atof(values[2]);
return 3;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecSphere::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(type[i]).d;
buf[i][2] = 2.0*radius[i];
if (radius[i] == 0.0) buf[i][3] = rmass[i];
else
buf[i][3] = rmass[i] / (4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i]);
buf[i][4] = x[i][0];
buf[i][5] = x[i][1];
buf[i][6] = x[i][2];
buf[i][7] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][8] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][9] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecSphere::pack_data_hybrid(int i, double *buf)
{
buf[0] = 2.0*radius[i];
if (radius[i] == 0.0) buf[1] = rmass[i];
else buf[1] = rmass[i] / (4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i]);
return 2;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecSphere::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT
" %d %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(int) ubuf(buf[i][1]).i,
buf[i][2],buf[i][3],
buf[i][4],buf[i][5],buf[i][6],
(int) ubuf(buf[i][7]).i,(int) ubuf(buf[i][8]).i,
(int) ubuf(buf[i][9]).i);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecSphere::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e %-1.16e",buf[0],buf[1]);
return 2;
}
/* ----------------------------------------------------------------------
pack velocity info for data file
------------------------------------------------------------------------- */
void AtomVecSphere::pack_vel(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = v[i][0];
buf[i][2] = v[i][1];
buf[i][3] = v[i][2];
buf[i][4] = omega[i][0];
buf[i][5] = omega[i][1];
buf[i][6] = omega[i][2];
}
}
/* ----------------------------------------------------------------------
pack hybrid velocity info for data file
------------------------------------------------------------------------- */
int AtomVecSphere::pack_vel_hybrid(int i, double *buf)
{
buf[0] = omega[i][0];
buf[1] = omega[i][1];
buf[2] = omega[i][2];
return 3;
}
/* ----------------------------------------------------------------------
write velocity info to data file
------------------------------------------------------------------------- */
void AtomVecSphere::write_vel(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT
" %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e\n",
(tagint) ubuf(buf[i][0]).i,buf[i][1],buf[i][2],buf[i][3],
buf[i][4],buf[i][5],buf[i][6]);
}
/* ----------------------------------------------------------------------
write hybrid velocity info to data file
------------------------------------------------------------------------- */
int AtomVecSphere::write_vel_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e %-1.16e %-1.16e",buf[0],buf[1],buf[2]);
return 3;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecSphere::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
if (atom->memcheck("radius")) bytes += memory->usage(radius,nmax);
if (atom->memcheck("rmass")) bytes += memory->usage(rmass,nmax);
if (atom->memcheck("omega")) bytes += memory->usage(omega,nmax,3);
if (atom->memcheck("torque"))
bytes += memory->usage(torque,nmax*comm->nthreads,3);
return bytes;
}
diff --git a/src/atom_vec_tri.cpp b/src/atom_vec_tri.cpp
index 8ffc39cec..eb87e75b1 100644
--- a/src/atom_vec_tri.cpp
+++ b/src/atom_vec_tri.cpp
@@ -1,1842 +1,1842 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "atom_vec_tri.h"
#include "math_extra.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "modify.h"
#include "force.h"
#include "fix.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
#define EPSILON 0.001
/* ---------------------------------------------------------------------- */
AtomVecTri::AtomVecTri(LAMMPS *lmp) : AtomVec(lmp)
{
molecular = 0;
comm_x_only = comm_f_only = 0;
size_forward = 7;
size_reverse = 6;
size_border = 26;
size_velocity = 9;
size_data_atom = 8;
size_data_vel = 7;
size_data_bonus = 10;
xcol_data = 6;
atom->tri_flag = 1;
atom->molecule_flag = atom->rmass_flag = 1;
atom->radius_flag = atom->omega_flag = atom->angmom_flag = 1;
atom->torque_flag = 1;
atom->sphere_flag = 1;
nlocal_bonus = nghost_bonus = nmax_bonus = 0;
bonus = NULL;
if (domain->dimension != 3)
error->all(FLERR,"Atom_style tri can only be used in 3d simulations");
}
/* ---------------------------------------------------------------------- */
AtomVecTri::~AtomVecTri()
{
memory->sfree(bonus);
}
/* ---------------------------------------------------------------------- */
void AtomVecTri::init()
{
AtomVec::init();
if (domain->dimension != 3)
error->all(FLERR,"Atom_style tri can only be used in 3d simulations");
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by a chunk
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecTri::grow(int n)
{
if (n == 0) grow_nmax();
else nmax = n;
atom->nmax = nmax;
- if (nmax < 0)
+ if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
tag = memory->grow(atom->tag,nmax,"atom:tag");
type = memory->grow(atom->type,nmax,"atom:type");
mask = memory->grow(atom->mask,nmax,"atom:mask");
image = memory->grow(atom->image,nmax,"atom:image");
x = memory->grow(atom->x,nmax,3,"atom:x");
v = memory->grow(atom->v,nmax,3,"atom:v");
f = memory->grow(atom->f,nmax*comm->nthreads,3,"atom:f");
molecule = memory->grow(atom->molecule,nmax,"atom:molecule");
rmass = memory->grow(atom->rmass,nmax,"atom:rmass");
radius = memory->grow(atom->radius,nmax,"atom:radius");
omega = memory->grow(atom->omega,nmax,3,"atom:omega");
angmom = memory->grow(atom->angmom,nmax,3,"atom:angmom");
torque = memory->grow(atom->torque,nmax*comm->nthreads,3,"atom:torque");
tri = memory->grow(atom->tri,nmax,"atom:tri");
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecTri::grow_reset()
{
tag = atom->tag; type = atom->type;
mask = atom->mask; image = atom->image;
x = atom->x; v = atom->v; f = atom->f;
molecule = atom->molecule; rmass = atom->rmass;
radius = atom->radius; omega = atom->omega;
angmom = atom->angmom; torque = atom->torque;
tri = atom->tri;
}
/* ----------------------------------------------------------------------
grow bonus data structure
------------------------------------------------------------------------- */
void AtomVecTri::grow_bonus()
{
nmax_bonus = grow_nmax_bonus(nmax_bonus);
if (nmax_bonus < 0)
error->one(FLERR,"Per-processor system is too big");
bonus = (Bonus *) memory->srealloc(bonus,nmax_bonus*sizeof(Bonus),
"atom:bonus");
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
if delflag and atom J has bonus data, then delete it
------------------------------------------------------------------------- */
void AtomVecTri::copy(int i, int j, int delflag)
{
tag[j] = tag[i];
type[j] = type[i];
mask[j] = mask[i];
image[j] = image[i];
x[j][0] = x[i][0];
x[j][1] = x[i][1];
x[j][2] = x[i][2];
v[j][0] = v[i][0];
v[j][1] = v[i][1];
v[j][2] = v[i][2];
molecule[j] = molecule[i];
rmass[j] = rmass[i];
radius[j] = radius[i];
omega[j][0] = omega[i][0];
omega[j][1] = omega[i][1];
omega[j][2] = omega[i][2];
angmom[j][0] = angmom[i][0];
angmom[j][1] = angmom[i][1];
angmom[j][2] = angmom[i][2];
// if deleting atom J via delflag and J has bonus data, then delete it
if (delflag && tri[j] >= 0) {
copy_bonus(nlocal_bonus-1,tri[j]);
nlocal_bonus--;
}
// if atom I has bonus data, reset I's bonus.ilocal to loc J
// do NOT do this if self-copy (I=J) since I's bonus data is already deleted
if (tri[i] >= 0 && i != j) bonus[tri[i]].ilocal = j;
tri[j] = tri[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ----------------------------------------------------------------------
copy bonus data from I to J, effectively deleting the J entry
also reset tri that points to I to now point to J
------------------------------------------------------------------------- */
void AtomVecTri::copy_bonus(int i, int j)
{
tri[bonus[i].ilocal] = j;
memcpy(&bonus[j],&bonus[i],sizeof(Bonus));
}
/* ----------------------------------------------------------------------
clear ghost info in bonus data
called before ghosts are recommunicated in comm and irregular
------------------------------------------------------------------------- */
void AtomVecTri::clear_bonus()
{
nghost_bonus = 0;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->clear_bonus();
}
/* ----------------------------------------------------------------------
set equilateral tri of size in bonus data for particle I
oriented symmetrically in xy plane
this may create or delete entry in bonus data
------------------------------------------------------------------------- */
void AtomVecTri::set_equilateral(int i, double size)
{
// also set radius = distance from center to corner-pt = len(c1)
// unless size = 0.0, then set diameter = 1.0
if (tri[i] < 0) {
if (size == 0.0) return;
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *quat = bonus[nlocal_bonus].quat;
double *c1 = bonus[nlocal_bonus].c1;
double *c2 = bonus[nlocal_bonus].c2;
double *c3 = bonus[nlocal_bonus].c3;
double *inertia = bonus[nlocal_bonus].inertia;
quat[0] = 1.0;
quat[1] = 0.0;
quat[2] = 0.0;
quat[3] = 0.0;
c1[0] = -size/2.0;
c1[1] = -sqrt(3.0)/2.0 * size / 3.0;
c1[2] = 0.0;
c2[0] = size/2.0;
c2[1] = -sqrt(3.0)/2.0 * size / 3.0;
c2[2] = 0.0;
c3[0] = 0.0;
c3[1] = sqrt(3.0)/2.0 * size * 2.0/3.0;
c3[2] = 0.0;
inertia[0] = sqrt(3.0)/96.0 * size*size*size*size;
inertia[1] = sqrt(3.0)/96.0 * size*size*size*size;
inertia[2] = sqrt(3.0)/48.0 * size*size*size*size;
radius[i] = MathExtra::len3(c1);
bonus[nlocal_bonus].ilocal = i;
tri[i] = nlocal_bonus++;
} else if (size == 0.0) {
radius[i] = 0.5;
copy_bonus(nlocal_bonus-1,tri[i]);
nlocal_bonus--;
tri[i] = -1;
} else {
double *c1 = bonus[tri[i]].c1;
double *c2 = bonus[tri[i]].c2;
double *c3 = bonus[tri[i]].c3;
double *inertia = bonus[tri[i]].inertia;
c1[0] = -size/2.0;
c1[1] = -sqrt(3.0)/2.0 * size / 3.0;
c1[2] = 0.0;
c2[0] = size/2.0;
c2[1] = -sqrt(3.0)/2.0 * size / 3.0;
c2[2] = 0.0;
c3[0] = 0.0;
c3[1] = sqrt(3.0)/2.0 * size * 2.0/3.0;
c3[2] = 0.0;
inertia[0] = sqrt(3.0)/96.0 * size*size*size*size;
inertia[1] = sqrt(3.0)/96.0 * size*size*size*size;
inertia[2] = sqrt(3.0)/48.0 * size*size*size*size;
radius[i] = MathExtra::len3(c1);
}
}
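/* ----------------------------------------------------------------------
   annotation (not part of the upstream source): c1,c2,c3 above are the
   corners of an equilateral triangle of edge length "size", centered on
   its centroid in the xy plane, so
     radius = len3(c1) = size/sqrt(3)   (centroid-to-corner distance)
   and the inertia values match the standard formulas for a uniform
   plate of unit areal density:
     Ix = Iy = sqrt(3)/96 * size^4,  Iz = Ix + Iy = sqrt(3)/48 * size^4
------------------------------------------------------------------------- */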
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
double *quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (tri[j] >= 0) {
quat = bonus[tri[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (tri[j] >= 0) {
quat = bonus[tri[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
double *quat;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
if (tri[j] >= 0) {
quat = bonus[tri[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (tri[j] >= 0) {
quat = bonus[tri[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
if (tri[j] >= 0) {
quat = bonus[tri[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_comm_hybrid(int n, int *list, double *buf)
{
int i,j,m;
double *quat;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
if (tri[j] >= 0) {
quat = bonus[tri[j]].quat;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecTri::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (tri[i] >= 0) {
quat = bonus[tri[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecTri::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
if (tri[i] >= 0) {
quat = bonus[tri[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::unpack_comm_hybrid(int n, int first, double *buf)
{
int i,m,last;
double *quat;
m = 0;
last = first + n;
for (i = first; i < last; i++)
if (tri[i] >= 0) {
quat = bonus[tri[i]].quat;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_reverse(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = f[i][0];
buf[m++] = f[i][1];
buf[m++] = f[i][2];
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_reverse_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = torque[i][0];
buf[m++] = torque[i][1];
buf[m++] = torque[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecTri::unpack_reverse(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
f[j][0] += buf[m++];
f[j][1] += buf[m++];
f[j][2] += buf[m++];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::unpack_reverse_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
torque[j][0] += buf[m++];
torque[j][1] += buf[m++];
torque[j][2] += buf[m++];
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
double *quat,*c1,*c2,*c3,*inertia;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (tri[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[tri[j]].quat;
c1 = bonus[tri[j]].c1;
c2 = bonus[tri[j]].c2;
c3 = bonus[tri[j]].c3;
inertia = bonus[tri[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (tri[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[tri[j]].quat;
c1 = bonus[tri[j]].c1;
c2 = bonus[tri[j]].c2;
c3 = bonus[tri[j]].c3;
inertia = bonus[tri[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
double *quat,*c1,*c2,*c3,*inertia;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0];
buf[m++] = x[j][1];
buf[m++] = x[j][2];
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (tri[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[tri[j]].quat;
c1 = bonus[tri[j]].c1;
c2 = bonus[tri[j]].c2;
c3 = bonus[tri[j]].c3;
inertia = bonus[tri[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (tri[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[tri[j]].quat;
c1 = bonus[tri[j]].c1;
c2 = bonus[tri[j]].c2;
c3 = bonus[tri[j]].c3;
inertia = bonus[tri[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = x[j][0] + dx;
buf[m++] = x[j][1] + dy;
buf[m++] = x[j][2] + dz;
buf[m++] = ubuf(tag[j]).d;
buf[m++] = ubuf(type[j]).d;
buf[m++] = ubuf(mask[j]).d;
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (tri[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[tri[j]].quat;
c1 = bonus[tri[j]].c1;
c2 = bonus[tri[j]].c2;
c3 = bonus[tri[j]].c3;
inertia = bonus[tri[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
if (mask[i] & deform_groupbit) {
buf[m++] = v[j][0] + dvx;
buf[m++] = v[j][1] + dvy;
buf[m++] = v[j][2] + dvz;
} else {
buf[m++] = v[j][0];
buf[m++] = v[j][1];
buf[m++] = v[j][2];
}
buf[m++] = omega[j][0];
buf[m++] = omega[j][1];
buf[m++] = omega[j][2];
buf[m++] = angmom[j][0];
buf[m++] = angmom[j][1];
buf[m++] = angmom[j][2];
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
double *quat,*c1,*c2,*c3,*inertia;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = ubuf(molecule[j]).d;
buf[m++] = radius[j];
buf[m++] = rmass[j];
if (tri[j] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
quat = bonus[tri[j]].quat;
c1 = bonus[tri[j]].c1;
c2 = bonus[tri[j]].c2;
c3 = bonus[tri[j]].c3;
inertia = bonus[tri[j]].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecTri::unpack_border(int n, int first, double *buf)
{
int i,j,m,last;
double *quat,*c1,*c2,*c3,*inertia;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
molecule[i] = (tagint) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
tri[i] = (int) ubuf(buf[m++]).i;
if (tri[i] == 0) tri[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
quat = bonus[j].quat;
c1 = bonus[j].c1;
c2 = bonus[j].c2;
c3 = bonus[j].c3;
inertia = bonus[j].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
c1[0] = buf[m++];
c1[1] = buf[m++];
c1[2] = buf[m++];
c2[0] = buf[m++];
c2[1] = buf[m++];
c2[2] = buf[m++];
c3[0] = buf[m++];
c3[1] = buf[m++];
c3[2] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[j].ilocal = i;
tri[i] = j;
nghost_bonus++;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
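/* ----------------------------------------------------------------------
   note (added for clarity, not in the original source): bonus entries for
   owned atoms occupy slots 0..nlocal_bonus-1, so each incoming ghost that
   carries tri data is appended at j = nlocal_bonus + nghost_bonus, and
   tri[i] is reset from the packed 0/1 flag to that bonus index
   (or to -1 if the ghost has no bonus data)
------------------------------------------------------------------------- */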
/* ---------------------------------------------------------------------- */
void AtomVecTri::unpack_border_vel(int n, int first, double *buf)
{
int i,j,m,last;
double *quat,*c1,*c2,*c3,*inertia;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
x[i][0] = buf[m++];
x[i][1] = buf[m++];
x[i][2] = buf[m++];
tag[i] = (tagint) ubuf(buf[m++]).i;
type[i] = (int) ubuf(buf[m++]).i;
mask[i] = (int) ubuf(buf[m++]).i;
molecule[i] = (tagint) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
tri[i] = (int) ubuf(buf[m++]).i;
if (tri[i] == 0) tri[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
quat = bonus[j].quat;
c1 = bonus[j].c1;
c2 = bonus[j].c2;
c3 = bonus[j].c3;
inertia = bonus[j].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
c1[0] = buf[m++];
c1[1] = buf[m++];
c1[2] = buf[m++];
c2[0] = buf[m++];
c2[1] = buf[m++];
c2[2] = buf[m++];
c3[0] = buf[m++];
c3[1] = buf[m++];
c3[2] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[j].ilocal = i;
tri[i] = j;
nghost_bonus++;
}
v[i][0] = buf[m++];
v[i][1] = buf[m++];
v[i][2] = buf[m++];
omega[i][0] = buf[m++];
omega[i][1] = buf[m++];
omega[i][2] = buf[m++];
angmom[i][0] = buf[m++];
angmom[i][1] = buf[m++];
angmom[i][2] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecTri::unpack_border_hybrid(int n, int first, double *buf)
{
int i,j,m,last;
double *quat,*c1,*c2,*c3,*inertia;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
molecule[i] = (tagint) ubuf(buf[m++]).i;
radius[i] = buf[m++];
rmass[i] = buf[m++];
tri[i] = (int) ubuf(buf[m++]).i;
if (tri[i] == 0) tri[i] = -1;
else {
j = nlocal_bonus + nghost_bonus;
if (j == nmax_bonus) grow_bonus();
quat = bonus[j].quat;
c1 = bonus[j].c1;
c2 = bonus[j].c2;
c3 = bonus[j].c3;
inertia = bonus[j].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
c1[0] = buf[m++];
c1[1] = buf[m++];
c1[2] = buf[m++];
c2[0] = buf[m++];
c2[1] = buf[m++];
c2[2] = buf[m++];
c3[0] = buf[m++];
c3[1] = buf[m++];
c3[2] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[j].ilocal = i;
tri[i] = j;
nghost_bonus++;
}
}
return m;
}
/* ----------------------------------------------------------------------
pack data for atom I for sending to another proc
xyz must be 1st 3 values, so comm::exchange() can test on them
------------------------------------------------------------------------- */
int AtomVecTri::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = ubuf(molecule[i]).d;
buf[m++] = rmass[i];
buf[m++] = radius[i];
buf[m++] = omega[i][0];
buf[m++] = omega[i][1];
buf[m++] = omega[i][2];
buf[m++] = angmom[i][0];
buf[m++] = angmom[i][1];
buf[m++] = angmom[i][2];
if (tri[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = tri[i];
double *quat = bonus[j].quat;
double *c1 = bonus[j].c1;
double *c2 = bonus[j].c2;
double *c3 = bonus[j].c3;
double *inertia = bonus[j].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
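/* ----------------------------------------------------------------------
   illustrative sketch, not part of the original source: the ubuf(...).d and
   ubuf(...).i pairs above rely on a union that stores an integer bit-for-bit
   inside a double, so integer tags and flags can travel in the same
   MPI_DOUBLE buffer as coordinates, and buf[0] = m records the record
   length so a receiver can skip one atom without unpacking it.
   the hypothetical helper below mirrors that idea with a plain long long
------------------------------------------------------------------------- */
union ExampleUbuf {
  double d;
  long long i;
  ExampleUbuf(long long arg) : i(arg) {}
  ExampleUbuf(double arg) : d(arg) {}
};
// pack:   buf[m++] = ExampleUbuf((long long) tag[i]).d;  // bits of an int, stored as a double
// unpack: tag[i] = (tagint) ExampleUbuf(buf[m++]).i;     // the same bits, read back as an int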
/* ---------------------------------------------------------------------- */
int AtomVecTri::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
molecule[nlocal] = (tagint) ubuf(buf[m++]).i;
rmass[nlocal] = buf[m++];
radius[nlocal] = buf[m++];
omega[nlocal][0] = buf[m++];
omega[nlocal][1] = buf[m++];
omega[nlocal][2] = buf[m++];
angmom[nlocal][0] = buf[m++];
angmom[nlocal][1] = buf[m++];
angmom[nlocal][2] = buf[m++];
tri[nlocal] = (int) ubuf(buf[m++]).i;
if (tri[nlocal] == 0) tri[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *quat = bonus[nlocal_bonus].quat;
double *c1 = bonus[nlocal_bonus].c1;
double *c2 = bonus[nlocal_bonus].c2;
double *c3 = bonus[nlocal_bonus].c3;
double *inertia = bonus[nlocal_bonus].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
c1[0] = buf[m++];
c1[1] = buf[m++];
c1[2] = buf[m++];
c2[0] = buf[m++];
c2[1] = buf[m++];
c2[2] = buf[m++];
c3[0] = buf[m++];
c3[1] = buf[m++];
c3[2] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[nlocal_bonus].ilocal = nlocal;
tri[nlocal] = nlocal_bonus++;
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecTri::size_restart()
{
int i;
int n = 0;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
if (tri[i] >= 0) n += 37;
else n += 21;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
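/* ----------------------------------------------------------------------
   worked count (added for clarity, not in the original source): the 21 and
   37 above follow directly from pack_restart() below --
     1 record length + 3 x + 4 (tag,type,mask,image) + 3 v + 1 molecule +
     1 rmass + 1 radius + 3 omega + 3 angmom + 1 tri flag          = 21
     bonus data adds 4 quat + 3x3 corner displacements + 3 inertia = +16 -> 37
------------------------------------------------------------------------- */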
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecTri::pack_restart(int i, double *buf)
{
int m = 1;
buf[m++] = x[i][0];
buf[m++] = x[i][1];
buf[m++] = x[i][2];
buf[m++] = ubuf(tag[i]).d;
buf[m++] = ubuf(type[i]).d;
buf[m++] = ubuf(mask[i]).d;
buf[m++] = ubuf(image[i]).d;
buf[m++] = v[i][0];
buf[m++] = v[i][1];
buf[m++] = v[i][2];
buf[m++] = ubuf(molecule[i]).d;
buf[m++] = rmass[i];
buf[m++] = radius[i];
buf[m++] = omega[i][0];
buf[m++] = omega[i][1];
buf[m++] = omega[i][2];
buf[m++] = angmom[i][0];
buf[m++] = angmom[i][1];
buf[m++] = angmom[i][2];
if (tri[i] < 0) buf[m++] = ubuf(0).d;
else {
buf[m++] = ubuf(1).d;
int j = tri[i];
double *quat = bonus[j].quat;
double *c1 = bonus[j].c1;
double *c2 = bonus[j].c2;
double *c3 = bonus[j].c3;
double *inertia = bonus[j].inertia;
buf[m++] = quat[0];
buf[m++] = quat[1];
buf[m++] = quat[2];
buf[m++] = quat[3];
buf[m++] = c1[0];
buf[m++] = c1[1];
buf[m++] = c1[2];
buf[m++] = c2[0];
buf[m++] = c2[1];
buf[m++] = c2[2];
buf[m++] = c3[0];
buf[m++] = c3[1];
buf[m++] = c3[2];
buf[m++] = inertia[0];
buf[m++] = inertia[1];
buf[m++] = inertia[2];
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecTri::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
int m = 1;
x[nlocal][0] = buf[m++];
x[nlocal][1] = buf[m++];
x[nlocal][2] = buf[m++];
tag[nlocal] = (tagint) ubuf(buf[m++]).i;
type[nlocal] = (int) ubuf(buf[m++]).i;
mask[nlocal] = (int) ubuf(buf[m++]).i;
image[nlocal] = (imageint) ubuf(buf[m++]).i;
v[nlocal][0] = buf[m++];
v[nlocal][1] = buf[m++];
v[nlocal][2] = buf[m++];
molecule[nlocal] = (tagint) ubuf(buf[m++]).i;
rmass[nlocal] = buf[m++];
radius[nlocal] = buf[m++];
omega[nlocal][0] = buf[m++];
omega[nlocal][1] = buf[m++];
omega[nlocal][2] = buf[m++];
angmom[nlocal][0] = buf[m++];
angmom[nlocal][1] = buf[m++];
angmom[nlocal][2] = buf[m++];
tri[nlocal] = (int) ubuf(buf[m++]).i;
if (tri[nlocal] == 0) tri[nlocal] = -1;
else {
if (nlocal_bonus == nmax_bonus) grow_bonus();
double *quat = bonus[nlocal_bonus].quat;
double *c1 = bonus[nlocal_bonus].c1;
double *c2 = bonus[nlocal_bonus].c2;
double *c3 = bonus[nlocal_bonus].c3;
double *inertia = bonus[nlocal_bonus].inertia;
quat[0] = buf[m++];
quat[1] = buf[m++];
quat[2] = buf[m++];
quat[3] = buf[m++];
c1[0] = buf[m++];
c1[1] = buf[m++];
c1[2] = buf[m++];
c2[0] = buf[m++];
c2[1] = buf[m++];
c2[2] = buf[m++];
c3[0] = buf[m++];
c3[1] = buf[m++];
c3[2] = buf[m++];
inertia[0] = buf[m++];
inertia[1] = buf[m++];
inertia[2] = buf[m++];
bonus[nlocal_bonus].ilocal = nlocal;
tri[nlocal] = nlocal_bonus++;
}
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecTri::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = 0;
type[nlocal] = itype;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
mask[nlocal] = 1;
image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
molecule[nlocal] = 0;
radius[nlocal] = 0.5;
rmass[nlocal] = 4.0*MY_PI/3.0 * radius[nlocal]*radius[nlocal]*radius[nlocal];
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
tri[nlocal] = -1;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecTri::data_atom(double *coord, imageint imagetmp, char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
tag[nlocal] = ATOTAGINT(values[0]);
molecule[nlocal] = ATOTAGINT(values[1]);
type[nlocal] = atoi(values[2]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
tri[nlocal] = atoi(values[3]);
if (tri[nlocal] == 0) tri[nlocal] = -1;
else if (tri[nlocal] == 1) tri[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[4]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
if (tri[nlocal] < 0) {
radius[nlocal] = 0.5;
rmass[nlocal] *= 4.0*MY_PI/3.0 *
radius[nlocal]*radius[nlocal]*radius[nlocal];
} else radius[nlocal] = 0.0;
x[nlocal][0] = coord[0];
x[nlocal][1] = coord[1];
x[nlocal][2] = coord[2];
image[nlocal] = imagetmp;
mask[nlocal] = 1;
v[nlocal][0] = 0.0;
v[nlocal][1] = 0.0;
v[nlocal][2] = 0.0;
omega[nlocal][0] = 0.0;
omega[nlocal][1] = 0.0;
omega[nlocal][2] = 0.0;
angmom[nlocal][0] = 0.0;
angmom[nlocal][1] = 0.0;
angmom[nlocal][2] = 0.0;
atom->nlocal++;
}
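/* ----------------------------------------------------------------------
   note (added for clarity, not in the original source): the value read into
   rmass above is a density; for an atom without bonus tri data it is
   converted here into the mass of the default sphere of radius 0.5, while
   for a tri atom it stays a density until data_atom_bonus() multiplies it
   by the triangle area
------------------------------------------------------------------------- */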
/* ----------------------------------------------------------------------
unpack hybrid quantities from one tri in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecTri::data_atom_hybrid(int nlocal, char **values)
{
molecule[nlocal] = ATOTAGINT(values[0]);
tri[nlocal] = atoi(values[1]);
if (tri[nlocal] == 0) tri[nlocal] = -1;
else if (tri[nlocal] == 1) tri[nlocal] = 0;
else error->one(FLERR,"Invalid atom type in Atoms section of data file");
rmass[nlocal] = atof(values[2]);
if (rmass[nlocal] <= 0.0)
error->one(FLERR,"Invalid density in Atoms section of data file");
if (tri[nlocal] < 0) {
radius[nlocal] = 0.5;
rmass[nlocal] *= 4.0*MY_PI/3.0 *
radius[nlocal]*radius[nlocal]*radius[nlocal];
} else radius[nlocal] = 0.0;
return 3;
}
/* ----------------------------------------------------------------------
unpack one line from Tris section of data file
------------------------------------------------------------------------- */
void AtomVecTri::data_atom_bonus(int m, char **values)
{
if (tri[m]) error->one(FLERR,"Assigning tri parameters to non-tri atom");
if (nlocal_bonus == nmax_bonus) grow_bonus();
double c1[3],c2[3],c3[3];
c1[0] = atof(values[0]);
c1[1] = atof(values[1]);
c1[2] = atof(values[2]);
c2[0] = atof(values[3]);
c2[1] = atof(values[4]);
c2[2] = atof(values[5]);
c3[0] = atof(values[6]);
c3[1] = atof(values[7]);
c3[2] = atof(values[8]);
// check for duplicate points
if (c1[0] == c2[0] && c1[1] == c2[1] && c1[2] == c2[2])
error->one(FLERR,"Invalid shape in Triangles section of data file");
if (c1[0] == c3[0] && c1[1] == c3[1] && c1[2] == c3[2])
error->one(FLERR,"Invalid shape in Triangles section of data file");
if (c2[0] == c3[0] && c2[1] == c3[1] && c2[2] == c3[2])
error->one(FLERR,"Invalid shape in Triangles section of data file");
// size = length of one edge
double c2mc1[3],c3mc1[3];
MathExtra::sub3(c2,c1,c2mc1);
MathExtra::sub3(c3,c1,c3mc1);
double size = MAX(MathExtra::len3(c2mc1),MathExtra::len3(c3mc1));
// centroid = 1/3 of sum of vertices
double centroid[3];
centroid[0] = (c1[0]+c2[0]+c3[0]) / 3.0;
centroid[1] = (c1[1]+c2[1]+c3[1]) / 3.0;
centroid[2] = (c1[2]+c2[2]+c3[2]) / 3.0;
double dx = centroid[0] - x[m][0];
double dy = centroid[1] - x[m][1];
double dz = centroid[2] - x[m][2];
double delta = sqrt(dx*dx + dy*dy + dz*dz);
if (delta/size > EPSILON)
error->one(FLERR,"Inconsistent triangle in data file");
x[m][0] = centroid[0];
x[m][1] = centroid[1];
x[m][2] = centroid[2];
// reset tri radius and mass
// rmass currently holds density
// tri area = 0.5 len(U x V), where U,V are edge vectors from one vertex
double c4[3];
MathExtra::sub3(c1,centroid,c4);
radius[m] = MathExtra::lensq3(c4);
MathExtra::sub3(c2,centroid,c4);
radius[m] = MAX(radius[m],MathExtra::lensq3(c4));
MathExtra::sub3(c3,centroid,c4);
radius[m] = MAX(radius[m],MathExtra::lensq3(c4));
radius[m] = sqrt(radius[m]);
double norm[3];
MathExtra::cross3(c2mc1,c3mc1,norm);
double area = 0.5 * MathExtra::len3(norm);
rmass[m] *= area;
// inertia = inertia tensor of triangle as 6-vector in Voigt notation
double inertia[6];
MathExtra::inertia_triangle(c1,c2,c3,rmass[m],inertia);
// diagonalize inertia tensor via Jacobi rotations
// bonus[].inertia = 3 eigenvalues = principal moments of inertia
// evectors and ex/ey/ez_space = 3 evectors = principal axes of triangle
double tensor[3][3],evectors[3][3];
tensor[0][0] = inertia[0];
tensor[1][1] = inertia[1];
tensor[2][2] = inertia[2];
tensor[1][2] = tensor[2][1] = inertia[3];
tensor[0][2] = tensor[2][0] = inertia[4];
tensor[0][1] = tensor[1][0] = inertia[5];
int ierror = MathExtra::jacobi(tensor,bonus[nlocal_bonus].inertia,evectors);
if (ierror) error->one(FLERR,"Insufficient Jacobi rotations for triangle");
double ex_space[3],ey_space[3],ez_space[3];
ex_space[0] = evectors[0][0];
ex_space[1] = evectors[1][0];
ex_space[2] = evectors[2][0];
ey_space[0] = evectors[0][1];
ey_space[1] = evectors[1][1];
ey_space[2] = evectors[2][1];
ez_space[0] = evectors[0][2];
ez_space[1] = evectors[1][2];
ez_space[2] = evectors[2][2];
// enforce 3 orthogonal vectors as a right-handed coordinate system
// flip 3rd vector if needed
MathExtra::cross3(ex_space,ey_space,norm);
if (MathExtra::dot3(norm,ez_space) < 0.0) MathExtra::negate3(ez_space);
// create initial quaternion
MathExtra::exyz_to_q(ex_space,ey_space,ez_space,bonus[nlocal_bonus].quat);
// bonus c1,c2,c3 = displacement of c1,c2,c3 from centroid
// in basis of principal axes
double disp[3];
MathExtra::sub3(c1,centroid,disp);
MathExtra::transpose_matvec(ex_space,ey_space,ez_space,
disp,bonus[nlocal_bonus].c1);
MathExtra::sub3(c2,centroid,disp);
MathExtra::transpose_matvec(ex_space,ey_space,ez_space,
disp,bonus[nlocal_bonus].c2);
MathExtra::sub3(c3,centroid,disp);
MathExtra::transpose_matvec(ex_space,ey_space,ez_space,
disp,bonus[nlocal_bonus].c3);
bonus[nlocal_bonus].ilocal = m;
tri[m] = nlocal_bonus++;
}
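/* ----------------------------------------------------------------------
   illustrative sketch, not part of the original source: the geometry above
   reduces to centroid = (c1+c2+c3)/3 and area = 0.5*|(c2-c1) x (c3-c1)|.
   the hypothetical helper below shows the same area arithmetic with plain
   doubles in place of the MathExtra calls
------------------------------------------------------------------------- */
static double example_tri_area(const double *c1, const double *c2,
                               const double *c3)
{
  // edge vectors from the first vertex
  double ux = c2[0]-c1[0], uy = c2[1]-c1[1], uz = c2[2]-c1[2];
  double vx = c3[0]-c1[0], vy = c3[1]-c1[1], vz = c3[2]-c1[2];
  // cross product U x V, whose length is twice the triangle area
  double nx = uy*vz - uz*vy;
  double ny = uz*vx - ux*vz;
  double nz = ux*vy - uy*vx;
  return 0.5 * sqrt(nx*nx + ny*ny + nz*nz);
}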
/* ----------------------------------------------------------------------
unpack one line from Velocities section of data file
------------------------------------------------------------------------- */
void AtomVecTri::data_vel(int m, char **values)
{
v[m][0] = atof(values[0]);
v[m][1] = atof(values[1]);
v[m][2] = atof(values[2]);
omega[m][0] = atof(values[3]);
omega[m][1] = atof(values[4]);
omega[m][2] = atof(values[5]);
angmom[m][0] = atof(values[6]);
angmom[m][1] = atof(values[7]);
angmom[m][2] = atof(values[8]);
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Velocities section of data file
------------------------------------------------------------------------- */
int AtomVecTri::data_vel_hybrid(int m, char **values)
{
omega[m][0] = atof(values[0]);
omega[m][1] = atof(values[1]);
omega[m][2] = atof(values[2]);
angmom[m][0] = atof(values[3]);
angmom[m][1] = atof(values[4]);
angmom[m][2] = atof(values[5]);
return 6;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecTri::pack_data(double **buf)
{
double c2mc1[3],c3mc1[3],norm[3];
double area;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = ubuf(molecule[i]).d;
buf[i][2] = ubuf(type[i]).d;
if (tri[i] < 0) buf[i][3] = ubuf(0).d;
else buf[i][3] = ubuf(1).d;
if (tri[i] < 0)
buf[i][4] = rmass[i] / (4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i]);
else {
MathExtra::sub3(bonus[tri[i]].c2,bonus[tri[i]].c1,c2mc1);
MathExtra::sub3(bonus[tri[i]].c3,bonus[tri[i]].c1,c3mc1);
MathExtra::cross3(c2mc1,c3mc1,norm);
area = 0.5 * MathExtra::len3(norm);
buf[i][4] = rmass[i]/area;
}
buf[i][5] = x[i][0];
buf[i][6] = x[i][1];
buf[i][7] = x[i][2];
buf[i][8] = ubuf((image[i] & IMGMASK) - IMGMAX).d;
buf[i][9] = ubuf((image[i] >> IMGBITS & IMGMASK) - IMGMAX).d;
buf[i][10] = ubuf((image[i] >> IMG2BITS) - IMGMAX).d;
}
}
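/* ----------------------------------------------------------------------
   note (added for clarity, not in the original source): this is the inverse
   of data_atom() -- the stored mass is converted back to a density, dividing
   by the default sphere volume for a point particle or by the triangle area
   for an atom with tri bonus data, so the written data file can be read
   back in the same form
------------------------------------------------------------------------- */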
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecTri::pack_data_hybrid(int i, double *buf)
{
buf[0] = ubuf(molecule[i]).d;
if (tri[i] < 0) buf[1] = ubuf(0).d;
else buf[1] = ubuf(1).d;
if (tri[i] < 0)
buf[2] = rmass[i] / (4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i]);
else {
double c2mc1[3],c3mc1[3],norm[3];
MathExtra::sub3(bonus[tri[i]].c2,bonus[tri[i]].c1,c2mc1);
MathExtra::sub3(bonus[tri[i]].c3,bonus[tri[i]].c1,c3mc1);
MathExtra::cross3(c2mc1,c3mc1,norm);
double area = 0.5 * MathExtra::len3(norm);
buf[2] = rmass[i]/area;
}
return 3;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecTri::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT " " TAGINT_FORMAT
" %d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(tagint) ubuf(buf[i][0]).i,(tagint) ubuf(buf[i][1]).i,
(int) ubuf(buf[i][2]).i,(int) ubuf(buf[i][3]).i,
buf[i][4],buf[i][5],buf[i][6],buf[i][7],
(int) ubuf(buf[i][8]).i,(int) ubuf(buf[i][9]).i,
(int) ubuf(buf[i][10]).i);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecTri::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT " %d %-1.16e",
(tagint) ubuf(buf[0]).i,(int) ubuf(buf[1]).i,buf[2]);
return 3;
}
/* ----------------------------------------------------------------------
pack velocity info for data file
------------------------------------------------------------------------- */
void AtomVecTri::pack_vel(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = ubuf(tag[i]).d;
buf[i][1] = v[i][0];
buf[i][2] = v[i][1];
buf[i][3] = v[i][2];
buf[i][4] = omega[i][0];
buf[i][5] = omega[i][1];
buf[i][6] = omega[i][2];
buf[i][7] = angmom[i][0];
buf[i][8] = angmom[i][1];
buf[i][9] = angmom[i][2];
}
}
/* ----------------------------------------------------------------------
pack hybrid velocity info for data file
------------------------------------------------------------------------- */
int AtomVecTri::pack_vel_hybrid(int i, double *buf)
{
buf[0] = omega[i][0];
buf[1] = omega[i][1];
buf[2] = omega[i][2];
buf[3] = angmom[i][0];
buf[4] = angmom[i][1];
buf[5] = angmom[i][2];
return 6;
}
/* ----------------------------------------------------------------------
write velocity info to data file
------------------------------------------------------------------------- */
void AtomVecTri::write_vel(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,TAGINT_FORMAT
" %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e "
"%-1.16e %-1.16e %-1.16e\n",
(tagint) ubuf(buf[i][0]).i,buf[i][1],buf[i][2],buf[i][3],
buf[i][4],buf[i][5],buf[i][6],buf[i][7],buf[i][8],buf[i][9]);
}
/* ----------------------------------------------------------------------
write hybrid velocity info to data file
------------------------------------------------------------------------- */
int AtomVecTri::write_vel_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e %-1.16e",
buf[0],buf[1],buf[2],buf[3],buf[4],buf[5]);
return 6;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecTri::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*comm->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("rmass")) bytes += memory->usage(rmass,nmax);
if (atom->memcheck("radius")) bytes += memory->usage(radius,nmax);
if (atom->memcheck("omega")) bytes += memory->usage(omega,nmax,3);
if (atom->memcheck("angmom")) bytes += memory->usage(angmom,nmax,3);
if (atom->memcheck("torque")) bytes +=
memory->usage(torque,nmax*comm->nthreads,3);
if (atom->memcheck("tri")) bytes += memory->usage(tri,nmax);
bytes += nmax_bonus*sizeof(Bonus);
return bytes;
}
diff --git a/src/comm_brick.cpp b/src/comm_brick.cpp
index 289b11782..d6cbed40a 100644
--- a/src/comm_brick.cpp
+++ b/src/comm_brick.cpp
@@ -1,1485 +1,1489 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "comm_brick.h"
#include "comm_tiled.h"
#include "universe.h"
#include "atom.h"
#include "atom_vec.h"
#include "force.h"
#include "pair.h"
#include "domain.h"
#include "neighbor.h"
#include "group.h"
#include "modify.h"
#include "fix.h"
#include "compute.h"
#include "output.h"
#include "dump.h"
#include "math_extra.h"
#include "error.h"
#include "memory.h"
using namespace LAMMPS_NS;
#define BUFFACTOR 1.5
#define BUFMIN 1000
#define BUFEXTRA 1000
#define BIG 1.0e20
enum{SINGLE,MULTI}; // same as in Comm
enum{LAYOUT_UNIFORM,LAYOUT_NONUNIFORM,LAYOUT_TILED}; // several files
/* ---------------------------------------------------------------------- */
-CommBrick::CommBrick(LAMMPS *lmp) : Comm(lmp),
- sendnum(NULL), recvnum(NULL), sendproc(NULL), recvproc(NULL), size_forward_recv(NULL),
- size_reverse_send(NULL), size_reverse_recv(NULL), slablo(NULL), slabhi(NULL), multilo(NULL), multihi(NULL),
- cutghostmulti(NULL), pbc_flag(NULL), pbc(NULL), firstrecv(NULL), sendlist(NULL), maxsendlist(NULL), buf_send(NULL), buf_recv(NULL)
+CommBrick::CommBrick(LAMMPS *lmp) :
+ Comm(lmp),
+ sendnum(NULL), recvnum(NULL), sendproc(NULL), recvproc(NULL),
+ size_forward_recv(NULL),
+ size_reverse_send(NULL), size_reverse_recv(NULL),
+ slablo(NULL), slabhi(NULL), multilo(NULL), multihi(NULL),
+ cutghostmulti(NULL), pbc_flag(NULL), pbc(NULL), firstrecv(NULL),
+ sendlist(NULL), maxsendlist(NULL), buf_send(NULL), buf_recv(NULL)
{
style = 0;
layout = LAYOUT_UNIFORM;
pbc_flag = NULL;
init_buffers();
}
/* ---------------------------------------------------------------------- */
CommBrick::~CommBrick()
{
free_swap();
if (mode == MULTI) {
free_multi();
memory->destroy(cutghostmulti);
}
if (sendlist) for (int i = 0; i < maxswap; i++) memory->destroy(sendlist[i]);
memory->sfree(sendlist);
memory->destroy(maxsendlist);
memory->destroy(buf_send);
memory->destroy(buf_recv);
}
/* ---------------------------------------------------------------------- */
//IMPORTANT: we *MUST* pass "*oldcomm" to the Comm initializer here, as
// the code below *requires* that the (implicit) copy constructor
// for Comm is run, thus creating a shallow copy of "oldcomm".
// The call to Comm::copy_arrays() then converts the shallow copy
// into a deep copy of the class with the new layout.
CommBrick::CommBrick(LAMMPS *lmp, Comm *oldcomm) : Comm(*oldcomm)
{
if (oldcomm->layout == LAYOUT_TILED)
error->all(FLERR,"Cannot change to comm_style brick from tiled layout");
style = 0;
layout = oldcomm->layout;
Comm::copy_arrays(oldcomm);
init_buffers();
}
/* ----------------------------------------------------------------------
initialize comm buffers and other data structs local to CommBrick
------------------------------------------------------------------------- */
void CommBrick::init_buffers()
{
multilo = multihi = NULL;
cutghostmulti = NULL;
// bufextra = max size of one exchanged atom
// = allowed overflow of sendbuf in exchange()
// atomvec, fix reset these 2 maxexchange values if needed
// only necessary if their size > BUFEXTRA
maxexchange = maxexchange_atom + maxexchange_fix;
bufextra = maxexchange + BUFEXTRA;
maxsend = BUFMIN;
memory->create(buf_send,maxsend+bufextra,"comm:buf_send");
maxrecv = BUFMIN;
memory->create(buf_recv,maxrecv,"comm:buf_recv");
maxswap = 6;
allocate_swap(maxswap);
sendlist = (int **) memory->smalloc(maxswap*sizeof(int *),"comm:sendlist");
memory->create(maxsendlist,maxswap,"comm:maxsendlist");
for (int i = 0; i < maxswap; i++) {
maxsendlist[i] = BUFMIN;
memory->create(sendlist[i],BUFMIN,"comm:sendlist[i]");
}
}
/* ---------------------------------------------------------------------- */
void CommBrick::init()
{
Comm::init();
// memory for multi-style communication
if (mode == MULTI && multilo == NULL) {
allocate_multi(maxswap);
memory->create(cutghostmulti,atom->ntypes+1,3,"comm:cutghostmulti");
}
if (mode == SINGLE && multilo) {
free_multi();
memory->destroy(cutghostmulti);
}
}
/* ----------------------------------------------------------------------
setup spatial-decomposition communication patterns
function of neighbor cutoff(s) & cutghostuser & current box size
single mode sets slab boundaries (slablo,slabhi) based on max cutoff
multi mode sets type-dependent slab boundaries (multilo,multihi)
------------------------------------------------------------------------- */
void CommBrick::setup()
{
// cutghost[] = max distance at which ghost atoms need to be acquired
// for orthogonal:
// cutghost is in box coords = neigh->cutghost in all 3 dims
// for triclinic:
// neigh->cutghost = distance between tilted planes in box coords
// cutghost is in lamda coords = distance between those planes
// for multi:
// cutghostmulti = same as cutghost, only for each atom type
int i;
int ntypes = atom->ntypes;
double *prd,*sublo,*subhi;
double cut = MAX(neighbor->cutneighmax,cutghostuser);
if (triclinic == 0) {
prd = domain->prd;
sublo = domain->sublo;
subhi = domain->subhi;
cutghost[0] = cutghost[1] = cutghost[2] = cut;
if (mode == MULTI) {
double *cuttype = neighbor->cuttype;
for (i = 1; i <= ntypes; i++) {
cut = 0.0;
if (cutusermulti) cut = cutusermulti[i];
cutghostmulti[i][0] = MAX(cut,cuttype[i]);
cutghostmulti[i][1] = MAX(cut,cuttype[i]);
cutghostmulti[i][2] = MAX(cut,cuttype[i]);
}
}
} else {
prd = domain->prd_lamda;
sublo = domain->sublo_lamda;
subhi = domain->subhi_lamda;
double *h_inv = domain->h_inv;
double length0,length1,length2;
length0 = sqrt(h_inv[0]*h_inv[0] + h_inv[5]*h_inv[5] + h_inv[4]*h_inv[4]);
cutghost[0] = cut * length0;
length1 = sqrt(h_inv[1]*h_inv[1] + h_inv[3]*h_inv[3]);
cutghost[1] = cut * length1;
length2 = h_inv[2];
cutghost[2] = cut * length2;
if (mode == MULTI) {
double *cuttype = neighbor->cuttype;
for (i = 1; i <= ntypes; i++) {
cut = 0.0;
if (cutusermulti) cut = cutusermulti[i];
cutghostmulti[i][0] = length0 * MAX(cut,cuttype[i]);
cutghostmulti[i][1] = length1 * MAX(cut,cuttype[i]);
cutghostmulti[i][2] = length2 * MAX(cut,cuttype[i]);
}
}
}
// recvneed[idim][0/1] = # of procs away I recv atoms from, within cutghost
// 0 = from left, 1 = from right
// do not cross non-periodic boundaries, need[2] = 0 for 2d
// sendneed[idim][0/1] = # of procs away I send atoms to
// 0 = to left, 1 = to right
// set equal to recvneed[idim][1/0] of neighbor proc
// maxneed[idim] = max procs away any proc recvs atoms in either direction
// layout = UNIFORM = uniform sized sub-domains:
// maxneed is directly computable from sub-domain size
// limit to procgrid-1 for non-PBC
// recvneed = maxneed except for procs near non-PBC
// sendneed = recvneed of neighbor on each side
// layout = NONUNIFORM = non-uniform sized sub-domains:
// compute recvneed via updown() which accounts for non-PBC
// sendneed = recvneed of neighbor on each side
// maxneed via Allreduce() of recvneed
int *periodicity = domain->periodicity;
int left,right;
if (layout == LAYOUT_UNIFORM) {
maxneed[0] = static_cast<int> (cutghost[0] * procgrid[0] / prd[0]) + 1;
maxneed[1] = static_cast<int> (cutghost[1] * procgrid[1] / prd[1]) + 1;
maxneed[2] = static_cast<int> (cutghost[2] * procgrid[2] / prd[2]) + 1;
if (domain->dimension == 2) maxneed[2] = 0;
if (!periodicity[0]) maxneed[0] = MIN(maxneed[0],procgrid[0]-1);
if (!periodicity[1]) maxneed[1] = MIN(maxneed[1],procgrid[1]-1);
if (!periodicity[2]) maxneed[2] = MIN(maxneed[2],procgrid[2]-1);
if (!periodicity[0]) {
recvneed[0][0] = MIN(maxneed[0],myloc[0]);
recvneed[0][1] = MIN(maxneed[0],procgrid[0]-myloc[0]-1);
left = myloc[0] - 1;
if (left < 0) left = procgrid[0] - 1;
sendneed[0][0] = MIN(maxneed[0],procgrid[0]-left-1);
right = myloc[0] + 1;
if (right == procgrid[0]) right = 0;
sendneed[0][1] = MIN(maxneed[0],right);
} else recvneed[0][0] = recvneed[0][1] =
sendneed[0][0] = sendneed[0][1] = maxneed[0];
if (!periodicity[1]) {
recvneed[1][0] = MIN(maxneed[1],myloc[1]);
recvneed[1][1] = MIN(maxneed[1],procgrid[1]-myloc[1]-1);
left = myloc[1] - 1;
if (left < 0) left = procgrid[1] - 1;
sendneed[1][0] = MIN(maxneed[1],procgrid[1]-left-1);
right = myloc[1] + 1;
if (right == procgrid[1]) right = 0;
sendneed[1][1] = MIN(maxneed[1],right);
} else recvneed[1][0] = recvneed[1][1] =
sendneed[1][0] = sendneed[1][1] = maxneed[1];
if (!periodicity[2]) {
recvneed[2][0] = MIN(maxneed[2],myloc[2]);
recvneed[2][1] = MIN(maxneed[2],procgrid[2]-myloc[2]-1);
left = myloc[2] - 1;
if (left < 0) left = procgrid[2] - 1;
sendneed[2][0] = MIN(maxneed[2],procgrid[2]-left-1);
right = myloc[2] + 1;
if (right == procgrid[2]) right = 0;
sendneed[2][1] = MIN(maxneed[2],right);
} else recvneed[2][0] = recvneed[2][1] =
sendneed[2][0] = sendneed[2][1] = maxneed[2];
} else {
recvneed[0][0] = updown(0,0,myloc[0],prd[0],periodicity[0],xsplit);
recvneed[0][1] = updown(0,1,myloc[0],prd[0],periodicity[0],xsplit);
left = myloc[0] - 1;
if (left < 0) left = procgrid[0] - 1;
sendneed[0][0] = updown(0,1,left,prd[0],periodicity[0],xsplit);
right = myloc[0] + 1;
if (right == procgrid[0]) right = 0;
sendneed[0][1] = updown(0,0,right,prd[0],periodicity[0],xsplit);
recvneed[1][0] = updown(1,0,myloc[1],prd[1],periodicity[1],ysplit);
recvneed[1][1] = updown(1,1,myloc[1],prd[1],periodicity[1],ysplit);
left = myloc[1] - 1;
if (left < 0) left = procgrid[1] - 1;
sendneed[1][0] = updown(1,1,left,prd[1],periodicity[1],ysplit);
right = myloc[1] + 1;
if (right == procgrid[1]) right = 0;
sendneed[1][1] = updown(1,0,right,prd[1],periodicity[1],ysplit);
if (domain->dimension == 3) {
recvneed[2][0] = updown(2,0,myloc[2],prd[2],periodicity[2],zsplit);
recvneed[2][1] = updown(2,1,myloc[2],prd[2],periodicity[2],zsplit);
left = myloc[2] - 1;
if (left < 0) left = procgrid[2] - 1;
sendneed[2][0] = updown(2,1,left,prd[2],periodicity[2],zsplit);
right = myloc[2] + 1;
if (right == procgrid[2]) right = 0;
sendneed[2][1] = updown(2,0,right,prd[2],periodicity[2],zsplit);
} else recvneed[2][0] = recvneed[2][1] =
sendneed[2][0] = sendneed[2][1] = 0;
int all[6];
MPI_Allreduce(&recvneed[0][0],all,6,MPI_INT,MPI_MAX,world);
maxneed[0] = MAX(all[0],all[1]);
maxneed[1] = MAX(all[2],all[3]);
maxneed[2] = MAX(all[4],all[5]);
}
// allocate comm memory
nswap = 2 * (maxneed[0]+maxneed[1]+maxneed[2]);
if (nswap > maxswap) grow_swap(nswap);
// setup parameters for each exchange:
// sendproc = proc to send to at each swap
// recvproc = proc to recv from at each swap
// for mode SINGLE:
// slablo/slabhi = boundaries for slab of atoms to send at each swap
// use -BIG/midpt/BIG to insure all atoms included even if round-off occurs
// if round-off, atoms recvd across PBC can be < or > than subbox boundary
// note that borders() only loops over subset of atoms during each swap
// treat all as PBC here, non-PBC is handled in borders() via r/s need[][]
// for mode MULTI:
// multilo/multihi is same, with slablo/slabhi for each atom type
// pbc_flag: 0 = nothing across a boundary, 1 = something across a boundary
// pbc = -1/0/1 for PBC factor in each of 3/6 orthogonal/triclinic dirs
// for triclinic, slablo/hi and pbc_border will be used in lamda (0-1) coords
// 1st part of if statement is sending to the west/south/down
// 2nd part of if statement is sending to the east/north/up
int dim,ineed;
int iswap = 0;
for (dim = 0; dim < 3; dim++) {
for (ineed = 0; ineed < 2*maxneed[dim]; ineed++) {
pbc_flag[iswap] = 0;
pbc[iswap][0] = pbc[iswap][1] = pbc[iswap][2] =
pbc[iswap][3] = pbc[iswap][4] = pbc[iswap][5] = 0;
if (ineed % 2 == 0) {
sendproc[iswap] = procneigh[dim][0];
recvproc[iswap] = procneigh[dim][1];
if (mode == SINGLE) {
if (ineed < 2) slablo[iswap] = -BIG;
else slablo[iswap] = 0.5 * (sublo[dim] + subhi[dim]);
slabhi[iswap] = sublo[dim] + cutghost[dim];
} else {
for (i = 1; i <= ntypes; i++) {
if (ineed < 2) multilo[iswap][i] = -BIG;
else multilo[iswap][i] = 0.5 * (sublo[dim] + subhi[dim]);
multihi[iswap][i] = sublo[dim] + cutghostmulti[i][dim];
}
}
if (myloc[dim] == 0) {
pbc_flag[iswap] = 1;
pbc[iswap][dim] = 1;
if (triclinic) {
if (dim == 1) pbc[iswap][5] = 1;
else if (dim == 2) pbc[iswap][4] = pbc[iswap][3] = 1;
}
}
} else {
sendproc[iswap] = procneigh[dim][1];
recvproc[iswap] = procneigh[dim][0];
if (mode == SINGLE) {
slablo[iswap] = subhi[dim] - cutghost[dim];
if (ineed < 2) slabhi[iswap] = BIG;
else slabhi[iswap] = 0.5 * (sublo[dim] + subhi[dim]);
} else {
for (i = 1; i <= ntypes; i++) {
multilo[iswap][i] = subhi[dim] - cutghostmulti[i][dim];
if (ineed < 2) multihi[iswap][i] = BIG;
else multihi[iswap][i] = 0.5 * (sublo[dim] + subhi[dim]);
}
}
if (myloc[dim] == procgrid[dim]-1) {
pbc_flag[iswap] = 1;
pbc[iswap][dim] = -1;
if (triclinic) {
if (dim == 1) pbc[iswap][5] = -1;
else if (dim == 2) pbc[iswap][4] = pbc[iswap][3] = -1;
}
}
}
iswap++;
}
}
}
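/* ----------------------------------------------------------------------
   worked example (added for clarity, not in the original source): for a
   uniform layout, maxneed[dim] = int(cutghost*procgrid/prd) + 1, e.g. a box
   of length 12.0 split over 4 procs (sub-domain width 3.0) with
   cutghost = 4.0 gives int(4.0*4/12.0) + 1 = 2, i.e. ghosts must be
   gathered from up to two processors away in that dimension
------------------------------------------------------------------------- */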
/* ----------------------------------------------------------------------
walk up/down the extent of nearby processors in dim and dir
loc = myloc of proc to start at
dir = 0/1 = walk to left/right
do not cross non-periodic boundaries
is not called for z dim in 2d
return how many procs away are needed to encompass cutghost away from loc
------------------------------------------------------------------------- */
int CommBrick::updown(int dim, int dir, int loc,
double prd, int periodicity, double *split)
{
int index,count;
double frac,delta;
if (dir == 0) {
frac = cutghost[dim]/prd;
index = loc - 1;
delta = 0.0;
count = 0;
while (delta < frac) {
if (index < 0) {
if (!periodicity) break;
index = procgrid[dim] - 1;
}
count++;
delta += split[index+1] - split[index];
index--;
}
} else {
frac = cutghost[dim]/prd;
index = loc + 1;
delta = 0.0;
count = 0;
while (delta < frac) {
if (index >= procgrid[dim]) {
if (!periodicity) break;
index = 0;
}
count++;
delta += split[index+1] - split[index];
index++;
}
}
return count;
}
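/* ----------------------------------------------------------------------
   worked example (added for clarity, not in the original source): for a
   non-uniform split = {0.0, 0.5, 0.75, 1.0} (3 procs) and cutghost/prd = 0.3,
   walking right from proc 0 accumulates sub-domain widths 0.25, 0.25:
   one neighbor is not enough (0.25 < 0.3) but two are (0.5 >= 0.3),
   so updown() returns 2
------------------------------------------------------------------------- */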
/* ----------------------------------------------------------------------
forward communication of atom coords every timestep
other per-atom attributes may also be sent via pack/unpack routines
------------------------------------------------------------------------- */
void CommBrick::forward_comm(int dummy)
{
int n;
MPI_Request request;
AtomVec *avec = atom->avec;
double **x = atom->x;
double *buf;
// exchange data with another proc
// if other proc is self, just copy
// if comm_x_only set, exchange or copy directly to x, don't unpack
for (int iswap = 0; iswap < nswap; iswap++) {
if (sendproc[iswap] != me) {
if (comm_x_only) {
if (size_forward_recv[iswap]) {
if (size_forward_recv[iswap]) buf = x[firstrecv[iswap]];
else buf = NULL;
MPI_Irecv(buf,size_forward_recv[iswap],MPI_DOUBLE,
recvproc[iswap],0,world,&request);
}
n = avec->pack_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
if (n) MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (size_forward_recv[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
} else if (ghost_velocity) {
if (size_forward_recv[iswap])
MPI_Irecv(buf_recv,size_forward_recv[iswap],MPI_DOUBLE,
recvproc[iswap],0,world,&request);
n = avec->pack_comm_vel(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
if (n) MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (size_forward_recv[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
avec->unpack_comm_vel(recvnum[iswap],firstrecv[iswap],buf_recv);
} else {
if (size_forward_recv[iswap])
MPI_Irecv(buf_recv,size_forward_recv[iswap],MPI_DOUBLE,
recvproc[iswap],0,world,&request);
n = avec->pack_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
if (n) MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (size_forward_recv[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
avec->unpack_comm(recvnum[iswap],firstrecv[iswap],buf_recv);
}
} else {
if (comm_x_only) {
if (sendnum[iswap])
avec->pack_comm(sendnum[iswap],sendlist[iswap],
x[firstrecv[iswap]],pbc_flag[iswap],pbc[iswap]);
} else if (ghost_velocity) {
avec->pack_comm_vel(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
avec->unpack_comm_vel(recvnum[iswap],firstrecv[iswap],buf_send);
} else {
avec->pack_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
avec->unpack_comm(recvnum[iswap],firstrecv[iswap],buf_send);
}
}
}
}
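/* ----------------------------------------------------------------------
   illustrative sketch, not part of the original source: every swap above
   follows the same pattern -- post a non-blocking receive first, then do a
   blocking send, then wait -- so paired processors cannot deadlock even when
   both send at once.  a minimal stand-alone version with hypothetical names
   (counts in doubles) looks like:
------------------------------------------------------------------------- */
static void example_swap(double *sendbuf, int nsend, int sendproc,
                         double *recvbuf, int nrecv, int recvproc,
                         MPI_Comm comm)
{
  MPI_Request request;
  // post the receive before the blocking send so paired procs cannot deadlock
  if (nrecv) MPI_Irecv(recvbuf,nrecv,MPI_DOUBLE,recvproc,0,comm,&request);
  if (nsend) MPI_Send(sendbuf,nsend,MPI_DOUBLE,sendproc,0,comm);
  if (nrecv) MPI_Wait(&request,MPI_STATUS_IGNORE);
}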
/* ----------------------------------------------------------------------
reverse communication of forces on atoms every timestep
other per-atom attributes may also be sent via pack/unpack routines
------------------------------------------------------------------------- */
void CommBrick::reverse_comm()
{
int n;
MPI_Request request;
AtomVec *avec = atom->avec;
double **f = atom->f;
double *buf;
// exchange data with another proc
// if other proc is self, just copy
// if comm_f_only set, exchange or copy directly from f, don't pack
for (int iswap = nswap-1; iswap >= 0; iswap--) {
if (sendproc[iswap] != me) {
if (comm_f_only) {
if (size_reverse_recv[iswap])
MPI_Irecv(buf_recv,size_reverse_recv[iswap],MPI_DOUBLE,
sendproc[iswap],0,world,&request);
if (size_reverse_send[iswap]) {
if (size_reverse_send[iswap]) buf = f[firstrecv[iswap]];
else buf = NULL;
MPI_Send(buf,size_reverse_send[iswap],MPI_DOUBLE,
recvproc[iswap],0,world);
}
if (size_reverse_recv[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
} else {
if (size_reverse_recv[iswap])
MPI_Irecv(buf_recv,size_reverse_recv[iswap],MPI_DOUBLE,
sendproc[iswap],0,world,&request);
n = avec->pack_reverse(recvnum[iswap],firstrecv[iswap],buf_send);
if (n) MPI_Send(buf_send,n,MPI_DOUBLE,recvproc[iswap],0,world);
if (size_reverse_recv[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
}
avec->unpack_reverse(sendnum[iswap],sendlist[iswap],buf_recv);
} else {
if (comm_f_only) {
if (sendnum[iswap])
avec->unpack_reverse(sendnum[iswap],sendlist[iswap],
f[firstrecv[iswap]]);
} else {
avec->pack_reverse(recvnum[iswap],firstrecv[iswap],buf_send);
avec->unpack_reverse(sendnum[iswap],sendlist[iswap],buf_send);
}
}
}
}
/* ----------------------------------------------------------------------
exchange: move atoms to correct processors
atoms exchanged with all 6 stencil neighbors
send out atoms that have left my box, receive ones entering my box
atoms will be lost if not inside a stencil proc's box
can happen if an atom moves outside a non-periodic boundary
or if an atom moves more than one proc away
this routine called before every reneighboring
for triclinic, atoms must be in lamda coords (0-1) before exchange is called
------------------------------------------------------------------------- */
void CommBrick::exchange()
{
int i,m,nsend,nrecv,nrecv1,nrecv2,nlocal;
double lo,hi,value;
double **x;
double *sublo,*subhi;
MPI_Request request;
AtomVec *avec = atom->avec;
// clear global->local map for owned and ghost atoms
// b/c atoms migrate to new procs in exchange() and
// new ghosts are created in borders()
// map_set() is done at end of borders()
// clear ghost count and any ghost bonus data internal to AtomVec
if (map_style) atom->map_clear();
atom->nghost = 0;
atom->avec->clear_bonus();
// insure send buf is large enough for single atom
// bufextra = max size of one atom = allowed overflow of sendbuf
// fixes can change per-atom size requirement on-the-fly
int bufextra_old = bufextra;
maxexchange = maxexchange_atom + maxexchange_fix;
bufextra = maxexchange + BUFEXTRA;
if (bufextra > bufextra_old)
memory->grow(buf_send,maxsend+bufextra,"comm:buf_send");
// subbox bounds for orthogonal or triclinic
if (triclinic == 0) {
sublo = domain->sublo;
subhi = domain->subhi;
} else {
sublo = domain->sublo_lamda;
subhi = domain->subhi_lamda;
}
// loop over dimensions
int dimension = domain->dimension;
for (int dim = 0; dim < dimension; dim++) {
// fill buffer with atoms leaving my box, using < and >=
// when atom is deleted, fill it in with last atom
x = atom->x;
lo = sublo[dim];
hi = subhi[dim];
nlocal = atom->nlocal;
i = nsend = 0;
while (i < nlocal) {
if (x[i][dim] < lo || x[i][dim] >= hi) {
if (nsend > maxsend) grow_send(nsend,1);
nsend += avec->pack_exchange(i,&buf_send[nsend]);
avec->copy(nlocal-1,i,1);
nlocal--;
} else i++;
}
atom->nlocal = nlocal;
// send/recv atoms in both directions
// send size of message first so receiver can realloc buf_recv if needed
// if 1 proc in dimension, no send/recv
// set nrecv = 0 so buf_send atoms will be lost
// if 2 procs in dimension, single send/recv
// if more than 2 procs in dimension, send/recv to both neighbors
if (procgrid[dim] == 1) nrecv = 0;
else {
MPI_Sendrecv(&nsend,1,MPI_INT,procneigh[dim][0],0,
&nrecv1,1,MPI_INT,procneigh[dim][1],0,world,
MPI_STATUS_IGNORE);
nrecv = nrecv1;
if (procgrid[dim] > 2) {
MPI_Sendrecv(&nsend,1,MPI_INT,procneigh[dim][1],0,
&nrecv2,1,MPI_INT,procneigh[dim][0],0,world,
MPI_STATUS_IGNORE);
nrecv += nrecv2;
}
if (nrecv > maxrecv) grow_recv(nrecv);
MPI_Irecv(buf_recv,nrecv1,MPI_DOUBLE,procneigh[dim][1],0,
world,&request);
MPI_Send(buf_send,nsend,MPI_DOUBLE,procneigh[dim][0],0,world);
MPI_Wait(&request,MPI_STATUS_IGNORE);
if (procgrid[dim] > 2) {
MPI_Irecv(&buf_recv[nrecv1],nrecv2,MPI_DOUBLE,procneigh[dim][0],0,
world,&request);
MPI_Send(buf_send,nsend,MPI_DOUBLE,procneigh[dim][1],0,world);
MPI_Wait(&request,MPI_STATUS_IGNORE);
}
}
// check incoming atoms to see if they are in my box
// if so, add to my list
// box check is only for this dimension,
// atom may be passed to another proc in later dims
m = 0;
while (m < nrecv) {
value = buf_recv[m+dim+1];
if (value >= lo && value < hi) m += avec->unpack_exchange(&buf_recv[m]);
else m += static_cast<int> (buf_recv[m]);
}
}
if (atom->firstgroupname) atom->first_reorder();
}
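/* ----------------------------------------------------------------------
   illustrative sketch, not part of the original source: the while loop above
   compacts the local atom list by overwriting a departing atom with the last
   atom instead of shifting the whole array; the same idiom on a plain array
   (hypothetical helper) is:
------------------------------------------------------------------------- */
// remove every element outside [lo,hi) from a[0..n-1], order not preserved
static int example_compact(double *a, int n, double lo, double hi)
{
  int i = 0;
  while (i < n) {
    if (a[i] < lo || a[i] >= hi) a[i] = a[--n];   // copy last element down
    else i++;
  }
  return n;   // new length
}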
/* ----------------------------------------------------------------------
borders: list nearby atoms to send to neighboring procs at every timestep
one list is created for every swap that will be made
as list is made, actually do swaps
this does the equivalent of a forward_comm(), so there is no need to
explicitly call forward_comm() on a reneighboring timestep
this routine is called before every reneighboring
for triclinic, atoms must be in lamda coords (0-1) before borders is called
------------------------------------------------------------------------- */
void CommBrick::borders()
{
int i,n,itype,iswap,dim,ineed,twoneed;
int nsend,nrecv,sendflag,nfirst,nlast,ngroup;
double lo,hi;
int *type;
double **x;
double *buf,*mlo,*mhi;
MPI_Request request;
AtomVec *avec = atom->avec;
// do swaps over all 3 dimensions
iswap = 0;
smax = rmax = 0;
for (dim = 0; dim < 3; dim++) {
nlast = 0;
twoneed = 2*maxneed[dim];
for (ineed = 0; ineed < twoneed; ineed++) {
// find atoms within slab boundaries lo/hi using <= and >=
// check atoms between nfirst and nlast
// for first swaps in a dim, check owned and ghost
// for later swaps in a dim, only check newly arrived ghosts
// store sent atom indices in sendlist for use in future timesteps
x = atom->x;
if (mode == SINGLE) {
lo = slablo[iswap];
hi = slabhi[iswap];
} else {
type = atom->type;
mlo = multilo[iswap];
mhi = multihi[iswap];
}
if (ineed % 2 == 0) {
nfirst = nlast;
nlast = atom->nlocal + atom->nghost;
}
nsend = 0;
// sendflag = 0 if I do not send on this swap
// sendneed test indicates receiver no longer requires data
// e.g. due to non-PBC or non-uniform sub-domains
if (ineed/2 >= sendneed[dim][ineed % 2]) sendflag = 0;
else sendflag = 1;
// find send atoms according to SINGLE vs MULTI
// all atoms eligible versus only atoms in bordergroup
// can only limit loop to bordergroup for first sends (ineed < 2)
// on these sends, break loop in two: owned (in group) and ghost
if (sendflag) {
if (!bordergroup || ineed >= 2) {
if (mode == SINGLE) {
for (i = nfirst; i < nlast; i++)
if (x[i][dim] >= lo && x[i][dim] <= hi) {
if (nsend == maxsendlist[iswap]) grow_list(iswap,nsend);
sendlist[iswap][nsend++] = i;
}
} else {
for (i = nfirst; i < nlast; i++) {
itype = type[i];
if (x[i][dim] >= mlo[itype] && x[i][dim] <= mhi[itype]) {
if (nsend == maxsendlist[iswap]) grow_list(iswap,nsend);
sendlist[iswap][nsend++] = i;
}
}
}
} else {
if (mode == SINGLE) {
ngroup = atom->nfirst;
for (i = 0; i < ngroup; i++)
if (x[i][dim] >= lo && x[i][dim] <= hi) {
if (nsend == maxsendlist[iswap]) grow_list(iswap,nsend);
sendlist[iswap][nsend++] = i;
}
for (i = atom->nlocal; i < nlast; i++)
if (x[i][dim] >= lo && x[i][dim] <= hi) {
if (nsend == maxsendlist[iswap]) grow_list(iswap,nsend);
sendlist[iswap][nsend++] = i;
}
} else {
ngroup = atom->nfirst;
for (i = 0; i < ngroup; i++) {
itype = type[i];
if (x[i][dim] >= mlo[itype] && x[i][dim] <= mhi[itype]) {
if (nsend == maxsendlist[iswap]) grow_list(iswap,nsend);
sendlist[iswap][nsend++] = i;
}
}
for (i = atom->nlocal; i < nlast; i++) {
itype = type[i];
if (x[i][dim] >= mlo[itype] && x[i][dim] <= mhi[itype]) {
if (nsend == maxsendlist[iswap]) grow_list(iswap,nsend);
sendlist[iswap][nsend++] = i;
}
}
}
}
}
// pack up list of border atoms
if (nsend*size_border > maxsend) grow_send(nsend*size_border,0);
if (ghost_velocity)
n = avec->pack_border_vel(nsend,sendlist[iswap],buf_send,
pbc_flag[iswap],pbc[iswap]);
else
n = avec->pack_border(nsend,sendlist[iswap],buf_send,
pbc_flag[iswap],pbc[iswap]);
// swap atoms with other proc
// no MPI calls except SendRecv if nsend/nrecv = 0
// put incoming ghosts at end of my atom arrays
// if swapping with self, simply copy, no messages
if (sendproc[iswap] != me) {
MPI_Sendrecv(&nsend,1,MPI_INT,sendproc[iswap],0,
&nrecv,1,MPI_INT,recvproc[iswap],0,world,
MPI_STATUS_IGNORE);
if (nrecv*size_border > maxrecv) grow_recv(nrecv*size_border);
if (nrecv) MPI_Irecv(buf_recv,nrecv*size_border,MPI_DOUBLE,
recvproc[iswap],0,world,&request);
if (n) MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (nrecv) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else {
nrecv = nsend;
buf = buf_send;
}
// unpack buffer
if (ghost_velocity)
avec->unpack_border_vel(nrecv,atom->nlocal+atom->nghost,buf);
else
avec->unpack_border(nrecv,atom->nlocal+atom->nghost,buf);
// set all pointers & counters
smax = MAX(smax,nsend);
rmax = MAX(rmax,nrecv);
sendnum[iswap] = nsend;
recvnum[iswap] = nrecv;
size_forward_recv[iswap] = nrecv*size_forward;
size_reverse_send[iswap] = nrecv*size_reverse;
size_reverse_recv[iswap] = nsend*size_reverse;
firstrecv[iswap] = atom->nlocal + atom->nghost;
atom->nghost += nrecv;
iswap++;
}
}
// insure send/recv buffers are long enough for all forward & reverse comm
int max = MAX(maxforward*smax,maxreverse*rmax);
if (max > maxsend) grow_send(max,0);
max = MAX(maxforward*rmax,maxreverse*smax);
if (max > maxrecv) grow_recv(max);
// reset global->local map
if (map_style) atom->map_set();
}
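/* ----------------------------------------------------------------------
   note (added for clarity, not in the original source): ownership tests in
   exchange() use < and >= so each atom belongs to exactly one sub-domain,
   while the slab tests above use <= and >= so atoms sitting exactly on a
   cutoff plane are still sent as ghosts; smax/rmax track the largest swap,
   so the send/recv buffers grown at the end are large enough for any
   forward or reverse communication until the next reneighboring
------------------------------------------------------------------------- */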
/* ----------------------------------------------------------------------
forward communication invoked by a Pair
nsize used only to set recv buffer limit
------------------------------------------------------------------------- */
void CommBrick::forward_comm_pair(Pair *pair)
{
int iswap,n;
double *buf;
MPI_Request request;
int nsize = pair->comm_forward;
for (iswap = 0; iswap < nswap; iswap++) {
// pack buffer
n = pair->pack_forward_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (recvnum[iswap])
MPI_Irecv(buf_recv,nsize*recvnum[iswap],MPI_DOUBLE,
recvproc[iswap],0,world,&request);
if (sendnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (recvnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
pair->unpack_forward_comm(recvnum[iswap],firstrecv[iswap],buf);
}
}
/* ----------------------------------------------------------------------
reverse communication invoked by a Pair
nsize used only to set recv buffer limit
------------------------------------------------------------------------- */
void CommBrick::reverse_comm_pair(Pair *pair)
{
int iswap,n;
double *buf;
MPI_Request request;
int nsize = MAX(pair->comm_reverse,pair->comm_reverse_off);
for (iswap = nswap-1; iswap >= 0; iswap--) {
// pack buffer
n = pair->pack_reverse_comm(recvnum[iswap],firstrecv[iswap],buf_send);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (sendnum[iswap])
MPI_Irecv(buf_recv,nsize*sendnum[iswap],MPI_DOUBLE,sendproc[iswap],0,
world,&request);
if (recvnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,recvproc[iswap],0,world);
if (sendnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
pair->unpack_reverse_comm(sendnum[iswap],sendlist[iswap],buf);
}
}
/* ----------------------------------------------------------------------
forward communication invoked by a Fix
size/nsize used only to set recv buffer limit
size = 0 (default) -> use comm_forward from Fix
size > 0 -> Fix passes max size per atom
the latter is only useful if Fix does several comm modes,
some are smaller than max stored in its comm_forward
------------------------------------------------------------------------- */
void CommBrick::forward_comm_fix(Fix *fix, int size)
{
int iswap,n,nsize;
double *buf;
MPI_Request request;
if (size) nsize = size;
else nsize = fix->comm_forward;
for (iswap = 0; iswap < nswap; iswap++) {
// pack buffer
n = fix->pack_forward_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (recvnum[iswap])
MPI_Irecv(buf_recv,nsize*recvnum[iswap],MPI_DOUBLE,recvproc[iswap],0,
world,&request);
if (sendnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (recvnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
fix->unpack_forward_comm(recvnum[iswap],firstrecv[iswap],buf);
}
}
/* ----------------------------------------------------------------------
reverse communication invoked by a Fix
size/nsize used only to set recv buffer limit
size = 0 (default) -> use comm_reverse from Fix
size > 0 -> Fix passes max size per atom
the latter is only useful if Fix does several comm modes,
some are smaller than max stored in its comm_reverse
------------------------------------------------------------------------- */
void CommBrick::reverse_comm_fix(Fix *fix, int size)
{
int iswap,n,nsize;
double *buf;
MPI_Request request;
if (size) nsize = size;
else nsize = fix->comm_reverse;
for (iswap = nswap-1; iswap >= 0; iswap--) {
// pack buffer
n = fix->pack_reverse_comm(recvnum[iswap],firstrecv[iswap],buf_send);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (sendnum[iswap])
MPI_Irecv(buf_recv,nsize*sendnum[iswap],MPI_DOUBLE,sendproc[iswap],0,
world,&request);
if (recvnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,recvproc[iswap],0,world);
if (sendnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
fix->unpack_reverse_comm(sendnum[iswap],sendlist[iswap],buf);
}
}
/* ----------------------------------------------------------------------
reverse communication invoked by a Fix with variable size data
query fix for pack size to insure buf_send is big enough
handshake sizes before each Irecv/Send to insure buf_recv is big enough
------------------------------------------------------------------------- */
void CommBrick::reverse_comm_fix_variable(Fix *fix)
{
int iswap,nsend,nrecv;
double *buf;
MPI_Request request;
for (iswap = nswap-1; iswap >= 0; iswap--) {
// pack buffer
nsend = fix->pack_reverse_comm_size(recvnum[iswap],firstrecv[iswap]);
if (nsend > maxsend) grow_send(nsend,0);
nsend = fix->pack_reverse_comm(recvnum[iswap],firstrecv[iswap],buf_send);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
MPI_Sendrecv(&nsend,1,MPI_INT,recvproc[iswap],0,
&nrecv,1,MPI_INT,sendproc[iswap],0,world,
MPI_STATUS_IGNORE);
if (sendnum[iswap]) {
if (nrecv > maxrecv) grow_recv(nrecv);
MPI_Irecv(buf_recv,maxrecv,MPI_DOUBLE,sendproc[iswap],0,
world,&request);
}
if (recvnum[iswap])
MPI_Send(buf_send,nsend,MPI_DOUBLE,recvproc[iswap],0,world);
if (sendnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
fix->unpack_reverse_comm(sendnum[iswap],sendlist[iswap],buf);
}
}
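/* ----------------------------------------------------------------------
   illustrative sketch (not from the LAMMPS sources): the size handshake
   used above, reduced to a standalone two-neighbor MPI program -- counts
   are traded first with MPI_Sendrecv so the receive buffer can be sized
   before the doubles are exchanged with Irecv/Send; the ring neighbors
   and payload below are invented for the example
------------------------------------------------------------------------- */
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv)
{
  MPI_Init(&argc,&argv);
  int me,nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD,&me);
  MPI_Comm_size(MPI_COMM_WORLD,&nprocs);

  int sendto   = (me+1) % nprocs;            // neighbor I send to
  int recvfrom = (me-1+nprocs) % nprocs;     // neighbor I receive from

  std::vector<double> sendbuf(me+1,1.0*me);  // variable-size payload
  int nsend = (int) sendbuf.size(), nrecv;

  // handshake sizes, grow the recv buffer, then exchange the actual data
  MPI_Sendrecv(&nsend,1,MPI_INT,sendto,0,&nrecv,1,MPI_INT,recvfrom,0,
               MPI_COMM_WORLD,MPI_STATUS_IGNORE);
  std::vector<double> recvbuf(nrecv);

  MPI_Request request;
  MPI_Irecv(recvbuf.data(),nrecv,MPI_DOUBLE,recvfrom,0,MPI_COMM_WORLD,&request);
  MPI_Send(sendbuf.data(),nsend,MPI_DOUBLE,sendto,0,MPI_COMM_WORLD);
  MPI_Wait(&request,MPI_STATUS_IGNORE);

  printf("rank %d received %d doubles from rank %d\n",me,nrecv,recvfrom);
  MPI_Finalize();
  return 0;
}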
/* ----------------------------------------------------------------------
forward communication invoked by a Compute
nsize used only to set recv buffer limit
------------------------------------------------------------------------- */
void CommBrick::forward_comm_compute(Compute *compute)
{
int iswap,n;
double *buf;
MPI_Request request;
int nsize = compute->comm_forward;
for (iswap = 0; iswap < nswap; iswap++) {
// pack buffer
n = compute->pack_forward_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (recvnum[iswap])
MPI_Irecv(buf_recv,nsize*recvnum[iswap],MPI_DOUBLE,recvproc[iswap],0,
world,&request);
if (sendnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (recvnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
compute->unpack_forward_comm(recvnum[iswap],firstrecv[iswap],buf);
}
}
/* ----------------------------------------------------------------------
reverse communication invoked by a Compute
nsize used only to set recv buffer limit
------------------------------------------------------------------------- */
void CommBrick::reverse_comm_compute(Compute *compute)
{
int iswap,n;
double *buf;
MPI_Request request;
int nsize = compute->comm_reverse;
for (iswap = nswap-1; iswap >= 0; iswap--) {
// pack buffer
n = compute->pack_reverse_comm(recvnum[iswap],firstrecv[iswap],buf_send);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (sendnum[iswap])
MPI_Irecv(buf_recv,nsize*sendnum[iswap],MPI_DOUBLE,sendproc[iswap],0,
world,&request);
if (recvnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,recvproc[iswap],0,world);
if (sendnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
compute->unpack_reverse_comm(sendnum[iswap],sendlist[iswap],buf);
}
}
/* ----------------------------------------------------------------------
forward communication invoked by a Dump
nsize used only to set recv buffer limit
------------------------------------------------------------------------- */
void CommBrick::forward_comm_dump(Dump *dump)
{
int iswap,n;
double *buf;
MPI_Request request;
int nsize = dump->comm_forward;
for (iswap = 0; iswap < nswap; iswap++) {
// pack buffer
n = dump->pack_forward_comm(sendnum[iswap],sendlist[iswap],
buf_send,pbc_flag[iswap],pbc[iswap]);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (recvnum[iswap])
MPI_Irecv(buf_recv,nsize*recvnum[iswap],MPI_DOUBLE,recvproc[iswap],0,
world,&request);
if (sendnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,sendproc[iswap],0,world);
if (recvnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
dump->unpack_forward_comm(recvnum[iswap],firstrecv[iswap],buf);
}
}
/* ----------------------------------------------------------------------
reverse communication invoked by a Dump
nsize used only to set recv buffer limit
------------------------------------------------------------------------- */
void CommBrick::reverse_comm_dump(Dump *dump)
{
int iswap,n;
double *buf;
MPI_Request request;
int nsize = dump->comm_reverse;
for (iswap = nswap-1; iswap >= 0; iswap--) {
// pack buffer
n = dump->pack_reverse_comm(recvnum[iswap],firstrecv[iswap],buf_send);
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (sendnum[iswap])
MPI_Irecv(buf_recv,nsize*sendnum[iswap],MPI_DOUBLE,sendproc[iswap],0,
world,&request);
if (recvnum[iswap])
MPI_Send(buf_send,n,MPI_DOUBLE,recvproc[iswap],0,world);
if (sendnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
dump->unpack_reverse_comm(sendnum[iswap],sendlist[iswap],buf);
}
}
/* ----------------------------------------------------------------------
forward communication of N values in per-atom array
------------------------------------------------------------------------- */
void CommBrick::forward_comm_array(int nsize, double **array)
{
int i,j,k,m,iswap,last;
double *buf;
MPI_Request request;
// ensure send/recv bufs are big enough for nsize
// based on smax/rmax from most recent borders() invocation
if (nsize > maxforward) {
maxforward = nsize;
if (maxforward*smax > maxsend) grow_send(maxforward*smax,0);
if (maxforward*rmax > maxrecv) grow_recv(maxforward*rmax);
}
for (iswap = 0; iswap < nswap; iswap++) {
// pack buffer
m = 0;
for (i = 0; i < sendnum[iswap]; i++) {
j = sendlist[iswap][i];
for (k = 0; k < nsize; k++)
buf_send[m++] = array[j][k];
}
// exchange with another proc
// if self, set recv buffer to send buffer
if (sendproc[iswap] != me) {
if (recvnum[iswap])
MPI_Irecv(buf_recv,nsize*recvnum[iswap],MPI_DOUBLE,recvproc[iswap],0,
world,&request);
if (sendnum[iswap])
MPI_Send(buf_send,nsize*sendnum[iswap],MPI_DOUBLE,
sendproc[iswap],0,world);
if (recvnum[iswap]) MPI_Wait(&request,MPI_STATUS_IGNORE);
buf = buf_recv;
} else buf = buf_send;
// unpack buffer
m = 0;
last = firstrecv[iswap] + recvnum[iswap];
for (i = firstrecv[iswap]; i < last; i++)
for (k = 0; k < nsize; k++)
array[i][k] = buf[m++];
}
}
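/* ----------------------------------------------------------------------
   illustrative sketch (not from the LAMMPS sources): the pack/unpack
   idiom of forward_comm_array() in isolation -- selected rows of a
   per-atom array are flattened row by row into a 1d buffer, then written
   back contiguously starting at the first ghost index; the sizes and
   send list are invented for the example
------------------------------------------------------------------------- */
#include <vector>

int main()
{
  const int nsize = 3;                                  // values per atom
  std::vector<std::vector<double> > array(10, std::vector<double>(nsize,0.0));
  const int sendlist[2] = {4,7};                        // owned atoms to send
  const int firstrecv = 8, recvnum = 2;                 // ghost atoms to fill

  // pack: gather the rows named in sendlist into a flat buffer
  std::vector<double> buf;
  for (int i = 0; i < 2; i++)
    for (int k = 0; k < nsize; k++)
      buf.push_back(array[sendlist[i]][k]);

  // unpack: scatter the flat buffer into consecutive ghost rows
  int m = 0;
  for (int i = firstrecv; i < firstrecv+recvnum; i++)
    for (int k = 0; k < nsize; k++)
      array[i][k] = buf[m++];
  return 0;
}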
/* ----------------------------------------------------------------------
exchange info provided with all 6 stencil neighbors
------------------------------------------------------------------------- */
int CommBrick::exchange_variable(int n, double *inbuf, double *&outbuf)
{
int nsend,nrecv,nrecv1,nrecv2;
MPI_Request request;
nrecv = n;
if (nrecv > maxrecv) grow_recv(nrecv);
memcpy(buf_recv,inbuf,nrecv*sizeof(double));
// loop over dimensions
for (int dim = 0; dim < 3; dim++) {
// no exchange if only one proc in a dimension
if (procgrid[dim] == 1) continue;
// send/recv info in both directions using same buf_recv
// if 2 procs in dimension, single send/recv
// if more than 2 procs in dimension, send/recv to both neighbors
nsend = nrecv;
MPI_Sendrecv(&nsend,1,MPI_INT,procneigh[dim][0],0,
&nrecv1,1,MPI_INT,procneigh[dim][1],0,world,MPI_STATUS_IGNORE);
nrecv += nrecv1;
if (procgrid[dim] > 2) {
MPI_Sendrecv(&nsend,1,MPI_INT,procneigh[dim][1],0,
&nrecv2,1,MPI_INT,procneigh[dim][0],0,world,
MPI_STATUS_IGNORE);
nrecv += nrecv2;
} else nrecv2 = 0;
if (nrecv > maxrecv) grow_recv(nrecv);
MPI_Irecv(&buf_recv[nsend],nrecv1,MPI_DOUBLE,procneigh[dim][1],0,
world,&request);
MPI_Send(buf_recv,nsend,MPI_DOUBLE,procneigh[dim][0],0,world);
MPI_Wait(&request,MPI_STATUS_IGNORE);
if (procgrid[dim] > 2) {
MPI_Irecv(&buf_recv[nsend+nrecv1],nrecv2,MPI_DOUBLE,procneigh[dim][0],0,
world,&request);
MPI_Send(buf_recv,nsend,MPI_DOUBLE,procneigh[dim][1],0,world);
MPI_Wait(&request,MPI_STATUS_IGNORE);
}
}
outbuf = buf_recv;
return nrecv;
}
/* ----------------------------------------------------------------------
realloc the size of the send buffer as needed with BUFFACTOR and bufextra
if flag = 1, realloc and preserve existing contents
if flag = 0, contents need not be preserved, so just free/malloc
------------------------------------------------------------------------- */
void CommBrick::grow_send(int n, int flag)
{
maxsend = static_cast<int> (BUFFACTOR * n);
if (flag)
memory->grow(buf_send,maxsend+bufextra,"comm:buf_send");
else {
memory->destroy(buf_send);
memory->create(buf_send,maxsend+bufextra,"comm:buf_send");
}
}
/* ----------------------------------------------------------------------
free/malloc the size of the recv buffer as needed with BUFFACTOR
------------------------------------------------------------------------- */
void CommBrick::grow_recv(int n)
{
maxrecv = static_cast<int> (BUFFACTOR * n);
memory->destroy(buf_recv);
memory->create(buf_recv,maxrecv,"comm:buf_recv");
}
/* ----------------------------------------------------------------------
realloc the size of the iswap sendlist as needed with BUFFACTOR
------------------------------------------------------------------------- */
void CommBrick::grow_list(int iswap, int n)
{
maxsendlist[iswap] = static_cast<int> (BUFFACTOR * n);
memory->grow(sendlist[iswap],maxsendlist[iswap],"comm:sendlist[iswap]");
}
/* ----------------------------------------------------------------------
realloc the buffers needed for swaps
------------------------------------------------------------------------- */
void CommBrick::grow_swap(int n)
{
free_swap();
allocate_swap(n);
if (mode == MULTI) {
free_multi();
allocate_multi(n);
}
sendlist = (int **)
memory->srealloc(sendlist,n*sizeof(int *),"comm:sendlist");
memory->grow(maxsendlist,n,"comm:maxsendlist");
for (int i = maxswap; i < n; i++) {
maxsendlist[i] = BUFMIN;
memory->create(sendlist[i],BUFMIN,"comm:sendlist[i]");
}
maxswap = n;
}
/* ----------------------------------------------------------------------
allocation of swap info
------------------------------------------------------------------------- */
void CommBrick::allocate_swap(int n)
{
memory->create(sendnum,n,"comm:sendnum");
memory->create(recvnum,n,"comm:recvnum");
memory->create(sendproc,n,"comm:sendproc");
memory->create(recvproc,n,"comm:recvproc");
memory->create(size_forward_recv,n,"comm:size");
memory->create(size_reverse_send,n,"comm:size");
memory->create(size_reverse_recv,n,"comm:size");
memory->create(slablo,n,"comm:slablo");
memory->create(slabhi,n,"comm:slabhi");
memory->create(firstrecv,n,"comm:firstrecv");
memory->create(pbc_flag,n,"comm:pbc_flag");
memory->create(pbc,n,6,"comm:pbc");
}
/* ----------------------------------------------------------------------
allocation of multi-type swap info
------------------------------------------------------------------------- */
void CommBrick::allocate_multi(int n)
{
multilo = memory->create(multilo,n,atom->ntypes+1,"comm:multilo");
multihi = memory->create(multihi,n,atom->ntypes+1,"comm:multihi");
}
/* ----------------------------------------------------------------------
free memory for swaps
------------------------------------------------------------------------- */
void CommBrick::free_swap()
{
memory->destroy(sendnum);
memory->destroy(recvnum);
memory->destroy(sendproc);
memory->destroy(recvproc);
memory->destroy(size_forward_recv);
memory->destroy(size_reverse_send);
memory->destroy(size_reverse_recv);
memory->destroy(slablo);
memory->destroy(slabhi);
memory->destroy(firstrecv);
memory->destroy(pbc_flag);
memory->destroy(pbc);
}
/* ----------------------------------------------------------------------
free memory for multi-type swaps
------------------------------------------------------------------------- */
void CommBrick::free_multi()
{
memory->destroy(multilo);
memory->destroy(multihi);
multilo = multihi = NULL;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint CommBrick::memory_usage()
{
bigint bytes = 0;
bytes += nprocs * sizeof(int); // grid2proc
for (int i = 0; i < nswap; i++)
bytes += memory->usage(sendlist[i],maxsendlist[i]);
bytes += memory->usage(buf_send,maxsend+bufextra);
bytes += memory->usage(buf_recv,maxrecv);
return bytes;
}
diff --git a/src/compute_chunk_atom.cpp b/src/compute_chunk_atom.cpp
index fafcf7aee..925c5fbf8 100644
--- a/src/compute_chunk_atom.cpp
+++ b/src/compute_chunk_atom.cpp
@@ -1,2006 +1,2006 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
// NOTE: allow the bin center to be specified via variables for sphere/cylinder
#include <mpi.h>
#include <string.h>
#include <stdlib.h>
#include "compute_chunk_atom.h"
#include "atom.h"
#include "update.h"
#include "force.h"
#include "domain.h"
#include "region.h"
#include "lattice.h"
#include "modify.h"
#include "fix_store.h"
#include "comm.h"
#include "group.h"
#include "input.h"
#include "variable.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
#include <map>
using namespace LAMMPS_NS;
using namespace MathConst;
enum{BIN1D,BIN2D,BIN3D,BINSPHERE,BINCYLINDER,
TYPE,MOLECULE,COMPUTE,FIX,VARIABLE};
enum{LOWER,CENTER,UPPER,COORD};
enum{BOX,LATTICE,REDUCED};
enum{NODISCARD,MIXED,YESDISCARD};
enum{ONCE,NFREQ,EVERY}; // used in several files
enum{LIMITMAX,LIMITEXACT};
#define IDMAX 1024*1024
#define INVOKED_PERATOM 8
// allocate space for static class variable
ComputeChunkAtom *ComputeChunkAtom::cptr;
/* ---------------------------------------------------------------------- */
ComputeChunkAtom::ComputeChunkAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg),
chunk_volume_vec(NULL), coord(NULL), ichunk(NULL), chunkID(NULL),
cfvid(NULL), idregion(NULL), region(NULL), cchunk(NULL), fchunk(NULL),
varatom(NULL), id_fix(NULL), fixstore(NULL), lockfix(NULL), chunk(NULL),
exclude(NULL), hash(NULL)
{
if (narg < 4) error->all(FLERR,"Illegal compute chunk/atom command");
peratom_flag = 1;
size_peratom_cols = 0;
create_attribute = 1;
// chunk style and its args
int iarg;
binflag = 0;
ncoord = 0;
cfvid = NULL;
if (strcmp(arg[3],"bin/1d") == 0) {
binflag = 1;
which = BIN1D;
ncoord = 1;
iarg = 4;
readdim(narg,arg,iarg,0);
iarg += 3;
} else if (strcmp(arg[3],"bin/2d") == 0) {
binflag = 1;
which = BIN2D;
ncoord = 2;
iarg = 4;
readdim(narg,arg,iarg,0);
readdim(narg,arg,iarg+3,1);
iarg += 6;
} else if (strcmp(arg[3],"bin/3d") == 0) {
binflag = 1;
which = BIN3D;
ncoord = 3;
iarg = 4;
readdim(narg,arg,iarg,0);
readdim(narg,arg,iarg+3,1);
readdim(narg,arg,iarg+6,2);
iarg += 9;
} else if (strcmp(arg[3],"bin/sphere") == 0) {
binflag = 1;
which = BINSPHERE;
ncoord = 1;
iarg = 4;
if (iarg+6 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
sorigin_user[0] = force->numeric(FLERR,arg[iarg]);
sorigin_user[1] = force->numeric(FLERR,arg[iarg+1]);
sorigin_user[2] = force->numeric(FLERR,arg[iarg+2]);
sradmin_user = force->numeric(FLERR,arg[iarg+3]);
sradmax_user = force->numeric(FLERR,arg[iarg+4]);
nsbin = force->inumeric(FLERR,arg[iarg+5]);
iarg += 6;
} else if (strcmp(arg[3],"bin/cylinder") == 0) {
binflag = 1;
which = BINCYLINDER;
ncoord = 2;
iarg = 4;
readdim(narg,arg,iarg,0);
iarg += 3;
if (dim[0] == 0) {
cdim1 = 1;
cdim2 = 2;
} else if (dim[0] == 1) {
cdim1 = 0;
cdim2 = 2;
} else {
cdim1 = 0;
cdim2 = 1;
}
if (iarg+5 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
corigin_user[dim[0]] = 0.0;
corigin_user[cdim1] = force->numeric(FLERR,arg[iarg]);
corigin_user[cdim2] = force->numeric(FLERR,arg[iarg+1]);
cradmin_user = force->numeric(FLERR,arg[iarg+2]);
cradmax_user = force->numeric(FLERR,arg[iarg+3]);
ncbin = force->inumeric(FLERR,arg[iarg+4]);
iarg += 5;
} else if (strcmp(arg[3],"type") == 0) {
which = TYPE;
iarg = 4;
} else if (strcmp(arg[3],"molecule") == 0) {
which = MOLECULE;
iarg = 4;
} else if (strstr(arg[3],"c_") == arg[3] ||
strstr(arg[3],"f_") == arg[3] ||
strstr(arg[3],"v_") == arg[3]) {
if (arg[3][0] == 'c') which = COMPUTE;
else if (arg[3][0] == 'f') which = FIX;
else if (arg[3][0] == 'v') which = VARIABLE;
iarg = 4;
int n = strlen(arg[3]);
char *suffix = new char[n];
strcpy(suffix,&arg[3][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Illegal compute chunk/atom command");
argindex = atoi(ptr+1);
*ptr = '\0';
} else argindex = 0;
n = strlen(suffix) + 1;
cfvid = new char[n];
strcpy(cfvid,suffix);
delete [] suffix;
} else error->all(FLERR,"Illegal compute chunk/atom command");
// optional args
regionflag = 0;
idregion = NULL;
nchunksetflag = 0;
nchunkflag = EVERY;
limit = 0;
limitstyle = LIMITMAX;
limitfirst = 0;
idsflag = EVERY;
compress = 0;
int discardsetflag = 0;
discard = MIXED;
minflag[0] = LOWER;
minflag[1] = LOWER;
minflag[2] = LOWER;
maxflag[0] = UPPER;
maxflag[1] = UPPER;
maxflag[2] = UPPER;
scaleflag = LATTICE;
pbcflag = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"region") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
int iregion = domain->find_region(arg[iarg+1]);
if (iregion == -1)
error->all(FLERR,"Region ID for compute chunk/atom does not exist");
int n = strlen(arg[iarg+1]) + 1;
idregion = new char[n];
strcpy(idregion,arg[iarg+1]);
regionflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"nchunk") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+1],"once") == 0) nchunkflag = ONCE;
else if (strcmp(arg[iarg+1],"every") == 0) nchunkflag = EVERY;
else error->all(FLERR,"Illegal compute chunk/atom command");
nchunksetflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"limit") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
limit = force->inumeric(FLERR,arg[iarg+1]);
if (limit < 0) error->all(FLERR,"Illegal compute chunk/atom command");
if (limit && !compress) limitfirst = 1;
iarg += 2;
if (limit) {
if (iarg+1 > narg)
error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+1],"max") == 0) limitstyle = LIMITMAX;
else if (strcmp(arg[iarg+1],"exact") == 0) limitstyle = LIMITEXACT;
else error->all(FLERR,"Illegal compute chunk/atom command");
iarg++;
}
} else if (strcmp(arg[iarg],"ids") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+1],"once") == 0) idsflag = ONCE;
else if (strcmp(arg[iarg+1],"nfreq") == 0) idsflag = NFREQ;
else if (strcmp(arg[iarg+1],"every") == 0) idsflag = EVERY;
else error->all(FLERR,"Illegal compute chunk/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"compress") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
else if (strcmp(arg[iarg+1],"no") == 0) compress = 0;
else if (strcmp(arg[iarg+1],"yes") == 0) compress = 1;
else error->all(FLERR,"Illegal compute chunk/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"discard") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+1],"mixed") == 0) discard = MIXED;
else if (strcmp(arg[iarg+1],"no") == 0) discard = NODISCARD;
else if (strcmp(arg[iarg+1],"yes") == 0) discard = YESDISCARD;
else error->all(FLERR,"Illegal compute chunk/atom command");
discardsetflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"bound") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
int idim;
if (strcmp(arg[iarg+1],"x") == 0) idim = 0;
else if (strcmp(arg[iarg+1],"y") == 0) idim = 1;
else if (strcmp(arg[iarg+1],"z") == 0) idim = 2;
else error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+2],"lower") == 0) minflag[idim] = LOWER;
else minflag[idim] = COORD;
if (minflag[idim] == COORD)
minvalue[idim] = force->numeric(FLERR,arg[iarg+2]);
if (strcmp(arg[iarg+3],"upper") == 0) maxflag[idim] = UPPER;
else maxflag[idim] = COORD;
if (maxflag[idim] == COORD)
maxvalue[idim] = force->numeric(FLERR,arg[iarg+3]);
iarg += 4;
} else if (strcmp(arg[iarg],"units") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+1],"box") == 0) scaleflag = BOX;
else if (strcmp(arg[iarg+1],"lattice") == 0) scaleflag = LATTICE;
else if (strcmp(arg[iarg+1],"reduced") == 0) scaleflag = REDUCED;
else error->all(FLERR,"Illegal compute chunk/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"pbc") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg+1],"no") == 0) pbcflag = 0;
else if (strcmp(arg[iarg+1],"yes") == 0) pbcflag = 1;
else error->all(FLERR,"Illegal compute chunk/atom command");
iarg += 2;
} else error->all(FLERR,"Illegal compute chunk/atom command");
}
// set nchunkflag and discard to default values if not explicitly set
// for binning style, also check in init() if simulation box is static,
// which sets nchunkflag = ONCE
if (!nchunksetflag) {
if (binflag) {
if (scaleflag == REDUCED) nchunkflag = ONCE;
else nchunkflag = EVERY;
}
if (which == TYPE) nchunkflag = ONCE;
if (which == MOLECULE) {
if (regionflag) nchunkflag = EVERY;
else nchunkflag = ONCE;
}
if (compress) nchunkflag = EVERY;
}
if (!discardsetflag) {
if (binflag) discard = MIXED;
else discard = YESDISCARD;
}
// error checks
if (which == MOLECULE && !atom->molecule_flag)
error->all(FLERR,"Compute chunk/atom molecule for non-molecular system");
if (!binflag && discard == MIXED)
error->all(FLERR,"Compute chunk/atom without bins "
"cannot use discard mixed");
if (which == BIN1D && delta[0] <= 0.0)
error->all(FLERR,"Illegal compute chunk/atom command");
if (which == BIN2D && (delta[0] <= 0.0 || delta[1] <= 0.0))
error->all(FLERR,"Illegal compute chunk/atom command");
if (which == BIN2D && (dim[0] == dim[1]))
error->all(FLERR,"Illegal compute chunk/atom command");
if (which == BIN3D &&
(delta[0] <= 0.0 || delta[1] <= 0.0 || delta[2] <= 0.0))
error->all(FLERR,"Illegal compute chunk/atom command");
if (which == BIN3D &&
(dim[0] == dim[1] || dim[1] == dim[2] || dim[0] == dim[2]))
error->all(FLERR,"Illegal compute chunk/atom command");
if (which == BINSPHERE) {
if (domain->dimension == 2 && sorigin_user[2] != 0.0)
error->all(FLERR,"Compute chunk/atom sphere z origin must be 0.0 for 2d");
if (sradmin_user < 0.0 || sradmin_user >= sradmax_user || nsbin < 1)
error->all(FLERR,"Illegal compute chunk/atom command");
}
if (which == BINCYLINDER) {
if (delta[0] <= 0.0)
error->all(FLERR,"Illegal compute chunk/atom command");
if (domain->dimension == 2 && dim[0] != 2)
error->all(FLERR,"Compute chunk/atom cylinder axis must be z for 2d");
if (cradmin_user < 0.0 || cradmin_user >= cradmax_user || ncbin < 1)
error->all(FLERR,"Illegal compute chunk/atom command");
}
if (which == COMPUTE) {
int icompute = modify->find_compute(cfvid);
if (icompute < 0)
error->all(FLERR,"Compute ID for compute chunk /atom does not exist");
if (modify->compute[icompute]->peratom_flag == 0)
error->all(FLERR,
"Compute chunk/atom compute does not calculate "
"per-atom values");
if (argindex == 0 &&
modify->compute[icompute]->size_peratom_cols != 0)
error->all(FLERR,"Compute chunk/atom compute does not "
"calculate a per-atom vector");
if (argindex && modify->compute[icompute]->size_peratom_cols == 0)
error->all(FLERR,"Compute chunk/atom compute does not "
"calculate a per-atom array");
if (argindex &&
argindex > modify->compute[icompute]->size_peratom_cols)
error->all(FLERR,"Compute chunk/atom compute array is "
"accessed out-of-range");
}
if (which == FIX) {
int ifix = modify->find_fix(cfvid);
if (ifix < 0)
error->all(FLERR,"Fix ID for compute chunk/atom does not exist");
if (modify->fix[ifix]->peratom_flag == 0)
error->all(FLERR,"Compute chunk/atom fix does not calculate "
"per-atom values");
if (argindex == 0 && modify->fix[ifix]->size_peratom_cols != 0)
error->all(FLERR,
"Compute chunk/atom fix does not calculate a per-atom vector");
if (argindex && modify->fix[ifix]->size_peratom_cols == 0)
error->all(FLERR,
"Compute chunk/atom fix does not calculate a per-atom array");
if (argindex && argindex > modify->fix[ifix]->size_peratom_cols)
error->all(FLERR,"Compute chunk/atom fix array is accessed out-of-range");
}
if (which == VARIABLE) {
int ivariable = input->variable->find(cfvid);
if (ivariable < 0)
error->all(FLERR,"Variable name for compute chunk/atom does not exist");
if (input->variable->atomstyle(ivariable) == 0)
error->all(FLERR,"Compute chunk/atom variable is not "
"atom-style variable");
}
// setup scaling
if (binflag) {
if (domain->triclinic == 1 && scaleflag != REDUCED)
error->all(FLERR,"Compute chunk/atom for triclinic boxes "
"requires units reduced");
}
if (scaleflag == LATTICE) {
xscale = domain->lattice->xlattice;
yscale = domain->lattice->ylattice;
zscale = domain->lattice->zlattice;
} else xscale = yscale = zscale = 1.0;
// apply scaling factors and cylinder dims orthogonal to axis
if (binflag) {
double scale;
if (which == BIN1D || which == BIN2D || which == BIN3D ||
which == BINCYLINDER) {
if (which == BIN1D || which == BINCYLINDER) ndim = 1;
if (which == BIN2D) ndim = 2;
if (which == BIN3D) ndim = 3;
for (int idim = 0; idim < ndim; idim++) {
if (dim[idim] == 0) scale = xscale;
else if (dim[idim] == 1) scale = yscale;
else if (dim[idim] == 2) scale = zscale;
delta[idim] *= scale;
invdelta[idim] = 1.0/delta[idim];
if (originflag[idim] == COORD) origin[idim] *= scale;
if (minflag[idim] == COORD) minvalue[idim] *= scale;
if (maxflag[idim] == COORD) maxvalue[idim] *= scale;
}
} else if (which == BINSPHERE) {
sorigin_user[0] *= xscale;
sorigin_user[1] *= yscale;
sorigin_user[2] *= zscale;
sradmin_user *= xscale; // radii are scaled by xscale
sradmax_user *= xscale;
} else if (which == BINCYLINDER) {
if (dim[0] == 0) {
corigin_user[cdim1] *= yscale;
corigin_user[cdim2] *= zscale;
cradmin_user *= yscale; // radii are scaled by first non-axis dim
cradmax_user *= yscale;
} else if (dim[0] == 1) {
corigin_user[cdim1] *= xscale;
corigin_user[cdim2] *= zscale;
cradmin_user *= xscale;
cradmax_user *= xscale;
} else {
corigin_user[cdim1] *= xscale;
corigin_user[cdim2] *= yscale;
cradmin_user *= xscale;
cradmax_user *= xscale;
}
}
}
// initialize chunk vector and per-chunk info
nmax = 0;
chunk = NULL;
nmaxint = -1;
ichunk = NULL;
exclude = NULL;
nchunk = 0;
chunk_volume_scalar = 1.0;
chunk_volume_vec = NULL;
coord = NULL;
chunkID = NULL;
// computeflag = 1 if this compute might invoke another compute
// during assign_chunk_ids()
if (which == COMPUTE || which == FIX || which == VARIABLE) computeflag = 1;
else computeflag = 0;
// other initializations
invoked_setup = -1;
invoked_ichunk = -1;
id_fix = NULL;
fixstore = NULL;
if (compress) hash = new std::map<tagint,int>();
else hash = NULL;
maxvar = 0;
varatom = NULL;
lockcount = 0;
lockfix = NULL;
if (which == MOLECULE) molcheck = 1;
else molcheck = 0;
}
/* ---------------------------------------------------------------------- */
ComputeChunkAtom::~ComputeChunkAtom()
{
// check nfix in case all fixes have already been deleted
- if (modify->nfix) modify->delete_fix(id_fix);
+ if (id_fix && modify->nfix) modify->delete_fix(id_fix);
delete [] id_fix;
memory->destroy(chunk);
memory->destroy(ichunk);
memory->destroy(exclude);
memory->destroy(chunk_volume_vec);
memory->destroy(coord);
memory->destroy(chunkID);
delete [] idregion;
delete [] cfvid;
delete hash;
memory->destroy(varatom);
}
/* ---------------------------------------------------------------------- */
void ComputeChunkAtom::init()
{
// set and check validity of region
if (regionflag) {
int iregion = domain->find_region(idregion);
if (iregion == -1)
error->all(FLERR,"Region ID for compute chunk/atom does not exist");
region = domain->regions[iregion];
}
// set compute,fix,variable
if (which == COMPUTE) {
int icompute = modify->find_compute(cfvid);
if (icompute < 0)
error->all(FLERR,"Compute ID for compute chunk/atom does not exist");
cchunk = modify->compute[icompute];
} else if (which == FIX) {
int ifix = modify->find_fix(cfvid);
if (ifix < 0)
error->all(FLERR,"Fix ID for compute chunk/atom does not exist");
fchunk = modify->fix[ifix];
} else if (which == VARIABLE) {
int ivariable = input->variable->find(cfvid);
if (ivariable < 0)
error->all(FLERR,"Variable name for compute chunk/atom does not exist");
vchunk = ivariable;
}
// for style MOLECULE, check that no mol IDs exceed MAXSMALLINT
// don't worry about group or optional region
if (which == MOLECULE) {
tagint *molecule = atom->molecule;
int nlocal = atom->nlocal;
tagint maxone = -1;
for (int i = 0; i < nlocal; i++)
if (molecule[i] > maxone) maxone = molecule[i];
tagint maxall;
MPI_Allreduce(&maxone,&maxall,1,MPI_LMP_TAGINT,MPI_MAX,world);
if (maxall > MAXSMALLINT)
error->all(FLERR,"Molecule IDs too large for compute chunk/atom");
}
// for binning, if nchunkflag not already set, set it to ONCE or EVERY
// depends on whether simulation box size is static or dynamic
// reset invoked_setup if this is not first run and box just became static
if (binflag && !nchunksetflag && !compress && scaleflag != REDUCED) {
if (domain->box_change_size == 0) {
if (nchunkflag == EVERY && invoked_setup >= 0) invoked_setup = -1;
nchunkflag = ONCE;
} else nchunkflag = EVERY;
}
// require nchunkflag = ONCE if idsflag = ONCE
// b/c nchunk cannot change if chunk IDs are frozen
// can't check until now since nchunkflag may have been adjusted in init()
if (idsflag == ONCE && nchunkflag != ONCE)
error->all(FLERR,"Compute chunk/atom ids once but nchunk is not once");
// create/destroy fix STORE for persistent chunk IDs as needed
// need to do this if idsflag = ONCE or locks will be used by other commands
// need to wait until init() so that fix command(s) are in place
// they increment lockcount if they lock this compute
// fixstore ID = compute-ID + COMPUTE_STORE, fix group = compute group
// fixstore initializes all values to 0.0
if ((idsflag == ONCE || lockcount) && !fixstore) {
int n = strlen(id) + strlen("_COMPUTE_STORE") + 1;
id_fix = new char[n];
strcpy(id_fix,id);
strcat(id_fix,"_COMPUTE_STORE");
char **newarg = new char*[6];
newarg[0] = id_fix;
newarg[1] = group->names[igroup];
newarg[2] = (char *) "STORE";
newarg[3] = (char *) "peratom";
newarg[4] = (char *) "1";
newarg[5] = (char *) "1";
modify->add_fix(6,newarg);
fixstore = (FixStore *) modify->fix[modify->nfix-1];
delete [] newarg;
}
if ((idsflag != ONCE && !lockcount) && fixstore) {
modify->delete_fix(id_fix);
fixstore = NULL;
}
}
/* ----------------------------------------------------------------------
invoke setup_chunks and/or compute_ichunk if only done ONCE
so that nchunk and/or the chunk IDs are assigned when this compute is specified,
as opposed to the first time compute_peratom() or compute_ichunk() is called
------------------------------------------------------------------------- */
void ComputeChunkAtom::setup()
{
if (nchunkflag == ONCE) setup_chunks();
if (idsflag == ONCE) compute_ichunk();
}
/* ----------------------------------------------------------------------
only called by classes that use per-atom computes in standard way
dump, variable, thermo output, other computes, etc
not called by fix chunk or compute chunk commands
they invoke setup_chunks() and compute_ichunk() directly
------------------------------------------------------------------------- */
void ComputeChunkAtom::compute_peratom()
{
invoked_peratom = update->ntimestep;
// grow floating point chunk vector if necessary
if (atom->nmax > nmax) {
memory->destroy(chunk);
nmax = atom->nmax;
memory->create(chunk,nmax,"chunk/atom:chunk");
vector_atom = chunk;
}
setup_chunks();
compute_ichunk();
// copy integer indices into floating-point chunk vector
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) chunk[i] = ichunk[i];
}
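/* ----------------------------------------------------------------------
   illustrative sketch (not from the LAMMPS sources): the consumer-side
   pattern for a per-atom compute such as this one, mirroring the calls
   made below in assign_chunk_ids() -- invoke compute_peratom() once per
   step if it has not yet run, then read vector_atom; "cmp" is a
   hypothetical pointer to any per-atom Compute
------------------------------------------------------------------------- */
if (!(cmp->invoked_flag & INVOKED_PERATOM)) {
  cmp->compute_peratom();
  cmp->invoked_flag |= INVOKED_PERATOM;
}
double *peratom = cmp->vector_atom;   // here: chunk ID of each owned atom, as a double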
/* ----------------------------------------------------------------------
set lock, so that nchunk will not change from startstep to stopstep
called by fix for duration of time it requires lock
OK if called by multiple fix commands
error if all callers do not have same duration
last caller holds the lock, so it can also unlock
stopstep can be positive for final step of finite-size time window
or can be -1 for infinite-size time window
------------------------------------------------------------------------- */
void ComputeChunkAtom::lock(Fix *fixptr, bigint startstep, bigint stopstep)
{
if (lockfix == NULL) {
lockfix = fixptr;
lockstart = startstep;
lockstop = stopstep;
return;
}
if (startstep != lockstart || stopstep != lockstop)
error->all(FLERR,"Two fix commands using "
"same compute chunk/atom command in incompatible ways");
// set lock to last calling Fix, since it will be last to unlock()
lockfix = fixptr;
}
/* ----------------------------------------------------------------------
unset lock
can only be done by fix command that holds the lock
------------------------------------------------------------------------- */
void ComputeChunkAtom::unlock(Fix *fixptr)
{
if (fixptr != lockfix) return;
lockfix = NULL;
}
/* ----------------------------------------------------------------------
assign chunk IDs from 1 to Nchunk to every atom, or 0 if not in chunk
------------------------------------------------------------------------- */
void ComputeChunkAtom::compute_ichunk()
{
int i;
// skip if already done on this step
if (invoked_ichunk == update->ntimestep) return;
// if old IDs persist via storage in fixstore, then just retrieve them
// yes if idsflag = ONCE and this has already been done once,
// or if idsflag = NFREQ, a lock is in place, and we are on a later timestep
// else proceed to recalculate per-atom chunk assignments
int restore = 0;
if (idsflag == ONCE && invoked_ichunk >= 0) restore = 1;
if (idsflag == NFREQ && lockfix && update->ntimestep > lockstart) restore = 1;
if (restore) {
invoked_ichunk = update->ntimestep;
double *vstore = fixstore->vstore;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) ichunk[i] = static_cast<int> (vstore[i]);
return;
}
invoked_ichunk = update->ntimestep;
// assign chunk IDs to atoms
// will exclude atoms not in group or in optional region
// already invoked if this is same timestep as last setup_chunks()
if (update->ntimestep > invoked_setup) assign_chunk_ids();
// compress chunk IDs via hash of the original uncompressed IDs
// also apply discard rule except for binning styles which already did
int nlocal = atom->nlocal;
if (compress) {
if (binflag) {
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (hash->find(ichunk[i]) == hash->end()) exclude[i] = 1;
else ichunk[i] = hash->find(ichunk[i])->second;
}
} else if (discard == NODISCARD) {
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (hash->find(ichunk[i]) == hash->end()) ichunk[i] = nchunk;
else ichunk[i] = hash->find(ichunk[i])->second;
}
} else {
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (hash->find(ichunk[i]) == hash->end()) exclude[i] = 1;
else ichunk[i] = hash->find(ichunk[i])->second;
}
}
// else if no compression apply discard rule by itself
} else {
if (discard == NODISCARD) {
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (ichunk[i] < 1 || ichunk[i] > nchunk) ichunk[i] = nchunk;
}
} else {
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (ichunk[i] < 1 || ichunk[i] > nchunk) exclude[i] = 1;
}
}
}
// set ichunk = 0 for excluded atoms
// this should set any ichunk values which have not yet been set
for (i = 0; i < nlocal; i++)
if (exclude[i]) ichunk[i] = 0;
// if newly calculated IDs need to persist, store them in fixstore
// yes if idsflag = ONCE or idsflag = NFREQ and lock is in place
if (idsflag == ONCE || (idsflag == NFREQ && lockfix)) {
double *vstore = fixstore->vstore;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) vstore[i] = ichunk[i];
}
// one-time check if which = MOLECULE and
// any chunks do not contain all atoms in the molecule
if (molcheck) {
check_molecules();
molcheck = 0;
}
}
/* ----------------------------------------------------------------------
setup chunks
return nchunk = # of chunks
all atoms will be assigned a chunk ID from 1 to Nchunk, or 0
also setup any internal state needed to quickly assign atoms to chunks
called from compute_peratom() and also directly from
fix chunk and compute chunk commands
------------------------------------------------------------------------- */
int ComputeChunkAtom::setup_chunks()
{
if (invoked_setup == update->ntimestep) return nchunk;
// check if setup needs to be done
// no if lock is in place
// no if nchunkflag = ONCE, and already done once
// otherwise yes
// even if no, check if need to re-compute bin volumes
// so that fix ave/chunk can do proper density normalization
int flag = 0;
if (lockfix) flag = 1;
if (nchunkflag == ONCE && invoked_setup >= 0) flag = 1;
if (flag) {
if (binflag && scaleflag == REDUCED && domain->box_change_size)
bin_volumes();
return nchunk;
}
invoked_setup = update->ntimestep;
// assign chunk IDs to atoms
// will exclude atoms not in group or in optional region
// for binning styles, need to setup bins and their volume first
// else chunk_volume_scalar = entire box volume
// IDs are needed to scan for max ID and for compress()
if (binflag) {
if (which == BIN1D || which == BIN2D || which == BIN3D)
nchunk = setup_xyz_bins();
else if (which == BINSPHERE) nchunk = setup_sphere_bins();
else if (which == BINCYLINDER) nchunk = setup_cylinder_bins();
bin_volumes();
} else {
chunk_volume_scalar = domain->xprd * domain->yprd;
if (domain->dimension == 3) chunk_volume_scalar *= domain->zprd;
}
assign_chunk_ids();
// set nchunk for chunk styles other than binning
// for styles other than TYPE, scan for max ID
if (which == TYPE) nchunk = atom->ntypes;
else if (!binflag) {
int nlocal = atom->nlocal;
int hi = -1;
for (int i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (ichunk[i] > hi) hi = ichunk[i];
}
MPI_Allreduce(&hi,&nchunk,1,MPI_INT,MPI_MAX,world);
if (nchunk <= 0) nchunk = 1;
}
// apply limit setting as well as compression of chunks with no atoms
// if limit is set, there are 3 cases:
// no compression, limit specified before compression, or vice versa
if (limit && !binflag) {
if (!compress) {
if (limitstyle == LIMITMAX) nchunk = MIN(nchunk,limit);
else if (limitstyle == LIMITEXACT) nchunk = limit;
} else if (limitfirst) {
nchunk = MIN(nchunk,limit);
}
}
if (compress) compress_chunk_ids();
if (limit && !binflag && compress) {
if (limitstyle == LIMITMAX) nchunk = MIN(nchunk,limit);
else if (limitstyle == LIMITEXACT) nchunk = limit;
}
return nchunk;
}
/* ----------------------------------------------------------------------
assign chunk IDs for all atoms, via ichunk vector
except excluded atoms, their chunk IDs are set to 0 later
also set exclude vector to 0/1 for all atoms
excluded atoms are those not in group or in optional region
called from compute_ichunk() and setup_chunks()
------------------------------------------------------------------------- */
void ComputeChunkAtom::assign_chunk_ids()
{
int i;
// grow integer chunk index vector if necessary
if (atom->nmax > nmaxint) {
memory->destroy(ichunk);
memory->destroy(exclude);
nmaxint = atom->nmax;
memory->create(ichunk,nmaxint,"chunk/atom:ichunk");
memory->create(exclude,nmaxint,"chunk/atom:exclude");
}
// update region if necessary
if (regionflag) region->prematch();
// exclude = 1 if atom is not assigned to a chunk
// exclude atoms not in group or not in optional region
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
if (regionflag) {
for (i = 0; i < nlocal; i++) {
if (mask[i] & groupbit &&
region->match(x[i][0],x[i][1],x[i][2])) exclude[i] = 0;
else exclude[i] = 1;
}
} else {
for (i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) exclude[i] = 0;
else exclude[i] = 1;
}
}
// set ichunk to style value for included atoms
// binning styles apply discard rule, others do not yet
if (binflag) {
if (which == BIN1D) atom2bin1d();
else if (which == BIN2D) atom2bin2d();
else if (which == BIN3D) atom2bin3d();
else if (which == BINSPHERE) atom2binsphere();
else if (which == BINCYLINDER) atom2bincylinder();
} else if (which == TYPE) {
int *type = atom->type;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = type[i];
}
} else if (which == MOLECULE) {
tagint *molecule = atom->molecule;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = static_cast<int> (molecule[i]);
}
} else if (which == COMPUTE) {
if (!(cchunk->invoked_flag & INVOKED_PERATOM)) {
cchunk->compute_peratom();
cchunk->invoked_flag |= INVOKED_PERATOM;
}
if (argindex == 0) {
double *vec = cchunk->vector_atom;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = static_cast<int> (vec[i]);
}
} else {
double **array = cchunk->array_atom;
int argm1 = argindex - 1;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = static_cast<int> (array[i][argm1]);
}
}
} else if (which == FIX) {
if (update->ntimestep % fchunk->peratom_freq)
error->all(FLERR,"Fix used in compute chunk/atom not "
"computed at compatible time");
if (argindex == 0) {
double *vec = fchunk->vector_atom;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = static_cast<int> (vec[i]);
}
} else {
double **array = fchunk->array_atom;
int argm1 = argindex - 1;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = static_cast<int> (array[i][argm1]);
}
}
} else if (which == VARIABLE) {
if (atom->nmax > maxvar) {
maxvar = atom->nmax;
memory->destroy(varatom);
memory->create(varatom,maxvar,"chunk/atom:varatom");
}
input->variable->compute_atom(vchunk,igroup,varatom,1,0);
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
ichunk[i] = static_cast<int> (varatom[i]);
}
}
}
/* ----------------------------------------------------------------------
compress chunk IDs currently assigned to atoms across all processors
by removing those with no atoms assigned
current assignment excludes atoms not in group or in optional region
current Nchunk = max ID
operation:
use hash to store list of populated IDs that I own
add new IDs from the populated lists communicated by all other procs
final hash has the global list of populated IDs
reset Nchunk = length of global list
called by setup_chunks() when setting Nchunk
remapping of chunk IDs to smaller Nchunk occurs later in compute_ichunk()
------------------------------------------------------------------------- */
void ComputeChunkAtom::compress_chunk_ids()
{
hash->clear();
// put my IDs into hash
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
if (hash->find(ichunk[i]) == hash->end()) (*hash)[ichunk[i]] = 0;
}
// n = # of my populated IDs
// nall = n summed across all procs
int n = hash->size();
bigint nbone = n;
bigint nball;
MPI_Allreduce(&nbone,&nball,1,MPI_LMP_BIGINT,MPI_SUM,world);
// create my list of populated IDs
int *list = NULL;
memory->create(list,n,"chunk/atom:list");
n = 0;
std::map<tagint,int>::iterator pos;
for (pos = hash->begin(); pos != hash->end(); ++pos)
list[n++] = pos->first;
// if nall < 1M, just allgather all ID lists on every proc
// else perform ring comm
// add IDs from all procs to my hash
if (nball <= IDMAX) {
// setup for allgatherv
int nprocs = comm->nprocs;
int nall = nball;
int *recvcounts,*displs,*listall;
memory->create(recvcounts,nprocs,"chunk/atom:recvcounts");
memory->create(displs,nprocs,"chunk/atom:displs");
memory->create(listall,nall,"chunk/atom:listall");
MPI_Allgather(&n,1,MPI_INT,recvcounts,1,MPI_INT,world);
displs[0] = 0;
for (int iproc = 1; iproc < nprocs; iproc++)
displs[iproc] = displs[iproc-1] + recvcounts[iproc-1];
// allgatherv acquires list of populated IDs from all procs
MPI_Allgatherv(list,n,MPI_INT,listall,recvcounts,displs,MPI_INT,world);
// add all unique IDs in listall to my hash
for (int i = 0; i < nall; i++)
if (hash->find(listall[i]) == hash->end()) (*hash)[listall[i]] = 0;
// clean up
memory->destroy(recvcounts);
memory->destroy(displs);
memory->destroy(listall);
} else {
cptr = this;
comm->ring(n,sizeof(int),list,1,idring,NULL,0);
}
memory->destroy(list);
// nchunk = length of hash containing populated IDs from all procs
nchunk = hash->size();
// reset hash value of each original chunk ID to ordered index
// ordered index = new compressed chunk ID (1 to Nchunk)
// leverages fact that map stores keys in ascending order
// also allocate and set chunkID = list of original chunk IDs
// used by fix ave/chunk and compute property/chunk
memory->destroy(chunkID);
memory->create(chunkID,nchunk,"chunk/atom:chunkID");
n = 0;
for (pos = hash->begin(); pos != hash->end(); ++pos) {
chunkID[n] = pos->first;
(*hash)[pos->first] = ++n;
}
}
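/* ----------------------------------------------------------------------
   illustrative sketch (not from the LAMMPS sources): the compression
   trick used above in miniature -- a std::map keeps its keys sorted, so
   once every populated original ID is a key, walking the map in order
   hands out the compressed IDs 1..Nchunk; the sample IDs are invented
------------------------------------------------------------------------- */
#include <cstdio>
#include <map>

int main()
{
  int ids[] = {42,7,42,1000,7,7};              // original, gappy chunk IDs
  std::map<int,int> hash;
  for (int id : ids) hash[id] = 0;             // record populated IDs only

  int n = 0;
  for (std::map<int,int>::iterator pos = hash.begin(); pos != hash.end(); ++pos)
    pos->second = ++n;                         // ascending keys -> 1..Nchunk

  for (int id : ids)
    printf("original %d -> compressed %d\n",id,hash[id]);   // 7->1, 42->2, 1000->3
  return 0;
}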
/* ----------------------------------------------------------------------
callback from comm->ring()
cbuf = list of N chunk IDs from another proc
loop over the list, add each to my hash
hash ends up storing all unique IDs across all procs
------------------------------------------------------------------------- */
void ComputeChunkAtom::idring(int n, char *cbuf)
{
tagint *list = (tagint *) cbuf;
std::map<tagint,int> *hash = cptr->hash;
for (int i = 0; i < n; i++) (*hash)[list[i]] = 0;
}
/* ----------------------------------------------------------------------
one-time check for which = MOLECULE that
each chunk contains all atoms in the molecule
issue warning if not
note that this check is without regard to discard rule
if discard == NODISCARD, there is no easy way to check that all
atoms in an out-of-bounds molecule were added to a chunk,
some could have been excluded by group or region, others not
------------------------------------------------------------------------- */
void ComputeChunkAtom::check_molecules()
{
tagint *molecule = atom->molecule;
int nlocal = atom->nlocal;
int flag = 0;
if (!compress) {
for (int i = 0; i < nlocal; i++) {
if (molecule[i] > 0 && molecule[i] <= nchunk &&
ichunk[i] == 0) flag = 1;
}
} else {
int molid;
for (int i = 0; i < nlocal; i++) {
molid = static_cast<int> (molecule[i]);
if (hash->find(molid) != hash->end() && ichunk[i] == 0) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->warning(FLERR,
"One or more chunks do not contain all atoms in molecule");
}
/* ----------------------------------------------------------------------
setup xyz spatial bins and their extent and coordinates
return nbins = # of bins, will become # of chunks
called from setup_chunks()
------------------------------------------------------------------------- */
int ComputeChunkAtom::setup_xyz_bins()
{
int i,j,k,m,n,idim;
double lo,hi,coord1,coord2;
// lo = bin boundary immediately below boxlo or minvalue
// hi = bin boundary immediately above boxhi or maxvalue
// allocate and initialize arrays based on new bin count
double binlo[3],binhi[3];
if (scaleflag == REDUCED) {
binlo[0] = domain->boxlo_lamda[0];
binlo[1] = domain->boxlo_lamda[1];
binlo[2] = domain->boxlo_lamda[2];
binhi[0] = domain->boxhi_lamda[0];
binhi[1] = domain->boxhi_lamda[1];
binhi[2] = domain->boxhi_lamda[2];
} else {
binlo[0] = domain->boxlo[0];
binlo[1] = domain->boxlo[1];
binlo[2] = domain->boxlo[2];
binhi[0] = domain->boxhi[0];
binhi[1] = domain->boxhi[1];
binhi[2] = domain->boxhi[2];
}
if (minflag[0] == COORD) binlo[0] = minvalue[0];
if (minflag[1] == COORD) binlo[1] = minvalue[1];
if (minflag[2] == COORD) binlo[2] = minvalue[2];
if (maxflag[0] == COORD) binhi[0] = maxvalue[0];
if (maxflag[1] == COORD) binhi[1] = maxvalue[1];
if (maxflag[2] == COORD) binhi[2] = maxvalue[2];
int nbins = 1;
for (m = 0; m < ndim; m++) {
idim = dim[m];
if (originflag[m] == LOWER) origin[m] = binlo[idim];
else if (originflag[m] == UPPER) origin[m] = binhi[idim];
else if (originflag[m] == CENTER)
origin[m] = 0.5 * (binlo[idim] + binhi[idim]);
if (origin[m] < binlo[idim]) {
n = static_cast<int> ((binlo[idim] - origin[m]) * invdelta[m]);
lo = origin[m] + n*delta[m];
} else {
n = static_cast<int> ((origin[m] - binlo[idim]) * invdelta[m]);
lo = origin[m] - n*delta[m];
if (lo > binlo[idim]) lo -= delta[m];
}
if (origin[m] < binhi[idim]) {
n = static_cast<int> ((binhi[idim] - origin[m]) * invdelta[m]);
hi = origin[m] + n*delta[m];
if (hi < binhi[idim]) hi += delta[m];
} else {
n = static_cast<int> ((origin[m] - binhi[idim]) * invdelta[m]);
hi = origin[m] - n*delta[m];
}
if (lo > hi) error->all(FLERR,"Invalid bin bounds in compute chunk/atom");
offset[m] = lo;
nlayers[m] = static_cast<int> ((hi-lo) * invdelta[m] + 0.5);
nbins *= nlayers[m];
}
// allocate and set bin coordinates
memory->destroy(coord);
memory->create(coord,nbins,ndim,"chunk/atom:coord");
if (ndim == 1) {
for (i = 0; i < nlayers[0]; i++)
coord[i][0] = offset[0] + (i+0.5)*delta[0];
} else if (ndim == 2) {
m = 0;
for (i = 0; i < nlayers[0]; i++) {
coord1 = offset[0] + (i+0.5)*delta[0];
for (j = 0; j < nlayers[1]; j++) {
coord[m][0] = coord1;
coord[m][1] = offset[1] + (j+0.5)*delta[1];
m++;
}
}
} else if (ndim == 3) {
m = 0;
for (i = 0; i < nlayers[0]; i++) {
coord1 = offset[0] + (i+0.5)*delta[0];
for (j = 0; j < nlayers[1]; j++) {
coord2 = offset[1] + (j+0.5)*delta[1];
for (k = 0; k < nlayers[2]; k++) {
coord[m][0] = coord1;
coord[m][1] = coord2;
coord[m][2] = offset[2] + (k+0.5)*delta[2];
m++;
}
}
}
}
return nbins;
}
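/* ----------------------------------------------------------------------
   illustrative sketch (not from the LAMMPS sources): the 1d bin-bounds
   logic of setup_xyz_bins() in isolation, for the common case where the
   bin origin lies between binlo and binhi -- snap lo down and hi up to
   bin boundaries measured from the origin, then count layers; the
   numbers are arbitrary
------------------------------------------------------------------------- */
#include <cstdio>

int main()
{
  double binlo = -5.3, binhi = 12.1;     // box (or bound keyword) extent
  double origin = 0.0, delta = 2.0;      // bin origin and width
  double invdelta = 1.0/delta;

  // bin boundary immediately below binlo (origin >= binlo branch)
  int n = static_cast<int>((origin - binlo) * invdelta);
  double lo = origin - n*delta;
  if (lo > binlo) lo -= delta;

  // bin boundary immediately above binhi (origin < binhi branch)
  n = static_cast<int>((binhi - origin) * invdelta);
  double hi = origin + n*delta;
  if (hi < binhi) hi += delta;

  int nlayers = static_cast<int>((hi-lo) * invdelta + 0.5);
  printf("lo = %g  hi = %g  nlayers = %d\n",lo,hi,nlayers);   // -6 14 10
  return 0;
}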
/* ----------------------------------------------------------------------
setup spherical spatial bins and their single coordinate
return nsbin = # of bins, will become # of chunks
called from setup_chunks()
------------------------------------------------------------------------- */
int ComputeChunkAtom::setup_sphere_bins()
{
// convert sorigin_user to sorigin
// sorigin,srad are always in box units, for orthogonal or triclinic domains
// lamda2x works for either orthogonal or triclinic
if (scaleflag == REDUCED) {
domain->lamda2x(sorigin_user,sorigin);
sradmin = sradmin_user * (domain->boxhi[0]-domain->boxlo[0]);
sradmax = sradmax_user * (domain->boxhi[0]-domain->boxlo[0]);
} else {
sorigin[0] = sorigin_user[0];
sorigin[1] = sorigin_user[1];
sorigin[2] = sorigin_user[2];
sradmin = sradmin_user;
sradmax = sradmax_user;
}
// if pbcflag set, sradmax must be < 1/2 box in any periodic dim
// treat orthogonal and triclinic the same
// check every time bins are created
if (pbcflag) {
double *prd_half = domain->prd_half;
int *periodicity = domain->periodicity;
int flag = 0;
if (periodicity[0] && sradmax > prd_half[0]) flag = 1;
if (periodicity[1] && sradmax > prd_half[1]) flag = 1;
if (domain->dimension == 3 &&
periodicity[2] && sradmax > prd_half[2]) flag = 1;
if (flag)
error->all(FLERR,"Compute chunk/atom bin/sphere radius "
"is too large for periodic box");
}
sinvrad = nsbin / (sradmax-sradmin);
// allocate and set bin coordinates
// coord = midpt of radii for a spherical shell
memory->destroy(coord);
memory->create(coord,nsbin,1,"chunk/atom:coord");
double rlo,rhi;
for (int i = 0; i < nsbin; i++) {
rlo = sradmin + i * (sradmax-sradmin) / nsbin;
rhi = sradmin + (i+1) * (sradmax-sradmin) / nsbin;
if (i == nsbin-1) rhi = sradmax;
coord[i][0] = 0.5 * (rlo+rhi);
}
return nsbin;
}
/* ----------------------------------------------------------------------
setup cylindrical spatial bins and their single coordinate
return ncbin*ncplane = # of bins, will become # of chunks
called from setup_chunks()
------------------------------------------------------------------------- */
int ComputeChunkAtom::setup_cylinder_bins()
{
// setup bins along cylinder axis
// ncplane = # of axis bins
ncplane = setup_xyz_bins();
// convert corigin_user to corigin
// corigin is always in box units, for orthogonal or triclinic domains
// lamda2x works for either orthogonal or triclinic
if (scaleflag == REDUCED) {
domain->lamda2x(corigin_user,corigin);
cradmin = cradmin_user * (domain->boxhi[cdim1]-domain->boxlo[cdim1]);
cradmax = cradmax_user * (domain->boxhi[cdim1]-domain->boxlo[cdim1]);
} else {
corigin[cdim1] = corigin_user[cdim1];
corigin[cdim2] = corigin_user[cdim2];
cradmin = cradmin_user;
cradmax = cradmax_user;
}
// if pbcflag set, cradmax must be < 1/2 box in any periodic non-axis dim
// treat orthogonal and triclinic the same
// check every time bins are created
if (pbcflag) {
double *prd_half = domain->prd_half;
int *periodicity = domain->periodicity;
int flag = 0;
if (periodicity[cdim1] && cradmax > prd_half[cdim1]) flag = 1;
if (periodicity[cdim2] && cradmax > prd_half[cdim2]) flag = 1;
if (flag)
error->all(FLERR,"Compute chunk/atom bin/cylinder radius "
"is too large for periodic box");
}
cinvrad = ncbin / (cradmax-cradmin);
// allocate and set radial bin coordinates
// radial coord = midpt of radii for a cylindrical shell
// axiscoord = saved bin coords along the cylindrical axis
// radcoord = saved bin coords in radial direction
double **axiscoord = coord;
memory->create(coord,ncbin,1,"chunk/atom:coord");
double **radcoord = coord;
double rlo,rhi;
for (int i = 0; i < ncbin; i++) {
rlo = cradmin + i * (cradmax-cradmin) / ncbin;
rhi = cradmin + (i+1) * (cradmax-cradmin) / ncbin;
if (i == ncbin-1) rhi = cradmax;
coord[i][0] = 0.5 * (rlo+rhi);
}
// create array of combined coords for all bins
memory->create(coord,ncbin*ncplane,2,"chunk/atom:coord");
int m = 0;
for (int i = 0; i < ncbin; i++)
for (int j = 0; j < ncplane; j++) {
coord[m][0] = radcoord[i][0];
coord[m][1] = axiscoord[j][0];
m++;
}
memory->destroy(axiscoord);
memory->destroy(radcoord);
return ncbin*ncplane;
}
/* ----------------------------------------------------------------------
calculate chunk volumes = bin volumes
scalar if all bins have same volume
vector if per-bin volumes are different
------------------------------------------------------------------------- */
void ComputeChunkAtom::bin_volumes()
{
if (which == BIN1D || which == BIN2D || which == BIN3D) {
if (domain->dimension == 3)
chunk_volume_scalar = domain->xprd * domain->yprd * domain->zprd;
else chunk_volume_scalar = domain->xprd * domain->yprd;
double *prd;
if (scaleflag == REDUCED) prd = domain->prd_lamda;
else prd = domain->prd;
for (int m = 0; m < ndim; m++)
chunk_volume_scalar *= delta[m]/prd[dim[m]];
} else if (which == BINSPHERE) {
memory->destroy(chunk_volume_vec);
memory->create(chunk_volume_vec,nchunk,"chunk/atom:chunk_volume_vec");
double rlo,rhi,vollo,volhi;
for (int i = 0; i < nchunk; i++) {
rlo = sradmin + i * (sradmax-sradmin) / nsbin;
rhi = sradmin + (i+1) * (sradmax-sradmin) / nsbin;
if (i == nchunk-1) rhi = sradmax;
vollo = 4.0/3.0 * MY_PI * rlo*rlo*rlo;
volhi = 4.0/3.0 * MY_PI * rhi*rhi*rhi;
chunk_volume_vec[i] = volhi - vollo;
}
} else if (which == BINCYLINDER) {
memory->destroy(chunk_volume_vec);
memory->create(chunk_volume_vec,nchunk,"chunk/atom:chunk_volume_vec");
// slabthick = delta of bins along cylinder axis
double *prd;
if (scaleflag == REDUCED) prd = domain->prd_lamda;
else prd = domain->prd;
double slabthick = domain->prd[dim[0]] * delta[0]/prd[dim[0]];
// area lo/hi of concentric circles in radial direction
int iradbin;
double rlo,rhi,arealo,areahi;
for (int i = 0; i < nchunk; i++) {
iradbin = i / ncplane;
rlo = cradmin + iradbin * (cradmax-cradmin) / ncbin;
rhi = cradmin + (iradbin+1) * (cradmax-cradmin) / ncbin;
if (iradbin == ncbin-1) rhi = cradmax;
arealo = MY_PI * rlo*rlo;
areahi = MY_PI * rhi*rhi;
chunk_volume_vec[i] = (areahi-arealo) * slabthick;
}
}
}
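/* ----------------------------------------------------------------------
   illustrative sketch (not from the LAMMPS sources): the per-shell
   volume used for the BINSPHERE case above -- each chunk volume is the
   difference of two sphere volumes, 4/3*pi*(rhi^3 - rlo^3); the radii
   and bin count are arbitrary
------------------------------------------------------------------------- */
#include <cstdio>

int main()
{
  const double PI = 3.14159265358979323846;
  const double rmin = 1.0, rmax = 5.0;
  const int nbin = 4;
  for (int i = 0; i < nbin; i++) {
    double rlo = rmin + i     * (rmax-rmin) / nbin;
    double rhi = rmin + (i+1) * (rmax-rmin) / nbin;
    double vol = 4.0/3.0 * PI * (rhi*rhi*rhi - rlo*rlo*rlo);
    printf("shell %d: r in [%g,%g), volume %g\n",i+1,rlo,rhi,vol);
  }
  return 0;
}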
/* ----------------------------------------------------------------------
assign each atom to a 1d spatial bin (layer)
------------------------------------------------------------------------- */
void ComputeChunkAtom::atom2bin1d()
{
int i,ibin;
double *boxlo,*boxhi,*prd;
double xremap;
double **x = atom->x;
int nlocal = atom->nlocal;
int idim = dim[0];
int nlayer1m1 = nlayers[0] - 1;
int periodicity = domain->periodicity[idim];
if (periodicity) {
if (scaleflag == REDUCED) {
boxlo = domain->boxlo_lamda;
boxhi = domain->boxhi_lamda;
prd = domain->prd_lamda;
} else {
boxlo = domain->boxlo;
boxhi = domain->boxhi;
prd = domain->prd;
}
}
// remap each atom's relevant coord back into box via PBC if necessary
// if scaleflag = REDUCED, box coords -> lamda coords
// apply discard rule
if (scaleflag == REDUCED) domain->x2lamda(nlocal);
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
xremap = x[i][idim];
if (periodicity) {
if (xremap < boxlo[idim]) xremap += prd[idim];
if (xremap >= boxhi[idim]) xremap -= prd[idim];
}
ibin = static_cast<int> ((xremap - offset[0]) * invdelta[0]);
if (xremap < offset[0]) ibin--;
if (discard == MIXED) {
if (!minflag[idim]) ibin = MAX(ibin,0);
else if (ibin < 0) {
exclude[i] = 1;
continue;
}
if (!maxflag[idim]) ibin = MIN(ibin,nlayer1m1);
else if (ibin > nlayer1m1) {
exclude[i] = 1;
continue;
}
} else if (discard == NODISCARD) {
ibin = MAX(ibin,0);
ibin = MIN(ibin,nlayer1m1);
} else if (ibin < 0 || ibin > nlayer1m1) {
exclude[i] = 1;
continue;
}
ichunk[i] = ibin+1;
}
if (scaleflag == REDUCED) domain->lamda2x(nlocal);
}
/* ----------------------------------------------------------------------
assign each atom to a 2d spatial bin (pencil)
------------------------------------------------------------------------- */
void ComputeChunkAtom::atom2bin2d()
{
int i,ibin,i1bin,i2bin;
double *boxlo,*boxhi,*prd;
double xremap,yremap;
double **x = atom->x;
int nlocal = atom->nlocal;
int idim = dim[0];
int jdim = dim[1];
int nlayer1m1 = nlayers[0] - 1;
int nlayer2m1 = nlayers[1] - 1;
int *periodicity = domain->periodicity;
if (periodicity[idim] || periodicity[jdim]) {
if (scaleflag == REDUCED) {
boxlo = domain->boxlo_lamda;
boxhi = domain->boxhi_lamda;
prd = domain->prd_lamda;
} else {
boxlo = domain->boxlo;
boxhi = domain->boxhi;
prd = domain->prd;
}
}
// remap each atom's relevant coord back into box via PBC if necessary
// if scaleflag = REDUCED, box coords -> lamda coords
// apply discard rule
if (scaleflag == REDUCED) domain->x2lamda(nlocal);
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
xremap = x[i][idim];
if (periodicity[idim]) {
if (xremap < boxlo[idim]) xremap += prd[idim];
if (xremap >= boxhi[idim]) xremap -= prd[idim];
}
i1bin = static_cast<int> ((xremap - offset[0]) * invdelta[0]);
if (xremap < offset[0]) i1bin--;
if (discard == MIXED) {
if (!minflag[idim]) i1bin = MAX(i1bin,0);
else if (i1bin < 0) {
exclude[i] = 1;
continue;
}
if (!maxflag[idim]) i1bin = MIN(i1bin,nlayer1m1);
else if (i1bin > nlayer1m1) {
exclude[i] = 1;
continue;
}
} else if (discard == NODISCARD) {
i1bin = MAX(i1bin,0);
i1bin = MIN(i1bin,nlayer1m1);
} else if (i1bin < 0 || i1bin > nlayer1m1) {
exclude[i] = 1;
continue;
}
yremap = x[i][jdim];
if (periodicity[jdim]) {
if (yremap < boxlo[jdim]) yremap += prd[jdim];
if (yremap >= boxhi[jdim]) yremap -= prd[jdim];
}
i2bin = static_cast<int> ((yremap - offset[1]) * invdelta[1]);
if (yremap < offset[1]) i2bin--;
if (discard == MIXED) {
if (!minflag[jdim]) i2bin = MAX(i2bin,0);
else if (i2bin < 0) {
exclude[i] = 1;
continue;
}
if (!maxflag[jdim]) i2bin = MIN(i2bin,nlayer2m1);
else if (i2bin > nlayer2m1) {
exclude[i] = 1;
continue;
}
} else if (discard == NODISCARD) {
i2bin = MAX(i2bin,0);
i2bin = MIN(i2bin,nlayer2m1);
} else if (i2bin < 0 || i2bin > nlayer2m1) {
exclude[i] = 1;
continue;
}
ibin = i1bin*nlayers[1] + i2bin;
ichunk[i] = ibin+1;
}
if (scaleflag == REDUCED) domain->lamda2x(nlocal);
}
/* ----------------------------------------------------------------------
assign each atom to a 3d spatial bin (brick)
------------------------------------------------------------------------- */
void ComputeChunkAtom::atom2bin3d()
{
int i,ibin,i1bin,i2bin,i3bin;
double *boxlo,*boxhi,*prd;
double xremap,yremap,zremap;
double **x = atom->x;
int nlocal = atom->nlocal;
int idim = dim[0];
int jdim = dim[1];
int kdim = dim[2];
int nlayer1m1 = nlayers[0] - 1;
int nlayer2m1 = nlayers[1] - 1;
int nlayer3m1 = nlayers[2] - 1;
int *periodicity = domain->periodicity;
if (periodicity[idim] || periodicity[jdim] || periodicity[kdim]) {
if (scaleflag == REDUCED) {
boxlo = domain->boxlo_lamda;
boxhi = domain->boxhi_lamda;
prd = domain->prd_lamda;
} else {
boxlo = domain->boxlo;
boxhi = domain->boxhi;
prd = domain->prd;
}
}
// remap each atom's relevant coord back into box via PBC if necessary
// if scaleflag = REDUCED, box coords -> lamda coords
// apply discard rule
if (scaleflag == REDUCED) domain->x2lamda(nlocal);
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
xremap = x[i][idim];
if (periodicity[idim]) {
if (xremap < boxlo[idim]) xremap += prd[idim];
if (xremap >= boxhi[idim]) xremap -= prd[idim];
}
i1bin = static_cast<int> ((xremap - offset[0]) * invdelta[0]);
if (xremap < offset[0]) i1bin--;
if (discard == MIXED) {
if (!minflag[idim]) i1bin = MAX(i1bin,0);
else if (i1bin < 0) {
exclude[i] = 1;
continue;
}
if (!maxflag[idim]) i1bin = MIN(i1bin,nlayer1m1);
else if (i1bin > nlayer1m1) {
exclude[i] = 1;
continue;
}
} else if (discard == NODISCARD) {
i1bin = MAX(i1bin,0);
i1bin = MIN(i1bin,nlayer1m1);
} else if (i1bin < 0 || i1bin > nlayer1m1) {
exclude[i] = 1;
continue;
}
yremap = x[i][jdim];
if (periodicity[jdim]) {
if (yremap < boxlo[jdim]) yremap += prd[jdim];
if (yremap >= boxhi[jdim]) yremap -= prd[jdim];
}
i2bin = static_cast<int> ((yremap - offset[1]) * invdelta[1]);
if (yremap < offset[1]) i2bin--;
if (discard == MIXED) {
if (!minflag[jdim]) i2bin = MAX(i2bin,0);
else if (i2bin < 0) {
exclude[i] = 1;
continue;
}
if (!maxflag[jdim]) i2bin = MIN(i2bin,nlayer2m1);
else if (i2bin > nlayer2m1) {
exclude[i] = 1;
continue;
}
} else if (discard == NODISCARD) {
i2bin = MAX(i2bin,0);
i2bin = MIN(i2bin,nlayer2m1);
} else if (i2bin < 0 || i2bin > nlayer2m1) {
exclude[i] = 1;
continue;
}
zremap = x[i][kdim];
if (periodicity[kdim]) {
if (zremap < boxlo[kdim]) zremap += prd[kdim];
if (zremap >= boxhi[kdim]) zremap -= prd[kdim];
}
i3bin = static_cast<int> ((zremap - offset[2]) * invdelta[2]);
if (zremap < offset[2]) i3bin--;
if (discard == MIXED) {
if (!minflag[kdim]) i3bin = MAX(i3bin,0);
else if (i3bin < 0) {
exclude[i] = 1;
continue;
}
if (!maxflag[kdim]) i3bin = MIN(i3bin,nlayer3m1);
else if (i3bin > nlayer3m1) {
exclude[i] = 1;
continue;
}
} else if (discard == NODISCARD) {
i3bin = MAX(i3bin,0);
i3bin = MIN(i3bin,nlayer3m1);
} else if (i3bin < 0 || i3bin > nlayer3m1) {
exclude[i] = 1;
continue;
}
ibin = i1bin*nlayers[1]*nlayers[2] + i2bin*nlayers[2] + i3bin;
ichunk[i] = ibin+1;
}
if (scaleflag == REDUCED) domain->lamda2x(nlocal);
}
/* ----------------------------------------------------------------------
assign each atom to a spherical bin
------------------------------------------------------------------------- */
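// radial bin index: ibin = floor((r - sradmin) * sinvrad), where sinvrad is
//   presumably the inverse radial bin width nsbin/(sradmax - sradmin) set
//   elsewhere in this file, and r is the (optionally PBC-minimized) distance
//   from the sphere origin sorigin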
void ComputeChunkAtom::atom2binsphere()
{
int i,ibin;
double dx,dy,dz,r;
double xremap,yremap,zremap;
double *boxlo = domain->boxlo;
double *boxhi = domain->boxhi;
double *prd = domain->prd;
double *prd_half = domain->prd_half;
int *periodicity = domain->periodicity;
// remap each atom's relevant coords back into box via PBC if necessary
// apply discard rule based on rmin and rmax
double **x = atom->x;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
xremap = x[i][0];
if (periodicity[0]) {
if (xremap < boxlo[0]) xremap += prd[0];
if (xremap >= boxhi[0]) xremap -= prd[0];
}
yremap = x[i][1];
if (periodicity[1]) {
if (yremap < boxlo[1]) yremap += prd[1];
if (yremap >= boxhi[1]) yremap -= prd[1];
}
zremap = x[i][2];
if (periodicity[2]) {
if (zremap < boxlo[2]) zremap += prd[2];
if (zremap >= boxhi[2]) zremap -= prd[2];
}
dx = xremap - sorigin[0];
dy = yremap - sorigin[1];
dz = zremap - sorigin[2];
// if requested, apply PBC to distance from sphere center
// treat orthogonal and triclinic the same
// with dx,dy,dz = lengths independent of each other
// so do not use domain->minimum_image() which couples for triclinic
if (pbcflag) {
if (periodicity[0]) {
if (fabs(dx) > prd_half[0]) {
if (dx < 0.0) dx += prd[0];
else dx -= prd[0];
}
}
if (periodicity[1]) {
if (fabs(dy) > prd_half[1]) {
if (dy < 0.0) dy += prd[1];
else dy -= prd[1];
}
}
if (periodicity[2]) {
if (fabs(dz) > prd_half[2]) {
if (dz < 0.0) dz += prd[2];
else dz -= prd[2];
}
}
}
r = sqrt(dx*dx + dy*dy + dz*dz);
ibin = static_cast<int> ((r - sradmin) * sinvrad);
if (r < sradmin) ibin--;
if (discard == MIXED || discard == NODISCARD) {
ibin = MAX(ibin,0);
ibin = MIN(ibin,nchunk-1);
} else if (ibin < 0 || ibin >= nchunk) {
exclude[i] = 1;
continue;
}
ichunk[i] = ibin+1;
}
}
/* ----------------------------------------------------------------------
assign each atom to a cylindrical bin
------------------------------------------------------------------------- */
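// chunk IDs combine the two bin indices as ichunk = rbin*ncplane + kbin + 1,
//   i.e. the axial bin index varies fastest, matching the coord[] ordering
//   built in the cylinder setup code above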
void ComputeChunkAtom::atom2bincylinder()
{
int i,rbin,kbin;
double d1,d2,r;
double remap1,remap2;
// first use atom2bin1d() to bin all atoms along cylinder axis
atom2bin1d();
// now bin in radial direction
// kbin = bin along cylinder axis
// rbin = bin in radial direction
double *boxlo = domain->boxlo;
double *boxhi = domain->boxhi;
double *prd = domain->prd;
double *prd_half = domain->prd_half;
int *periodicity = domain->periodicity;
// remap each atom's relevant coords back into box via PBC if necessary
// apply discard rule based on rmin and rmax
double **x = atom->x;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
if (exclude[i]) continue;
kbin = ichunk[i] - 1;
remap1 = x[i][cdim1];
if (periodicity[cdim1]) {
if (remap1 < boxlo[cdim1]) remap1 += prd[cdim1];
if (remap1 >= boxhi[cdim1]) remap1 -= prd[cdim1];
}
remap2 = x[i][cdim2];
if (periodicity[cdim2]) {
if (remap2 < boxlo[cdim2]) remap2 += prd[cdim2];
if (remap2 >= boxhi[cdim2]) remap2 -= prd[cdim2];
}
d1 = remap1 - corigin[cdim1];
d2 = remap2 - corigin[cdim2];
// if requested, apply PBC to distance from cylinder axis
// treat orthogonal and triclinic the same
// with d1,d2 = lengths independent of each other
if (pbcflag) {
if (periodicity[cdim1]) {
if (fabs(d1) > prd_half[cdim1]) {
if (d1 < 0.0) d1 += prd[cdim1];
else d1 -= prd[cdim1];
}
}
if (periodicity[cdim2]) {
if (fabs(d2) > prd_half[cdim2]) {
if (d2 < 0.0) d2 += prd[cdim2];
else d2 -= prd[cdim2];
}
}
}
r = sqrt(d1*d1 + d2*d2);
rbin = static_cast<int> ((r - cradmin) * cinvrad);
if (r < cradmin) rbin--;
if (discard == MIXED || discard == NODISCARD) {
rbin = MAX(rbin,0);
rbin = MIN(rbin,ncbin-1);
} else if (rbin < 0 || rbin >= ncbin) {
exclude[i] = 1;
continue;
}
// combine axis and radial bin indices to set ichunk
ichunk[i] = rbin*ncplane + kbin + 1;
}
}
/* ----------------------------------------------------------------------
process args for one dimension of binning info
set dim, originflag, origin, delta
------------------------------------------------------------------------- */
void ComputeChunkAtom::readdim(int narg, char **arg, int iarg, int idim)
{
if (narg < iarg+3) error->all(FLERR,"Illegal compute chunk/atom command");
if (strcmp(arg[iarg],"x") == 0) dim[idim] = 0;
else if (strcmp(arg[iarg],"y") == 0) dim[idim] = 1;
else if (strcmp(arg[iarg],"z") == 0) dim[idim] = 2;
else error->all(FLERR,"Illegal compute chunk/atom command");
if (dim[idim] == 2 && domain->dimension == 2)
error->all(FLERR,"Cannot use compute chunk/atom bin z for 2d model");
if (strcmp(arg[iarg+1],"lower") == 0) originflag[idim] = LOWER;
else if (strcmp(arg[iarg+1],"center") == 0) originflag[idim] = CENTER;
else if (strcmp(arg[iarg+1],"upper") == 0) originflag[idim] = UPPER;
else originflag[idim] = COORD;
if (originflag[idim] == COORD)
origin[idim] = force->numeric(FLERR,arg[iarg+1]);
delta[idim] = force->numeric(FLERR,arg[iarg+2]);
}
/* ----------------------------------------------------------------------
initialize one atom's storage values, called when atom is created
just set chunkID to 0 for new atom
------------------------------------------------------------------------- */
void ComputeChunkAtom::set_arrays(int i)
{
if (!fixstore) return;
double *vstore = fixstore->vstore;
vstore[i] = 0.0;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays and per-chunk arrays
note: nchunk is actually 0 until first call
------------------------------------------------------------------------- */
double ComputeChunkAtom::memory_usage()
{
double bytes = 2*MAX(nmaxint,0) * sizeof(int); // ichunk,exclude
bytes += nmax * sizeof(double); // chunk
bytes += ncoord*nchunk * sizeof(double); // coord
if (compress) bytes += nchunk * sizeof(int); // chunkID
return bytes;
}
diff --git a/src/compute_coord_atom.cpp b/src/compute_coord_atom.cpp
index 21744dcc9..36f0b6350 100644
--- a/src/compute_coord_atom.cpp
+++ b/src/compute_coord_atom.cpp
@@ -1,226 +1,354 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include "compute_coord_atom.h"
+#include "compute_orientorder_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
+#define INVOKED_PERATOM 8
+
/* ---------------------------------------------------------------------- */
ComputeCoordAtom::ComputeCoordAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg),
- typelo(NULL), typehi(NULL), cvec(NULL), carray(NULL)
+ typelo(NULL), typehi(NULL), cvec(NULL), carray(NULL),
+ id_orientorder(NULL), normv(NULL)
{
- if (narg < 4) error->all(FLERR,"Illegal compute coord/atom command");
+ if (narg < 5) error->all(FLERR,"Illegal compute coord/atom command");
+
+ cstyle = NONE;
+
+ if (strcmp(arg[3],"cutoff") == 0) {
+ cstyle = CUTOFF;
+ double cutoff = force->numeric(FLERR,arg[4]);
+ cutsq = cutoff*cutoff;
+
+ ncol = narg-5 + 1;
+ int ntypes = atom->ntypes;
+ typelo = new int[ncol];
+ typehi = new int[ncol];
+
+ if (narg == 5) {
+ ncol = 1;
+ typelo[0] = 1;
+ typehi[0] = ntypes;
+ } else {
+ ncol = 0;
+ int iarg = 5;
+ while (iarg < narg) {
+ force->bounds(FLERR,arg[iarg],ntypes,typelo[ncol],typehi[ncol]);
+ if (typelo[ncol] > typehi[ncol])
+ error->all(FLERR,"Illegal compute coord/atom command");
+ ncol++;
+ iarg++;
+ }
+ }
+
+ } else if (strcmp(arg[3],"orientorder") == 0) {
+ cstyle = ORIENT;
+ if (narg != 6) error->all(FLERR,"Illegal compute coord/atom command");
- double cutoff = force->numeric(FLERR,arg[3]);
- cutsq = cutoff*cutoff;
+ int n = strlen(arg[4]) + 1;
+ id_orientorder = new char[n];
+ strcpy(id_orientorder,arg[4]);
- ncol = narg-4 + 1;
- int ntypes = atom->ntypes;
- typelo = new int[ncol];
- typehi = new int[ncol];
+ int iorientorder = modify->find_compute(id_orientorder);
+ if (iorientorder < 0)
+ error->all(FLERR,"Could not find compute coord/atom compute ID");
+ if (strcmp(modify->compute[iorientorder]->style,"orientorder/atom") != 0)
+ error->all(FLERR,"Compute coord/atom compute ID does not compute orientorder/atom");
+
+ threshold = force->numeric(FLERR,arg[5]);
+ if (threshold <= -1.0 || threshold >= 1.0)
+ error->all(FLERR,"Compute coord/atom threshold value must lie between -1 and 1");
- if (narg == 4) {
ncol = 1;
+ typelo = new int[ncol];
+ typehi = new int[ncol];
typelo[0] = 1;
- typehi[0] = ntypes;
- } else {
- ncol = 0;
- int iarg = 4;
- while (iarg < narg) {
- force->bounds(FLERR,arg[iarg],ntypes,typelo[ncol],typehi[ncol]);
- if (typelo[ncol] > typehi[ncol])
- error->all(FLERR,"Illegal compute coord/atom command");
- ncol++;
- iarg++;
- }
- }
+ typehi[0] = atom->ntypes;
+
+ } else error->all(FLERR,"Invalid cstyle in compute coord/atom");
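+ // two argument forms are parsed above:
+ //   coord/atom cutoff rc [type-range ...]  -> classic coordination count
+ //   coord/atom orientorder ID threshold    -> count neighbors whose normalized
+ //     ql components (from compute orientorder/atom ID) correlate above threshold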
peratom_flag = 1;
if (ncol == 1) size_peratom_cols = 0;
else size_peratom_cols = ncol;
nmax = 0;
}
/* ---------------------------------------------------------------------- */
ComputeCoordAtom::~ComputeCoordAtom()
{
delete [] typelo;
delete [] typehi;
memory->destroy(cvec);
memory->destroy(carray);
+ delete [] id_orientorder;
}
/* ---------------------------------------------------------------------- */
void ComputeCoordAtom::init()
{
+ if (cstyle == ORIENT) {
+ int iorientorder = modify->find_compute(id_orientorder);
+ c_orientorder = (ComputeOrientOrderAtom*)(modify->compute[iorientorder]);
+ cutsq = c_orientorder->cutsq;
+ l = c_orientorder->qlcomp;
+ // communicate real and imaginary 2*l+1 components of the normalized vector
+ comm_forward = 2*(2*l+1);
+ if (c_orientorder->iqlcomp < 0)
+ error->all(FLERR,"Compute coord/atom requires components "
+ "option in compute orientorder/atom be defined");
+ }
+
if (force->pair == NULL)
error->all(FLERR,"Compute coord/atom requires a pair style be defined");
if (sqrt(cutsq) > force->pair->cutforce)
error->all(FLERR,
"Compute coord/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"coord/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute coord/atom");
}
/* ---------------------------------------------------------------------- */
void ComputeCoordAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeCoordAtom::compute_peratom()
{
int i,j,m,ii,jj,inum,jnum,jtype,n;
double xtmp,ytmp,ztmp,delx,dely,delz,rsq;
int *ilist,*jlist,*numneigh,**firstneigh;
double *count;
invoked_peratom = update->ntimestep;
+// printf("Number of degrees %i components degree %i",nqlist,l);
+// printf("Particle \t %i \t Norm \t %g \n",0,norm[0][0]);
+
// grow coordination array if necessary
if (atom->nmax > nmax) {
if (ncol == 1) {
memory->destroy(cvec);
nmax = atom->nmax;
memory->create(cvec,nmax,"coord/atom:cvec");
vector_atom = cvec;
} else {
memory->destroy(carray);
nmax = atom->nmax;
memory->create(carray,nmax,ncol,"coord/atom:carray");
array_atom = carray;
}
}
+ if (cstyle == ORIENT) {
+ if (!(c_orientorder->invoked_flag & INVOKED_PERATOM)) {
+ c_orientorder->compute_peratom();
+ c_orientorder->invoked_flag |= INVOKED_PERATOM;
+ }
+ nqlist = c_orientorder->nqlist;
+ int ltmp = l;
+// l = c_orientorder->qlcomp;
+ if (ltmp != l) error->all(FLERR,"Debug error, ltmp != l\n");
+ normv = c_orientorder->array_atom;
+ comm->forward_comm_compute(this);
+ }
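+ // normv = per-atom array of the orientorder compute: the first nqlist
+ // columns are the Ql values, followed by 2*(2l+1) real/imag components of
+ // the normalized ql vector (e.g. 26 extra doubles for l = 6)
+ // the forward comm above fills those columns for ghost atoms, so the ORIENT
+ // loop below can form, for each neighbor j,
+ //   Re( ql(i) . conj(ql(j)) ) = sum_m (Re_i*Re_j + Im_i*Im_j)
+ // and count the neighbor when this exceeds threshold, a criterion commonly
+ // used to identify solid-like (crystalline) bonds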
+
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// compute coordination number(s) for each atom in group
// use full neighbor list to count atoms less than cutoff
double **x = atom->x;
int *type = atom->type;
int *mask = atom->mask;
- if (ncol == 1) {
+ if (cstyle == CUTOFF) {
+
+ if (ncol == 1) {
+
+ for (ii = 0; ii < inum; ii++) {
+ i = ilist[ii];
+ if (mask[i] & groupbit) {
+ xtmp = x[i][0];
+ ytmp = x[i][1];
+ ztmp = x[i][2];
+ jlist = firstneigh[i];
+ jnum = numneigh[i];
+
+ n = 0;
+ for (jj = 0; jj < jnum; jj++) {
+ j = jlist[jj];
+ j &= NEIGHMASK;
+
+ jtype = type[j];
+ delx = xtmp - x[j][0];
+ dely = ytmp - x[j][1];
+ delz = ztmp - x[j][2];
+ rsq = delx*delx + dely*dely + delz*delz;
+ if (rsq < cutsq && jtype >= typelo[0] && jtype <= typehi[0])
+ n++;
+ }
+
+ cvec[i] = n;
+ } else cvec[i] = 0.0;
+ }
+
+ } else {
+ for (ii = 0; ii < inum; ii++) {
+ i = ilist[ii];
+ count = carray[i];
+ for (m = 0; m < ncol; m++) count[m] = 0.0;
+
+ if (mask[i] & groupbit) {
+ xtmp = x[i][0];
+ ytmp = x[i][1];
+ ztmp = x[i][2];
+ jlist = firstneigh[i];
+ jnum = numneigh[i];
+
+
+ for (jj = 0; jj < jnum; jj++) {
+ j = jlist[jj];
+ j &= NEIGHMASK;
+
+ jtype = type[j];
+ delx = xtmp - x[j][0];
+ dely = ytmp - x[j][1];
+ delz = ztmp - x[j][2];
+ rsq = delx*delx + dely*dely + delz*delz;
+ if (rsq < cutsq) {
+ for (m = 0; m < ncol; m++)
+ if (jtype >= typelo[m] && jtype <= typehi[m])
+ count[m] += 1.0;
+ }
+ }
+ }
+ }
+ }
+
+ } else if (cstyle == ORIENT) {
+
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (mask[i] & groupbit) {
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
jlist = firstneigh[i];
jnum = numneigh[i];
n = 0;
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
-
- jtype = type[j];
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
- if (rsq < cutsq && jtype >= typelo[0] && jtype <= typehi[0]) n++;
+ if (rsq < cutsq) {
+ double dot_product = 0.0;
+ for (int m=0; m < 2*(2*l+1); m++) {
+ dot_product += normv[i][nqlist+m]*normv[j][nqlist+m];
+ }
+ if (dot_product > threshold) n++;
+ }
}
-
cvec[i] = n;
} else cvec[i] = 0.0;
}
+ }
+}
- } else {
- for (ii = 0; ii < inum; ii++) {
- i = ilist[ii];
- count = carray[i];
- for (m = 0; m < ncol; m++) count[m] = 0.0;
+/* ---------------------------------------------------------------------- */
- if (mask[i] & groupbit) {
- xtmp = x[i][0];
- ytmp = x[i][1];
- ztmp = x[i][2];
- jlist = firstneigh[i];
- jnum = numneigh[i];
+int ComputeCoordAtom::pack_forward_comm(int n, int *list, double *buf,
+ int pbc_flag, int *pbc)
+{
+ int i,m=0,j;
+ for (i = 0; i < n; ++i) {
+ for (j = nqlist; j < nqlist + 2*(2*l+1); ++j) {
+ buf[m++] = normv[list[i]][j];
+ }
+ }
+ return m;
+}
- for (jj = 0; jj < jnum; jj++) {
- j = jlist[jj];
- j &= NEIGHMASK;
+/* ---------------------------------------------------------------------- */
- jtype = type[j];
- delx = xtmp - x[j][0];
- dely = ytmp - x[j][1];
- delz = ztmp - x[j][2];
- rsq = delx*delx + dely*dely + delz*delz;
- if (rsq < cutsq) {
- for (m = 0; m < ncol; m++)
- if (jtype >= typelo[m] && jtype <= typehi[m])
- count[m] += 1.0;
- }
- }
- }
+void ComputeCoordAtom::unpack_forward_comm(int n, int first, double *buf)
+{
+ int i,last,m=0,j;
+ last = first + n;
+ for (i = first; i < last; ++i) {
+ for (j = nqlist; j < nqlist + 2*(2*l+1); ++j) {
+ normv[i][j] = buf[m++];
}
}
+
}
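// pack/unpack above copy only columns nqlist .. nqlist+2*(2l+1)-1 of normv
// (the real/imag ql components); they are triggered by the
// comm->forward_comm_compute(this) call in compute_peratom()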
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeCoordAtom::memory_usage()
{
double bytes = ncol*nmax * sizeof(double);
return bytes;
}
diff --git a/src/compute_coord_atom.h b/src/compute_coord_atom.h
index 0ff373f13..2ad46fa85 100644
--- a/src/compute_coord_atom.h
+++ b/src/compute_coord_atom.h
@@ -1,72 +1,81 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(coord/atom,ComputeCoordAtom)
#else
#ifndef LMP_COMPUTE_COORD_ATOM_H
#define LMP_COMPUTE_COORD_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeCoordAtom : public Compute {
public:
ComputeCoordAtom(class LAMMPS *, int, char **);
~ComputeCoordAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
+ int pack_forward_comm(int, int *, double *, int, int *);
+ void unpack_forward_comm(int, int, double *);
double memory_usage();
+ enum {NONE,CUTOFF,ORIENT};
private:
int nmax,ncol;
double cutsq;
class NeighList *list;
int *typelo,*typehi;
double *cvec;
double **carray;
+
+ class ComputeOrientOrderAtom *c_orientorder;
+ char *id_orientorder;
+ double threshold;
+ double **normv;
+ int cstyle,nqlist,l;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute coord/atom requires a pair style be defined
Self-explanatory.
E: Compute coord/atom cutoff is longer than pairwise cutoff
Cannot compute coordination at distances longer than the pair cutoff,
since those atoms are not in the neighbor list.
W: More than one compute coord/atom
It is not efficient to use compute coord/atom more than once.
*/
diff --git a/src/compute_orientorder_atom.cpp b/src/compute_orientorder_atom.cpp
index 6c5a2c0c0..5f78b33b6 100644
--- a/src/compute_orientorder_atom.cpp
+++ b/src/compute_orientorder_atom.cpp
@@ -1,503 +1,541 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Aidan Thompson (SNL)
Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#include <string.h>
#include <stdlib.h>
#include "compute_orientorder_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "math_const.h"
using namespace LAMMPS_NS;
using namespace MathConst;
using namespace std;
#ifdef DBL_EPSILON
#define MY_EPSILON (10.0*DBL_EPSILON)
#else
#define MY_EPSILON (10.0*2.220446049250313e-16)
#endif
/* ---------------------------------------------------------------------- */
ComputeOrientOrderAtom::ComputeOrientOrderAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg),
- distsq(NULL), nearest(NULL), rlist(NULL), qlist(NULL), qnarray(NULL), qnm_r(NULL), qnm_i(NULL)
+ qlist(NULL), distsq(NULL), nearest(NULL), rlist(NULL),
+ qnarray(NULL), qnm_r(NULL), qnm_i(NULL)
{
if (narg < 3 ) error->all(FLERR,"Illegal compute orientorder/atom command");
// set default values for optional args
nnn = 12;
cutsq = 0.0;
+ qlcompflag = 0;
// specify which orders to request
-
+
nqlist = 5;
memory->create(qlist,nqlist,"orientorder/atom:qlist");
qlist[0] = 4;
qlist[1] = 6;
qlist[2] = 8;
qlist[3] = 10;
qlist[4] = 12;
qmax = 12;
// process optional args
int iarg = 3;
while (iarg < narg) {
if (strcmp(arg[iarg],"nnn") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal compute orientorder/atom command");
- if (strcmp(arg[iarg+1],"NULL") == 0)
- nnn = 0;
- else {
- nnn = force->numeric(FLERR,arg[iarg+1]);
- if (nnn <= 0)
- error->all(FLERR,"Illegal compute orientorder/atom command");
+ if (strcmp(arg[iarg+1],"NULL") == 0) {
+ nnn = 0;
+ } else {
+ nnn = force->numeric(FLERR,arg[iarg+1]);
+ if (nnn <= 0)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
}
iarg += 2;
} else if (strcmp(arg[iarg],"degrees") == 0) {
- if (iarg+2 > narg) error->all(FLERR,"Illegal compute orientorder/atom command");
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
nqlist = force->numeric(FLERR,arg[iarg+1]);
- if (nqlist <= 0) error->all(FLERR,"Illegal compute orientorder/atom command");
+ if (nqlist <= 0)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
memory->destroy(qlist);
memory->create(qlist,nqlist,"orientorder/atom:qlist");
iarg += 2;
if (iarg+nqlist > narg) error->all(FLERR,"Illegal compute orientorder/atom command");
qmax = 0;
for (int iw = 0; iw < nqlist; iw++) {
- qlist[iw] = force->numeric(FLERR,arg[iarg+iw]);
- if (qlist[iw] < 0)
- error->all(FLERR,"Illegal compute orientorder/atom command");
- if (qlist[iw] > qmax) qmax = qlist[iw];
+ qlist[iw] = force->numeric(FLERR,arg[iarg+iw]);
+ if (qlist[iw] < 0)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
+ if (qlist[iw] > qmax) qmax = qlist[iw];
}
iarg += nqlist;
+ if (strcmp(arg[iarg],"components") == 0) {
+ qlcompflag = 1;
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
+ qlcomp = force->numeric(FLERR,arg[iarg+1]);
+ if (qlcomp <= 0)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
+ iqlcomp = -1;
+ for (int iw = 0; iw < nqlist; iw++)
+ if (qlcomp == qlist[iw]) {
+ iqlcomp = iw;
+ break;
+ }
+ if (iqlcomp < 0)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
+ iarg += 2;
+ }
} else if (strcmp(arg[iarg],"cutoff") == 0) {
- if (iarg+2 > narg) error->all(FLERR,"Illegal compute orientorder/atom command");
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
double cutoff = force->numeric(FLERR,arg[iarg+1]);
- if (cutoff <= 0.0) error->all(FLERR,"Illegal compute orientorder/atom command");
+ if (cutoff <= 0.0)
+ error->all(FLERR,"Illegal compute orientorder/atom command");
cutsq = cutoff*cutoff;
iarg += 2;
} else error->all(FLERR,"Illegal compute orientorder/atom command");
}
- ncol = nqlist;
+ if (qlcompflag) ncol = nqlist + 2*(2*qlcomp+1);
+ else ncol = nqlist;
+
peratom_flag = 1;
size_peratom_cols = ncol;
nmax = 0;
maxneigh = 0;
}
/* ---------------------------------------------------------------------- */
ComputeOrientOrderAtom::~ComputeOrientOrderAtom()
{
memory->destroy(qnarray);
memory->destroy(distsq);
memory->destroy(rlist);
memory->destroy(nearest);
memory->destroy(qlist);
memory->destroy(qnm_r);
memory->destroy(qnm_i);
-
+
}
/* ---------------------------------------------------------------------- */
void ComputeOrientOrderAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute orientorder/atom requires a pair style be defined");
if (cutsq == 0.0) cutsq = force->pair->cutforce * force->pair->cutforce;
else if (sqrt(cutsq) > force->pair->cutforce)
error->all(FLERR,
"Compute orientorder/atom cutoff is longer than pairwise cutoff");
memory->create(qnm_r,qmax,2*qmax+1,"orientorder/atom:qnm_r");
memory->create(qnm_i,qmax,2*qmax+1,"orientorder/atom:qnm_i");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"orientorder/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute orientorder/atom");
}
/* ---------------------------------------------------------------------- */
void ComputeOrientOrderAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeOrientOrderAtom::compute_peratom()
{
int i,j,ii,jj,inum,jnum;
double xtmp,ytmp,ztmp,delx,dely,delz,rsq;
int *ilist,*jlist,*numneigh,**firstneigh;
invoked_peratom = update->ntimestep;
// grow order parameter array if necessary
if (atom->nmax > nmax) {
memory->destroy(qnarray);
nmax = atom->nmax;
memory->create(qnarray,nmax,ncol,"orientorder/atom:qnarray");
array_atom = qnarray;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// compute order parameter for each atom in group
// use full neighbor list to count atoms less than cutoff
double **x = atom->x;
int *mask = atom->mask;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
double* qn = qnarray[i];
if (mask[i] & groupbit) {
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
jlist = firstneigh[i];
jnum = numneigh[i];
-
+
// ensure distsq and nearest arrays are long enough
if (jnum > maxneigh) {
memory->destroy(distsq);
memory->destroy(rlist);
memory->destroy(nearest);
maxneigh = jnum;
memory->create(distsq,maxneigh,"orientorder/atom:distsq");
memory->create(rlist,maxneigh,3,"orientorder/atom:rlist");
memory->create(nearest,maxneigh,"orientorder/atom:nearest");
}
// loop over list of all neighbors within force cutoff
// distsq[] = distance sq to each
// rlist[] = distance vector to each
// nearest[] = atom indices of neighbors
int ncount = 0;
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cutsq) {
distsq[ncount] = rsq;
- rlist[ncount][0] = delx;
- rlist[ncount][1] = dely;
- rlist[ncount][2] = delz;
+ rlist[ncount][0] = delx;
+ rlist[ncount][1] = dely;
+ rlist[ncount][2] = delz;
nearest[ncount++] = j;
}
}
// if not nnn neighbors, order parameter = 0;
if ((ncount == 0) || (ncount < nnn)) {
- for (int iw = 0; iw < nqlist; iw++)
- qn[iw] = 0.0;
+ for (int iw = 0; iw < nqlist; iw++)
+ qn[iw] = 0.0;
continue;
}
// if nnn > 0, use only nearest nnn neighbors
if (nnn > 0) {
- select3(nnn,ncount,distsq,nearest,rlist);
- ncount = nnn;
+ select3(nnn,ncount,distsq,nearest,rlist);
+ ncount = nnn;
}
calc_boop(rlist, ncount, qn, qlist, nqlist);
}
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeOrientOrderAtom::memory_usage()
{
double bytes = ncol*nmax * sizeof(double);
- bytes += (qmax*(2*qmax+1)+maxneigh*4) * sizeof(double);
- bytes += (nqlist+maxneigh) * sizeof(int);
+ bytes += (qmax*(2*qmax+1)+maxneigh*4) * sizeof(double);
+ bytes += (nqlist+maxneigh) * sizeof(int);
return bytes;
}
/* ----------------------------------------------------------------------
select3 routine from Numerical Recipes (slightly modified)
find k smallest values in array of length n
sort auxiliary arrays at same time
------------------------------------------------------------------------- */
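// this is a partial selection (quickselect): on return arr[1..k] holds the k
// smallest values (in arbitrary order) and arr[k+1..n] the rest; iarr and arr3
// are permuted identically, so the nnn nearest neighbors end up first in rlist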
// Use no-op do while to create single statement
-#define SWAP(a,b) do { \
- tmp = a; a = b; b = tmp; \
+#define SWAP(a,b) do { \
+ tmp = a; a = b; b = tmp; \
} while(0)
-#define ISWAP(a,b) do { \
- itmp = a; a = b; b = itmp; \
+#define ISWAP(a,b) do { \
+ itmp = a; a = b; b = itmp; \
} while(0)
-#define SWAP3(a,b) do { \
- tmp = a[0]; a[0] = b[0]; b[0] = tmp; \
- tmp = a[1]; a[1] = b[1]; b[1] = tmp; \
- tmp = a[2]; a[2] = b[2]; b[2] = tmp; \
+#define SWAP3(a,b) do { \
+ tmp = a[0]; a[0] = b[0]; b[0] = tmp; \
+ tmp = a[1]; a[1] = b[1]; b[1] = tmp; \
+ tmp = a[2]; a[2] = b[2]; b[2] = tmp; \
} while(0)
/* ---------------------------------------------------------------------- */
void ComputeOrientOrderAtom::select3(int k, int n, double *arr, int *iarr, double **arr3)
{
int i,ir,j,l,mid,ia,itmp;
double a,tmp,a3[3];
arr--;
iarr--;
arr3--;
l = 1;
ir = n;
for (;;) {
if (ir <= l+1) {
if (ir == l+1 && arr[ir] < arr[l]) {
SWAP(arr[l],arr[ir]);
- ISWAP(iarr[l],iarr[ir]);
+ ISWAP(iarr[l],iarr[ir]);
SWAP3(arr3[l],arr3[ir]);
}
return;
} else {
mid=(l+ir) >> 1;
SWAP(arr[mid],arr[l+1]);
ISWAP(iarr[mid],iarr[l+1]);
SWAP3(arr3[mid],arr3[l+1]);
if (arr[l] > arr[ir]) {
SWAP(arr[l],arr[ir]);
ISWAP(iarr[l],iarr[ir]);
- SWAP3(arr3[l],arr3[ir]);
+ SWAP3(arr3[l],arr3[ir]);
}
if (arr[l+1] > arr[ir]) {
SWAP(arr[l+1],arr[ir]);
ISWAP(iarr[l+1],iarr[ir]);
- SWAP3(arr3[l+1],arr3[ir]);
+ SWAP3(arr3[l+1],arr3[ir]);
}
if (arr[l] > arr[l+1]) {
SWAP(arr[l],arr[l+1]);
ISWAP(iarr[l],iarr[l+1]);
- SWAP3(arr3[l],arr3[l+1]);
+ SWAP3(arr3[l],arr3[l+1]);
}
i = l+1;
j = ir;
a = arr[l+1];
ia = iarr[l+1];
a3[0] = arr3[l+1][0];
a3[1] = arr3[l+1][1];
a3[2] = arr3[l+1][2];
for (;;) {
do i++; while (arr[i] < a);
do j--; while (arr[j] > a);
if (j < i) break;
SWAP(arr[i],arr[j]);
ISWAP(iarr[i],iarr[j]);
- SWAP3(arr3[i],arr3[j]);
+ SWAP3(arr3[i],arr3[j]);
}
arr[l+1] = arr[j];
arr[j] = a;
iarr[l+1] = iarr[j];
iarr[j] = ia;
arr3[l+1][0] = arr3[j][0];
arr3[l+1][1] = arr3[j][1];
arr3[l+1][2] = arr3[j][2];
arr3[j][0] = a3[0];
arr3[j][1] = a3[1];
arr3[j][2] = a3[2];
if (j >= k) ir = j-1;
if (j <= k) l = i;
}
}
}
/* ----------------------------------------------------------------------
calculate the bond orientational order parameters
------------------------------------------------------------------------- */
-void ComputeOrientOrderAtom::calc_boop(double **rlist,
- int ncount, double qn[],
- int qlist[], int nqlist) {
+void ComputeOrientOrderAtom::calc_boop(double **rlist,
+ int ncount, double qn[],
+ int qlist[], int nqlist) {
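// Steinhardt bond-orientational order accumulated below:
//   qbar_lm(i) = (1/N_b) * sum_over_neighbors Y_lm(r_ij)
//   Q_l(i)     = sqrt( 4*pi/(2*l+1) * sum_m |qbar_lm(i)|^2 )
// which is what fac = sqrt(4*pi)/ncount and qn[iw] = fac*sqrt(qm_sum/(2n+1)) evaluate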
for (int iw = 0; iw < nqlist; iw++) {
int n = qlist[iw];
qn[iw] = 0.0;
for(int m = 0; m < 2*n+1; m++) {
qnm_r[iw][m] = 0.0;
qnm_i[iw][m] = 0.0;
}
}
for(int ineigh = 0; ineigh < ncount; ineigh++) {
const double * const r = rlist[ineigh];
double rmag = dist(r);
if(rmag <= MY_EPSILON) {
return;
}
double costheta = r[2] / rmag;
double expphi_r = r[0];
double expphi_i = r[1];
double rxymag = sqrt(expphi_r*expphi_r+expphi_i*expphi_i);
if(rxymag <= MY_EPSILON) {
expphi_r = 1.0;
expphi_i = 0.0;
} else {
double rxymaginv = 1.0/rxymag;
expphi_r *= rxymaginv;
expphi_i *= rxymaginv;
}
for (int iw = 0; iw < nqlist; iw++) {
int n = qlist[iw];
qnm_r[iw][n] += polar_prefactor(n, 0, costheta);
double expphim_r = expphi_r;
double expphim_i = expphi_i;
for(int m = 1; m <= +n; m++) {
- double prefactor = polar_prefactor(n, m, costheta);
- double c_r = prefactor * expphim_r;
- double c_i = prefactor * expphim_i;
- qnm_r[iw][m+n] += c_r;
- qnm_i[iw][m+n] += c_i;
- if(m & 1) {
- qnm_r[iw][-m+n] -= c_r;
- qnm_i[iw][-m+n] += c_i;
- } else {
- qnm_r[iw][-m+n] += c_r;
- qnm_i[iw][-m+n] -= c_i;
- }
- double tmp_r = expphim_r*expphi_r - expphim_i*expphi_i;
- double tmp_i = expphim_r*expphi_i + expphim_i*expphi_r;
- expphim_r = tmp_r;
- expphim_i = tmp_i;
+ double prefactor = polar_prefactor(n, m, costheta);
+ double c_r = prefactor * expphim_r;
+ double c_i = prefactor * expphim_i;
+ qnm_r[iw][m+n] += c_r;
+ qnm_i[iw][m+n] += c_i;
+ if(m & 1) {
+ qnm_r[iw][-m+n] -= c_r;
+ qnm_i[iw][-m+n] += c_i;
+ } else {
+ qnm_r[iw][-m+n] += c_r;
+ qnm_i[iw][-m+n] -= c_i;
+ }
+ double tmp_r = expphim_r*expphi_r - expphim_i*expphi_i;
+ double tmp_i = expphim_r*expphi_i + expphim_i*expphi_r;
+ expphim_r = tmp_r;
+ expphim_i = tmp_i;
}
}
}
double fac = sqrt(MY_4PI) / ncount;
+ double normfac = 0.0;
for (int iw = 0; iw < nqlist; iw++) {
int n = qlist[iw];
double qm_sum = 0.0;
for(int m = 0; m < 2*n+1; m++) {
qm_sum += qnm_r[iw][m]*qnm_r[iw][m] + qnm_i[iw][m]*qnm_i[iw][m];
// printf("Ylm^2 = %d %d %g\n",n,m,
- // qnm_r[iw][m]*qnm_r[iw][m] + qnm_i[iw][m]*qnm_i[iw][m]);
+ // qnm_r[iw][m]*qnm_r[iw][m] + qnm_i[iw][m]*qnm_i[iw][m]);
}
qn[iw] = fac * sqrt(qm_sum / (2*n+1));
+ if (qlcompflag && iqlcomp == iw) normfac = 1.0/sqrt(qm_sum);
+
+ }
+
+ // output of the complex vector
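+ // normfac = 1/sqrt(sum_m |q_lm|^2) normalizes the accumulated components, so
+ // the 2*(2*qlcomp+1) real/imag values written here form a unit-length complex
+ // vector; the orientorder mode of compute coord/atom relies on this
+ // normalization for its dot-product test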
+
+ if (qlcompflag) {
+ int j = nqlist;
+ for(int m = 0; m < 2*qlcomp+1; m++) {
+ qn[j++] = qnm_r[iqlcomp][m] * normfac;
+ qn[j++] = qnm_i[iqlcomp][m] * normfac;
+ }
}
}
/* ----------------------------------------------------------------------
calculate scalar distance
------------------------------------------------------------------------- */
double ComputeOrientOrderAtom::dist(const double r[]) {
return sqrt(r[0]*r[0] + r[1]*r[1] + r[2]*r[2]);
}
/* ----------------------------------------------------------------------
- polar prefactor for spherical harmonic Y_l^m, where
+ polar prefactor for spherical harmonic Y_l^m, where
Y_l^m (theta, phi) = prefactor(l, m, cos(theta)) * exp(i*m*phi)
------------------------------------------------------------------------- */
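// prefactor(l,m,x) = sqrt( (2l+1)/(4*pi) * (l-|m|)!/(l+|m|)! ) * P_l^|m|(x),
// with a sign flip for negative odd m (Condon-Shortley convention);
// the loop below accumulates (l+|m|)!/(l-|m|)! as the product i = l-|m|+1 .. l+|m|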
double ComputeOrientOrderAtom::
polar_prefactor(int l, int m, double costheta) {
const int mabs = abs(m);
double prefactor = 1.0;
for (int i=l-mabs+1; i < l+mabs+1; ++i)
prefactor *= static_cast<double>(i);
prefactor = sqrt(static_cast<double>(2*l+1)/(MY_4PI*prefactor))
* associated_legendre(l,mabs,costheta);
if ((m < 0) && (m % 2)) prefactor = -prefactor;
return prefactor;
}
/* ----------------------------------------------------------------------
associated legendre polynomial
------------------------------------------------------------------------- */
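// standard upward recurrences, without the (-1)^m Condon-Shortley factor
// (that sign is applied in polar_prefactor):
//   P_m^m(x)    = (2m-1)!! * (1-x^2)^(m/2)
//   (i-m) P_i^m = (2i-1)*x*P_(i-1)^m - (i+m-1)*P_(i-2)^m   for i = m+1 .. l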
double ComputeOrientOrderAtom::
associated_legendre(int l, int m, double x) {
if (l < m) return 0.0;
double p(1.0), pm1(0.0), pm2(0.0);
if (m != 0) {
const double sqx = sqrt(1.0-x*x);
for (int i=1; i < m+1; ++i)
p *= static_cast<double>(2*i-1) * sqx;
}
for (int i=m+1; i < l+1; ++i) {
pm2 = pm1;
pm1 = p;
p = (static_cast<double>(2*i-1)*x*pm1
- static_cast<double>(i+m-1)*pm2) / static_cast<double>(i-m);
}
return p;
}
diff --git a/src/compute_orientorder_atom.h b/src/compute_orientorder_atom.h
index 9c5ec14f5..81b08dbdd 100644
--- a/src/compute_orientorder_atom.h
+++ b/src/compute_orientorder_atom.h
@@ -1,84 +1,85 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(orientorder/atom,ComputeOrientOrderAtom)
#else
#ifndef LMP_COMPUTE_ORIENTORDER_ATOM_H
#define LMP_COMPUTE_ORIENTORDER_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeOrientOrderAtom : public Compute {
public:
ComputeOrientOrderAtom(class LAMMPS *, int, char **);
~ComputeOrientOrderAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
double memory_usage();
+ double cutsq;
+ int iqlcomp, qlcomp, qlcompflag;
+ int *qlist;
+ int nqlist;
private:
int nmax,maxneigh,ncol,nnn;
class NeighList *list;
double *distsq;
int *nearest;
double **rlist;
- int *qlist;
- int nqlist;
int qmax;
double **qnarray;
- double cutsq;
double **qnm_r;
double **qnm_i;
void select3(int, int, double *, int *, double **);
- void calc_boop(double **rlist, int numNeighbors,
+ void calc_boop(double **rlist, int numNeighbors,
double qn[], int nlist[], int nnlist);
double dist(const double r[]);
double polar_prefactor(int, int, double);
double associated_legendre(int, int, double);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute orientorder/atom requires a pair style be defined
Self-explanatory.
E: Compute orientorder/atom cutoff is longer than pairwise cutoff
Cannot compute order parameter beyond cutoff.
W: More than one compute orientorder/atom
It is not efficient to use compute orientorder/atom more than once.
*/
diff --git a/src/domain.cpp b/src/domain.cpp
index f627048cf..52ac9d3d1 100644
--- a/src/domain.cpp
+++ b/src/domain.cpp
@@ -1,2019 +1,2053 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <math.h>
#include "domain.h"
#include "style_region.h"
#include "atom.h"
#include "atom_vec.h"
#include "molecule.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "modify.h"
#include "fix.h"
#include "fix_deform.h"
#include "region.h"
#include "lattice.h"
#include "comm.h"
#include "output.h"
#include "thermo.h"
#include "universe.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
enum{NO_REMAP,X_REMAP,V_REMAP}; // same as fix_deform.cpp
enum{IGNORE,WARN,ERROR}; // same as thermo.cpp
enum{LAYOUT_UNIFORM,LAYOUT_NONUNIFORM,LAYOUT_TILED}; // several files
#define BIG 1.0e20
#define SMALL 1.0e-4
#define DELTAREGION 4
#define BONDSTRETCH 1.1
/* ----------------------------------------------------------------------
default is periodic
------------------------------------------------------------------------- */
Domain::Domain(LAMMPS *lmp) : Pointers(lmp)
{
box_exist = 0;
dimension = 3;
nonperiodic = 0;
xperiodic = yperiodic = zperiodic = 1;
periodicity[0] = xperiodic;
periodicity[1] = yperiodic;
periodicity[2] = zperiodic;
boundary[0][0] = boundary[0][1] = 0;
boundary[1][0] = boundary[1][1] = 0;
boundary[2][0] = boundary[2][1] = 0;
minxlo = minxhi = 0.0;
minylo = minyhi = 0.0;
minzlo = minzhi = 0.0;
triclinic = 0;
tiltsmall = 1;
boxlo[0] = boxlo[1] = boxlo[2] = -0.5;
boxhi[0] = boxhi[1] = boxhi[2] = 0.5;
xy = xz = yz = 0.0;
h[3] = h[4] = h[5] = 0.0;
h_inv[3] = h_inv[4] = h_inv[5] = 0.0;
h_rate[0] = h_rate[1] = h_rate[2] =
h_rate[3] = h_rate[4] = h_rate[5] = 0.0;
h_ratelo[0] = h_ratelo[1] = h_ratelo[2] = 0.0;
prd_lamda[0] = prd_lamda[1] = prd_lamda[2] = 1.0;
prd_half_lamda[0] = prd_half_lamda[1] = prd_half_lamda[2] = 0.5;
boxlo_lamda[0] = boxlo_lamda[1] = boxlo_lamda[2] = 0.0;
boxhi_lamda[0] = boxhi_lamda[1] = boxhi_lamda[2] = 1.0;
lattice = NULL;
char **args = new char*[2];
args[0] = (char *) "none";
args[1] = (char *) "1.0";
set_lattice(2,args);
delete [] args;
nregion = maxregion = 0;
regions = NULL;
copymode = 0;
region_map = new RegionCreatorMap();
#define REGION_CLASS
#define RegionStyle(key,Class) \
(*region_map)[#key] = &region_creator<Class>;
#include "style_region.h"
#undef RegionStyle
#undef REGION_CLASS
}
/* ---------------------------------------------------------------------- */
Domain::~Domain()
{
if (copymode) return;
delete lattice;
for (int i = 0; i < nregion; i++) delete regions[i];
memory->sfree(regions);
delete region_map;
}
/* ---------------------------------------------------------------------- */
void Domain::init()
{
// set box_change flags if box size/shape/sub-domains ever change
// due to shrink-wrapping or fixes that change box size/shape/sub-domains
box_change_size = box_change_shape = box_change_domain = 0;
if (nonperiodic == 2) box_change_size = 1;
for (int i = 0; i < modify->nfix; i++) {
if (modify->fix[i]->box_change_size) box_change_size = 1;
if (modify->fix[i]->box_change_shape) box_change_shape = 1;
if (modify->fix[i]->box_change_domain) box_change_domain = 1;
}
box_change = 0;
if (box_change_size || box_change_shape || box_change_domain) box_change = 1;
// check for fix deform
deform_flag = deform_vremap = deform_groupbit = 0;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"deform") == 0) {
deform_flag = 1;
if (((FixDeform *) modify->fix[i])->remapflag == V_REMAP) {
deform_vremap = 1;
deform_groupbit = modify->fix[i]->groupbit;
}
}
// region inits
for (int i = 0; i < nregion; i++) regions[i]->init();
}
/* ----------------------------------------------------------------------
set initial global box
assumes boxlo/hi and triclinic tilts are already set
expandflag = 1 if need to expand box in shrink-wrapped dims
not invoked by read_restart since box is already expanded
if don't prevent further expansion, restarted triclinic box
with unchanged tilt factors can become a box with atoms outside the box
------------------------------------------------------------------------- */
void Domain::set_initial_box(int expandflag)
{
// error checks for orthogonal and triclinic domains
if (boxlo[0] >= boxhi[0] || boxlo[1] >= boxhi[1] || boxlo[2] >= boxhi[2])
error->one(FLERR,"Box bounds are invalid or missing");
if (domain->dimension == 2 && (xz != 0.0 || yz != 0.0))
error->all(FLERR,"Cannot skew triclinic box in z for 2d simulation");
// error check or warning on triclinic tilt factors
if (triclinic) {
if ((fabs(xy/(boxhi[0]-boxlo[0])) > 0.5 && xperiodic) ||
(fabs(xz/(boxhi[0]-boxlo[0])) > 0.5 && xperiodic) ||
(fabs(yz/(boxhi[1]-boxlo[1])) > 0.5 && yperiodic)) {
if (tiltsmall)
error->all(FLERR,"Triclinic box skew is too large");
else if (comm->me == 0)
error->warning(FLERR,"Triclinic box skew is large");
}
}
// set small based on box size and SMALL
// this works for any unit system
small[0] = SMALL * (boxhi[0] - boxlo[0]);
small[1] = SMALL * (boxhi[1] - boxlo[1]);
small[2] = SMALL * (boxhi[2] - boxlo[2]);
// if expandflag, adjust box lo/hi for shrink-wrapped dims
if (!expandflag) return;
if (boundary[0][0] == 2) boxlo[0] -= small[0];
else if (boundary[0][0] == 3) minxlo = boxlo[0];
if (boundary[0][1] == 2) boxhi[0] += small[0];
else if (boundary[0][1] == 3) minxhi = boxhi[0];
if (boundary[1][0] == 2) boxlo[1] -= small[1];
else if (boundary[1][0] == 3) minylo = boxlo[1];
if (boundary[1][1] == 2) boxhi[1] += small[1];
else if (boundary[1][1] == 3) minyhi = boxhi[1];
if (boundary[2][0] == 2) boxlo[2] -= small[2];
else if (boundary[2][0] == 3) minzlo = boxlo[2];
if (boundary[2][1] == 2) boxhi[2] += small[2];
else if (boundary[2][1] == 3) minzhi = boxhi[2];
}
/* ----------------------------------------------------------------------
set global box params
assumes boxlo/hi and triclinic tilts are already set
------------------------------------------------------------------------- */
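// h[6] stores the upper-triangular box matrix in the order
//   (xprd, yprd, zprd, yz, xz, xy), i.e. H = [[xprd xy xz],[0 yprd yz],[0 0 zprd]];
// h_inv holds the matching entries of H^-1, used by the x2lamda()/lamda2x()
// conversions between box and reduced (lamda) coordinates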
void Domain::set_global_box()
{
prd[0] = xprd = boxhi[0] - boxlo[0];
prd[1] = yprd = boxhi[1] - boxlo[1];
prd[2] = zprd = boxhi[2] - boxlo[2];
h[0] = xprd;
h[1] = yprd;
h[2] = zprd;
h_inv[0] = 1.0/h[0];
h_inv[1] = 1.0/h[1];
h_inv[2] = 1.0/h[2];
prd_half[0] = xprd_half = 0.5*xprd;
prd_half[1] = yprd_half = 0.5*yprd;
prd_half[2] = zprd_half = 0.5*zprd;
if (triclinic) {
h[3] = yz;
h[4] = xz;
h[5] = xy;
h_inv[3] = -h[3] / (h[1]*h[2]);
h_inv[4] = (h[3]*h[5] - h[1]*h[4]) / (h[0]*h[1]*h[2]);
h_inv[5] = -h[5] / (h[0]*h[1]);
boxlo_bound[0] = MIN(boxlo[0],boxlo[0]+xy);
boxlo_bound[0] = MIN(boxlo_bound[0],boxlo_bound[0]+xz);
boxlo_bound[1] = MIN(boxlo[1],boxlo[1]+yz);
boxlo_bound[2] = boxlo[2];
boxhi_bound[0] = MAX(boxhi[0],boxhi[0]+xy);
boxhi_bound[0] = MAX(boxhi_bound[0],boxhi_bound[0]+xz);
boxhi_bound[1] = MAX(boxhi[1],boxhi[1]+yz);
boxhi_bound[2] = boxhi[2];
}
}
/* ----------------------------------------------------------------------
set lamda box params
assumes global box is defined and proc assignment has been made
uses comm->xyz_split or comm->mysplit
to define subbox boundaries in consistent manner
------------------------------------------------------------------------- */
void Domain::set_lamda_box()
{
if (comm->layout != LAYOUT_TILED) {
int *myloc = comm->myloc;
double *xsplit = comm->xsplit;
double *ysplit = comm->ysplit;
double *zsplit = comm->zsplit;
sublo_lamda[0] = xsplit[myloc[0]];
subhi_lamda[0] = xsplit[myloc[0]+1];
sublo_lamda[1] = ysplit[myloc[1]];
subhi_lamda[1] = ysplit[myloc[1]+1];
sublo_lamda[2] = zsplit[myloc[2]];
subhi_lamda[2] = zsplit[myloc[2]+1];
} else {
double (*mysplit)[2] = comm->mysplit;
sublo_lamda[0] = mysplit[0][0];
subhi_lamda[0] = mysplit[0][1];
sublo_lamda[1] = mysplit[1][0];
subhi_lamda[1] = mysplit[1][1];
sublo_lamda[2] = mysplit[2][0];
subhi_lamda[2] = mysplit[2][1];
}
}
/* ----------------------------------------------------------------------
set local subbox params for orthogonal boxes
assumes global box is defined and proc assignment has been made
uses comm->xyz_split or comm->mysplit
to define subbox boundaries in consistent manner
ensure subhi[max] = boxhi
------------------------------------------------------------------------- */
void Domain::set_local_box()
{
if (triclinic) return;
if (comm->layout != LAYOUT_TILED) {
int *myloc = comm->myloc;
int *procgrid = comm->procgrid;
double *xsplit = comm->xsplit;
double *ysplit = comm->ysplit;
double *zsplit = comm->zsplit;
sublo[0] = boxlo[0] + xprd*xsplit[myloc[0]];
if (myloc[0] < procgrid[0]-1) subhi[0] = boxlo[0] + xprd*xsplit[myloc[0]+1];
else subhi[0] = boxhi[0];
sublo[1] = boxlo[1] + yprd*ysplit[myloc[1]];
if (myloc[1] < procgrid[1]-1) subhi[1] = boxlo[1] + yprd*ysplit[myloc[1]+1];
else subhi[1] = boxhi[1];
sublo[2] = boxlo[2] + zprd*zsplit[myloc[2]];
if (myloc[2] < procgrid[2]-1) subhi[2] = boxlo[2] + zprd*zsplit[myloc[2]+1];
else subhi[2] = boxhi[2];
} else {
double (*mysplit)[2] = comm->mysplit;
sublo[0] = boxlo[0] + xprd*mysplit[0][0];
if (mysplit[0][1] < 1.0) subhi[0] = boxlo[0] + xprd*mysplit[0][1];
else subhi[0] = boxhi[0];
sublo[1] = boxlo[1] + yprd*mysplit[1][0];
if (mysplit[1][1] < 1.0) subhi[1] = boxlo[1] + yprd*mysplit[1][1];
else subhi[1] = boxhi[1];
sublo[2] = boxlo[2] + zprd*mysplit[2][0];
if (mysplit[2][1] < 1.0) subhi[2] = boxlo[2] + zprd*mysplit[2][1];
else subhi[2] = boxhi[2];
}
}
/* ----------------------------------------------------------------------
reset global & local boxes due to global box boundary changes
if shrink-wrapped, determine atom extent and reset boxlo/hi
for triclinic, atoms must be in lamda coords (0-1) before reset_box is called
------------------------------------------------------------------------- */
void Domain::reset_box()
{
// perform shrink-wrapping
// compute extent of atoms on this proc
// for triclinic, this is done in lamda space
if (nonperiodic == 2) {
double extent[3][2],all[3][2];
extent[2][0] = extent[1][0] = extent[0][0] = BIG;
extent[2][1] = extent[1][1] = extent[0][1] = -BIG;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
extent[0][0] = MIN(extent[0][0],x[i][0]);
extent[0][1] = MAX(extent[0][1],x[i][0]);
extent[1][0] = MIN(extent[1][0],x[i][1]);
extent[1][1] = MAX(extent[1][1],x[i][1]);
extent[2][0] = MIN(extent[2][0],x[i][2]);
extent[2][1] = MAX(extent[2][1],x[i][2]);
}
// compute extent across all procs
// flip sign of MIN to do it in one Allreduce MAX
extent[0][0] = -extent[0][0];
extent[1][0] = -extent[1][0];
extent[2][0] = -extent[2][0];
MPI_Allreduce(extent,all,6,MPI_DOUBLE,MPI_MAX,world);
// for triclinic, convert back to box coords before changing box
if (triclinic) lamda2x(atom->nlocal);
// in shrink-wrapped dims, set box by atom extent
// if minimum set, enforce min box size settings
// for triclinic, convert lamda extent to box coords, then set box lo/hi
// decided NOT to do the next comment - don't want to sneakily change tilt
// for triclinic, adjust tilt factors if 2nd dim is shrink-wrapped,
// so that displacement in 1st dim stays the same
if (triclinic == 0) {
if (xperiodic == 0) {
if (boundary[0][0] == 2) boxlo[0] = -all[0][0] - small[0];
else if (boundary[0][0] == 3)
boxlo[0] = MIN(-all[0][0]-small[0],minxlo);
if (boundary[0][1] == 2) boxhi[0] = all[0][1] + small[0];
else if (boundary[0][1] == 3) boxhi[0] = MAX(all[0][1]+small[0],minxhi);
if (boxlo[0] > boxhi[0]) error->all(FLERR,"Illegal simulation box");
}
if (yperiodic == 0) {
if (boundary[1][0] == 2) boxlo[1] = -all[1][0] - small[1];
else if (boundary[1][0] == 3)
boxlo[1] = MIN(-all[1][0]-small[1],minylo);
if (boundary[1][1] == 2) boxhi[1] = all[1][1] + small[1];
else if (boundary[1][1] == 3) boxhi[1] = MAX(all[1][1]+small[1],minyhi);
if (boxlo[1] > boxhi[1]) error->all(FLERR,"Illegal simulation box");
}
if (zperiodic == 0) {
if (boundary[2][0] == 2) boxlo[2] = -all[2][0] - small[2];
else if (boundary[2][0] == 3)
boxlo[2] = MIN(-all[2][0]-small[2],minzlo);
if (boundary[2][1] == 2) boxhi[2] = all[2][1] + small[2];
else if (boundary[2][1] == 3) boxhi[2] = MAX(all[2][1]+small[2],minzhi);
if (boxlo[2] > boxhi[2]) error->all(FLERR,"Illegal simulation box");
}
} else {
double lo[3],hi[3];
if (xperiodic == 0) {
lo[0] = -all[0][0]; lo[1] = 0.0; lo[2] = 0.0;
lamda2x(lo,lo);
hi[0] = all[0][1]; hi[1] = 0.0; hi[2] = 0.0;
lamda2x(hi,hi);
if (boundary[0][0] == 2) boxlo[0] = lo[0] - small[0];
else if (boundary[0][0] == 3) boxlo[0] = MIN(lo[0]-small[0],minxlo);
if (boundary[0][1] == 2) boxhi[0] = hi[0] + small[0];
else if (boundary[0][1] == 3) boxhi[0] = MAX(hi[0]+small[0],minxhi);
if (boxlo[0] > boxhi[0]) error->all(FLERR,"Illegal simulation box");
}
if (yperiodic == 0) {
lo[0] = 0.0; lo[1] = -all[1][0]; lo[2] = 0.0;
lamda2x(lo,lo);
hi[0] = 0.0; hi[1] = all[1][1]; hi[2] = 0.0;
lamda2x(hi,hi);
if (boundary[1][0] == 2) boxlo[1] = lo[1] - small[1];
else if (boundary[1][0] == 3) boxlo[1] = MIN(lo[1]-small[1],minylo);
if (boundary[1][1] == 2) boxhi[1] = hi[1] + small[1];
else if (boundary[1][1] == 3) boxhi[1] = MAX(hi[1]+small[1],minyhi);
if (boxlo[1] > boxhi[1]) error->all(FLERR,"Illegal simulation box");
//xy *= (boxhi[1]-boxlo[1]) / yprd;
}
if (zperiodic == 0) {
lo[0] = 0.0; lo[1] = 0.0; lo[2] = -all[2][0];
lamda2x(lo,lo);
hi[0] = 0.0; hi[1] = 0.0; hi[2] = all[2][1];
lamda2x(hi,hi);
if (boundary[2][0] == 2) boxlo[2] = lo[2] - small[2];
else if (boundary[2][0] == 3) boxlo[2] = MIN(lo[2]-small[2],minzlo);
if (boundary[2][1] == 2) boxhi[2] = hi[2] + small[2];
else if (boundary[2][1] == 3) boxhi[2] = MAX(hi[2]+small[2],minzhi);
if (boxlo[2] > boxhi[2]) error->all(FLERR,"Illegal simulation box");
//xz *= (boxhi[2]-boxlo[2]) / xprd;
//yz *= (boxhi[2]-boxlo[2]) / yprd;
}
}
}
// reset box whether shrink-wrapping or not
set_global_box();
set_local_box();
// if shrink-wrapped & kspace is defined (i.e. using MSM), call setup()
// also call init() (to test for compatibility) ?
if (nonperiodic == 2 && force->kspace) {
//force->kspace->init();
force->kspace->setup();
}
// if shrink-wrapped & triclinic, re-convert to lamda coords for new box
// re-invoke pbc() b/c x2lamda result can be outside [0,1] due to roundoff
if (nonperiodic == 2 && triclinic) {
x2lamda(atom->nlocal);
pbc();
}
}
/* ----------------------------------------------------------------------
enforce PBC and modify box image flags for each atom
called every reneighboring and by other commands that change atoms
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
if fix deform, remap velocity of fix group atoms by box edge velocities
for triclinic, atoms must be in lamda coords (0-1) before pbc is called
image = 10 or 20 bits for each dimension depending on sizeof(imageint)
increment/decrement in wrap-around fashion
------------------------------------------------------------------------- */
void Domain::pbc()
{
int i;
imageint idim,otherdims;
double *lo,*hi,*period;
int nlocal = atom->nlocal;
double **x = atom->x;
double **v = atom->v;
int *mask = atom->mask;
imageint *image = atom->image;
// verify owned atoms have valid numerical coords
// may not be the case if a pairwise force was computed between 2 atoms at the same location
double *coord;
int n3 = 3*nlocal;
coord = &x[0][0]; // note: x is always initialized to at least one element.
int flag = 0;
for (i = 0; i < n3; i++)
if (!ISFINITE(*coord++)) flag = 1;
if (flag) error->one(FLERR,"Non-numeric atom coords - simulation unstable");
// setup for PBC checks
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
}
// apply PBC to each owned atom
for (i = 0; i < nlocal; i++) {
if (xperiodic) {
if (x[i][0] < lo[0]) {
x[i][0] += period[0];
if (deform_vremap && mask[i] & deform_groupbit) v[i][0] += h_rate[0];
idim = image[i] & IMGMASK;
otherdims = image[i] ^ idim;
idim--;
idim &= IMGMASK;
image[i] = otherdims | idim;
}
if (x[i][0] >= hi[0]) {
x[i][0] -= period[0];
x[i][0] = MAX(x[i][0],lo[0]);
if (deform_vremap && mask[i] & deform_groupbit) v[i][0] -= h_rate[0];
idim = image[i] & IMGMASK;
otherdims = image[i] ^ idim;
idim++;
idim &= IMGMASK;
image[i] = otherdims | idim;
}
}
if (yperiodic) {
if (x[i][1] < lo[1]) {
x[i][1] += period[1];
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] += h_rate[5];
v[i][1] += h_rate[1];
}
idim = (image[i] >> IMGBITS) & IMGMASK;
otherdims = image[i] ^ (idim << IMGBITS);
idim--;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMGBITS);
}
if (x[i][1] >= hi[1]) {
x[i][1] -= period[1];
x[i][1] = MAX(x[i][1],lo[1]);
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] -= h_rate[5];
v[i][1] -= h_rate[1];
}
idim = (image[i] >> IMGBITS) & IMGMASK;
otherdims = image[i] ^ (idim << IMGBITS);
idim++;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMGBITS);
}
}
if (zperiodic) {
if (x[i][2] < lo[2]) {
x[i][2] += period[2];
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] += h_rate[4];
v[i][1] += h_rate[3];
v[i][2] += h_rate[2];
}
idim = image[i] >> IMG2BITS;
otherdims = image[i] ^ (idim << IMG2BITS);
idim--;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMG2BITS);
}
if (x[i][2] >= hi[2]) {
x[i][2] -= period[2];
x[i][2] = MAX(x[i][2],lo[2]);
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] -= h_rate[4];
v[i][1] -= h_rate[3];
v[i][2] -= h_rate[2];
}
idim = image[i] >> IMG2BITS;
otherdims = image[i] ^ (idim << IMG2BITS);
idim++;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMG2BITS);
}
}
}
}
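/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: how the 3 image
   counters manipulated in pbc() above are packed into a single imageint.
   the SK_* constants mirror the common 32-bit imageint layout from
   lmptype.h (10 bits per dimension, stored with an offset of IMGMAX);
   a 64-bit imageint uses wider per-dimension fields
------------------------------------------------------------------------- */
static void sketch_image_flag_roundtrip(imageint image)
{
  const int SK_IMGMASK = 1023;     // assumed: (1 << 10) - 1
  const int SK_IMGBITS = 10;       // assumed bits per dimension
  const int SK_IMG2BITS = 20;      // assumed: 2 * SK_IMGBITS
  const int SK_IMGMAX = 512;       // assumed: 1 << (SK_IMGBITS - 1)
  // decode the 3 signed wrap counters, as unmap() does further below
  int xbox = (image & SK_IMGMASK) - SK_IMGMAX;
  int ybox = ((image >> SK_IMGBITS) & SK_IMGMASK) - SK_IMGMAX;
  int zbox = (image >> SK_IMG2BITS) - SK_IMGMAX;
  // pretend the atom crossed the +x boundary once, then re-pack,
  // which is what the bit operations in pbc() do one dim at a time
  xbox++;
  image = ((imageint) (xbox + SK_IMGMAX) & SK_IMGMASK) |
    (((imageint) (ybox + SK_IMGMAX) & SK_IMGMASK) << SK_IMGBITS) |
    (((imageint) (zbox + SK_IMGMAX) & SK_IMGMASK) << SK_IMG2BITS);
  (void) image;                    // sketch only, value not used further
}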
/* ----------------------------------------------------------------------
check that point is inside box boundaries, in [lo,hi) sense
return 1 if true, 0 if false
------------------------------------------------------------------------- */
int Domain::inside(double* x)
{
double *lo,*hi;
double lamda[3];
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
if (x[0] < lo[0] || x[0] >= hi[0] ||
x[1] < lo[1] || x[1] >= hi[1] ||
x[2] < lo[2] || x[2] >= hi[2]) return 0;
else return 1;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
x2lamda(x,lamda);
if (lamda[0] < lo[0] || lamda[0] >= hi[0] ||
lamda[1] < lo[1] || lamda[1] >= hi[1] ||
lamda[2] < lo[2] || lamda[2] >= hi[2]) return 0;
else return 1;
}
}
/* ----------------------------------------------------------------------
check that point is inside nonperiodic boundaries, in [lo,hi) sense
return 1 if true, 0 if false
------------------------------------------------------------------------- */
int Domain::inside_nonperiodic(double* x)
{
double *lo,*hi;
double lamda[3];
if (xperiodic && yperiodic && zperiodic) return 1;
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
if (!xperiodic && (x[0] < lo[0] || x[0] >= hi[0])) return 0;
if (!yperiodic && (x[1] < lo[1] || x[1] >= hi[1])) return 0;
if (!zperiodic && (x[2] < lo[2] || x[2] >= hi[2])) return 0;
return 1;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
x2lamda(x,lamda);
if (!xperiodic && (lamda[0] < lo[0] || lamda[0] >= hi[0])) return 0;
if (!yperiodic && (lamda[1] < lo[1] || lamda[1] >= hi[1])) return 0;
if (!zperiodic && (lamda[2] < lo[2] || lamda[2] >= hi[2])) return 0;
return 1;
}
}
/* ----------------------------------------------------------------------
warn if image flags of any bonded atoms are inconsistent
could be a problem when using replicate or fix rigid
------------------------------------------------------------------------- */
void Domain::image_check()
{
int i,j,k,n,imol,iatom;
tagint tagprev;
// only need to check if system is molecular and some dimension is periodic
// if running verlet/split, don't check on KSpace partition since
// it has no ghost atoms and thus bond partners won't exist
if (!atom->molecular) return;
if (!xperiodic && !yperiodic && (dimension == 2 || !zperiodic)) return;
if (strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld != 0) return;
// communicate unwrapped position of owned atoms to ghost atoms
double **unwrap;
memory->create(unwrap,atom->nmax,3,"domain:unwrap");
double **x = atom->x;
imageint *image = atom->image;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
unmap(x[i],image[i],unwrap[i]);
comm->forward_comm_array(3,unwrap);
// compute unwrapped extent of each bond
// flag if any bond component is longer than 1/2 of periodic box length
// flag if any bond component is longer than non-periodic box length
// which means image flags in that dimension were different
int molecular = atom->molecular;
int *num_bond = atom->num_bond;
tagint **bond_atom = atom->bond_atom;
int **bond_type = atom->bond_type;
tagint *tag = atom->tag;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
Molecule **onemols = atom->avec->onemols;
double delx,dely,delz;
int lostbond = output->thermo->lostbond;
int nmissing = 0;
int flag = 0;
for (i = 0; i < nlocal; i++) {
if (molecular == 1) n = num_bond[i];
else {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
n = onemols[imol]->num_bond[iatom];
}
for (j = 0; j < n; j++) {
if (molecular == 1) {
if (bond_type[i][j] <= 0) continue;
k = atom->map(bond_atom[i][j]);
} else {
if (onemols[imol]->bond_type[iatom][j] < 0) continue;
tagprev = tag[i] - iatom - 1;
k = atom->map(onemols[imol]->bond_atom[iatom][j]+tagprev);
}
if (k == -1) {
nmissing++;
if (lostbond == ERROR)
error->one(FLERR,"Bond atom missing in image check");
continue;
}
delx = unwrap[i][0] - unwrap[k][0];
dely = unwrap[i][1] - unwrap[k][1];
delz = unwrap[i][2] - unwrap[k][2];
if (xperiodic && delx > xprd_half) flag = 1;
if (yperiodic && dely > yprd_half) flag = 1;
if (dimension == 3 && zperiodic && delz > zprd_half) flag = 1;
if (!xperiodic && delx > xprd) flag = 1;
if (!yperiodic && dely > yprd) flag = 1;
if (dimension == 3 && !zperiodic && delz > zprd) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"Inconsistent image flags");
if (lostbond == WARN) {
int all;
MPI_Allreduce(&nmissing,&all,1,MPI_INT,MPI_SUM,world);
if (all && comm->me == 0)
error->warning(FLERR,"Bond atom missing in image check");
}
memory->destroy(unwrap);
}
/* ----------------------------------------------------------------------
warn if end atoms in any bonded interaction
are further apart than half a periodic box length
could cause problems when bonded neighbor list is built since
closest_image() could return wrong image
------------------------------------------------------------------------- */
void Domain::box_too_small_check()
{
int i,j,k,n,imol,iatom;
tagint tagprev;
// only need to check if system is molecular and some dimension is periodic
// if running verlet/split, don't check on KSpace partition since
// it has no ghost atoms and thus bond partners won't exist
if (!atom->molecular) return;
if (!xperiodic && !yperiodic && (dimension == 2 || !zperiodic)) return;
if (strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld != 0) return;
// maxbondall = longest current bond length
// if periodic box dim is tiny (less than 2 * bond-length),
// minimum_image() itself may compute bad bond lengths
// in this case, image_check() should warn,
// assuming 2 atoms have consistent image flags
int molecular = atom->molecular;
double **x = atom->x;
int *num_bond = atom->num_bond;
tagint **bond_atom = atom->bond_atom;
int **bond_type = atom->bond_type;
tagint *tag = atom->tag;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
Molecule **onemols = atom->avec->onemols;
int nlocal = atom->nlocal;
double delx,dely,delz,rsq;
double maxbondme = 0.0;
int lostbond = output->thermo->lostbond;
int nmissing = 0;
for (i = 0; i < nlocal; i++) {
if (molecular == 1) n = num_bond[i];
else {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
n = onemols[imol]->num_bond[iatom];
}
for (j = 0; j < n; j++) {
if (molecular == 1) {
if (bond_type[i][j] <= 0) continue;
k = atom->map(bond_atom[i][j]);
} else {
if (onemols[imol]->bond_type[iatom][j] < 0) continue;
tagprev = tag[i] - iatom - 1;
k = atom->map(onemols[imol]->bond_atom[iatom][j]+tagprev);
}
if (k == -1) {
nmissing++;
if (lostbond == ERROR)
error->one(FLERR,"Bond atom missing in box size check");
continue;
}
delx = x[i][0] - x[k][0];
dely = x[i][1] - x[k][1];
delz = x[i][2] - x[k][2];
minimum_image(delx,dely,delz);
rsq = delx*delx + dely*dely + delz*delz;
maxbondme = MAX(maxbondme,rsq);
}
}
if (lostbond == WARN) {
int all;
MPI_Allreduce(&nmissing,&all,1,MPI_INT,MPI_SUM,world);
if (all && comm->me == 0)
error->warning(FLERR,"Bond atom missing in box size check");
}
double maxbondall;
MPI_Allreduce(&maxbondme,&maxbondall,1,MPI_DOUBLE,MPI_MAX,world);
maxbondall = sqrt(maxbondall);
// maxdelta = furthest apart 2 atoms in a bonded interaction can be
// include BONDSTRETCH factor to account for dynamics
double maxdelta = maxbondall * BONDSTRETCH;
if (atom->nangles) maxdelta = 2.0 * maxbondall * BONDSTRETCH;
if (atom->ndihedrals) maxdelta = 3.0 * maxbondall * BONDSTRETCH;
// warn if maxdelta > than half any periodic box length
// since atoms in the interaction could rotate into that dimension
int flag = 0;
if (xperiodic && maxdelta > xprd_half) flag = 1;
if (yperiodic && maxdelta > yprd_half) flag = 1;
if (dimension == 3 && zperiodic && maxdelta > zprd_half) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && comm->me == 0)
error->warning(FLERR,
"Bond/angle/dihedral extent > half of periodic box length");
}
/* ----------------------------------------------------------------------
check and warn if any proc's subbox is smaller than thresh
since may lead to lost atoms in comm->exchange()
current callers set thresh = neighbor skin
------------------------------------------------------------------------- */
void Domain::subbox_too_small_check(double thresh)
{
int flag = 0;
if (!triclinic) {
if (subhi[0]-sublo[0] < thresh || subhi[1]-sublo[1] < thresh) flag = 1;
if (dimension == 3 && subhi[2]-sublo[2] < thresh) flag = 1;
} else {
double delta = subhi_lamda[0] - sublo_lamda[0];
if (delta*prd[0] < thresh) flag = 1;
delta = subhi_lamda[1] - sublo_lamda[1];
if (delta*prd[1] < thresh) flag = 1;
if (dimension == 3) {
delta = subhi_lamda[2] - sublo_lamda[2];
if (delta*prd[2] < thresh) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"Proc sub-domain size < neighbor skin, "
"could lead to lost atoms");
}
/* ----------------------------------------------------------------------
minimum image convention in periodic dimensions
use 1/2 of box size as test
for triclinic, also add/subtract tilt factors in other dims as needed
changed "if" to "while" to enable distance to
far-away ghost atom returned by atom->map() to be wrapped back into box
could be problem for looking up atom IDs when cutoff > boxsize
------------------------------------------------------------------------- */
void Domain::minimum_image(double &dx, double &dy, double &dz)
{
if (triclinic == 0) {
if (xperiodic) {
while (fabs(dx) > xprd_half) {
if (dx < 0.0) dx += xprd;
else dx -= xprd;
}
}
if (yperiodic) {
while (fabs(dy) > yprd_half) {
if (dy < 0.0) dy += yprd;
else dy -= yprd;
}
}
if (zperiodic) {
while (fabs(dz) > zprd_half) {
if (dz < 0.0) dz += zprd;
else dz -= zprd;
}
}
} else {
if (zperiodic) {
while (fabs(dz) > zprd_half) {
if (dz < 0.0) {
dz += zprd;
dy += yz;
dx += xz;
} else {
dz -= zprd;
dy -= yz;
dx -= xz;
}
}
}
if (yperiodic) {
while (fabs(dy) > yprd_half) {
if (dy < 0.0) {
dy += yprd;
dx += xy;
} else {
dy -= yprd;
dx -= xy;
}
}
}
if (xperiodic) {
while (fabs(dx) > xprd_half) {
if (dx < 0.0) dx += xprd;
else dx -= xprd;
}
}
}
}
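/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: for the periodic,
   orthogonal branch above the while-loops are equivalent (up to the
   tie-break at exactly half a box length) to a one-shot wrap that
   subtracts the nearest integer multiple of the box length; the loop
   form is kept in the real code so the same logic handles triclinic tilt
------------------------------------------------------------------------- */
static void sketch_minimum_image_orthogonal(double &dx, double &dy, double &dz,
                                            double xprd, double yprd, double zprd)
{
  dx -= xprd * floor(dx/xprd + 0.5);   // nearest-multiple wrap in x
  dy -= yprd * floor(dy/yprd + 0.5);   // ... in y
  dz -= zprd * floor(dz/zprd + 0.5);   // ... in z
}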
/* ----------------------------------------------------------------------
minimum image convention in periodic dimensions
use 1/2 of box size as test
for triclinic, also add/subtract tilt factors in other dims as needed
changed "if" to "while" to enable distance to
far-away ghost atom returned by atom->map() to be wrapped back into box
could be problem for looking up atom IDs when cutoff > boxsize
------------------------------------------------------------------------- */
void Domain::minimum_image(double *delta)
{
if (triclinic == 0) {
if (xperiodic) {
while (fabs(delta[0]) > xprd_half) {
if (delta[0] < 0.0) delta[0] += xprd;
else delta[0] -= xprd;
}
}
if (yperiodic) {
while (fabs(delta[1]) > yprd_half) {
if (delta[1] < 0.0) delta[1] += yprd;
else delta[1] -= yprd;
}
}
if (zperiodic) {
while (fabs(delta[2]) > zprd_half) {
if (delta[2] < 0.0) delta[2] += zprd;
else delta[2] -= zprd;
}
}
} else {
if (zperiodic) {
while (fabs(delta[2]) > zprd_half) {
if (delta[2] < 0.0) {
delta[2] += zprd;
delta[1] += yz;
delta[0] += xz;
} else {
delta[2] -= zprd;
delta[1] -= yz;
delta[0] -= xz;
}
}
}
if (yperiodic) {
while (fabs(delta[1]) > yprd_half) {
if (delta[1] < 0.0) {
delta[1] += yprd;
delta[0] += xy;
} else {
delta[1] -= yprd;
delta[0] -= xy;
}
}
}
if (xperiodic) {
while (fabs(delta[0]) > xprd_half) {
if (delta[0] < 0.0) delta[0] += xprd;
else delta[0] -= xprd;
}
}
}
}
/* ----------------------------------------------------------------------
return local index of atom J or any of its images that is closest to atom I
if J is not a valid index like -1, just return it
------------------------------------------------------------------------- */
int Domain::closest_image(int i, int j)
{
if (j < 0) return j;
int *sametag = atom->sametag;
double **x = atom->x;
double *xi = x[i];
int closest = j;
double delx = xi[0] - x[j][0];
double dely = xi[1] - x[j][1];
double delz = xi[2] - x[j][2];
double rsqmin = delx*delx + dely*dely + delz*delz;
double rsq;
while (sametag[j] >= 0) {
j = sametag[j];
delx = xi[0] - x[j][0];
dely = xi[1] - x[j][1];
delz = xi[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < rsqmin) {
rsqmin = rsq;
closest = j;
}
}
return closest;
}
+/* ----------------------------------------------------------------------
+ return local index of atom J or any of its images that is closest to pos
+ if J is not a valid index like -1, just return it
+------------------------------------------------------------------------- */
+
+int Domain::closest_image(double *pos, int j)
+{
+ if (j < 0) return j;
+
+ int *sametag = atom->sametag;
+ double **x = atom->x;
+
+ int closest = j;
+ double delx = pos[0] - x[j][0];
+ double dely = pos[1] - x[j][1];
+ double delz = pos[2] - x[j][2];
+ double rsqmin = delx*delx + dely*dely + delz*delz;
+ double rsq;
+
+ while (sametag[j] >= 0) {
+ j = sametag[j];
+ delx = pos[0] - x[j][0];
+ dely = pos[1] - x[j][1];
+ delz = pos[2] - x[j][2];
+ rsq = delx*delx + dely*dely + delz*delz;
+ if (rsq < rsqmin) {
+ rsqmin = rsq;
+ closest = j;
+ }
+ }
+
+ return closest;
+}
+
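/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: both
   closest_image() overloads walk the atom->sametag chain, which links
   an atom index to the next local/ghost copy carrying the same tag and
   is terminated by -1.  this helper just counts how many copies of
   atom J this processor currently holds
------------------------------------------------------------------------- */
static int sketch_count_images(Atom *atom, int j)
{
  if (j < 0) return 0;
  int *sametag = atom->sametag;
  int ncopies = 1;                 // atom J itself
  while (sametag[j] >= 0) {        // follow the same-tag chain
    j = sametag[j];
    ncopies++;
  }
  return ncopies;
}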
/* ----------------------------------------------------------------------
find and return Xj image = periodic image of Xj that is closest to Xi
for triclinic, add/subtract tilt factors in other dims as needed
not currently used (Jan 2017):
used to be called by pair TIP4P styles but no longer,
due to use of other closest_image() method
------------------------------------------------------------------------- */
void Domain::closest_image(const double * const xi, const double * const xj,
double * const xjimage)
{
double dx = xj[0] - xi[0];
double dy = xj[1] - xi[1];
double dz = xj[2] - xi[2];
if (triclinic == 0) {
if (xperiodic) {
if (dx < 0.0) {
while (dx < 0.0) dx += xprd;
if (dx > xprd_half) dx -= xprd;
} else {
while (dx > 0.0) dx -= xprd;
if (dx < -xprd_half) dx += xprd;
}
}
if (yperiodic) {
if (dy < 0.0) {
while (dy < 0.0) dy += yprd;
if (dy > yprd_half) dy -= yprd;
} else {
while (dy > 0.0) dy -= yprd;
if (dy < -yprd_half) dy += yprd;
}
}
if (zperiodic) {
if (dz < 0.0) {
while (dz < 0.0) dz += zprd;
if (dz > zprd_half) dz -= zprd;
} else {
while (dz > 0.0) dz -= zprd;
if (dz < -zprd_half) dz += zprd;
}
}
} else {
if (zperiodic) {
if (dz < 0.0) {
while (dz < 0.0) {
dz += zprd;
dy += yz;
dx += xz;
}
if (dz > zprd_half) {
dz -= zprd;
dy -= yz;
dx -= xz;
}
} else {
while (dz > 0.0) {
dz -= zprd;
dy -= yz;
dx -= xz;
}
if (dz < -zprd_half) {
dz += zprd;
dy += yz;
dx += xz;
}
}
}
if (yperiodic) {
if (dy < 0.0) {
while (dy < 0.0) {
dy += yprd;
dx += xy;
}
if (dy > yprd_half) {
dy -= yprd;
dx -= xy;
}
} else {
while (dy > 0.0) {
dy -= yprd;
dx -= xy;
}
if (dy < -yprd_half) {
dy += yprd;
dx += xy;
}
}
}
if (xperiodic) {
if (dx < 0.0) {
while (dx < 0.0) dx += xprd;
if (dx > xprd_half) dx -= xprd;
} else {
while (dx > 0.0) dx -= xprd;
if (dx < -xprd_half) dx += xprd;
}
}
}
xjimage[0] = xi[0] + dx;
xjimage[1] = xi[1] + dy;
xjimage[2] = xi[2] + dz;
}
/* ----------------------------------------------------------------------
remap the point into the periodic box no matter how far away
adjust 3 image flags encoded in image accordingly
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
for triclinic, point is converted to lamda coords (0-1) before doing remap
image = 10 or 20 bits for each dimension depending on sizeof(imageint)
increment/decrement in wrap-around fashion
------------------------------------------------------------------------- */
void Domain::remap(double *x, imageint &image)
{
double *lo,*hi,*period,*coord;
double lamda[3];
imageint idim,otherdims;
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
coord = x;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
x2lamda(x,lamda);
coord = lamda;
}
if (xperiodic) {
while (coord[0] < lo[0]) {
coord[0] += period[0];
idim = image & IMGMASK;
otherdims = image ^ idim;
idim--;
idim &= IMGMASK;
image = otherdims | idim;
}
while (coord[0] >= hi[0]) {
coord[0] -= period[0];
idim = image & IMGMASK;
otherdims = image ^ idim;
idim++;
idim &= IMGMASK;
image = otherdims | idim;
}
coord[0] = MAX(coord[0],lo[0]);
}
if (yperiodic) {
while (coord[1] < lo[1]) {
coord[1] += period[1];
idim = (image >> IMGBITS) & IMGMASK;
otherdims = image ^ (idim << IMGBITS);
idim--;
idim &= IMGMASK;
image = otherdims | (idim << IMGBITS);
}
while (coord[1] >= hi[1]) {
coord[1] -= period[1];
idim = (image >> IMGBITS) & IMGMASK;
otherdims = image ^ (idim << IMGBITS);
idim++;
idim &= IMGMASK;
image = otherdims | (idim << IMGBITS);
}
coord[1] = MAX(coord[1],lo[1]);
}
if (zperiodic) {
while (coord[2] < lo[2]) {
coord[2] += period[2];
idim = image >> IMG2BITS;
otherdims = image ^ (idim << IMG2BITS);
idim--;
idim &= IMGMASK;
image = otherdims | (idim << IMG2BITS);
}
while (coord[2] >= hi[2]) {
coord[2] -= period[2];
idim = image >> IMG2BITS;
otherdims = image ^ (idim << IMG2BITS);
idim++;
idim &= IMGMASK;
image = otherdims | (idim << IMG2BITS);
}
coord[2] = MAX(coord[2],lo[2]);
}
if (triclinic) lamda2x(coord,x);
}
/* ----------------------------------------------------------------------
remap the point into the periodic box no matter how far away
no image flag calculation
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
for triclinic, point is converted to lamda coords (0-1) before remap
------------------------------------------------------------------------- */
void Domain::remap(double *x)
{
double *lo,*hi,*period,*coord;
double lamda[3];
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
coord = x;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
x2lamda(x,lamda);
coord = lamda;
}
if (xperiodic) {
while (coord[0] < lo[0]) coord[0] += period[0];
while (coord[0] >= hi[0]) coord[0] -= period[0];
coord[0] = MAX(coord[0],lo[0]);
}
if (yperiodic) {
while (coord[1] < lo[1]) coord[1] += period[1];
while (coord[1] >= hi[1]) coord[1] -= period[1];
coord[1] = MAX(coord[1],lo[1]);
}
if (zperiodic) {
while (coord[2] < lo[2]) coord[2] += period[2];
while (coord[2] >= hi[2]) coord[2] -= period[2];
coord[2] = MAX(coord[2],lo[2]);
}
if (triclinic) lamda2x(coord,x);
}
/* ----------------------------------------------------------------------
remap xnew to be within half box length of xold
do it directly, not iteratively, in case the point is far away
for triclinic, both points are converted to lamda coords (0-1) before remap
------------------------------------------------------------------------- */
void Domain::remap_near(double *xnew, double *xold)
{
int n;
double *coordnew,*coordold,*period,*half;
double lamdanew[3],lamdaold[3];
if (triclinic == 0) {
period = prd;
half = prd_half;
coordnew = xnew;
coordold = xold;
} else {
period = prd_lamda;
half = prd_half_lamda;
x2lamda(xnew,lamdanew);
coordnew = lamdanew;
x2lamda(xold,lamdaold);
coordold = lamdaold;
}
// iterative form
// if (xperiodic) {
// while (coordnew[0]-coordold[0] > half[0]) coordnew[0] -= period[0];
// while (coordold[0]-coordnew[0] > half[0]) coordnew[0] += period[0];
// }
if (xperiodic) {
if (coordnew[0]-coordold[0] > period[0]) {
n = static_cast<int> ((coordnew[0]-coordold[0])/period[0]);
coordnew[0] -= n*period[0];
}
while (coordnew[0]-coordold[0] > half[0]) coordnew[0] -= period[0];
if (coordold[0]-coordnew[0] > period[0]) {
n = static_cast<int> ((coordold[0]-coordnew[0])/period[0]);
coordnew[0] += n*period[0];
}
while (coordold[0]-coordnew[0] > half[0]) coordnew[0] += period[0];
}
if (yperiodic) {
if (coordnew[1]-coordold[1] > period[1]) {
n = static_cast<int> ((coordnew[1]-coordold[1])/period[1]);
coordnew[1] -= n*period[1];
}
while (coordnew[1]-coordold[1] > half[1]) coordnew[1] -= period[1];
if (coordold[1]-coordnew[1] > period[1]) {
n = static_cast<int> ((coordold[1]-coordnew[1])/period[1]);
coordnew[1] += n*period[1];
}
while (coordold[1]-coordnew[1] > half[1]) coordnew[1] += period[1];
}
if (zperiodic) {
if (coordnew[2]-coordold[2] > period[2]) {
n = static_cast<int> ((coordnew[2]-coordold[2])/period[2]);
coordnew[2] -= n*period[2];
}
while (coordnew[2]-coordold[2] > half[2]) coordnew[2] -= period[2];
if (coordold[2]-coordnew[2] > period[2]) {
n = static_cast<int> ((coordold[2]-coordnew[2])/period[2]);
coordnew[2] += n*period[2];
}
while (coordold[2]-coordnew[2] > half[2]) coordnew[2] += period[2];
}
if (triclinic) lamda2x(coordnew,xnew);
}
/* ----------------------------------------------------------------------
unmap the point via image flags
x overwritten with result, don't reset image flag
for triclinic, use h[] to add in tilt factors in other dims as needed
------------------------------------------------------------------------- */
void Domain::unmap(double *x, imageint image)
{
int xbox = (image & IMGMASK) - IMGMAX;
int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image >> IMG2BITS) - IMGMAX;
if (triclinic == 0) {
x[0] += xbox*xprd;
x[1] += ybox*yprd;
x[2] += zbox*zprd;
} else {
x[0] += h[0]*xbox + h[5]*ybox + h[4]*zbox;
x[1] += h[1]*ybox + h[3]*zbox;
x[2] += h[2]*zbox;
}
}
/* ----------------------------------------------------------------------
unmap the point via image flags
result returned in y, don't reset image flag
for triclinic, use h[] to add in tilt factors in other dims as needed
------------------------------------------------------------------------- */
void Domain::unmap(const double *x, imageint image, double *y)
{
int xbox = (image & IMGMASK) - IMGMAX;
int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image >> IMG2BITS) - IMGMAX;
if (triclinic == 0) {
y[0] = x[0] + xbox*xprd;
y[1] = x[1] + ybox*yprd;
y[2] = x[2] + zbox*zprd;
} else {
y[0] = x[0] + h[0]*xbox + h[5]*ybox + h[4]*zbox;
y[1] = x[1] + h[1]*ybox + h[3]*zbox;
y[2] = x[2] + h[2]*zbox;
}
}
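/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: remap() and
   unmap() are inverse operations.  wrapping a far-away point into the
   box while updating its image flag, then unwrapping with that flag,
   recovers the original coordinate (in periodic dims) up to roundoff.
   "domain" is assumed to be a fully set-up Domain instance; IMGMAX,
   IMGBITS, IMG2BITS come from lmptype.h
------------------------------------------------------------------------- */
static void sketch_remap_unmap_roundtrip(Domain *domain, double *xfar)
{
  // image flag meaning "zero box crossings in all 3 dims"
  imageint image = ((imageint) IMGMAX << IMG2BITS) |
    ((imageint) IMGMAX << IMGBITS) | IMGMAX;
  double xwrapped[3] = {xfar[0],xfar[1],xfar[2]};
  domain->remap(xwrapped,image);          // now inside the box, flag updated
  double xback[3];
  domain->unmap(xwrapped,image,xback);    // xback ~= xfar in periodic dims
  (void) xback;                           // sketch only
}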
/* ----------------------------------------------------------------------
adjust image flags due to triclinic box flip
flip operation is changing box vectors A,B,C to new A',B',C'
A' = A (A does not change)
B' = B + mA (B shifted by A)
C' = C + pB + nA (C shifted by B and/or A)
this requires the image flags change from (a,b,c) to (a',b',c')
so that x_unwrap for each atom is same before/after
x_unwrap_before = xlocal + aA + bB + cC
x_unwrap_after = xlocal + a'A' + b'B' + c'C'
this requires:
c' = c
b' = b - cp
a' = a - (b-cp)m - cn = a - b'm - cn
in other words, for xy flip, change in x flag depends on current y flag
this is b/c the xy flip dramatically changes which tiled image of
the simulation box an unwrapped point maps to
------------------------------------------------------------------------- */
void Domain::image_flip(int m, int n, int p)
{
imageint *image = atom->image;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
int xbox = (image[i] & IMGMASK) - IMGMAX;
int ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image[i] >> IMG2BITS) - IMGMAX;
ybox -= p*zbox;
xbox -= m*ybox + n*zbox;
image[i] = ((imageint) (xbox + IMGMAX) & IMGMASK) |
(((imageint) (ybox + IMGMAX) & IMGMASK) << IMGBITS) |
(((imageint) (zbox + IMGMAX) & IMGMASK) << IMG2BITS);
}
}
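/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: numeric check of
   the flag transformation in image_flip().  with box vectors A' = A,
   B' = B + m*A, C' = C + p*B + n*A and flags transformed as c' = c,
   b' = b - c*p, a' = a - b'*m - c*n, the unwrapped displacement
   a*A + b*B + c*C is unchanged.  all values below are made up
------------------------------------------------------------------------- */
static bool sketch_check_image_flip()
{
  const double A[3] = {10.0, 0.0, 0.0};   // hypothetical box vectors
  const double B[3] = { 2.0, 8.0, 0.0};
  const double C[3] = { 1.0, 3.0, 6.0};
  const int a = 2, b = -1, c = 3;         // hypothetical image flags
  const int m = 1, n = -1, p = 2;         // hypothetical flip counts
  double Ap[3],Bp[3],Cp[3];
  for (int k = 0; k < 3; k++) {
    Ap[k] = A[k];
    Bp[k] = B[k] + m*A[k];
    Cp[k] = C[k] + p*B[k] + n*A[k];
  }
  const int cp = c;
  const int bp = b - c*p;                 // same order as image_flip() above
  const int ap = a - bp*m - c*n;
  for (int k = 0; k < 3; k++) {
    double before = a*A[k] + b*B[k] + c*C[k];
    double after = ap*Ap[k] + bp*Bp[k] + cp*Cp[k];
    if (fabs(before-after) > 1.0e-12) return false;
  }
  return true;
}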
/* ----------------------------------------------------------------------
return 1 if this proc owns atom with coords x, else return 0
x is returned remapped into periodic box
if an image flag pointer is passed, the flag is updated via remap(x,image)
if image = NULL is passed, remap(x) is used and no flag is updated
if shrinkexceed, atom can be outside shrinkwrap boundaries
called from create_atoms() in library.cpp
------------------------------------------------------------------------- */
int Domain::ownatom(int id, double *x, imageint *image, int shrinkexceed)
{
double lamda[3];
double *coord,*blo,*bhi,*slo,*shi;
if (image) remap(x,*image);
else remap(x);
if (triclinic) {
x2lamda(x,lamda);
coord = lamda;
} else coord = x;
// box and subbox bounds for orthogonal vs triclinic
if (triclinic == 0) {
blo = boxlo;
bhi = boxhi;
slo = sublo;
shi = subhi;
} else {
blo = boxlo_lamda;
bhi = boxhi_lamda;
slo = sublo_lamda;
shi = subhi_lamda;
}
if (coord[0] >= slo[0] && coord[0] < shi[0] &&
coord[1] >= slo[1] && coord[1] < shi[1] &&
coord[2] >= slo[2] && coord[2] < shi[2]) return 1;
// check if atom did not return 1 only b/c it was
// outside a shrink-wrapped boundary
if (shrinkexceed) {
int outside = 0;
if (coord[0] < blo[0] && boundary[0][0] > 1) outside = 1;
if (coord[0] >= bhi[0] && boundary[0][1] > 1) outside = 1;
if (coord[1] < blo[1] && boundary[1][0] > 1) outside = 1;
if (coord[1] >= bhi[1] && boundary[1][1] > 1) outside = 1;
if (coord[2] < blo[2] && boundary[2][0] > 1) outside = 1;
if (coord[2] >= bhi[2] && boundary[2][1] > 1) outside = 1;
if (!outside) return 0;
// newcoord = coords pushed back to be on shrink-wrapped boundary
// newcoord is a copy, so caller's x[] is not affected
double newcoord[3];
if (coord[0] < blo[0] && boundary[0][0] > 1) newcoord[0] = blo[0];
else if (coord[0] >= bhi[0] && boundary[0][1] > 1) newcoord[0] = bhi[0];
else newcoord[0] = coord[0];
if (coord[1] < blo[1] && boundary[1][0] > 1) newcoord[1] = blo[1];
else if (coord[1] >= bhi[1] && boundary[1][1] > 1) newcoord[1] = bhi[1];
else newcoord[1] = coord[1];
if (coord[2] < blo[2] && boundary[2][0] > 1) newcoord[2] = blo[2];
else if (coord[2] >= bhi[2] && boundary[2][1] > 1) newcoord[2] = bhi[2];
else newcoord[2] = coord[2];
// re-test for newcoord inside my sub-domain
// use <= test for upper-boundary since may have just put atom at boxhi
if (newcoord[0] >= slo[0] && newcoord[0] <= shi[0] &&
newcoord[1] >= slo[1] && newcoord[1] <= shi[1] &&
newcoord[2] >= slo[2] && newcoord[2] <= shi[2]) return 1;
}
return 0;
}
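/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: how a caller such
   as the create-atoms path in library.cpp might use ownatom() to decide
   which processor instantiates a new atom.  "domain" is assumed to be a
   fully set-up Domain instance and "id"/"x" a hypothetical atom
------------------------------------------------------------------------- */
static int sketch_try_create(Domain *domain, int id, double *x)
{
  // start from zero image flags; ownatom() updates them via remap()
  imageint image = ((imageint) IMGMAX << IMG2BITS) |
    ((imageint) IMGMAX << IMGBITS) | IMGMAX;
  // shrinkexceed = 1 lets a point just outside a shrink-wrapped
  // boundary still be claimed by the processor owning that boundary
  if (domain->ownatom(id,x,&image,1)) {
    // ... this proc would add the atom to its Atom class here ...
    return 1;
  }
  return 0;
}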
/* ----------------------------------------------------------------------
create a lattice
------------------------------------------------------------------------- */
void Domain::set_lattice(int narg, char **arg)
{
if (lattice) delete lattice;
lattice = new Lattice(lmp,narg,arg);
}
/* ----------------------------------------------------------------------
create a new region
------------------------------------------------------------------------- */
void Domain::add_region(int narg, char **arg)
{
if (narg < 2) error->all(FLERR,"Illegal region command");
if (strcmp(arg[1],"delete") == 0) {
delete_region(narg,arg);
return;
}
if (find_region(arg[0]) >= 0) error->all(FLERR,"Reuse of region ID");
// extend Region list if necessary
if (nregion == maxregion) {
maxregion += DELTAREGION;
regions = (Region **)
memory->srealloc(regions,maxregion*sizeof(Region *),"domain:regions");
}
// create the Region
if (lmp->suffix_enable) {
if (lmp->suffix) {
char estyle[256];
sprintf(estyle,"%s/%s",arg[1],lmp->suffix);
if (region_map->find(estyle) != region_map->end()) {
RegionCreator region_creator = (*region_map)[estyle];
regions[nregion] = region_creator(lmp, narg, arg);
regions[nregion]->init();
nregion++;
return;
}
}
if (lmp->suffix2) {
char estyle[256];
sprintf(estyle,"%s/%s",arg[1],lmp->suffix2);
if (region_map->find(estyle) != region_map->end()) {
RegionCreator region_creator = (*region_map)[estyle];
regions[nregion] = region_creator(lmp, narg, arg);
regions[nregion]->init();
nregion++;
return;
}
}
}
if (strcmp(arg[1],"none") == 0) error->all(FLERR,"Unknown region style");
if (region_map->find(arg[1]) != region_map->end()) {
RegionCreator region_creator = (*region_map)[arg[1]];
regions[nregion] = region_creator(lmp, narg, arg);
}
else error->all(FLERR,"Unknown region style");
// initialize any region variables via init()
// in case region is used between runs, e.g. to print a variable
regions[nregion]->init();
nregion++;
}
/* ----------------------------------------------------------------------
one instance per region style in style_region.h
------------------------------------------------------------------------- */
template <typename T>
Region *Domain::region_creator(LAMMPS *lmp, int narg, char ** arg)
{
return new T(lmp, narg, arg);
}
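/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: the factory
   pattern that region_creator() above is one half of.  a creator map
   ties a style keyword to the templated creator function; the real
   region_map is filled from style_region.h in the Domain constructor.
   the types below (Lmp, Reg, RegBlockSketch, RegSphereSketch) are
   stand-ins, not actual LAMMPS classes
------------------------------------------------------------------------- */
namespace regionsketch {
  struct Lmp;                                   // stand-in for class LAMMPS
  struct Reg {
    Reg(Lmp *, int, char **) {}
    virtual ~Reg() {}
  };
  struct RegBlockSketch : public Reg {
    RegBlockSketch(Lmp *l, int n, char **a) : Reg(l,n,a) {}
  };
  struct RegSphereSketch : public Reg {
    RegSphereSketch(Lmp *l, int n, char **a) : Reg(l,n,a) {}
  };
  typedef Reg *(*Creator)(Lmp *, int, char **);
  template <typename T> Reg *creator(Lmp *lmp, int narg, char **arg)
  { return new T(lmp,narg,arg); }
  inline std::map<std::string,Creator> make_map()
  {
    std::map<std::string,Creator> m;
    m["block"]  = &creator<RegBlockSketch>;     // keyword -> creator fn
    m["sphere"] = &creator<RegSphereSketch>;
    return m;                                   // lookup by name, then call
  }
}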
/* ----------------------------------------------------------------------
delete a region
------------------------------------------------------------------------- */
void Domain::delete_region(int narg, char **arg)
{
if (narg != 2) error->all(FLERR,"Illegal region command");
int iregion = find_region(arg[0]);
if (iregion == -1) error->all(FLERR,"Delete region ID does not exist");
delete regions[iregion];
regions[iregion] = regions[nregion-1];
nregion--;
}
/* ----------------------------------------------------------------------
return region index if name matches existing region ID
return -1 if no such region
------------------------------------------------------------------------- */
int Domain::find_region(char *name)
{
for (int iregion = 0; iregion < nregion; iregion++)
if (strcmp(name,regions[iregion]->id) == 0) return iregion;
return -1;
}
/* ----------------------------------------------------------------------
(re)set boundary settings
flag = 0, called from the input script
flag = 1, called from change box command
------------------------------------------------------------------------- */
void Domain::set_boundary(int narg, char **arg, int flag)
{
if (narg != 3) error->all(FLERR,"Illegal boundary command");
char c;
for (int idim = 0; idim < 3; idim++)
for (int iside = 0; iside < 2; iside++) {
if (iside == 0) c = arg[idim][0];
else if (iside == 1 && strlen(arg[idim]) == 1) c = arg[idim][0];
else c = arg[idim][1];
if (c == 'p') boundary[idim][iside] = 0;
else if (c == 'f') boundary[idim][iside] = 1;
else if (c == 's') boundary[idim][iside] = 2;
else if (c == 'm') boundary[idim][iside] = 3;
else {
if (flag == 0) error->all(FLERR,"Illegal boundary command");
if (flag == 1) error->all(FLERR,"Illegal change_box command");
}
}
for (int idim = 0; idim < 3; idim++)
if ((boundary[idim][0] == 0 && boundary[idim][1]) ||
(boundary[idim][0] && boundary[idim][1] == 0))
error->all(FLERR,"Both sides of boundary must be periodic");
if (boundary[0][0] == 0) xperiodic = 1;
else xperiodic = 0;
if (boundary[1][0] == 0) yperiodic = 1;
else yperiodic = 0;
if (boundary[2][0] == 0) zperiodic = 1;
else zperiodic = 0;
periodicity[0] = xperiodic;
periodicity[1] = yperiodic;
periodicity[2] = zperiodic;
nonperiodic = 0;
if (xperiodic == 0 || yperiodic == 0 || zperiodic == 0) {
nonperiodic = 1;
if (boundary[0][0] >= 2 || boundary[0][1] >= 2 ||
boundary[1][0] >= 2 || boundary[1][1] >= 2 ||
boundary[2][0] >= 2 || boundary[2][1] >= 2) nonperiodic = 2;
}
}
/* ----------------------------------------------------------------------
set domain attributes
------------------------------------------------------------------------- */
void Domain::set_box(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal box command");
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"tilt") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal box command");
if (strcmp(arg[iarg+1],"small") == 0) tiltsmall = 1;
else if (strcmp(arg[iarg+1],"large") == 0) tiltsmall = 0;
else error->all(FLERR,"Illegal box command");
iarg += 2;
} else error->all(FLERR,"Illegal box command");
}
}
/* ----------------------------------------------------------------------
print box info, orthogonal or triclinic
------------------------------------------------------------------------- */
void Domain::print_box(const char *str)
{
if (comm->me == 0) {
if (screen) {
if (triclinic == 0)
fprintf(screen,"%sorthogonal box = (%g %g %g) to (%g %g %g)\n",
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2]);
else {
char *format = (char *)
"%striclinic box = (%g %g %g) to (%g %g %g) with tilt (%g %g %g)\n";
fprintf(screen,format,
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2],
xy,xz,yz);
}
}
if (logfile) {
if (triclinic == 0)
fprintf(logfile,"%sorthogonal box = (%g %g %g) to (%g %g %g)\n",
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2]);
else {
char *format = (char *)
"%striclinic box = (%g %g %g) to (%g %g %g) with tilt (%g %g %g)\n";
fprintf(logfile,format,
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2],
xy,xz,yz);
}
}
}
}
/* ----------------------------------------------------------------------
format boundary string for output
assume str is 9 chars or more in length
------------------------------------------------------------------------- */
void Domain::boundary_string(char *str)
{
int m = 0;
for (int idim = 0; idim < 3; idim++) {
for (int iside = 0; iside < 2; iside++) {
if (boundary[idim][iside] == 0) str[m++] = 'p';
else if (boundary[idim][iside] == 1) str[m++] = 'f';
else if (boundary[idim][iside] == 2) str[m++] = 's';
else if (boundary[idim][iside] == 3) str[m++] = 'm';
}
str[m++] = ' ';
}
str[8] = '\0';
}
/* ----------------------------------------------------------------------
convert triclinic 0-1 lamda coords to box coords for all N atoms
x = H lamda + x0;
------------------------------------------------------------------------- */
void Domain::lamda2x(int n)
{
double **x = atom->x;
for (int i = 0; i < n; i++) {
x[i][0] = h[0]*x[i][0] + h[5]*x[i][1] + h[4]*x[i][2] + boxlo[0];
x[i][1] = h[1]*x[i][1] + h[3]*x[i][2] + boxlo[1];
x[i][2] = h[2]*x[i][2] + boxlo[2];
}
}
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for all N atoms
lamda = H^-1 (x - x0)
------------------------------------------------------------------------- */
void Domain::x2lamda(int n)
{
double delta[3];
double **x = atom->x;
for (int i = 0; i < n; i++) {
delta[0] = x[i][0] - boxlo[0];
delta[1] = x[i][1] - boxlo[1];
delta[2] = x[i][2] - boxlo[2];
x[i][0] = h_inv[0]*delta[0] + h_inv[5]*delta[1] + h_inv[4]*delta[2];
x[i][1] = h_inv[1]*delta[1] + h_inv[3]*delta[2];
x[i][2] = h_inv[2]*delta[2];
}
}
/* ----------------------------------------------------------------------
convert triclinic 0-1 lamda coords to box coords for one atom
x = H lamda + x0;
lamda and x can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::lamda2x(double *lamda, double *x)
{
x[0] = h[0]*lamda[0] + h[5]*lamda[1] + h[4]*lamda[2] + boxlo[0];
x[1] = h[1]*lamda[1] + h[3]*lamda[2] + boxlo[1];
x[2] = h[2]*lamda[2] + boxlo[2];
}
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for one atom
lamda = H^-1 (x - x0)
x and lamda can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::x2lamda(double *x, double *lamda)
{
double delta[3];
delta[0] = x[0] - boxlo[0];
delta[1] = x[1] - boxlo[1];
delta[2] = x[2] - boxlo[2];
lamda[0] = h_inv[0]*delta[0] + h_inv[5]*delta[1] + h_inv[4]*delta[2];
lamda[1] = h_inv[1]*delta[1] + h_inv[3]*delta[2];
lamda[2] = h_inv[2]*delta[2];
}
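/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: the Voigt-ordered
   shape matrix behind the lamda2x()/x2lamda() pair above, checked by a
   round trip on a hypothetical tilted box.  h and h_inv are built with
   the same formulas set_global_box() uses
------------------------------------------------------------------------- */
static bool sketch_lamda_roundtrip()
{
  const double xprd = 10.0, yprd = 8.0, zprd = 6.0;   // hypothetical box
  const double xy = 1.5, xz = -0.5, yz = 2.0;         // hypothetical tilts
  const double boxlo[3] = {-1.0, 0.0, 2.0};
  // h = (xx,yy,zz,yz,xz,xy); h_inv = inverse of that upper triangular matrix
  const double h[6] = {xprd, yprd, zprd, yz, xz, xy};
  const double h_inv[6] = {1.0/xprd, 1.0/yprd, 1.0/zprd,
                           -yz/(yprd*zprd),
                           (yz*xy - yprd*xz)/(xprd*yprd*zprd),
                           -xy/(xprd*yprd)};
  const double x[3] = {3.7, 5.1, 4.9};                // arbitrary test point
  // x -> lamda, same arithmetic as x2lamda() above
  double d0 = x[0]-boxlo[0], d1 = x[1]-boxlo[1], d2 = x[2]-boxlo[2];
  double l0 = h_inv[0]*d0 + h_inv[5]*d1 + h_inv[4]*d2;
  double l1 = h_inv[1]*d1 + h_inv[3]*d2;
  double l2 = h_inv[2]*d2;
  // lamda -> x, same arithmetic as lamda2x() above
  double x0 = h[0]*l0 + h[5]*l1 + h[4]*l2 + boxlo[0];
  double x1 = h[1]*l1 + h[3]*l2 + boxlo[1];
  double x2 = h[2]*l2 + boxlo[2];
  return fabs(x0-x[0]) < 1.0e-12 && fabs(x1-x[1]) < 1.0e-12 &&
    fabs(x2-x[2]) < 1.0e-12;
}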
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for one atom
use my_boxlo & my_h_inv stored by caller for previous state of box
lamda = H^-1 (x - x0)
x and lamda can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::x2lamda(double *x, double *lamda,
double *my_boxlo, double *my_h_inv)
{
double delta[3];
delta[0] = x[0] - my_boxlo[0];
delta[1] = x[1] - my_boxlo[1];
delta[2] = x[2] - my_boxlo[2];
lamda[0] = my_h_inv[0]*delta[0] + my_h_inv[5]*delta[1] + my_h_inv[4]*delta[2];
lamda[1] = my_h_inv[1]*delta[1] + my_h_inv[3]*delta[2];
lamda[2] = my_h_inv[2]*delta[2];
}
/* ----------------------------------------------------------------------
convert 8 lamda corner pts of lo/hi box to box coords
return bboxlo/hi = bounding box around 8 corner pts in box coords
------------------------------------------------------------------------- */
void Domain::bbox(double *lo, double *hi, double *bboxlo, double *bboxhi)
{
double x[3];
bboxlo[0] = bboxlo[1] = bboxlo[2] = BIG;
bboxhi[0] = bboxhi[1] = bboxhi[2] = -BIG;
x[0] = lo[0]; x[1] = lo[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = lo[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = hi[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = hi[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = lo[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = lo[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = hi[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = hi[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
}
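/* ----------------------------------------------------------------------
   illustrative sketch, NOT part of the original file: loop-form
   equivalent of the explicit 8-corner enumeration in bbox() above,
   generating each corner from the bits of its index.  behaviorally the
   same; shown only to make the corner ordering explicit
------------------------------------------------------------------------- */
static void sketch_bbox_loop(Domain *domain, double *lo, double *hi,
                             double *bboxlo, double *bboxhi)
{
  double x[3];
  bboxlo[0] = bboxlo[1] = bboxlo[2] = BIG;
  bboxhi[0] = bboxhi[1] = bboxhi[2] = -BIG;
  for (int m = 0; m < 8; m++) {
    x[0] = (m & 1) ? hi[0] : lo[0];      // bit 0 selects x lo/hi
    x[1] = (m & 2) ? hi[1] : lo[1];      // bit 1 selects y lo/hi
    x[2] = (m & 4) ? hi[2] : lo[2];      // bit 2 selects z lo/hi
    domain->lamda2x(x,x);
    for (int k = 0; k < 3; k++) {
      bboxlo[k] = MIN(bboxlo[k],x[k]);
      bboxhi[k] = MAX(bboxhi[k],x[k]);
    }
  }
}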
/* ----------------------------------------------------------------------
compute 8 corner pts of my triclinic sub-box
output is in corners, see ordering in lamda_box_corners
------------------------------------------------------------------------- */
void Domain::box_corners()
{
lamda_box_corners(boxlo_lamda,boxhi_lamda);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of my triclinic sub-box
output is in corners, see ordering in lamda_box_corners
------------------------------------------------------------------------- */
void Domain::subbox_corners()
{
lamda_box_corners(sublo_lamda,subhi_lamda);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of any triclinic box with lo/hi in lamda coords
8 output corners are ordered with x changing fastest, then y, finally z
could be more efficient if just coded with xy,yz,xz explicitly
------------------------------------------------------------------------- */
void Domain::lamda_box_corners(double *lo, double *hi)
{
corners[0][0] = lo[0]; corners[0][1] = lo[1]; corners[0][2] = lo[2];
lamda2x(corners[0],corners[0]);
corners[1][0] = hi[0]; corners[1][1] = lo[1]; corners[1][2] = lo[2];
lamda2x(corners[1],corners[1]);
corners[2][0] = lo[0]; corners[2][1] = hi[1]; corners[2][2] = lo[2];
lamda2x(corners[2],corners[2]);
corners[3][0] = hi[0]; corners[3][1] = hi[1]; corners[3][2] = lo[2];
lamda2x(corners[3],corners[3]);
corners[4][0] = lo[0]; corners[4][1] = lo[1]; corners[4][2] = hi[2];
lamda2x(corners[4],corners[4]);
corners[5][0] = hi[0]; corners[5][1] = lo[1]; corners[5][2] = hi[2];
lamda2x(corners[5],corners[5]);
corners[6][0] = lo[0]; corners[6][1] = hi[1]; corners[6][2] = hi[2];
lamda2x(corners[6],corners[6]);
corners[7][0] = hi[0]; corners[7][1] = hi[1]; corners[7][2] = hi[2];
lamda2x(corners[7],corners[7]);
}
diff --git a/src/domain.h b/src/domain.h
index b8bf1657c..22e319123 100644
--- a/src/domain.h
+++ b/src/domain.h
@@ -1,280 +1,281 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_DOMAIN_H
#define LMP_DOMAIN_H
#include <math.h>
#include "pointers.h"
#include <map>
#include <string>
namespace LAMMPS_NS {
class Domain : protected Pointers {
public:
int box_exist; // 0 = not yet created, 1 = exists
int dimension; // 2 = 2d, 3 = 3d
int nonperiodic; // 0 = periodic in all 3 dims
// 1 = periodic or fixed in all 6
// 2 = shrink-wrap in any of 6
int xperiodic,yperiodic,zperiodic; // 0 = non-periodic, 1 = periodic
int periodicity[3]; // xyz periodicity as array
int boundary[3][2]; // settings for 6 boundaries
// 0 = periodic
// 1 = fixed non-periodic
// 2 = shrink-wrap non-periodic
// 3 = shrink-wrap non-per w/ min
int triclinic; // 0 = orthog box, 1 = triclinic
int tiltsmall; // 1 if limit tilt, else 0
// orthogonal box
double xprd,yprd,zprd; // global box dimensions
double xprd_half,yprd_half,zprd_half; // half dimensions
double prd[3]; // array form of dimensions
double prd_half[3]; // array form of half dimensions
// triclinic box
// xprd,xprd_half,prd,prd_half =
// same as if untilted
double prd_lamda[3]; // lamda box = (1,1,1)
double prd_half_lamda[3]; // lamda half box = (0.5,0.5,0.5)
double boxlo[3],boxhi[3]; // orthogonal box global bounds
// triclinic box
// boxlo/hi = same as if untilted
double boxlo_lamda[3],boxhi_lamda[3]; // lamda box = (0,1)
double boxlo_bound[3],boxhi_bound[3]; // bounding box of tilted domain
double corners[8][3]; // 8 corner points
// orthogonal box & triclinic box
double minxlo,minxhi; // minimum size of global box
double minylo,minyhi; // when shrink-wrapping
double minzlo,minzhi; // tri only possible for non-skew dims
// orthogonal box
double sublo[3],subhi[3]; // sub-box bounds on this proc
// triclinic box
// sublo/hi = undefined
double sublo_lamda[3],subhi_lamda[3]; // bounds of subbox in lamda
// triclinic box
double xy,xz,yz; // 3 tilt factors
double h[6],h_inv[6]; // shape matrix in Voigt notation
double h_rate[6],h_ratelo[3]; // rate of box size/shape change
int box_change; // 1 if any of next 3 flags are set, else 0
int box_change_size; // 1 if box size changes, 0 if not
int box_change_shape; // 1 if box shape changes, 0 if not
int box_change_domain; // 1 if proc sub-domains change, 0 if not
int deform_flag; // 1 if fix deform exist, else 0
int deform_vremap; // 1 if fix deform remaps v, else 0
int deform_groupbit; // atom group to perform v remap for
class Lattice *lattice; // user-defined lattice
int nregion; // # of defined Regions
int maxregion; // max # list can hold
class Region **regions; // list of defined Regions
int copymode;
typedef Region *(*RegionCreator)(LAMMPS *,int,char**);
typedef std::map<std::string,RegionCreator> RegionCreatorMap;
RegionCreatorMap *region_map;
Domain(class LAMMPS *);
virtual ~Domain();
virtual void init();
void set_initial_box(int expandflag=1);
virtual void set_global_box();
virtual void set_lamda_box();
virtual void set_local_box();
virtual void reset_box();
virtual void pbc();
void image_check();
void box_too_small_check();
void subbox_too_small_check(double);
void minimum_image(double &, double &, double &);
void minimum_image(double *);
int closest_image(int, int);
+ int closest_image(double *, int);
void closest_image(const double * const, const double * const,
double * const);
void remap(double *, imageint &);
void remap(double *);
void remap_near(double *, double *);
void unmap(double *, imageint);
void unmap(const double *, imageint, double *);
void image_flip(int, int, int);
int ownatom(int, double *, imageint *, int);
void set_lattice(int, char **);
void add_region(int, char **);
void delete_region(int, char **);
int find_region(char *);
void set_boundary(int, char **, int);
void set_box(int, char **);
void print_box(const char *);
void boundary_string(char *);
virtual void lamda2x(int);
virtual void x2lamda(int);
virtual void lamda2x(double *, double *);
virtual void x2lamda(double *, double *);
int inside(double *);
int inside_nonperiodic(double *);
void x2lamda(double *, double *, double *, double *);
void bbox(double *, double *, double *, double *);
void box_corners();
void subbox_corners();
void lamda_box_corners(double *, double *);
// minimum image convention check
// return 1 if any distance > 1/2 of box size
// indicates a special neighbor is actually not in a bond,
// but is a far-away image that should be treated as an unbonded neighbor
// inline since called from neighbor build inner loop
inline int minimum_image_check(double dx, double dy, double dz) {
if (xperiodic && fabs(dx) > xprd_half) return 1;
if (yperiodic && fabs(dy) > yprd_half) return 1;
if (zperiodic && fabs(dz) > zprd_half) return 1;
return 0;
}
protected:
double small[3]; // fractions of box lengths
private:
template <typename T> static Region *region_creator(LAMMPS *,int,char**);
};
}
#endif
/* ERROR/WARNING messages:
E: Box bounds are invalid
The box boundaries specified in the read_data file are invalid. The
lo value must be less than the hi value for all 3 dimensions.
E: Cannot skew triclinic box in z for 2d simulation
Self-explanatory.
E: Triclinic box skew is too large
The displacement in a skewed direction must be less than half the box
length in that dimension. E.g. the xy tilt must be between -half and
+half of the x box length. This constraint can be relaxed by using
the box tilt command.
W: Triclinic box skew is large
The displacement in a skewed direction is normally required to be less
than half the box length in that dimension. E.g. the xy tilt must be
between -half and +half of the x box length. You have relaxed the
constraint using the box tilt command, but the warning means that a
LAMMPS simulation may be inefficient as a result.
E: Illegal simulation box
The lower bound of the simulation box is greater than the upper bound.
E: Bond atom missing in image check
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away.
W: Inconsistent image flags
The image flags for a pair of bonded atoms appear to be inconsistent.
Inconsistent means that when the coordinates of the two atoms are
unwrapped using the image flags, the two atoms are far apart.
Specifically they are further apart than half a periodic box length.
Or they are more than a box length apart in a non-periodic dimension.
This is usually due to the initial data file not having correct image
flags for the 2 atoms in a bond that straddles a periodic boundary.
They should be different by 1 in that case. This is a warning because
inconsistent image flags will not cause problems for dynamics or most
LAMMPS simulations. However they can cause problems when such atoms
are used with the fix rigid or replicate commands.
W: Bond atom missing in image check
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away.
E: Bond atom missing in box size check
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away.
W: Bond atom missing in box size check
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away.
W: Bond/angle/dihedral extent > half of periodic box length
This is a restriction because LAMMPS can be confused about which image
of an atom in the bonded interaction is the correct one to use.
"Extent" in this context means the maximum end-to-end length of the
bond/angle/dihedral. LAMMPS computes this by taking the maximum bond
length, multiplying by the number of bonds in the interaction (e.g. 3
for a dihedral) and adding a small amount of stretch.
W: Proc sub-domain size < neighbor skin, could lead to lost atoms
The decomposition of the physical domain (likely due to load
balancing) has led to a processor's sub-domain being smaller than the
neighbor skin in one or more dimensions. Since reneighboring is
triggered by atoms moving the skin distance, this may lead to lost
atoms, if an atom moves all the way across a neighboring processor's
sub-domain before reneighboring is triggered.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Reuse of region ID
A region ID cannot be used twice.
E: Unknown region style
The choice of region style is unknown.
E: Delete region ID does not exist
Self-explanatory.
E: Both sides of boundary must be periodic
Cannot specify a boundary as periodic only on the lo or hi side. Must
be periodic on both sides.
*/
diff --git a/src/error.cpp b/src/error.cpp
index 5c24d9483..0969507fc 100644
--- a/src/error.cpp
+++ b/src/error.cpp
@@ -1,244 +1,252 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>
#include "error.h"
#include "universe.h"
#include "output.h"
+#include "input.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
Error::Error(LAMMPS *lmp) : Pointers(lmp) {
#ifdef LAMMPS_EXCEPTIONS
last_error_message = NULL;
last_error_type = ERROR_NONE;
#endif
}
/* ----------------------------------------------------------------------
called by all procs in universe
close all output, screen, and log files in world and universe
no abort, so ensure all procs in universe call, else will hang
------------------------------------------------------------------------- */
void Error::universe_all(const char *file, int line, const char *str)
{
MPI_Barrier(universe->uworld);
if (universe->me == 0) {
if (universe->uscreen) fprintf(universe->uscreen,
"ERROR: %s (%s:%d)\n",str,file,line);
if (universe->ulogfile) fprintf(universe->ulogfile,
"ERROR: %s (%s:%d)\n",str,file,line);
}
if (output) delete output;
if (universe->nworlds > 1) {
if (screen && screen != stdout) fclose(screen);
if (logfile) fclose(logfile);
}
if (universe->ulogfile) fclose(universe->ulogfile);
#ifdef LAMMPS_EXCEPTIONS
char msg[100];
sprintf(msg, "ERROR: %s (%s:%d)\n", str, file, line);
throw LAMMPSException(msg);
#else
MPI_Finalize();
exit(1);
#endif
}
/* ----------------------------------------------------------------------
called by one proc in universe
forces abort of entire universe if any proc in universe calls
------------------------------------------------------------------------- */
void Error::universe_one(const char *file, int line, const char *str)
{
if (universe->uscreen)
fprintf(universe->uscreen,"ERROR on proc %d: %s (%s:%d)\n",
universe->me,str,file,line);
#ifdef LAMMPS_EXCEPTIONS
char msg[100];
sprintf(msg, "ERROR: %s (%s:%d)\n", str, file, line);
throw LAMMPSAbortException(msg, universe->uworld);
#else
MPI_Abort(universe->uworld,1);
#endif
}
/* ----------------------------------------------------------------------
called by one proc in universe
prints a warning message to the screen
------------------------------------------------------------------------- */
void Error::universe_warn(const char *file, int line, const char *str)
{
if (universe->uscreen)
fprintf(universe->uscreen,"WARNING on proc %d: %s (%s:%d)\n",
universe->me,str,file,line);
}
/* ----------------------------------------------------------------------
called by all procs in one world
close all output, screen, and log files in world
ensure all procs in world call, else will hang
force MPI_Abort if running in multi-partition mode
------------------------------------------------------------------------- */
void Error::all(const char *file, int line, const char *str)
{
MPI_Barrier(world);
int me;
+ const char *lastcmd = (const char*)"(unknown)";
+
MPI_Comm_rank(world,&me);
if (me == 0) {
- if (screen) fprintf(screen,"ERROR: %s (%s:%d)\n",str,file,line);
- if (logfile) fprintf(logfile,"ERROR: %s (%s:%d)\n",str,file,line);
+ if (input && input->line) lastcmd = input->line;
+ if (screen) fprintf(screen,"ERROR: %s (%s:%d)\n"
+ "Last command: %s\n",
+ str,file,line,lastcmd);
+ if (logfile) fprintf(logfile,"ERROR: %s (%s:%d)\n"
+ "Last command: %s\n",
+ str,file,line,lastcmd);
}
#ifdef LAMMPS_EXCEPTIONS
char msg[100];
sprintf(msg, "ERROR: %s (%s:%d)\n", str, file, line);
if (universe->nworlds > 1) {
throw LAMMPSAbortException(msg, universe->uworld);
}
throw LAMMPSException(msg);
#else
if (output) delete output;
if (screen && screen != stdout) fclose(screen);
if (logfile) fclose(logfile);
if (universe->nworlds > 1) MPI_Abort(universe->uworld,1);
MPI_Finalize();
exit(1);
#endif
}
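// Illustration of the change above (not part of the patch): a failing command
// now echoes the offending input line.  Hypothetical example, assuming a
// malformed fix line in the input script:
//
//   ERROR: Illegal fix nve command (../fix_nve.cpp:43)
//   Last command: fix 1 all nve extra
//
// If no input line has been read yet (e.g. an error during setup), lastcmd
// remains "(unknown)".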
/* ----------------------------------------------------------------------
called by one proc in world
write to world screen only if non-NULL on this proc
always write to universe screen
forces abort of entire world (and universe) if any proc in world calls
------------------------------------------------------------------------- */
void Error::one(const char *file, int line, const char *str)
{
int me;
MPI_Comm_rank(world,&me);
if (screen) fprintf(screen,"ERROR on proc %d: %s (%s:%d)\n",
me,str,file,line);
if (universe->nworlds > 1)
if (universe->uscreen)
fprintf(universe->uscreen,"ERROR on proc %d: %s (%s:%d)\n",
universe->me,str,file,line);
#ifdef LAMMPS_EXCEPTIONS
char msg[100];
sprintf(msg, "ERROR on proc %d: %s (%s:%d)\n", me, str, file, line);
throw LAMMPSAbortException(msg, world);
#else
MPI_Abort(world,1);
#endif
}
/* ----------------------------------------------------------------------
called by one proc in world
only write to screen if non-NULL on this proc, since it could be a file
------------------------------------------------------------------------- */
void Error::warning(const char *file, int line, const char *str, int logflag)
{
if (screen) fprintf(screen,"WARNING: %s (%s:%d)\n",str,file,line);
if (logflag && logfile) fprintf(logfile,"WARNING: %s (%s:%d)\n",
str,file,line);
}
/* ----------------------------------------------------------------------
called by one proc in world, typically proc 0
write message to screen and logfile (if logflag is set)
------------------------------------------------------------------------- */
void Error::message(const char *file, int line, const char *str, int logflag)
{
if (screen) fprintf(screen,"%s (%s:%d)\n",str,file,line);
if (logflag && logfile) fprintf(logfile,"%s (%s:%d)\n",str,file,line);
}
/* ----------------------------------------------------------------------
shutdown LAMMPS
called by all procs in one world
close all output, screen, and log files in world
no abort, so ensure all procs in world call, else will hang
------------------------------------------------------------------------- */
void Error::done(int status)
{
MPI_Barrier(world);
if (output) delete output;
if (screen && screen != stdout) fclose(screen);
if (logfile) fclose(logfile);
MPI_Finalize();
exit(status);
}
#ifdef LAMMPS_EXCEPTIONS
/* ----------------------------------------------------------------------
return the last error message reported by LAMMPS (only used if
compiled with -DLAMMPS_EXCEPTIONS)
------------------------------------------------------------------------- */
char * Error::get_last_error() const
{
return last_error_message;
}
/* ----------------------------------------------------------------------
return the type of the last error reported by LAMMPS (only used if
compiled with -DLAMMPS_EXCEPTIONS)
------------------------------------------------------------------------- */
ErrorType Error::get_last_error_type() const
{
return last_error_type;
}
/* ----------------------------------------------------------------------
set the last error message and error type
(only used if compiled with -DLAMMPS_EXCEPTIONS)
------------------------------------------------------------------------- */
void Error::set_last_error(const char * msg, ErrorType type)
{
delete [] last_error_message;
if(msg) {
last_error_message = new char[strlen(msg)+1];
strcpy(last_error_message, msg);
} else {
last_error_message = NULL;
}
last_error_type = type;
}
#endif
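// Minimal usage sketch (an assumption, not part of this file): how a host code
// built with -DLAMMPS_EXCEPTIONS might record and later query an error.  The
// e.message member and the ERROR_NORMAL enum value are assumed names here.
//
//   try {
//     lammps->input->one("pair_coeff 1 1");        // may throw LAMMPSException
//   } catch (LAMMPSException &e) {
//     lammps->error->set_last_error(e.message.c_str(), ERROR_NORMAL);
//   }
//   if (lammps->error->get_last_error())
//     fprintf(stderr, "LAMMPS error: %s\n", lammps->error->get_last_error());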
diff --git a/src/finish.cpp b/src/finish.cpp
index 9222d4e86..015ac46ca 100644
--- a/src/finish.cpp
+++ b/src/finish.cpp
@@ -1,978 +1,978 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "finish.h"
#include "timer.h"
#include "universe.h"
#include "accelerator_kokkos.h"
#include "accelerator_omp.h"
#include "atom.h"
#include "atom_vec.h"
#include "molecule.h"
#include "comm.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "min.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "output.h"
#include "memory.h"
#include "error.h"
#ifdef LMP_USER_OMP
#include "modify.h"
#include "fix_omp.h"
#include "thr_data.h"
#endif
using namespace LAMMPS_NS;
// local function prototypes, code at end of file
static void mpi_timings(const char *label, Timer *t, enum Timer::ttype tt,
MPI_Comm world, const int nprocs, const int nthreads,
const int me, double time_loop, FILE *scr, FILE *log);
#ifdef LMP_USER_OMP
static void omp_times(FixOMP *fix, const char *label, enum Timer::ttype which,
const int nthreads,FILE *scr, FILE *log);
#endif
/* ---------------------------------------------------------------------- */
Finish::Finish(LAMMPS *lmp) : Pointers(lmp) {}
/* ---------------------------------------------------------------------- */
void Finish::end(int flag)
{
int i,m,nneigh,nneighfull;
int histo[10];
int minflag,prdflag,tadflag,hyperflag;
int timeflag,fftflag,histoflag,neighflag;
double time,tmp,ave,max,min;
double time_loop,time_other,cpu_loop;
int me,nprocs;
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
const int nthreads = comm->nthreads;
// recompute natoms in case atoms have been lost
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
// choose flavors of statistical output
// flag determines caller
// flag = 0 = just loop summary
// flag = 1 = dynamics or minimization
// flag = 2 = PRD
// flag = 3 = TAD
// flag = 4 = HYPER
// turn off neighflag for Kspace partition of verlet/split integrator
minflag = prdflag = tadflag = hyperflag = 0;
timeflag = fftflag = histoflag = neighflag = 0;
time_loop = cpu_loop = time_other = 0.0;
if (flag == 1) {
if (update->whichflag == 2) minflag = 1;
timeflag = histoflag = 1;
neighflag = 1;
if (update->whichflag == 1 &&
strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld == 1) neighflag = 0;
if (force->kspace && force->kspace_match("pppm",0)
&& force->kspace->fftbench) fftflag = 1;
}
if (flag == 2) prdflag = timeflag = histoflag = neighflag = 1;
if (flag == 3) tadflag = histoflag = neighflag = 1;
if (flag == 4) hyperflag = timeflag = histoflag = neighflag = 1;
// loop stats
if (timer->has_loop()) {
// overall loop time
time_loop = timer->get_wall(Timer::TOTAL);
cpu_loop = timer->get_cpu(Timer::TOTAL);
MPI_Allreduce(&time_loop,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time_loop = tmp/nprocs;
MPI_Allreduce(&cpu_loop,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
cpu_loop = tmp/nprocs;
if (time_loop > 0.0) cpu_loop = cpu_loop/time_loop*100.0;
if (me == 0) {
int ntasks = nprocs * nthreads;
const char fmt1[] = "Loop time of %g on %d procs "
"for %d steps with " BIGINT_FORMAT " atoms\n\n";
if (screen) fprintf(screen,fmt1,time_loop,ntasks,update->nsteps,
atom->natoms);
if (logfile) fprintf(logfile,fmt1,time_loop,ntasks,update->nsteps,
atom->natoms);
// Gromacs/NAMD-style performance metric for suitable unit settings
if ( timeflag && !minflag && !prdflag && !tadflag &&
(update->nsteps > 0) && (update->dt != 0.0) &&
((strcmp(update->unit_style,"lj") == 0) ||
(strcmp(update->unit_style,"metal") == 0) ||
(strcmp(update->unit_style,"micro") == 0) ||
(strcmp(update->unit_style,"nano") == 0) ||
(strcmp(update->unit_style,"electron") == 0) ||
(strcmp(update->unit_style,"real") == 0)) ) {
double one_fs = force->femtosecond;
double t_step = ((double) time_loop) / ((double) update->nsteps);
double step_t = 1.0/t_step;
if (strcmp(update->unit_style,"lj") == 0) {
double tau_day = 24.0*3600.0 / t_step * update->dt / one_fs;
const char perf[] = "Performance: %.3f tau/day, %.3f timesteps/s\n";
if (screen) fprintf(screen,perf,tau_day,step_t);
if (logfile) fprintf(logfile,perf,tau_day,step_t);
} else {
double hrs_ns = t_step / update->dt * 1000000.0 * one_fs / 3600.0;
double ns_day = 24.0*3600.0 / t_step * update->dt / one_fs/1000000.0;
const char perf[] =
"Performance: %.3f ns/day, %.3f hours/ns, %.3f timesteps/s\n";
if (screen) fprintf(screen,perf,ns_day,hrs_ns,step_t);
if (logfile) fprintf(logfile,perf,ns_day,hrs_ns,step_t);
}
}
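// Worked example of the metric above (illustrative numbers): in "real" units
// with dt = 2 fs and t_step = 0.005 s/step,
//   step_t = 1/0.005               = 200 timesteps/s
//   hrs_ns = 0.005/2 * 1e6 / 3600  ~= 0.694 hours/ns
//   ns_day = 86400/0.005 * 2 / 1e6 = 34.56 ns/day
// and 0.694 hours/ns * 34.56 ns/day recovers 24 hours/day as a consistency check.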
// CPU use on MPI tasks and OpenMP threads
if (lmp->kokkos) {
const char fmt2[] =
"%.1f%% CPU use with %d MPI tasks x %d OpenMP threads\n";
if (screen) fprintf(screen,fmt2,cpu_loop,nprocs,
lmp->kokkos->num_threads);
if (logfile) fprintf(logfile,fmt2,cpu_loop,nprocs,
lmp->kokkos->num_threads);
} else {
#if defined(_OPENMP)
const char fmt2[] =
"%.1f%% CPU use with %d MPI tasks x %d OpenMP threads\n";
if (screen) fprintf(screen,fmt2,cpu_loop,nprocs,nthreads);
if (logfile) fprintf(logfile,fmt2,cpu_loop,nprocs,nthreads);
#else
const char fmt2[] =
"%.1f%% CPU use with %d MPI tasks x no OpenMP threads\n";
if (screen) fprintf(screen,fmt2,cpu_loop,nprocs);
if (logfile) fprintf(logfile,fmt2,cpu_loop,nprocs);
#endif
}
}
}
// avoid division by zero for very short runs
if (time_loop == 0.0) time_loop = 1.0;
if (cpu_loop == 0.0) cpu_loop = 100.0;
// get "Other" wall time for later use
if (timer->has_normal())
time_other = timer->get_wall(Timer::TOTAL) - timer->get_wall(Timer::ALL);
// minimization stats
if (minflag) {
if (me == 0) {
if (screen) fprintf(screen,"\n");
if (logfile) fprintf(logfile,"\n");
}
if (me == 0) {
if (screen) {
fprintf(screen,"Minimization stats:\n");
fprintf(screen," Stopping criterion = %s\n",
update->minimize->stopstr);
fprintf(screen," Energy initial, next-to-last, final = \n"
" %18.12g %18.12g %18.12g\n",
update->minimize->einitial,update->minimize->eprevious,
update->minimize->efinal);
fprintf(screen," Force two-norm initial, final = %g %g\n",
update->minimize->fnorm2_init,update->minimize->fnorm2_final);
fprintf(screen," Force max component initial, final = %g %g\n",
update->minimize->fnorminf_init,
update->minimize->fnorminf_final);
fprintf(screen," Final line search alpha, max atom move = %g %g\n",
update->minimize->alpha_final,
update->minimize->alpha_final*
update->minimize->fnorminf_final);
fprintf(screen," Iterations, force evaluations = %d %d\n",
update->minimize->niter,update->minimize->neval);
}
if (logfile) {
fprintf(logfile,"Minimization stats:\n");
fprintf(logfile," Stopping criterion = %s\n",
update->minimize->stopstr);
fprintf(logfile," Energy initial, next-to-last, final = \n"
" %18.12g %18.12g %18.12g\n",
update->minimize->einitial,update->minimize->eprevious,
update->minimize->efinal);
fprintf(logfile," Force two-norm initial, final = %g %g\n",
update->minimize->fnorm2_init,update->minimize->fnorm2_final);
fprintf(logfile," Force max component initial, final = %g %g\n",
update->minimize->fnorminf_init,
update->minimize->fnorminf_final);
fprintf(logfile," Final line search alpha, max atom move = %g %g\n",
update->minimize->alpha_final,
update->minimize->alpha_final*
update->minimize->fnorminf_final);
fprintf(logfile," Iterations, force evaluations = %d %d\n",
update->minimize->niter,update->minimize->neval);
}
}
}
- // PRD stats using PAIR,BOND,KSPACE for dephase,dynamics,quench
+ // PRD stats
if (prdflag) {
if (me == 0) {
if (screen) fprintf(screen,"\nPRD stats:\n");
if (logfile) fprintf(logfile,"\nPRD stats:\n");
}
time = timer->get_wall(Timer::DEPHASE);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Dephase time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Dephase time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::DYNAMICS);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Dynamics time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Dynamics time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::QUENCH);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Quench time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Quench time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::REPCOMM);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Comm time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Comm time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::REPOUT);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Output time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Output time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = time_other;
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) { // XXXX: replica comm, replica output
if (screen)
fprintf(screen," Other time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Other time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
}
- // TAD stats using PAIR,BOND,KSPACE for neb,dynamics,quench
+ // TAD stats
if (tadflag) {
if (me == 0) {
if (screen) fprintf(screen,"\n");
if (logfile) fprintf(logfile,"\n");
}
if (screen) fprintf(screen,"TAD stats:\n");
if (logfile) fprintf(logfile,"TAD stats:\n");
time = timer->get_wall(Timer::NEB);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," NEB time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," NEB time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::DYNAMICS);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Dynamics time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Dynamics time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::QUENCH);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Quench time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Quench time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::REPCOMM);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Comm time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Comm time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::REPOUT);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Output time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Output time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = time_other;
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Other time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Other time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
}
- // HYPER stats using PAIR,BOND,KSPACE for dynamics,quench
+ // HYPER stats
if (hyperflag) {
if (me == 0) {
if (screen) fprintf(screen,"\nHyper stats:\n");
if (logfile) fprintf(logfile,"\nHyper stats:\n");
}
time = timer->get_wall(Timer::DYNAMICS);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Dynamics time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Dynamics time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = timer->get_wall(Timer::QUENCH);
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Quench time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Quench time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
time = time_other;
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
if (me == 0) {
if (screen)
fprintf(screen," Other time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
if (logfile)
fprintf(logfile," Other time (%%) = %g (%g)\n",
time,time/time_loop*100.0);
}
}
// further timing breakdowns
if (timeflag && timer->has_normal()) {
if (timer->has_full()) {
const char hdr[] = "\nMPI task timing breakdown:\n"
"Section | min time | avg time | max time "
"|%varavg| %CPU | %total\n"
"-----------------------------------------------"
"------------------------\n";
if (me == 0) {
if (screen) fputs(hdr,screen);
if (logfile) fputs(hdr,logfile);
}
} else {
const char hdr[] = "\nMPI task timing breakdown:\n"
"Section | min time | avg time | max time |%varavg| %total\n"
"---------------------------------------------------------------\n";
if (me == 0) {
if (screen) fputs(hdr,screen);
if (logfile) fputs(hdr,logfile);
}
}
mpi_timings("Pair",timer,Timer::PAIR, world,nprocs,
nthreads,me,time_loop,screen,logfile);
if (atom->molecular)
mpi_timings("Bond",timer,Timer::BOND,world,nprocs,
nthreads,me,time_loop,screen,logfile);
if (force->kspace)
mpi_timings("Kspace",timer,Timer::KSPACE,world,nprocs,
nthreads,me,time_loop,screen,logfile);
mpi_timings("Neigh",timer,Timer::NEIGH,world,nprocs,
nthreads,me,time_loop,screen,logfile);
mpi_timings("Comm",timer,Timer::COMM,world,nprocs,
nthreads,me,time_loop,screen,logfile);
mpi_timings("Output",timer,Timer::OUTPUT,world,nprocs,
nthreads,me,time_loop,screen,logfile);
mpi_timings("Modify",timer,Timer::MODIFY,world,nprocs,
nthreads,me,time_loop,screen,logfile);
if (timer->has_sync())
mpi_timings("Sync",timer,Timer::SYNC,world,nprocs,
nthreads,me,time_loop,screen,logfile);
time = time_other;
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
const char *fmt;
if (timer->has_full())
fmt = "Other | |%- 12.4g| | | |%6.2f\n";
else
fmt = "Other | |%- 12.4g| | |%6.2f\n";
if (me == 0) {
if (screen) fprintf(screen,fmt,time,time/time_loop*100.0);
if (logfile) fprintf(logfile,fmt,time,time/time_loop*100.0);
}
}
#ifdef LMP_USER_OMP
const char thr_hdr_fmt[] =
"\nThread timing breakdown (MPI rank %d):\nTotal threaded time %.4g / %.1f%%\n";
const char thr_header[] =
"Section | min time | avg time | max time |%varavg| %total\n"
"---------------------------------------------------------------\n";
int ifix = modify->find_fix("package_omp");
// print thread breakdown only with full timer detail
if ((ifix >= 0) && timer->has_full() && me == 0) {
double thr_total = 0.0;
ThrData *td;
FixOMP *fixomp = static_cast<FixOMP *>(lmp->modify->fix[ifix]);
for (i=0; i < nthreads; ++i) {
td = fixomp->get_thr(i);
thr_total += td->get_time(Timer::ALL);
}
thr_total /= (double) nthreads;
if (thr_total > 0.0) {
if (screen) {
fprintf(screen,thr_hdr_fmt,me,thr_total,thr_total/time_loop*100.0);
fputs(thr_header,screen);
}
if (logfile) {
fprintf(logfile,thr_hdr_fmt,me,thr_total,thr_total/time_loop*100.0);
fputs(thr_header,logfile);
}
omp_times(fixomp,"Pair",Timer::PAIR,nthreads,screen,logfile);
if (atom->molecular)
omp_times(fixomp,"Bond",Timer::BOND,nthreads,screen,logfile);
if (force->kspace)
omp_times(fixomp,"Kspace",Timer::KSPACE,nthreads,screen,logfile);
omp_times(fixomp,"Neigh",Timer::NEIGH,nthreads,screen,logfile);
omp_times(fixomp,"Reduce",Timer::COMM,nthreads,screen,logfile);
}
}
#endif
if (lmp->kokkos && lmp->kokkos->ngpu > 0)
if (const char* env_clb = getenv("CUDA_LAUNCH_BLOCKING"))
if (!(strcmp(env_clb,"1") == 0)) {
error->warning(FLERR,"Timing breakdown may not be accurate "
"since GPU/CPU overlap is enabled\n"
"Using 'export CUDA_LAUNCH_BLOCKING=1' will give an "
"accurate timing breakdown but will reduce performance");
}
// FFT timing statistics
// time3d,time1d = total time during run for 3d and 1d FFTs
// loop on timing() until nsample FFTs require at least 1.0 CPU sec
// time_kspace may be 0.0 if another partition is doing Kspace
if (fftflag) {
if (me == 0) {
if (screen) fprintf(screen,"\n");
if (logfile) fprintf(logfile,"\n");
}
int nsteps = update->nsteps;
double time3d;
int nsample = 1;
int nfft = force->kspace->timing_3d(nsample,time3d);
while (time3d < 1.0) {
nsample *= 2;
nfft = force->kspace->timing_3d(nsample,time3d);
}
time3d = nsteps * time3d / nsample;
MPI_Allreduce(&time3d,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time3d = tmp/nprocs;
double time1d;
nsample = 1;
nfft = force->kspace->timing_1d(nsample,time1d);
while (time1d < 1.0) {
nsample *= 2;
nfft = force->kspace->timing_1d(nsample,time1d);
}
time1d = nsteps * time1d / nsample;
MPI_Allreduce(&time1d,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time1d = tmp/nprocs;
double time_kspace = timer->get_wall(Timer::KSPACE);
MPI_Allreduce(&time_kspace,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time_kspace = tmp/nprocs;
double ntotal = 1.0 * force->kspace->nx_pppm *
force->kspace->ny_pppm * force->kspace->nz_pppm;
double nflops = 5.0 * ntotal * log(ntotal);
double fraction,flop3,flop1;
if (nsteps) {
if (time_kspace) fraction = time3d/time_kspace*100.0;
else fraction = 0.0;
flop3 = nfft*nflops/1.0e9/(time3d/nsteps);
flop1 = nfft*nflops/1.0e9/(time1d/nsteps);
} else fraction = flop3 = flop1 = 0.0;
if (me == 0) {
if (screen) {
fprintf(screen,"FFT time (%% of Kspce) = %g (%g)\n",time3d,fraction);
fprintf(screen,"FFT Gflps 3d (1d only) = %g %g\n",flop3,flop1);
}
if (logfile) {
fprintf(logfile,"FFT time (%% of Kspce) = %g (%g)\n",time3d,fraction);
fprintf(logfile,"FFT Gflps 3d (1d only) = %g %g\n",flop3,flop1);
}
}
}
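// Note on the estimate above (illustrative): nflops = 5*N*log(N) uses the
// natural log from <math.h>.  For a 32x32x32 PPPM grid, ntotal = 32768 and
// nflops ~= 5 * 32768 * 10.4 ~= 1.7e6 flops per 3d FFT; the Gflop rates then
// divide nfft*nflops by the measured per-step FFT time.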
if (histoflag) {
if (me == 0) {
if (screen) fprintf(screen,"\n");
if (logfile) fprintf(logfile,"\n");
}
tmp = atom->nlocal;
stats(1,&tmp,&ave,&max,&min,10,histo);
if (me == 0) {
if (screen) {
fprintf(screen,"Nlocal: %g ave %g max %g min\n",ave,max,min);
fprintf(screen,"Histogram:");
for (i = 0; i < 10; i++) fprintf(screen," %d",histo[i]);
fprintf(screen,"\n");
}
if (logfile) {
fprintf(logfile,"Nlocal: %g ave %g max %g min\n",ave,max,min);
fprintf(logfile,"Histogram:");
for (i = 0; i < 10; i++) fprintf(logfile," %d",histo[i]);
fprintf(logfile,"\n");
}
}
tmp = atom->nghost;
stats(1,&tmp,&ave,&max,&min,10,histo);
if (me == 0) {
if (screen) {
fprintf(screen,"Nghost: %g ave %g max %g min\n",ave,max,min);
fprintf(screen,"Histogram:");
for (i = 0; i < 10; i++) fprintf(screen," %d",histo[i]);
fprintf(screen,"\n");
}
if (logfile) {
fprintf(logfile,"Nghost: %g ave %g max %g min\n",ave,max,min);
fprintf(logfile,"Histogram:");
for (i = 0; i < 10; i++) fprintf(logfile," %d",histo[i]);
fprintf(logfile,"\n");
}
}
// find a non-skip neighbor list containing half pairwise interactions
// count neighbors in that list for stats purposes
// allow it to be Kokkos neigh list as well
for (m = 0; m < neighbor->old_nrequest; m++)
if ((neighbor->old_requests[m]->half ||
neighbor->old_requests[m]->gran ||
neighbor->old_requests[m]->respaouter ||
neighbor->old_requests[m]->half_from_full) &&
neighbor->old_requests[m]->skip == 0 &&
neighbor->lists[m] && neighbor->lists[m]->numneigh) break;
nneigh = 0;
if (m < neighbor->old_nrequest) {
if (!neighbor->lists[m]->kokkos) {
int inum = neighbor->lists[m]->inum;
int *ilist = neighbor->lists[m]->ilist;
int *numneigh = neighbor->lists[m]->numneigh;
for (i = 0; i < inum; i++)
nneigh += numneigh[ilist[i]];
} else if (lmp->kokkos) nneigh = lmp->kokkos->neigh_count(m);
}
tmp = nneigh;
stats(1,&tmp,&ave,&max,&min,10,histo);
if (me == 0) {
if (screen) {
fprintf(screen,"Neighs: %g ave %g max %g min\n",ave,max,min);
fprintf(screen,"Histogram:");
for (i = 0; i < 10; i++) fprintf(screen," %d",histo[i]);
fprintf(screen,"\n");
}
if (logfile) {
fprintf(logfile,"Neighs: %g ave %g max %g min\n",ave,max,min);
fprintf(logfile,"Histogram:");
for (i = 0; i < 10; i++) fprintf(logfile," %d",histo[i]);
fprintf(logfile,"\n");
}
}
// find a non-skip neighbor list containing full pairwise interactions
// count neighbors in that list for stats purposes
// allow it to be Kokkos neigh list as well
for (m = 0; m < neighbor->old_nrequest; m++)
if (neighbor->old_requests[m]->full &&
neighbor->old_requests[m]->skip == 0) break;
nneighfull = 0;
if (m < neighbor->old_nrequest) {
if (!neighbor->lists[m]->kokkos && neighbor->lists[m]->numneigh) {
int inum = neighbor->lists[m]->inum;
int *ilist = neighbor->lists[m]->ilist;
int *numneigh = neighbor->lists[m]->numneigh;
for (i = 0; i < inum; i++)
nneighfull += numneigh[ilist[i]];
} else if (lmp->kokkos)
nneighfull = lmp->kokkos->neigh_count(m);
tmp = nneighfull;
stats(1,&tmp,&ave,&max,&min,10,histo);
if (me == 0) {
if (screen) {
fprintf(screen,"FullNghs: %g ave %g max %g min\n",ave,max,min);
fprintf(screen,"Histogram:");
for (i = 0; i < 10; i++) fprintf(screen," %d",histo[i]);
fprintf(screen,"\n");
}
if (logfile) {
fprintf(logfile,"FullNghs: %g ave %g max %g min\n",ave,max,min);
fprintf(logfile,"Histogram:");
for (i = 0; i < 10; i++) fprintf(logfile," %d",histo[i]);
fprintf(logfile,"\n");
}
}
}
}
if (neighflag) {
if (me == 0) {
if (screen) fprintf(screen,"\n");
if (logfile) fprintf(logfile,"\n");
}
tmp = MAX(nneigh,nneighfull);
double nall;
MPI_Allreduce(&tmp,&nall,1,MPI_DOUBLE,MPI_SUM,world);
int nspec;
double nspec_all = 0;
if (atom->molecular == 1) {
int **nspecial = atom->nspecial;
int nlocal = atom->nlocal;
nspec = 0;
for (i = 0; i < nlocal; i++) nspec += nspecial[i][2];
tmp = nspec;
MPI_Allreduce(&tmp,&nspec_all,1,MPI_DOUBLE,MPI_SUM,world);
} else if (atom->molecular == 2) {
Molecule **onemols = atom->avec->onemols;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
int nlocal = atom->nlocal;
int imol,iatom;
nspec = 0;
for (i = 0; i < nlocal; i++) {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
nspec += onemols[imol]->nspecial[iatom][2];
}
tmp = nspec;
MPI_Allreduce(&tmp,&nspec_all,1,MPI_DOUBLE,MPI_SUM,world);
}
if (me == 0) {
if (screen) {
if (nall < 2.0e9)
fprintf(screen,
"Total # of neighbors = %d\n",static_cast<int> (nall));
else fprintf(screen,"Total # of neighbors = %g\n",nall);
if (atom->natoms > 0)
fprintf(screen,"Ave neighs/atom = %g\n",nall/atom->natoms);
if (atom->molecular && atom->natoms > 0)
fprintf(screen,"Ave special neighs/atom = %g\n",
nspec_all/atom->natoms);
fprintf(screen,"Neighbor list builds = " BIGINT_FORMAT "\n",
neighbor->ncalls);
if (neighbor->dist_check)
fprintf(screen,"Dangerous builds = " BIGINT_FORMAT "\n",
neighbor->ndanger);
else fprintf(screen,"Dangerous builds not checked\n");
}
if (logfile) {
if (nall < 2.0e9)
fprintf(logfile,
"Total # of neighbors = %d\n",static_cast<int> (nall));
else fprintf(logfile,"Total # of neighbors = %g\n",nall);
if (atom->natoms > 0)
fprintf(logfile,"Ave neighs/atom = %g\n",nall/atom->natoms);
if (atom->molecular && atom->natoms > 0)
fprintf(logfile,"Ave special neighs/atom = %g\n",
nspec_all/atom->natoms);
fprintf(logfile,"Neighbor list builds = " BIGINT_FORMAT "\n",
neighbor->ncalls);
if (neighbor->dist_check)
fprintf(logfile,"Dangerous builds = " BIGINT_FORMAT "\n",
neighbor->ndanger);
else fprintf(logfile,"Dangerous builds not checked\n");
}
}
}
if (logfile) fflush(logfile);
}
/* ---------------------------------------------------------------------- */
void Finish::stats(int n, double *data,
double *pave, double *pmax, double *pmin,
int nhisto, int *histo)
{
int i,m;
int *histotmp;
double min = 1.0e20;
double max = -1.0e20;
double ave = 0.0;
for (i = 0; i < n; i++) {
ave += data[i];
if (data[i] < min) min = data[i];
if (data[i] > max) max = data[i];
}
int ntotal;
MPI_Allreduce(&n,&ntotal,1,MPI_INT,MPI_SUM,world);
double tmp;
MPI_Allreduce(&ave,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
ave = tmp/ntotal;
MPI_Allreduce(&min,&tmp,1,MPI_DOUBLE,MPI_MIN,world);
min = tmp;
MPI_Allreduce(&max,&tmp,1,MPI_DOUBLE,MPI_MAX,world);
max = tmp;
for (i = 0; i < nhisto; i++) histo[i] = 0;
double del = max - min;
for (i = 0; i < n; i++) {
if (del == 0.0) m = 0;
else m = static_cast<int> ((data[i]-min)/del * nhisto);
if (m > nhisto-1) m = nhisto-1;
histo[m]++;
}
memory->create(histotmp,nhisto,"finish:histotmp");
MPI_Allreduce(histo,histotmp,nhisto,MPI_INT,MPI_SUM,world);
for (i = 0; i < nhisto; i++) histo[i] = histotmp[i];
memory->destroy(histotmp);
*pave = ave;
*pmax = max;
*pmin = min;
}
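// Worked example of the binning above: with min = 10, max = 20, nhisto = 10,
// del = 10, so a value of 14 maps to bin m = int((14-10)/10 * 10) = 4, while
// the maximum value 20 maps to m = 10 and is clamped into the last bin (9).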
/* ---------------------------------------------------------------------- */
void mpi_timings(const char *label, Timer *t, enum Timer::ttype tt,
MPI_Comm world, const int nprocs, const int nthreads,
const int me, double time_loop, FILE *scr, FILE *log)
{
double tmp, time_max, time_min, time_sq;
double time = t->get_wall(tt);
double time_cpu = t->get_cpu(tt);
if (time/time_loop < 0.001) // insufficient timer resolution!
time_cpu = 1.0;
else
time_cpu = time_cpu / time;
if (time_cpu > nthreads) time_cpu = nthreads;
MPI_Allreduce(&time,&time_min,1,MPI_DOUBLE,MPI_MIN,world);
MPI_Allreduce(&time,&time_max,1,MPI_DOUBLE,MPI_MAX,world);
time_sq = time*time;
MPI_Allreduce(&time,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time = tmp/nprocs;
MPI_Allreduce(&time_sq,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time_sq = tmp/nprocs;
MPI_Allreduce(&time_cpu,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
time_cpu = tmp/nprocs*100.0;
// % variance from the average as a measure of load imbalance
- if ((time_sq/time - time) > 1.0e-10)
+ if ((time > 0.001) && ((time_sq/time - time) > 1.0e-10))
time_sq = sqrt(time_sq/time - time)*100.0;
else
time_sq = 0.0;
if (me == 0) {
tmp = time/time_loop*100.0;
if (t->has_full()) {
const char fmt[] = "%-8s|%- 12.5g|%- 12.5g|%- 12.5g|%6.1f |%6.1f |%6.2f\n";
if (scr)
fprintf(scr,fmt,label,time_min,time,time_max,time_sq,time_cpu,tmp);
if (log)
fprintf(log,fmt,label,time_min,time,time_max,time_sq,time_cpu,tmp);
time_loop = 100.0/time_loop;
} else {
const char fmt[] = "%-8s|%- 12.5g|%- 12.5g|%- 12.5g|%6.1f |%6.2f\n";
if (scr)
fprintf(scr,fmt,label,time_min,time,time_max,time_sq,tmp);
if (log)
fprintf(log,fmt,label,time_min,time,time_max,time_sq,tmp);
}
}
}
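// The %varavg column above is sqrt(<t^2>/<t> - <t>) * 100, a variance-to-mean
// measure of load imbalance.  Illustrative example: two ranks spending 4 s and
// 6 s give <t> = 5 and <t^2> = 26, so 26/5 - 5 = 0.2 and %varavg =
// sqrt(0.2)*100 ~= 44.7; perfectly balanced ranks give 0.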
/* ---------------------------------------------------------------------- */
#ifdef LMP_USER_OMP
void omp_times(FixOMP *fix, const char *label, enum Timer::ttype which,
const int nthreads,FILE *scr, FILE *log)
{
const char fmt[] = "%-8s|%- 12.5g|%- 12.5g|%- 12.5g|%6.1f |%6.2f\n";
double time_min, time_max, time_avg, time_total, time_std;
time_min = 1.0e100;
time_max = -1.0e100;
time_total = time_avg = time_std = 0.0;
for (int i=0; i < nthreads; ++i) {
ThrData *thr = fix->get_thr(i);
double tmp=thr->get_time(which);
time_min = MIN(time_min,tmp);
time_max = MAX(time_max,tmp);
time_avg += tmp;
time_std += tmp*tmp;
time_total += thr->get_time(Timer::ALL);
}
time_avg /= nthreads;
time_std /= nthreads;
time_total /= nthreads;
- if ((time_std/time_avg -time_avg) > 1.0e-10)
+ if ((time_avg > 0.001) && ((time_std/time_avg -time_avg) > 1.0e-10))
time_std = sqrt(time_std/time_avg - time_avg)*100.0;
else
time_std = 0.0;
if (scr) fprintf(scr,fmt,label,time_min,time_avg,time_max,time_std,
time_avg/time_total*100.0);
if (log) fprintf(log,fmt,label,time_min,time_avg,time_max,time_std,
time_avg/time_total*100.0);
}
#endif
diff --git a/src/input.cpp b/src/input.cpp
index 258b4d7dd..76aba3d87 100644
--- a/src/input.cpp
+++ b/src/input.cpp
@@ -1,1987 +1,1987 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>
#include <unistd.h>
#include "sys/stat.h"
#include "input.h"
#include "style_command.h"
#include "universe.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "comm_brick.h"
#include "comm_tiled.h"
#include "accelerator_kokkos.h"
#include "group.h"
#include "domain.h"
#include "output.h"
#include "thermo.h"
#include "force.h"
#include "pair.h"
#include "min.h"
#include "modify.h"
#include "compute.h"
#include "fix.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "update.h"
#include "neighbor.h"
#include "special.h"
#include "timer.h"
#include "variable.h"
#include "accelerator_kokkos.h"
#include "error.h"
#include "memory.h"
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _WIN32
#include <direct.h>
#endif
using namespace LAMMPS_NS;
#define DELTALINE 256
#define DELTA 4
/* ---------------------------------------------------------------------- */
Input::Input(LAMMPS *lmp, int argc, char **argv) : Pointers(lmp)
{
MPI_Comm_rank(world,&me);
maxline = maxcopy = maxwork = 0;
line = copy = work = NULL;
narg = maxarg = 0;
arg = NULL;
echo_screen = 0;
echo_log = 1;
label_active = 0;
labelstr = NULL;
jump_skip = 0;
ifthenelse_flag = 0;
if (me == 0) {
nfile = maxfile = 1;
infiles = (FILE **) memory->smalloc(sizeof(FILE *),"input:infiles");
infiles[0] = infile;
} else infiles = NULL;
variable = new Variable(lmp);
// fill map with commands listed in style_command.h
command_map = new CommandCreatorMap();
#define COMMAND_CLASS
#define CommandStyle(key,Class) \
(*command_map)[#key] = &command_creator<Class>;
#include "style_command.h"
#undef CommandStyle
#undef COMMAND_CLASS
// process command-line args
// check for args "-var" and "-echo"
// caller has already checked that sufficient arguments exist
int iarg = 1;
while (iarg < argc) {
if (strcmp(argv[iarg],"-var") == 0 || strcmp(argv[iarg],"-v") == 0) {
int jarg = iarg+3;
while (jarg < argc && argv[jarg][0] != '-') jarg++;
variable->set(argv[iarg+1],jarg-iarg-2,&argv[iarg+2]);
iarg = jarg;
} else if (strcmp(argv[iarg],"-echo") == 0 ||
strcmp(argv[iarg],"-e") == 0) {
narg = 1;
char **tmp = arg; // trick echo() into using argv instead of arg
arg = &argv[iarg+1];
echo();
arg = tmp;
iarg += 2;
} else iarg++;
}
}
/* ---------------------------------------------------------------------- */
Input::~Input()
{
// don't free command and arg strings
// they just point to other allocated memory
memory->sfree(line);
memory->sfree(copy);
memory->sfree(work);
if (labelstr) delete [] labelstr;
memory->sfree(arg);
memory->sfree(infiles);
delete variable;
delete command_map;
}
/* ----------------------------------------------------------------------
process all input from infile
infile = stdin or file if command-line arg "-in" was used
------------------------------------------------------------------------- */
void Input::file()
{
int m,n;
while (1) {
// read a line from input script
// n = length of line including str terminator, 0 if end of file
// if line ends in continuation char '&', concatenate next line
if (me == 0) {
m = 0;
while (1) {
if (maxline-m < 2) reallocate(line,maxline,0);
// end of file reached, so break
// n == 0 if nothing read, else n = line with str terminator
if (fgets(&line[m],maxline-m,infile) == NULL) {
if (m) n = strlen(line) + 1;
else n = 0;
break;
}
// continue if last char read was not a newline
// could happen if line is very long
m = strlen(line);
if (line[m-1] != '\n') continue;
// continue reading if final printable char is & char
// or if odd number of triple quotes
// else break with n = line with str terminator
m--;
while (m >= 0 && isspace(line[m])) m--;
if (m < 0 || line[m] != '&') {
if (numtriple(line) % 2) {
m += 2;
continue;
}
line[m+1] = '\0';
n = m+2;
break;
}
}
}
// bcast the line
// if n = 0, end-of-file
// error if label_active is set, since label wasn't encountered
// if original input file, code is done
// else go back to previous input file
MPI_Bcast(&n,1,MPI_INT,0,world);
if (n == 0) {
if (label_active) error->all(FLERR,"Label wasn't found in input script");
if (me == 0) {
if (infile != stdin) {
fclose(infile);
infile = NULL;
}
nfile--;
}
MPI_Bcast(&nfile,1,MPI_INT,0,world);
if (nfile == 0) break;
if (me == 0) infile = infiles[nfile-1];
continue;
}
if (n > maxline) reallocate(line,maxline,n);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// echo the command unless scanning for label
if (me == 0 && label_active == 0) {
if (echo_screen && screen) fprintf(screen,"%s\n",line);
if (echo_log && logfile) fprintf(logfile,"%s\n",line);
}
// parse the line
// if no command, skip to next line in input script
parse();
if (command == NULL) continue;
// if scanning for label, skip command unless it's a label command
if (label_active && strcmp(command,"label") != 0) continue;
// execute the command
if (execute_command()) {
char *str = new char[maxline+32];
sprintf(str,"Unknown command: %s",line);
error->all(FLERR,str);
}
}
}
/* ----------------------------------------------------------------------
process all input from filename
called from library interface
------------------------------------------------------------------------- */
void Input::file(const char *filename)
{
// error if another nested file still open, should not be possible
// open new filename and set infile, infiles[0], nfile
// call to file() will close filename and decrement nfile
if (me == 0) {
if (nfile > 1)
error->one(FLERR,"Invalid use of library file() function");
if (infile && infile != stdin) fclose(infile);
infile = fopen(filename,"r");
if (infile == NULL) {
char str[128];
sprintf(str,"Cannot open input script %s",filename);
error->one(FLERR,str);
}
infiles[0] = infile;
nfile = 1;
}
file();
}
/* ----------------------------------------------------------------------
invoke one command in single
first copy to line, then parse, then execute it
return command name to caller
------------------------------------------------------------------------- */
char *Input::one(const char *single)
{
int n = strlen(single) + 1;
if (n > maxline) reallocate(line,maxline,n);
strcpy(line,single);
// echo the command unless scanning for label
if (me == 0 && label_active == 0) {
if (echo_screen && screen) fprintf(screen,"%s\n",line);
if (echo_log && logfile) fprintf(logfile,"%s\n",line);
}
// parse the line
// if no command, just return NULL
parse();
if (command == NULL) return NULL;
// if scanning for label, skip command unless it's a label command
if (label_active && strcmp(command,"label") != 0) return NULL;
// execute the command and return its name
if (execute_command()) {
char *str = new char[maxline+32];
sprintf(str,"Unknown command: %s",line);
error->all(FLERR,str);
}
return command;
}
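// Usage sketch (illustrative): the library interface funnels single commands
// through this method, e.g.
//   lmp->input->one("units real");
//   lmp->input->one("run 100");
// Each call echoes, parses, and executes exactly one command and returns the
// command name (or NULL if the line held no command).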
/* ----------------------------------------------------------------------
parse copy of command line by inserting string terminators
strip comment = all chars from # on
replace all $ via variable substitution except within quotes
command = first word
narg = # of args
arg[] = individual args
treat text between single/double/triple quotes as one arg via nextword()
------------------------------------------------------------------------- */
void Input::parse()
{
// duplicate line into copy string to break into words
int n = strlen(line) + 1;
if (n > maxcopy) reallocate(copy,maxcopy,n);
strcpy(copy,line);
// strip any # comment by replacing it with 0
// do not strip from a # inside single/double/triple quotes
// quoteflag = 1,2,3 when encounter first single/double,triple quote
// quoteflag = 0 when encounter matching single/double,triple quote
int quoteflag = 0;
char *ptr = copy;
while (*ptr) {
if (*ptr == '#' && !quoteflag) {
*ptr = '\0';
break;
}
if (quoteflag == 0) {
if (strstr(ptr,"\"\"\"") == ptr) {
- quoteflag = 3;
- ptr += 2;
+ quoteflag = 3;
+ ptr += 2;
}
else if (*ptr == '"') quoteflag = 2;
else if (*ptr == '\'') quoteflag = 1;
} else {
if (quoteflag == 3 && strstr(ptr,"\"\"\"") == ptr) {
- quoteflag = 0;
- ptr += 2;
+ quoteflag = 0;
+ ptr += 2;
}
else if (quoteflag == 2 && *ptr == '"') quoteflag = 0;
else if (quoteflag == 1 && *ptr == '\'') quoteflag = 0;
}
ptr++;
}
// perform $ variable substitution (print changes)
// except if searching for a label since earlier variable may not be defined
if (!label_active) substitute(copy,work,maxcopy,maxwork,1);
// command = 1st arg in copy string
char *next;
command = nextword(copy,&next);
if (command == NULL) return;
// point arg[] at each subsequent arg in copy string
// nextword() inserts string terminators into copy string to delimit args
// nextword() treats text between single/double/triple quotes as one arg
narg = 0;
ptr = next;
while (ptr) {
if (narg == maxarg) {
maxarg += DELTA;
arg = (char **) memory->srealloc(arg,maxarg*sizeof(char *),"input:arg");
}
arg[narg] = nextword(ptr,&next);
if (!arg[narg]) break;
narg++;
ptr = next;
}
}
/* ----------------------------------------------------------------------
find next word in str
insert 0 at end of word
ignore leading whitespace
treat text between single/double/triple quotes as one arg
matching quote must be followed by whitespace char if not end of string
strip quotes from returned word
return ptr to start of word or NULL if no word in string
also return next = ptr after word
------------------------------------------------------------------------- */
char *Input::nextword(char *str, char **next)
{
char *start,*stop;
// start = first non-whitespace char
start = &str[strspn(str," \t\n\v\f\r")];
if (*start == '\0') return NULL;
// if start is single/double/triple quote:
// start = first char beyond quote
// stop = first char of matching quote
// next = first char beyond matching quote
// next must be NULL or whitespace
// if start is not single/double/triple quote:
// stop = first whitespace char after start
// next = char after stop, or stop itself if stop is NULL
if (strstr(start,"\"\"\"") == start) {
stop = strstr(&start[3],"\"\"\"");
if (!stop) error->all(FLERR,"Unbalanced quotes in input line");
start += 3;
*next = stop+3;
if (**next && !isspace(**next))
error->all(FLERR,"Input line quote not followed by whitespace");
} else if (*start == '"' || *start == '\'') {
stop = strchr(&start[1],*start);
if (!stop) error->all(FLERR,"Unbalanced quotes in input line");
start++;
*next = stop+1;
if (**next && !isspace(**next))
error->all(FLERR,"Input line quote not followed by whitespace");
} else {
stop = &start[strcspn(start," \t\n\v\f\r")];
if (*stop == '\0') *next = stop;
else *next = stop+1;
}
// set *stop to '\0' to terminate word
*stop = '\0';
return start;
}
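// Examples of the quoting rules above (illustrative): in the line
//   print "hello world"
// the quoted text is returned as one arg, hello world, with the quotes
// stripped, while in
//   variable s string """a "quoted" sentence"""
// everything between the triple quotes becomes a single arg, so embedded
// single and double quotes survive.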
/* ----------------------------------------------------------------------
substitute for $ variables in str using work str2 and return it
reallocate str/str2 to hold expanded version if necessary & reset max/max2
print updated string if flag is set and not searching for label
label_active will be 0 if called from external class
------------------------------------------------------------------------- */
void Input::substitute(char *&str, char *&str2, int &max, int &max2, int flag)
{
// use str2 as scratch space to expand str, then copy back to str
// reallocate str and str2 as necessary
// do not replace $ inside single/double/triple quotes
// var = pts at variable name, ended by NULL
// if $ is followed by '{', trailing '}' becomes NULL
// else $x becomes x followed by NULL
// beyond = points to text following variable
int i,n,paren_count;
char immediate[256];
char *var,*value,*beyond;
int quoteflag = 0;
char *ptr = str;
n = strlen(str) + 1;
if (n > max2) reallocate(str2,max2,n);
*str2 = '\0';
char *ptr2 = str2;
while (*ptr) {
// variable substitution
if (*ptr == '$' && !quoteflag) {
// value = ptr to expanded variable
// variable name between curly braces, e.g. ${a}
if (*(ptr+1) == '{') {
var = ptr+2;
i = 0;
while (var[i] != '\0' && var[i] != '}') i++;
if (var[i] == '\0') error->one(FLERR,"Invalid variable name");
var[i] = '\0';
beyond = ptr + strlen(var) + 3;
value = variable->retrieve(var);
// immediate variable between parenthesis, e.g. $(1/2)
} else if (*(ptr+1) == '(') {
var = ptr+2;
paren_count = 0;
i = 0;
while (var[i] != '\0' && !(var[i] == ')' && paren_count == 0)) {
switch (var[i]) {
case '(': paren_count++; break;
case ')': paren_count--; break;
default: ;
}
i++;
}
if (var[i] == '\0') error->one(FLERR,"Invalid immediate variable");
var[i] = '\0';
beyond = ptr + strlen(var) + 3;
sprintf(immediate,"%.20g",variable->compute_equal(var));
value = immediate;
// single character variable name, e.g. $a
} else {
var = ptr;
var[0] = var[1];
var[1] = '\0';
beyond = ptr + 2;
value = variable->retrieve(var);
}
if (value == NULL) error->one(FLERR,"Substitution for illegal variable");
// check if storage in str2 needs to be expanded
// re-initialize ptr and ptr2 to the point beyond the variable.
n = strlen(str2) + strlen(value) + strlen(beyond) + 1;
if (n > max2) reallocate(str2,max2,n);
strcat(str2,value);
ptr2 = str2 + strlen(str2);
ptr = beyond;
// output substitution progress if requested
if (flag && me == 0 && label_active == 0) {
if (echo_screen && screen) fprintf(screen,"%s%s\n",str2,beyond);
if (echo_log && logfile) fprintf(logfile,"%s%s\n",str2,beyond);
}
continue;
}
// quoteflag = 1,2,3 when encounter first single/double,triple quote
// quoteflag = 0 when encounter matching single/double,triple quote
// copy 2 extra triple quote chars into str2
if (quoteflag == 0) {
if (strstr(ptr,"\"\"\"") == ptr) {
quoteflag = 3;
*ptr2++ = *ptr++;
*ptr2++ = *ptr++;
}
else if (*ptr == '"') quoteflag = 2;
else if (*ptr == '\'') quoteflag = 1;
} else {
if (quoteflag == 3 && strstr(ptr,"\"\"\"") == ptr) {
quoteflag = 0;
*ptr2++ = *ptr++;
*ptr2++ = *ptr++;
}
else if (quoteflag == 2 && *ptr == '"') quoteflag = 0;
else if (quoteflag == 1 && *ptr == '\'') quoteflag = 0;
}
// copy current character into str2
*ptr2++ = *ptr++;
*ptr2 = '\0';
}
// set length of input str to length of work str2
// copy work string back to input str
if (max2 > max) reallocate(str,max,max2);
strcpy(str,str2);
}
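// Examples of the substitution forms handled above (illustrative): with
//   variable t equal 300.0
// defined earlier, "$t" and "${t}" are replaced by the retrieved value "300",
// and an immediate expression such as "$(4.0/2)" is evaluated on the spot and
// becomes "2".  Text inside single/double/triple quotes is copied unchanged.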
/* ----------------------------------------------------------------------
expand arg to earg, for arguments with syntax c_ID[*] or f_ID[*]
fields to consider in input arg range from iarg to narg
return new expanded # of values, and copy them w/out "*" into earg
if any expansion occurs, earg is new allocation, must be freed by caller
if no expansion occurs, earg just points to arg, caller need not free
------------------------------------------------------------------------- */
int Input::expand_args(int narg, char **arg, int mode, char **&earg)
{
int n,iarg,index,nlo,nhi,nmax,expandflag,icompute,ifix;
char *ptr1,*ptr2,*str;
ptr1 = NULL;
for (iarg = 0; iarg < narg; iarg++) {
ptr1 = strchr(arg[iarg],'*');
if (ptr1) break;
}
if (!ptr1) {
earg = arg;
return narg;
}
// maxarg should always end up equal to newarg, so caller can free earg
int maxarg = narg-iarg;
earg = (char **) memory->smalloc(maxarg*sizeof(char *),"input:earg");
int newarg = 0;
for (iarg = 0; iarg < narg; iarg++) {
expandflag = 0;
if (strncmp(arg[iarg],"c_",2) == 0 ||
strncmp(arg[iarg],"f_",2) == 0) {
ptr1 = strchr(&arg[iarg][2],'[');
if (ptr1) {
ptr2 = strchr(ptr1,']');
if (ptr2) {
*ptr2 = '\0';
if (strchr(ptr1,'*')) {
if (arg[iarg][0] == 'c') {
*ptr1 = '\0';
icompute = modify->find_compute(&arg[iarg][2]);
*ptr1 = '[';
// check for global vector/array, peratom array, local array
if (icompute >= 0) {
if (mode == 0 && modify->compute[icompute]->vector_flag) {
nmax = modify->compute[icompute]->size_vector;
expandflag = 1;
} else if (mode == 1 && modify->compute[icompute]->array_flag) {
nmax = modify->compute[icompute]->size_array_cols;
expandflag = 1;
} else if (modify->compute[icompute]->peratom_flag &&
modify->compute[icompute]->size_peratom_cols) {
nmax = modify->compute[icompute]->size_peratom_cols;
expandflag = 1;
} else if (modify->compute[icompute]->local_flag &&
modify->compute[icompute]->size_local_cols) {
nmax = modify->compute[icompute]->size_local_cols;
expandflag = 1;
}
}
} else if (arg[iarg][0] == 'f') {
*ptr1 = '\0';
ifix = modify->find_fix(&arg[iarg][2]);
*ptr1 = '[';
// check for global vector/array, peratom array, local array
if (ifix >= 0) {
if (mode == 0 && modify->fix[ifix]->vector_flag) {
nmax = modify->fix[ifix]->size_vector;
expandflag = 1;
} else if (mode == 1 && modify->fix[ifix]->array_flag) {
nmax = modify->fix[ifix]->size_array_cols;
expandflag = 1;
} else if (modify->fix[ifix]->peratom_flag &&
modify->fix[ifix]->size_peratom_cols) {
nmax = modify->fix[ifix]->size_peratom_cols;
expandflag = 1;
} else if (modify->fix[ifix]->local_flag &&
modify->fix[ifix]->size_local_cols) {
nmax = modify->fix[ifix]->size_local_cols;
expandflag = 1;
}
}
}
}
*ptr2 = ']';
}
}
}
if (expandflag) {
*ptr2 = '\0';
force->bounds(FLERR,ptr1+1,nmax,nlo,nhi);
*ptr2 = ']';
if (newarg+nhi-nlo+1 > maxarg) {
maxarg += nhi-nlo+1;
earg = (char **)
memory->srealloc(earg,maxarg*sizeof(char *),"input:earg");
}
for (index = nlo; index <= nhi; index++) {
n = strlen(arg[iarg]) + 16; // 16 = space for large inserted integer
str = earg[newarg] = new char[n];
strncpy(str,arg[iarg],ptr1+1-arg[iarg]);
sprintf(&str[ptr1+1-arg[iarg]],"%d",index);
strcat(str,ptr2);
newarg++;
}
} else {
if (newarg == maxarg) {
maxarg++;
earg = (char **)
memory->srealloc(earg,maxarg*sizeof(char *),"input:earg");
}
n = strlen(arg[iarg]) + 1;
earg[newarg] = new char[n];
strcpy(earg[newarg],arg[iarg]);
newarg++;
}
}
//printf("NEWARG %d\n",newarg);
//for (int i = 0; i < newarg; i++)
// printf(" arg %d: %s\n",i,earg[i]);
return newarg;
}
/* ----------------------------------------------------------------------
return number of triple quotes in line
------------------------------------------------------------------------- */
int Input::numtriple(char *line)
{
int count = 0;
char *ptr = line;
while ((ptr = strstr(ptr,"\"\"\""))) {
ptr += 3;
count++;
}
return count;
}
/* ----------------------------------------------------------------------
reallocate a string
if n > 0: set max >= n in increments of DELTALINE
if n = 0: just increment max by DELTALINE
------------------------------------------------------------------------- */
void Input::reallocate(char *&str, int &max, int n)
{
if (n) {
while (n > max) max += DELTALINE;
} else max += DELTALINE;
str = (char *) memory->srealloc(str,max*sizeof(char),"input:str");
}
/* ----------------------------------------------------------------------
process a single parsed command
return 0 if successful, -1 if did not recognize command
------------------------------------------------------------------------- */
int Input::execute_command()
{
int flag = 1;
if (!strcmp(command,"clear")) clear();
else if (!strcmp(command,"echo")) echo();
else if (!strcmp(command,"if")) ifthenelse();
else if (!strcmp(command,"include")) include();
else if (!strcmp(command,"jump")) jump();
else if (!strcmp(command,"label")) label();
else if (!strcmp(command,"log")) log();
else if (!strcmp(command,"next")) next_command();
else if (!strcmp(command,"partition")) partition();
else if (!strcmp(command,"print")) print();
else if (!strcmp(command,"python")) python();
else if (!strcmp(command,"quit")) quit();
else if (!strcmp(command,"shell")) shell();
else if (!strcmp(command,"variable")) variable_command();
else if (!strcmp(command,"angle_coeff")) angle_coeff();
else if (!strcmp(command,"angle_style")) angle_style();
else if (!strcmp(command,"atom_modify")) atom_modify();
else if (!strcmp(command,"atom_style")) atom_style();
else if (!strcmp(command,"bond_coeff")) bond_coeff();
else if (!strcmp(command,"bond_style")) bond_style();
else if (!strcmp(command,"bond_write")) bond_write();
else if (!strcmp(command,"boundary")) boundary();
else if (!strcmp(command,"box")) box();
else if (!strcmp(command,"comm_modify")) comm_modify();
else if (!strcmp(command,"comm_style")) comm_style();
else if (!strcmp(command,"compute")) compute();
else if (!strcmp(command,"compute_modify")) compute_modify();
else if (!strcmp(command,"dielectric")) dielectric();
else if (!strcmp(command,"dihedral_coeff")) dihedral_coeff();
else if (!strcmp(command,"dihedral_style")) dihedral_style();
else if (!strcmp(command,"dimension")) dimension();
else if (!strcmp(command,"dump")) dump();
else if (!strcmp(command,"dump_modify")) dump_modify();
else if (!strcmp(command,"fix")) fix();
else if (!strcmp(command,"fix_modify")) fix_modify();
else if (!strcmp(command,"group")) group_command();
else if (!strcmp(command,"improper_coeff")) improper_coeff();
else if (!strcmp(command,"improper_style")) improper_style();
else if (!strcmp(command,"kspace_modify")) kspace_modify();
else if (!strcmp(command,"kspace_style")) kspace_style();
else if (!strcmp(command,"lattice")) lattice();
else if (!strcmp(command,"mass")) mass();
else if (!strcmp(command,"min_modify")) min_modify();
else if (!strcmp(command,"min_style")) min_style();
else if (!strcmp(command,"molecule")) molecule();
else if (!strcmp(command,"neigh_modify")) neigh_modify();
else if (!strcmp(command,"neighbor")) neighbor_command();
else if (!strcmp(command,"newton")) newton();
else if (!strcmp(command,"package")) package();
else if (!strcmp(command,"pair_coeff")) pair_coeff();
else if (!strcmp(command,"pair_modify")) pair_modify();
else if (!strcmp(command,"pair_style")) pair_style();
else if (!strcmp(command,"pair_write")) pair_write();
else if (!strcmp(command,"processors")) processors();
else if (!strcmp(command,"region")) region();
else if (!strcmp(command,"reset_timestep")) reset_timestep();
else if (!strcmp(command,"restart")) restart();
else if (!strcmp(command,"run_style")) run_style();
else if (!strcmp(command,"special_bonds")) special_bonds();
else if (!strcmp(command,"suffix")) suffix();
else if (!strcmp(command,"thermo")) thermo();
else if (!strcmp(command,"thermo_modify")) thermo_modify();
else if (!strcmp(command,"thermo_style")) thermo_style();
else if (!strcmp(command,"timestep")) timestep();
else if (!strcmp(command,"timer")) timer_command();
else if (!strcmp(command,"uncompute")) uncompute();
else if (!strcmp(command,"undump")) undump();
else if (!strcmp(command,"unfix")) unfix();
else if (!strcmp(command,"units")) units();
else flag = 0;
// return if command was listed above
if (flag) return 0;
// invoke commands added via style_command.h
if (command_map->find(command) != command_map->end()) {
CommandCreator command_creator = (*command_map)[command];
command_creator(lmp,narg,arg);
return 0;
}
// unrecognized command
return -1;
}
/* ----------------------------------------------------------------------
one instance per command in style_command.h
------------------------------------------------------------------------- */
template <typename T>
void Input::command_creator(LAMMPS *lmp, int narg, char **arg)
{
T cmd(lmp);
cmd.command(narg,arg);
}
/* ---------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
void Input::clear()
{
if (narg > 0) error->all(FLERR,"Illegal clear command");
lmp->destroy();
lmp->create();
lmp->post_create();
}
/* ---------------------------------------------------------------------- */
void Input::echo()
{
if (narg != 1) error->all(FLERR,"Illegal echo command");
if (strcmp(arg[0],"none") == 0) {
echo_screen = 0;
echo_log = 0;
} else if (strcmp(arg[0],"screen") == 0) {
echo_screen = 1;
echo_log = 0;
} else if (strcmp(arg[0],"log") == 0) {
echo_screen = 0;
echo_log = 1;
} else if (strcmp(arg[0],"both") == 0) {
echo_screen = 1;
echo_log = 1;
} else error->all(FLERR,"Illegal echo command");
}
/* ---------------------------------------------------------------------- */
void Input::ifthenelse()
{
if (narg < 3) error->all(FLERR,"Illegal if command");
// substitute for variables in Boolean expression for "if"
// in case expression was enclosed in quotes
// must substitute on copy of arg else will step on subsequent args
int n = strlen(arg[0]) + 1;
if (n > maxline) reallocate(line,maxline,n);
strcpy(line,arg[0]);
substitute(line,work,maxline,maxwork,0);
// evaluate Boolean expression for "if"
double btest = variable->evaluate_boolean(line);
// bound "then" commands
if (strcmp(arg[1],"then") != 0) error->all(FLERR,"Illegal if command");
int first = 2;
int iarg = first;
while (iarg < narg &&
(strcmp(arg[iarg],"elif") != 0 && strcmp(arg[iarg],"else") != 0))
iarg++;
int last = iarg-1;
// execute "then" commands
// make copies of all arg string commands
// required because re-parsing a command via one() will wipe out args
if (btest != 0.0) {
int ncommands = last-first + 1;
if (ncommands <= 0) error->all(FLERR,"Illegal if command");
char **commands = new char*[ncommands];
ncommands = 0;
for (int i = first; i <= last; i++) {
int n = strlen(arg[i]) + 1;
if (n == 1) error->all(FLERR,"Illegal if command");
commands[ncommands] = new char[n];
strcpy(commands[ncommands],arg[i]);
ncommands++;
}
ifthenelse_flag = 1;
for (int i = 0; i < ncommands; i++) one(commands[i]);
ifthenelse_flag = 0;
for (int i = 0; i < ncommands; i++) delete [] commands[i];
delete [] commands;
return;
}
// done if no "elif" or "else"
if (iarg == narg) return;
// check "elif" or "else" until find commands to execute
// substitute for variables and evaluate Boolean expression for "elif"
// must substitute on copy of arg else will step on subsequent args
// bound and execute "elif" or "else" commands
while (iarg != narg) {
if (iarg+2 > narg) error->all(FLERR,"Illegal if command");
if (strcmp(arg[iarg],"elif") == 0) {
n = strlen(arg[iarg+1]) + 1;
if (n > maxline) reallocate(line,maxline,n);
strcpy(line,arg[iarg+1]);
substitute(line,work,maxline,maxwork,0);
btest = variable->evaluate_boolean(line);
first = iarg+2;
} else {
btest = 1.0;
first = iarg+1;
}
iarg = first;
while (iarg < narg &&
(strcmp(arg[iarg],"elif") != 0 && strcmp(arg[iarg],"else") != 0))
iarg++;
last = iarg-1;
if (btest == 0.0) continue;
int ncommands = last-first + 1;
if (ncommands <= 0) error->all(FLERR,"Illegal if command");
char **commands = new char*[ncommands];
ncommands = 0;
for (int i = first; i <= last; i++) {
int n = strlen(arg[i]) + 1;
if (n == 1) error->all(FLERR,"Illegal if command");
commands[ncommands] = new char[n];
strcpy(commands[ncommands],arg[i]);
ncommands++;
}
// execute the list of commands
ifthenelse_flag = 1;
for (int i = 0; i < ncommands; i++) one(commands[i]);
ifthenelse_flag = 0;
// clean up
for (int i = 0; i < ncommands; i++) delete [] commands[i];
delete [] commands;
return;
}
}
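/* ----------------------------------------------------------------------
   illustrative input-script usage of the if command parsed above
   (example values are assumptions, not from this patch):

     if "${t} > 300.0" then "print 'hot'" elif "${t} > 200.0" "print 'warm'" else "print 'cold'"

   each quoted command after then/elif/else is copied and executed via one()
------------------------------------------------------------------------- */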
/* ---------------------------------------------------------------------- */
void Input::include()
{
if (narg != 1) error->all(FLERR,"Illegal include command");
// do not allow include inside an if command
// NOTE: this check will fail if a 2nd if command was inside the if command
// and came before the include
if (ifthenelse_flag)
error->all(FLERR,"Cannot use include command within an if command");
if (me == 0) {
if (nfile == maxfile) {
maxfile++;
infiles = (FILE **)
memory->srealloc(infiles,maxfile*sizeof(FILE *),"input:infiles");
}
infile = fopen(arg[0],"r");
if (infile == NULL) {
char str[128];
sprintf(str,"Cannot open input script %s",arg[0]);
error->one(FLERR,str);
}
infiles[nfile++] = infile;
}
}
/* ---------------------------------------------------------------------- */
void Input::jump()
{
if (narg < 1 || narg > 2) error->all(FLERR,"Illegal jump command");
if (jump_skip) {
jump_skip = 0;
return;
}
if (me == 0) {
if (strcmp(arg[0],"SELF") == 0) rewind(infile);
else {
if (infile && infile != stdin) fclose(infile);
infile = fopen(arg[0],"r");
if (infile == NULL) {
char str[128];
sprintf(str,"Cannot open input script %s",arg[0]);
error->one(FLERR,str);
}
infiles[nfile-1] = infile;
}
}
if (narg == 2) {
label_active = 1;
if (labelstr) delete [] labelstr;
int n = strlen(arg[1]) + 1;
labelstr = new char[n];
strcpy(labelstr,arg[1]);
}
}
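/* ----------------------------------------------------------------------
   illustrative loop built from jump/label/next (assumed example, not part
   of this patch):

     label       loop
     variable    i loop 5
     print       "iteration $i"
     next        i
     jump        SELF loop

   when the variable is exhausted, next_command() sets jump_skip so the
   following jump is ignored and the loop exits
------------------------------------------------------------------------- */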
/* ---------------------------------------------------------------------- */
void Input::label()
{
if (narg != 1) error->all(FLERR,"Illegal label command");
if (label_active && strcmp(labelstr,arg[0]) == 0) label_active = 0;
}
/* ---------------------------------------------------------------------- */
void Input::log()
{
if (narg > 2) error->all(FLERR,"Illegal log command");
int appendflag = 0;
if (narg == 2) {
if (strcmp(arg[1],"append") == 0) appendflag = 1;
else error->all(FLERR,"Illegal log command");
}
if (me == 0) {
if (logfile) fclose(logfile);
if (strcmp(arg[0],"none") == 0) logfile = NULL;
else {
if (appendflag) logfile = fopen(arg[0],"a");
else logfile = fopen(arg[0],"w");
if (logfile == NULL) {
char str[128];
sprintf(str,"Cannot open logfile %s",arg[0]);
error->one(FLERR,str);
}
}
if (universe->nworlds == 1) universe->ulogfile = logfile;
}
}
/* ---------------------------------------------------------------------- */
void Input::next_command()
{
if (variable->next(narg,arg)) jump_skip = 1;
}
/* ---------------------------------------------------------------------- */
void Input::partition()
{
if (narg < 3) error->all(FLERR,"Illegal partition command");
int yesflag;
if (strcmp(arg[0],"yes") == 0) yesflag = 1;
else if (strcmp(arg[0],"no") == 0) yesflag = 0;
else error->all(FLERR,"Illegal partition command");
int ilo,ihi;
force->bounds(FLERR,arg[1],universe->nworlds,ilo,ihi);
// copy original line to copy, since will use strtok() on it
// ptr = start of 4th word
strcpy(copy,line);
char *ptr = strtok(copy," \t\n\r\f");
ptr = strtok(NULL," \t\n\r\f");
ptr = strtok(NULL," \t\n\r\f");
ptr += strlen(ptr) + 1;
ptr += strspn(ptr," \t\n\r\f");
// execute the remaining command line on requested partitions
if (yesflag) {
if (universe->iworld+1 >= ilo && universe->iworld+1 <= ihi) one(ptr);
} else {
if (universe->iworld+1 < ilo || universe->iworld+1 > ihi) one(ptr);
}
}
/* ---------------------------------------------------------------------- */
void Input::print()
{
if (narg < 1) error->all(FLERR,"Illegal print command");
// copy 1st arg back into line (copy is being used)
// check maxline since arg[0] could have been expanded by variables
// substitute for $ variables (no printing) and print arg
int n = strlen(arg[0]) + 1;
if (n > maxline) reallocate(line,maxline,n);
strcpy(line,arg[0]);
substitute(line,work,maxline,maxwork,0);
// parse optional args
FILE *fp = NULL;
int screenflag = 1;
int iarg = 1;
while (iarg < narg) {
if (strcmp(arg[iarg],"file") == 0 || strcmp(arg[iarg],"append") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal print command");
if (me == 0) {
if (fp != NULL) fclose(fp);
if (strcmp(arg[iarg],"file") == 0) fp = fopen(arg[iarg+1],"w");
else fp = fopen(arg[iarg+1],"a");
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open print file %s",arg[iarg+1]);
error->one(FLERR,str);
}
}
iarg += 2;
} else if (strcmp(arg[iarg],"screen") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal print command");
if (strcmp(arg[iarg+1],"yes") == 0) screenflag = 1;
else if (strcmp(arg[iarg+1],"no") == 0) screenflag = 0;
else error->all(FLERR,"Illegal print command");
iarg += 2;
} else error->all(FLERR,"Illegal print command");
}
if (me == 0) {
if (screenflag && screen) fprintf(screen,"%s\n",line);
if (screenflag && logfile) fprintf(logfile,"%s\n",line);
if (fp) {
fprintf(fp,"%s\n",line);
fclose(fp);
}
}
}
/* ---------------------------------------------------------------------- */
void Input::python()
{
variable->python_command(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::quit()
{
if (narg == 0) error->done(0); // 1 would be fully backwards compatible
if (narg == 1) error->done(force->inumeric(FLERR,arg[0]));
error->all(FLERR,"Illegal quit command");
}
/* ---------------------------------------------------------------------- */
char *shell_failed_message(const char* cmd, int errnum)
{
const char *errmsg = strerror(errnum);
int len = strlen(cmd)+strlen(errmsg)+64;
char *msg = new char[len];
sprintf(msg,"Shell command '%s' failed with error '%s'", cmd, errmsg);
return msg;
}
void Input::shell()
{
int rv,err;
if (narg < 1) error->all(FLERR,"Illegal shell command");
if (strcmp(arg[0],"cd") == 0) {
if (narg != 2) error->all(FLERR,"Illegal shell cd command");
rv = (chdir(arg[1]) < 0) ? errno : 0;
MPI_Reduce(&rv,&err,1,MPI_INT,MPI_MAX,0,world);
if (me == 0 && err != 0) {
char *message = shell_failed_message("cd",err);
error->warning(FLERR,message);
delete [] message;
}
} else if (strcmp(arg[0],"mkdir") == 0) {
if (narg < 2) error->all(FLERR,"Illegal shell mkdir command");
if (me == 0)
for (int i = 1; i < narg; i++) {
#if defined(_WIN32)
rv = _mkdir(arg[i]);
#else
rv = mkdir(arg[i], S_IRWXU | S_IRGRP | S_IXGRP);
#endif
if (rv < 0) {
char *message = shell_failed_message("mkdir",errno);
error->warning(FLERR,message);
delete [] message;
}
}
} else if (strcmp(arg[0],"mv") == 0) {
if (narg != 3) error->all(FLERR,"Illegal shell mv command");
rv = (rename(arg[1],arg[2]) < 0) ? errno : 0;
MPI_Reduce(&rv,&err,1,MPI_INT,MPI_MAX,0,world);
if (me == 0 && err != 0) {
char *message = shell_failed_message("mv",err);
error->warning(FLERR,message);
delete [] message;
}
} else if (strcmp(arg[0],"rm") == 0) {
if (narg < 2) error->all(FLERR,"Illegal shell rm command");
if (me == 0)
for (int i = 1; i < narg; i++) {
if (unlink(arg[i]) < 0) {
char *message = shell_failed_message("rm",errno);
error->warning(FLERR,message);
delete [] message;
}
}
} else if (strcmp(arg[0],"rmdir") == 0) {
if (narg < 2) error->all(FLERR,"Illegal shell rmdir command");
if (me == 0)
for (int i = 1; i < narg; i++) {
if (rmdir(arg[i]) < 0) {
char *message = shell_failed_message("rmdir",errno);
error->warning(FLERR,message);
delete [] message;
}
}
} else if (strcmp(arg[0],"putenv") == 0) {
if (narg < 2) error->all(FLERR,"Illegal shell putenv command");
for (int i = 1; i < narg; i++) {
char *ptr = strdup(arg[i]);
rv = 0;
#ifdef _WIN32
if (ptr != NULL) rv = _putenv(ptr);
#else
if (ptr != NULL) rv = putenv(ptr);
#endif
rv = (rv < 0) ? errno : 0;
MPI_Reduce(&rv,&err,1,MPI_INT,MPI_MAX,0,world);
if (me == 0 && err != 0) {
char *message = shell_failed_message("putenv",err);
error->warning(FLERR,message);
delete [] message;
}
}
// use work string to concat args back into one string separated by spaces
// invoke string in shell via system()
} else {
int n = 0;
for (int i = 0; i < narg; i++) n += strlen(arg[i]) + 1;
if (n > maxwork) reallocate(work,maxwork,n);
strcpy(work,arg[0]);
for (int i = 1; i < narg; i++) {
strcat(work," ");
strcat(work,arg[i]);
}
if (me == 0)
if (system(work) != 0)
error->warning(FLERR,"Shell command returned with non-zero status");
}
}
/* ---------------------------------------------------------------------- */
void Input::variable_command()
{
variable->set(narg,arg);
}
/* ---------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
one function for each LAMMPS-specific input script command
------------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
void Input::angle_coeff()
{
if (domain->box_exist == 0)
error->all(FLERR,"Angle_coeff command before simulation box is defined");
if (force->angle == NULL)
error->all(FLERR,"Angle_coeff command before angle_style is defined");
if (atom->avec->angles_allow == 0)
error->all(FLERR,"Angle_coeff command when no angles allowed");
force->angle->coeff(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::angle_style()
{
if (narg < 1) error->all(FLERR,"Illegal angle_style command");
if (atom->avec->angles_allow == 0)
error->all(FLERR,"Angle_style command when no angles allowed");
force->create_angle(arg[0],1);
if (force->angle) force->angle->settings(narg-1,&arg[1]);
}
/* ---------------------------------------------------------------------- */
void Input::atom_modify()
{
atom->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::atom_style()
{
if (narg < 1) error->all(FLERR,"Illegal atom_style command");
if (domain->box_exist)
error->all(FLERR,"Atom_style command after simulation box is defined");
atom->create_avec(arg[0],narg-1,&arg[1],1);
}
/* ---------------------------------------------------------------------- */
void Input::bond_coeff()
{
if (domain->box_exist == 0)
error->all(FLERR,"Bond_coeff command before simulation box is defined");
if (force->bond == NULL)
error->all(FLERR,"Bond_coeff command before bond_style is defined");
if (atom->avec->bonds_allow == 0)
error->all(FLERR,"Bond_coeff command when no bonds allowed");
force->bond->coeff(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::bond_style()
{
if (narg < 1) error->all(FLERR,"Illegal bond_style command");
if (atom->avec->bonds_allow == 0)
error->all(FLERR,"Bond_style command when no bonds allowed");
force->create_bond(arg[0],1);
if (force->bond) force->bond->settings(narg-1,&arg[1]);
}
/* ---------------------------------------------------------------------- */
void Input::bond_write()
{
if (atom->avec->bonds_allow == 0)
error->all(FLERR,"Bond_write command when no bonds allowed");
if (force->bond == NULL)
error->all(FLERR,"Bond_write command before bond_style is defined");
else force->bond->write_file(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::boundary()
{
if (domain->box_exist)
error->all(FLERR,"Boundary command after simulation box is defined");
domain->set_boundary(narg,arg,0);
}
/* ---------------------------------------------------------------------- */
void Input::box()
{
if (domain->box_exist)
error->all(FLERR,"Box command after simulation box is defined");
domain->set_box(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::comm_modify()
{
comm->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::comm_style()
{
if (narg < 1) error->all(FLERR,"Illegal comm_style command");
if (strcmp(arg[0],"brick") == 0) {
if (comm->style == 0) return;
Comm *oldcomm = comm;
comm = new CommBrick(lmp,oldcomm);
delete oldcomm;
} else if (strcmp(arg[0],"tiled") == 0) {
if (comm->style == 1) return;
Comm *oldcomm = comm;
if (lmp->kokkos) comm = new CommTiledKokkos(lmp,oldcomm);
else comm = new CommTiled(lmp,oldcomm);
delete oldcomm;
} else error->all(FLERR,"Illegal comm_style command");
}
/* ---------------------------------------------------------------------- */
void Input::compute()
{
modify->add_compute(narg,arg,1);
}
/* ---------------------------------------------------------------------- */
void Input::compute_modify()
{
modify->modify_compute(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::dielectric()
{
if (narg != 1) error->all(FLERR,"Illegal dielectric command");
force->dielectric = force->numeric(FLERR,arg[0]);
}
/* ---------------------------------------------------------------------- */
void Input::dihedral_coeff()
{
if (domain->box_exist == 0)
error->all(FLERR,"Dihedral_coeff command before simulation box is defined");
if (force->dihedral == NULL)
error->all(FLERR,"Dihedral_coeff command before dihedral_style is defined");
if (atom->avec->dihedrals_allow == 0)
error->all(FLERR,"Dihedral_coeff command when no dihedrals allowed");
force->dihedral->coeff(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::dihedral_style()
{
if (narg < 1) error->all(FLERR,"Illegal dihedral_style command");
if (atom->avec->dihedrals_allow == 0)
error->all(FLERR,"Dihedral_style command when no dihedrals allowed");
force->create_dihedral(arg[0],1);
if (force->dihedral) force->dihedral->settings(narg-1,&arg[1]);
}
/* ---------------------------------------------------------------------- */
void Input::dimension()
{
if (narg != 1) error->all(FLERR,"Illegal dimension command");
if (domain->box_exist)
error->all(FLERR,"Dimension command after simulation box is defined");
domain->dimension = force->inumeric(FLERR,arg[0]);
if (domain->dimension != 2 && domain->dimension != 3)
error->all(FLERR,"Illegal dimension command");
// must reset default extra_dof of all computes
// since some were created before dimension command is encountered
for (int i = 0; i < modify->ncompute; i++)
modify->compute[i]->reset_extra_dof();
}
/* ---------------------------------------------------------------------- */
void Input::dump()
{
output->add_dump(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::dump_modify()
{
output->modify_dump(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::fix()
{
modify->add_fix(narg,arg,1);
}
/* ---------------------------------------------------------------------- */
void Input::fix_modify()
{
modify->modify_fix(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::group_command()
{
group->assign(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::improper_coeff()
{
if (domain->box_exist == 0)
error->all(FLERR,"Improper_coeff command before simulation box is defined");
if (force->improper == NULL)
error->all(FLERR,"Improper_coeff command before improper_style is defined");
if (atom->avec->impropers_allow == 0)
error->all(FLERR,"Improper_coeff command when no impropers allowed");
force->improper->coeff(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::improper_style()
{
if (narg < 1) error->all(FLERR,"Illegal improper_style command");
if (atom->avec->impropers_allow == 0)
error->all(FLERR,"Improper_style command when no impropers allowed");
force->create_improper(arg[0],1);
if (force->improper) force->improper->settings(narg-1,&arg[1]);
}
/* ---------------------------------------------------------------------- */
void Input::kspace_modify()
{
if (force->kspace == NULL)
error->all(FLERR,"KSpace style has not yet been set");
force->kspace->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::kspace_style()
{
force->create_kspace(narg,arg,1);
}
/* ---------------------------------------------------------------------- */
void Input::lattice()
{
domain->set_lattice(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::mass()
{
if (narg != 2) error->all(FLERR,"Illegal mass command");
if (domain->box_exist == 0)
error->all(FLERR,"Mass command before simulation box is defined");
atom->set_mass(FLERR,narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::min_modify()
{
update->minimize->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::min_style()
{
if (domain->box_exist == 0)
error->all(FLERR,"Min_style command before simulation box is defined");
update->create_minimize(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::molecule()
{
atom->add_molecule(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::neigh_modify()
{
neighbor->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::neighbor_command()
{
neighbor->set(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::newton()
{
int newton_pair=1,newton_bond=1;
if (narg == 1) {
if (strcmp(arg[0],"off") == 0) newton_pair = newton_bond = 0;
else if (strcmp(arg[0],"on") == 0) newton_pair = newton_bond = 1;
else error->all(FLERR,"Illegal newton command");
} else if (narg == 2) {
if (strcmp(arg[0],"off") == 0) newton_pair = 0;
else if (strcmp(arg[0],"on") == 0) newton_pair= 1;
else error->all(FLERR,"Illegal newton command");
if (strcmp(arg[1],"off") == 0) newton_bond = 0;
else if (strcmp(arg[1],"on") == 0) newton_bond = 1;
else error->all(FLERR,"Illegal newton command");
} else error->all(FLERR,"Illegal newton command");
force->newton_pair = newton_pair;
if (domain->box_exist && (newton_bond != force->newton_bond))
error->all(FLERR,"Newton bond change after simulation box is defined");
force->newton_bond = newton_bond;
if (newton_pair || newton_bond) force->newton = 1;
else force->newton = 0;
}
/* ---------------------------------------------------------------------- */
void Input::package()
{
if (domain->box_exist)
error->all(FLERR,"Package command after simulation box is defined");
if (narg < 1) error->all(FLERR,"Illegal package command");
// same checks for packages existing as in LAMMPS::post_create()
// since can be invoked here by package command in input script
if (strcmp(arg[0],"gpu") == 0) {
if (!modify->check_package("GPU"))
error->all(FLERR,"Package gpu command without GPU package installed");
char **fixarg = new char*[2+narg];
fixarg[0] = (char *) "package_gpu";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "GPU";
for (int i = 1; i < narg; i++) fixarg[i+2] = arg[i];
modify->add_fix(2+narg,fixarg);
delete [] fixarg;
} else if (strcmp(arg[0],"kokkos") == 0) {
if (lmp->kokkos == NULL || lmp->kokkos->kokkos_exists == 0)
error->all(FLERR,
"Package kokkos command without KOKKOS package enabled");
lmp->kokkos->accelerator(narg-1,&arg[1]);
} else if (strcmp(arg[0],"omp") == 0) {
if (!modify->check_package("OMP"))
error->all(FLERR,
"Package omp command without USER-OMP package installed");
char **fixarg = new char*[2+narg];
fixarg[0] = (char *) "package_omp";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "OMP";
for (int i = 1; i < narg; i++) fixarg[i+2] = arg[i];
modify->add_fix(2+narg,fixarg);
delete [] fixarg;
} else if (strcmp(arg[0],"intel") == 0) {
if (!modify->check_package("INTEL"))
error->all(FLERR,
"Package intel command without USER-INTEL package installed");
char **fixarg = new char*[2+narg];
fixarg[0] = (char *) "package_intel";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "INTEL";
for (int i = 1; i < narg; i++) fixarg[i+2] = arg[i];
modify->add_fix(2+narg,fixarg);
delete [] fixarg;
} else error->all(FLERR,"Illegal package command");
}
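/* ----------------------------------------------------------------------
   illustrative mapping performed above (assumed example): the input line

     package omp 4

   becomes an internal fix invocation equivalent to

     fix package_omp all OMP 4
------------------------------------------------------------------------- */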
/* ---------------------------------------------------------------------- */
void Input::pair_coeff()
{
if (domain->box_exist == 0)
error->all(FLERR,"Pair_coeff command before simulation box is defined");
if (force->pair == NULL)
error->all(FLERR,"Pair_coeff command before pair_style is defined");
force->pair->coeff(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::pair_modify()
{
if (force->pair == NULL)
error->all(FLERR,"Pair_modify command before pair_style is defined");
force->pair->modify_params(narg,arg);
}
/* ----------------------------------------------------------------------
if old pair style exists and new style is same, just change settings
else create new pair class
------------------------------------------------------------------------- */
void Input::pair_style()
{
if (narg < 1) error->all(FLERR,"Illegal pair_style command");
if (force->pair) {
int match = 0;
if (strcmp(arg[0],force->pair_style) == 0) match = 1;
if (!match && lmp->suffix_enable) {
char estyle[256];
if (lmp->suffix) {
sprintf(estyle,"%s/%s",arg[0],lmp->suffix);
if (strcmp(estyle,force->pair_style) == 0) match = 1;
}
if (lmp->suffix2) {
sprintf(estyle,"%s/%s",arg[0],lmp->suffix2);
if (strcmp(estyle,force->pair_style) == 0) match = 1;
}
}
if (match) {
force->pair->settings(narg-1,&arg[1]);
return;
}
}
force->create_pair(arg[0],1);
if (force->pair) force->pair->settings(narg-1,&arg[1]);
}
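/* ----------------------------------------------------------------------
   illustrative effect of the matching logic above (assumed example):

     pair_style lj/cut 2.5     # creates a new lj/cut pair style
     pair_style lj/cut 3.0     # same style name: only settings() is re-applied

   with "suffix gpu" enabled, an existing "lj/cut/gpu" style also counts
   as a match for "pair_style lj/cut ..."
------------------------------------------------------------------------- */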
/* ---------------------------------------------------------------------- */
void Input::pair_write()
{
if (force->pair == NULL)
error->all(FLERR,"Pair_write command before pair_style is defined");
force->pair->write_file(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::processors()
{
if (domain->box_exist)
error->all(FLERR,"Processors command after simulation box is defined");
comm->set_processors(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::region()
{
domain->add_region(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::reset_timestep()
{
update->reset_timestep(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::restart()
{
output->create_restart(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::run_style()
{
if (domain->box_exist == 0)
error->all(FLERR,"Run_style command before simulation box is defined");
update->create_integrate(narg,arg,1);
}
/* ---------------------------------------------------------------------- */
void Input::special_bonds()
{
// store 1-3,1-4 and dihedral/extra flag values before change
// change in 1-2 coeffs will not change the special list
double lj2 = force->special_lj[2];
double lj3 = force->special_lj[3];
double coul2 = force->special_coul[2];
double coul3 = force->special_coul[3];
int angle = force->special_angle;
int dihedral = force->special_dihedral;
int extra = force->special_extra;
force->set_special(narg,arg);
// if simulation box defined and saved values changed, redo special list
if (domain->box_exist && atom->molecular == 1) {
if (lj2 != force->special_lj[2] || lj3 != force->special_lj[3] ||
coul2 != force->special_coul[2] || coul3 != force->special_coul[3] ||
angle != force->special_angle ||
dihedral != force->special_dihedral ||
extra != force->special_extra) {
Special special(lmp);
special.build();
}
}
}
/* ---------------------------------------------------------------------- */
void Input::suffix()
{
if (narg < 1) error->all(FLERR,"Illegal suffix command");
if (strcmp(arg[0],"off") == 0) lmp->suffix_enable = 0;
else if (strcmp(arg[0],"on") == 0) lmp->suffix_enable = 1;
else {
lmp->suffix_enable = 1;
delete [] lmp->suffix;
delete [] lmp->suffix2;
if (strcmp(arg[0],"hybrid") == 0) {
if (narg != 3) error->all(FLERR,"Illegal suffix command");
int n = strlen(arg[1]) + 1;
lmp->suffix = new char[n];
strcpy(lmp->suffix,arg[1]);
n = strlen(arg[2]) + 1;
lmp->suffix2 = new char[n];
strcpy(lmp->suffix2,arg[2]);
} else {
if (narg != 1) error->all(FLERR,"Illegal suffix command");
int n = strlen(arg[0]) + 1;
lmp->suffix = new char[n];
strcpy(lmp->suffix,arg[0]);
}
}
}
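/* ----------------------------------------------------------------------
   illustrative suffix usage (assumed examples, not from this patch):

     suffix gpu             # style lookups also try "<style>/gpu"
     suffix hybrid gpu omp  # sets both suffix and suffix2
     suffix off             # disables matching; suffix strings are kept
------------------------------------------------------------------------- */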
/* ---------------------------------------------------------------------- */
void Input::thermo()
{
output->set_thermo(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::thermo_modify()
{
output->thermo->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::thermo_style()
{
output->create_thermo(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::timer_command()
{
timer->modify_params(narg,arg);
}
/* ---------------------------------------------------------------------- */
void Input::timestep()
{
if (narg != 1) error->all(FLERR,"Illegal timestep command");
update->dt = force->numeric(FLERR,arg[0]);
}
/* ---------------------------------------------------------------------- */
void Input::uncompute()
{
if (narg != 1) error->all(FLERR,"Illegal uncompute command");
modify->delete_compute(arg[0]);
}
/* ---------------------------------------------------------------------- */
void Input::undump()
{
if (narg != 1) error->all(FLERR,"Illegal undump command");
output->delete_dump(arg[0]);
}
/* ---------------------------------------------------------------------- */
void Input::unfix()
{
if (narg != 1) error->all(FLERR,"Illegal unfix command");
modify->delete_fix(arg[0]);
}
/* ---------------------------------------------------------------------- */
void Input::units()
{
if (narg != 1) error->all(FLERR,"Illegal units command");
if (domain->box_exist)
error->all(FLERR,"Units command after simulation box is defined");
update->set_units(arg[0]);
}
diff --git a/src/input.h b/src/input.h
index 7f9cefe06..9165ad981 100644
--- a/src/input.h
+++ b/src/input.h
@@ -1,383 +1,384 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_INPUT_H
#define LMP_INPUT_H
#include <stdio.h>
#include "pointers.h"
#include <map>
#include <string>
namespace LAMMPS_NS {
class Input : protected Pointers {
friend class Info;
+ friend class Error;
public:
int narg; // # of command args
char **arg; // parsed args for command
class Variable *variable; // defined variables
Input(class LAMMPS *, int, char **);
~Input();
void file(); // process all input
void file(const char *); // process an input script
char *one(const char *); // process a single command
void substitute(char *&, char *&, int &, int &, int);
// substitute for variables in a string
int expand_args(int, char **, int, char **&); // expand args due to wildcard
private:
int me; // proc ID
char *command; // ptr to current command
int maxarg; // max # of args in arg
char *line,*copy,*work; // input line & copy and work string
int maxline,maxcopy,maxwork; // max lengths of char strings
int echo_screen; // 0 = no, 1 = yes
int echo_log; // 0 = no, 1 = yes
int nfile,maxfile; // current # and max # of open input files
int label_active; // 0 = no label, 1 = looking for label
char *labelstr; // label string being looked for
int jump_skip; // 1 if skipping next jump, 0 otherwise
int ifthenelse_flag; // 1 if executing commands inside an if-then-else
FILE **infiles; // list of open input files
public:
typedef void (*CommandCreator)(LAMMPS *, int, char **);
typedef std::map<std::string,CommandCreator> CommandCreatorMap;
CommandCreatorMap *command_map;
protected:
template <typename T> static void command_creator(LAMMPS *, int, char **);
private:
void parse(); // parse an input text line
char *nextword(char *, char **); // find next word in string with quotes
int numtriple(char *); // count number of triple quotes
void reallocate(char *&, int &, int); // reallocate a char string
int execute_command(); // execute a single command
void clear(); // input script commands
void echo();
void ifthenelse();
void include();
void jump();
void label();
void log();
void next_command();
void partition();
void print();
void python();
void quit();
void shell();
void variable_command();
void angle_coeff(); // LAMMPS commands
void angle_style();
void atom_modify();
void atom_style();
void bond_coeff();
void bond_style();
void bond_write();
void boundary();
void box();
void comm_modify();
void comm_style();
void compute();
void compute_modify();
void dielectric();
void dihedral_coeff();
void dihedral_style();
void dimension();
void dump();
void dump_modify();
void fix();
void fix_modify();
void group_command();
void improper_coeff();
void improper_style();
void kspace_modify();
void kspace_style();
void lattice();
void mass();
void min_modify();
void min_style();
void molecule();
void neigh_modify();
void neighbor_command();
void newton();
void package();
void pair_coeff();
void pair_modify();
void pair_style();
void pair_write();
void processors();
void region();
void reset_timestep();
void restart();
void run_style();
void special_bonds();
void suffix();
void thermo();
void thermo_modify();
void thermo_style();
void timestep();
void timer_command();
void uncompute();
void undump();
void unfix();
void units();
};
}
#endif
/* ERROR/WARNING messages:
E: Label wasn't found in input script
Self-explanatory.
E: Unknown command: %s
The command is not known to LAMMPS. Check the input script.
E: Invalid use of library file() function
This function is called through the library interface. This
error should not occur. Contact the developers if it does.
E: Cannot open input script %s
Self-explanatory.
E: Unbalanced quotes in input line
No matching end double quote was found following a leading double
quote.
E: Input line quote not followed by whitespace
An end quote must be followed by whitespace.
E: Invalid variable name
Variable name used in an input script line is invalid.
E: Invalid immediate variable
Syntax of immediate value is incorrect.
E: Substitution for illegal variable
Input script line contained a variable that could not be substituted
for.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Cannot use include command within an if command
Self-explanatory.
E: Cannot open logfile %s
The LAMMPS log file specified in the input script cannot be opened.
Check that the path and name are correct.
E: Cannot open print file %s
Self-explanatory.
W: Shell command '%s' failed with error '%s'
Self-explanatory.
W: Shell command returned with non-zero status
This may indicate the shell command did not operate as expected.
E: Angle_coeff command before simulation box is defined
The angle_coeff command cannot be used before a read_data,
read_restart, or create_box command.
E: Angle_coeff command before angle_style is defined
Coefficients cannot be set in the data file or via the angle_coeff
command until an angle_style has been assigned.
E: Angle_coeff command when no angles allowed
The chosen atom style does not allow for angles to be defined.
E: Angle_style command when no angles allowed
The chosen atom style does not allow for angles to be defined.
E: Atom_style command after simulation box is defined
The atom_style command cannot be used after a read_data,
read_restart, or create_box command.
E: Bond_coeff command before simulation box is defined
The bond_coeff command cannot be used before a read_data,
read_restart, or create_box command.
E: Bond_coeff command before bond_style is defined
Coefficients cannot be set in the data file or via the bond_coeff
command until a bond_style has been assigned.
E: Bond_coeff command when no bonds allowed
The chosen atom style does not allow for bonds to be defined.
E: Bond_style command when no bonds allowed
The chosen atom style does not allow for bonds to be defined.
E: Boundary command after simulation box is defined
The boundary command cannot be used after a read_data, read_restart,
or create_box command.
E: Box command after simulation box is defined
The box command cannot be used after a read_data, read_restart, or
create_box command.
E: Dihedral_coeff command before simulation box is defined
The dihedral_coeff command cannot be used before a read_data,
read_restart, or create_box command.
E: Dihedral_coeff command before dihedral_style is defined
Coefficients cannot be set in the data file or via the dihedral_coeff
command until a dihedral_style has been assigned.
E: Dihedral_coeff command when no dihedrals allowed
The chosen atom style does not allow for dihedrals to be defined.
E: Dihedral_style command when no dihedrals allowed
The chosen atom style does not allow for dihedrals to be defined.
E: Dimension command after simulation box is defined
The dimension command cannot be used after a read_data,
read_restart, or create_box command.
E: Improper_coeff command before simulation box is defined
The improper_coeff command cannot be used before a read_data,
read_restart, or create_box command.
E: Improper_coeff command before improper_style is defined
Coefficients cannot be set in the data file or via the improper_coeff
command until an improper_style has been assigned.
E: Improper_coeff command when no impropers allowed
The chosen atom style does not allow for impropers to be defined.
E: Improper_style command when no impropers allowed
The chosen atom style does not allow for impropers to be defined.
E: KSpace style has not yet been set
Cannot use kspace_modify command until a kspace style is set.
E: Mass command before simulation box is defined
The mass command cannot be used before a read_data, read_restart, or
create_box command.
E: Min_style command before simulation box is defined
The min_style command cannot be used before a read_data, read_restart,
or create_box command.
E: Newton bond change after simulation box is defined
The newton command cannot be used to change the newton bond value
after a read_data, read_restart, or create_box command.
E: Package command after simulation box is defined
The package command cannot be used after a read_data, read_restart, or
create_box command.
E: Package gpu command without GPU package installed
The GPU package must be installed via "make yes-gpu" before LAMMPS is
built.
E: Package kokkos command without KOKKOS package enabled
The KOKKOS package must be installed via "make yes-kokkos" before
LAMMPS is built, and the "-k on" command-line switch must be used to
enable the package.
E: Package omp command without USER-OMP package installed
The USER-OMP package must be installed via "make yes-user-omp" before
LAMMPS is built.
E: Package intel command without USER-INTEL package installed
The USER-INTEL package must be installed via "make yes-user-intel"
before LAMMPS is built.
E: Pair_coeff command before simulation box is defined
The pair_coeff command cannot be used before a read_data,
read_restart, or create_box command.
E: Pair_coeff command before pair_style is defined
Self-explanatory.
E: Pair_modify command before pair_style is defined
Self-explanatory.
E: Pair_write command before pair_style is defined
Self-explanatory.
E: Processors command after simulation box is defined
The processors command cannot be used after a read_data, read_restart,
or create_box command.
E: Run_style command before simulation box is defined
The run_style command cannot be used before a read_data,
read_restart, or create_box command.
E: Units command after simulation box is defined
The units command cannot be used after a read_data, read_restart, or
create_box command.
*/
diff --git a/src/math_extra.h b/src/math_extra.h
index 75dd49284..a67acce3c 100644
--- a/src/math_extra.h
+++ b/src/math_extra.h
@@ -1,700 +1,713 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Mike Brown (SNL)
------------------------------------------------------------------------- */
#ifndef LMP_MATH_EXTRA_H
#define LMP_MATH_EXTRA_H
#include <math.h>
#include <stdio.h>
#include <string.h>
#include "error.h"
namespace MathExtra {
// 3 vector operations
inline void copy3(const double *v, double *ans);
inline void zero3(double *v);
inline void norm3(double *v);
inline void normalize3(const double *v, double *ans);
inline void snormalize3(const double, const double *v, double *ans);
inline void negate3(double *v);
inline void scale3(double s, double *v);
inline void add3(const double *v1, const double *v2, double *ans);
inline void scaleadd3(double s, const double *v1, const double *v2,
double *ans);
inline void sub3(const double *v1, const double *v2, double *ans);
inline double len3(const double *v);
inline double lensq3(const double *v);
+ inline double distsq3(const double *v1, const double *v2);
inline double dot3(const double *v1, const double *v2);
inline void cross3(const double *v1, const double *v2, double *ans);
// 3x3 matrix operations
inline void col2mat(const double *ex, const double *ey, const double *ez,
double m[3][3]);
inline double det3(const double mat[3][3]);
inline void diag_times3(const double *d, const double m[3][3],
double ans[3][3]);
inline void times3_diag(const double m[3][3], const double *d,
double ans[3][3]);
inline void plus3(const double m[3][3], const double m2[3][3],
double ans[3][3]);
inline void times3(const double m[3][3], const double m2[3][3],
double ans[3][3]);
inline void transpose_times3(const double m[3][3], const double m2[3][3],
double ans[3][3]);
inline void times3_transpose(const double m[3][3], const double m2[3][3],
double ans[3][3]);
inline void invert3(const double mat[3][3], double ans[3][3]);
inline void matvec(const double mat[3][3], const double *vec, double *ans);
inline void matvec(const double *ex, const double *ey, const double *ez,
const double *vec, double *ans);
inline void transpose_matvec(const double mat[3][3], const double *vec,
double *ans);
inline void transpose_matvec(const double *ex, const double *ey,
const double *ez, const double *v,
double *ans);
inline void transpose_diag3(const double m[3][3], const double *d,
double ans[3][3]);
inline void vecmat(const double *v, const double m[3][3], double *ans);
inline void scalar_times3(const double f, double m[3][3]);
void write3(const double mat[3][3]);
int mldivide3(const double mat[3][3], const double *vec, double *ans);
int jacobi(double matrix[3][3], double *evalues, double evectors[3][3]);
void rotate(double matrix[3][3], int i, int j, int k, int l,
double s, double tau);
void richardson(double *q, double *m, double *w, double *moments, double dtq);
void no_squish_rotate(int k, double *p, double *q, double *inertia,
double dt);
// shape matrix operations
// upper-triangular 3x3 matrix stored in Voigt notation as 6-vector
inline void multiply_shape_shape(const double *one, const double *two,
double *ans);
// quaternion operations
inline void qnormalize(double *q);
inline void qconjugate(double *q, double *qc);
inline void vecquat(double *a, double *b, double *c);
inline void quatvec(double *a, double *b, double *c);
inline void quatquat(double *a, double *b, double *c);
inline void invquatvec(double *a, double *b, double *c);
inline void axisangle_to_quat(const double *v, const double angle,
double *quat);
void angmom_to_omega(double *m, double *ex, double *ey, double *ez,
double *idiag, double *w);
void omega_to_angmom(double *w, double *ex, double *ey, double *ez,
double *idiag, double *m);
void mq_to_omega(double *m, double *q, double *moments, double *w);
void exyz_to_q(double *ex, double *ey, double *ez, double *q);
void q_to_exyz(double *q, double *ex, double *ey, double *ez);
void quat_to_mat(const double *quat, double mat[3][3]);
void quat_to_mat_trans(const double *quat, double mat[3][3]);
// rotation operations
inline void rotation_generator_x(const double m[3][3], double ans[3][3]);
inline void rotation_generator_y(const double m[3][3], double ans[3][3]);
inline void rotation_generator_z(const double m[3][3], double ans[3][3]);
void BuildRxMatrix(double R[3][3], const double angle);
void BuildRyMatrix(double R[3][3], const double angle);
void BuildRzMatrix(double R[3][3], const double angle);
// moment of inertia operations
void inertia_ellipsoid(double *shape, double *quat, double mass,
double *inertia);
void inertia_line(double length, double theta, double mass,
double *inertia);
void inertia_triangle(double *v0, double *v1, double *v2,
double mass, double *inertia);
void inertia_triangle(double *idiag, double *quat, double mass,
double *inertia);
}
/* ----------------------------------------------------------------------
copy a vector, return in ans
------------------------------------------------------------------------- */
inline void MathExtra::copy3(const double *v, double *ans)
{
ans[0] = v[0];
ans[1] = v[1];
ans[2] = v[2];
}
/* ----------------------------------------------------------------------
set vector equal to zero
------------------------------------------------------------------------- */
inline void MathExtra::zero3(double *v)
{
v[0] = 0.0;
v[1] = 0.0;
v[2] = 0.0;
}
/* ----------------------------------------------------------------------
normalize a vector in place
------------------------------------------------------------------------- */
inline void MathExtra::norm3(double *v)
{
double scale = 1.0/sqrt(v[0]*v[0]+v[1]*v[1]+v[2]*v[2]);
v[0] *= scale;
v[1] *= scale;
v[2] *= scale;
}
/* ----------------------------------------------------------------------
normalize a vector, return in ans
------------------------------------------------------------------------- */
inline void MathExtra::normalize3(const double *v, double *ans)
{
double scale = 1.0/sqrt(v[0]*v[0]+v[1]*v[1]+v[2]*v[2]);
ans[0] = v[0]*scale;
ans[1] = v[1]*scale;
ans[2] = v[2]*scale;
}
/* ----------------------------------------------------------------------
scale a vector to length
------------------------------------------------------------------------- */
inline void MathExtra::snormalize3(const double length, const double *v,
double *ans)
{
double scale = length/sqrt(v[0]*v[0]+v[1]*v[1]+v[2]*v[2]);
ans[0] = v[0]*scale;
ans[1] = v[1]*scale;
ans[2] = v[2]*scale;
}
/* ----------------------------------------------------------------------
negate vector v
------------------------------------------------------------------------- */
inline void MathExtra::negate3(double *v)
{
v[0] = -v[0];
v[1] = -v[1];
v[2] = -v[2];
}
/* ----------------------------------------------------------------------
scale vector v by s
------------------------------------------------------------------------- */
inline void MathExtra::scale3(double s, double *v)
{
v[0] *= s;
v[1] *= s;
v[2] *= s;
}
/* ----------------------------------------------------------------------
ans = v1 + v2
------------------------------------------------------------------------- */
inline void MathExtra::add3(const double *v1, const double *v2, double *ans)
{
ans[0] = v1[0] + v2[0];
ans[1] = v1[1] + v2[1];
ans[2] = v1[2] + v2[2];
}
/* ----------------------------------------------------------------------
ans = s*v1 + v2
------------------------------------------------------------------------- */
inline void MathExtra::scaleadd3(double s, const double *v1,
const double *v2, double *ans)
{
ans[0] = s*v1[0] + v2[0];
ans[1] = s*v1[1] + v2[1];
ans[2] = s*v1[2] + v2[2];
}
/* ----------------------------------------------------------------------
ans = v1 - v2
------------------------------------------------------------------------- */
inline void MathExtra::sub3(const double *v1, const double *v2, double *ans)
{
ans[0] = v1[0] - v2[0];
ans[1] = v1[1] - v2[1];
ans[2] = v1[2] - v2[2];
}
/* ----------------------------------------------------------------------
length of vector v
------------------------------------------------------------------------- */
inline double MathExtra::len3(const double *v)
{
return sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
}
/* ----------------------------------------------------------------------
squared length of vector v, or dot product of v with itself
------------------------------------------------------------------------- */
inline double MathExtra::lensq3(const double *v)
{
return v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
}
+/* ----------------------------------------------------------------------
+ ans = distance squared between pts v1 and v2
+------------------------------------------------------------------------- */
+
+inline double MathExtra::distsq3(const double *v1, const double *v2)
+{
+ double dx = v1[0] - v2[0];
+ double dy = v1[1] - v2[1];
+ double dz = v1[2] - v2[2];
+ return dx*dx + dy*dy + dz*dz;
+}
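+
+/* example (illustrative, not part of the original change): distsq3()
+   avoids the sqrt of len3() when only a squared-cutoff comparison is
+   needed, e.g.  if (MathExtra::distsq3(xi,xj) < cutsq) ...             */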
+
/* ----------------------------------------------------------------------
dot product of 2 vectors
------------------------------------------------------------------------- */
inline double MathExtra::dot3(const double *v1, const double *v2)
{
return v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2];
}
/* ----------------------------------------------------------------------
cross product of 2 vectors
------------------------------------------------------------------------- */
inline void MathExtra::cross3(const double *v1, const double *v2, double *ans)
{
ans[0] = v1[1]*v2[2] - v1[2]*v2[1];
ans[1] = v1[2]*v2[0] - v1[0]*v2[2];
ans[2] = v1[0]*v2[1] - v1[1]*v2[0];
}
/* ----------------------------------------------------------------------
construct matrix from 3 column vectors
------------------------------------------------------------------------- */
void MathExtra::col2mat(const double *ex, const double *ey, const double *ez,
double m[3][3])
{
m[0][0] = ex[0];
m[1][0] = ex[1];
m[2][0] = ex[2];
m[0][1] = ey[0];
m[1][1] = ey[1];
m[2][1] = ey[2];
m[0][2] = ez[0];
m[1][2] = ez[1];
m[2][2] = ez[2];
}
/* ----------------------------------------------------------------------
determinant of a matrix
------------------------------------------------------------------------- */
inline double MathExtra::det3(const double m[3][3])
{
double ans = m[0][0]*m[1][1]*m[2][2] - m[0][0]*m[1][2]*m[2][1] -
m[1][0]*m[0][1]*m[2][2] + m[1][0]*m[0][2]*m[2][1] +
m[2][0]*m[0][1]*m[1][2] - m[2][0]*m[0][2]*m[1][1];
return ans;
}
/* ----------------------------------------------------------------------
diagonal matrix times a full matrix
------------------------------------------------------------------------- */
inline void MathExtra::diag_times3(const double *d, const double m[3][3],
double ans[3][3])
{
ans[0][0] = d[0]*m[0][0];
ans[0][1] = d[0]*m[0][1];
ans[0][2] = d[0]*m[0][2];
ans[1][0] = d[1]*m[1][0];
ans[1][1] = d[1]*m[1][1];
ans[1][2] = d[1]*m[1][2];
ans[2][0] = d[2]*m[2][0];
ans[2][1] = d[2]*m[2][1];
ans[2][2] = d[2]*m[2][2];
}
/* ----------------------------------------------------------------------
full matrix times a diagonal matrix
------------------------------------------------------------------------- */
void MathExtra::times3_diag(const double m[3][3], const double *d,
double ans[3][3])
{
ans[0][0] = m[0][0]*d[0];
ans[0][1] = m[0][1]*d[1];
ans[0][2] = m[0][2]*d[2];
ans[1][0] = m[1][0]*d[0];
ans[1][1] = m[1][1]*d[1];
ans[1][2] = m[1][2]*d[2];
ans[2][0] = m[2][0]*d[0];
ans[2][1] = m[2][1]*d[1];
ans[2][2] = m[2][2]*d[2];
}
/* ----------------------------------------------------------------------
add two matrices
------------------------------------------------------------------------- */
inline void MathExtra::plus3(const double m[3][3], const double m2[3][3],
double ans[3][3])
{
ans[0][0] = m[0][0]+m2[0][0];
ans[0][1] = m[0][1]+m2[0][1];
ans[0][2] = m[0][2]+m2[0][2];
ans[1][0] = m[1][0]+m2[1][0];
ans[1][1] = m[1][1]+m2[1][1];
ans[1][2] = m[1][2]+m2[1][2];
ans[2][0] = m[2][0]+m2[2][0];
ans[2][1] = m[2][1]+m2[2][1];
ans[2][2] = m[2][2]+m2[2][2];
}
/* ----------------------------------------------------------------------
multiply mat1 times mat2
------------------------------------------------------------------------- */
inline void MathExtra::times3(const double m[3][3], const double m2[3][3],
double ans[3][3])
{
ans[0][0] = m[0][0]*m2[0][0] + m[0][1]*m2[1][0] + m[0][2]*m2[2][0];
ans[0][1] = m[0][0]*m2[0][1] + m[0][1]*m2[1][1] + m[0][2]*m2[2][1];
ans[0][2] = m[0][0]*m2[0][2] + m[0][1]*m2[1][2] + m[0][2]*m2[2][2];
ans[1][0] = m[1][0]*m2[0][0] + m[1][1]*m2[1][0] + m[1][2]*m2[2][0];
ans[1][1] = m[1][0]*m2[0][1] + m[1][1]*m2[1][1] + m[1][2]*m2[2][1];
ans[1][2] = m[1][0]*m2[0][2] + m[1][1]*m2[1][2] + m[1][2]*m2[2][2];
ans[2][0] = m[2][0]*m2[0][0] + m[2][1]*m2[1][0] + m[2][2]*m2[2][0];
ans[2][1] = m[2][0]*m2[0][1] + m[2][1]*m2[1][1] + m[2][2]*m2[2][1];
ans[2][2] = m[2][0]*m2[0][2] + m[2][1]*m2[1][2] + m[2][2]*m2[2][2];
}
/* ----------------------------------------------------------------------
multiply the transpose of mat1 times mat2
------------------------------------------------------------------------- */
inline void MathExtra::transpose_times3(const double m[3][3],
const double m2[3][3],double ans[3][3])
{
ans[0][0] = m[0][0]*m2[0][0] + m[1][0]*m2[1][0] + m[2][0]*m2[2][0];
ans[0][1] = m[0][0]*m2[0][1] + m[1][0]*m2[1][1] + m[2][0]*m2[2][1];
ans[0][2] = m[0][0]*m2[0][2] + m[1][0]*m2[1][2] + m[2][0]*m2[2][2];
ans[1][0] = m[0][1]*m2[0][0] + m[1][1]*m2[1][0] + m[2][1]*m2[2][0];
ans[1][1] = m[0][1]*m2[0][1] + m[1][1]*m2[1][1] + m[2][1]*m2[2][1];
ans[1][2] = m[0][1]*m2[0][2] + m[1][1]*m2[1][2] + m[2][1]*m2[2][2];
ans[2][0] = m[0][2]*m2[0][0] + m[1][2]*m2[1][0] + m[2][2]*m2[2][0];
ans[2][1] = m[0][2]*m2[0][1] + m[1][2]*m2[1][1] + m[2][2]*m2[2][1];
ans[2][2] = m[0][2]*m2[0][2] + m[1][2]*m2[1][2] + m[2][2]*m2[2][2];
}
/* ----------------------------------------------------------------------
multiply mat1 times transpose of mat2
------------------------------------------------------------------------- */
inline void MathExtra::times3_transpose(const double m[3][3],
const double m2[3][3],double ans[3][3])
{
ans[0][0] = m[0][0]*m2[0][0] + m[0][1]*m2[0][1] + m[0][2]*m2[0][2];
ans[0][1] = m[0][0]*m2[1][0] + m[0][1]*m2[1][1] + m[0][2]*m2[1][2];
ans[0][2] = m[0][0]*m2[2][0] + m[0][1]*m2[2][1] + m[0][2]*m2[2][2];
ans[1][0] = m[1][0]*m2[0][0] + m[1][1]*m2[0][1] + m[1][2]*m2[0][2];
ans[1][1] = m[1][0]*m2[1][0] + m[1][1]*m2[1][1] + m[1][2]*m2[1][2];
ans[1][2] = m[1][0]*m2[2][0] + m[1][1]*m2[2][1] + m[1][2]*m2[2][2];
ans[2][0] = m[2][0]*m2[0][0] + m[2][1]*m2[0][1] + m[2][2]*m2[0][2];
ans[2][1] = m[2][0]*m2[1][0] + m[2][1]*m2[1][1] + m[2][2]*m2[1][2];
ans[2][2] = m[2][0]*m2[2][0] + m[2][1]*m2[2][1] + m[2][2]*m2[2][2];
}
/* ----------------------------------------------------------------------
invert a matrix
does NOT check for singular or badly scaled matrix
------------------------------------------------------------------------- */
inline void MathExtra::invert3(const double m[3][3], double ans[3][3])
{
double den = m[0][0]*m[1][1]*m[2][2]-m[0][0]*m[1][2]*m[2][1];
den += -m[1][0]*m[0][1]*m[2][2]+m[1][0]*m[0][2]*m[2][1];
den += m[2][0]*m[0][1]*m[1][2]-m[2][0]*m[0][2]*m[1][1];
ans[0][0] = (m[1][1]*m[2][2]-m[1][2]*m[2][1]) / den;
ans[0][1] = -(m[0][1]*m[2][2]-m[0][2]*m[2][1]) / den;
ans[0][2] = (m[0][1]*m[1][2]-m[0][2]*m[1][1]) / den;
ans[1][0] = -(m[1][0]*m[2][2]-m[1][2]*m[2][0]) / den;
ans[1][1] = (m[0][0]*m[2][2]-m[0][2]*m[2][0]) / den;
ans[1][2] = -(m[0][0]*m[1][2]-m[0][2]*m[1][0]) / den;
ans[2][0] = (m[1][0]*m[2][1]-m[1][1]*m[2][0]) / den;
ans[2][1] = -(m[0][0]*m[2][1]-m[0][1]*m[2][0]) / den;
ans[2][2] = (m[0][0]*m[1][1]-m[0][1]*m[1][0]) / den;
}
/* ----------------------------------------------------------------------
matrix times vector
------------------------------------------------------------------------- */
inline void MathExtra::matvec(const double m[3][3], const double *v,
double *ans)
{
ans[0] = m[0][0]*v[0] + m[0][1]*v[1] + m[0][2]*v[2];
ans[1] = m[1][0]*v[0] + m[1][1]*v[1] + m[1][2]*v[2];
ans[2] = m[2][0]*v[0] + m[2][1]*v[1] + m[2][2]*v[2];
}
/* ----------------------------------------------------------------------
matrix times vector
------------------------------------------------------------------------- */
inline void MathExtra::matvec(const double *ex, const double *ey,
const double *ez, const double *v, double *ans)
{
ans[0] = ex[0]*v[0] + ey[0]*v[1] + ez[0]*v[2];
ans[1] = ex[1]*v[0] + ey[1]*v[1] + ez[1]*v[2];
ans[2] = ex[2]*v[0] + ey[2]*v[1] + ez[2]*v[2];
}
/* ----------------------------------------------------------------------
transposed matrix times vector
------------------------------------------------------------------------- */
inline void MathExtra::transpose_matvec(const double m[3][3], const double *v,
double *ans)
{
ans[0] = m[0][0]*v[0] + m[1][0]*v[1] + m[2][0]*v[2];
ans[1] = m[0][1]*v[0] + m[1][1]*v[1] + m[2][1]*v[2];
ans[2] = m[0][2]*v[0] + m[1][2]*v[1] + m[2][2]*v[2];
}
/* ----------------------------------------------------------------------
transposed matrix times vector
------------------------------------------------------------------------- */
inline void MathExtra::transpose_matvec(const double *ex, const double *ey,
const double *ez, const double *v,
double *ans)
{
ans[0] = ex[0]*v[0] + ex[1]*v[1] + ex[2]*v[2];
ans[1] = ey[0]*v[0] + ey[1]*v[1] + ey[2]*v[2];
ans[2] = ez[0]*v[0] + ez[1]*v[1] + ez[2]*v[2];
}
/* ----------------------------------------------------------------------
transposed matrix times diagonal matrix
------------------------------------------------------------------------- */
inline void MathExtra::transpose_diag3(const double m[3][3], const double *d,
double ans[3][3])
{
ans[0][0] = m[0][0]*d[0];
ans[0][1] = m[1][0]*d[1];
ans[0][2] = m[2][0]*d[2];
ans[1][0] = m[0][1]*d[0];
ans[1][1] = m[1][1]*d[1];
ans[1][2] = m[2][1]*d[2];
ans[2][0] = m[0][2]*d[0];
ans[2][1] = m[1][2]*d[1];
ans[2][2] = m[2][2]*d[2];
}
/* ----------------------------------------------------------------------
row vector times matrix
------------------------------------------------------------------------- */
inline void MathExtra::vecmat(const double *v, const double m[3][3],
double *ans)
{
ans[0] = v[0]*m[0][0] + v[1]*m[1][0] + v[2]*m[2][0];
ans[1] = v[0]*m[0][1] + v[1]*m[1][1] + v[2]*m[2][1];
ans[2] = v[0]*m[0][2] + v[1]*m[1][2] + v[2]*m[2][2];
}
/* ----------------------------------------------------------------------
matrix times scalar, in place
------------------------------------------------------------------------- */
inline void MathExtra::scalar_times3(const double f, double m[3][3])
{
m[0][0] *= f; m[0][1] *= f; m[0][2] *= f;
m[1][0] *= f; m[1][1] *= f; m[1][2] *= f;
m[2][0] *= f; m[2][1] *= f; m[2][2] *= f;
}
/* ----------------------------------------------------------------------
multiply 2 shape matrices
upper-triangular 3x3, stored as 6-vector in Voigt notation
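Voigt index order: 0,1,2 = diagonal, 3 = (1,2), 4 = (0,2), 5 = (0,1) elements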
------------------------------------------------------------------------- */
inline void MathExtra::multiply_shape_shape(const double *one,
const double *two, double *ans)
{
ans[0] = one[0]*two[0];
ans[1] = one[1]*two[1];
ans[2] = one[2]*two[2];
ans[3] = one[1]*two[3] + one[3]*two[2];
ans[4] = one[0]*two[4] + one[5]*two[3] + one[4]*two[2];
ans[5] = one[0]*two[5] + one[5]*two[1];
}
/* ----------------------------------------------------------------------
normalize a quaternion
------------------------------------------------------------------------- */
inline void MathExtra::qnormalize(double *q)
{
double norm = 1.0 / sqrt(q[0]*q[0] + q[1]*q[1] + q[2]*q[2] + q[3]*q[3]);
q[0] *= norm;
q[1] *= norm;
q[2] *= norm;
q[3] *= norm;
}
/* ----------------------------------------------------------------------
conjugate of a quaternion: qc = conjugate of q
assume q is of unit length
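for a unit quaternion the conjugate equals the inverse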
------------------------------------------------------------------------- */
inline void MathExtra::qconjugate(double *q, double *qc)
{
qc[0] = q[0];
qc[1] = -q[1];
qc[2] = -q[2];
qc[3] = -q[3];
}
/* ----------------------------------------------------------------------
vector-quaternion multiply: c = a*b, where a = (0,a)
------------------------------------------------------------------------- */
inline void MathExtra::vecquat(double *a, double *b, double *c)
{
c[0] = -a[0]*b[1] - a[1]*b[2] - a[2]*b[3];
c[1] = b[0]*a[0] + a[1]*b[3] - a[2]*b[2];
c[2] = b[0]*a[1] + a[2]*b[1] - a[0]*b[3];
c[3] = b[0]*a[2] + a[0]*b[2] - a[1]*b[1];
}
/* ----------------------------------------------------------------------
quaternion-vector multiply: c = a*b, where b = (0,b)
------------------------------------------------------------------------- */
inline void MathExtra::quatvec(double *a, double *b, double *c)
{
c[0] = -a[1]*b[0] - a[2]*b[1] - a[3]*b[2];
c[1] = a[0]*b[0] + a[2]*b[2] - a[3]*b[1];
c[2] = a[0]*b[1] + a[3]*b[0] - a[1]*b[2];
c[3] = a[0]*b[2] + a[1]*b[1] - a[2]*b[0];
}
/* ----------------------------------------------------------------------
quaternion-quaternion multiply: c = a*b
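quaternions are stored as (w,x,y,z): element 0 = scalar part, elements 1-3 = vector part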
------------------------------------------------------------------------- */
inline void MathExtra::quatquat(double *a, double *b, double *c)
{
c[0] = a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3];
c[1] = a[0]*b[1] + b[0]*a[1] + a[2]*b[3] - a[3]*b[2];
c[2] = a[0]*b[2] + b[0]*a[2] + a[3]*b[1] - a[1]*b[3];
c[3] = a[0]*b[3] + b[0]*a[3] + a[1]*b[2] - a[2]*b[1];
}
/* ----------------------------------------------------------------------
quaternion multiply: c = inv(a)*b
a is a quaternion
b is a four component vector
c is a three component vector
------------------------------------------------------------------------- */
inline void MathExtra::invquatvec(double *a, double *b, double *c)
{
c[0] = -a[1]*b[0] + a[0]*b[1] + a[3]*b[2] - a[2]*b[3];
c[1] = -a[2]*b[0] - a[3]*b[1] + a[0]*b[2] + a[1]*b[3];
c[2] = -a[3]*b[0] + a[2]*b[1] - a[1]*b[2] + a[0]*b[3];
}
/* ----------------------------------------------------------------------
compute quaternion from axis-angle rotation
v MUST be a unit vector
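result is quat = (cos(angle/2), sin(angle/2)*v)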
------------------------------------------------------------------------- */
inline void MathExtra::axisangle_to_quat(const double *v, const double angle,
double *quat)
{
double halfa = 0.5*angle;
double sina = sin(halfa);
quat[0] = cos(halfa);
quat[1] = v[0]*sina;
quat[2] = v[1]*sina;
quat[3] = v[2]*sina;
}
/* ----------------------------------------------------------------------
Apply principal rotation generator about x to rotation matrix m
------------------------------------------------------------------------- */
inline void MathExtra::rotation_generator_x(const double m[3][3],
double ans[3][3])
{
ans[0][0] = 0;
ans[0][1] = -m[0][2];
ans[0][2] = m[0][1];
ans[1][0] = 0;
ans[1][1] = -m[1][2];
ans[1][2] = m[1][1];
ans[2][0] = 0;
ans[2][1] = -m[2][2];
ans[2][2] = m[2][1];
}
/* ----------------------------------------------------------------------
Apply principal rotation generator about y to rotation matrix m
------------------------------------------------------------------------- */
inline void MathExtra::rotation_generator_y(const double m[3][3],
double ans[3][3])
{
ans[0][0] = m[0][2];
ans[0][1] = 0;
ans[0][2] = -m[0][0];
ans[1][0] = m[1][2];
ans[1][1] = 0;
ans[1][2] = -m[1][0];
ans[2][0] = m[2][2];
ans[2][1] = 0;
ans[2][2] = -m[2][0];
}
/* ----------------------------------------------------------------------
Apply principal rotation generator about z to rotation matrix m
------------------------------------------------------------------------- */
inline void MathExtra::rotation_generator_z(const double m[3][3],
double ans[3][3])
{
ans[0][0] = -m[0][1];
ans[0][1] = m[0][0];
ans[0][2] = 0;
ans[1][0] = -m[1][1];
ans[1][1] = m[1][0];
ans[1][2] = 0;
ans[2][0] = -m[2][1];
ans[2][1] = m[2][0];
ans[2][2] = 0;
}
#endif
diff --git a/src/neighbor.cpp b/src/neighbor.cpp
index af5939120..834bdffb7 100644
--- a/src/neighbor.cpp
+++ b/src/neighbor.cpp
@@ -1,2074 +1,2076 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic and multi-neigh) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "style_nbin.h"
#include "style_nstencil.h"
#include "style_npair.h"
#include "style_ntopo.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "pair.h"
#include "domain.h"
#include "group.h"
#include "modify.h"
#include "fix.h"
#include "compute.h"
#include "update.h"
#include "respa.h"
#include "output.h"
#include "citeme.h"
#include "memory.h"
#include "error.h"
#include <map>
using namespace LAMMPS_NS;
using namespace NeighConst;
#define RQDELTA 1
#define EXDELTA 1
#define BIG 1.0e20
enum{NSQ,BIN,MULTI}; // also in NBin, NeighList, NStencil
enum{NONE,ALL,PARTIAL,TEMPLATE};
static const char cite_neigh_multi[] =
"neighbor multi command:\n\n"
"@Article{Intveld08,\n"
" author = {P.{\\,}J.~in{\\,}'t~Veld and S.{\\,}J.~Plimpton"
" and G.{\\,}S.~Grest},\n"
" title = {Accurate and Efficient Methods for Modeling Colloidal\n"
" Mixtures in an Explicit Solvent using Molecular Dynamics},\n"
" journal = {Comp.~Phys.~Comm.},\n"
" year = 2008,\n"
" volume = 179,\n"
" pages = {320--329}\n"
"}\n\n";
//#define NEIGH_LIST_DEBUG 1
/* ---------------------------------------------------------------------- */
Neighbor::Neighbor(LAMMPS *lmp) : Pointers(lmp),
pairclass(NULL), pairnames(NULL), pairmasks(NULL)
{
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
firsttime = 1;
style = BIN;
every = 1;
delay = 10;
dist_check = 1;
pgsize = 100000;
oneatom = 2000;
binsizeflag = 0;
build_once = 0;
cluster_check = 0;
ago = -1;
cutneighmax = 0.0;
cutneighsq = NULL;
cutneighghostsq = NULL;
cuttype = NULL;
cuttypesq = NULL;
fixchecklist = NULL;
// pairwise neighbor lists and associated data structs
nlist = 0;
lists = NULL;
nbin = 0;
neigh_bin = NULL;
nstencil = 0;
neigh_stencil = NULL;
neigh_pair = NULL;
nstencil_perpetual = 0;
slist = NULL;
npair_perpetual = 0;
plist = NULL;
nrequest = maxrequest = 0;
requests = NULL;
old_nrequest = 0;
old_requests = NULL;
old_style = style;
old_triclinic = 0;
old_pgsize = pgsize;
old_oneatom = oneatom;
zeroes = NULL;
binclass = NULL;
binnames = NULL;
binmasks = NULL;
stencilclass = NULL;
stencilnames = NULL;
stencilmasks = NULL;
// topology lists
bondwhich = anglewhich = dihedralwhich = improperwhich = NONE;
neigh_bond = NULL;
neigh_angle = NULL;
neigh_dihedral = NULL;
neigh_improper = NULL;
// coords at last neighboring
maxhold = 0;
xhold = NULL;
lastcall = -1;
last_setup_bins = -1;
// pair exclusion list info
includegroup = 0;
nex_type = maxex_type = 0;
ex1_type = ex2_type = NULL;
ex_type = NULL;
nex_group = maxex_group = 0;
ex1_group = ex2_group = ex1_bit = ex2_bit = NULL;
nex_mol = maxex_mol = 0;
ex_mol_group = ex_mol_bit = NULL;
// Kokkos setting
copymode = 0;
}
/* ---------------------------------------------------------------------- */
Neighbor::~Neighbor()
{
if (copymode) return;
memory->destroy(cutneighsq);
memory->destroy(cutneighghostsq);
delete [] cuttype;
delete [] cuttypesq;
delete [] fixchecklist;
for (int i = 0; i < nlist; i++) delete lists[i];
for (int i = 0; i < nbin; i++) delete neigh_bin[i];
for (int i = 0; i < nstencil; i++) delete neigh_stencil[i];
for (int i = 0; i < nlist; i++) delete neigh_pair[i];
delete [] lists;
delete [] neigh_bin;
delete [] neigh_stencil;
delete [] neigh_pair;
delete [] slist;
delete [] plist;
for (int i = 0; i < nrequest; i++) delete requests[i];
memory->sfree(requests);
for (int i = 0; i < old_nrequest; i++) delete old_requests[i];
memory->sfree(old_requests);
delete [] zeroes;
delete [] binclass;
delete [] binnames;
delete [] binmasks;
delete [] stencilclass;
delete [] stencilnames;
delete [] stencilmasks;
delete [] pairclass;
delete [] pairnames;
delete [] pairmasks;
delete neigh_bond;
delete neigh_angle;
delete neigh_dihedral;
delete neigh_improper;
memory->destroy(xhold);
memory->destroy(ex1_type);
memory->destroy(ex2_type);
memory->destroy(ex_type);
memory->destroy(ex1_group);
memory->destroy(ex2_group);
delete [] ex1_bit;
delete [] ex2_bit;
memory->destroy(ex_mol_group);
delete [] ex_mol_bit;
}
/* ---------------------------------------------------------------------- */
void Neighbor::init()
{
int i,j,n;
ncalls = ndanger = 0;
dimension = domain->dimension;
triclinic = domain->triclinic;
newton_pair = force->newton_pair;
// error check
if (delay > 0 && (delay % every) != 0)
error->all(FLERR,"Neighbor delay must be 0 or multiple of every setting");
if (pgsize < 10*oneatom)
error->all(FLERR,"Neighbor page size must be >= 10x the one atom setting");
// ------------------------------------------------------------------
// settings
// bbox lo/hi ptrs = bounding box of entire domain, stored by Domain
if (triclinic == 0) {
bboxlo = domain->boxlo;
bboxhi = domain->boxhi;
} else {
bboxlo = domain->boxlo_bound;
bboxhi = domain->boxhi_bound;
}
// set neighbor cutoffs (force cutoff + skin)
// trigger determines when atoms migrate and neighbor lists are rebuilt
// needs to be non-zero for migration distance check
// even if pair = NULL and no neighbor lists are used
// cutneigh = force cutoff + skin if cutforce > 0, else cutneigh = 0
// cutneighghost = pair cutghost if it requests it, else same as cutneigh
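// e.g. a 10.0 pair cutoff with skin = 2.0 gives cutneigh = 12.0 and a rebuild trigger of skin/2 = 1.0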
triggersq = 0.25*skin*skin;
boxcheck = 0;
if (domain->box_change && (domain->xperiodic || domain->yperiodic ||
(dimension == 3 && domain->zperiodic)))
boxcheck = 1;
n = atom->ntypes;
if (cutneighsq == NULL) {
if (lmp->kokkos) init_cutneighsq_kokkos(n);
else memory->create(cutneighsq,n+1,n+1,"neigh:cutneighsq");
memory->create(cutneighghostsq,n+1,n+1,"neigh:cutneighghostsq");
cuttype = new double[n+1];
cuttypesq = new double[n+1];
}
double cutoff,delta,cut;
cutneighmin = BIG;
cutneighmax = 0.0;
for (i = 1; i <= n; i++) {
cuttype[i] = cuttypesq[i] = 0.0;
for (j = 1; j <= n; j++) {
if (force->pair) cutoff = sqrt(force->pair->cutsq[i][j]);
else cutoff = 0.0;
if (cutoff > 0.0) delta = skin;
else delta = 0.0;
cut = cutoff + delta;
cutneighsq[i][j] = cut*cut;
cuttype[i] = MAX(cuttype[i],cut);
cuttypesq[i] = MAX(cuttypesq[i],cut*cut);
cutneighmin = MIN(cutneighmin,cut);
cutneighmax = MAX(cutneighmax,cut);
if (force->pair && force->pair->ghostneigh) {
cut = force->pair->cutghost[i][j] + skin;
cutneighghostsq[i][j] = cut*cut;
} else cutneighghostsq[i][j] = cut*cut;
}
}
cutneighmaxsq = cutneighmax * cutneighmax;
// rRESPA cutoffs
int respa = 0;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
}
if (respa) {
double *cut_respa = ((Respa *) update->integrate)->cutoff;
cut_inner_sq = (cut_respa[1] + skin) * (cut_respa[1] + skin);
cut_middle_sq = (cut_respa[3] + skin) * (cut_respa[3] + skin);
cut_middle_inside_sq = (cut_respa[0] - skin) * (cut_respa[0] - skin);
if (cut_respa[0]-skin < 0) cut_middle_inside_sq = 0.0;
}
// fixchecklist = other classes that can induce reneighboring in decide()
restart_check = 0;
if (output->restart_flag) restart_check = 1;
delete [] fixchecklist;
fixchecklist = NULL;
fixchecklist = new int[modify->nfix];
fix_check = 0;
for (i = 0; i < modify->nfix; i++)
if (modify->fix[i]->force_reneighbor)
fixchecklist[fix_check++] = i;
must_check = 0;
if (restart_check || fix_check) must_check = 1;
// set special_flag for 1-2, 1-3, 1-4 neighbors
// flag[0] is not used, flag[1] = 1-2, flag[2] = 1-3, flag[3] = 1-4
// flag = 0 if both LJ/Coulomb special values are 0.0
// flag = 1 if both LJ/Coulomb special values are 1.0
// flag = 2 otherwise or if KSpace solver is enabled
// pairwise portion of KSpace solver uses all 1-2,1-3,1-4 neighbors
// or selected Coulomb-approximation pair styles require it
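// e.g. if both the LJ and Coulomb 1-4 factors are 0.5, special_flag[3] = 2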
if (force->special_lj[1] == 0.0 && force->special_coul[1] == 0.0)
special_flag[1] = 0;
else if (force->special_lj[1] == 1.0 && force->special_coul[1] == 1.0)
special_flag[1] = 1;
else special_flag[1] = 2;
if (force->special_lj[2] == 0.0 && force->special_coul[2] == 0.0)
special_flag[2] = 0;
else if (force->special_lj[2] == 1.0 && force->special_coul[2] == 1.0)
special_flag[2] = 1;
else special_flag[2] = 2;
if (force->special_lj[3] == 0.0 && force->special_coul[3] == 0.0)
special_flag[3] = 0;
else if (force->special_lj[3] == 1.0 && force->special_coul[3] == 1.0)
special_flag[3] = 1;
else special_flag[3] = 2;
if (force->kspace || force->pair_match("coul/wolf",0) ||
force->pair_match("coul/dsf",0) || force->pair_match("thole",0))
special_flag[1] = special_flag[2] = special_flag[3] = 2;
// maxwt = max multiplicative factor on atom indices stored in neigh list
maxwt = 0;
if (special_flag[1] == 2) maxwt = 2;
if (special_flag[2] == 2) maxwt = 3;
if (special_flag[3] == 2) maxwt = 4;
// ------------------------------------------------------------------
// xhold array
// free if not needed for this run
if (dist_check == 0) {
memory->destroy(xhold);
maxhold = 0;
xhold = NULL;
}
// first time allocation
if (dist_check) {
if (maxhold == 0) {
maxhold = atom->nmax;
memory->create(xhold,maxhold,3,"neigh:xhold");
}
}
// ------------------------------------------------------------------
// exclusion lists
// depend on type, group, molecule settings from neigh_modify
// warn if exclusions used with KSpace solver
n = atom->ntypes;
if (nex_type == 0 && nex_group == 0 && nex_mol == 0) exclude = 0;
else exclude = 1;
if (nex_type) {
if (lmp->kokkos)
init_ex_type_kokkos(n);
else {
memory->destroy(ex_type);
memory->create(ex_type,n+1,n+1,"neigh:ex_type");
}
for (i = 1; i <= n; i++)
for (j = 1; j <= n; j++)
ex_type[i][j] = 0;
for (i = 0; i < nex_type; i++) {
if (ex1_type[i] <= 0 || ex1_type[i] > n ||
ex2_type[i] <= 0 || ex2_type[i] > n)
error->all(FLERR,"Invalid atom type in neighbor exclusion list");
ex_type[ex1_type[i]][ex2_type[i]] = 1;
ex_type[ex2_type[i]][ex1_type[i]] = 1;
}
}
if (nex_group) {
if (lmp->kokkos)
init_ex_bit_kokkos();
else {
delete [] ex1_bit;
delete [] ex2_bit;
ex1_bit = new int[nex_group];
ex2_bit = new int[nex_group];
}
for (i = 0; i < nex_group; i++) {
ex1_bit[i] = group->bitmask[ex1_group[i]];
ex2_bit[i] = group->bitmask[ex2_group[i]];
}
}
if (nex_mol) {
if (lmp->kokkos)
init_ex_mol_bit_kokkos();
else {
delete [] ex_mol_bit;
ex_mol_bit = new int[nex_mol];
}
for (i = 0; i < nex_mol; i++)
ex_mol_bit[i] = group->bitmask[ex_mol_group[i]];
}
if (exclude && force->kspace && me == 0)
error->warning(FLERR,"Neighbor exclusions used with KSpace solver "
"may give inconsistent Coulombic energies");
// ------------------------------------------------------------------
// create pairwise lists
// one-time call to init_styles() to scan style files and setup
// init_pair() creates auxiliary classes: NBin, NStencil, NPair
if (firsttime) init_styles();
firsttime = 0;
init_pair();
// invoke copy_neighbor_info() in Bin,Stencil,Pair classes
// copied once per run in case any cutoff, exclusion, special info changed
for (i = 0; i < nbin; i++) neigh_bin[i]->copy_neighbor_info();
for (i = 0; i < nstencil; i++) neigh_stencil[i]->copy_neighbor_info();
for (i = 0; i < nlist; i++)
if (neigh_pair[i]) neigh_pair[i]->copy_neighbor_info();
if (!same && comm->me == 0) print_pairwise_info();
requests_new2old();
// ------------------------------------------------------------------
// create topology lists
// instantiated topo styles can change from run to run
init_topology();
}
/* ----------------------------------------------------------------------
create and initialize lists of Nbin, Nstencil, NPair classes
lists have info on all classes in 3 style*.h files
cannot do this in constructor, b/c too early to instantiate classes
------------------------------------------------------------------------- */
void Neighbor::init_styles()
{
// extract info from NBin classes listed in style_nbin.h
nbclass = 0;
#define NBIN_CLASS
#define NBinStyle(key,Class,bitmasks) nbclass++;
#include "style_nbin.h"
#undef NBinStyle
#undef NBIN_CLASS
binclass = new BinCreator[nbclass];
binnames = new char*[nbclass];
binmasks = new int[nbclass];
nbclass = 0;
#define NBIN_CLASS
#define NBinStyle(key,Class,bitmasks) \
binnames[nbclass] = (char *) #key; \
binclass[nbclass] = &bin_creator<Class>; \
binmasks[nbclass++] = bitmasks;
#include "style_nbin.h"
#undef NBinStyle
#undef NBIN_CLASS
// extract info from NStencil classes listed in style_nstencil.h
nsclass = 0;
#define NSTENCIL_CLASS
#define NStencilStyle(key,Class,bitmasks) nsclass++;
#include "style_nstencil.h"
#undef NStencilStyle
#undef NSTENCIL_CLASS
stencilclass = new StencilCreator[nsclass];
stencilnames = new char*[nsclass];
stencilmasks = new int[nsclass];
nsclass = 0;
#define NSTENCIL_CLASS
#define NStencilStyle(key,Class,bitmasks) \
stencilnames[nsclass] = (char *) #key; \
stencilclass[nsclass] = &stencil_creator<Class>; \
stencilmasks[nsclass++] = bitmasks;
#include "style_nstencil.h"
#undef NStencilStyle
#undef NSTENCIL_CLASS
// extract info from NPair classes listed in style_npair.h
npclass = 0;
#define NPAIR_CLASS
#define NPairStyle(key,Class,bitmasks) npclass++;
#include "style_npair.h"
#undef NPairStyle
#undef NPAIR_CLASS
pairclass = new PairCreator[npclass];
pairnames = new char*[npclass];
pairmasks = new int[npclass];
npclass = 0;
#define NPAIR_CLASS
#define NPairStyle(key,Class,bitmasks) \
pairnames[npclass] = (char *) #key; \
pairclass[npclass] = &pair_creator<Class>; \
pairmasks[npclass++] = bitmasks;
#include "style_npair.h"
#undef NPairStyle
#undef NPAIR_CLASS
}
/* ----------------------------------------------------------------------
create and initialize NPair classes
------------------------------------------------------------------------- */
void Neighbor::init_pair()
{
int i,j,k,m;
// test if pairwise lists need to be re-created
// no need to re-create if:
// neigh style, triclinic, pgsize, oneatom have not changed
// current requests = old requests
// first archive request params for current requests
// before possibly changing them below
for (i = 0; i < nrequest; i++) requests[i]->archive();
same = 1;
if (style != old_style) same = 0;
if (triclinic != old_triclinic) same = 0;
if (pgsize != old_pgsize) same = 0;
if (oneatom != old_oneatom) same = 0;
if (nrequest != old_nrequest) same = 0;
else
for (i = 0; i < nrequest; i++)
if (requests[i]->identical(old_requests[i]) == 0) same = 0;
#ifdef NEIGH_LIST_DEBUG
if (comm->me == 0) printf("SAME flag %d\n",same);
#endif
if (same) return;
// delete old lists and create new ones
for (i = 0; i < nlist; i++) delete lists[i];
for (i = 0; i < nbin; i++) delete neigh_bin[i];
for (i = 0; i < nstencil; i++) delete neigh_stencil[i];
for (i = 0; i < nlist; i++) delete neigh_pair[i];
delete [] lists;
delete [] neigh_bin;
delete [] neigh_stencil;
delete [] neigh_pair;
nlist = nrequest;
lists = new NeighList*[nrequest];
neigh_bin = new NBin*[nrequest];
neigh_stencil = new NStencil*[nrequest];
neigh_pair = new NPair*[nrequest];
// create individual lists, one per request
// pass list ptr back to requestor (except for Command class)
// wait to allocate initial pages until copy lists are detected
for (i = 0; i < nrequest; i++) {
if (requests[i]->kokkos_host || requests[i]->kokkos_device)
create_kokkos_list(i);
else
lists[i] = new NeighList(lmp);
lists[i]->index = i;
if (requests[i]->pair) {
Pair *pair = (Pair *) requests[i]->requestor;
pair->init_list(requests[i]->id,lists[i]);
} else if (requests[i]->fix) {
Fix *fix = (Fix *) requests[i]->requestor;
fix->init_list(requests[i]->id,lists[i]);
} else if (requests[i]->compute) {
Compute *compute = (Compute *) requests[i]->requestor;
compute->init_list(requests[i]->id,lists[i]);
}
}
// morph requests via A,B,C rules
// this is to avoid duplicate or inefficient builds
// update both request and list when morph
// (A) rule:
// invoke post_constructor() for all lists
// processes copy,skip,half_from_full,granhistory,respaouter lists
// error checks and resets internal ptrs to other lists that now exist
for (i = 0; i < nrequest; i++)
lists[i]->post_constructor(requests[i]);
// (B) rule:
// if request = pair, half, newton != 2
// and full perpetual non-skip/copy list exists,
// then morph to half_from_full of matching parent list
// NOTE: should be OK if parent is skip list?
// see build method comments
// parent can be pair or fix, so long as perpetual fix
// NOTE: could remove newton != 2 restriction if added
// half_from_full_newtoff_ghost NPair class
// this would require full list having ghost info
// would be useful when reax/c used in hybrid mode, e.g. with airebo
for (i = 0; i < nrequest; i++) {
if (requests[i]->pair && requests[i]->half && requests[i]->newton != 2) {
for (j = 0; j < nrequest; j++) {
// Kokkos doesn't yet support half from full
if (requests[i]->kokkos_device || requests[j]->kokkos_device) continue;
if (requests[i]->kokkos_host || requests[j]->kokkos_host) continue;
if (requests[j]->full && requests[j]->occasional == 0 &&
!requests[j]->skip && !requests[j]->copy) break;
}
if (j < nrequest) {
requests[i]->half = 0;
requests[i]->half_from_full = 1;
lists[i]->listfull = lists[j];
}
}
}
// (C) rule:
// for fix/compute requests, occasional or not does not matter
// 1st check:
// if request = half and non-skip/copy pair half/respaouter request exists,
// or if request = full and non-skip/copy pair full request exists,
// or if request = gran and non-skip/copy pair gran request exists,
// then morph to copy of the matching parent list
// 2nd check: only if no match to 1st check
// if request = half and non-skip/copy pair full request exists,
// then morph to half_from_full of the matching parent list
// for 1st or 2nd check, parent can be copy list or pair or fix
for (i = 0; i < nrequest; i++) {
if (!requests[i]->fix && !requests[i]->compute) continue;
for (j = 0; j < nrequest; j++) {
// Kokkos flags must match
if (requests[i]->kokkos_device != requests[j]->kokkos_device) continue;
if (requests[i]->kokkos_host != requests[j]->kokkos_host) continue;
if (requests[i]->ssa != requests[j]->ssa) continue;
+ // newton 2 and newton 0 both are newton off
+ if ((requests[i]->newton & 2) != (requests[j]->newton & 2)) continue;
if (requests[i]->half && requests[j]->pair &&
!requests[j]->skip && requests[j]->half && !requests[j]->copy)
break;
if (requests[i]->half && requests[j]->pair &&
!requests[j]->skip && requests[j]->respaouter && !requests[j]->copy)
break;
if (requests[i]->full && requests[j]->pair &&
!requests[j]->skip && requests[j]->full && !requests[j]->copy)
break;
if (requests[i]->gran && requests[j]->pair &&
!requests[j]->skip && requests[j]->gran && !requests[j]->copy)
break;
}
if (j < nrequest) {
requests[i]->copy = 1;
requests[i]->otherlist = j;
lists[i]->copy = 1;
lists[i]->listcopy = lists[j];
continue;
}
for (j = 0; j < nrequest; j++) {
// Kokkos doesn't yet support half from full
if (requests[i]->kokkos_device || requests[j]->kokkos_device) continue;
if (requests[i]->kokkos_host || requests[j]->kokkos_host) continue;
if (requests[i]->half && requests[j]->pair &&
!requests[j]->skip && requests[j]->full && !requests[j]->copy)
break;
}
if (j < nrequest) {
requests[i]->half = 0;
requests[i]->half_from_full = 1;
lists[i]->listfull = lists[j];
}
}
// assign Bin,Stencil,Pair style to each list
int flag;
for (i = 0; i < nrequest; i++) {
flag = choose_bin(requests[i]);
lists[i]->bin_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor bin option does not exist");
flag = choose_stencil(requests[i]);
lists[i]->stencil_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor stencil method does not exist");
flag = choose_pair(requests[i]);
lists[i]->pair_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor pair method does not exist");
}
// instantiate unique Bin,Stencil classes in neigh_bin & neigh_stencil vecs
// instantiate one Pair class per list in neigh_pair vec
nbin = 0;
for (i = 0; i < nrequest; i++) {
flag = lists[i]->bin_method;
if (flag == 0) continue;
for (j = 0; j < nbin; j++)
if (neigh_bin[j]->istyle == flag) break;
if (j < nbin) continue;
BinCreator bin_creator = binclass[flag-1];
neigh_bin[nbin] = bin_creator(lmp);
neigh_bin[nbin]->istyle = flag;
nbin++;
}
nstencil = 0;
for (i = 0; i < nrequest; i++) {
flag = lists[i]->stencil_method;
if (flag == 0) continue;
for (j = 0; j < nstencil; j++)
if (neigh_stencil[j]->istyle == flag) break;
if (j < nstencil) continue;
StencilCreator stencil_creator = stencilclass[flag-1];
neigh_stencil[nstencil] = stencil_creator(lmp);
neigh_stencil[nstencil]->istyle = flag;
int bin_method = lists[i]->bin_method;
for (k = 0; k < nbin; k++) {
if (neigh_bin[k]->istyle == bin_method) {
neigh_stencil[nstencil]->nb = neigh_bin[k];
break;
}
}
if (k == nbin)
error->all(FLERR,"Could not assign bin method to neighbor stencil");
nstencil++;
}
for (i = 0; i < nrequest; i++) {
flag = lists[i]->pair_method;
if (flag == 0) {
neigh_pair[i] = NULL;
continue;
}
PairCreator pair_creator = pairclass[flag-1];
neigh_pair[i] = pair_creator(lmp);
neigh_pair[i]->istyle = flag;
int bin_method = lists[i]->bin_method;
if (bin_method == 0) neigh_pair[i]->nb = NULL;
else {
for (k = 0; k < nbin; k++) {
if (neigh_bin[k]->istyle == bin_method) {
neigh_pair[i]->nb = neigh_bin[k];
break;
}
}
if (k == nbin)
error->all(FLERR,"Could not assign bin method to neighbor pair");
}
int stencil_method = lists[i]->stencil_method;
if (stencil_method == 0) neigh_pair[i]->ns = NULL;
else {
for (k = 0; k < nstencil; k++) {
if (neigh_stencil[k]->istyle == stencil_method) {
neigh_pair[i]->ns = neigh_stencil[k];
break;
}
}
if (k == nstencil)
error->all(FLERR,"Could not assign stencil method to neighbor pair");
}
}
// allocate initial pages for each list, except if copy flag set
// allocate dnum vector of zeroes if set
int dnummax = 0;
for (i = 0; i < nlist; i++) {
if (lists[i]->copy) continue;
lists[i]->setup_pages(pgsize,oneatom);
dnummax = MAX(dnummax,lists[i]->dnum);
}
if (dnummax) {
delete [] zeroes;
zeroes = new double[dnummax];
for (i = 0; i < dnummax; i++) zeroes[i] = 0.0;
}
// first-time allocation of per-atom data for lists that are built and store
// lists that are not built: granhistory, respa inner/middle (no neigh_pair)
// lists that do not store: copy
// use atom->nmax for both grow() args
// i.e. grow first time to expanded size to avoid future reallocs
// also Kokkos list initialization
int maxatom = atom->nmax;
for (i = 0; i < nlist; i++)
if (neigh_pair[i] && !lists[i]->copy) lists[i]->grow(maxatom,maxatom);
// plist = indices of perpetual NPair classes
// perpetual = non-occasional, re-built at every reneighboring
// slist = indices of perpetual NStencil classes
// perpetual = used by any perpetual NPair class
delete [] slist;
delete [] plist;
nstencil_perpetual = npair_perpetual = 0;
slist = new int[nstencil];
plist = new int[nlist];
for (i = 0; i < nlist; i++) {
if (lists[i]->occasional == 0 && lists[i]->pair_method)
plist[npair_perpetual++] = i;
}
for (i = 0; i < nstencil; i++) {
flag = 0;
for (j = 0; j < npair_perpetual; j++)
if (lists[plist[j]]->stencil_method == neigh_stencil[i]->istyle)
flag = 1;
if (flag) slist[nstencil_perpetual++] = i;
}
// reorder plist vector if necessary
// relevant for lists that copy/skip/half-from-full from a parent list
// the child index must appear in plist after the parent index
// swap two indices within plist when dependency is mis-ordered
// done when entire pass thru plist results in no swaps
NeighList *ptr;
int done = 0;
while (!done) {
done = 1;
for (i = 0; i < npair_perpetual; i++) {
ptr = NULL;
if (lists[plist[i]]->listfull) ptr = lists[plist[i]]->listfull;
if (lists[plist[i]]->listcopy) ptr = lists[plist[i]]->listcopy;
// listskip check must be after listfull check
if (lists[plist[i]]->listskip) ptr = lists[plist[i]]->listskip;
if (ptr == NULL) continue;
for (m = 0; m < nrequest; m++)
if (ptr == lists[m]) break;
for (j = 0; j < npair_perpetual; j++)
if (m == plist[j]) break;
if (j < i) continue;
int tmp = plist[i]; // swap I,J indices
plist[i] = plist[j];
plist[j] = tmp;
done = 0;
break;
}
}
// debug output
#ifdef NEIGH_LIST_DEBUG
for (i = 0; i < nrequest; i++) lists[i]->print_attributes();
#endif
}
/* ----------------------------------------------------------------------
create and initialize NTopo classes
------------------------------------------------------------------------- */
void Neighbor::init_topology()
{
int i,m;
if (!atom->molecular) return;
// set flags that determine which topology neighbor classes to use
// these settings could change from run to run, depending on fixes defined
// bonds,etc can only be broken for atom->molecular = 1, not 2
// SHAKE sets bonds and angles negative
// gcmc sets all bonds, angles, etc negative
// bond_quartic sets bonds to 0
// delete_bonds sets all interactions negative
int bond_off = 0;
int angle_off = 0;
for (i = 0; i < modify->nfix; i++)
if ((strcmp(modify->fix[i]->style,"shake") == 0)
|| (strcmp(modify->fix[i]->style,"rattle") == 0))
bond_off = angle_off = 1;
if (force->bond && force->bond_match("quartic")) bond_off = 1;
if (atom->avec->bonds_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (bond_off) break;
for (m = 0; m < atom->num_bond[i]; m++)
if (atom->bond_type[i][m] <= 0) bond_off = 1;
}
}
if (atom->avec->angles_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (angle_off) break;
for (m = 0; m < atom->num_angle[i]; m++)
if (atom->angle_type[i][m] <= 0) angle_off = 1;
}
}
int dihedral_off = 0;
if (atom->avec->dihedrals_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (dihedral_off) break;
for (m = 0; m < atom->num_dihedral[i]; m++)
if (atom->dihedral_type[i][m] <= 0) dihedral_off = 1;
}
}
int improper_off = 0;
if (atom->avec->impropers_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (improper_off) break;
for (m = 0; m < atom->num_improper[i]; m++)
if (atom->improper_type[i][m] <= 0) improper_off = 1;
}
}
for (i = 0; i < modify->nfix; i++)
if ((strcmp(modify->fix[i]->style,"gcmc") == 0))
bond_off = angle_off = dihedral_off = improper_off = 1;
// sync on/off settings across all procs
int onoff = bond_off;
MPI_Allreduce(&onoff,&bond_off,1,MPI_INT,MPI_MAX,world);
onoff = angle_off;
MPI_Allreduce(&onoff,&angle_off,1,MPI_INT,MPI_MAX,world);
onoff = dihedral_off;
MPI_Allreduce(&onoff,&dihedral_off,1,MPI_INT,MPI_MAX,world);
onoff = improper_off;
MPI_Allreduce(&onoff,&improper_off,1,MPI_INT,MPI_MAX,world);
// instantiate NTopo classes
if (atom->avec->bonds_allow) {
int old_bondwhich = bondwhich;
if (atom->molecular == 2) bondwhich = TEMPLATE;
else if (bond_off) bondwhich = PARTIAL;
else bondwhich = ALL;
if (!neigh_bond || bondwhich != old_bondwhich) {
delete neigh_bond;
if (bondwhich == ALL)
neigh_bond = new NTopoBondAll(lmp);
else if (bondwhich == PARTIAL)
neigh_bond = new NTopoBondPartial(lmp);
else if (bondwhich == TEMPLATE)
neigh_bond = new NTopoBondTemplate(lmp);
}
}
if (atom->avec->angles_allow) {
int old_anglewhich = anglewhich;
if (atom->molecular == 2) anglewhich = TEMPLATE;
else if (angle_off) anglewhich = PARTIAL;
else anglewhich = ALL;
if (!neigh_angle || anglewhich != old_anglewhich) {
delete neigh_angle;
if (anglewhich == ALL)
neigh_angle = new NTopoAngleAll(lmp);
else if (anglewhich == PARTIAL)
neigh_angle = new NTopoAnglePartial(lmp);
else if (anglewhich == TEMPLATE)
neigh_angle = new NTopoAngleTemplate(lmp);
}
}
if (atom->avec->dihedrals_allow) {
int old_dihedralwhich = dihedralwhich;
if (atom->molecular == 2) dihedralwhich = TEMPLATE;
else if (dihedral_off) dihedralwhich = PARTIAL;
else dihedralwhich = ALL;
if (!neigh_dihedral || dihedralwhich != old_dihedralwhich) {
delete neigh_dihedral;
if (dihedralwhich == ALL)
neigh_dihedral = new NTopoDihedralAll(lmp);
else if (dihedralwhich == PARTIAL)
neigh_dihedral = new NTopoDihedralPartial(lmp);
else if (dihedralwhich == TEMPLATE)
neigh_dihedral = new NTopoDihedralTemplate(lmp);
}
}
if (atom->avec->impropers_allow) {
int old_improperwhich = improperwhich;
if (atom->molecular == 2) improperwhich = TEMPLATE;
else if (improper_off) improperwhich = PARTIAL;
else improperwhich = ALL;
if (!neigh_improper || improperwhich != old_improperwhich) {
delete neigh_improper;
if (improperwhich == ALL)
neigh_improper = new NTopoImproperAll(lmp);
else if (improperwhich == PARTIAL)
neigh_improper = new NTopoImproperPartial(lmp);
else if (improperwhich == TEMPLATE)
neigh_improper = new NTopoImproperTemplate(lmp);
}
}
}
/* ----------------------------------------------------------------------
output summary of pairwise neighbor list info
only called by proc 0
------------------------------------------------------------------------- */
void Neighbor::print_pairwise_info()
{
int i,j,m;
char str[128];
const char *kind;
FILE *out;
const double cutghost = MAX(cutneighmax,comm->cutghostuser);
double binsize, bbox[3];
bbox[0] = bboxhi[0]-bboxlo[0];
bbox[1] = bboxhi[1]-bboxlo[1];
bbox[2] = bboxhi[2]-bboxlo[2];
if (binsizeflag) binsize = binsize_user;
else if (style == BIN) binsize = 0.5*cutneighmax;
else binsize = 0.5*cutneighmin;
if (binsize == 0.0) binsize = bbox[0];
int nperpetual = 0;
int noccasional = 0;
int nextra = 0;
for (i = 0; i < nlist; i++) {
if (lists[i]->pair_method == 0) nextra++;
else if (lists[i]->occasional) noccasional++;
else nperpetual++;
}
for (m = 0; m < 2; m++) {
if (m == 0) out = screen;
else out = logfile;
if (out) {
fprintf(out,"Neighbor list info ...\n");
fprintf(out," update every %d steps, delay %d steps, check %s\n",
every,delay,dist_check ? "yes" : "no");
fprintf(out," max neighbors/atom: %d, page size: %d\n",
oneatom, pgsize);
fprintf(out," master list distance cutoff = %g\n",cutneighmax);
fprintf(out," ghost atom cutoff = %g\n",cutghost);
if (style != NSQ)
fprintf(out," binsize = %g, bins = %g %g %g\n",binsize,
ceil(bbox[0]/binsize), ceil(bbox[1]/binsize),
ceil(bbox[2]/binsize));
fprintf(out," %d neighbor lists, "
"perpetual/occasional/extra = %d %d %d\n",
nlist,nperpetual,noccasional,nextra);
for (i = 0; i < nlist; i++) {
if (requests[i]->pair) {
char *pname = force->pair_match_ptr((Pair *) requests[i]->requestor);
sprintf(str," (%d) pair %s",i+1,pname);
} else if (requests[i]->fix) {
sprintf(str," (%d) fix %s",i+1,
((Fix *) requests[i]->requestor)->style);
} else if (requests[i]->compute) {
sprintf(str," (%d) compute %s",i+1,
((Compute *) requests[i]->requestor)->style);
} else {
sprintf(str," (%d) command %s",i+1,requests[i]->command_style);
}
fprintf(out,"%s",str);
if (requests[i]->half) kind = "half";
else if (requests[i]->full) kind = "full";
else if (requests[i]->gran) kind = "size";
else if (requests[i]->granhistory) kind = "size/history";
else if (requests[i]->respainner) kind = "respa/inner";
else if (requests[i]->respamiddle) kind = "respa/middle";
else if (requests[i]->respaouter) kind = "respa/outer";
else if (requests[i]->half_from_full) kind = "half/from/full";
if (requests[i]->occasional) fprintf(out,", occasional");
else fprintf(out,", perpetual");
if (requests[i]->ghost) fprintf(out,", ghost");
if (requests[i]->ssa) fprintf(out,", ssa");
if (requests[i]->omp) fprintf(out,", omp");
if (requests[i]->intel) fprintf(out,", intel");
if (requests[i]->kokkos_device) fprintf(out,", kokkos_device");
if (requests[i]->kokkos_host) fprintf(out,", kokkos_host");
if (requests[i]->copy)
fprintf(out,", copy from (%d)",requests[i]->otherlist+1);
if (requests[i]->skip)
fprintf(out,", skip from (%d)",requests[i]->otherlist+1);
if (requests[i]->off2on) fprintf(out,", off2on");
fprintf(out,"\n");
if (lists[i]->pair_method == 0) fprintf(out," pair build: none\n");
else fprintf(out," pair build: %s\n",
pairnames[lists[i]->pair_method-1]);
if (lists[i]->stencil_method == 0) fprintf(out," stencil: none\n");
else fprintf(out," stencil: %s\n",
stencilnames[lists[i]->stencil_method-1]);
if (lists[i]->bin_method == 0) fprintf(out," bin: none\n");
else fprintf(out," bin: %s\n",binnames[lists[i]->bin_method-1]);
}
/*
fprintf(out," %d stencil methods\n",nstencil);
for (i = 0; i < nstencil; i++)
fprintf(out," (%d) %s\n",
i+1,stencilnames[neigh_stencil[i]->istyle-1]);
fprintf(out," %d bin methods\n",nbin);
for (i = 0; i < nbin; i++)
fprintf(out," (%d) %s\n",i+1,binnames[neigh_bin[i]->istyle-1]);
*/
}
}
}
/* ----------------------------------------------------------------------
delete old NeighRequests
copy current requests and params to old for next run
------------------------------------------------------------------------- */
void Neighbor::requests_new2old()
{
for (int i = 0; i < old_nrequest; i++) delete old_requests[i];
memory->sfree(old_requests);
old_nrequest = nrequest;
old_requests = requests;
nrequest = maxrequest = 0;
requests = NULL;
old_style = style;
old_triclinic = triclinic;
old_pgsize = pgsize;
old_oneatom = oneatom;
}
/* ----------------------------------------------------------------------
assign NBin class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known Nbin classes
return index+1 of match in list of masks
return 0 for no binning
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_bin(NeighRequest *rq)
{
// no binning needed
if (style == NSQ) return 0;
if (rq->skip || rq->copy || rq->half_from_full) return 0;
if (rq->granhistory) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// flags for the settings that the request + system require of the NBin class
// ssaflag = no/yes ssa request
// intelflag = no/yes intel request
// kokkos_device_flag = no/yes kokkos device request
// kokkos_host_flag = no/yes kokkos host request
int ssaflag,intelflag,kokkos_device_flag,kokkos_host_flag;
ssaflag = intelflag = kokkos_device_flag = kokkos_host_flag = 0;
if (rq->ssa) ssaflag = NB_SSA;
if (rq->intel) intelflag = NB_INTEL;
if (rq->kokkos_device) kokkos_device_flag = NB_KOKKOS_DEVICE;
if (rq->kokkos_host) kokkos_host_flag = NB_KOKKOS_HOST;
// use flags to match exactly one of NBin class masks, bit by bit
int mask;
for (int i = 0; i < nbclass; i++) {
mask = binmasks[i];
if (ssaflag != (mask & NB_SSA)) continue;
if (intelflag != (mask & NB_INTEL)) continue;
if (kokkos_device_flag != (mask & NB_KOKKOS_DEVICE)) continue;
if (kokkos_host_flag != (mask & NB_KOKKOS_HOST)) continue;
return i+1;
}
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
assign NStencil class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known NStencil classes
return index+1 of match in list of masks
return 0 for no stencil creation
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_stencil(NeighRequest *rq)
{
// no stencil creation needed
if (style == NSQ) return 0;
if (rq->skip || rq->copy || rq->half_from_full) return 0;
if (rq->granhistory) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// flags for the settings that the request + system require of the NStencil class
// halfflag = half request (gran and respa are also half lists)
// fullflag = full request
// ghostflag = no/yes ghost request
// ssaflag = no/yes ssa request
// dimension = 2d/3d
// newtflag = newton off/on request
// triclinic = orthogonal/triclinic box
int halfflag,fullflag,ghostflag,ssaflag;
halfflag = fullflag = ghostflag = ssaflag = 0;
if (rq->half) halfflag = 1;
if (rq->full) fullflag = 1;
if (rq->gran) halfflag = 1;
if (rq->respaouter) halfflag = 1;
if (rq->ghost) ghostflag = NS_GHOST;
if (rq->ssa) ssaflag = NS_SSA;
int newtflag;
if (rq->newton == 0 && newton_pair) newtflag = 1;
else if (rq->newton == 0 && !newton_pair) newtflag = 0;
else if (rq->newton == 1) newtflag = 1;
else if (rq->newton == 2) newtflag = 0;
// use flags to match exactly one of NStencil class masks, bit by bit
// exactly one of halfflag,fullflag is set and thus must match
int mask;
for (int i = 0; i < nsclass; i++) {
mask = stencilmasks[i];
if (halfflag) {
if (!(mask & NS_HALF)) continue;
} else if (fullflag) {
if (!(mask & NS_FULL)) continue;
}
if (ghostflag != (mask & NS_GHOST)) continue;
if (ssaflag != (mask & NS_SSA)) continue;
if (style == BIN && !(mask & NS_BIN)) continue;
if (style == MULTI && !(mask & NS_MULTI)) continue;
if (dimension == 2 && !(mask & NS_2D)) continue;
if (dimension == 3 && !(mask & NS_3D)) continue;
if (newtflag && !(mask & NS_NEWTON)) continue;
if (!newtflag && !(mask & NS_NEWTOFF)) continue;
if (!triclinic && !(mask & NS_ORTHO)) continue;
if (triclinic && !(mask & NS_TRI)) continue;
return i+1;
}
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
assign NPair class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known NPair classes
return index+1 of match in list of masks
return 0 for no NPair build
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_pair(NeighRequest *rq)
{
// no NPair build performed
if (rq->granhistory) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// error check for includegroup with ghost neighbor request
if (includegroup && rq->ghost)
error->all(FLERR,"Neighbor include group not allowed "
"with ghost neighbors");
// flags for the settings that the request + system require of the NPair class
// some are set to 0/1, others are set to mask bit
// comparisons below in loop over classes reflect that
// copyflag = no/yes copy request
// skipflag = no/yes skip request
// halfflag = half request (gran and respa are also half lists)
// fullflag = full request
// halffullflag = half_from_full request
// sizeflag = no/yes gran request for finite-size particles
// ghostflag = no/yes ghost request
// respaflag = no/yes respa request
// off2onflag = no/yes off2on request
// onesideflag = no/yes granonesided request
// ssaflag = no/yes ssa request
// ompflag = no/yes omp request
// intelflag = no/yes intel request
// kokkos_device_flag = no/yes Kokkos device request
// kokkos_host_flag = no/yes Kokkos host request
// newtflag = newton off/on request
// style = NSQ/BIN/MULTI neighbor style
// triclinic = orthogonal/triclinic box
int copyflag,skipflag,halfflag,fullflag,halffullflag,sizeflag,respaflag,
ghostflag,off2onflag,onesideflag,ssaflag,ompflag,intelflag,
kokkos_device_flag,kokkos_host_flag;
copyflag = skipflag = halfflag = fullflag = halffullflag = sizeflag =
ghostflag = respaflag = off2onflag = onesideflag = ssaflag =
ompflag = intelflag = kokkos_device_flag = kokkos_host_flag = 0;
if (rq->copy) copyflag = NP_COPY;
if (rq->skip) skipflag = NP_SKIP;
// NOTE: exactly one of these request flags is set (see neigh_request.h)
// this requires gran/respaouter also set halfflag
// can simplify this logic if we follow the NOTE in neigh_request.h
// why do size/off2on and size/off2on/oneside set NP_HALF
// either should set both half & full, or half should be in file name
// to be consistent with how other NP classes use "half"
if (rq->half) halfflag = 1;
if (rq->full) fullflag = 1;
if (rq->half_from_full) halffullflag = 1;
if (rq->gran) {
sizeflag = NP_SIZE;
halfflag = 1;
}
if (rq->respaouter) {
respaflag = NP_RESPA;
halfflag = 1;
}
if (rq->ghost) ghostflag = NP_GHOST;
if (rq->off2on) off2onflag = NP_OFF2ON;
if (rq->granonesided) onesideflag = NP_ONESIDE;
if (rq->ssa) ssaflag = NP_SSA;
if (rq->omp) ompflag = NP_OMP;
if (rq->intel) intelflag = NP_INTEL;
if (rq->kokkos_device) kokkos_device_flag = NP_KOKKOS_DEVICE;
if (rq->kokkos_host) kokkos_host_flag = NP_KOKKOS_HOST;
int newtflag;
if (rq->newton == 0 && newton_pair) newtflag = 1;
else if (rq->newton == 0 && !newton_pair) newtflag = 0;
else if (rq->newton == 1) newtflag = 1;
else if (rq->newton == 2) newtflag = 0;
// use flags to match exactly one of NPair class masks
// sequence of checks is bit by bit in NeighConst
int mask;
//printf("FLAGS: %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n",
// copyflag,skipflag,halfflag,fullflag,halffullflag,
// sizeflag,respaflag,ghostflag,off2onflag,onesideflag,ssaflag,
// ompflag,intelflag,newtflag);
for (int i = 0; i < npclass; i++) {
mask = pairmasks[i];
// if copyflag set, return or continue with no further checks
if (copyflag) {
if (!(mask & NP_COPY)) continue;
if (kokkos_device_flag != (mask & NP_KOKKOS_DEVICE)) continue;
if (kokkos_host_flag != (mask & NP_KOKKOS_HOST)) continue;
return i+1;
}
// skipflag must match along with other flags, so do not return
if (skipflag != (mask & NP_SKIP)) continue;
// exactly one of halfflag,fullflag,halffullflag is set and must match
if (halfflag) {
if (!(mask & NP_HALF)) continue;
} else if (fullflag) {
if (!(mask & NP_FULL)) continue;
} else if (halffullflag) {
if (!(mask & NP_HALFFULL)) continue;
}
if (sizeflag != (mask & NP_SIZE)) continue;
if (respaflag != (mask & NP_RESPA)) continue;
if (ghostflag != (mask & NP_GHOST)) continue;
if (off2onflag != (mask & NP_OFF2ON)) continue;
if (onesideflag != (mask & NP_ONESIDE)) continue;
if (ssaflag != (mask & NP_SSA)) continue;
if (ompflag != (mask & NP_OMP)) continue;
if (intelflag != (mask & NP_INTEL)) continue;
// style is one of NSQ,BIN,MULTI and must match
if (style == NSQ) {
if (!(mask & NP_NSQ)) continue;
} else if (style == BIN) {
if (!(mask & NP_BIN)) continue;
} else if (style == MULTI) {
if (!(mask & NP_MULTI)) continue;
}
// newtflag is on or off and must match
if (newtflag) {
if (!(mask & NP_NEWTON)) continue;
} else if (!newtflag) {
if (!(mask & NP_NEWTOFF)) continue;
}
// triclinic flag is on or off and must match
if (triclinic) {
if (!(mask & NP_TRI)) continue;
} else if (!triclinic) {
if (!(mask & NP_ORTHO)) continue;
}
// Kokkos flags
if (kokkos_device_flag != (mask & NP_KOKKOS_DEVICE)) continue;
if (kokkos_host_flag != (mask & NP_KOKKOS_HOST)) continue;
return i+1;
}
//printf("NO MATCH\n");
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
called by other classes to request a pairwise neighbor list
------------------------------------------------------------------------- */
int Neighbor::request(void *requestor, int instance)
{
if (nrequest == maxrequest) {
maxrequest += RQDELTA;
requests = (NeighRequest **)
memory->srealloc(requests,maxrequest*sizeof(NeighRequest *),
"neighbor:requests");
}
requests[nrequest] = new NeighRequest(lmp);
requests[nrequest]->index = nrequest;
requests[nrequest]->requestor = requestor;
requests[nrequest]->requestor_instance = instance;
nrequest++;
return nrequest-1;
}
/* ----------------------------------------------------------------------
one instance per entry in style_nbin.h
------------------------------------------------------------------------- */
template <typename T>
NBin *Neighbor::bin_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
one instance per entry in style_nstencil.h
------------------------------------------------------------------------- */
template <typename T>
NStencil *Neighbor::stencil_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
one instance per entry in style_npair.h
------------------------------------------------------------------------- */
template <typename T>
NPair *Neighbor::pair_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
setup neighbor binning and neighbor stencils
called before run and every reneighbor if box size/shape changes
only operates on perpetual lists
build_one() operates on occasional lists
------------------------------------------------------------------------- */
void Neighbor::setup_bins()
{
// invoke setup_bins() for all NBin
// actual binning is performed in build()
for (int i = 0; i < nbin; i++)
neigh_bin[i]->setup_bins(style);
// invoke create_setup() and create() for all perpetual NStencil
// same ops performed for occasional lists in build_one()
for (int i = 0; i < nstencil_perpetual; i++) {
neigh_stencil[slist[i]]->create_setup();
neigh_stencil[slist[i]]->create();
}
last_setup_bins = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
int Neighbor::decide()
{
if (must_check) {
bigint n = update->ntimestep;
if (restart_check && n == output->next_restart) return 1;
for (int i = 0; i < fix_check; i++)
if (n == modify->fix[fixchecklist[i]]->next_reneighbor) return 1;
}
ago++;
if (ago >= delay && ago % every == 0) {
if (build_once) return 0;
if (dist_check == 0) return 1;
return check_distance();
} else return 0;
}
/* ----------------------------------------------------------------------
if any atom moved more than the trigger distance (half the neighbor skin), return 1
shrink trigger distance if box size has changed
conservative shrink procedure:
compute distance each of 8 corners of box has moved since last reneighbor
reduce skin distance by sum of 2 largest of the 8 values
new trigger = 1/2 of reduced skin distance
for orthogonal box, only need 2 lo/hi corners
for triclinic, need all 8 corners since deformations can displace all 8
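example: with skin = 2.0 and the two largest corner displacements 0.3 and 0.2, the new trigger = 0.5*(2.0 - 0.5) = 0.75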
------------------------------------------------------------------------- */
int Neighbor::check_distance()
{
double delx,dely,delz,rsq;
double delta,deltasq,delta1,delta2;
if (boxcheck) {
if (triclinic == 0) {
delx = bboxlo[0] - boxlo_hold[0];
dely = bboxlo[1] - boxlo_hold[1];
delz = bboxlo[2] - boxlo_hold[2];
delta1 = sqrt(delx*delx + dely*dely + delz*delz);
delx = bboxhi[0] - boxhi_hold[0];
dely = bboxhi[1] - boxhi_hold[1];
delz = bboxhi[2] - boxhi_hold[2];
delta2 = sqrt(delx*delx + dely*dely + delz*delz);
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta;
} else {
domain->box_corners();
delta1 = delta2 = 0.0;
for (int i = 0; i < 8; i++) {
delx = corners[i][0] - corners_hold[i][0];
dely = corners[i][1] - corners_hold[i][1];
delz = corners[i][2] - corners_hold[i][2];
delta = sqrt(delx*delx + dely*dely + delz*delz);
if (delta > delta1) delta1 = delta;
else if (delta > delta2) delta2 = delta;
}
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta;
}
} else deltasq = triggersq;
double **x = atom->x;
int nlocal = atom->nlocal;
if (includegroup) nlocal = atom->nfirst;
int flag = 0;
for (int i = 0; i < nlocal; i++) {
delx = x[i][0] - xhold[i][0];
dely = x[i][1] - xhold[i][1];
delz = x[i][2] - xhold[i][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq > deltasq) flag = 1;
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && ago == MAX(every,delay)) ndanger++;
return flagall;
}
/* ----------------------------------------------------------------------
build perpetual neighbor lists
called at setup and every few timesteps during run or minimization
topology lists also built if topoflag = 1 (Kokkos calls with topoflag=0)
------------------------------------------------------------------------- */
void Neighbor::build(int topoflag)
{
int i,m;
ago = 0;
ncalls++;
lastcall = update->ntimestep;
int nlocal = atom->nlocal;
int nall = nlocal + atom->nghost;
// check that using special bond flags will not overflow neigh lists
if (nall > NEIGHMASK)
error->one(FLERR,"Too many local+ghost atoms for neighbor list");
// store current atom positions and box size if needed
if (dist_check) {
double **x = atom->x;
if (includegroup) nlocal = atom->nfirst;
if (atom->nmax > maxhold) {
maxhold = atom->nmax;
memory->destroy(xhold);
memory->create(xhold,maxhold,3,"neigh:xhold");
}
for (i = 0; i < nlocal; i++) {
xhold[i][0] = x[i][0];
xhold[i][1] = x[i][1];
xhold[i][2] = x[i][2];
}
if (boxcheck) {
if (triclinic == 0) {
boxlo_hold[0] = bboxlo[0];
boxlo_hold[1] = bboxlo[1];
boxlo_hold[2] = bboxlo[2];
boxhi_hold[0] = bboxhi[0];
boxhi_hold[1] = bboxhi[1];
boxhi_hold[2] = bboxhi[2];
} else {
domain->box_corners();
corners = domain->corners;
for (i = 0; i < 8; i++) {
corners_hold[i][0] = corners[i][0];
corners_hold[i][1] = corners[i][1];
corners_hold[i][2] = corners[i][2];
}
}
}
}
// bin atoms for all NBin instances
// not just NBin associated with perpetual lists
// b/c cannot wait to bin occasional lists in build_one() call
// if binning were deferred until then, atoms may have moved outside of proc domain & bin extent,
// leading to errors or even a crash
if (style != NSQ) {
for (int i = 0; i < nbin; i++) {
neigh_bin[i]->bin_atoms_setup(nall);
neigh_bin[i]->bin_atoms();
}
}
// build pairwise lists for all perpetual NPair/NeighList
// grow() with nlocal/nall args so that only realloc if have to
for (i = 0; i < npair_perpetual; i++) {
m = plist[i];
if (!lists[m]->copy) lists[m]->grow(nlocal,nall);
neigh_pair[m]->build_setup();
neigh_pair[m]->build(lists[m]);
}
// build topology lists for bonds/angles/etc
if (atom->molecular && topoflag) build_topology();
}
/* ----------------------------------------------------------------------
build topology neighbor lists: bond, angle, dihedral, improper
copy their list info back to Neighbor for access by bond/angle/etc classes
------------------------------------------------------------------------- */
void Neighbor::build_topology()
{
if (force->bond) {
neigh_bond->build();
nbondlist = neigh_bond->nbondlist;
bondlist = neigh_bond->bondlist;
}
if (force->angle) {
neigh_angle->build();
nanglelist = neigh_angle->nanglelist;
anglelist = neigh_angle->anglelist;
}
if (force->dihedral) {
neigh_dihedral->build();
ndihedrallist = neigh_dihedral->ndihedrallist;
dihedrallist = neigh_dihedral->dihedrallist;
}
if (force->improper) {
neigh_improper->build();
nimproperlist = neigh_improper->nimproperlist;
improperlist = neigh_improper->improperlist;
}
}
/* ----------------------------------------------------------------------
build a single occasional pairwise neighbor list indexed by I
called by other classes
------------------------------------------------------------------------- */
void Neighbor::build_one(class NeighList *mylist, int preflag)
{
// check if list structure is initialized
if (mylist == NULL)
error->all(FLERR,"Trying to build an occasional neighbor list "
"before initialization completed");
// build_one() should never be invoked on a perpetual list
if (!mylist->occasional)
error->all(FLERR,"Neighbor build one invoked on perpetual list");
// no need to build if already built since last re-neighbor
// preflag is set by fix bond/create and fix bond/swap
// b/c they invoke build_one() on same step neigh list is re-built,
// but before re-build, so need to use ">" instead of ">="
NPair *np = neigh_pair[mylist->index];
if (preflag) {
if (np->last_build > lastcall) return;
} else {
if (np->last_build >= lastcall) return;
}
// if this is copy list and parent is occasional list,
// or this is half_from_full and parent is occasional list,
// ensure parent is current
if (mylist->listcopy && mylist->listcopy->occasional)
build_one(mylist->listcopy,preflag);
if (mylist->listfull && mylist->listfull->occasional)
build_one(mylist->listfull,preflag);
// create stencil if hasn't been created since last setup_bins() call
NStencil *ns = np->ns;
if (ns && ns->last_create < last_setup_bins) {
ns->create_setup();
ns->create();
}
// build the list
np->build_setup();
np->build(mylist);
}
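// Illustrative usage sketch (assumes a caller that holds a NeighList *list
// requested as an occasional list):
//   neighbor->build_one(list);    // default: skip if built since the last re-neighbor
//   neighbor->build_one(list,1);  // preflag=1: also rebuild when last_build == lastcall,
//                                 // i.e. the list was built earlier on this same step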
/* ----------------------------------------------------------------------
set neighbor style and skin distance
------------------------------------------------------------------------- */
void Neighbor::set(int narg, char **arg)
{
if (narg != 2) error->all(FLERR,"Illegal neighbor command");
skin = force->numeric(FLERR,arg[0]);
if (skin < 0.0) error->all(FLERR,"Illegal neighbor command");
if (strcmp(arg[1],"nsq") == 0) style = NSQ;
else if (strcmp(arg[1],"bin") == 0) style = BIN;
else if (strcmp(arg[1],"multi") == 0) style = MULTI;
else error->all(FLERR,"Illegal neighbor command");
if (style == MULTI && lmp->citeme) lmp->citeme->add(cite_neigh_multi);
}
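// Example (illustrative sketch): the input script line "neighbor 2.0 bin"
// reaches set() as narg = 2, arg = {"2.0","bin"}, yielding skin = 2.0 and style = BIN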
/* ----------------------------------------------------------------------
reset timestamps in all NBin, NStencil, NPair classes
so that neighbor lists will rebuild properly with timestep change
------------------------------------------------------------------------- */
void Neighbor::reset_timestep(bigint ntimestep)
{
for (int i = 0; i < nbin; i++) {
neigh_bin[i]->last_setup = -1;
neigh_bin[i]->last_bin = -1;
neigh_bin[i]->last_bin_memory = -1;
}
for (int i = 0; i < nstencil; i++) {
neigh_stencil[i]->last_create = -1;
neigh_stencil[i]->last_stencil_memory = -1;
neigh_stencil[i]->last_copy_bin = -1;
}
for (int i = 0; i < nlist; i++) {
if (!neigh_pair[i]) continue;
neigh_pair[i]->last_build = -1;
neigh_pair[i]->last_copy_bin_setup = -1;
neigh_pair[i]->last_copy_bin = -1;
neigh_pair[i]->last_copy_stencil = -1;
}
}
/* ----------------------------------------------------------------------
modify parameters of the pair-wise neighbor build
------------------------------------------------------------------------- */
void Neighbor::modify_params(int narg, char **arg)
{
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"every") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
every = force->inumeric(FLERR,arg[iarg+1]);
if (every <= 0) error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"delay") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
delay = force->inumeric(FLERR,arg[iarg+1]);
if (delay < 0) error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"check") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) dist_check = 1;
else if (strcmp(arg[iarg+1],"no") == 0) dist_check = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"once") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) build_once = 1;
else if (strcmp(arg[iarg+1],"no") == 0) build_once = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"page") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
old_pgsize = pgsize;
pgsize = force->inumeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"one") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
old_oneatom = oneatom;
oneatom = force->inumeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"binsize") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
binsize_user = force->numeric(FLERR,arg[iarg+1]);
if (binsize_user <= 0.0) binsizeflag = 0;
else binsizeflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"cluster") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) cluster_check = 1;
else if (strcmp(arg[iarg+1],"no") == 0) cluster_check = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"include") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
includegroup = group->find(arg[iarg+1]);
if (includegroup < 0)
error->all(FLERR,"Invalid group ID in neigh_modify command");
if (includegroup && (atom->firstgroupname == NULL ||
strcmp(arg[iarg+1],atom->firstgroupname) != 0))
error->all(FLERR,
"Neigh_modify include group != atom_modify first group");
iarg += 2;
} else if (strcmp(arg[iarg],"exclude") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"type") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (nex_type == maxex_type) {
maxex_type += EXDELTA;
memory->grow(ex1_type,maxex_type,"neigh:ex1_type");
memory->grow(ex2_type,maxex_type,"neigh:ex2_type");
}
ex1_type[nex_type] = force->inumeric(FLERR,arg[iarg+2]);
ex2_type[nex_type] = force->inumeric(FLERR,arg[iarg+3]);
nex_type++;
iarg += 4;
} else if (strcmp(arg[iarg+1],"group") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (nex_group == maxex_group) {
maxex_group += EXDELTA;
memory->grow(ex1_group,maxex_group,"neigh:ex1_group");
memory->grow(ex2_group,maxex_group,"neigh:ex2_group");
}
ex1_group[nex_group] = group->find(arg[iarg+2]);
ex2_group[nex_group] = group->find(arg[iarg+3]);
if (ex1_group[nex_group] == -1 || ex2_group[nex_group] == -1)
error->all(FLERR,"Invalid group ID in neigh_modify command");
nex_group++;
iarg += 4;
} else if (strcmp(arg[iarg+1],"molecule") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (atom->molecule_flag == 0)
error->all(FLERR,"Neigh_modify exclude molecule "
"requires atom attribute molecule");
if (nex_mol == maxex_mol) {
maxex_mol += EXDELTA;
memory->grow(ex_mol_group,maxex_mol,"neigh:ex_mol_group");
}
ex_mol_group[nex_mol] = group->find(arg[iarg+2]);
if (ex_mol_group[nex_mol] == -1)
error->all(FLERR,"Invalid group ID in neigh_modify command");
nex_mol++;
iarg += 3;
} else if (strcmp(arg[iarg+1],"none") == 0) {
nex_type = nex_group = nex_mol = 0;
iarg += 2;
} else error->all(FLERR,"Illegal neigh_modify command");
} else error->all(FLERR,"Illegal neigh_modify command");
}
}
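// Example (illustrative sketch): "neigh_modify every 1 delay 5 check yes exclude type 2 3"
// is parsed above as every = 1, delay = 5, dist_check = 1, and appends one
// type-type exclusion pair with ex1_type = 2, ex2_type = 3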
/* ----------------------------------------------------------------------
remove the first group-group exclusion matching group1, group2
------------------------------------------------------------------------- */
void Neighbor::exclusion_group_group_delete(int group1, int group2)
{
int m, mlast;
for (m = 0; m < nex_group; m++)
if (ex1_group[m] == group1 && ex2_group[m] == group2)
break;
mlast = m;
if (mlast == nex_group)
error->all(FLERR,"Unable to find group-group exclusion");
for (m = mlast+1; m < nex_group; m++) {
ex1_group[m-1] = ex1_group[m];
ex2_group[m-1] = ex2_group[m];
ex1_bit[m-1] = ex1_bit[m];
ex2_bit[m-1] = ex2_bit[m];
}
nex_group--;
}
/* ----------------------------------------------------------------------
return the value of exclude - used to check compatibility with GPU
------------------------------------------------------------------------- */
int Neighbor::exclude_setting()
{
return exclude;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint Neighbor::memory_usage()
{
bigint bytes = 0;
bytes += memory->usage(xhold,maxhold,3);
for (int i = 0; i < nlist; i++)
if (lists[i]) bytes += lists[i]->memory_usage();
for (int i = 0; i < nstencil; i++)
bytes += neigh_stencil[i]->memory_usage();
for (int i = 0; i < nbin; i++)
bytes += neigh_bin[i]->memory_usage();
if (neigh_bond) bytes += neigh_bond->memory_usage();
if (neigh_angle) bytes += neigh_angle->memory_usage();
if (neigh_dihedral) bytes += neigh_dihedral->memory_usage();
if (neigh_improper) bytes += neigh_improper->memory_usage();
return bytes;
}
diff --git a/src/npair.cpp b/src/npair.cpp
index 6ea4e6255..3c38c40f5 100644
--- a/src/npair.cpp
+++ b/src/npair.cpp
@@ -1,257 +1,258 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include "npair.h"
#include "neighbor.h"
#include "nbin.h"
#include "nstencil.h"
#include "atom.h"
#include "update.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
-NPair::NPair(LAMMPS *lmp) : Pointers(lmp)
+NPair::NPair(LAMMPS *lmp)
+ : Pointers(lmp), nb(NULL), ns(NULL), bins(NULL), stencil(NULL)
{
last_build = -1;
last_copy_bin_setup = last_copy_bin = last_copy_stencil = -1;
molecular = atom->molecular;
}
/* ----------------------------------------------------------------------
copy needed info from Neighbor class to this build class
------------------------------------------------------------------------- */
void NPair::copy_neighbor_info()
{
// general params
includegroup = neighbor->includegroup;
exclude = neighbor->exclude;
skin = neighbor->skin;
cutneighsq = neighbor->cutneighsq;
cutneighghostsq = neighbor->cutneighghostsq;
cut_inner_sq = neighbor->cut_inner_sq;
cut_middle_sq = neighbor->cut_middle_sq;
cut_middle_inside_sq = neighbor->cut_middle_inside_sq;
zeroes = neighbor->zeroes;
bboxlo = neighbor->bboxlo;
bboxhi = neighbor->bboxhi;
// exclusion info
nex_type = neighbor->nex_type;
ex1_type = neighbor->ex1_type;
ex2_type = neighbor->ex2_type;
ex_type = neighbor->ex_type;
nex_group = neighbor->nex_group;
ex1_group = neighbor->ex1_group;
ex2_group = neighbor->ex2_group;
ex1_bit = neighbor->ex1_bit;
ex2_bit = neighbor->ex2_bit;
nex_mol = neighbor->nex_mol;
ex_mol_group = neighbor->ex_mol_group;
ex_mol_bit = neighbor->ex_mol_bit;
// special info
special_flag = neighbor->special_flag;
}
/* ----------------------------------------------------------------------
copy bin geometry info from NBin class to this build class
------------------------------------------------------------------------- */
void NPair::copy_bin_setup_info()
{
nbinx = nb->nbinx;
nbiny = nb->nbiny;
nbinz = nb->nbinz;
mbins = nb->mbins;
mbinx = nb->mbinx;
mbiny = nb->mbiny;
mbinz = nb->mbinz;
mbinxlo = nb->mbinxlo;
mbinylo = nb->mbinylo;
mbinzlo = nb->mbinzlo;
bininvx = nb->bininvx;
bininvy = nb->bininvy;
bininvz = nb->bininvz;
}
/* ----------------------------------------------------------------------
copy per-atom and per-bin vectors from NBin class to this build class
------------------------------------------------------------------------- */
void NPair::copy_bin_info()
{
bins = nb->bins;
binhead = nb->binhead;
}
/* ----------------------------------------------------------------------
copy needed info from NStencil class to this build class
------------------------------------------------------------------------- */
void NPair::copy_stencil_info()
{
nstencil = ns->nstencil;
stencil = ns->stencil;
stencilxyz = ns->stencilxyz;
nstencil_multi = ns->nstencil_multi;
stencil_multi = ns->stencil_multi;
distsq_multi = ns->distsq_multi;
}
/* ----------------------------------------------------------------------
prepare for a pairwise build: refresh copied bin/stencil info if stale, record build timestep
------------------------------------------------------------------------- */
void NPair::build_setup()
{
if (nb && last_copy_bin_setup < nb->last_setup) {
copy_bin_setup_info();
last_copy_bin_setup = update->ntimestep;
}
- if (nb && last_copy_bin < nb->last_bin_memory) {
+ if (nb && ((last_copy_bin < nb->last_bin_memory) || (bins != nb->bins))) {
copy_bin_info();
last_copy_bin = update->ntimestep;
}
- if (ns && last_copy_stencil < ns->last_create) {
+ if (ns && ((last_copy_stencil < ns->last_create) || (stencil != ns->stencil))) {
copy_stencil_info();
last_copy_stencil = update->ntimestep;
}
last_build = update->ntimestep;
}
/* ----------------------------------------------------------------------
test if atom pair i,j is excluded from neighbor list
due to type, group, molecule settings from neigh_modify command
return 1 if should be excluded, 0 if included
------------------------------------------------------------------------- */
int NPair::exclusion(int i, int j, int itype, int jtype,
- int *mask, tagint *molecule) const {
+ int *mask, tagint *molecule) const {
int m;
if (nex_type && ex_type[itype][jtype]) return 1;
if (nex_group) {
for (m = 0; m < nex_group; m++) {
if (mask[i] & ex1_bit[m] && mask[j] & ex2_bit[m]) return 1;
if (mask[i] & ex2_bit[m] && mask[j] & ex1_bit[m]) return 1;
}
}
if (nex_mol) {
for (m = 0; m < nex_mol; m++)
if (mask[i] & ex_mol_bit[m] && mask[j] & ex_mol_bit[m] &&
molecule[i] == molecule[j]) return 1;
}
return 0;
}
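// Example (illustrative sketch): after "neigh_modify exclude group A B",
// ex1_bit/ex2_bit hold the group bitmasks of A and B, and a pair i,j is
// excluded (return 1) if one atom is in A and the other in B, in either order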
/* ----------------------------------------------------------------------
convert atom coords into local bin #
for orthogonal, only ghost atoms will have coord >= bboxhi or coord < bboxlo
take special care to ensure ghosts are in correct bins even w/ roundoff
hi ghost atoms = nbin,nbin+1,etc
owned atoms = 0 to nbin-1
lo ghost atoms = -1,-2,etc
this is necessary so that both procs on either side of PBC
treat a pair of atoms straddling the PBC in a consistent way
for triclinic, doesn't matter since stencil & neigh list built differently
------------------------------------------------------------------------- */
int NPair::coord2bin(double *x)
{
int ix,iy,iz;
if (!ISFINITE(x[0]) || !ISFINITE(x[1]) || !ISFINITE(x[2]))
error->one(FLERR,"Non-numeric positions - simulation unstable");
if (x[0] >= bboxhi[0])
ix = static_cast<int> ((x[0]-bboxhi[0])*bininvx) + nbinx;
else if (x[0] >= bboxlo[0]) {
ix = static_cast<int> ((x[0]-bboxlo[0])*bininvx);
ix = MIN(ix,nbinx-1);
} else
ix = static_cast<int> ((x[0]-bboxlo[0])*bininvx) - 1;
if (x[1] >= bboxhi[1])
iy = static_cast<int> ((x[1]-bboxhi[1])*bininvy) + nbiny;
else if (x[1] >= bboxlo[1]) {
iy = static_cast<int> ((x[1]-bboxlo[1])*bininvy);
iy = MIN(iy,nbiny-1);
} else
iy = static_cast<int> ((x[1]-bboxlo[1])*bininvy) - 1;
if (x[2] >= bboxhi[2])
iz = static_cast<int> ((x[2]-bboxhi[2])*bininvz) + nbinz;
else if (x[2] >= bboxlo[2]) {
iz = static_cast<int> ((x[2]-bboxlo[2])*bininvz);
iz = MIN(iz,nbinz-1);
} else
iz = static_cast<int> ((x[2]-bboxlo[2])*bininvz) - 1;
return (iz-mbinzlo)*mbiny*mbinx + (iy-mbinylo)*mbinx + (ix-mbinxlo);
}
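// Worked 1d example (illustrative sketch): with bboxlo[0] = 0, bboxhi[0] = 10,
// nbinx = 5 and bininvx = 0.5, an owned atom at x = 3.9 gets ix = 1, a hi ghost
// at x = 10.2 gets ix = (int)(0.2*0.5) + 5 = 5, a lo ghost at x = -0.3 gets ix = -1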
/* ----------------------------------------------------------------------
same as coord2bin, but also return ix,iy,iz offsets in each dim
------------------------------------------------------------------------- */
int NPair::coord2bin(double *x, int &ix, int &iy, int &iz)
{
if (!ISFINITE(x[0]) || !ISFINITE(x[1]) || !ISFINITE(x[2]))
error->one(FLERR,"Non-numeric positions - simulation unstable");
if (x[0] >= bboxhi[0])
ix = static_cast<int> ((x[0]-bboxhi[0])*bininvx) + nbinx;
else if (x[0] >= bboxlo[0]) {
ix = static_cast<int> ((x[0]-bboxlo[0])*bininvx);
ix = MIN(ix,nbinx-1);
} else
ix = static_cast<int> ((x[0]-bboxlo[0])*bininvx) - 1;
if (x[1] >= bboxhi[1])
iy = static_cast<int> ((x[1]-bboxhi[1])*bininvy) + nbiny;
else if (x[1] >= bboxlo[1]) {
iy = static_cast<int> ((x[1]-bboxlo[1])*bininvy);
iy = MIN(iy,nbiny-1);
} else
iy = static_cast<int> ((x[1]-bboxlo[1])*bininvy) - 1;
if (x[2] >= bboxhi[2])
iz = static_cast<int> ((x[2]-bboxhi[2])*bininvz) + nbinz;
else if (x[2] >= bboxlo[2]) {
iz = static_cast<int> ((x[2]-bboxlo[2])*bininvz);
iz = MIN(iz,nbinz-1);
} else
iz = static_cast<int> ((x[2]-bboxlo[2])*bininvz) - 1;
ix -= mbinxlo;
iy -= mbinylo;
iz -= mbinzlo;
return iz*mbiny*mbinx + iy*mbinx + ix;
}
diff --git a/src/region.cpp b/src/region.cpp
index e109b7fd6..e69fdc79d 100644
--- a/src/region.cpp
+++ b/src/region.cpp
@@ -1,605 +1,607 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "region.h"
#include "update.h"
#include "domain.h"
#include "lattice.h"
#include "input.h"
#include "variable.h"
#include "math_extra.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
-Region::Region(LAMMPS *lmp, int narg, char **arg) : Pointers(lmp),
- id(NULL), style(NULL), contact(NULL), list(NULL), xstr(NULL), ystr(NULL), zstr(NULL), tstr(NULL)
+Region::Region(LAMMPS *lmp, int narg, char **arg) :
+ Pointers(lmp),
+ id(NULL), style(NULL), contact(NULL), list(NULL),
+ xstr(NULL), ystr(NULL), zstr(NULL), tstr(NULL)
{
int n = strlen(arg[0]) + 1;
id = new char[n];
strcpy(id,arg[0]);
n = strlen(arg[1]) + 1;
style = new char[n];
strcpy(style,arg[1]);
varshape = 0;
xstr = ystr = zstr = tstr = NULL;
dx = dy = dz = 0.0;
size_restart = 5;
reset_vel();
copymode = 0;
list = NULL;
nregion = 1;
}
/* ---------------------------------------------------------------------- */
Region::~Region()
{
if (copymode) return;
delete [] id;
delete [] style;
delete [] xstr;
delete [] ystr;
delete [] zstr;
delete [] tstr;
}
/* ---------------------------------------------------------------------- */
void Region::init()
{
if (xstr) {
xvar = input->variable->find(xstr);
if (xvar < 0) error->all(FLERR,"Variable name for region does not exist");
if (!input->variable->equalstyle(xvar))
error->all(FLERR,"Variable for region is invalid style");
}
if (ystr) {
yvar = input->variable->find(ystr);
if (yvar < 0) error->all(FLERR,"Variable name for region does not exist");
if (!input->variable->equalstyle(yvar))
error->all(FLERR,"Variable for region is not equal style");
}
if (zstr) {
zvar = input->variable->find(zstr);
if (zvar < 0) error->all(FLERR,"Variable name for region does not exist");
if (!input->variable->equalstyle(zvar))
error->all(FLERR,"Variable for region is not equal style");
}
if (tstr) {
tvar = input->variable->find(tstr);
if (tvar < 0) error->all(FLERR,"Variable name for region does not exist");
if (!input->variable->equalstyle(tvar))
error->all(FLERR,"Variable for region is not equal style");
}
vel_timestep = -1;
}
/* ----------------------------------------------------------------------
return 1 if region is dynamic (moves/rotates) or has variable shape
else return 0 if static
------------------------------------------------------------------------- */
int Region::dynamic_check()
{
if (dynamic || varshape) return 1;
return 0;
}
/* ----------------------------------------------------------------------
called before looping over atoms with match() or surface()
this ensures any variables used by region are invoked once per timestep
also ensures variables are invoked by all procs even those w/out atoms
necessary if equal-style variable invokes global operation
with MPI_Allreduce, e.g. xcm() or count()
------------------------------------------------------------------------- */
void Region::prematch()
{
if (varshape) shape_update();
if (dynamic) pretransform();
}
/* ----------------------------------------------------------------------
determine if point x,y,z is a match to region volume
XOR computes 0 if 2 args are the same, 1 if different
note that inside() returns 1 for points on surface of region
thus point on surface of exterior region will not match
if region has variable shape, invoke shape_update() once per timestep
if region is dynamic, apply inverse transform to x,y,z
unmove first, then unrotate, so don't have to change rotation point
caller is responsible for wrapping this call with
modify->clearstep_compute() and modify->addstep_compute() if needed
------------------------------------------------------------------------- */
int Region::match(double x, double y, double z)
{
if (dynamic) inverse_transform(x,y,z);
if (openflag) return 1;
return !(inside(x,y,z) ^ interior);
}
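// Example (illustrative sketch): for a region defined with "side out" (interior = 0),
// a point on or inside the surface has inside() = 1, so match() = !(1 ^ 0) = 0,
// while a point strictly outside gives !(0 ^ 0) = 1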
/* ----------------------------------------------------------------------
generate error if Kokkos function defaults to base class
------------------------------------------------------------------------- */
void Region::match_all_kokkos(int, DAT::t_int_1d)
{
error->all(FLERR,"Can only use Kokkos supported regions with Kokkos package");
}
/* ----------------------------------------------------------------------
generate list of contact points for interior or exterior regions
if region has variable shape, invoke shape_update() once per timestep
if region is dynamic:
before: inverse transform x,y,z (unmove, then unrotate)
after: forward transform contact point xs,ys,zs (rotate, then move),
then reset contact delx,dely,delz based on new contact point
no need to do this if no rotation since delxyz doesn't change
caller is responsible for wrapping this call with
modify->clearstep_compute() and modify->addstep_compute() if needed
------------------------------------------------------------------------- */
int Region::surface(double x, double y, double z, double cutoff)
{
int ncontact;
double xs,ys,zs;
double xnear[3],xorig[3];
if (dynamic) {
xorig[0] = x;
xorig[1] = y;
xorig[2] = z;
inverse_transform(x,y,z);
}
xnear[0] = x;
xnear[1] = y;
xnear[2] = z;
if (!openflag) {
if (interior) ncontact = surface_interior(xnear,cutoff);
else ncontact = surface_exterior(xnear,cutoff);
}
else {
// one of surface_int/ext() will return 0
// so no need to worry about offset of contact indices
ncontact = surface_exterior(xnear,cutoff) + surface_interior(xnear,cutoff);
}
if (rotateflag && ncontact) {
for (int i = 0; i < ncontact; i++) {
xs = xnear[0] - contact[i].delx;
ys = xnear[1] - contact[i].dely;
zs = xnear[2] - contact[i].delz;
forward_transform(xs,ys,zs);
contact[i].delx = xorig[0] - xs;
contact[i].dely = xorig[1] - ys;
contact[i].delz = xorig[2] - zs;
}
}
return ncontact;
}
/* ----------------------------------------------------------------------
add a single contact at Nth location in contact array
x = particle position
xp,yp,zp = region surface point
------------------------------------------------------------------------- */
void Region::add_contact(int n, double *x, double xp, double yp, double zp)
{
double delx = x[0] - xp;
double dely = x[1] - yp;
double delz = x[2] - zp;
contact[n].r = sqrt(delx*delx + dely*dely + delz*delz);
contact[n].radius = 0;
contact[n].delx = delx;
contact[n].dely = dely;
contact[n].delz = delz;
}
/* ----------------------------------------------------------------------
pre-compute dx,dy,dz and theta for a moving/rotating region
called once for the region before per-atom loop, via prematch()
------------------------------------------------------------------------- */
void Region::pretransform()
{
if (moveflag) {
if (xstr) dx = input->variable->compute_equal(xvar);
if (ystr) dy = input->variable->compute_equal(yvar);
if (zstr) dz = input->variable->compute_equal(zvar);
}
if (rotateflag) theta = input->variable->compute_equal(tvar);
}
/* ----------------------------------------------------------------------
transform a point x,y,z in region space to moved space
rotate first (around original P), then displace
------------------------------------------------------------------------- */
void Region::forward_transform(double &x, double &y, double &z)
{
if (rotateflag) rotate(x,y,z,theta);
if (moveflag) {
x += dx;
y += dy;
z += dz;
}
}
/* ----------------------------------------------------------------------
transform a point x,y,z in moved space back to region space
undisplace first, then unrotate (around original P)
------------------------------------------------------------------------- */
void Region::inverse_transform(double &x, double &y, double &z)
{
if (moveflag) {
x -= dx;
y -= dy;
z -= dz;
}
if (rotateflag) rotate(x,y,z,-theta);
}
/* ----------------------------------------------------------------------
rotate x,y,z by angle via right-hand rule around point and runit normal
sign of angle determines whether rotating forward/backward in time
return updated x,y,z
R = vector axis of rotation
P = point = point to rotate around
R0 = runit = unit vector for R
X0 = x,y,z = initial coord of atom
D = X0 - P = vector from P to X0
C = (D dot R0) R0 = projection of D onto R, i.e. Dparallel
A = D - C = vector from R line to X0, i.e. Dperp
B = R0 cross A = vector perp to A in plane of rotation, same len as A
A,B define plane of circular rotation around R line
new x,y,z = P + C + A cos(angle) + B sin(angle)
------------------------------------------------------------------------- */
void Region::rotate(double &x, double &y, double &z, double angle)
{
double a[3],b[3],c[3],d[3],disp[3];
double sine = sin(angle);
double cosine = cos(angle);
d[0] = x - point[0];
d[1] = y - point[1];
d[2] = z - point[2];
double x0dotr = d[0]*runit[0] + d[1]*runit[1] + d[2]*runit[2];
c[0] = x0dotr * runit[0];
c[1] = x0dotr * runit[1];
c[2] = x0dotr * runit[2];
a[0] = d[0] - c[0];
a[1] = d[1] - c[1];
a[2] = d[2] - c[2];
b[0] = runit[1]*a[2] - runit[2]*a[1];
b[1] = runit[2]*a[0] - runit[0]*a[2];
b[2] = runit[0]*a[1] - runit[1]*a[0];
disp[0] = a[0]*cosine + b[0]*sine;
disp[1] = a[1]*cosine + b[1]*sine;
disp[2] = a[2]*cosine + b[2]*sine;
x = point[0] + c[0] + disp[0];
y = point[1] + c[1] + disp[1];
z = point[2] + c[2] + disp[2];
}
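// Worked example (illustrative sketch): rotating X0 = (1,0,0) by angle = pi/2
// about the z axis through P = (0,0,0) with runit = (0,0,1):
// D = (1,0,0), C = (0,0,0), A = (1,0,0), B = runit x A = (0,1,0),
// so new x,y,z = P + C + A*cos(angle) + B*sin(angle) = (0,1,0)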
/* ----------------------------------------------------------------------
parse optional parameters at end of region input line
------------------------------------------------------------------------- */
void Region::options(int narg, char **arg)
{
if (narg < 0) error->all(FLERR,"Illegal region command");
// option defaults
interior = 1;
scaleflag = 1;
moveflag = rotateflag = 0;
openflag = 0;
for (int i = 0; i < 6; i++) open_faces[i] = 0;
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"units") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal region command");
if (strcmp(arg[iarg+1],"box") == 0) scaleflag = 0;
else if (strcmp(arg[iarg+1],"lattice") == 0) scaleflag = 1;
else error->all(FLERR,"Illegal region command");
iarg += 2;
} else if (strcmp(arg[iarg],"side") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal region command");
if (strcmp(arg[iarg+1],"in") == 0) interior = 1;
else if (strcmp(arg[iarg+1],"out") == 0) interior = 0;
else error->all(FLERR,"Illegal region command");
iarg += 2;
} else if (strcmp(arg[iarg],"move") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal region command");
if (strcmp(arg[iarg+1],"NULL") != 0) {
if (strstr(arg[iarg+1],"v_") != arg[iarg+1])
error->all(FLERR,"Illegal region command");
int n = strlen(&arg[iarg+1][2]) + 1;
xstr = new char[n];
strcpy(xstr,&arg[iarg+1][2]);
}
if (strcmp(arg[iarg+2],"NULL") != 0) {
if (strstr(arg[iarg+2],"v_") != arg[iarg+2])
error->all(FLERR,"Illegal region command");
int n = strlen(&arg[iarg+2][2]) + 1;
ystr = new char[n];
strcpy(ystr,&arg[iarg+2][2]);
}
if (strcmp(arg[iarg+3],"NULL") != 0) {
if (strstr(arg[iarg+3],"v_") != arg[iarg+3])
error->all(FLERR,"Illegal region command");
int n = strlen(&arg[iarg+3][2]) + 1;
zstr = new char[n];
strcpy(zstr,&arg[iarg+3][2]);
}
moveflag = 1;
iarg += 4;
} else if (strcmp(arg[iarg],"rotate") == 0) {
if (iarg+8 > narg) error->all(FLERR,"Illegal region command");
if (strstr(arg[iarg+1],"v_") != arg[iarg+1])
error->all(FLERR,"Illegal region command");
int n = strlen(&arg[iarg+1][2]) + 1;
tstr = new char[n];
strcpy(tstr,&arg[iarg+1][2]);
point[0] = force->numeric(FLERR,arg[iarg+2]);
point[1] = force->numeric(FLERR,arg[iarg+3]);
point[2] = force->numeric(FLERR,arg[iarg+4]);
axis[0] = force->numeric(FLERR,arg[iarg+5]);
axis[1] = force->numeric(FLERR,arg[iarg+6]);
axis[2] = force->numeric(FLERR,arg[iarg+7]);
rotateflag = 1;
iarg += 8;
} else if (strcmp(arg[iarg],"open") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal region command");
int iface = force->inumeric(FLERR,arg[iarg+1]);
if (iface < 1 || iface > 6) error->all(FLERR,"Illegal region command");
// additional checks on valid face index are done by region classes
open_faces[iface-1] = 1;
openflag = 1;
iarg += 2;
}
else error->all(FLERR,"Illegal region command");
}
// error check
if ((moveflag || rotateflag) &&
(strcmp(style,"union") == 0 || strcmp(style,"intersect") == 0))
error->all(FLERR,"Region union or intersect cannot be dynamic");
// setup scaling
if (scaleflag) {
xscale = domain->lattice->xlattice;
yscale = domain->lattice->ylattice;
zscale = domain->lattice->zlattice;
}
else xscale = yscale = zscale = 1.0;
if (rotateflag) {
point[0] *= xscale;
point[1] *= yscale;
point[2] *= zscale;
}
// runit = unit vector along rotation axis
if (rotateflag) {
double len = sqrt(axis[0]*axis[0] + axis[1]*axis[1] + axis[2]*axis[2]);
if (len == 0.0)
error->all(FLERR,"Region cannot have 0 length rotation vector");
runit[0] = axis[0]/len;
runit[1] = axis[1]/len;
runit[2] = axis[2]/len;
}
if (moveflag || rotateflag) dynamic = 1;
else dynamic = 0;
}
/* ----------------------------------------------------------------------
find nearest point to C on line segment A,B and return it as D
project (C-A) onto (B-A)
t = length of that projection, normalized by length of (B-A)
t <= 0, C is closest to A
t >= 1, C is closest to B
else closest point is between A and B
------------------------------------------------------------------------- */
void Region::point_on_line_segment(double *a, double *b,
double *c, double *d)
{
double ba[3],ca[3];
MathExtra::sub3(b,a,ba);
MathExtra::sub3(c,a,ca);
double t = MathExtra::dot3(ca,ba) / MathExtra::dot3(ba,ba);
if (t <= 0.0) {
d[0] = a[0];
d[1] = a[1];
d[2] = a[2];
} else if (t >= 1.0) {
d[0] = b[0];
d[1] = b[1];
d[2] = b[2];
} else {
d[0] = a[0] + t*ba[0];
d[1] = a[1] + t*ba[1];
d[2] = a[2] + t*ba[2];
}
}
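// Example (illustrative sketch): a = (0,0,0), b = (2,0,0), c = (1,3,0) gives
// t = dot(c-a,b-a)/dot(b-a,b-a) = 2/4 = 0.5 and d = (1,0,0), the perpendicular foot;
// c = (-1,0,0) gives t = -0.5 <= 0, so d = a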
/* ----------------------------------------------------------------------
infer translational and angular velocity of region
necessary b/c motion variables are for displacement & theta
there is no analytic formula for v & omega
prev[4] contains values of dx,dy,dz,theta at previous step
used for difference, then updated to current step values
dt is time elapsed since previous step
rpoint = point updated by current displacement
called by fix wall/gran/region every timestep
------------------------------------------------------------------------- */
void Region::set_velocity()
{
if (vel_timestep == update->ntimestep) return;
vel_timestep = update->ntimestep;
if (moveflag) {
if (update->ntimestep > 0) {
v[0] = (dx - prev[0])/update->dt;
v[1] = (dy - prev[1])/update->dt;
v[2] = (dz - prev[2])/update->dt;
}
else v[0] = v[1] = v[2] = 0.0;
prev[0] = dx;
prev[1] = dy;
prev[2] = dz;
}
if (rotateflag) {
rpoint[0] = point[0] + dx;
rpoint[1] = point[1] + dy;
rpoint[2] = point[2] + dz;
if (update->ntimestep > 0) {
double angvel = (theta-prev[3]) / update->dt;
omega[0] = angvel*axis[0];
omega[1] = angvel*axis[1];
omega[2] = angvel*axis[2];
}
else omega[0] = omega[1] = omega[2] = 0.0;
prev[3] = theta;
}
if (varshape){
set_velocity_shape();
}
}
/* ----------------------------------------------------------------------
compute velocity of wall for given contact
since contacts only store delx/y/z, need to pass particle coords
to compute contact point
called by fix/wall/gran/region every contact every timestep
------------------------------------------------------------------------- */
void Region::velocity_contact(double *vwall, double *x, int ic)
{
double xc[3];
vwall[0] = vwall[1] = vwall[2] = 0.0;
if (moveflag){
vwall[0] = v[0];
vwall[1] = v[1];
vwall[2] = v[2];
}
if (rotateflag){
xc[0] = x[0] - contact[ic].delx;
xc[1] = x[1] - contact[ic].dely;
xc[2] = x[2] - contact[ic].delz;
vwall[0] += omega[1]*(xc[2] - rpoint[2]) - omega[2]*(xc[1] - rpoint[1]);
vwall[1] += omega[2]*(xc[0] - rpoint[0]) - omega[0]*(xc[2] - rpoint[2]);
vwall[2] += omega[0]*(xc[1] - rpoint[1]) - omega[1]*(xc[0] - rpoint[0]);
}
if (varshape && contact[ic].varflag) velocity_contact_shape(vwall, xc);
}
/* ----------------------------------------------------------------------
increment length of restart buffer based on region info
used by restart of fix/wall/gran/region
------------------------------------------------------------------------- */
void Region::length_restart_string(int &n)
{
n += sizeof(int) + strlen(id)+1 +
sizeof(int) + strlen(style)+1 + sizeof(int) +
size_restart*sizeof(double);
}
/* ----------------------------------------------------------------------
region writes its current style, id, number of sub-regions, position/angle
needed by fix/wall/gran/region to compute velocity by differencing scheme
------------------------------------------------------------------------- */
void Region::write_restart(FILE *fp)
{
int sizeid = (strlen(id)+1);
int sizestyle = (strlen(style)+1);
fwrite(&sizeid, sizeof(int), 1, fp);
fwrite(id,1,sizeid,fp);
fwrite(&sizestyle,sizeof(int),1,fp);
fwrite(style,1,sizestyle,fp);
fwrite(&nregion,sizeof(int),1,fp);
fwrite(prev,sizeof(double),size_restart,fp);
}
/* ----------------------------------------------------------------------
region reads style, id, number of sub-regions from restart file
if they match current region, also read previous position/angle
needed by fix/wall/gran/region to compute velocity by differencing scheme
------------------------------------------------------------------------- */
int Region::restart(char *buf, int &n)
{
int size = *((int *) (&buf[n]));
n += sizeof(int);
if ((size <= 0) || (strcmp(&buf[n],id) != 0)) return 0;
n += size;
size = *((int *) (&buf[n]));
n += sizeof(int);
if ((size <= 0) || (strcmp(&buf[n],style) != 0)) return 0;
n += size;
int restart_nreg = *((int *) (&buf[n]));
n += sizeof(int);
if (restart_nreg != nregion) return 0;
memcpy(prev,&buf[n],size_restart*sizeof(double));
return 1;
}
/* ----------------------------------------------------------------------
set prev vector to zero
------------------------------------------------------------------------- */
void Region::reset_vel()
{
for (int i = 0; i < size_restart; i++) prev[i] = 0;
}
diff --git a/src/version.h b/src/version.h
index f09421aa1..9a5686a75 100644
--- a/src/version.h
+++ b/src/version.h
@@ -1 +1 @@
-#define LAMMPS_VERSION "9 Jan 2017"
+#define LAMMPS_VERSION "17 Jan 2017"
diff --git a/tools/ch2lmp/charmm2lammps.pl b/tools/ch2lmp/charmm2lammps.pl
index 17b4b817a..0693c0a0f 100755
--- a/tools/ch2lmp/charmm2lammps.pl
+++ b/tools/ch2lmp/charmm2lammps.pl
@@ -1,2061 +1,2061 @@
#!/usr/bin/perl
#
# program: charmm2lammps.pl
# author: Pieter J. in 't Veld,
# pjintve@sandia.gov, veld@verizon.net
# date: February 12-23, April 5, 2005.
# purpose: Translation of charmm input to lammps input
#
# Notes: Copyright by author for Sandia National Laboratories
# 20050212 Needed (in the same directory):
# - $project.crd ; Assumed to be correct and running
# - $project.psf ; CHARMM configs
# - top_$forcefield.rtf ;
# - par_$forcefield.prm ;
# Output:
# - $project.data ; LAMMPS data file
# - $project.in ; LAMMPS input file
# - $project_ctrl.pdb ; PDB control file
# - $project_ctrl.psf ; PSF control file
# 20050218 Optimized for memory usage
# 20050221 Rotation added
# 20050222 Water added
# 20050223 Ions added
# 20050405 Water bug fixed; addition of .pdb input
# 20050407 project_ctrl.psf bug fixed; addition of -border
# 20050519 Added interpretation of charmm xplor psfs
# 20050603 Fixed centering issues
# 20050630 Fixed symbol issues arising from salt addition
# 20060818 Changed reading of pdb format to read exact columns
# 20070109 Changed AddMass() to use $max_id correctly
# 20160114 Added compatibility for parameter files that use IMPROPERS instead of IMPROPER
# Print warning when not all parameters are detected. Set correct number of atom types.
# 20160613 Fix off-by-one issue in atom type validation check
# Replace -charmm command line flag with -nohints flag
# and enable type hints in data file by default.
# Add hints also to section headers
# Add a brief minimization to example input template.
# 20161001 Added 'CMAP crossterms' section at the end of the data file
# 20161001 Added instructions in CMAP section to fix problem if 'ter'
# is not designated in the .pdb file to identify last amino acid
# 20161005 Added tweak to embed command line in generated LAMMPS input
#
# General Many thanks to Paul S. Crozier for checking script validity
# against his projects.
# Also thanks to Xiaohu Hu (hux2@ornl.gov) and Robert A. Latour
-# (latourr@clemson.edu), David Hyde-Volpe, and Tigran Abramyan,
-# Clemson University and Chris Lorenz (chris.lorenz@kcl.ac.uk),
-# King's College London for their efforts to add CMAP sections,
-# which is implemented using the option flag "-cmap".
+# (latourr@clemson.edu), David Hyde-Volpe, and Tigran Abramyan,
+# Clemson University and Chris Lorenz (chris.lorenz@kcl.ac.uk),
+# King's College London for their efforts to add CMAP sections,
+# which is implemented using the option flag "-cmap".
# Initialization
sub Test
{
my $name = shift(@_);
printf("Error: file %s not found\n", $name) if (!scalar(stat($name)));
return !scalar(stat($name));
}
sub Initialize # initialization
{
my $k = 0;
my @dir = ("x", "y", "z");
my @options = ("-help", "-nohints", "-water", "-ions", "-center",
"-quiet", "-pdb_ctrl", "-l", "-lx", "-ly", "-lz",
"-border", "-ax", "-ay", "-az", "-cmap");
my @remarks = ("display this message",
"do not print type and style hints in data file",
"add TIP3P water [default: 1 g/cc]",
"add (counter)ions using Na+ and Cl- [default: 0 mol/l]",
"recenter atoms",
"do not print info",
"output project_ctrl.pdb [default: on]",
"set x-, y-, and z-dimensions simultaneously",
"x-dimension of simulation box",
"y-dimension of simulation box",
"z-dimension of simulation box",
"add border to all sides of simulation box [default: 0 A]",
"rotation around x-axis",
"rotation around y-axis",
"rotation around z-axis",
"generate a CMAP section in data file"
);
my $notes;
$program = "charmm2lammps";
$version = "1.9.1";
$year = "2016";
$add = 1;
$water_dens = 0;
$ions = 0;
$info = 1;
$center = 0;
$net_charge = 0;
$ion_molar = 0;
$pdb_ctrl = 1;
$border = 0;
$L = (0, 0, 0);
$cmap = 0;
@R = M_Unit();
$notes = " * The average of extremes is used as the origin\n";
$notes .= " * Residues are numbered sequentially\n";
$notes .= " * Water is added on an FCC lattice: allow 5 ps for";
$notes .= " equilibration\n";
$notes .= " * Ions are added randomly and only when water is present\n";
$notes .= " * CHARMM force field v2.7 parameters used for";
$notes .= " water and NaCl\n";
$notes .= " * Rotation angles are in degrees\n";
$notes .= " * Rotations are executed consecutively: -ax -ay != -ay -ax\n";
$notes .= " * CHARMM files needed in execution directory:\n";
$notes .= " - project.crd coordinates\n";
$notes .= " - project.pdb when project.crd is absent\n";
$notes .= " - project.psf connectivity\n";
$notes .= " - top_forcefield.rtf topology\n";
$notes .= " - par_forcefield.prm parameters\n";
$notes .= " * Output files written to execution directory:\n";
$notes .= " - project.data LAMMPS data file\n";
$notes .= " - project.in suggested LAMMPS input script\n";
$notes .= " - project_ctrl.pdb control file when requested\n";
# record full command line for later use
$cmdline = "$program.pl " . join(" ",@ARGV);
foreach (@ARGV)
{
if (substr($_, 0, 1) eq "-")
{
my $k = 0;
my @tmp = split("=");
my $switch = ($arg[1] eq "")||($arg[1] eq "on")||($arg[1]!=0);
$tmp[0] = lc($tmp[0]);
foreach (@options)
{
last if ($tmp[0] eq substr($_, 0 , length($tmp[0])));
++$k;
}
$help = 1 if (!$k--); # -help
$add = 0 if (!$k--); # -nohints
$water_dens = ($tmp[1] ne "" ? $tmp[1] : 1) if (!$k--); # -water
$ion_molar = abs($tmp[1]) if (!$k); # -ions
$ions = 1 if (!$k--); # ...
$center = 1 if (!$k--); # -center
$info = 0 if (!$k--); # -quiet
$pdb_ctrl = $switch if (!$k--); # -pdb_ctrl
my $flag = $k--; # -l
$L[0] = abs($tmp[1]) if (!($flag && $k--)); # -lx
$L[1] = abs($tmp[1]) if (!($flag && $k--)); # -ly
$L[2] = abs($tmp[1]) if (!($flag && $k--)); # -lz
$border = abs($tmp[1]) if (!$k--); # -border
@R = M_Dot(M_Rotate(0, $tmp[1]), @R) if (!$k--);# -ax
@R = M_Dot(M_Rotate(1, $tmp[1]), @R) if (!$k--);# -ay
@R = M_Dot(M_Rotate(2, $tmp[1]), @R) if (!$k--);# -az
$cmap = ($tmp[1] ne "" ? $tmp[1] : 22) if (!$k--); # -cmap
print("Warning: ignoring unknown command line flag: $tmp[0]\n") unless $k;
}
else
{
$forcefield = $_ if (!$k);
$project = $_ if ($k++ == 1);
}
}
$water_dens = 1 if ($ions && !$water_dens);
if (($k<2)||$help)
{
printf("\n%s v%s (c)2005-%s by Pieter J. in \'t Veld and others\n\n",
$program, $version, $year);
printf("Usage:\n %s.pl [-option[=#] ..] forcefield project\n\n",$program);
printf("Options:\n");
for (my $i=0; $i<scalar(@options); ++$i)
{
printf(" %-10.10s %s\n", $options[$i], $remarks[$i]);
}
printf("\nNotes:\n%s\n", $notes);
exit(-1);
}
else { printf("\n%s v%s (c)2005-%s\n\n", $program, $version, $year) if ($info); }
my $flag = Test($Parameters = "par_$forcefield.prm");
$flag |= Test($Topology = "top_$forcefield.rtf");
$flag |= Test($Pdb = "$project.pdb")
if (!scalar(stat($Crd = "$project.crd")));
$flag |= Test($Psf = "$project.psf") if ($look eq "");
$pdb = ($Pdb ne "") ? 1 : 0;
printf("Conversion aborted\n\n") if ($flag);
exit(-1) if ($flag);
printf("Info: using $Pdb instead of $Crd\n") if (!scalar(stat($Crd)));
for (my $i=0; $i<3; ++$i)
{
printf("Info: l%s not set: will use extremes\n",
("x", "y", "z")[$i]) if ($info&&!$L[$i]);
}
open(PARAMETERS, "<par_$forcefield.prm");
}
# Vector manipulation
sub V_String
{
my @v = @_;
return "{".$v[0].", ".$v[1].", ".$v[2]."}";
}
sub V_Add
{
my @v1 = splice(@_, 0, 3);
my @v2 = splice(@_, 0, 3);
return ($v1[0]+$v2[0], $v1[1]+$v2[1], $v1[2]+$v2[2]);
}
sub V_Subtr
{
my @v1 = splice(@_, 0, 3);
my @v2 = splice(@_, 0, 3);
return ($v1[0]-$v2[0], $v1[1]-$v2[1], $v1[2]-$v2[2]);
}
sub V_Dot
{
my @v1 = splice(@_, 0, 3);
my @v2 = splice(@_, 0, 3);
return $v1[0]*$v2[0]+$v1[1]*$v2[1]+$v1[2]*$v2[2];
}
sub V_Mult
{
my @v = splice(@_, 0, 3);
my $f = shift(@_);
return ($f*$v[0], $f*$v[1], $f*$v[2]);
}
sub M_String
{
my $string;
for (my $i=0; $i<3; ++$i)
{
$string .= ", " if ($i);
$string .= V_String(splice(@_, 0, 3));
}
return "{".$string."}";
}
sub M_Transpose
{
return
(@_[0], @_[3], @_[6],
@_[1], @_[4], @_[7],
@_[2], @_[5], @_[8]);
}
sub M_Dot
{
my @v11 = splice(@_, 0, 3);
my @v12 = splice(@_, 0, 3);
my @v13 = splice(@_, 0, 3);
my @m = M_Transpose(splice(@_, 0, 9));
my @v21 = splice(@m, 0, 3);
my @v22 = splice(@m, 0, 3);
my @v23 = splice(@m, 0, 3);
return (
V_Dot(@v11, @v21), V_Dot(@v11, @v22), V_Dot(@v11, @v23),
V_Dot(@v12, @v21), V_Dot(@v12, @v22), V_Dot(@v12, @v23),
V_Dot(@v13, @v21), V_Dot(@v13, @v22), V_Dot(@v13, @v23));
}
sub M_Unit { return (1,0,0, 0,1,0, 0,0,1); }
sub PI { return 4*atan2(1,1); }
sub M_Rotate
{ # vmd convention
my $n = shift(@_);
my $alpha = shift(@_)*PI()/180;
my $cos = cos($alpha);
my $sin = sin($alpha);
$cos = 0 if (abs($cos)<1e-16);
$sin = 0 if (abs($sin)<1e-16);
return (1,0,0, 0,$cos,-$sin, 0,$sin,$cos) if ($n==0); # around x-axis
return ($cos,0,$sin, 0,1,0, -$sin,0,$cos) if ($n==1); # around y-axis
return ($cos,-$sin,0, $sin,$cos,0, 0,0,1) if ($n==2); # around z-axis
return M_Unit();
}
sub MV_Dot
{
my @v11 = splice(@_, 0, 3);
my @v12 = splice(@_, 0, 3);
my @v13 = splice(@_, 0, 3);
my @v2 = splice(@_, 0, 3);
return (V_Dot(@v11, @v2), V_Dot(@v12, @v2), V_Dot(@v13, @v2));
}
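# Example (illustrative sketch): M_Rotate(2, 90) returns the z-axis rotation
# (0,-1,0, 1,0,0, 0,0,1), so MV_Dot(M_Rotate(2, 90), (1,0,0)) yields (0,1,0)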
# CHARMM input
sub PSFConnectivity
{
my $n = PSFGoto(bonds);
return if (scalar(@nconnect));
printf("Info: creating connectivity\n") if ($info);
for (my $i=0; $i<$n; ++$i)
{
my @bond = PSFGet(2);
$connect[$bond[0]][$nconnect[$bond[0]]++] = $bond[1];
$connect[$bond[1]][$nconnect[$bond[1]]++] = $bond[0];
}
}
sub PSFDihedrals # hack to accommodate
{ # LAMMPS' way of calc
$idihedral = 0; # LJ 1-4 interactions
return $ndihedral if (($dihedral_flag = $ndihedral ? 1 : 0));
PSFConnectivity();
printf("Info: creating dihedrals\n") if ($info);
my $n = scalar(@nconnect);
my @bonded = ();
for (my $i=1; $i<=$n; ++$i)
{
$bonded[0] = $i;
for (my $i=0; $i<scalar($nconnect[$bonded[0]]); ++$i)
{
$bonded[1] = $connect[$bonded[0]][$i];
for (my $i=0; $i<scalar($nconnect[$bonded[1]]); ++$i)
{
next if (($bonded[2] = $connect[$bonded[1]][$i])==$bonded[0]);
for (my $i=0; $i<scalar($nconnect[$bonded[2]]); ++$i)
{
next if (($bonded[3] = $connect[$bonded[2]][$i])==$bonded[1]);
next if ($bonded[3]<$bonded[0]);
$dihedral[$ndihedral++] = join(" ", @bonded);
}
}
}
}
$dihedral_flag = 1;
return $ndihedral;
}
sub CreatePSFIndex # make an index of id
{ # locations
my @psf_ids = ("!NATOM","!NBOND:","!NTHETA:","!NPHI:","!NIMPHI:");
my @ids = (atoms, bonds, angles, dihedrals, impropers);
my $k = 0;
my %hash;
printf("Info: creating PSF index\n") if ($info);
open(PSF, "<$project.psf") if (fileno(PSF) eq "");
foreach (@psf_ids) { $hash{$_} = shift(@ids); };
while (<PSF>)
{
chop();
my @tmp = split(" ");
my $n = $hash{$tmp[1]};
$PSFIndex{$n} = tell(PSF)." ".$tmp[0] if ($n ne "");
}
}
sub PSFGoto # goto $ident in <PSF>
{
CreatePSFIndex() if (!scalar(%PSFIndex));
my $id = shift(@_);
my @n = split(" ", $PSFIndex{$id});
@PSFBuffer = ();
# return PSFDihedrals() if ($id eq "dihedrals");
if (!scalar(@n))
{
printf("Warning: PSF index for $id not found\n");
seek(PSF, 0, SEEK_END);
return -1;
}
seek(PSF, $n[0], SEEK_SET);
return $n[1];
}
sub PSFGet
{
if ($dihedral_flag)
{
$dihedral_flag = $idihedral+1<$ndihedral ? 1 : 0;
return split(" ", $dihedral[$idihedral++]);
}
if (!scalar(@PSFBuffer))
{
my $line = <PSF>;
chop($line);
@PSFBuffer = split(" ", $line);
}
return splice(@PSFBuffer, 0, shift(@_));
}
sub PSFWrite
{
my $items = shift(@_);
my $n = $items;
if ($psf_ncols>7) { printf(PSF_CTRL "\n"); $psf_ncols = 0; }
foreach(@_)
{
printf(PSF_CTRL " %7d", $_);
++$psf_ncols;
if ((!--$n) && ($psf_ncols>7))
{
printf(PSF_CTRL "\n");
$psf_ncols = 0;
$n = $items;
}
}
}
sub CRDGoto
{
my $n;
return if (shift(@_) ne "atoms");
open(CRD, "<".($pdb ? $Pdb : $Crd)) if (fileno(CRD) eq "");
seek(CRD, 0, SEEK_SET);
return PSFGoto(atoms) if ($pdb);
while (substr($n = <CRD>, 0, 1) eq "*") {}
chop($n);
return $n;
}
sub NextPDB2CRD
{
my @n = (6,5,1,4,1,3,1,1,4,1,3,8,8,8,6,6,6,4,2,2);
my @data = ();
my $c = 0;
my $line;
while (substr($line = <CRD>, 0, 4) ne "ATOM") {};
chop($line);
foreach (@n) { push(@data, substr($line, ($c += $_)-$_, $_)); }
return @data[1, 8, 5, 3, 11, 12, 13, 17, 8, 15];
}
sub Delete
{
my $item = shift(@_);
my $k = 0;
my @list;
foreach (@_)
{
my @tmp = split(" ");
delete($tmp[$item]);
$list[$k++] = join(" ", @tmp);
}
return @list;
}
sub CreateID # create id from list
{
my $n = scalar(@_);
my @list = @_;
my $id = "";
my $flag = $list[0] gt $list[-1];
my $j = $n;
my $tmp;
return "" if (scalar(@list)<$n);
$flag = $list[1] gt $list[-2]
if ((scalar(@list)>3)&&($list[0] eq $list[-1]));
for (my $i=0; $i<$n; ++$i)
{
$id .= ($i ? " " : "").($tmp = $list[$flag ? --$j : $i]);
$id .= substr(" ", 0, 4-length($tmp));
}
return $id;
}
sub AtomTypes
{
my $n = PSFGoto(atoms);
my %list;
return () if ($n<1);
$atom_types[0] = -1;
for (my $i=0; $i<$n; ++$i)
{
my @tmp = split(" ", <PSF>);
$tmp[5] = $symbols{$tmp[5]}
if ((substr($tmp[5],0,1) lt '0')||(substr($tmp[5],0,1) gt '9'));
push(@atom_types, $tmp[5]);
++$list{$tmp[5]};
}
if ($water_dens)
{
push(@atom_types, $symbols{HT}); ++$list{$symbols{HT}};
push(@atom_types, $symbols{OT}); ++$list{$symbols{OT}};
}
if ($ions)
{
push(@atom_types, $symbols{CLA}); ++$list{$symbols{CLA}};
push(@atom_types, $symbols{SOD}); ++$list{$symbols{SOD}};
}
return sort({$a<=>$b} keys(%list));
}
sub Markers
{
my %markers = (
NONBONDED => '0',
BONDS => '1',
ANGLES => '2',
DIHEDRALS => '3',
IMPROPERS => '4',
IMPROPER => '4'
);
return %markers;
}
sub NonBond
{
my @cols = @_;
my $f = (scalar(@cols)>3)&&(substr($cols[3],0,1) ne "!");
my @tmp = (-$cols[1], $cols[2],
$f ? -$cols[4]:-$cols[1], $f ? $cols[5]:$cols[2]);
$tmp[1] *= 2.0**(5/6); # adjust sigma
$tmp[3] *= 2.0**(5/6); # adjust sigma 1-4
return join(" ", @tmp);
}
sub AtomParameters # non-bonded parameters
{
my @types;
my @list;
my $k = 0;
my $read = 0;
my %markers = Markers();
foreach(@_) { $types{$ids{$_}} = $k++; }
seek(PARAMETERS, 0, 0);
while (<PARAMETERS>)
{
chop();
my @cols = split(" ");
if ($read&&(scalar(@cols)>1)&&
(substr($cols[0],0,1) ne "!")&&($cols[1] lt "A"))
{
my $k = $types{shift(@cols)};
$list[$k] = NonBond(@cols) if ($k ne "");
}
if ($markers{$cols[0]} ne "") {
$read = ($markers{$cols[0]} eq "0") ? 1 : 0; }
}
$list[$types{HT}] = NonBond(0, -0.046, 0.2245)
if ($water_dens&&($list[$types{HT}] eq ""));
$list[$types{OT}] = NonBond(0, -0.152100, 1.768200)
if ($water_dens&&($list[$types{OT}] eq ""));
$list[$types{CLA}] = NonBond(0, -0.150, 2.27)
if ($ions&&($list[$types{CLA}] eq ""));
$list[$types{SOD}] = NonBond(0, -0.0469, 1.36375)
if ($ions&&($list[$types{SOD}] eq ""));
return @list;
}
sub BondedTypes # create bonded types
{
my $mode = shift(@_); # operation mode
my $items = (2, 3, 4, 4)[$mode]; # items per entry
my $id = (bonds, angles, dihedrals, impropers)[$mode];
my $n = PSFGoto($id);
my %list;
for (my $i=0; $i<$n; ++$i)
{
my @tmp = ();
foreach (PSFGet($items)) { push(@tmp, $ids{$atom_types[$_]}); }
++$list{CreateID(@tmp)};
}
++$list{CreateID(HT, OT)} if ($water_dens&&($mode==0));
++$list{CreateID(HT, OT, HT)} if ($water_dens&&($mode==1));
@types = sort(keys(%list));
}
sub Parameters # parms from columns
{
my $items = shift(@_);
my @cols = @_;
my $parms = "";
for (my $i=$items; ($i<scalar(@cols))&&(substr($cols[$i],0,1)ne"!"); ++$i)
{
$parms = $parms.($i>$items ? " " : "").$cols[$i];
}
return $parms;
}
sub BondedParameters # distil parms from
{ # <PARAMETERS>
my $mode = shift(@_); # bonded mode
return if (($mode>3)||($mode<0));
my $items = (2, 3, 4, 4)[$mode]; # items per entry
my $name = ("bond", "angle", "dihedral", "improper")[$mode];
my $read = 0;
my $k = 0;
my %markers = Markers();
my @set;
my @tmp;
my $f;
my %list;
my %link;
@parms = ();
foreach(@types) { $link{$_} = $k++; }
seek(PARAMETERS, 0, 0);
while (<PARAMETERS>)
{
chomp();
my @cols = split(" ");
if ($read&&(scalar(@cols)>$items)&&($cols[$items] lt "A"))
{
if (($items==4)&&(($f = ($cols[1] eq "X")&&($cols[2] eq "X"))||
(($cols[0] eq "X")&&($cols[3] eq "X")))) # wildcards
{
my $id = CreateID(($cols[1-$f], $cols[2+$f]));
for ($k=0; $k<scalar(@types); ++$k)
{
if (!$set[$k])
{
my @tmp = split(" ", $types[$k]);
if (CreateID($tmp[1-$f], $tmp[2+$f]) eq $id)
{
if ($mode==2)
{
if ($parms[$k] eq "") {
$parms[$k] = Parameters($items,@cols)." 1"; }
else {
$parms[$k] .= ":".Parameters($items,@cols)." 0"; }
}
else {
$parms[$k] .= Parameters($items,@cols); }
}
}
}
}
else # regular
{
for (my $i=0; $i<$items; ++$i) { $tmp[$i] = $cols[$i]; };
$k = $link{CreateID(@tmp)};
if ($k ne "")
{
$parms[$k] = "" if (!$set[$k]);
$parms[$k] .= ($set[$k]++ ? ":" : "").Parameters($items,@cols);
$parms[$k] .= ($set[$k]-1 ? " 0" : " 1") if ($mode==2);
}
}
}
if ($markers{$cols[0]}) {
$read = ($markers{$cols[0]} eq $mode+1) ? 1 : 0; }
}
if ($water_dens)
{
$parms[$link{CreateID(HT, OT)}] = "450 0.9572" if ($mode==0);
$parms[$link{CreateID(HT, OT, HT)}] = "55 104.52" if ($mode==1);
}
for (my $i=0; $i<scalar(@types); ++$i)
{
printf("Warning: %s parameter %4d for [%s] was not found\n",
$name, $i+1, $types[$i]) if ($parms[$i] eq "");
}
}
sub SetScreeningFactor # set screening factor
{
my $id = shift(@_);
my $value = shift(@_);
my $new = "";
foreach (split(":", $parms[$id]))
{
my @tmp = split(" ");
$tmp[-1] = $value if ($tmp[-1]);
$new .= ":" if ($new ne "");
$new .= join(" ", @tmp);
}
$parms[$id] = $new;
}
sub CorrectDihedralParameters
{
my $n = PSFGoto(dihedrals);
my %hash;
my $hash_id;
my $id1;
my $id2;
my $first;
my $last;
for (my $i=0; $i<$n; ++$i)
{
my @bonded = PSFGet(4);
my @tmp = ();
foreach (@bonded) { push(@tmp, $ids{$atom_types[$_]}); }
$id1 = $link{CreateID(@tmp)}-1;
$first = $bonded[0];
$last = $bonded[3];
if ($first>$last) { my $tmp = $first; $first = $last; $last = $tmp; }
if (($id2 = $hash{$hash_id = $first." ".$last}) eq "")
{
$hash{$hash_id} = $id1; # add id to hash
}
else
{
SetScreeningFactor($id1, 0.5); # 6-ring: shared 1-4
SetScreeningFactor($id2, 0.5);
}
}
$n = PSFGoto(angles);
for (my $i=0; $i<$n; ++$i)
{
my @bonded = PSFGet(3);
$first = $bonded[0];
$last = $bonded[2];
if ($first>$last) { my $tmp = $first; $first = $last; $last = $tmp; }
if (($id1 = $hash{$first." ".$last}) ne "")
{
SetScreeningFactor($id1, 0); # 5-ring: no 1-4
}
}
}
sub AddMass
{
my $symbol = shift(@_);
my $mass = shift(@_);
return if ($symbols{$symbol} ne "");
$ids{++$max_id} = $symbol;
$masses{$max_id} = $mass;
$symbols{$symbol} = $max_id;
}
sub ReadTopology # read topology links
{
my $id = shift(@_);
my $item = shift(@_);
my $read = 0;
my @tmp;
open(TOPOLOGY, "<top_$forcefield.rtf");
$max_id = 0;
while (<TOPOLOGY>)
{
chop(); # delete CR at end
my @tmp = split(" ");
$read = 1 if ($tmp[0] eq "MASS");
if ($read&&($tmp[0] eq "MASS"))
{
$symbols{$tmp[2]} = $tmp[1];
$ids{$tmp[1]} = $tmp[2];
$masses{$tmp[1]} = $tmp[3];
$max_id = $tmp[1] if ($max_id<$tmp[1]);
}
# $names{$tmp[1]} = $tmp[4] if ($read&&($tmp[0] eq "MASS"));
last if ($read&&!scalar(@tmp)); # quit reading
}
AddMass(HT, 1.00800);
AddMass(OT, 15.99940);
AddMass(CLA, 35.450000);
AddMass(SOD, 22.989770);
close(TOPOLOGY);
}
sub CrossLink # symbolic cross-links
{
my @list = @_;
my $n = scalar(@list);
my %hash;
for (my $i=0; $i<$n; ++$i) { $hash{$list[$i]} = $i+1; }
return %hash;
}
sub CharacterizeBox
{
my $flag = 1;
my @x = (-$L[0]/2, $L[0]/2);
my @y = (-$L[1]/2, $L[1]/2);
my @z = (-$L[2]/2, $L[2]/2);
my $n = CRDGoto(atoms);
my $extremes = !($L[0] && $L[1] && $L[2]);
@Center = (0, 0, 0);
return if (!$n);
for (my $i=0; $i<$n; ++$i)
{
my @tmp = $pdb ? NextPDB2CRD() : split(" ", <CRD>);
my @p = @tmp[-6, -5, -4];
@p = MV_Dot(@R, @p);
$x[0] = $p[0] if ($flag||($p[0]<$x[0]));
$x[1] = $p[0] if ($flag||($p[0]>$x[1]));
$y[0] = $p[1] if ($flag||($p[1]<$y[0]));
$y[1] = $p[1] if ($flag||($p[1]>$y[1]));
$z[0] = $p[2] if ($flag||($p[2]<$z[0]));
$z[1] = $p[2] if ($flag||($p[2]>$z[1]));
$flag = 0 if ($flag);
}
$L[0] = $x[1]-$x[0] if (!$L[0]);
$L[1] = $y[1]-$y[0] if (!$L[1]);
$L[2] = $z[1]-$z[0] if (!$L[2]);
$L[0] += $border;
$L[1] += $border;
$L[2] += $border;
@Center = (($x[1]+$x[0])/2, ($y[1]+$y[0])/2, ($z[1]+$z[0])/2);
printf("Info: recentering atoms\n") if ($info&&$center);
}
sub SetupWater
{
return if (!$water_dens);
my $dens = 1000*$water_dens; # kg/m^3
my $m = 0.018; # kg/mol
my $loh = 0.9572; # l[O-H] in [A]
my $s_OT = 1.7682; # CHARMM sigma [A]
my $ahoh = (180-104.52)/360*PI();
my @p = ($loh*cos($ahoh), $loh*sin($ahoh), 0);
printf("Info: creating fcc water\n") if ($info);
$n_water = 4; # molecules/cell
$nav = 6.022e23; # 1/mol
$v_water = $m/$nav/$dens*1e30; # water volume [A^3]
$r_water = $s_OT*2**(-1/6); # sigma_OT in [A]
@p_water = (0,0,0, @p, -$p[0],$p[1],0);
$v_fcc = $n_water*$v_water; # cell volume
$l_fcc = $v_fcc**(1/3); # cell length
@p_fcc = (0.00,0.00,0.00, 0.50,0.50,0.00,
0.50,0.00,0.50, 0.00,0.50,0.50);
@n_fcc = ();
for (my $i=0; $i<scalar(@L); ++$i)
{
my $n = $L[$i]/$l_fcc; # calculate n_fcc
$n = int($n-int($n) ? $n+1 : $n); # ceil($n)
$L[$i] = $n*$l_fcc; # adjust box length
printf("Info: changed l%s to %g A\n", ("x","y","z")[$i], $L[$i])
if ($info);
push(@n_fcc, $n);
}
foreach (@p_fcc) { $_ = ($_+0.25)*$l_fcc; } # p_fcc in [A]
for (my $x=0; $x<$n_fcc[0]; ++$x) { # initialize flags
for (my $y=0; $y<$n_fcc[1]; ++$y) {
for (my $z=0; $z<$n_fcc[2]; ++$z) {
$flags_fcc[$x][$y][$z] = 15; } } } # turn on all fcc sites
}
sub floor
{
my $x = shift(@_);
return $x>0 ? int($x) : int($x)-1;
}
sub Periodic
{
my @p = splice(@_, 0, 3);
return (
$p[0]-floor($p[0]/$L[0]+0.5)*$L[0],
$p[1]-floor($p[1]/$L[1]+0.5)*$L[1],
$p[2]-floor($p[2]/$L[2]+0.5)*$L[2]);
}
sub EraseWater
{
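# Remove pre-built water that would overlap a solute atom: the eight corners
# of the atom's bounding cube select candidate fcc cells, and the occupancy
# bit of every water site within ($r_water + $r) of the atom is cleared.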
my $r = shift(@_)/2;
my @p = splice(@_, 0, 3);
@p = V_Subtr(@p, @Center) if (!$center);
my @edges = (
$p[0]-$r,$p[1]-$r,$p[2]-$r, $p[0]-$r,$p[1]-$r,$p[2]+$r,
$p[0]-$r,$p[1]+$r,$p[2]-$r, $p[0]-$r,$p[1]+$r,$p[2]+$r,
$p[0]+$r,$p[1]-$r,$p[2]-$r, $p[0]+$r,$p[1]-$r,$p[2]+$r,
$p[0]+$r,$p[1]+$r,$p[2]-$r, $p[0]+$r,$p[1]+$r,$p[2]+$r);
my %list;
my @n;
my $d2 = ($r_water+$r)**2;
my @l = ($L[0]/2, $L[1]/2, $L[2]/2);
for (my $i=0; $i<scalar(@edges); $i+=3) # determine candidates
{
my @q = Periodic(@edges[$i, $i+1, $i+2]);
my @n = (int(($q[0]+$l[0])/$l_fcc),int(($q[1]+$l[1])/$l_fcc),
int(($q[2]+$l[2])/$l_fcc));
++$list{join(" ", @n)};
}
foreach (sort(keys(%list))) # check overlap
{
my @n = split(" ");
my @corner = ($n[0]*$l_fcc-$l[0]+$p_water[0],
$n[1]*$l_fcc-$l[1]+$p_water[1],
$n[2]*$l_fcc-$l[2]+$p_water[2]);
my $bit = 1;
my $flags = 0;
for (my $i=0; $i<scalar(@p_fcc); $i+=3)
{
my @q = V_Add(@corner, @p_fcc[$i,$i+1,$i+2]);
my @dp = Periodic(V_Subtr(@q, @p));
$flags |= $bit if (V_Dot(@dp, @dp)>$d2); # turn on fcc
$bit *= 2;
}
$flags_fcc[$n[0]][$n[1]][$n[2]] &= $flags; # set flags
}
}
sub CountFCC
{
my $n = 0;
return $n_fccs = 0 if (!$water_dens);
for (my $x=0; $x<$n_fcc[0]; ++$x) { # count water
for (my $y=0; $y<$n_fcc[1]; ++$y) {
for (my $z=0; $z<$n_fcc[2]; ++$z) {
my $bit = 1;
my $flags = $flags_fcc[$x][$y][$z];
for (my $i=0; $i<$n_water; ++$i) {
++$n if ($flags & $bit);
$bit *= 2; } } } }
return ($n_fccs = $n);
}
sub AddIons
{
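# Replace randomly selected fcc water sites by Na+/Cl- ions: enough single
# counter-ions to neutralize the (integer part of the) net charge, plus an
# equal number of Na+/Cl- pairs to reach the requested molarity $ion_molar.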
my $n = ($n_waters = CountFCC())-int(abs($net_charge));
return if (!$ions);
printf("Warning: charge not neutralized: too little water\n") if ($n<0);
return if ($n<0);
printf(
"Warning: charge not neutralized: net charge (%g) is not an integer\n",
$net_charge) if ($net_charge!=int($net_charge));
my $n_na = $net_charge<0 ? int(abs($net_charge)) : 0;
my $n_cl = $net_charge>0 ? int(abs($net_charge)) : 0;
my $n_mol = int($ion_molar*$n*$v_water*1e-27*$nav+0.5);
my $n_atoms = ($n_na += $n_mol)+($n_cl += $n_mol);
$n_waters -= $n_atoms;
printf(
"Info: adding ions: [NaCl] = %g mol/l (%d Na+, %d Cl-)\n",
$n_mol/$n/$v_water/$nav/1e-27, $n_na, $n_cl) if ($info);
$n += int(abs($net_charge));
my $salt = 2**$n_water;
srand(time()); # seed random number
for (my $x=0; $x<$n_fcc[0]; ++$x) # replace water by ions
{
for (my $y=0; $y<$n_fcc[1]; ++$y)
{
for (my $z=0; $z<$n_fcc[2]; ++$z)
{
my $bit = 1;
my $flags = $flags_fcc[$x][$y][$z];
for (my $i=0; $i<$n_water; ++$i)
{
if ($flags & $bit)
{
my $prob = $n_atoms/$n;
--$n;
if (rand()<$prob)
{
my $na = rand()<$n_na/$n_atoms ? 1 : 0;
--$n_atoms;
if ($na) { --$n_na; } else { --$n_cl; }
$flags |= $salt*(1+$salt*$na)*$bit; # set type of ion
}
};
$bit *= 2;
}
$flags_fcc[$x][$y][$z] = $flags;
}
}
}
}
# LAMMPS output
sub WriteLAMMPSHeader # print lammps header
{
printf(LAMMPS "LAMMPS data file. %sCreated by $program v$version on %s\n",
($add ? "CGCMM Style. atom_style full. " : ""),`date`);
printf(LAMMPS "%12d atoms\n", $natoms);
printf(LAMMPS "%12d bonds\n", $nbonds);
printf(LAMMPS "%12d angles\n", $nangles);
printf(LAMMPS "%12d dihedrals\n", $ndihedrals);
printf(LAMMPS "%12d impropers\n\n", $nimpropers);
printf(LAMMPS "%12d atom types\n", $natom_types);
printf(LAMMPS "%12d bond types\n", $nbond_types);
printf(LAMMPS "%12d angle types\n", $nangle_types);
printf(LAMMPS "%12d dihedral types\n", $ndihedral_types);
printf(LAMMPS "%12d improper types\n\n", $nimproper_types);
}
sub WriteControlHeader
{
printf(PDB_CTRL "REMARK \n");
printf(PDB_CTRL "REMARK CONTROL PDB %s_ctrl.pdb\n", $project);
printf(PDB_CTRL "REMARK CREATED BY %s v%s ON %s",
$program, $version, `date`);
printf(PDB_CTRL "REMARK \n");
printf(PSF_CTRL "PSF\n");
printf(PSF_CTRL "\n");
printf(PSF_CTRL "%8d !NTITLE\n", 2);
printf(PSF_CTRL " REMARKS CONTROL PSF %s_ctrl.psf\n", $project);
printf(PSF_CTRL " REMARKS CREATED BY %s v%s ON %s",
$program, $version, `date`);
printf(PSF_CTRL "\n");
}
sub WriteBoxSize # print box limits
{
my @lo = V_Mult(@L[0,1,2], -1/2);
my @hi = V_Mult(@L[0,1,2], 1/2);
@lo = V_Add(@lo, @Center) if (!$center);
@hi = V_Add(@hi, @Center) if (!$center);
printf(LAMMPS "%12.8g %12.8g xlo xhi\n", $lo[0], $hi[0]);
printf(LAMMPS "%12.8g %12.8g ylo yhi\n", $lo[1], $hi[1]);
printf(LAMMPS "%12.8g %12.8g zlo zhi\n\n", $lo[2], $hi[2]);
}
sub WriteMasses # print mass list
{
my $k = 0;
printf(LAMMPS "Masses\n\n");
foreach (@types)
{
printf(LAMMPS "%8d %10.7g%s\n",
++$k, $masses{$_}, $add ? " # ".$ids{$_} : "");
}
printf(LAMMPS "\n");
}
sub WriteFCCAtoms
{
my $k = shift(@_);
my $res = shift(@_);
return $k if (!$water_dens);
$k_fcc = $k+1;
my @id = ($symbols{OT}, $symbols{HT}, $symbols{HT},
$symbols{SOD}, $symbols{CLA});
my @par = ();
my @charge = (-0.834, 0.417, 0.417, 1, -1);
my $salt = 2**$n_water;
my @l = ($L[0]/2, $L[1]/2, $L[2]/2);
my $iwater = 0;
my $isalt = 0;
foreach(@id) { push(@par, $link{$_}); }
for (my $x=0; $x<$n_fcc[0]; ++$x)
{
for (my $y=0; $y<$n_fcc[1]; ++$y)
{
for (my $z=0; $z<$n_fcc[2]; ++$z)
{
my @corner = ($x*$l_fcc-$l[0], $y*$l_fcc-$l[1], $z*$l_fcc-$l[2]);
my $flags = $flags_fcc[$x][$y][$z];
my $bit = 1;
for (my $i=0; $i<scalar(@p_fcc); $i+=3)
{
my $pair = $bit;
if ($flags & $pair)
{
my @p = V_Add(@corner, @p_fcc[$i,$i+1,$i+2]);
my $j = 0; # print water
my $n = scalar(@p_water);
++$res;
if ($flags & ($pair *= $salt)) # print salt ion
{ # sodium if highest
$j = $flags & ($pair*$salt) ? 3 : 4;
$n = 1;
$counter = ++$isalt;
}
else { $counter = ++$iwater; }
for (my $i=0; $i<$n; $i+=3)
{
my @xyz = V_Add(@p, @p_water[$i,$i+1,$i+2]);
@xyz = V_Add(@xyz, @Center) if (!$center);
printf(LAMMPS "%8d %7d %5d %9.6g %16.12g %16.12g %16.12g%s\n",
++$k, $res, $par[$j], $charge[$j], $xyz[0], $xyz[1],
$xyz[2], $add ? " # ".$types[$par[$j]-1] : "");
printf(PDB_CTRL "ATOM %6.6s %-4.4s %-3.3s %5.5s %3.3s ".
"%7.7s %7.7s %7.7s %5.5s %5.5s %4.4s %s\n", $k,
$types[$par[$j]-1], $n-1 ? "HOH" : "ION", $res, "",
$xyz[0], $xyz[1], $xyz[2], "1.00", "0.00", "",
$n-1 ? "WATR" : "SALT") if ($pdb_ctrl);
printf(PSF_CTRL "%8d %4.4s %-4.4s %-4.4s %-4.4s %4.4s ".
"%16.8e %7.7s %9.9s 0\n", $k, $n-1 ? "WATR" : "SALT",
$counter, $n-1 ? "HOH" : "ION", $types[$par[$j]-1], $id[$j],
$charge[$j], $masses{$id[$j]}, "") if ($pdb_ctrl);
++$j;
}
}
$bit *= 2;
}
}
}
}
return $k;
}
sub WritePSFAtoms()
{
my $n = PSFGoto(atoms);
my @res = (0, 0);
printf(PSF_CTRL "%8d !NATOM\n", $n+2*$n_waters+$n_fccs);
while (<PSF>)
{
last if (!$n--);
my @psf = split(" ");
if ($res[1]!=$psf[2]) { ++$res[0]; $res[1] = $psf[2]; }
printf(PSF_CTRL "%8d %4.4s %-4.4s %-4.4s %-4.4s %-4.4s ".
"%16.8e %7.7s %9.9s %s\n", $psf[0], $psf[1], $res[0],
$psf[3], $psf[4], $psf[5], $psf[6], $psf[7], "", $psf[8]);
}
}
sub WriteAtoms # print positions etc.
{
my $n = PSFGoto(atoms);
my $k = 0;
my @res = (0, 0);
CRDGoto(atoms);
$net_charge = 0;
printf(LAMMPS "Atoms%s\n\n",($add ? " # full" : "")) if ($n>0);
for (my $i=0; $i<$n; ++$i)
{
my @crd = $pdb ? NextPDB2CRD() : split(" ", <CRD>);
my @psf = split(" ", <PSF>);
my @xyz = MV_Dot(@R, @crd[-6, -5, -4]);
@xyz = V_Subtr(@xyz, @Center) if ($center);
if ($crd[-2]!=$res[1]) { ++$res[0]; $res[1] = $crd[-2]; }
printf(LAMMPS "%8d %7d %5d %9.6g %16.12g %16.12g %16.12g%s\n", ++$k,
$res[0], $link{$atom_types[$k]}, $psf[6], $xyz[0], $xyz[1], $xyz[2],
$add ? " # ".$types[$link{$atom_types[$k]}-1] : "");
printf(PDB_CTRL "ATOM %6.6s %-4.4s %-4.4s %4.4s %3.3s ".
"%7.7s %7.7s %7.7s %5.5s %5.5s %4.4s %s\n", $k,
$crd[-7], $crd[-8], $res[0], "", $xyz[0], $xyz[1], $xyz[2],
"1.00", $crd[-1], "", $crd[-3]) if ($pdb_ctrl);
next if (!$water_dens); # is water added?
$net_charge += $psf[6];
my @c = split(" ", $parms[$link{$atom_types[$k]}-1]);
EraseWater($c[1], @xyz);
}
$net_charge = int($net_charge*1e5+($net_charge>0?0.5:-0.5))/1e5;
AddIons() if ($water_dens);
WritePSFAtoms() if ($pdb_ctrl);
$k = WriteFCCAtoms($k, $res[0]+$res[1]);
printf(PDB_CTRL "END\n") if ($pdb_ctrl);
printf(LAMMPS "\n");
return $k;
}
sub WriteParameters # print parameters
{
my $mode = shift(@_)+1;
my $header = ("Pair","Bond","Angle","Dihedral","Improper")[$mode];
my $hint = ("# lj/charmm/coul/long", "# harmonic", "# charmm", "# charmm", "# harmonic")[$mode];
my $n = (4, 2, 4, 4, 2)[$mode];
my $k = 0;
printf("Info: converting ".lc($mode ? $header : "Atom")."s\n") if ($info);
if ($mode--)
{
BondedTypes($mode);
BondedParameters($mode);
%link = CrossLink(@types);
CorrectDihedralParameters() if ($mode==2);
@parms = Delete(1, @parms) if ($mode==3);
}
return 0 if (!scalar(@parms));
printf(LAMMPS "%s Coeffs %s\n\n", $header, ($add ? $hint : ""));
for (my $i=0; $i<scalar(@parms); ++$i)
{
if ($parms[$i] ne "")
{
foreach (split(":", $parms[$i]))
{
my @tmp = split(" ");
printf(LAMMPS "%8d", ++$k);
for (my $j=0; $j<$n; ++$j) {
printf(LAMMPS " %16.12g", $j<scalar(@tmp) ? $tmp[$j] : 0); }
printf(LAMMPS "%s\n", $add ? " # ".$types[$i] : "");
}
} else { ++$k; }
}
printf(LAMMPS "\n");
return $k;
}
sub WriteFCCBonded
{
my $mode = shift(@_);
my $k = shift(@_);
my $atom = $k_fcc;
return $k if (($mode>1)||!$water_dens);
my $type = $mode ? CreateID(HT, OT, HT) : CreateID(HT, OT);
my $id = $link{$type};
my $salt = 2**$n_water;
for (my $x=0; $x<$n_fcc[0]; ++$x)
{
for (my $y=0; $y<$n_fcc[1]; ++$y)
{
for (my $z=0; $z<$n_fcc[2]; ++$z)
{
my @corner = ($x*$l_fcc-$L[0]/2, $y*$l_fcc-$L[1]/2,
$z*$l_fcc-$L[2]/2);
my $flags = $flags_fcc[$x][$y][$z];
my $bit = 1;
for (my $i=0; $i<scalar(@p_fcc); $i+=3)
{
if ($flags&$bit)
{
if ($flags&($bit*$salt)) { ++$atom; }
else
{
printf(LAMMPS "%8d %7d %7d %7d%s\n", ++$k, $id, $atom,
$atom+1, $add ? " # ".$type : "") if (!$mode);
printf(LAMMPS "%8d %7d %7d %7d%s\n", ++$k, $id, $atom,
$atom+2, $add ? " # ".$type : "") if (!$mode);
printf(LAMMPS "%8d %7d %7d %7d %7d%s\n", ++$k, $id, $atom+1,
$atom, $atom+2, $add ? " # ".$type : "") if ($mode);
if ($pdb_ctrl)
{
PSFWrite(2, $atom, $atom+1, $atom, $atom+2) if (!$mode);
PSFWrite(3, $atom+1, $atom, $atom+2) if ($mode);
}
$atom += 3;
}
}
$bit *= 2;
}
}
}
}
return $k;
}
sub WriteBonded # print bonded list
{
my $mode = shift(@_);
my $psf_id = ("!NBOND:", "!NTHETA:", "!NPHI:", "!NIMPHI:")[$mode];
my $title = ("bonds", "angles", "dihedrals", "impropers")[$mode];
my $items = (2, 3, 4, 4)[$mode];
my $n = PSFGoto($title);
my $k = 0;
my @delta;
my @tmp;
return 0 if ($n<1);
printf(LAMMPS "%s\n\n", ucfirst($title));
printf(PSF_CTRL "\n%8d %s %s\n", $n+($mode ? ($mode==1 ? $n_waters : 0)
: 2*$n_waters), $psf_id, $title) if ($pdb_ctrl);
$psf_ncols = 0 if ($pdb_ctrl);
foreach (@parms)
{
push(@delta, $k);
$k += scalar(split(":"))-1 if ($_ ne "");
}
$k = 0;
for (my $i=0; $i<$n; ++$i)
{
my @bonded = PSFGet($items);
my @tmp = ();
foreach (@bonded) { push(@tmp, $ids{$atom_types[$_]}); }
my $id = $link{CreateID(@tmp)}-1;
my $m = 0;
if ($parms[$id] ne "")
{
foreach (split(":", $parms[$id]))
{
++$m;
my @const = split(" ");
next if (($const[0]==0)&&($mode==2 ? $const[-1]==0 : 1));
printf(LAMMPS "%8d %7d", ++$k, $id+$delta[$id]+$m);
foreach (@bonded) { printf(LAMMPS " %7d", $_); }
printf(LAMMPS "%s\n", $add ? " # ".CreateID(@tmp) : "");
}
}
else
{
printf(LAMMPS "%8d %7d", ++$k, $id+$delta[$id]+$m);
foreach (@bonded) { printf(LAMMPS " %7d", $_); }
printf(LAMMPS "%s\n", $add ? " # ".CreateID(@tmp) : "");
}
PSFWrite($items, @bonded) if ($pdb_ctrl);
}
$k = WriteFCCBonded($mode, $k);
printf(PSF_CTRL "\n") if ($pdb_ctrl && $psf_ncols);
printf(LAMMPS "\n");
return $k;
}
sub CreateCorrectedPairCoefficients
{
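# Scan the NBFIX section of the CHARMM parameter file and collect explicit
# pair overrides ("type_i type_j eps sigma eps14 sigma14", with Rmin converted
# to sigma); WriteLAMMPSInput() later emits these as pair_coeff lines so that
# NBFIX pairs deviate from the arithmetic mixing rule.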
my $read = 0;
my $k = 0;
my %id;
my %type;
$coefficients = "";
foreach (@types) { $id{$ids{$_}} = $_; $type{$_} = ++$k; }
seek(PARAMETERS, 0, 0);
while (<PARAMETERS>)
{
chop();
my @cols = split(" ");
if ($read&&(scalar(@cols)>3)&&
(substr($cols[0],0,1) ne "!")&&($cols[2] lt 'A'))
{
my $id1 = $id{$cols[0]};
my $id2 = $id{$cols[1]};
if (($id1 ne "")&&($id2 ne ""))
{
my @c = (abs($cols[2]), $cols[3]*2.0**(-1/6));
if ($type{$id2}<$type{$id1})
{
my $tmp = $id1; $id1 = $id2; $id2 = $tmp;
}
$coefficients .= ":" if ($coefficients ne "");
$coefficients .= $type{$id1}." ".$type{$id2}." ";
$coefficients .= $c[0]." ".$c[1]." ".$c[0]." ".$c[1];
}
}
$read = 1 if ($cols[0] eq "NBFIX");
last if ($read&&!scalar(@cols));
}
}
sub WriteData
{
open(LAMMPS, ">$project.in"); # use .in for temporary
open(PDB_CTRL, ">".$project."_ctrl.pdb") if ($pdb_ctrl);
open(PSF_CTRL, ">".$project."_ctrl.psf") if ($pdb_ctrl);
WriteControlHeader() if ($pdb_ctrl);
ReadTopology();
CharacterizeBox();
SetupWater() if ($water_dens);
WriteBoxSize(); # body storage
@types = AtomTypes(); # atoms
@parms = AtomParameters(@types);
WriteMasses();
%link = CrossLink(@types);
CreateCorrectedPairCoefficients();
for (my $i=0; $i<scalar(@types); ++$i) { $types[$i] = $ids{$types[$i]}; }
$natom_types = WriteParameters(-1); # pairs
if ($#types+1 > $natom_types) {
printf("Warning: %d atom types present, but only %d pair coeffs found\n",
$#types+1, $natom_types);
# reset to what was found while determining the number of atom types.
$natom_types = $#types+1;
}
$natoms = WriteAtoms();
$nbond_types = WriteParameters(0); # bonds
$nbonds = WriteBonded(0);
$nangle_types = WriteParameters(1); # angles
$nangles = WriteBonded(1);
$shake = $link{CreateID(("HT", "OT", "HT"))};
$ndihedral_types = WriteParameters(2); # dihedrals
$ndihedrals = WriteBonded(2);
$nimproper_types = WriteParameters(3); # impropers
$nimpropers = WriteBonded(3);
close(LAMMPS); # close temp file
open(LAMMPS, ">$project.data"); # open data file
WriteLAMMPSHeader(); # header
open(TMP, "<$project.in"); # open temp file
while (<TMP>) { printf(LAMMPS "%s", $_); } # spool body
close(TMP); # close temp file
if ($pdb_ctrl)
{
#while (<PSF>) { printf(PSF_CTRL "%s", $_); }
close(PSF_CTRL); close(PDB_CTRL);
}
close(LAMMPS); # close data file
}
sub WriteLAMMPSInput
{
open(LAMMPS, ">$project.in"); # input file
printf(LAMMPS "# Created by $program v$version on %s", `date`);
printf(LAMMPS "# Command: %s\n\n", $cmdline);
printf(LAMMPS "units real\n"); # general
printf(LAMMPS "neigh_modify delay 2 every 1\n\n");
printf(LAMMPS "atom_style full\n"); # styles
printf(LAMMPS "bond_style harmonic\n") if ($nbond_types);
printf(LAMMPS "angle_style charmm\n") if ($nangle_types);
printf(LAMMPS "dihedral_style charmm\n") if ($ndihedral_types);
printf(LAMMPS "improper_style harmonic\n\n") if ($nimproper_types);
printf(LAMMPS "pair_style lj/charmm/coul/long 8 12\n");
printf(LAMMPS "pair_modify mix arithmetic\n");
printf(LAMMPS "kspace_style pppm 1e-6\n\n");
if ($cmap) {
printf(LAMMPS "# Modify following line to point to the desired CMAP file\n");
printf(LAMMPS "fix cmap all cmap charmm$cmap.cmap\n");
printf(LAMMPS "fix_modify cmap energy yes\n");
printf(LAMMPS "read_data $project.data fix cmap crossterm CMAP\n\n");
}else{
printf(LAMMPS "read_data $project.data\n\n"); # read data
}
if ($coefficients ne "") # corrected coeffs
{
foreach (split(":", $coefficients))
{
printf(LAMMPS "pair_coeff %s\n", $_);
}
printf(LAMMPS "\n");
}
printf(LAMMPS "special_bonds charmm\n"); # invoke charmm
printf(LAMMPS "thermo 10\n"); # set thermo style
printf(LAMMPS "thermo_style multi\n");
printf(LAMMPS "timestep 1.0\n\n"); # 1.0 ps time step
printf(LAMMPS "minimize 0.0 0.0 50 200\n\n"); # take of the edge
printf(LAMMPS "reset_timestep 0\n");
printf(LAMMPS "fix 1 all nve\n");
printf(LAMMPS "fix 2 all shake 1e-6 500 0 m 1.0\n")
if ($shake eq ""); # shake all H-bonds
printf(LAMMPS "fix 2 all shake 1e-6 500 0 m 1.0 a %s\n",$shake)
if ($shake ne ""); # add water if present
printf(LAMMPS "velocity all create 0.0 12345678 dist uniform\n\n");
printf(LAMMPS "restart 500 $project.restart1 $project.restart2\n");
printf(LAMMPS "dump 1 all atom 100 $project.dump\n");
printf(LAMMPS "dump_modify 1 image yes scale yes\n\n");
printf(LAMMPS "thermo 100\n"); # set thermo style
printf(LAMMPS "run 1000\n"); # run for 1000 time steps
close(LAMMPS);
}
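# For orientation, the input script assembled above starts out roughly as
# follows (style lines appear only when the corresponding types exist, and
# with the -cmap option the read_data line instead uses the
# "fix cmap crossterm CMAP" variant shown in the printf calls above):
#
#   units real
#   neigh_modify delay 2 every 1
#
#   atom_style full
#   bond_style harmonic
#   angle_style charmm
#   dihedral_style charmm
#   improper_style harmonic
#
#   pair_style lj/charmm/coul/long 8 12
#   pair_modify mix arithmetic
#   kspace_style pppm 1e-6
#
#   read_data <project>.data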
# ----------------------- DESCRIPTION: sub CharmmCmap ------------------------ #
# This subroutine adds a new section "CMAP" to the LAMMPS data file as part  #
# of the implementation of the CHARMM CMAP correction (see references) in    #
# LAMMPS. The "CMAP" section contains a list of dihedral ID pairs from       #
# adjacent peptide backbone dihedrals whose dihedral angles correspond to    #
# PHI and PSI. (PHI: C--N--C_alpha--C and PSI: N--C_alpha--C--N)              #
# #
# Initiated by: Xiaohu Hu (hux2@ornl.gov) #
# May 2009 #
# #
# Finalized Oct 2016 by: Robert Latour (latourr@clemson.edu), #
# David Hyde-Volpe, and Tigran Abramyan, Clemson University, #
# and Chris Lorenz (chris.lorenz@kcl.ac.uk) #
# #
# References: #
# - MacKerell, A.D., Jr., Feig, M., Brooks, C.L., III, Improved Treatment of #
#   the Protein Backbone in Empirical Force Fields, J. Am. Chem. Soc.        #
#   126(2004): 698-699.                                                      #
# - MacKerell, A.D., Jr., Feig, M., Brooks, C.L., III, Extending the Treatment #
# of Backbone Energetics in Protein Force Fields: Limitations of Gas-Phase #
# Quantum Mechanics in Reproducing Protein Conformational Distributions in #
# Molecular Dynamics Simulations, J. Comput. Chem. 25(2004): 1400-1415. #
# ---------------------------------------------------------------------------- #
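# For illustration, after this subroutine runs the data file gains a
# "%12d crossterms" line in the header (right after the line containing
# "impropers") plus a trailing section of the form sketched below; the atom
# IDs are hypothetical and stand for the backbone atoms C-N-CA-C-N shared by
# one PHI/PSI pair (column widths approximate):
#
#   CMAP
#
#          1       1      25      27      29      31      41
#          2       3      41      43      45      47      57
#          ...
#
# Column 1 is the crossterm index, column 2 the crossterm type (1-6, see the
# comments inside CharmmCmap), and columns 3-7 the five backbone atom IDs.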
sub CharmmCmap
{
print "\nINITIATING CHARMM CMAP SUBROUTINE...\n\n";
# Reread and analyse $project.data
my @raw_data;
open(LAMMPS, "< $project.data") or
die "\"sub CharmmCmap()\" cannot open \"$project.data!\n";
print "Analyzing \"$project.data\"...\n\n";
@raw_data = <LAMMPS>;
close(LAMMPS);
# Locate and extract the sections "Masses" and "Atoms"
my $line_number = 0;
# Header info, 0 by default
my $natom_types = 0;
my $natom_number = 0;
my $ndihedral_number = 0;
my $temp_string;
# splice points, 0 by default
my $splice_onset_masses = 0;
my $splice_onset_atoms = 0;
my $splice_onset_dihedrals = 0;
foreach my $line (@raw_data) {
$line_number++;
chomp($line);
# Extract useful information from the header
if ($line =~ m/atom types/) {
($natom_types,$temp_string) = split(" ",$line);
if ($natom_types == 0) {
die "\nError: Number of atom types is 0!\n";
}
print "Total atom types: $natom_types\n";
}
if ($line =~ m/atoms/) {
($natom_number,$temp_string) = split(" ",$line);
if ($natom_number == 0) {
die "\nError: Number of atoms is 0!\n";
}
print "Total atoms: $natom_number\n";
}
if ($line =~ m/dihedrals/) {
($ndihedral_number,$temp_string) = split(" ",$line);
if ($ndihedral_number == 0) {
die "\nError: Number of dihedrals is 0\n";
}
print "Total dihedrals: $ndihedral_number\n";
}
# Locate the sections "Masses", "Atoms" and "Dihedrals"
if ($line =~ m/Masses/) {
$splice_onset_masses = $line_number + 1;
if ($splice_onset_masses-1 == 0) {
die "\nError: Can not find the section \"Masses\"\n";
}
print "Section \"Masses\" found: line $splice_onset_masses\n";
}
if ($line =~ m/Atoms/) {
$splice_onset_atoms = $line_number +1;
if ($splice_onset_atoms-1 == 0) {
die "\nError: Can not find the section \"Atoms\"\n";
}
print "Section \"Atoms\" found: line $splice_onset_atoms\n";
}
if ($line =~ m/Dihedrals/) {
$splice_onset_dihedrals = $line_number + 1;
if ($splice_onset_dihedrals-1 == 0) {
die "\nError: Can not find the section \"Dihedrals\"\n";
}
print "Section \"Dihedrals\" found: line $splice_onset_dihedrals\n";
}
}
print "\nGenerating PHI/PSI dihedral pair list...\n\n";
my @temp1 = @raw_data;
my @temp2 = @raw_data;
my @temp3 = @raw_data;
# Extract the sections "Masses", "Atoms" and "Dihedrals"
my @temp_masses_data = splice(@temp1,$splice_onset_masses,$natom_types);
my @temp_atoms_data = splice(@temp2,$splice_onset_atoms,$natom_number);
my @temp_dihedrals_data = splice(@temp3,$splice_onset_dihedrals,$ndihedral_number);
# Store @temp_masses_data into a matrix
my @masses_matrix;
my $atom_type;
my $mass;
for (@temp_masses_data) {
($atom_type, $mass) = split(" ");
push(@masses_matrix,[$atom_type,$mass]);
}
# Store @temp_atoms_data into a matrix
my @atoms_matrix;
my $atom_ID;
my $molecule_ID;
my $atype;
my $charge;
my $atom_x_coor;
my $atom_y_coor;
my $atom_z_coor;
for (@temp_atoms_data) {
($atom_ID,$molecule_ID,$atype,$charge,$atom_x_coor,$atom_y_coor,$atom_z_coor) = split(" ");
push(@atoms_matrix,
[$atom_ID,$molecule_ID,$atype,$charge,$atom_x_coor,$atom_y_coor,$atom_z_coor]);
}
# Store @temp_dihedrals_data into a matrix
my @dihedrals_matrix;
my $dihedral_ID;
my $dihedral_type;
my $dihe_atom1;
my $dihe_atom2;
my $dihe_atom3;
my $dihe_atom4;
for (@temp_dihedrals_data) {
($dihedral_ID,$dihedral_type,$dihe_atom1,$dihe_atom2,$dihe_atom3,$dihe_atom4) = split(" ");
push(@dihedrals_matrix,
[$dihedral_ID,$dihedral_type,$dihe_atom1,$dihe_atom2,$dihe_atom3,$dihe_atom4]);
}
# Find out and extract the peptide backbone dihedrals
#
# Definitions of peptide backbone dihedrals
#
# For dihedral angle PHI: C--N--CA--C
# For dihedral angle PSI: N--CA--C--N
#
# ---------------------------------------------------------
# atom | mass |partial charge| amino-acid
# ---------------------------------------------------------
# C | 12.011 | 0.51 | all except GLY and PRO
# N | 14.007 | -0.29 | PRO
# N | 14.007 | -0.47 | all except PRO
# CA | 12.011 | 0.07 | all except GLY and PRO
# CA | 12.011 | -0.02 | GLY
# CA | 12.011 | 0.02 | PRO
# ---------------------------------------------------------
#
# Peptide backbone
# ...
# /
# O=C
# \
# N-H
# / -----> PHI (C-N-CA-C)
# H-CA-R
# \ -----> PSI (N-CA-C-N)
# C=O
# /
# H-N
# \
# ...
#
# Criteria for a PHI/PSI dihedral pair:
# 1. The atoms have to match the mass/charge combinations
# defined above.
# 2. The atoms N--CA--C need to be covalently bonded to each
# other.
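# As an illustration (the atom type number is hypothetical): an atom type
# with mass 12.011 and partial charge 0.51 is taken to be the backbone
# carbonyl C, while the same mass with charge 0.07 identifies a non-GLY,
# non-PRO C-alpha.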
# Find out which atom types C, N and CA correspond to and store them
# in lists
my $mass_carbon = 12.011;
my $mass_nitrogen = 14.007;
my @carbon_list;
my @nitrogen_list;
my $carbon_counter = 0;
my $nitrogen_counter = 0;
for (my $i = 0; $i < $natom_types; $i++) {
if (${$masses_matrix[$i]}[1] == $mass_carbon) {
push(@carbon_list,${$masses_matrix[$i]}[0]);
$carbon_counter++;
}
if (${$masses_matrix[$i]}[1] == $mass_nitrogen) {
push(@nitrogen_list,${$masses_matrix[$i]}[0]);
$nitrogen_counter++;
}
}
# Quit if no carbons or nitrogens
if ($carbon_counter == 0 or $nitrogen_counter == 0) {
if ($carbon_counter == 0) {
print "No carbon atoms exist in the system\n";
}
if ($nitrogen_counter == 0) {
print "No nitrogen atoms exist in the system\n";
}
print "CMAP usage impossible\n";
return;
}
print "Carbon atom type/s: @carbon_list\n";
print "Nitrogen atom type/s: @nitrogen_list\n";
# Determine the atom types of C, CA and N
# Charges of the backbone atoms
my $charge_C = 0.51;
my $charge_CA = 0.07;
my $charge_N = -0.47;
# Special setting for PRO
my $charge_N_PRO = -0.29;
my $charge_CA_PRO = 0.02;
# Special setting for GLY
my $charge_CA_GLY = -0.02;
# Peptide backbone atom types
my $C_type;
my $CA_type;
my $CA_GLY_type;
my $CA_PRO_type;
my $N_type;
my $N_PRO_type;
my $C_counter = 0;
my $CA_counter = 0;
my $CA_GLY_counter = 0;
my $CA_PRO_counter = 0;
my $N_counter = 0;
my $N_PRO_counter = 0;
my $C_flag = 0;
for (my $i = 0; $i < $natom_number; $i++) {
my $cur_type = ${$atoms_matrix[$i]}[2];
my $cur_charge = ${$atoms_matrix[$i]}[3];
for (my $j = 0; $j <= $#carbon_list; $j++) {
if ($cur_type == $carbon_list[$j]) {
$C_flag = 1;
if ($cur_charge == $charge_C) {
$C_type = $cur_type;
$C_counter++;
}
if ($cur_charge == $charge_CA) {
$CA_type = $cur_type;
$CA_counter++;
}
if ($cur_charge == $charge_CA_GLY) {
$CA_GLY_type = $cur_type;
$CA_GLY_counter++;
}
if ($cur_charge == $charge_CA_PRO) {
$CA_PRO_type = $cur_type;
$CA_PRO_counter++;
}
}
}
if ($C_flag == 0) {
for (my $k = 0; $k <= $#nitrogen_list; $k++) {
if ($cur_type == $nitrogen_list[$k]) {
if ($cur_charge == $charge_N) {
$N_type = $cur_type;
$N_counter++;
}
if ($cur_charge == $charge_N_PRO) {
$N_PRO_type = $cur_type;
$N_PRO_counter++;
}
}
}
}
$C_flag = 0;
}
# Quit if one of the atom types doesn't exist
if ( $C_counter == 0 or
($CA_counter == 0 and $CA_GLY_counter == 0 and $CA_PRO_counter == 0) or
($N_counter == 0 and $N_PRO_counter == 0) ) {
if ($C_counter == 0) {
print "\nCannot find the peptide backbone C atom type\n";
}
if ($CA_counter == 0 and $CA_GLY_counter == 0 and $CA_PRO_counter == 0) {
print "\nCannot find the peptide backbone C-alpha atom type\n";
}
if ($N_counter == 0 and $N_PRO_counter == 0) {
print "\nCannot find the peptide backbone N atom type\n";
}
print "CMAP usage impossible\n";
return;
}
print "Peptide backbone carbon type: $C_type\n";
print "Alpha-carbon type: $CA_type\n" if ($CA_counter > 0);
print "Alpha-carbon type (GLY): $CA_GLY_type\n" if ($CA_GLY_counter > 0);
print "Alpha-carbon type (PRO): $CA_PRO_type\n" if ($CA_PRO_counter > 0);
print "Peptide backbone nitrogen type: $N_type\n" if ($N_counter >0);
print "Peptide backbone nitrogen type (PRO): $N_PRO_type\n" if ($N_PRO_counter > 0);
# Loop through the dihedral list to find the PHI- and PSI-dihedrals
my @PHI_dihedrals;
my @PSI_dihedrals;
my $PHI_counter = 0;
my $PSI_counter = 0;
for (my $i = 0; $i < $ndihedral_number; $i++) {
my $cur_dihe_ID = ${dihedrals_matrix[$i]}[0];
my $cur_atom1_type = ${atoms_matrix[${dihedrals_matrix[$i]}[2]-1]}[2];
my $cur_atom2_type = ${atoms_matrix[${dihedrals_matrix[$i]}[3]-1]}[2];
my $cur_atom3_type = ${atoms_matrix[${dihedrals_matrix[$i]}[4]-1]}[2];
my $cur_atom4_type = ${atoms_matrix[${dihedrals_matrix[$i]}[5]-1]}[2];
next if ($i > 0 and # skip duplicate entries from multi-term dihedrals
${dihedrals_matrix[$i]}[2] == ${dihedrals_matrix[$i-1]}[2] and
${dihedrals_matrix[$i]}[3] == ${dihedrals_matrix[$i-1]}[3] and
${dihedrals_matrix[$i]}[4] == ${dihedrals_matrix[$i-1]}[4] and
${dihedrals_matrix[$i]}[5] == ${dihedrals_matrix[$i-1]}[5]);
# Determine PHI-dihedrals: if C-CA-N-C or C-N-CA-C, save it in a list
if ($cur_atom1_type == $C_type and $cur_atom4_type == $C_type) {
if ( ( ($cur_atom2_type == $CA_type or
$cur_atom2_type == $CA_GLY_type or
$cur_atom2_type == $CA_PRO_type) and
($cur_atom3_type == $N_type or
$cur_atom3_type == $N_PRO_type) ) or
( ($cur_atom3_type == $CA_type or
$cur_atom3_type == $CA_GLY_type or
$cur_atom3_type == $CA_PRO_type) and
($cur_atom2_type == $N_type or
$cur_atom2_type == $N_PRO_type) ) ) {
push (@PHI_dihedrals,$cur_dihe_ID);
$PHI_counter++;
}
}
# Determine PSI-dihedrals: if N-CA-C-N or N-C-CA-N (N can be either a normal N or a proline N),
# then save it in a list
if ( ($cur_atom1_type == $N_type and $cur_atom4_type == $N_type) or
($cur_atom4_type == $N_PRO_type and $cur_atom1_type == $N_PRO_type) or
($cur_atom1_type == $N_type and $cur_atom4_type == $N_PRO_type) or
($cur_atom4_type == $N_type and $cur_atom1_type == $N_PRO_type) ) {
if ( ( ($cur_atom2_type == $CA_type or
$cur_atom2_type == $CA_GLY_type or
$cur_atom2_type == $CA_PRO_type) and
$cur_atom3_type == $C_type) or
( ($cur_atom3_type == $CA_type or
$cur_atom3_type == $CA_GLY_type or
$cur_atom3_type == $CA_PRO_type) and
$cur_atom2_type == $C_type) ) {
push (@PSI_dihedrals,$cur_dihe_ID);
$PSI_counter++;
}
}
}
# Quit if no PHI or PSI dihedrals
if ($PHI_counter == 0 or $PSI_counter == 0) {
if ($PHI_counter == 0) {
print "Cannot find the PHI backbone dihedrals\n";
}
if ($PSI_counter == 0) {
print "Cannot find the PSI backbone dihedrals\n";
}
print "CMAP usage impossible\n";
return;
}
# Construct the PHI/PSI dihedral pair list
#
# The algorithm:
# _____
# | |
# 1--2--3--4 PHI-dihedral
# 4--3--2--1
# --C--N-CA--C--N-- Peptide backbone
# 1--2--3--4
# 4--3--2--1 PSI-dihedral
# |_____|
#
# For a given PHI dihedral, one of the following conditions has to be met:
#
# PHI PSI
# If (2--3--4) = (1--2--3)
# or
# if (2--3--4) = (4--3--2)
# or
# if (3--2--1) = (1--2--3)
# or
# if (3--2--1) = (4--3--2),
#
# then these 2 dihedrals are a PHI/PSI pair. If a pair is found, the
# crossterm type and the five shared backbone atom IDs (C-N-CA-C-N) are
# stored as one row of "@PHI_PSI_matrix".
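#
# Worked example (hypothetical atom IDs): for PHI = C(5)-N(7)-CA(9)-C(11)
# and PSI = N(7)-CA(9)-C(11)-N(13), the triple (phi2,phi3,phi4) = (7,9,11)
# equals (psi1,psi2,psi3) = (7,9,11), so the two dihedrals form a pair and
# the row [type, 5, 7, 9, 11, 13] is pushed onto @PHI_PSI_matrix.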
my @PHI_PSI_matrix;
my $crossterm_CA_charge;
my $crossterm_type;
my $crossterm_counter = 0;
my $crossterm_type1_flag = 0;
my $crossterm_type2_flag = 0;
my $crossterm_type3_flag = 0;
my $crossterm_type4_flag = 0;
my $crossterm_type5_flag = 0;
my $crossterm_type6_flag = 0;
for (my $i = 0; $i <= $#PHI_dihedrals; $i++) {
my $cur_PHI_dihe = ${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[0];
my $phi1 = ${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[2];
my $phi2 = ${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[3];
my $phi3 = ${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[4];
my $phi4 = ${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[5];
for (my $j = 0; $j <= $#PSI_dihedrals; $j++) {
my $cur_PSI_dihe = ${dihedrals_matrix[$PSI_dihedrals[$j]-1]}[0];
my $psi1 = ${dihedrals_matrix[$PSI_dihedrals[$j]-1]}[2];
my $psi2 = ${dihedrals_matrix[$PSI_dihedrals[$j]-1]}[3];
my $psi3 = ${dihedrals_matrix[$PSI_dihedrals[$j]-1]}[4];
my $psi4 = ${dihedrals_matrix[$PSI_dihedrals[$j]-1]}[5];
if ( ($phi2 == $psi1 and $phi3 == $psi2 and $phi4 == $psi3) or
($phi2 == $psi4 and $phi3 == $psi3 and $phi4 == $psi2) or
($phi3 == $psi1 and $phi2 == $psi2 and $phi1 == $psi3) or
($phi3 == $psi4 and $phi2 == $psi3 and $phi1 == $psi2) ) {
# Find out to which amino acid the cross-term belongs
if ($phi3 == $psi2 or $phi3 == $psi3) {
$crossterm_CA_charge = ${atoms_matrix[${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[4]-1]}[3];
}
if ($phi2 == $psi2 or $phi2 == $psi3) {
$crossterm_CA_charge = ${atoms_matrix[${dihedrals_matrix[$PHI_dihedrals[$i]-1]}[3]-1]}[3];
}
# Define the crossterm type as in the cmap.data file; if the C_alpha of the crossterm is
# - ALA type, then $crossterm_type = 1;
# - ALA-PRO (ALA is the current AA), then $crossterm_type = 2;
# - PRO type, then $crossterm_type = 3;
# - PRO-PRO (First PRO is the current AA), then $crossterm_type = 4;
# - GLY type, then $crossterm_type = 5;
# - GLY-PRO (GLY is the current AA), then $crossterm_type = 6;
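# Example of the reassignment below (residue identities are inferred from the
# C_alpha charge only): a crossterm whose CA looks like ALA first gets type 1;
# if the next crossterm's CA turns out to be PRO, the previously stored type
# is bumped to its X-PRO form, i.e. 1 -> 2 (ALA-PRO), 3 -> 4 (PRO-PRO),
# 5 -> 6 (GLY-PRO).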
if ($crossterm_CA_charge == $charge_CA) { $crossterm_type = 1; $crossterm_type1_flag = 1; }
if ($crossterm_CA_charge == $charge_CA_GLY) { $crossterm_type = 5; $crossterm_type5_flag = 1; }
if ($crossterm_CA_charge == $charge_CA_PRO) {
$crossterm_type = 3; $crossterm_type3_flag = 1;
# The current residue is a PRO, so re-assign the previous crossterm type to its X-PRO form if needed
if ($crossterm_counter-1 >= 0 and $PHI_PSI_matrix[$crossterm_counter-1][0] == 1) {
$PHI_PSI_matrix[$crossterm_counter-1][0] = 2;
$crossterm_type2_flag = 1;
}
if ($crossterm_counter-1 >= 0 and $PHI_PSI_matrix[$crossterm_counter-1][0] == 3) {
$PHI_PSI_matrix[$crossterm_counter-1][0] = 4;
$crossterm_type4_flag = 1;
}
if ($crossterm_counter-1 >= 0 and $PHI_PSI_matrix[$crossterm_counter-1][0] == 5) {
$PHI_PSI_matrix[$crossterm_counter-1][0] = 6;
$crossterm_type6_flag = 1;
}
}
push(@PHI_PSI_matrix,[$crossterm_type,$phi1,$phi2,$phi3,$phi4,$psi4]);
$crossterm_counter++;
$crossterm_CA_charge = 0;
$crossterm_type = 0;
}
}
}
# Check whether the C-terminal amino acid is a PRO. If so, the type of the last crossterm
# should be set to its X-PRO form instead of X, where X is ALA, PRO, or GLY (X-PRO form = X type + 1).
my @pdb_data;
open(PDB,"< $project.pdb")
or die "WARNING: Cannot open file \"$project.pdb\"! (required if the -cmap option is used)\n";
@pdb_data = <PDB>;
close(PDB);
my @ter_line;
my $ter_AA_type = 0;
my $ter_flag = 0;
foreach my $line (@pdb_data) {
if ($line =~ m/^TER/) {
@ter_line = split(" ",$line);
$ter_AA_type = $ter_line[2];
print "Terminal amino acid type is: $ter_AA_type\n";
$ter_flag = 1;
}
}
if ($ter_flag == 0) {
print "\n*** ERROR IN THE PDB FILE: ***\n";
print "In order for the CMAP section to be generated, the pdb file must \n";
print "identify the C-terminus amino acid in the file with 'TER'. \n";
print "This line is missing from the pdb file that was used.\n";
print "To correct this problem, open the pdb file in an editor,\n";
print "find the last atom of the last amino acid residue in the peptide\n";
print "chain and insert the following line immediately after that atom:\n";
print " 'TER <#1> <RES> <#2>' \n";
print "where '<#1> is the next atom number, <RES> is the three letter amino\n";
print "acid abbreviation for that amino acid, and <#2> is the molecule number\n";
print "of the terminal amino acid residue.\n\n";
print "For example, if the last atom of the last amino acid in the peptide\n";
print "sequence is listed in the pdb file as:\n\n";
print " 'ATOM 853 O GLU P 56 12.089 -1.695 -6.543 1.00 1.03 PROA'\n\n";
print "you would insert the following line after it:\n\n";
print " 'TER 854 GLU 56'\n\n";
print "If any additional atoms are listed in the pdb file (e.g., water, ions)\n";
print "after this terminal amino acid residue, their atom numbers and\n";
print "molecule numbers must be incremented by 1 to account for the new line\n";
print "that was inserted.\n\n";
die "Error: No terminating atom designated in pdb file! See above note to correct problem.\n\n";
}
if ($ter_AA_type eq "PRO") {
$PHI_PSI_matrix[$crossterm_counter-1][0] = $PHI_PSI_matrix[$crossterm_counter-1][0]+1;
}
# Print out the PHI/PSI dihedral pair list
my $pair_counter = 0;
# $ncrosstermtypes is not presently used, but is kept available in case one wishes to print it out
my $ncrosstermtypes = $crossterm_type1_flag + $crossterm_type2_flag + $crossterm_type3_flag +
$crossterm_type4_flag + $crossterm_type5_flag + $crossterm_type6_flag;
print "\nWriting \"$project.data\" with section \"CMAP crossterms\" added at the end.\n";
# Writing the new lammps data file
open(REWRITE,"> $project.data")
or die "Cannot write file \"$project.data\"!\n";
foreach my $line (@raw_data) {
print(REWRITE "$line\n");
if ($line =~ m/impropers/) {
printf(REWRITE "%12d crossterms\n", $crossterm_counter);
}
}
printf(REWRITE "CMAP\n\n");
my $ref_line;
my $column;
foreach $ref_line (@PHI_PSI_matrix) {
$pair_counter++;
printf(REWRITE "%8d",$pair_counter);
foreach $column (@$ref_line) {
printf(REWRITE " %7d",$column);
}
printf(REWRITE "\n");
}
close(REWRITE);
print "\nDone!\n\n";
}
# main
Initialize();
WriteData();
WriteLAMMPSInput();
printf("Info: conversion complete\n\n") if ($info);
CharmmCmap() if ($cmap);
