diff --git a/doc/src/Section_start.txt b/doc/src/Section_start.txt
index 5f9ac36a4..3711342f7 100644
--- a/doc/src/Section_start.txt
+++ b/doc/src/Section_start.txt
@@ -1,1819 +1,1819 @@
"Previous Section"_Section_intro.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_commands.html :c

:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)

:line

2. Getting Started :h3

This section describes how to build and run LAMMPS, for both new and experienced users.

2.1 "What's in the LAMMPS distribution"_#start_1
2.2 "Making LAMMPS"_#start_2
2.3 "Making LAMMPS with optional packages"_#start_3
2.4 "Building LAMMPS as a library"_#start_4
2.5 "Running LAMMPS"_#start_5
2.6 "Command-line options"_#start_6
2.7 "Screen output"_#start_7
2.8 "Tips for users of previous versions"_#start_8 :all(b)

:line

2.1 What's in the LAMMPS distribution :h4,link(start_1)

When you download a LAMMPS tarball you will need to unzip and untar the downloaded file with the following commands, after placing the tarball in an appropriate directory.

tar -xzvf lammps*.tar.gz :pre

This will create a LAMMPS directory containing two files and several sub-directories:

README: text file
LICENSE: the GNU General Public License (GPL)
bench: benchmark problems
doc: documentation
examples: simple test problems
potentials: embedded atom method (EAM) potential files
src: source files
tools: pre- and post-processing tools :tb(s=:)

Note that the "download page"_download also has links to download pre-built Windows installers, as well as pre-built packages for several widely used Linux distributions. It also has instructions for how to download/install LAMMPS for Macs (via Homebrew), and how to download and update LAMMPS from SVN and Git repositories, which gives you access to the up-to-date sources that are used by the LAMMPS core developers.

:link(download,http://lammps.sandia.gov/download.html)

The Windows and Linux packages for serial or parallel include only selected packages and bug-fixes/upgrades listed on "this page"_http://lammps.sandia.gov/bug.html up to a certain date, as stated on the download page. If you want an executable with non-included packages or that is more current, then you'll need to build LAMMPS yourself, as discussed in the next section.

Skip to the "Running LAMMPS"_#start_5 section for info on how to launch a LAMMPS Windows executable on a Windows box.

:line

2.2 Making LAMMPS :h4,link(start_2)

This section has the following sub-sections:

2.2.1 "Read this first"_#start_2_1
2.2.2 "Steps to build a LAMMPS executable"_#start_2_2
2.2.3 "Common errors that can occur when making LAMMPS"_#start_2_3
2.2.4 "Additional build tips"_#start_2_4
2.2.5 "Building for a Mac"_#start_2_5
2.2.6 "Building for Windows"_#start_2_6 :all(b)

:line

Read this first :h5,link(start_2_1)

If you want to avoid building LAMMPS yourself, read the preceding section about options available for downloading and installing executables. Details are discussed on the "download"_download page.

Building LAMMPS can be simple or not-so-simple.
If all you need are the default packages installed in LAMMPS, and MPI is already installed on your machine, or you just want to run LAMMPS in serial, then you can typically use the Makefile.mpi or Makefile.serial files in src/MAKE by typing one of these lines (from the src dir):

make mpi
make serial :pre

Note that on a facility supercomputer, there are often "modules" loaded in your environment that provide the compilers and MPI you should use. In this case, the "mpicxx" compile/link command in Makefile.mpi should simply work by accessing those modules.

It may be the case that one of the other Makefile.machine files in the src/MAKE sub-directories is a better match to your system (type "make" to see a list). If so, you can use it as-is by typing (for example):

make stampede :pre

If any of these builds (with an existing Makefile.machine) works on your system, then you're done!

If you need to install an optional package with a LAMMPS command you want to use, and the package does not depend on an extra library, you can simply type

make yes-name :pre

before invoking (or re-invoking) the above steps. "Name" is the lower-case name of the package, e.g. replica or user-misc.

If you want to do one of the following:

use a LAMMPS command that requires an extra library (e.g. "dump image"_dump_image.html)
build with a package that requires an extra library
build with an accelerator package that requires special compiler/linker settings
run on a machine that has its own compilers, settings, or libraries :ul

then building LAMMPS is more complicated. You may need to find where extra libraries exist on your machine or install them if they don't. You may need to build extra libraries that are included in the LAMMPS distribution, before building LAMMPS itself. You may need to edit a Makefile.machine file to make it compatible with your system.

Please read the following sections carefully. If you are not comfortable with makefiles, or building codes on a Unix platform, or running an MPI job on your machine, please find a local expert to help you. Most compilation, linking, and run problems that users experience are not LAMMPS issues - they are peculiar to the user's system, compilers, libraries, etc. Such questions are better answered by a local expert.

If you have a build problem that you are convinced is a LAMMPS issue (e.g. the compiler complains about a line of LAMMPS source code), then please post the issue to the "LAMMPS mail list"_http://lammps.sandia.gov/mail.html.

If you succeed in building LAMMPS on a new kind of machine, for which there isn't a similar machine Makefile included in the src/MAKE/MACHINES directory, then send it to the developers and we can include it in the LAMMPS distribution.

:line

Steps to build a LAMMPS executable :h5,link(start_2_2)

Step 0 :h6

The src directory contains the C++ source and header files for LAMMPS. It also contains a top-level Makefile and a MAKE sub-directory with low-level Makefile.* files for many systems and machines. See the src/MAKE/README file for a quick overview of what files are available and what sub-directories they are in.

The src/MAKE dir has a few files that should work as-is on many platforms. The src/MAKE/OPTIONS dir has more that invoke additional compiler, MPI, and other setting options commonly used by LAMMPS, to illustrate their syntax. The src/MAKE/MACHINES dir has many more that have been tweaked or optimized for specific machines. These files are all good starting points if you find you need to change them for your machine.
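For example, you might copy one of them into src/MAKE/MINE under a new name and edit that copy (the file names here are only placeholders):

cp src/MAKE/MACHINES/Makefile.stampede src/MAKE/MINE/Makefile.mymachine :pre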
Put any file you edit into the src/MAKE/MINE directory and it will never be touched by any LAMMPS updates.

From within the src directory, type "make" or "gmake". You should see a list of available choices from src/MAKE and all of its sub-directories. If one of those has the options you want or is the machine you want, you can type a command like:

make mpi :pre

or

make serial :pre

or

gmake mac :pre

Note that the corresponding Makefile.machine can exist in src/MAKE or any of its sub-directories. If a file with the same name appears in multiple places (not a good idea), the order they are used is as follows: src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES. This gives preference to a file you have created/edited and put in src/MAKE/MINE.

Note that on a multi-processor or multi-core platform you can launch a parallel make, by using the "-j" switch with the make command, which will build LAMMPS more quickly.

If you get no errors and an executable like [lmp_mpi] or [lmp_serial] or [lmp_mac] is produced, then you're done; it's your lucky day.

Note that by default only a few of LAMMPS optional packages are installed. To build LAMMPS with optional packages, see "this section"_#start_3 below.

Step 1 :h6

If Step 0 did not work, you will need to create a low-level Makefile for your machine, like Makefile.foo. You should make a copy of an existing Makefile.* in src/MAKE or one of its sub-directories as a starting point. The only portions of the file you need to edit are the first line, the "compiler/linker settings" section, and the "LAMMPS-specific settings" section. When it works, put the edited file in src/MAKE/MINE and it will not be altered by any future LAMMPS updates.

Step 2 :h6

Change the first line of Makefile.foo to list the word "foo" after the "#", and whatever other options it will set. This is the line you will see if you just type "make".

Step 3 :h6

The "compiler/linker settings" section lists compiler and linker settings for your C++ compiler, including optimization flags. You can use g++, the open-source GNU compiler, which is available on all Unix systems. You can also use mpicxx, which will typically be available if MPI is installed on your system, though you should check which actual compiler it wraps. Vendor compilers often produce faster code. On boxes with Intel CPUs, we suggest using the Intel icc compiler, which can be downloaded from "Intel's compiler site"_intel.

:link(intel,http://www.intel.com/software/products/noncom)

If building a C++ code on your machine requires additional libraries, then you should list them as part of the LIB variable. You should not need to do this if you use mpicxx.

The DEPFLAGS setting is what triggers the C++ compiler to create a dependency list for a source file. This speeds re-compilation when source (*.cpp) or header (*.h) files are edited. Some compilers do not support dependency file creation, or may use a different switch than -D. GNU g++ and Intel icc work with -D. If your compiler can't create dependency files, then you'll need to create a Makefile.foo patterned after Makefile.storm, which uses different rules that do not involve dependency files. Note that when you build LAMMPS for the first time on a new platform, a long list of *.d files will be printed out rapidly. This is not an error; it is the Makefile doing its normal creation of dependencies.

Step 4 :h6

The "system-specific settings" section has several parts.
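As a rough sketch, this part of a Makefile.foo contains groups of variables like the following (the values shown here are purely illustrative; the steps below explain each group):

LMP_INC = -DLAMMPS_GZIP -DLAMMPS_SMALLBIG
MPI_INC = -I/usr/local/include
MPI_PATH = -L/usr/local/lib
MPI_LIB = -lmpich
FFT_INC = -DFFT_FFTW3
FFT_PATH =
FFT_LIB = -lfftw3
JPG_INC =
JPG_PATH =
JPG_LIB = -ljpeg :pre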
Note that if you change any -D setting in this section, you should do a full re-compile, after typing "make clean" (which will describe different clean options).

The LMP_INC variable is used to include options that turn on ifdefs within the LAMMPS code. The options that are currently recognized are:

-DLAMMPS_GZIP
-DLAMMPS_JPEG
-DLAMMPS_PNG
-DLAMMPS_FFMPEG
-DLAMMPS_MEMALIGN
-DLAMMPS_XDR
-DLAMMPS_SMALLBIG
-DLAMMPS_BIGBIG
-DLAMMPS_SMALLSMALL
-DLAMMPS_LONGLONG_TO_LONG
-DLAMMPS_EXCEPTIONS
-DPACK_ARRAY
-DPACK_POINTER
-DPACK_MEMCPY :ul

The read_data and dump commands will read/write gzipped files if you compile with -DLAMMPS_GZIP. It requires that your machine supports the "popen()" function in the standard runtime library and that a gzip executable can be found by LAMMPS during a run.

NOTE: on some clusters with high-speed networks, using the fork() library calls (required by popen()) can interfere with the fast communication library and cause simulations using compressed output or input to hang or crash. For selected operations, compressed file I/O is also available using a compression library instead; this is provided in the COMPRESS package. For more details about compiling LAMMPS with packages, please see below.

If you use -DLAMMPS_JPEG, the "dump image"_dump_image.html command will be able to write out JPEG image files. For JPEG files, you must also link LAMMPS with a JPEG library, as described below. If you use -DLAMMPS_PNG, the "dump image"_dump_image.html command will be able to write out PNG image files. For PNG files, you must also link LAMMPS with a PNG library, as described below. If neither of those two defines is used, LAMMPS will only be able to write out uncompressed PPM image files.

If you use -DLAMMPS_FFMPEG, the "dump movie"_dump_image.html command will be available to support on-the-fly generation of rendered movies without the need to store intermediate image files. It requires that your machine supports the "popen" function in the standard runtime library and that an FFmpeg executable can be found by LAMMPS during the run.

NOTE: Similar to the note above, this option can conflict with high-speed networks, because it uses popen().

Using -DLAMMPS_MEMALIGN= enables the use of the posix_memalign() call instead of malloc() when large chunks of memory are allocated by LAMMPS. This can help to make more efficient use of vector instructions of modern CPUs, since dynamically allocated memory has to be aligned on larger than default byte boundaries (e.g. 16 bytes instead of 8 bytes on x86-type platforms) for optimal performance.

If you use -DLAMMPS_XDR, the build will include XDR compatibility files for doing particle dumps in XTC format. This is only necessary if your platform does not have its own XDR files available. See the Restrictions section of the "dump"_dump.html command for details.

Use at most one of the -DLAMMPS_SMALLBIG, -DLAMMPS_BIGBIG, -DLAMMPS_SMALLSMALL settings. The default is -DLAMMPS_SMALLBIG. These settings refer to use of 4-byte (small) vs 8-byte (big) integers within LAMMPS, as specified in src/lmptype.h. The only reason to use the BIGBIG setting is to enable simulation of huge molecular systems (which store bond topology info) with more than 2 billion atoms, or to track the image flags of moving atoms that wrap around a periodic box more than 512 times.
Normally, the only reason to use SMALLSMALL is if your machine does not support 64-bit integers, though you can use the SMALLSMALL setting if you are running in serial or on a desktop machine or small cluster where you will never run large systems or for long times (more than 2 billion atoms, more than 2 billion timesteps). See the "Additional build tips"_#start_2_4 section below for more details on these settings.

Note that the USER-ATC package is not currently compatible with -DLAMMPS_BIGBIG. Also the GPU package requires the lib/gpu library to be compiled with the same setting, or the link will fail.

The -DLAMMPS_LONGLONG_TO_LONG setting may be needed if your system or MPI version does not recognize "long long" data types. In this case a "long" data type is likely already 64-bits, in which case this setting will convert to that data type.

The -DLAMMPS_EXCEPTIONS setting can be used to activate alternative versions of error handling inside of LAMMPS. This is useful when external codes drive LAMMPS as a library. Using this option, LAMMPS errors do not kill the caller. Instead, the call stack is unwound and control returns to the caller. The library interface provides the lammps_has_error() and lammps_get_last_error_message() functions to detect and find out more about a LAMMPS error.

Using one of the -DPACK_ARRAY, -DPACK_POINTER, and -DPACK_MEMCPY options can make for faster parallel FFTs (in the PPPM solver) on some platforms. The -DPACK_ARRAY setting is the default. See the "kspace_style"_kspace_style.html command for info about PPPM. See Step 6 below for info about building LAMMPS with an FFT library.

Step 5 :h6

The 3 MPI variables are used to specify an MPI library to build LAMMPS with. Note that you do not need to set these if you use the MPI compiler mpicxx for your CC and LINK settings in the section above. The MPI wrapper knows where to find the needed files.

If you want LAMMPS to run in parallel, you must have an MPI library installed on your platform. If MPI is installed on your system in the usual place (under /usr/local), you also may not need to specify these 3 variables, assuming /usr/local is in your path. On some large parallel machines which use "modules" for their compile/link environments, you may simply need to include the correct module in your build environment, before building LAMMPS. Or the parallel machine may have a vendor-provided MPI which the compiler has no trouble finding. Failing this, these 3 variables can be used to specify where the mpi.h file (MPI_INC) and the MPI library file (MPI_PATH) are found and the name of the library file (MPI_LIB).

If you are installing MPI yourself, we recommend Argonne's MPICH2 or OpenMPI. MPICH can be downloaded from the "Argonne MPI site"_http://www.mcs.anl.gov/research/projects/mpich2/. OpenMPI can be downloaded from the "OpenMPI site"_http://www.open-mpi.org. Other MPI packages should also work. If you are running on a big parallel platform, your system people or the vendor should have already installed a version of MPI, which is likely to be faster than a self-installed MPICH or OpenMPI, so find out how to build and link with it.

If you use MPICH or OpenMPI, you will have to configure and build it for your platform. The MPI configure script should have compiler options to enable you to use the same compiler you are using for the LAMMPS build, which can avoid problems that can arise when linking LAMMPS to the MPI library.
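For illustration, if you had installed MPICH yourself under /usr/local, the 3 variables might look like this (the exact paths and library names are assumptions; check your own installation):

MPI_INC = -I/usr/local/include
MPI_PATH = -L/usr/local/lib
MPI_LIB = -lmpich :pre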
If you just want to run LAMMPS on a single processor, you can use the dummy MPI library provided in src/STUBS, since you don't need a true MPI library installed on your system. See src/MAKE/Makefile.serial for how to specify the 3 MPI variables in this case. You will also need to build the STUBS library for your platform before making LAMMPS itself. Note that if you are building with src/MAKE/Makefile.serial, e.g. by typing "make serial", then the STUBS library is built for you.

To build the STUBS library from the src directory, type "make mpi-stubs", or from the src/STUBS dir, type "make". This should create a libmpi_stubs.a file suitable for linking to LAMMPS. If the build fails, you will need to edit the STUBS/Makefile for your platform.

The file STUBS/mpi.c provides a CPU timer function called MPI_Wtime() that calls gettimeofday(). If your system doesn't support gettimeofday(), you'll need to insert code to call another timer. Note that the ANSI-standard function clock() rolls over after an hour or so, and is therefore insufficient for timing long LAMMPS simulations.

Step 6 :h6

The 3 FFT variables allow you to specify an FFT library which LAMMPS uses (for performing 1d FFTs) when running the particle-particle particle-mesh (PPPM) option for long-range Coulombics via the "kspace_style"_kspace_style.html command.

LAMMPS supports common open-source or vendor-supplied FFT libraries for this purpose. If you leave these 3 variables blank, LAMMPS will use the open-source "KISS FFT library"_http://kissfft.sf.net, which is included in the LAMMPS distribution. This library is portable to all platforms and for typical LAMMPS simulations is almost as fast as FFTW or vendor-optimized libraries. If you are not including the KSPACE package in your build, you can also leave the 3 variables blank.

Otherwise, select which kinds of FFTs to use as part of the FFT_INC setting by a switch of the form -DFFT_XXX. Recommended values for XXX are: MKL or FFTW3. FFTW2 and NONE are supported as legacy options. Selecting -DFFT_FFTW will use the FFTW3 library and -DFFT_NONE will use the KISS library described above.

You may also need to set the FFT_INC, FFT_PATH, and FFT_LIB variables, so the compiler and linker can find the needed FFT header and library files. Note that on some large parallel machines which use "modules" for their compile/link environments, you may simply need to include the correct module in your build environment. Or the parallel machine may have a vendor-provided FFT library which the compiler has no trouble finding. See the src/MAKE/OPTIONS/Makefile.fftw file for an example of how to specify these variables to use the FFTW3 library.

FFTW is a fast, portable library that should also work on any platform and will typically be faster than KISS FFT. You can download it from "www.fftw.org"_http://www.fftw.org. Both the legacy version 2.1.X and the newer 3.X versions are supported, as -DFFT_FFTW2 or -DFFT_FFTW3. Building FFTW for your box should be as simple as ./configure; make; make install. The install command typically requires root privileges (e.g. invoke it via sudo), unless you specify a local directory with the "--prefix" option of configure. Type "./configure --help" to see various options.

If you wish to have FFTW support for single-precision FFTs (see below about -DFFT_SINGLE) in addition to the default double-precision FFTs, you will need to build FFTW a second time for single-precision.
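For reference, Makefile settings for the default double-precision FFTW3 might look like this (illustrative values, assuming an install under /usr/local):

FFT_INC = -DFFT_FFTW3 -I/usr/local/include
FFT_PATH = -L/usr/local/lib
FFT_LIB = -lfftw3 :pre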
To build FFTW3 a second time for single precision, do this:

make clean
./configure --enable-single; make; make install :pre

which should produce the additional library libfftw3f.a. For FFTW2, do this:

make clean
./configure --enable-float --enable-type-prefix; make; make install :pre

which should produce the additional library libsfftw.a and the additional include file sfftw.h. Note that on some platforms FFTW2 has been pre-installed for both single- and double-precision, and may already have these files as well as libdfftw.a and dfftw.h for double precision.

The FFT_INC variable also allows for a -DFFT_SINGLE setting that will use single-precision FFTs with PPPM, which can speed up long-range calculations, particularly in parallel or on GPUs. Fourier transform and related PPPM operations are somewhat insensitive to floating point truncation errors and thus do not always need to be performed in double precision. Using the -DFFT_SINGLE setting trades off a little accuracy for reduced memory use and parallel communication costs for transposing 3d FFT data. Note that single-precision FFTs have only been tested with the FFTW3, FFTW2, MKL, and KISS FFT options.

When using -DFFT_SINGLE with FFTW3 or FFTW2, you need to build FFTW with support for single-precision, as explained above. For FFTW3 you also need to include -lfftw3f with the FFT_LIB setting, in addition to -lfftw3. For FFTW2, you also need to specify -DFFT_SIZE with the FFT_INC setting and -lsfftw with the FFT_LIB setting (in place of -lfftw). Similarly, if FFTW2 has been pre-installed with an explicit double-precision library (libdfftw.a and not the default libfftw.a), then you can specify -DFFT_SIZE (and not -DFFT_SINGLE), and specify -ldfftw to use double-precision FFTs.

Step 7 :h6

The 3 JPG variables allow you to specify a JPEG and/or PNG library which LAMMPS uses when writing out JPEG or PNG files via the "dump image"_dump_image.html command. These can be left blank if you do not use the -DLAMMPS_JPEG or -DLAMMPS_PNG switches discussed above in Step 4, since in that case JPEG/PNG output will be disabled.

A standard JPEG library usually goes by the name libjpeg.a or libjpeg.so and has an associated header file jpeglib.h. Whichever JPEG library you have on your platform, you'll need to set the appropriate JPG_INC, JPG_PATH, and JPG_LIB variables, so that the compiler and linker can find it.

A standard PNG library usually goes by the name libpng.a or libpng.so and has an associated header file png.h. Whichever PNG library you have on your platform, you'll need to set the appropriate JPG_INC, JPG_PATH, and JPG_LIB variables, so that the compiler and linker can find it.

As before, if these header and library files are in the usual place on your machine, you may not need to set these variables.

Step 8 :h6

Note that by default only a few of LAMMPS optional packages are installed. To build LAMMPS with optional packages, see "this section"_#start_3 below, before proceeding to Step 9.

Step 9 :h6

That's it. Once you have a correct Makefile.foo, and you have pre-built any other needed libraries (e.g. MPI, FFT, etc), all you need to do from the src directory is type something like this:

make foo
make -j N foo
gmake foo
gmake -j N foo :pre

The -j or -j N switches perform a parallel build, which can be much faster, depending on how many cores your compilation machine has. N is the number of cores the build runs on.

You should get the executable lmp_foo when the build is complete.
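Putting the steps together, a typical session for a hypothetical machine file src/MAKE/MINE/Makefile.foo might look like this (the package choice and core count are only examples):

cd src
make yes-colloid
make -j 4 foo :pre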
:line

Errors that can occur when making LAMMPS :h5,link(start_2_3)

If an error occurs when building LAMMPS, the compiler or linker will state very explicitly what the problem is. The error message should give you a hint as to which of the steps above has failed, and what you need to do in order to fix it. Building a code with a Makefile is a very logical process. The compiler and linker need to find the appropriate files, and those files need to be compatible with LAMMPS settings and source files. When a make fails, there is usually a very simple reason, which you or a local expert will need to fix.

Here are two non-obvious errors that can occur:

(1) If the make command breaks immediately with errors that indicate it can't find files with a "*" in their names, this can be because your machine's native make doesn't support wildcard expansion in a makefile. Try gmake instead of make. If that doesn't work, try using a -f switch with your make command to use a pre-generated Makefile.list which explicitly lists all the needed files, e.g.

make makelist
make -f Makefile.list linux
gmake -f Makefile.list mac :pre

The first "make" command will create a current Makefile.list with all the file names in your src dir. The 2nd "make" command (make or gmake) will use it to build LAMMPS. Note that you should include/exclude any desired optional packages before using the "make makelist" command.

(2) If you get an error that says something like 'identifier "atoll" is undefined', then your machine does not support "long long" integers. Try using the -DLAMMPS_LONGLONG_TO_LONG setting described above in Step 4.

:line

Additional build tips :h5,link(start_2_4)

Building LAMMPS for multiple platforms. :h6

You can make LAMMPS for multiple platforms from the same src directory. Each target creates its own object sub-directory called Obj_target where it stores the system-specific *.o files.

Cleaning up. :h6

Typing "make clean-all" or "make clean-machine" will delete *.o object files created when LAMMPS is built, for either all builds or for a particular machine.

Changing the LAMMPS size limits via -DLAMMPS_SMALLBIG or -DLAMMPS_BIGBIG or -DLAMMPS_SMALLSMALL :h6

As explained above, any of these 3 settings can be specified on the LMP_INC line in your low-level src/MAKE/Makefile.foo.

The default is -DLAMMPS_SMALLBIG, which allows for systems with up to 2^63 atoms and 2^63 timesteps (about 9e18). The atom limit is for atomic systems which do not store bond topology info and thus do not require atom IDs. If you use atom IDs for atomic systems (which is the default) or if you use a molecular model, which stores bond topology info and thus requires atom IDs, the limit is 2^31 atoms (about 2 billion). This is because the IDs are stored in 32-bit integers.

Likewise, with this setting, the 3 image flags for each atom (see the "dump"_dump.html doc page for a discussion) are stored in a 32-bit integer, which means the atoms can only wrap around a periodic box (in each dimension) at most 512 times. If atoms move through the periodic box more than this many times, the image flags will "roll over", e.g. from 511 to -512, which can cause diagnostics like the mean-squared displacement, as calculated by the "compute msd"_compute_msd.html command, to be faulty.

To allow for larger atomic systems with atom IDs or larger molecular systems or larger image flags, compile with -DLAMMPS_BIGBIG. This stores atom IDs and image flags in 64-bit integers. This enables atomic or molecular systems with atom IDs of up to 2^63 atoms (about 9e18).
And image flags will not "roll over" until they reach 2^20 = 1048576.

If your system does not support 8-byte integers, you will need to compile with the -DLAMMPS_SMALLSMALL setting. This will restrict the total number of atoms (for atomic or molecular systems) and timesteps to 2^31 (about 2 billion). Image flags will roll over at 2^9 = 512.

Note that in src/lmptype.h there are definitions of all these data types as well as the MPI data types associated with them. The MPI types need to be consistent with the associated C data types, or else LAMMPS will generate a run-time error. As far as we know, the settings defined in src/lmptype.h are portable and work on every current system.

In all cases, the size of problem that can be run on a per-processor basis is limited by 4-byte integer storage to 2^31 atoms per processor (about 2 billion). This should not normally be a limitation, since such a problem would have a huge per-processor memory footprint due to neighbor lists and would run very slowly in terms of CPU secs/timestep.

:line

Building for a Mac :h5,link(start_2_5)

OS X is a derivative of BSD Unix, so it should just work. See the src/MAKE/MACHINES/Makefile.mac and Makefile.mac_mpi files.

:line

Building for Windows :h5,link(start_2_6)

If you want to build a Windows version of LAMMPS, you can build it yourself, but it may require some effort. LAMMPS expects a Unix-like build environment for the default build procedure. This can be provided by either Cygwin or MinGW; the latter also exists as a ready-to-use Linux-to-Windows cross-compiler in several Linux distributions. In these cases, you can do the installation after installing several unix-style commands like make, grep, sed, and bash, along with some shell utilities. For Cygwin and the MinGW cross-compilers, suitable makefiles are provided in src/MAKE/MACHINES. When using other compilers, like Visual C++ or Intel compilers for Windows, you may have to implement your own build system.

Due to differences between the Windows OS and Windows system libraries on the one hand, and Unix-like environments like Linux or MacOS on the other, a few adjustments may be needed when compiling for Windows:

Do [not] set the -DLAMMPS_MEMALIGN define (see LMP_INC makefile variable)
Add -lwsock32 -lpsapi to the linker flags (see LIB makefile variable)
Try adding -static-libgcc or -static or both to the linker flags when your LAMMPS executable complains about missing .dll files :ul

Since none of the current LAMMPS core developers has significant experience building executables on Windows, we are happy to distribute contributed instructions and modifications to improve the situation, but we cannot provide support for those.

With the so-called "Anniversary Update" to Windows 10, there is an Ubuntu Linux subsystem available for Windows, which can be installed and then used to compile/install LAMMPS as if you were running on an Ubuntu Linux system instead of Windows.

As an alternative, you can download pre-compiled installer packages from "packages.lammps.org/windows.html"_http://packages.lammps.org/windows.html. These executables are built with most optional packages included, and the download includes documentation, potential files, some tools, and many examples, but no source code.
:line

2.3 Making LAMMPS with optional packages :h4,link(start_3)

This section has the following sub-sections:

2.3.1 "Package basics"_#start_3_1
2.3.2 "Including/excluding packages"_#start_3_2
2.3.3 "Packages that require extra libraries"_#start_3_3 :all(b)

:line

Package basics: :h5,link(start_3_1)

The source code for LAMMPS is structured as a set of core files which are always included, plus optional packages. Packages are groups of files that enable a specific set of features. For example, force fields for molecular systems or granular systems are in packages.

"Section 4"_Section_packages.html in the manual has details about all the packages, which come in two flavors: [standard] and [user] packages. It also has specific instructions for building LAMMPS with any package which requires an extra library. General instructions are below.

You can see the list of all packages by typing "make package" from within the src directory of the LAMMPS distribution. It will also list various make commands that can be used to manage packages.

If you use a command in a LAMMPS input script that is part of a package, you must have built LAMMPS with that package, else you will get an error that the style is invalid or the command is unknown. Every command's doc page specifies if it is part of a package. You can type

lmp_machine -h :pre

to run your executable with the optional "-h command-line switch"_#start_6 for "help", which will list the styles and commands known to your executable, and immediately exit.

:line

Including/excluding packages :h5,link(start_3_2)

To use (or not use) a package you must install it (or un-install it) before building LAMMPS. From the src directory, this is as simple as:

make yes-colloid
make mpi :pre

or

make no-user-omp
make mpi :pre

NOTE: You should NOT install/un-install packages and build LAMMPS in a single make command using multiple targets, e.g. make yes-colloid mpi. This is because the make procedure creates a list of source files that will be out-of-date for the build if the package configuration changes within the same command.

Any package can be installed or not in a LAMMPS build, independent of all other packages. However, some packages include files derived from files in other packages. LAMMPS checks for this and does the right thing, i.e. individual files are only included if their dependencies are already included. Likewise, if a package is excluded, other files dependent on that package are also excluded.

NOTE: The one exception is that we do not recommend building with both the KOKKOS package installed and any of the other acceleration packages (GPU, OPT, USER-INTEL, USER-OMP) also installed. This is because Kokkos sometimes builds using a wrapper compiler, which can make it difficult to invoke all the compile/link flags correctly for both Kokkos and non-Kokkos files.

If you will never run simulations that use the features in a particular package, there is no reason to include it in your build. For some packages, this will keep you from having to build extra libraries, and will also produce a smaller executable which may run a bit faster.

When you download a LAMMPS tarball, three packages are pre-installed in the src directory -- KSPACE, MANYBODY, MOLECULE -- because they are so commonly used. When you download LAMMPS source files from the SVN or Git repositories, no packages are pre-installed.

Packages are installed or un-installed by typing

make yes-name
make no-name :pre

where "name" is the name of the package in lower-case, e.g.
name = kspace for the KSPACE package or name = user-atc for the USER-ATC package. You can also type any of these commands:

make yes-all | install all packages
make no-all | un-install all packages
make yes-standard or make yes-std | install standard packages
make no-standard or make no-std | un-install standard packages
make yes-user | install user packages
make no-user | un-install user packages
make yes-lib | install packages that require extra libraries
make no-lib | un-install packages that require extra libraries
make yes-ext | install packages that require external libraries
make no-ext | un-install packages that require external libraries :tb(s=|)

which install/un-install various sets of packages. Typing "make package" will list all of these commands.

NOTE: Installing or un-installing a package works by simply moving files back and forth between the main src directory and sub-directories with the package name (e.g. src/KSPACE, src/USER-ATC), so that the files are included or excluded when LAMMPS is built. After you have installed or un-installed a package, you must re-build LAMMPS for the action to take effect.

The following make commands help manage files that exist in both the src directory and in package sub-directories. You do not normally need to use these commands unless you are editing LAMMPS files or have downloaded a patch from the LAMMPS web site.

Typing "make package-status" or "make ps" will show which packages are currently installed. For those that are installed, it will list any files that are different in the src directory and package sub-directory.

Typing "make package-update" or "make pu" will overwrite src files with files from the package sub-directories if the package is installed. It should be used after a patch has been applied, since patches only update the files in the package sub-directory, but not the src files.

Typing "make package-overwrite" will overwrite files in the package sub-directories with src files.

Typing "make package-diff" lists all differences between these files.

Again, just type "make package" to see all of the package-related make options.

:line

Packages that require extra libraries :h5,link(start_3_3)

A few of the standard and user packages require extra libraries. See "Section 4"_Section_packages.html for two tables of packages which indicate which ones require libraries. For each such package, the Section 4 doc page gives details on how to build the extra library, including how to download it if necessary. The basic ideas are summarized here.

[System libraries:]

Packages in the tables of "Section 4"_Section_packages.html with a "sys" in the last column link to system libraries that typically already exist on your machine. E.g. the python package links to a system Python library. If your machine does not have the required library, you will have to download and install it on your machine, in either the system or user space.

[Internal libraries:]

Packages in the tables of "Section 4"_Section_packages.html with an "int" in the last column link to internal libraries whose source code is included with LAMMPS, in the lib/name directory where name is the package name. You must first build the library in that directory before building LAMMPS with that package installed. E.g. the gpu package links to a library you build in the lib/gpu dir. You can often do the build in one step by typing "make lib-name args=..." from the src dir, with appropriate arguments. You can leave off the args to see a help message.
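For example, typing

make lib-gpu :pre

from the src dir (with no args) prints the help message for the lib/gpu build, listing the arguments it accepts.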
See "Section 4"_Section_packages.html for details for each package. [External libraries:] Packages in the tables "Section 4"_Section_packages.html with an "ext" in the last column link to exernal libraries whose source code is not included with LAMMPS. You must first download and install the library before building LAMMPS with that package installed. E.g. the voronoi package links to the freely available "Voro++ library"_voro_home2. You can often do the download/build in one step by typing "make lib-name args=..." from the src dir, with appropriate arguments. You can leave off the args to see a help message. See "Section 4"_Section_packages.html for details for each package. :link(voro_home2,http://math.lbl.gov/voro++) [Possible errors:] There are various common errors which can occur when building extra libraries or when building LAMMPS with packages that require the extra libraries. If you cannot build the extra library itself successfully, you may need to edit or create an appropriate Makefile for your machine, e.g. with appropriate compiler or system settings. Provided makefiles are typically in the lib/name directory. E.g. see the Makefile.* files in lib/gpu. The LAMMPS build often uses settings in a lib/name/Makefile.lammps file which either exists in the LAMMPS distribution or is created or copied from a lib/name/Makefile.lammps.* file when the library is built. If those settings are not correct for your machine you will need to edit or create an appropriate Makefile.lammps file. Package-specific details for these steps are given in "Section 4"_Section_packages.html an in README files in the lib/name directories. [Compiler options needed for accelerator packages:] Several packages contain code that is optimized for specific hardware, e.g. CPU, KNL, or GPU. These are the OPT, GPU, KOKKOS, USER-INTEL, and USER-OMP packages. Compiling and linking the source files in these accelerator packages for optimal performance requires specific settings in the Makefile.machine file you use. A summary of the Makefile.machine settings needed for each of these packages is given in "Section 4"_Section_packages.html. More info is given on the doc pages that describe each package in detail: 5.3.1 "USER-INTEL package"_accelerate_intel.html 5.3.2 "GPU package"_accelerate_intel.html 5.3.3 "KOKKOS package"_accelerate_kokkos.html 5.3.4 "USER-OMP package"_accelerate_omp.html 5.3.5 "OPT package"_accelerate_opt.html :all(b) You can also use or examine the following machine Makefiles in src/MAKE/OPTIONS, which include the settings. Note that the USER-INTEL and KOKKOS packages can use settings that build LAMMPS for different hardware. The USER-INTEL package can be compiled for Intel CPUs and KNLs; the KOKKOS package builds for CPUs (OpenMP), GPUs (CUDA), and Intel KNLs. Makefile.intel_cpu Makefile.intel_phi Makefile.kokkos_omp -Makefile.kokkos_cuda +Makefile.kokkos_cuda_mpi Makefile.kokkos_phi Makefile.omp Makefile.opt :ul :line 2.4 Building LAMMPS as a library :h4,link(start_4) LAMMPS can be built as either a static or shared library, which can then be called from another application or a scripting language. See "this section"_Section_howto.html#howto_10 for more info on coupling LAMMPS to other codes. See "this section"_Section_python.html for more info on wrapping and running LAMMPS from Python. Static library :h5 To build LAMMPS as a static library (*.a file on Linux), type make foo mode=lib :pre where foo is the machine name. 
This kind of library is typically used to statically link a driver application to LAMMPS, so that you can insure all dependencies are satisfied at compile time. This will use the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The build will create the file liblammps_foo.a, which another application can link to. It will also create a soft link liblammps.a, which will point to the most recently built static library.

Shared library :h5

To build LAMMPS as a shared library (*.so file on Linux), which can be dynamically loaded, e.g. from Python, type

make foo mode=shlib :pre

where foo is the machine name. This kind of library is required when wrapping LAMMPS with Python; see "Section 11"_Section_python.html for details.

This will use the SHFLAGS and SHLIBFLAGS settings in src/MAKE/Makefile.foo and perform the build in the directory Obj_shared_foo. This is so that each file can be compiled with the -fPIC flag, which is required for inclusion in a shared library. The build will create the file liblammps_foo.so, which another application can link to dynamically. It will also create a soft link liblammps.so, which will point to the most recently built shared library. This is the file the Python wrapper loads by default.

Note that for a shared library to be usable by a calling program, all the auxiliary libraries it depends on must also exist as shared libraries. This will be the case for libraries included with LAMMPS, such as the dummy MPI library in src/STUBS or any package libraries in lib/packages, since they are always built as shared libraries using the -fPIC switch. However, if a library like MPI or FFTW does not exist as a shared library, the shared library build will generate an error. This means you will need to install a shared library version of the auxiliary library. The build instructions for the library should tell you how to do this. Here is an example of such errors when the system FFTW or provided lib/colvars library have not been built as shared libraries:

/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation R_X86_64_32 against '.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libfftw3.a: could not read symbols: Bad value :pre

/usr/bin/ld: ../../lib/colvars/libcolvars.a(colvarmodule.o): relocation R_X86_64_32 against '__pthread_key_create' can not be used when making a shared object; recompile with -fPIC
../../lib/colvars/libcolvars.a: error adding symbols: Bad value :pre

As an example, here is how to build and install the "MPICH library"_mpich, a popular open-source version of MPI, distributed by Argonne National Labs, as a shared library in the default /usr/local/lib location:

:link(mpich,http://www-unix.mcs.anl.gov/mpi)

./configure --enable-shared
make
make install :pre

You may need to use "sudo make install" in place of the last line if you do not have write privileges for /usr/local/lib. The end result should be the file /usr/local/lib/libmpich.so.

[Additional requirement for using a shared library:] :h5

The operating system finds shared libraries to load at run-time using the environment variable LD_LIBRARY_PATH. So you may wish to copy the file src/liblammps.so or src/liblammps_g++.so (for example) to a place the system can find it by default, such as /usr/local/lib, or you may wish to add the LAMMPS src directory to LD_LIBRARY_PATH, so that the current version of the shared library is always available to programs that use it.
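For the bash shell, you could add something like this to your ~/.bashrc file (the path is an example; use your own LAMMPS src directory):

export LD_LIBRARY_PATH=$\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre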
For the csh or tcsh shells, you would add something like this to your ~/.cshrc file:

setenv LD_LIBRARY_PATH $\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre

Calling the LAMMPS library :h5

Either flavor of library (static or shared) allows one or more LAMMPS objects to be instantiated from the calling program.

When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS namespace; you can safely use any of its classes and methods from within the calling code, as needed.

When used from a C or Fortran program or a scripting language like Python, the library has a simple function-style interface, provided in src/library.cpp and src/library.h.

See the sample codes in examples/COUPLE/simple for examples of C++ and C and Fortran codes that invoke LAMMPS through its library interface. There are other examples as well in the COUPLE directory which are discussed in "Section 6.10"_Section_howto.html#howto_10 of the manual. See "Section 11"_Section_python.html of the manual for a description of the Python wrapper provided with LAMMPS that operates through the LAMMPS library interface.

The files src/library.cpp and library.h define the C-style API for using LAMMPS as a library. See "Section 6.19"_Section_howto.html#howto_19 of the manual for a description of the interface and how to extend it for your needs.

:line

2.5 Running LAMMPS :h4,link(start_5)

By default, LAMMPS runs by reading commands from standard input. Thus if you run the LAMMPS executable by itself, e.g.

lmp_linux :pre

it will simply wait, expecting commands from the keyboard. Typically you should put commands in an input script and use I/O redirection, e.g.

lmp_linux < in.file :pre

For parallel environments this should also work. If it does not, use the '-in' command-line switch, e.g.

lmp_linux -in in.file :pre

"This section"_Section_commands.html describes how input scripts are structured and what commands they contain.

You can test LAMMPS on any of the sample inputs provided in the examples or bench directory. Input scripts are named in.* and sample outputs are named log.*.name.P where name is a machine and P is the number of processors it was run on.

Here is how you might run a standard Lennard-Jones benchmark on a Linux box, using mpirun to launch a parallel job:

cd src
make linux
cp lmp_linux ../bench
cd ../bench
mpirun -np 4 lmp_linux -in in.lj :pre

See "this page"_bench for timings for this and the other benchmarks on various platforms. Note that some of the example scripts require LAMMPS to be built with one or more of its optional packages.

:link(bench,http://lammps.sandia.gov/bench.html)

:line

On a Windows box, you can skip making LAMMPS and simply download an installer package from "here"_http://packages.lammps.org/windows.html

For running the non-MPI executable, follow these steps:

Get a command prompt by going to Start->Run... , then typing "cmd". :ulb,l

Move to the directory where you have your input, e.g. a copy of the [in.lj] input from the bench folder (e.g. by typing: cd "Documents"). :l

At the command prompt, type "lmp_serial -in in.lj", replacing [in.lj] with the name of your LAMMPS input script. :l

The serial executable includes support for multi-threading parallelization from the styles in the USER-OMP package. To run with, e.g.
4 threads, type "lmp_serial -in in.lj -pk omp 4 -sf omp" :ule

For the MPI version, which allows you to run LAMMPS under Windows with the more general message-passing parallel library (LAMMPS has been designed from the ground up to use MPI efficiently), follow these steps:

Download and install a compatible MPI library binary package: for 32-bit Windows "mpich2-1.4.1p1-win-ia32.msi"_download.lammps.org/thirdparty/mpich2-1.4.1p1-win-ia32.msi and for 64-bit Windows "mpich2-1.4.1p1-win-x86-64.msi"_download.lammps.org/thirdparty/mpich2-1.4.1p1-win-x86-64.msi :ulb,l

The LAMMPS Windows installer packages will automatically adjust your path for the default location of this MPI package. After the installation of the MPICH2 software, it needs to be integrated into the system. For this you need to start a Command Prompt in {Administrator Mode} (right click on the icon and select it). Change into the MPICH2 installation directory, then into the subdirectory [bin] and execute [smpd.exe -install]. Exit the command window. Get a new, regular command prompt by going to Start->Run... , then typing "cmd". :l

Move to the directory where you have your input file (e.g. by typing: cd "Documents"). :l

Then type something like this:

mpiexec -localonly 4 lmp_mpi -in in.lj :pre

or

mpiexec -np 4 lmp_mpi -in in.lj :pre

replacing [in.lj] with the name of your LAMMPS input script. For the latter case, you may be prompted to enter your password. :l

In this mode, output may not immediately show up on the screen, so if your input script takes a long time to execute, you may need to be patient before the output shows up. :l

The parallel executable can also run on a single processor by typing something like:

lmp_mpi -in in.lj :pre

And the parallel executable also includes OpenMP multi-threading, which can be combined with MPI using something like:

mpiexec -localonly 2 lmp_mpi -in in.lj -pk omp 2 -sf omp :pre
:ule

:line

The screen output from LAMMPS is described in a section below. As it runs, LAMMPS also writes a log.lammps file with the same information.

Note that this sequence of commands copies the LAMMPS executable (lmp_linux) to the directory with the input files. This may not be necessary, but some versions of MPI reset the working directory to where the executable is, rather than leave it as the directory where you launch mpirun from (if you launch lmp_linux on its own and not under mpirun). If that happens, LAMMPS will look for additional input files and write its output files to the executable directory, rather than your working directory, which is probably not what you want.

If LAMMPS encounters errors in the input script or while running a simulation it will print an ERROR message and stop or a WARNING message and continue. See "Section 12"_Section_errors.html for a discussion of the various kinds of errors LAMMPS can or can't detect, a list of all ERROR and WARNING messages, and what to do about them.

LAMMPS can run a problem on any number of processors, including a single processor. In theory you should get identical answers on any number of processors and on any machine. In practice, numerical round-off can cause slight differences and eventual divergence of molecular dynamics phase space trajectories.

LAMMPS can run as large a problem as will fit in the physical memory of one or more processors. If you run out of memory, you must run on more processors or set up a smaller problem.
:line

2.6 Command-line options :h4,link(start_6)

At run time, LAMMPS recognizes several optional command-line switches which may be used in any order. Either the full word or a one-or-two letter abbreviation can be used:

-e or -echo
-h or -help
-i or -in
-k or -kokkos
-l or -log
-nc or -nocite
-pk or -package
-p or -partition
-pl or -plog
-ps or -pscreen
-r or -restart
-ro or -reorder
-sc or -screen
-sf or -suffix
-v or -var :ul

For example, lmp_ibm might be launched as follows:

mpirun -np 16 lmp_ibm -v f tmp.out -l my.log -sc none -in in.alloy
mpirun -np 16 lmp_ibm -var f tmp.out -log my.log -screen none -in in.alloy :pre

Here are the details on the options:

-echo style :pre

Set the style of command echoing. The style can be {none} or {screen} or {log} or {both}. Depending on the style, each command read from the input script will be echoed to the screen and/or logfile. This can be useful to figure out which line of your script is causing an input error. The default value is {log}. The echo style can also be set by using the "echo"_echo.html command in the input script itself.

-help :pre

Print a brief help summary and a list of options compiled into this executable for each LAMMPS style (atom_style, fix, compute, pair_style, bond_style, etc). This can tell you if the command you want to use was included via the appropriate package at compile time. LAMMPS will print the info and immediately exit if this switch is used.

-in file :pre

Specify a file to use as an input script. This is an optional switch when running LAMMPS in one-partition mode. If it is not specified, LAMMPS reads its script from standard input, typically from a script via I/O redirection; e.g. lmp_linux < in.run. I/O redirection should also work in parallel, but if it does not (in the unlikely case that an MPI implementation does not support it), then use the -in flag. Note that this is a required switch when running LAMMPS in multi-partition mode, since multiple processors cannot all read from stdin.

-kokkos on/off keyword/value ... :pre

Explicitly enable or disable KOKKOS support, as provided by the KOKKOS package. Even if LAMMPS is built with this package, as described above in "Section 2.3"_#start_3, this switch must be set to enable running with the KOKKOS-enabled styles the package provides. If the switch is not set (the default), LAMMPS will operate as if the KOKKOS package were not installed; i.e. you can run standard LAMMPS or with the GPU or USER-OMP packages, for testing or benchmarking purposes.

Additional optional keyword/value pairs can be specified which determine how Kokkos will use the underlying hardware on your platform. These settings apply to each MPI task you launch via the "mpirun" or "mpiexec" command. You may choose to run one or more MPI tasks per physical node. Note that if you are running on a desktop machine, you typically have one physical node. On a cluster or supercomputer there may be dozens or 1000s of physical nodes.

Either the full word or an abbreviation can be used for the keywords. Note that the keywords do not use a leading minus sign. I.e. the keyword is "t", not "-t". Also note that each of the keywords has a default setting. Examples of when to use these options and what settings to use on different platforms are given in "Section 5.3"_Section_accelerate.html#acc_3.

d or device
g or gpus
t or threads
n or numa :ul

device Nd :pre

This option is only relevant if you built LAMMPS with CUDA=yes, you have more than one GPU per node, and if you are running with only one MPI task per node.
The Nd setting is the ID of the GPU on the node to run on. By default Nd = 0. If you have multiple GPUs per node, they have consecutive IDs numbered as 0,1,2,etc. This setting allows you to launch multiple independent jobs on the node, each with a single MPI task per node, and assign each job to run on a different GPU.

gpus Ng Ns :pre

This option is only relevant if you built LAMMPS with CUDA=yes, you have more than one GPU per node, and you are running with multiple MPI tasks per node (up to one per GPU). The Ng setting is how many GPUs you will use. The Ns setting is optional. If set, it is the ID of a GPU to skip when assigning MPI tasks to GPUs. This may be useful if your desktop system reserves one GPU to drive the screen and the rest are intended for computational work like running LAMMPS. By default Ng = 1 and Ns is not set.

Depending on which flavor of MPI you are running, LAMMPS will look for one of these 3 environment variables

SLURM_LOCALID (various MPI variants compiled with SLURM support)
MV2_COMM_WORLD_LOCAL_RANK (Mvapich)
OMPI_COMM_WORLD_LOCAL_RANK (OpenMPI) :pre

which are initialized by the "srun", "mpirun" or "mpiexec" commands. The environment variable setting for each MPI rank is used to assign a unique GPU ID to the MPI task.

threads Nt :pre

This option assigns Nt threads to each MPI task for performing work when Kokkos is executing in OpenMP or pthreads mode. The default is Nt = 1, which essentially runs in MPI-only mode. If there are Np MPI tasks per physical node, you generally want Np*Nt = the number of physical cores per node, to use your available hardware optimally. This also sets the number of threads used by the host when LAMMPS is compiled with CUDA=yes.

numa Nm :pre

This option is only relevant when using pthreads with hwloc support. In this case Nm defines the number of NUMA regions (typically sockets) on a node which will be utilized by a single MPI rank. By default Nm = 1. If this option is used the total number of worker-threads per MPI rank is threads*numa. Currently it is almost always better to assign at least one MPI rank per NUMA region, and leave numa set to its default value of 1. This is because letting a single process span multiple NUMA regions induces a significant amount of cross-NUMA data traffic, which is slow.

-log file :pre

Specify a log file for LAMMPS to write status information to. In one-partition mode, if the switch is not used, LAMMPS writes to the file log.lammps. If this switch is used, LAMMPS writes to the specified file. In multi-partition mode, if the switch is not used, a log.lammps file is created with high-level status information. Each partition also writes to a log.lammps.N file where N is the partition ID. If the switch is specified in multi-partition mode, the high-level logfile is named "file" and each partition also logs information to a file.N. For both one-partition and multi-partition mode, if the specified file is "none", then no log files are created. Using a "log"_log.html command in the input script will override this setting. Option -plog will override the name of the partition log files file.N.

-nocite :pre

Disable writing the log.cite file, which is normally written to list references for specific cite-able features used during a LAMMPS run. See the "citation page"_http://lammps.sandia.gov/cite.html for more details.

-package style args .... :pre

Invoke the "package"_package.html command with style and args. The syntax is the same as if the command appeared at the top of the input script.
For example "-package gpu 2" or "-pk gpu 2" is the same as "package gpu 2"_package.html in the input script. The possible styles and args are documented on the "package"_package.html doc page. This switch can be used multiple times, e.g. to set options for the USER-INTEL and USER-OMP packages which can be used together. Along with the "-suffix" command-line switch, this is a convenient mechanism for invoking accelerator packages and their options without having to edit an input script. -partition 8x2 4 5 ... :pre Invoke LAMMPS in multi-partition mode. When LAMMPS is run on P processors and this switch is not used, LAMMPS runs in one partition, i.e. all P processors run a single simulation. If this switch is used, the P processors are split into separate partitions and each partition runs its own simulation. The arguments to the switch specify the number of processors in each partition. Arguments of the form MxN mean M partitions, each with N processors. Arguments of the form N mean a single partition with N processors. The sum of processors in all partitions must equal P. Thus the command "-partition 8x2 4 5" has 10 partitions and runs on a total of 25 processors. Running with multiple partitions can e useful for running "multi-replica simulations"_Section_howto.html#howto_5, where each replica runs on on one or a few processors. Note that with MPI installed on a machine (e.g. your desktop), you can run on more (virtual) processors than you have physical processors. To run multiple independent simulatoins from one input script, using multiple partitions, see "Section 6.4"_Section_howto.html#howto_4 of the manual. World- and universe-style "variables"_variable.html are useful in this context. -plog file :pre Specify the base name for the partition log files, so partition N writes log information to file.N. If file is none, then no partition log files are created. This overrides the filename specified in the -log command-line option. This option is useful when working with large numbers of partitions, allowing the partition log files to be suppressed (-plog none) or placed in a sub-directory (-plog replica_files/log.lammps) If this option is not used the log file for partition N is log.lammps.N or whatever is specified by the -log command-line option. -pscreen file :pre Specify the base name for the partition screen file, so partition N writes screen information to file.N. If file is none, then no partition screen files are created. This overrides the filename specified in the -screen command-line option. This option is useful when working with large numbers of partitions, allowing the partition screen files to be suppressed (-pscreen none) or placed in a sub-directory (-pscreen replica_files/screen). If this option is not used the screen file for partition N is screen.N or whatever is specified by the -screen command-line option. -restart restartfile {remap} datafile keyword value ... :pre Convert the restart file into a data file and immediately exit. This is the same operation as if the following 2-line input script were run: read_restart restartfile {remap} write_data datafile keyword value ... :pre Note that the specified restartfile and datafile can have wild-card characters ("*",%") as described by the "read_restart"_read_restart.html and "write_data"_write_data.html commands. But a filename such as file.* will need to be enclosed in quotes to avoid shell expansion of the "*" character. Note that following restartfile, the optional flag {remap} can be used. 
This has the same effect as adding it to the "read_restart"_read_restart.html command, as explained on its doc page.  This is only useful if the reading of the restart file triggers an error that atoms have been lost.  In that case, use of the remap flag should allow the data file to still be produced.

Also note that following datafile, the same optional keyword/value pairs can be listed as used by the "write_data"_write_data.html command.

-reorder nth N
-reorder custom filename :pre

Reorder the processors in the MPI communicator used to instantiate LAMMPS, in one of several ways.  The original MPI communicator ranks all P processors from 0 to P-1.  The mapping of these ranks to physical processors is done by MPI before LAMMPS begins.  It may be useful in some cases to alter the rank order.  E.g. to insure that cores within each node are ranked in a desired order.  Or when using the "run_style verlet/split"_run_style.html command with 2 partitions to insure that a specific Kspace processor (in the 2nd partition) is matched up with a specific set of processors in the 1st partition.  See the "Section 5"_Section_accelerate.html doc pages for more details.

If the keyword {nth} is used with a setting {N}, then it means every Nth processor will be moved to the end of the ranking.  This is useful when using the "run_style verlet/split"_run_style.html command with 2 partitions via the -partition command-line switch.  The first set of processors will be in the first partition, the 2nd set in the 2nd partition.  The -reorder command-line switch can alter this so that the 1st N procs in the 1st partition and one proc in the 2nd partition will be ordered consecutively, e.g. as the cores on one physical node.  This can boost performance.  For example, if you use "-reorder nth 4" and "-partition 9 3" and you are running on 12 processors, the processors will be reordered from

0 1 2 3 4 5 6 7 8 9 10 11 :pre

to

0 1 2 4 5 6 8 9 10 3 7 11 :pre

so that the processors in each partition will be

0 1 2 4 5 6 8 9 10
3 7 11 :pre

See the "processors"_processors.html command for how to insure processors from each partition could then be grouped optimally for quad-core nodes.

If the keyword is {custom}, then a file that specifies a permutation of the processor ranks is also specified.  The format of the reorder file is as follows.  Any number of initial blank or comment lines (starting with a "#" character) can be present.  These should be followed by P lines of the form:

I J :pre

where P is the number of processors LAMMPS was launched with.  Note that if running in multi-partition mode (see the -partition switch above) P is the total number of processors in all partitions.  The I and J values describe a permutation of the P processors.  Every I and J should be values from 0 to P-1 inclusive.  In the set of P I values, every proc ID should appear exactly once.  Ditto for the set of P J values.  A single I,J pairing means that the physical processor with rank I in the original MPI communicator will have rank J in the reordered communicator.  A small example file is shown below.

Note that rank ordering can also be specified by many MPI implementations, either by environment variables that specify how to order physical processors, or by config files that specify what physical processors to assign to each MPI rank.  The -reorder switch simply gives you a portable way to do this without relying on MPI itself.  See the "processors out"_processors.html command for how to output info on the final assignment of physical processors to the LAMMPS simulation domain.
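As an illustration of the {custom} file format described above, here is a hypothetical reorder file for a run with P = 4 processors that simply reverses the default rank order; the particular I J pairs are made up for illustration:

# reorder file for P = 4: reverse the rank order
0 3
1 2
2 1
3 0 :pre

With this file, the physical processor with rank 0 in the original communicator becomes rank 3 in the reordered one, rank 1 becomes rank 2, and so on.  Note that each I value and each J value from 0 to 3 appears exactly once, as required.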
-screen file :pre

Specify a file for LAMMPS to write its screen information to.  In one-partition mode, if the switch is not used, LAMMPS writes to the screen.  If this switch is used, LAMMPS writes to the specified file instead and you will see no screen output.  In multi-partition mode, if the switch is not used, high-level status information is written to the screen.  Each partition also writes to a screen.N file where N is the partition ID.  If the switch is specified in multi-partition mode, the high-level screen dump is named "file" and each partition also writes screen information to a file.N.  For both one-partition and multi-partition mode, if the specified file is "none", then no screen output is performed.  Option -pscreen will override the name of the partition screen files file.N.

-suffix style args :pre

Use variants of various styles if they exist.  The specified style can be {cuda}, {gpu}, {intel}, {kk}, {omp}, {opt}, or {hybrid}.  These refer to optional packages that LAMMPS can be built with, as described above in "Section 2.3"_#start_3.  The "gpu" style corresponds to the GPU package, the "intel" style to the USER-INTEL package, the "kk" style to the KOKKOS package, the "opt" style to the OPT package, and the "omp" style to the USER-OMP package.

The hybrid style is the only style that accepts arguments.  It allows for two packages to be specified.  The first package specified is the default and will be used if it is available.  If no style is available for the first package, the style for the second package will be used if available.  For example, "-suffix hybrid intel omp" will use styles from the USER-INTEL package if they are installed and available, but styles from the USER-OMP package otherwise.

Along with the "-package" command-line switch, this is a convenient mechanism for invoking accelerator packages and their options without having to edit an input script.

As an example, all of the packages provide a "pair_style lj/cut"_pair_lj.html variant, with style names lj/cut/gpu, lj/cut/intel, lj/cut/kk, lj/cut/omp, and lj/cut/opt.  A variant style can be specified explicitly in your input script, e.g. pair_style lj/cut/gpu.  If the -suffix switch is used the specified suffix (gpu,intel,kk,omp,opt) is automatically appended whenever your input script command creates a new "atom"_atom_style.html, "pair"_pair_style.html, "fix"_fix.html, "compute"_compute.html, or "run"_run_style.html style.  If the variant version does not exist, the standard version is created.

For the GPU package, using this command-line switch also invokes the default GPU settings, as if the command "package gpu 1" were used at the top of your input script.  These settings can be changed by using the "-package gpu" command-line switch or the "package gpu"_package.html command in your script.

For the USER-INTEL package, using this command-line switch also invokes the default USER-INTEL settings, as if the command "package intel 1" were used at the top of your input script.  These settings can be changed by using the "-package intel" command-line switch or the "package intel"_package.html command in your script.  If the USER-OMP package is also installed, the hybrid style with "intel omp" arguments can be used to make the omp suffix a second choice, if a requested style is not available in the USER-INTEL package.  It will also invoke the default USER-OMP settings, as if the command "package omp 0" were used at the top of your input script.
These settings can be changed by using the "-package omp" command-line switch or the "package omp"_package.html command in your script.

For the KOKKOS package, using this command-line switch also invokes the default KOKKOS settings, as if the command "package kokkos" were used at the top of your input script.  These settings can be changed by using the "-package kokkos" command-line switch or the "package kokkos"_package.html command in your script.

For the USER-OMP package, using this command-line switch also invokes the default USER-OMP settings, as if the command "package omp 0" were used at the top of your input script.  These settings can be changed by using the "-package omp" command-line switch or the "package omp"_package.html command in your script.

The "suffix"_suffix.html command can also be used within an input script to set a suffix, or to turn off or back on any suffix setting made via the command line.

-var name value1 value2 ... :pre

Specify a variable that will be defined for substitution purposes when the input script is read.  This switch can be used multiple times to define multiple variables.  "Name" is the variable name which can be a single character (referenced as $x in the input script) or a full string (referenced as $\{abc\}).  An "index-style variable"_variable.html will be created and populated with the subsequent values, e.g. a set of filenames.  Using this command-line option is equivalent to putting the line "variable name index value1 value2 ..." at the beginning of the input script.  Defining an index variable as a command-line argument overrides any setting for the same index variable in the input script, since index variables cannot be re-defined.  See the "variable"_variable.html command for more info on defining index and other kinds of variables and "this section"_Section_commands.html#cmd_2 for more info on using variables in input scripts.

NOTE: Currently, the command-line parser looks for arguments that start with "-" to indicate new switches.  Thus you cannot specify multiple variable values if any of them start with a "-", e.g. a negative numeric value.  It is OK if the first value1 starts with a "-", since it is automatically skipped.

:line

2.7 LAMMPS screen output :h4,link(start_7)

As LAMMPS reads an input script, it prints information to both the screen and a log file about significant actions it takes to setup a simulation.  When the simulation is ready to begin, LAMMPS performs various initializations and prints the amount of memory (in MBytes per processor) that the simulation requires.  It also prints details of the initial thermodynamic state of the system.  During the run itself, thermodynamic information is printed periodically, every few timesteps.  When the run concludes, LAMMPS prints the final thermodynamic state and a total run time for the simulation.  It then appends statistics about the CPU time and storage requirements for the simulation.
An example set of statistics is shown here:

Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre

Performance: 18.436 ns/day  1.302 hours/ns  106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre

MPI task timings breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 1.9808     | 2.0134     | 2.0318     |   1.4 | 71.60
Bond    | 0.0021894  | 0.0060319  | 0.010058   |   4.7 |  0.21
Kspace  | 0.3207     | 0.3366     | 0.36616    |   3.1 | 11.97
Neigh   | 0.28411    | 0.28464    | 0.28516    |   0.1 | 10.12
Comm    | 0.075732   | 0.077018   | 0.07883    |   0.4 |  2.74
Output  | 0.00030518 | 0.00042665 | 0.00078821 |   1.0 |  0.02
Modify  | 0.086606   | 0.086631   | 0.086668   |   0.0 |  3.08
Other   |            | 0.007178   |            |       |  0.26 :pre

Nlocal:    501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost:    6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs:    177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre

Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre

The first section provides a global loop timing summary.  The {loop time} is the total wall time for the section.  The {Performance} line is provided for convenience, to help predict the number of loop continuations required and to compare performance with other, similar MD codes.  The {CPU use} line provides the CPU utilization per MPI task; it should be close to 100% times the number of OpenMP threads (or 1 if no OpenMP).  Lower numbers correspond to delays due to file I/O or insufficient thread utilization.

The MPI task section gives the breakdown of the CPU run time (in seconds) into major categories:

{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
{Kspace} stands for reciprocal space interactions: Ewald, PPPM, MSM
{Neigh} stands for neighbor list construction
{Comm} stands for communicating atoms and their properties
{Output} stands for writing dumps and thermo output
{Modify} stands for fixes and computes called by them
{Other} is the remaining time :ul

For each category, there is a breakdown of the minimum, average, and maximum wall time a processor spent on this section, as well as the variation from the average time.  Together these numbers allow you to gauge the amount of load imbalance in this segment of the calculation.  Ideally the difference between minimum, maximum and average is small, and thus the variation from the average is close to zero.  The final column shows the percentage of the total loop time that is spent in this section.

When using the "timer full"_timer.html setting, an additional column is present that also prints the CPU utilization in percent.  In addition, when both {timer full} and the "package omp"_package.html command are active, a similar timing summary of time spent in threaded regions is provided, to monitor thread utilization and load balance.  A new entry is the {Reduce} section, which lists the time spent in reducing the per-thread data elements to the storage for non-threaded computation.  These thread timings are taken from the first MPI rank only; since the breakdown for MPI tasks can change from MPI rank to MPI rank, this breakdown can be very different for individual ranks.
Here is an example output for this section:

Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.5127     | 0.5147     | 0.5167     |   0.3 | 75.18
Bond    | 0.0043139  | 0.0046779  | 0.0050418  |   0.5 |  0.68
Kspace  | 0.070572   | 0.074541   | 0.07851    |   1.5 | 10.89
Neigh   | 0.084778   | 0.086969   | 0.089161   |   0.7 | 12.70
Reduce  | 0.0036485  | 0.003737   | 0.0038254  |   0.1 |  0.55 :pre

The third section lists the number of owned atoms (Nlocal), ghost atoms (Nghost), and pair-wise neighbors stored per processor.  The max and min values give the spread of these values across processors with a 10-bin histogram showing the distribution.  The total number of histogram counts is equal to the number of processors.

The last section gives aggregate statistics for pair-wise neighbors and special neighbors that LAMMPS keeps track of (see the "special_bonds"_special_bonds.html command).  The number of times neighbor lists were rebuilt during the run is given as well as the number of potentially "dangerous" rebuilds.  If atom movement triggered neighbor list rebuilding (see the "neigh_modify"_neigh_modify.html command), then dangerous reneighborings are those that were triggered on the first timestep atom movement was checked for.  If this count is non-zero you may wish to reduce the delay factor to insure no force interactions are missed by atoms moving beyond the neighbor skin distance before a rebuild takes place.

If an energy minimization was performed via the "minimize"_minimize.html command, additional information is printed, e.g.

Minimization stats:
  Stopping criterion = linesearch alpha is zero
  Energy initial, next-to-last, final =
       -6372.3765206 -8328.46998942 -8328.46998942
  Force two-norm initial, final = 1059.36 5.36874
  Force max component initial, final = 58.6026 1.46872
  Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
  Iterations, force evaluations = 701 1516 :pre

The first line prints the criterion that determined the minimization to be completed.  The third line lists the initial and final energy, as well as the energy on the next-to-last iteration.  The next 2 lines give a measure of the gradient of the energy (force on all atoms).  The 2-norm is the "length" of this force vector; the inf-norm is the largest component.  Then comes some information about the line search, and statistics on how many iterations and force evaluations the minimizer required.  Multiple force evaluations are typically done at each iteration to perform a 1d line minimization in the search direction.

If a "kspace_style"_kspace_style.html long-range Coulombics solve was performed during the run (PPPM, Ewald), then additional information is printed, e.g.

FFT time (% of Kspce) = 0.200313 (8.34477)
FFT Gflps 3d 1d-only = 2.31074 9.19989 :pre

The first line gives the time spent doing 3d FFTs (4 per timestep) and the fraction it represents of the total KSpace time (listed above).  Each 3d FFT requires computation (3 sets of 1d FFTs) and communication (transposes).  The total flops performed is 5Nlog_2(N), where N is the number of points in the 3d grid.  The FFTs are timed with and without the communication and a Gflop rate is computed.  The 3d rate is with communication; the 1d rate is without (just the 1d FFTs).  Thus you can estimate what fraction of your FFT time was spent in communication, roughly 75% in the example above.
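To make the arithmetic behind that estimate explicit: the same 1d FFT computations are performed in both timings, so the communication fraction can be estimated from the ratio of the two Gflop rates reported above:

1 - (3d rate / 1d-only rate) = 1 - 2.31074/9.19989 = ~0.75 :pre

I.e. roughly 75% of the FFT wall time in this example went into the communication (transposes) rather than into the 1d FFT computation itself.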
:line

2.8 Tips for users of previous LAMMPS versions :h4,link(start_8)

The current C++ version began with a complete rewrite of LAMMPS 2001, which was written in F90.  Features of earlier versions of LAMMPS are listed in "Section 13"_Section_history.html.  The F90 and F77 versions (2001 and 99) are also freely distributed as open-source codes; check the "LAMMPS WWW Site"_lws for distribution information if you prefer those versions.  The 99 and 2001 versions are no longer under active development; they do not have all the features of C++ LAMMPS.

If you are a previous user of LAMMPS 2001, these are the most significant changes you will notice in C++ LAMMPS:

(1) The names and arguments of many input script commands have changed.  All commands are now a single word (e.g. read_data instead of read data).

(2) All the functionality of LAMMPS 2001 is included in C++ LAMMPS, but you may need to specify the relevant commands in different ways.

(3) The format of the data file can be streamlined for some problems.  See the "read_data"_read_data.html command for details.  The data file section "Nonbond Coeff" has been renamed to "Pair Coeff" in C++ LAMMPS.

(4) Binary restart files written by LAMMPS 2001 cannot be read by C++ LAMMPS with a "read_restart"_read_restart.html command.  This is because they were output by F90, which writes in a different binary format than C or C++ writes or reads.  Use the {restart2data} tool provided with LAMMPS 2001 to convert the 2001 restart file to a text data file.  Then edit the data file as necessary before using the C++ LAMMPS "read_data"_read_data.html command to read it in.

(5) There are numerous small numerical changes in C++ LAMMPS that mean you will not get identical answers when comparing to a 2001 run.  However, your initial thermodynamic energy and MD trajectory should be close if you have set up the problem the same for both codes.

diff --git a/doc/src/accelerate_kokkos.txt b/doc/src/accelerate_kokkos.txt
index 8d87751f9..2b07ed035 100644
--- a/doc/src/accelerate_kokkos.txt
+++ b/doc/src/accelerate_kokkos.txt
@@ -1,493 +1,493 @@

"Previous Section"_Section_packages.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c

:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)

:line

"Return to Section accelerate overview"_Section_accelerate.html

5.3.3 KOKKOS package :h5

The KOKKOS package was developed primarily by Christian Trott (Sandia) with contributions of various styles by others, including Sikandar Mashayak (UIUC), Stan Moore (Sandia), and Ray Shan (Sandia).  The underlying Kokkos library was written primarily by Carter Edwards, Christian Trott, and Dan Sunderland (all Sandia).

The KOKKOS package contains versions of pair, fix, and atom styles that use data structures and macros provided by the Kokkos library, which is included with LAMMPS in lib/kokkos.  The Kokkos library is part of "Trilinos"_http://trilinos.sandia.gov/packages/kokkos and can also be downloaded from "Github"_https://github.com/kokkos/kokkos.

Kokkos is a templated C++ library that provides two key abstractions for an application like LAMMPS.  First, it allows a single implementation of an application kernel (e.g. a pair style) to run efficiently on different kinds of hardware, such as a GPU, Intel Phi, or many-core CPU.
The Kokkos library also provides data abstractions to adjust (at compile time) the memory layout of basic data structures like 2d and 3d arrays and allow the transparent utilization of special hardware load and store operations.  Such data structures are used in LAMMPS to store atom coordinates or forces or neighbor lists.  The layout is chosen to optimize performance on different platforms.  Again this functionality is hidden from the developer, and does not affect how the kernel is coded.

These abstractions are set at build time, when LAMMPS is compiled with the KOKKOS package installed.  All Kokkos operations occur within the context of an individual MPI task running on a single node of the machine.  The total number of MPI tasks used by LAMMPS (one or multiple per compute node) is set in the usual manner via the mpirun or mpiexec commands, and is independent of Kokkos.

Kokkos currently provides support for 3 modes of execution (per MPI task).  These are OpenMP (for many-core CPUs), Cuda (for NVIDIA GPUs), and OpenMP (for Intel Phi).  Note that the KOKKOS package supports running on the Phi in native mode, not offload mode like the USER-INTEL package supports.  You choose the mode at build time to produce an executable compatible with specific hardware.

Here is a quick overview of how to use the KOKKOS package for CPU acceleration, assuming one or more 16-core nodes.  More details follow.

use a C++11 compatible compiler
make yes-kokkos
make mpi KOKKOS_DEVICES=OpenMP    # build with the KOKKOS package
make kokkos_omp                   # or Makefile.kokkos_omp already has variable set :pre

mpirun -np 16 lmp_mpi -k on -sf kk -in in.lj             # 1 node, 16 MPI tasks/node, no threads
mpirun -np 2 -ppn 1 lmp_mpi -k on t 16 -sf kk -in in.lj  # 2 nodes, 1 MPI task/node, 16 threads/task
mpirun -np 2 lmp_mpi -k on t 8 -sf kk -in in.lj          # 1 node, 2 MPI tasks/node, 8 threads/task
mpirun -np 32 -ppn 4 lmp_mpi -k on t 4 -sf kk -in in.lj  # 8 nodes, 4 MPI tasks/node, 4 threads/task :pre

specify variables and settings in your Makefile.machine that enable OpenMP, GPU, or Phi support
include the KOKKOS package and build LAMMPS
enable the KOKKOS package and its hardware options via the "-k on" command-line switch
use KOKKOS styles in your input script :ul

Here is a quick overview of how to use the KOKKOS package for GPUs, assuming one or more nodes, each with 16 cores and a GPU.  More details follow.
use a C++11 compatible compiler
KOKKOS_DEVICES = Cuda, OpenMP
KOKKOS_ARCH = Kepler35
make yes-kokkos
make machine :pre

mpirun -np 1 lmp_cuda -k on t 6 -sf kk -in in.lj          # one MPI task, 6 threads on CPU
mpirun -np 4 -ppn 1 lmp_cuda -k on t 6 -sf kk -in in.lj   # ditto on 4 nodes :pre

mpirun -np 2 lmp_cuda -k on t 8 g 2 -sf kk -in in.lj           # two MPI tasks, 8 threads per CPU
mpirun -np 32 -ppn 2 lmp_cuda -k on t 8 g 2 -sf kk -in in.lj   # ditto on 16 nodes :pre

Here is a quick overview of how to use the KOKKOS package for the Intel Phi:

use a C++11 compatible compiler
KOKKOS_DEVICES = OpenMP
KOKKOS_ARCH = KNC
make yes-kokkos
make machine :pre

host=MIC, Intel Phi with 61 cores (240 threads/phi via 4x hardware threading):

mpirun -np 1 lmp_g++ -k on t 240 -sf kk -in in.lj           # 1 MPI task on 1 Phi, 1*240 = 240
mpirun -np 30 lmp_g++ -k on t 8 -sf kk -in in.lj            # 30 MPI tasks on 1 Phi, 30*8 = 240
mpirun -np 12 lmp_g++ -k on t 20 -sf kk -in in.lj           # 12 MPI tasks on 1 Phi, 12*20 = 240
mpirun -np 96 -ppn 12 lmp_g++ -k on t 20 -sf kk -in in.lj   # ditto on 8 Phis :pre

[Required hardware/software:]

Kokkos support within LAMMPS must be built with a C++11 compatible compiler.  If using gcc, version 4.7.2 or later is required.

To build with Kokkos support for CPUs, your compiler must support the OpenMP interface.  You should have one or more multi-core CPUs so that multiple threads can be launched by each MPI task running on a CPU.

To build with Kokkos support for NVIDIA GPUs, NVIDIA CUDA software version 7.5 or later must be installed on your system.  See the discussion for the "GPU"_accelerate_gpu.html package for details of how to check and do this.

NOTE: For good performance of the KOKKOS package on GPUs, you must have Kepler generation GPUs (or later).  The Kokkos library exploits texture cache options not supported by Tesla generation GPUs (or older).

To build with Kokkos support for Intel Xeon Phi coprocessors, your system must be configured to use them in "native" mode, not "offload" mode like the USER-INTEL package supports.

[Building LAMMPS with the KOKKOS package:]

You must choose at build time whether to build for CPUs (OpenMP), GPUs, or Phi.

You can do any of these in one line, using the suitable make command line flags as described in "Section 4"_Section_packages.html of the manual.  If run from the src directory, these commands will create src/lmp_kokkos_omp, lmp_kokkos_cuda_mpi, and lmp_kokkos_phi.  Note that the OMP and PHI options use src/MAKE/Makefile.mpi as the starting Makefile.machine.  The CUDA option uses src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpi.

The latter two steps can be done using the "-k on", "-pk kokkos" and "-sf kk" "command-line switches"_Section_start.html#start_6 respectively.  Or the effect of the "-pk" or "-sf" switches can be duplicated by adding the "package kokkos"_package.html or "suffix kk"_suffix.html commands respectively to your input script.
Or you can follow these steps:

CPU-only (run all-MPI or with OpenMP threading):

cd lammps/src
make yes-kokkos
make kokkos_omp :pre

CPU-only (only MPI, no threading):

cd lammps/src
make yes-kokkos
make kokkos_mpi_only :pre

Intel Xeon Phi (Intel Compiler, Intel MPI):

cd lammps/src
make yes-kokkos
make kokkos_phi :pre

CPUs and GPUs (with MPICH or OpenMPI):

cd lammps/src
make yes-kokkos
make kokkos_cuda_mpi :pre

These examples set the KOKKOS-specific OMP, MIC, CUDA variables on the make command line, which requires a GNU-compatible make command.  Try "gmake" if your system's standard make complains.

NOTE: If you build using make line variables and re-build LAMMPS twice with different KOKKOS options and the *same* target, e.g. g++ in the first two examples above, then you *must* perform a "make clean-all" or "make clean-machine" before each build.  This is to force all the KOKKOS-dependent files to be re-compiled with the new options.

NOTE: Currently, there are no precision options with the KOKKOS package.  All compilation and computation is performed in double precision.

There are other allowed options when building with the KOKKOS package.  As above, they can be set either as variables on the make command line or in Makefile.machine.  This is the full list of options, including those discussed above.  Each takes a value shown below.  The default value is listed, which is set in the lib/kokkos/Makefile.kokkos file.

KOKKOS_DEVICES, values = {OpenMP}, {Serial}, {Pthreads}, {Cuda}, default = {OpenMP}
KOKKOS_ARCH, values = {KNC}, {SNB}, {HSW}, {Kepler}, {Kepler30}, {Kepler32}, {Kepler35}, {Kepler37}, {Maxwell}, {Maxwell50}, {Maxwell52}, {Maxwell53}, {ARMv8}, {BGQ}, {Power7}, {Power8}, default = {none}
KOKKOS_DEBUG, values = {yes}, {no}, default = {no}
KOKKOS_USE_TPLS, values = {hwloc}, {librt}, default = {none}
KOKKOS_CUDA_OPTIONS, values = {force_uvm}, {use_ldg}, {rdc} :ul

KOKKOS_DEVICES sets the parallelization method used for Kokkos code (within LAMMPS).  KOKKOS_DEVICES=OpenMP means that OpenMP will be used.  KOKKOS_DEVICES=Pthreads means that pthreads will be used.  KOKKOS_DEVICES=Cuda means an NVIDIA GPU running CUDA will be used.

If KOKKOS_DEVICES=Cuda, then the low-level Makefile in the src/MAKE directory must use "nvcc" as its compiler, via its CC setting.  For best performance its CCFLAGS setting should use -O3 and have a KOKKOS_ARCH setting that matches the compute capability of your NVIDIA hardware and software installation, e.g. KOKKOS_ARCH=Kepler30.  Note the minimal required compute capability is 2.0, but this will give significantly reduced performance compared to Kepler generation GPUs with compute capability 3.x.  For the LINK setting, "nvcc" should not be used; instead use g++ or another compiler suitable for linking C++ applications.  Often you will want to use your MPI compiler wrapper for this setting (i.e. mpicxx).  Finally, the low-level Makefile must also have a "Compilation rule" for creating *.o files from *.cu files.  See src/Makefile.cuda for an example of a low-level Makefile with all of these settings.

KOKKOS_USE_TPLS=hwloc binds threads to hardware cores, so they do not migrate during a simulation.  KOKKOS_USE_TPLS=hwloc should always be used if running with KOKKOS_DEVICES=Pthreads for pthreads.
It is not necessary with KOKKOS_DEVICES=OpenMP, because OpenMP provides alternative methods via environment variables for binding threads to hardware cores.  More info on binding threads to cores is given in "Section 5.3"_Section_accelerate.html#acc_3.

KOKKOS_ARCH=KNC enables compiler switches needed when compiling for an Intel Phi processor.

KOKKOS_USE_TPLS=librt enables use of a more accurate timer mechanism on most Unix platforms.  This library is not available on all platforms.

KOKKOS_DEBUG is only useful when developing a Kokkos-enabled style within LAMMPS.  KOKKOS_DEBUG=yes enables printing of run-time debugging information that can be useful.  It also enables runtime bounds checking on Kokkos data structures.

KOKKOS_CUDA_OPTIONS are additional options for CUDA.

For more information on Kokkos see the Kokkos programmers' guide here: /lib/kokkos/doc/Kokkos_PG.pdf.

[Run with the KOKKOS package from the command line:]

The mpirun or mpiexec command sets the total number of MPI tasks used by LAMMPS (one or multiple per compute node) and the number of MPI tasks used per node.  E.g. the mpirun command in MPICH does this via its -np and -ppn switches.  Ditto for OpenMPI via -np and -npernode.

When using KOKKOS built with host=OMP, you need to choose how many OpenMP threads per MPI task will be used (via the "-k" command-line switch discussed below).  Note that the product of MPI tasks * OpenMP threads/task should not exceed the physical number of cores (on a node), otherwise performance will suffer.

When using the KOKKOS package built with device=CUDA, you must use exactly one MPI task per physical GPU.

When using the KOKKOS package built with host=MIC for Intel Xeon Phi coprocessor support you need to insure there are one or more MPI tasks per coprocessor, and choose the number of coprocessor threads to use per MPI task (via the "-k" command-line switch discussed below).  The product of MPI tasks * coprocessor threads/task should not exceed the maximum number of threads the coprocessor is designed to run, otherwise performance will suffer.  This value is 240 for current generation Xeon Phi(TM) chips, which is 60 physical cores * 4 threads/core.  Note that with the KOKKOS package you do not need to specify how many Phi coprocessors there are per node; each coprocessor is simply treated as running some number of MPI tasks.

You must use the "-k on" "command-line switch"_Section_start.html#start_6 to enable the KOKKOS package.  It takes additional arguments for hardware settings appropriate to your system.  Those arguments are "documented here"_Section_start.html#start_6.  The two most commonly used options are:

-k on t Nt g Ng :pre

The "t Nt" option applies to host=OMP (even if device=CUDA) and host=MIC.  For host=OMP, it specifies how many OpenMP threads per MPI task to use within a node.  For host=MIC, it specifies how many Xeon Phi threads per MPI task to use within a node.  The default is Nt = 1.  Note that for host=OMP this is effectively MPI-only mode which may be fine.  But for host=MIC you will typically end up using far less than all the 240 available threads, which could give very poor performance.

The "g Ng" option applies to device=CUDA.  It specifies how many GPUs per compute node to use.  The default is 1, so this only needs to be specified if you have 2 or more GPUs per compute node.

The "-k on" switch also issues a "package kokkos" command (with no additional arguments) which sets various KOKKOS options to default values, as discussed on the "package"_package.html command doc page.
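For example, on a hypothetical node with 16 physical cores and 2 GPUs, these options could be combined as follows; the executable names match those produced by the build commands above, and in.lj is a placeholder input script:

mpirun -np 4 lmp_kokkos_omp -k on t 4 -sf kk -in in.lj            # 4 MPI tasks x 4 OpenMP threads = 16 cores
mpirun -np 2 lmp_kokkos_cuda_mpi -k on t 8 g 2 -sf kk -in in.lj   # 2 MPI tasks, one per GPU, 8 threads each :pre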
Use the "-sf kk" "command-line switch"_Section_start.html#start_6, which will automatically append "kk" to styles that support it. Use the "-pk kokkos" "command-line switch"_Section_start.html#start_6 if you wish to change any of the default "package kokkos"_package.html optionns set by the "-k on" "command-line switch"_Section_start.html#start_6. Note that the default for the "package kokkos"_package.html command is to use "full" neighbor lists and set the Newton flag to "off" for both pairwise and bonded interactions. This typically gives fastest performance. If the "newton"_newton.html command is used in the input script, it can override the Newton flag defaults. However, when running in MPI-only mode with 1 thread per MPI task, it will typically be faster to use "half" neighbor lists and set the Newton flag to "on", just as is the case for non-accelerated pair styles. You can do this with the "-pk" "command-line switch"_Section_start.html#start_6. [Or run with the KOKKOS package by editing an input script:] The discussion above for the mpirun/mpiexec command and setting appropriate thread and GPU values for host=OMP or host=MIC or device=CUDA are the same. You must still use the "-k on" "command-line switch"_Section_start.html#start_6 to enable the KOKKOS package, and specify its additional arguments for hardware options appropriate to your system, as documented above. Use the "suffix kk"_suffix.html command, or you can explicitly add a "kk" suffix to individual styles in your input script, e.g. pair_style lj/cut/kk 2.5 :pre You only need to use the "package kokkos"_package.html command if you wish to change any of its option defaults, as set by the "-k on" "command-line switch"_Section_start.html#start_6. [Speed-ups to expect:] The performance of KOKKOS running in different modes is a function of your hardware, which KOKKOS-enable styles are used, and the problem size. Generally speaking, the following rules of thumb apply: When running on CPUs only, with a single thread per MPI task, performance of a KOKKOS style is somewhere between the standard (un-accelerated) styles (MPI-only mode), and those provided by the USER-OMP package. However the difference between all 3 is small (less than 20%). :ulb,l When running on CPUs only, with multiple threads per MPI task, performance of a KOKKOS style is a bit slower than the USER-OMP package. :l When running large number of atoms per GPU, KOKKOS is typically faster than the GPU package. :l When running on Intel Xeon Phi, KOKKOS is not as fast as the USER-INTEL package, which is optimized for that hardware. :l :ule See the "Benchmark page"_http://lammps.sandia.gov/bench.html of the LAMMPS web site for performance of the KOKKOS package on different hardware. [Guidelines for best performance:] Here are guidline for using the KOKKOS package on the different hardware configurations listed above. Many of the guidelines use the "package kokkos"_package.html command See its doc page for details and default settings. Experimenting with its options can provide a speed-up for specific calculations. [Running on a multi-core CPU:] If N is the number of physical cores/node, then the number of MPI tasks/node * number of threads/task should not exceed N, and should typically equal N. Note that the default threads/task is 1, as set by the "t" keyword of the "-k" "command-line switch"_Section_start.html#start_6. If you do not change this, no additional parallelism (beyond MPI) will be invoked on the host CPU(s). 
You can compare the performance running in different modes:

run with 1 MPI task/node and N threads/task
run with N MPI tasks/node and 1 thread/task
run with settings in between these extremes :ul

Examples of mpirun commands in these modes are shown above.

When using KOKKOS to perform multi-threading, it is important for performance to bind both MPI tasks to physical cores, and threads to physical cores, so they do not migrate during a simulation.

If you are not certain MPI tasks are being bound (check the defaults for your MPI installation), binding can be forced with these flags:

OpenMPI 1.8:  mpirun -np 2 -bind-to socket -map-by socket ./lmp_openmpi ...
Mvapich2 2.0: mpiexec -np 2 -bind-to socket -map-by socket ./lmp_mvapich ... :pre

For binding threads with the KOKKOS OMP option, use thread affinity environment variables to force binding.  With OpenMP 3.1 (gcc 4.7 or later, intel 12 or later) setting the environment variable OMP_PROC_BIND=true should be sufficient.  For binding threads with the KOKKOS pthreads option, compile LAMMPS with the KOKKOS HWLOC=yes option (see "this section"_Section_packages.html#KOKKOS of the manual for details).

[Running on GPUs:]

Insure the -arch setting in the machine makefile you are using, e.g. src/MAKE/Makefile.cuda, is correct for your GPU hardware/software (see "this section"_Section_packages.html#KOKKOS of the manual for details).

The -np setting of the mpirun command should set the number of MPI tasks/node to be equal to the # of physical GPUs on the node.

Use the "-k" "command-line switch"_Section_start.html#start_6 to specify the number of GPUs per node, and the number of threads per MPI task.  As above for multi-core CPUs (and no GPU), if N is the number of physical cores/node, then the number of MPI tasks/node * number of threads/task should not exceed N.  With one GPU (and one MPI task) it may be faster to use less than all the available cores, by setting threads/task to a smaller value.  This is because using all the cores on a dual-socket node will incur extra cost to copy memory from the 2nd socket to the GPU.

Examples of mpirun commands that follow these rules are shown above.

NOTE: When using a GPU, you will achieve the best performance if your input script does not use any fix or compute styles which are not yet Kokkos-enabled.  This allows data to stay on the GPU for multiple timesteps, without being copied back to the host CPU.  Invoking a non-Kokkos fix or compute, or performing I/O for "thermo"_thermo_style.html or "dump"_dump.html output will cause data to be copied back to the CPU.

You cannot yet assign multiple MPI tasks to the same GPU with the KOKKOS package.  We plan to support this in the future, similar to the GPU package in LAMMPS.

You cannot yet use both the host (multi-threaded) and device (GPU) together to compute pairwise interactions with the KOKKOS package.  We hope to support this in the future, similar to the GPU package in LAMMPS.

[Running on an Intel Phi:]

Kokkos only uses Intel Phi processors in their "native" mode, i.e. not hosted by a CPU.

As illustrated above, build LAMMPS with OMP=yes (the default) and MIC=yes.  The latter insures code is correctly compiled for the Intel Phi.  The OMP setting means OpenMP will be used for parallelization on the Phi, which is currently the best option within Kokkos.  In the future, other options may be added.

Current-generation Intel Phi chips have either 61 or 57 cores.  One core should be excluded for running the OS, leaving 60 or 56 cores.
Each core is hyperthreaded, so there are effectively N = 240 (4*60) or N = 224 (4*56) cores to run on.

The -np setting of the mpirun command sets the number of MPI tasks/node.  The "-k on t Nt" command-line switch sets the number of threads/task as Nt.  The product of these 2 values should be N, i.e. 240 or 224.  Also, the number of threads/task should be a multiple of 4 so that logical threads from more than one MPI task do not run on the same physical core.

Examples of mpirun commands that follow these rules are shown above.

[Restrictions:]

As noted above, if using GPUs, the number of MPI tasks per compute node should equal the number of GPUs per compute node.  In the future Kokkos will support assigning multiple MPI tasks to a single GPU.

Currently Kokkos does not support AMD GPUs due to limits in the available backend programming models.  Specifically, Kokkos requires extensive C++ support from the kernel language.  This is expected to change in the future.