New Modeling Capabilities
- TD-DFT analytic second derivatives for predicting vibrational frequencies/IR and Raman spectra and performing transition state optimizations and IRC calculations for excited states.
- EOMCC analytic gradients for performing geometry optimizations.
- Anharmonic vibrational analysis for VCD and ROA spectra: see Freq=Anharmonic.
- Vibronic spectra and intensities: see Freq=FCHT and related options.
- Resonance Raman spectra: see Freq=ReadFCHT.
- New DFT functionals: M08HX, MN15, MN15L (a sample route using MN15 appears after this list).
- New double-hybrid methods: DSDPBEP86, PBE0DH and PBEQIDH.
- PM7 semi-empirical method.
- Ciofini excited state charge transfer diagnostic: see Pop=DCT.
- The EOMCC solvation interaction models of Caricato: see SCRF=PTED.
- Generalized internal coordinates, a facility which allows arbitrary redundant internal coordinates to be defined and used for optimization constraints and other purposes. See Geom=GIC and GIC Info.
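As a minimal illustration of the new MN15 functional noted above (the basis set and job type are arbitrary choices, not recommendations), a route section might read:
# MN15/6-311+G(2d,p) Opt Freq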
Performance Enhancements
- NVIDIA K40 and K80 GPUs are supported under Linux for Hartree-Fock and DFT calculations. See the Using GPUs tab for details.
- Parallel performance on larger numbers of processors has been improved. See the Parallel Performance tab for information about how to get optimal performance on multiple CPUs and clusters.
- Gaussian 16 uses an optimized memory algorithm to avoid I/O during CCSD iterations.
- There are several enhancements to the GEDIIS optimization algorithm.
- CASSCF improvements for active spaces ≥ (10,10) increase performance and make active spaces of up to 16 orbitals feasible (depending on the molecular system).
- Significant speedup of the core correlation energies for the W1 compound model.
- Gaussian 16 incorporates algorithmic improvements for significant speedup of the diagonal, second-order self-energy approximation (D2) component of composite electron propagator (CEP) methods as described in [DiazTinoco16]. See EPT.
Usage Enhancements
- Tools for interfacing Gaussian with other programs, written either in compiled languages such as Fortran and C or in interpreted languages such as Python and Perl. Refer to Interfacing to Gaussian 16 for details.
- Parameters specified in Link 0 (%) input lines and/or in a Default.Route file can now also be specified via command-line arguments or environment variables. See the Link 0 Equivalences tab for details.
- Computing the force constants at every nth step of a geometry optimization: see Opt=Recalc.
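For example (a hedged sketch; the method, basis set, and interval of 5 are arbitrary illustrations), recomputing the force constants every 5 steps of an optimization could be requested with a route such as:
# B3LYP/6-311+G(2d,p) Opt=(Recalc=5)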
Calculation Defaults
The following calculation defaults are different in Gaussian 16:
- Integral accuracy is 10⁻¹² rather than 10⁻¹⁰ in Gaussian 09.
- The default DFT grid for general use is UltraFine rather than FineGrid in G09; the default grid for CPHF is SG1 rather than CoarseGrid. See the discussion of the Integral keyword for details.
- SCRF defaults to the symmetric form of IEFPCM [Lipparini10] (not present in Gaussian 09) rather than the non-symmetric version.
- Physical constants use the 2010 values rather than the 2006 values in Gaussian 09.
The first two items were changed to ensure accuracy in several new calculation types (e.g., TD-DFT frequencies, anharmonic ROA); for these reasons, Integral=(UltraFine,Acc2E=12) was made the default. Using these settings generally improves the reliability of calculations involving numerical integration, e.g., DFT optimizations in solution. There is a modest increase in CPU requirements for these options compared to the Gaussian 09 defaults of Integral=(FineGrid,Acc2E=10).
The G09Defaults keyword sets all four of these defaults back to the Gaussian 09 values. It is provided for compatibility with previous calculations, but the new defaults are strongly recommended for new studies.
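For example, the first route below (method and basis arbitrary) reverts a job to the Gaussian 09 behavior, while the second spells out the new integration defaults explicitly:
# B3LYP/6-31G(d) Opt G09Defaults
# B3LYP/6-31G(d) Opt Integral=(UltraFine,Acc2E=12)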
Default Memory Use
Gaussian 16 defaults to %Mem=100MW (800 MB) of memory. Larger values are appropriate for calculations on larger molecules and when using many processors; refer to the Parallel Jobs tab for details.
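For example, a Link 0 line such as the following (the value is an arbitrary illustration) requests 16 GB for a job:
%Mem=16GB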
TD-DFT Frequencies
TD-DFT frequency calculations compute the second derivatives analytically by default, since this is much faster than numerical differentiation (the only choice available in Gaussian 09).
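A minimal excited-state frequency input might therefore use a route such as the following (the functional, basis set, and number of states are arbitrary placeholders), computing analytic second derivatives for the first excited state:
# B3LYP/6-311+G(2d,p) TD=(NStates=6,Root=1) Freq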
Parallel Usage and Performance Notes
Shared-memory parallelism
Memory allocation. Calculations involving larger molecules and basis sets benefit from larger memory allocations; 4 GB or more per processor is recommended for calculations involving 50 or more atoms and/or 500 or more basis functions. The freqmem utility estimates the optimal memory size per thread for ground-state frequency calculations; the same value is reasonable for excited-state frequencies and is more than sufficient for ground- and excited-state optimizations.
The amount of memory allocated should scale with the number of processors: if 4 GB is reasonable for one processor, the same job using 8 CPUs will run well with 32 GB. The particular hardware may of course limit how much memory is available, but scaling memory linearly with the number of CPUs should be the goal; increasing only the number of CPUs while keeping the memory fixed is unlikely to give good performance on large numbers of processors.
For large frequency calculations and for large CCSD and EOM-CCSD energies, it is also desirable to leave enough memory to buffer the large disk files involved. Therefore, a Gaussian job should only be given 50-70% of the total memory on the system. For example, on a machine with a total of 128 GB, one should typically give 64-80 GB to a job using all the CPUs, leaving the remaining memory for the operating system to use as disk cache.
Pinning threads to CPUs. Efficiency is lost when threads are moved from one CPU to another, invalidating the cache and incurring other overhead. On most machines, Gaussian can tie threads to specific CPUs; this is the recommended mode of operation, especially when using larger numbers of processors. The %CPU Link 0 line specifies the specific CPUs to be used. Thus, on a machine with one 8-core chip, one should use %CPU=0-7 rather than %NProcShared=8, because the former ties the first thread to CPU 0, the next to CPU 1, and so on.
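Combining the memory and pinning recommendations, a hypothetical job on a 16-core node with 128 GB of memory (the core count and sizes here are illustrative assumptions, not recommendations) might begin with:
%Mem=72GB
%CPU=0-15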
On some older Intel processors (Nehalem and before), there is not enough memory bandwidth to keep all the CPUs on a chip busy, and it is often preferable to use half the CPUs, each with twice as much memory as if all were used. For example, on such a machine with four 12-core chips and 128 GB of memory, with CPUs 0-11 on the first chip, 12-23 on the second, and so on, it is better to run using 24 processors (6 on each chip) and give them 72 GB/24 procs = 3 GB memory each, rather than use all 48 with only 1.5 GB of memory each. The required input directives would be:
%Mem=72GB
%CPU=0-47/2
where the /2 means to use every other core: i.e., cores 0, 2, 4, 6, 8, and 10 (on chip 0), 12, 14, 16, 18, 20, and 22 (on chip 1), etc.
With the most recent generations of Intel processors (Haswell and later), the memory bandwidth is better and using all the cores on each chip works well.
As long as sufficient memory is available and threads are tied to specific cores, then parallel efficiency on large molecules is good up to 64 or more cores.
Disable hyperthreading. Hyperthreading is not useful for Gaussian, since it effectively divides resources such as memory bandwidth among threads on the same physical CPU. If hyperthreading cannot be turned off, Gaussian jobs should use only one hyperthread on each physical CPU. Under Linux, the first hyperthreads of all cores are numbered before the additional hyperthreads. That is, if a machine has 2 chips each with 8 cores and 3-way hyperthreading, then “CPUs” 0-7 are the 8 cores on chip 0, 8-15 are the 8 cores on chip 1, 16-23 are the second hyperthreads of the cores on chip 0, and so on. So a job would run best with %CPU=0-15.
Under AIX, hyperthreads are grouped together, with up to 8 hyperthread numbers reserved for each core even if fewer hyperthreads are in use. So with two 8-core chips and 4-way hyperthreading, “CPUs” 0-3 are all on core 0 of chip 0, 8-11 are on core 1 of chip 0, etc. Thus, one would use %CPU=0-127/8 to select “CPUs” 0, 8, 16, …, each of which uses a distinct core.
Cluster (Linda) parallelism
Availability. Hartree-Fock and DFT energies, gradients and frequencies run in parallel across clusters, as do MP2 energies and gradients. MP2 frequencies, CCSD, and EOM-CCSD energies and optimizations are SMP parallel but not cluster parallel. Numerical derivatives, such as DFT anharmonic frequencies and CCSD frequencies, are parallelized across nodes of a cluster by doing a complete gradient or second derivative calculation on each node, splitting the directions of differentiation across workers in the cluster.
Combining with shared-memory parallelism. Shared-memory and cluster parallelism can be combined. Generally, one uses shared-memory parallelism across all CPUs in each node of the cluster. Note that %CPU and %Mem apply to each node of the cluster. Thus, if one has 3 nodes named apple, banana and cherry, each with two chips of 8 CPUs each, then one might specify:
%Mem=64GB
%CPU=0-15
%LindaWorkers=apple,banana,cherry
# B3LYP/6-311+G(2d,p) Freq …
This would run 16 threads, each pinned to a CPU, on each of the 3 nodes, giving 4 GB to each of the 48 threads.
For the special case of numerical differentiation only (e.g., Freq=Anharm, CCSD Freq, etc.), one extra worker is used to collect the results, so these jobs should be run with two workers on the master node (where Gaussian 16 is started). For the above example, if the job were computing anharmonic frequencies, one would use:
%Mem=64GB
%CPU=0-15
%LindaWorkers=apple:2,banana,cherry
# B3LYP/6-311+G(2d,p) Freq=Anharm …
where Gaussian 16 is assumed to be started on node apple. This will start 2 workers on node apple, one of which just collects results, and will do the computational work using the other worker on apple and those on banana and cherry.
Memory requirements for CCSD, CCSD(T) and EOM-CCSD calculations
These calculations can use memory to avoid I/O and will run much more efficiently if they are allowed enough memory to store the amplitudes and product vectors in memory. If there are N_O active occupied orbitals (NOA in the output) and N_V virtual orbitals (NVB in the output), then approximately 9·N_O²·N_V² words of memory are required. This does not depend on the number of processors used.
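As a rough worked example (the orbital counts here are arbitrary illustrations, not taken from any particular system): with N_O = 50 and N_V = 500, the estimate is 9 × 50² × 500² ≈ 5.6 × 10⁹ words, or about 45 GB at 8 bytes per word, so such a job should be given on the order of %Mem=48GB or more to keep these quantities in memory.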
Most options that control how Gaussian 16 operates can be specified in any of 4 ways. From highest to lowest precedence these are:
- As Link 0 input (%-lines): This is the usual method to control a specific job and the only way to control a specific step within a multi-step input file. Example: %CPU=1,2,3,4
- As options on the command line: This is useful when one wants aliases or other shortcuts for different common ways of running the program. Example: g16 -c="1,2,3,4" …
- As environment variables: This is most useful in standard scripts, for example for generating and submitting jobs to batch queuing systems. Example: export GAUSS_CDEF="1,2,3,4"
- As directives in the Default.Route file: This is most useful when one wants to change the program defaults for all jobs. Example: -C- 1,2,3,4
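As an illustration of this precedence (the values are arbitrary), suppose the Default.Route file contains
-M- 4GB
-C- 0-7
while a particular job's input begins with %Mem=8GB. That job runs with 8 GB, because Link 0 input overrides Default.Route, while the -C- directive still applies since the job does not override it.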
When searching for a Default.Route file, the current default directory is checked first, followed by the directories in the path for the Gaussian 16 executables (the GAUSS_EXEDIR environment variable, which normally points to $g16root/g16).
The following table lists the most important Link 0 commands and their equivalences:
| Default.Route | Link 0 | Option | Env. Var. | Description |
|---|---|---|---|---|
| Gaussian 16 execution defaults | | | | |
| -R- | | -r | GAUSS_RDEF | Route section keyword list. |
| -M- | %Mem | -m | GAUSS_MDEF | Memory amount for Gaussian jobs. |
| -C- | %CPU | -c | GAUSS_CDEF | Processor/core list for multiprocessor parallel jobs. |
| -G- | %GPUCPU | -g | GAUSS_GDEF | GPUs=Cores list for GPU parallel jobs. |
| -S- | %UseSSH | -s | GAUSS_SDEF | Program to start workers for network parallel jobs: rsh or ssh. |
| -W- | %LindaWorkers | -w | GAUSS_WDEF | List of hostnames for network parallel jobs. |
| -P- | %NProcShared | -p | GAUSS_PDEF | #processors/cores for multiprocessor parallel jobs. Deprecated; use -C-. |
| -L- | %NProcLinda | -l | GAUSS_LDEF | #nodes for network parallel jobs. Deprecated; use -W-. |
| Archive entry data | | | | |
| -H- | | | GAUSS_HDEF | Computer hostname. |
| -O- | | | GAUSS_ODEF | Organization (site) name. |
| Utility program defaults | | | | |
| -F- | | | GAUSS_FDEF | Options for the formchk utility. |
| -U- | | | GAUSS_UDEF | Memory amount for utilities. |
| Parameters for scripts and external programs | | | | |
| | # section | -x | GAUSS_XDEF | Complete route for the job (route not read from input file). |
| | %Chk | -y | GAUSS_YDEF | Checkpoint file. |
| | %RWF | -z | GAUSS_ZDEF | Read-write file. |
Note that quotation marks are normally required around the specified value for command-line options and environment variables, to avoid modification of the parameter string by the shell.
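For example (the file names here are hypothetical), a job could be started with its memory and processor list given on the command line:
g16 -m="16GB" -c="0-7" < myjob.gjf > myjob.log
which is equivalent to exporting GAUSS_MDEF="16GB" and GAUSS_CDEF="0-7" before invoking g16.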
Last updated: 3 March 2017