Features and changes introduced in Revs. B.01 and C.01 are indicated by [REV B] and [REV C], respectively.
New Modeling Capabilities
- [REV C] NBO version 7 is supported. There are new options to the Population keyword: Pop=NPA7, Pop=NBO7, Pop=NBO7Read and Pop=NBO7Delete request Natural Population Analysis, full Natural Bond Orbital Analysis, full NBO with NBO input read from the input stream and NBO analysis of the effects of deletion of some interactions (respectively), using NBO7 via the external interface. In addition, NEDA=n is used to perform Natural Energy Decomposition Analysis. The analysis uses the same input information about fragments as counterpoise calculations. Deletions and optimizations with deletions now work with either NBO6 or NBO7.
- [REV C] The RESP (restrained electrostatic potential) constraint can be included in computing potential-derived charges. For example, Pop=(MK,Resp=N) applies a weight of N × 10⁻⁶ Hartrees to the squared charges. Other electrostatic potential-derived charge schemes also accept this option (e.g., CHelp, HLY). N defaults to 2.
- [REV C] Pop=SaveHirshfeld and Pop=SaveCM5 cause the specified charges to be saved as the MM charges to be used in a subsequent calculation.
- [REV B] Static Raman intensities can be computed for excited states at the CIS and TD levels of theory. TD Freq=Raman computes the polarizability by numerical differentiation with respect to an electric field, so the cost of Freq=Raman for these methods is 7x that of the frequencies without Raman intensities.
- TD-DFT analytic second derivatives for predicting vibrational frequencies/IR and Raman spectra and performing transition state optimizations and IRC calculations for excited states.
- EOMCC analytic gradients for performing geometry optimizations.
- Anharmonic vibrational analysis for VCD and ROA spectra: see Freq=Anharmonic.
- Vibronic spectra and intensities: see Freq=FCHT and related options.
- Resonance Raman spectra: see Freq=ReadFCHT.
- New DFT functionals: M08HX, MN15, MN15L, PW6B95, PW6B95D3.
- New double-hybrid methods: DSDPBEP86, PBE0DH and PBEQIDH.
- PM7 semi-empirical method.
- Ciofini excited state charge transfer diagnostic: see Pop=DCT.
- The EOMCC solvation interaction models of Caricato: see SCRF=PTED.
- Generalized internal coordinates, a facility which allows arbitrary redundant internal coordinates to be defined and used for optimization constraints and other purposes. See Geom=GIC and GIC Info.
Performance Enhancements
- NVIDIA K40, K80, P100 (Pascal), V100 (Volta) and A100 (Ampere) GPUs are supported under Linux for Hartree-Fock and DFT calculations. A100 support is new with Revision C.02, V100 support was new with Revision C.01, and P100 support was new with [REV B]. Revisions B.01 and C.01 also provided performance improvements for all supported GPU types. See Using GPUs for details on GPU support and usage.
- Parallel performance on larger numbers of processors has been improved. See the Parallel Performance tab for information about how to get optimal performance on multiple CPUs and clusters.
- [REV B] Dynamic allocation of tasks among Linda workers is now the default, improving parallel efficiency.
- Gaussian 16 uses an optimized memory algorithm to avoid I/O during CCSD iterations.
- There are several enhancements to the GEDIIS optimization algorithm.
- CASSCF improvements for active spaces ≥ (10,10) increase performance and make active spaces of up to 16 orbitals feasible (depending on the molecular system).
- Significant speedup of the core correlation energies for the W1 compound model.
- Gaussian 16 incorporates algorithmic improvements for significant speedup of the diagonal, second-order self-energy approximation (D2) component of composite electron propagator (CEP) methods as described in [DiazTinoco16]. See EPT.
Usage Enhancements
- [REV C] The ROA invariants for each vibrational mode are now only printed by G16 or by freqchk if normal mode derivatives were requested, rather than by default.
- [REV C] Utilities can now take the -m command-line argument to specify the amount of memory available to the utility. For example:
formchk -m=1gb myfile
The -m option must precede any file name or other arguments.
- [REV C] The %SSH Link 0 command and its equivalents can be used to name a command to run to start Linda workers, rather than either rsh or ssh.
- [REV C] Some defaults when Geom=AllCheck is specified can now be overridden:
- Field=NoChk can be used to suppress reading external field coefficients from the checkpoint file.
- Geom=GenConnectivity forces the connectivity to be recomputed rather than using the information in the checkpoint file.
- Geom=UseStandardOrientation uses the coordinates in the standard orientation from the checkpoint file as the input orientation for the new job.
- [REV C] Some defaults during geometry optimizations to a minimum can now be overridden:
- Opt=NGoUp=N allows the energy to increase N times before doing only linear searches. The default is 1 (only linear searches are performed after the second time in a row that the energy increases); N=-1 forces only linear searches whenever the energy rises.
- When near a saddle point, Opt=NGoDown=N causes the program to mix at most N eigenvectors of the Hessian with negative eigenvalues to form a step away from the saddle point. The default is 3; N=-1 turns this feature off, and the algorithm takes only the regular RFO step.
- Opt=MaxEStep=N says to take a step of length N/1000 (Bohr or radians) when moving away from a saddle point. The default is N=600 (0.6) for regular optimizations and N=100 (0.1) for ONIOM Opt=Quadmac calculations.
- [REV C] Information on multidimensional relaxed scans is now stored on the formatted checkpoint file with details about the axes, rather than flattened, so these can be displayed in GaussView and other programs.
- [REV C] The program now stores and checks a version number in checkpoint files. This avoids obscure failure modes when an obsolete checkpoint is named. The c8616 utility can be used to update checkpoint files, and there is a -fixver option to unfchk to mark a checkpoint file it creates as current even if there was no version in the input formatted checkpoint file.
- [REV B] The ChkChk utility now reports the job status (whether the job completed normally, failed, is in progress, etc.).
- [REV B] The optional parameters in the input line for an atom can now specify the radius to use when finite (non-point) nuclei are used. The radius is specified as a floating point value in atomic units using the RadNuclear=val item. For example:
C(RadNucl=0.001) 0.0 0.0 3.0
- The GauOpen tools for interfacing Gaussian with other programs, both in compiled languages such as Fortran and C and with interpreted languages such as Python and Perl. Refer to GauOpen: Interfacing to Gaussian 16 for details.
- [REV C] Raw binary files using either 4- or 8-byte integers are now supported. The former is the default except on NEC systems. Support for this feature includes new options to the Output keyword and the formchk utility, new Link 0 commands and new command line options and environment variables.
- [REV C] Information about ONIOM layers and optimization and trajectory results has been added to the matrix element file. There are also new options to the Output keyword for including AO two-electron integrals, derivatives of the overlap, core Hamiltonian and other matrices and/or the AO 2-electron integral derivatives.
- [REV B] Many additional quantities were added to the matrix element file, including atomic populations, one-electron and property operator matrices and the non-adiabatic coupling vector. The new items are the labeled sections QUADRUPOLE INTEGRALS, OCTOPOLE INTEGRALS, HEXADECAPOLE INTEGRALS, [MULLIKEN,ESP,AIM,NPA,MBS] CHARGES, DIP VEL INTEGRALS, R X DEL INTEGRALS, OVERLAP DERIVATIVES, CORE HAMILTONIAN DERIVATIVES, F(X), DENSITY DERIVATIVES, FOCK DERIVATIVES, ALPHA UX, BETA UX, ALPHA MO DERIVATIVES, BETA MO DERIVATIVES, [Alpha,Beta] [SCF,MP2,MP3,MP4,CI Rho(1),CI,CC] DENSITY and TRANS MO COEFFICIENTS and the scalars 63-64.
- [REV C] Enhancements to facilitate scripting:
- The AllAtoms and ActiveAtoms options to the External keyword are used to provide information on all atoms or only those in the model system (high layer) when using an external program/script with ONIOM.
- The file $g16root/g16/bsd/inp2mat is a script which takes a Gaussian input file and generates a matrix element file with the information implied by the input file (coordinates, basis set, etc.) without running the full calculation. This is used by the Python interface in GauOpen to import this information into a matrix element file object, but can also be used in other scripts to avoid any need to parse Gaussian input files.
- The testrt utility now prints the integer size used by G16 so that scripts can check what size of integers will be used by default in matrix element files.
- Parameters specified in Link 0 (%) input lines and/or in a Default.Route file can now also be specified via either command-line arguments or environment variables. [REV B] introduces command-line options to specify input and/or data using a checkpoint or matrix element file (the equivalent of the %OldChk or %OldMatrix Link 0 commands for input). See the Equivalencies tab for details.
- You can now compute the force constants at every nth step of a geometry optimization: see Opt=Recalc.
- [REV B] DFTB parameters are now read in Link 301 before the basis set is constructed, so that the presence or absence of d functions for an element can be taken from the parameter file.
Changes between G16 Revision C.01 and G16 Revision C.02
Revision C.02 is an update to support NVIDIA A100 (Ampere) GPUs and the NVIDIA SDK compiler version 21.3. The build procedure from source code has changed for all x86_64 platforms to use the new compiler. Apart from A100 GPU support, the resulting binaries offer the same functionality as Revision C.01.
Changes from Gaussian 16 Rev. A.03
- There have been minor modifications to the procedure for building from source code, which is documented here.
Changes from Gaussian 09
Calculation Defaults
The following calculation defaults are different in Gaussian 16:
- Integral accuracy is 10⁻¹² rather than 10⁻¹⁰ in Gaussian 09.
- The default DFT grid for general use is UltraFine rather than FineGrid in G09; the default grid for CPHF is SG1 rather than CoarseGrid. See the discussion of the Integral keyword for details.
- SCRF defaults to the symmetric form of IEFPCM [Lipparini10] (not present in Gaussian 09) rather than the non-symmetric version.
- Physical constants use the 2010 values rather than the 2006 values in Gaussian 09.
The first two items were changed to ensure accuracy in several new calculation types (e.g., TD-DFT frequencies, anharmonic ROA), which is why Integral=(UltraFine,Acc2E=12) was made the default. These settings generally improve the reliability of calculations involving numerical integration, e.g., DFT optimizations in solution. There is a modest increase in the CPU requirements for these options compared to the Gaussian 09 defaults of Integral=(FineGrid,Acc2E=10).
The G09Defaults keyword sets all four of these defaults back to the Gaussian 09 values. It is provided for compatibility with previous calculations, but the new defaults are strongly recommended for new studies.
Default Memory Use
Gaussian 16 defaults memory usage to %Mem=100MW (800MB). Even larger values are appropriate for calculations on larger molecules and when using many processors; refer to the Parallel Jobs tab for details.
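The relation between the two units in the default above can be checked with a small conversion, sketched here in Python. It assumes an 8-byte word, which is what the 100MW = 800MB equivalence implies; the function name is hypothetical and not part of Gaussian.

```python
BYTES_PER_WORD = 8  # assumption: implied by 100MW == 800MB in the text above


def megawords_to_megabytes(megawords):
    """Convert a %Mem value given in MW to MB, assuming 8-byte words."""
    return megawords * BYTES_PER_WORD


print(megawords_to_megabytes(100))  # 800
```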
TD-DFT Frequencies
TD-DFT frequency calculations compute second derivatives analytically by default, since these are much faster than the numerical derivatives (the only choice in Gaussian 09).
Parallel Usage and Performance Notes
Shared-memory parallelism
Memory allocation. Calculations involving larger molecules and basis sets benefit from larger memory allocations. 4 GB or more per processor is recommended for calculations involving 50 or more atoms and/or 500 or more basis functions. The freqmem utility estimates the optimal memory size per thread for ground-state frequency calculations, and the same value is reasonable for excited-state frequencies and is more than sufficient for ground and excited state optimizations.
The amount of memory allowed should rise with the number of processors: if 4 GB is reasonable for one processor, then the same job using 8 CPUs would run well in 32 GB. Of course, there may be limitations to smaller values imposed by the particular hardware, but scaling memory linearly with number of CPUs should be the goal. In particular, increasing only the number of CPUs with fixed memory size is unlikely to lead to good performance when using large numbers of processors.
For large frequency calculations and for large CCSD and EOM-CCSD energies, it is also desirable to leave enough memory to buffer the large disk files involved. Therefore, a Gaussian job should only be given 50-70% of the total memory on the system. For example, on a machine with a total of 128 GB, one should typically give 64-80 GB to a job which was using all the CPUs, and leave the remaining memory for the operating system to use as disk cache.
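The two guidelines above (scale memory linearly with the number of CPUs, but leave part of the machine's memory for disk cache) can be combined in a short calculation. The following Python sketch is only an illustration of that arithmetic: the function name, the 4 GB per CPU default and the choice to cap at the upper 70% bound are assumptions made for this example.

```python
def memory_budget(total_gb, n_cpus, per_cpu_gb=4.0, low_frac=0.5, high_frac=0.7):
    """Return (requested_gb, low_cap_gb, high_cap_gb, mem_gb).

    requested_gb scales linearly with the number of CPUs; the caps are the
    50% and 70% fractions of total system memory; mem_gb is the request
    limited to the upper cap.
    """
    requested = per_cpu_gb * n_cpus
    low_cap = low_frac * total_gb
    high_cap = high_frac * total_gb
    mem = min(requested, high_cap)
    return requested, low_cap, high_cap, mem


# 8 CPUs at 4 GB each on a 128 GB machine: the 32 GB request fits easily.
print(memory_budget(128, 8))

# 48 CPUs at 4 GB each would ask for 192 GB, so the request is capped.
print(memory_budget(128, 48))
```

For the 128 GB machine the 50-70% guideline spans roughly 64 to 90 GB, and the typical 64-80 GB range quoted above lies inside it.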
Pinning threads to CPUs under Linux. Efficiency is lost when threads are moved from one CPU to another, thereby invalidating the cache and causing other overhead. On most machines, Gaussian can tie threads to specific CPUs, and this is the recommended mode of operation, especially when using larger numbers of processors. The %CPU Link 0 line specifies the numbers of specific CPUs to be used. Thus, on a machine with one 8-core chip, one should use %CPU=0-7 rather than %NProc=8 because the former ties the first thread to CPU 0, the next to CPU 1, etc.
On some older Intel processors (Nehalem and before), there is not enough memory bandwidth to keep all the CPUs on a chip busy, and it is often preferable to use half the CPUs, each with twice as much memory as if all were used. For example, on such a machine with four 12-core chips and 128 GB of memory, with CPUs 0-11 on the first chip, 12-23 on the second, and so on, it is better to run using 24 processors (6 on each chip) and give them 72 GB/24 procs = 3 GB memory each, rather than use all 48 with only 1.5 GB of memory each. The required input directives would be:
%Mem=72GB
%CPU=0-47/2
where the /2 means to use every other core: i.e., cores 0, 2, 4, 6, 8, and 10 (on chip 0), 12, 14, 16, 18, 20, and 22 (on chip 1), etc.
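To see which cores a range with a stride selects, the specification can be expanded by hand or with a few lines of Python. This is a minimal sketch that handles only the first-last and first-last/stride forms described here; it is not Gaussian's own parser, and the function name and the 12-cores-per-chip constant are assumptions taken from the example machine.

```python
def expand_cpu_spec(spec):
    """Expand 'first-last' or 'first-last/stride' into a list of CPU numbers."""
    range_part, _, stride_part = spec.partition("/")
    first, last = (int(x) for x in range_part.split("-"))
    stride = int(stride_part) if stride_part else 1
    return list(range(first, last + 1, stride))


cores = expand_cpu_spec("0-47/2")
print(len(cores))   # 24
print(cores[:6])    # [0, 2, 4, 6, 8, 10]

# Count the selected cores per chip, assuming 12 consecutive CPUs per chip.
CORES_PER_CHIP = 12
per_chip = {}
for core in cores:
    chip = core // CORES_PER_CHIP
    per_chip[chip] = per_chip.get(chip, 0) + 1
print(per_chip)     # {0: 6, 1: 6, 2: 6, 3: 6}
```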
With the most recent generations of Intel processors (Haswell and later), the memory bandwidth is better and using all the cores on each chip works well.
As long as sufficient memory is available and threads are tied to specific cores, then parallel efficiency on large molecules is good up to 64 or more cores.
Disable hyperthreading. Hyperthreading is not useful for Gaussian since it effectively divides resources such as memory bandwidth among threads on the same physical CPU. If hyperthreading cannot be turned off, Gaussian jobs should use only one hyperthread on each physical CPU. Under Linux, hyperthreads on different processors are grouped together. That is, if a machine has 2 chips each with 8 cores and 3-way hyperthreading, then “CPUs” 0-7 are across the 8 cores on chip 0, 8-15 are across the 8 cores on chip 1, and 16-23 are the second hyperthreads on the 8 cores of chip 0, and so on. So a job would run best with %CPU=0-15.
Under AIX, hyperthreads are grouped together with up to 8 hyperthread numbers for each CPU even if fewer hyperthreads are in use, so with two 8-core chips and 4-way hyperthreading, “CPUs” 0-3 are all on core 0 of chip 0, 8-11 are on core 1 of chip 0, etc. Thus, one would want to use %CPU=0-127/8 to select “CPUs” 0, 8, 16, … which are each using a distinct core.
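The two numbering schemes lead to different %CPU lines for the same goal of one hyperthread per physical core. The Python sketch below builds the line for each scheme exactly as described in the two paragraphs above. The function name and the layout labels are hypothetical, and the CPU numbering on a real machine should be verified before relying on either pattern.

```python
def one_thread_per_core(n_cores, layout, slots_per_core=8):
    """Build a %CPU line that selects one hyperthread on each physical core.

    layout="linux": first hyperthreads of all cores are numbered 0..n_cores-1.
    layout="aix":   each core owns slots_per_core consecutive numbers, so
                    every slots_per_core-th number is on a distinct core.
    """
    if layout == "linux":
        return f"%CPU=0-{n_cores - 1}"
    if layout == "aix":
        return f"%CPU=0-{n_cores * slots_per_core - 1}/{slots_per_core}"
    raise ValueError(f"unknown layout: {layout!r}")


# Two 8-core chips, i.e. 16 physical cores in total.
print(one_thread_per_core(16, "linux"))  # %CPU=0-15
print(one_thread_per_core(16, "aix"))    # %CPU=0-127/8
```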
Cluster (Linda) parallelism
Availability. Hartree-Fock and DFT energies, gradients and frequencies run in parallel across clusters, as do MP2 energies and gradients. MP2 frequencies, CCSD, and EOM-CCSD energies and optimizations are SMP parallel but not cluster parallel. Numerical derivatives, such as DFT anharmonic frequencies and CCSD frequencies, are parallelized across nodes of a cluster by doing a complete gradient or second derivative calculation on each node, splitting the directions of differentiation across workers in the cluster.
Combining with MP parallelism. Shared-memory and cluster parallelism can be combined. Generally, one uses shared-memory parallelism across all CPUs in each node of the cluster. Note that %CPU and %Mem apply to each node of the cluster. Thus, if one has 3 nodes named apple, banana and cherry, each with two chips which have 8 CPUs each, then one might specify:
%Mem=64GB
%CPU=0-15
%LindaWorkers=apple,banana,cherry
# B3LYP/6-311+G(2d,p) Freq …
This would run 16 threads, each pinned to a CPU, on each of the 3 nodes, giving 4 GB to each of the 48 threads.
For the special case of numerical differentiation only—e.g., Freq=Anharm, CCSD Freq, etc.—one extra worker is used to collect the results. So these jobs should be run with two workers on the master node (where Gaussian 16 is started). For the above example if the job was computing anharmonic frequencies, then one would use:
%Mem=64GB
%CPU=0-15
%LindaWorkers=apple:2,banana,cherry
# B3LYP/6-311+G(2d,p) Freq=Anharm …
where Gaussian 16 is assumed to be started on node apple. This will start 2 workers on node apple, one of which just collects results, and will do the computational work using the other worker on apple and those on banana and cherry.
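When job files are generated by a script, the Link 0 lines for the two cases above can be assembled programmatically. The following Python sketch is one possible helper, not part of Gaussian or GauOpen. The function name is hypothetical, and it assumes that the first node in the list is the master node on which Gaussian 16 is started.

```python
def link0_header(mem, cpu, nodes, numerical_diff=False):
    """Return the %Mem, %CPU and %LindaWorkers lines as one string.

    nodes[0] is assumed to be the master node. For numerical-differentiation
    jobs, two workers are requested on it, one of which collects results.
    """
    workers = list(nodes)
    if numerical_diff:
        workers[0] = f"{workers[0]}:2"
    lines = [
        f"%Mem={mem}",
        f"%CPU={cpu}",
        "%LindaWorkers=" + ",".join(workers),
    ]
    return "\n".join(lines)


nodes = ["apple", "banana", "cherry"]
print(link0_header("64GB", "0-15", nodes))
print(link0_header("64GB", "0-15", nodes, numerical_diff=True))
```

Because %Mem and %CPU apply to each node, the same values are written once and are not multiplied by the number of nodes.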
Memory requirements for CCSD, CCSD(T) and EOM-CCSD calculations
These calculations can use memory to avoid I/O and will run much more efficiently if they are allowed enough memory to store the amplitudes and product vectors in memory. If the number of active occupied orbitals is NO (NOA in the output) and the number of virtual orbitals is NV (NVB in the output), then approximately 9 × NO² × NV² words of memory are required. This does not depend on the number of processors used.
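The estimate of 9 times the square of NO times the square of NV grows quickly with system size, so it is worth evaluating before choosing %Mem. The Python sketch below does the arithmetic. The orbital counts are made-up illustration values, the function names are hypothetical, and the conversion assumes 8-byte words and 1 GB = 1024³ bytes.

```python
def ccsd_memory_words(n_occ, n_virt):
    """Approximate words needed to hold amplitudes and product vectors in memory."""
    return 9 * n_occ**2 * n_virt**2


def words_to_gb(words, bytes_per_word=8):
    """Convert words to GB, assuming 8-byte words and 1 GB = 1024**3 bytes."""
    return words * bytes_per_word / 1024**3


# Hypothetical job: NOA = 20 active occupied orbitals, NVB = 200 virtual orbitals.
words = ccsd_memory_words(20, 200)
print(words)                        # 144000000
print(round(words_to_gb(words), 2))
```

Doubling both NO and NV multiplies the requirement by 16, which is why larger CCSD and EOM-CCSD jobs need substantially more memory regardless of the number of processors.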
The following bugs are fixed in Rev. C.01:
- Problems with Freq=Anharmonic when doing Raman or ROA with multiple incident light frequencies were fixed.
- Fixes for memory allocation running in parallel with high angular momentum and pure DFT functionals and some unusual cases with cluster parallelism.
- Documentation within DFTB parameter files is skipped properly.
- A problem with running chkchk on a checkpoint file from a job which died early was fixed.
- Performance problems with the hybridization term in PM7R6 for large molecules were fixed.
- The limit on the number of occupied orbitals in the GVB code has been increased to 1000, and some problems with FMM and GVB for large molecules were fixed.
- Problems with Grimme (D2 or D3) dispersion and ghost atoms were fixed.
- A problem with the orbital energies printed by Punch=MO and chkchk -p was fixed.
- The handling of the default file extension for the -fck= (/fck= on Windows systems) command line argument was fixed, so that the default is .fck but specifying other extensions such as .fchk also works.
- Some errors in specifying a named basis in general basis input which were previously undetected are now recognized.
- A problem with using / rather than - to specify the option selecting an ONIOM subcalculation to chkchk, copychk and formchk on Windows was fixed.
- Problems with running formchk on files during an ONIOM model system calculation, or on a checkpoint file from an ONIOM job which stopped during a model system calculation were fixed.
- A problem with MM parameter values being incomplete or wrong in formatted checkpoint files was fixed.
- Field=Read when density fitting is also in use no longer tries to read the field values twice.
- An error in parsing a bare CBSB7 on the route, treating this as implying CBS extrapolation rather than naming the basis set, was fixed.
- Various wrong defaults in the route generated for CIS with SCF=Conventional were corrected.
- A problem with Guess=Read when ghost atoms were present was fixed.
- Punch=GAMESS now works with H and higher functions.
- -2 instead of Tv for translation vectors in the atom specification input section works again.
- Some problems in generating internal coordinates for molecules having long linear chains of atoms were fixed.
- A problem in doing one-electron derivatives when a very small threshold for two-electron integrals was specified was fixed.
- A problem which caused Opt+Freq jobs that specified a non-default post-SCF window (e.g., MP2=FreezeG2) to fail in the frequency step was fixed.
- An underestimate of memory requirements for incore, which could cause jobs to default to incore and then run out of memory, was fixed.
The following bugs were fixed in Rev. B.01:
- A problem with restarting in the middle of a job step (from the RWF) when using SCF=QC was fixed.
- When doing the regular SCF part of SCF=XQC or SCF=YQC, the orbitals and density are only saved when a new lowest energy wavefunction is found. If L502 fails to converge and the calculation moves on to L508 (QC or steepest descent SCF) then the best wavefunction from the regular SCF iterations is used.
- Problems with restarting from the RWF in the middle of EOM-CC calculations were fixed.
- Problems with ROMP4 and with EOM-CC when there was an empty beta spin-space or a full alpha spin-space were fixed.
- The erroneous labels in the summary table for G4 and G4MP2 jobs were corrected.
- Problems with naming scratch files for NBO when the RWF was split across physical files were fixed.
- An allocation problem which caused CIS and TD frequency jobs on large molecules using very small amounts of memory to fail was fixed. These jobs now complete, but they would run much more efficiently if given more memory (i.e., a larger value to %Mem).
- A bug which caused jobs which used the FormCheck keyword to fail was fixed. This keyword is deprecated, and the -fchk command line option, which is more flexible, is the preferred alternative.
- Unnecessary warnings which were printed by formchk when operating on a checkpoint file from a calculation which included PCM solvation were removed.
- The route for Opt=(TS,ReCalcFC=N) was corrected.
- Molecular mechanics parameters are now stored correctly in formatted checkpoint files.
- The route for doing interaction deletions using NBO6 (Pop=NBO6Del) was corrected.
- A bug which prevented GPUs from being enabled in later steps of a compound job was fixed.
- A problem with parsing the obsolete keywords QMom and Magneton in atomic property lists was corrected.
Last updated: 31 August 2022