# Basis set (chemistry)

A basis set in theoretical and computational chemistry is a set of functions (called basis functions) that are combined linearly (generally as part of a quantum chemical calculation) to build molecular orbitals. For convenience these functions are typically atomic orbitals centered on atoms, but they can in principle be any functions; plane waves are frequently used in materials calculations.

## Introduction

In modern computational chemistry, quantum chemical calculations are typically performed using a finite set of basis functions. In these cases, the wavefunctions of the system in question are represented as vectors, the components of which correspond to coefficients in a linear combination of the basis functions in the basis set used. The operators are then represented as matrices (rank-two tensors) in this finite basis. In this article, basis function and atomic orbital are sometimes used interchangeably, although these basis functions are usually not the exact atomic orbitals, even for the corresponding hydrogen-like atoms, because of approximations and simplifications of their analytic formulas. If the finite basis is expanded towards an infinite complete set of functions, calculations using such a basis set are said to approach the basis set limit.[1]
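In symbols, the expansion and matrix representation described above can be sketched as follows. The notation is an assumption introduced here for illustration: K basis functions χ_μ, expansion coefficients c_μi for the i-th orbital ψ_i, and a generic operator Ô.

$$
\psi_i(\mathbf{r}) = \sum_{\mu=1}^{K} c_{\mu i}\,\chi_\mu(\mathbf{r}),
\qquad
O_{\mu\nu} = \int \chi_\mu^{*}(\mathbf{r})\,\hat{O}\,\chi_\nu(\mathbf{r})\,d\mathbf{r}
$$

The column of coefficients c_μi is the vector that represents ψ_i, and O_μν is the matrix that represents Ô. The basis set limit corresponds to letting K grow so that the χ_μ approach a complete set.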

When molecular calculations are performed, it is common to use a basis composed of a finite number of atomic orbitals, centered at each atomic nucleus within the molecule (linear combination of atomic orbitals ansatz). These atomic orbitals are well described by Slater-type orbitals (STOs): STOs decay exponentially with distance from the nucleus, which accurately describes the long-range overlap between atoms, and they reach a maximum at the nucleus, which describes the charge and spin there well. However, integrals over STOs are computationally difficult, and Frank Boys later realized that these Slater-type orbitals could in turn be approximated as linear combinations of Gaussian orbitals. Because it is easier to calculate overlap and other integrals with Gaussian basis functions, this led to huge computational savings (see John Pople).

Today, there are hundreds of basis sets composed of Gaussian-type orbitals (GTOs). The smallest of these are called minimal basis sets, and they are typically composed of the minimum number of basis functions required to represent all of the electrons on each atom. The largest of these can contain dozens to hundreds of basis functions on each atom.

A minimal basis set is one in which, on each atom in the molecule, a single basis function is used for each orbital in a Hartree–Fock calculation on the free atom. However, for atoms such as lithium, basis functions of p type are added to the basis functions corresponding to the 1s and 2s orbitals of the free atom. For example, each atom in the second period of the periodic table (Li-Ne) has a basis set of five functions (two s functions and three p functions).

Figure: A d-polarization function added to a p orbital.[2]

Another common addition to basis sets is diffuse functions, denoted in Pople-type sets by a plus sign, +, and in Dunning-type sets by "aug" (from "augmented"). Two plus signs indicate that diffuse functions are also added to light atoms (hydrogen and helium). Diffuse functions are very shallow Gaussian basis functions that more accurately represent the "tail" portion of the atomic orbitals, far from the atomic nuclei. They can be important when considering anions and other large, "soft" molecular systems.

## Minimal basis sets

The most common minimal basis set is STO-nG, where n is an integer. This n value is the number of Gaussian primitive functions that make up a single basis function. In these basis sets, core and valence orbitals are built from the same number of Gaussian primitives. Minimal basis sets typically give rough results that are insufficient for research-quality publication, but they are much cheaper than their larger counterparts. Commonly used minimal basis sets of this type are:

• STO-3G
• STO-4G
• STO-6G
• STO-3G* - Polarized version of STO-3G

Several other minimal basis sets have also been used, such as the MidiX basis sets.
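As a rough illustration of how a fixed combination of Gaussians stands in for a Slater function, the sketch below compares a normalized 1s STO (exponent 1.0) with its STO-3G contraction. The exponents and coefficients are the commonly tabulated STO-3G values for that exponent; the function names and the simple midpoint integration are choices made here for illustration only.

```python
import math

# Commonly tabulated STO-3G expansion of a 1s Slater function with zeta = 1.0.
EXPONENTS = (0.109818, 0.405771, 2.22766)
COEFFICIENTS = (0.444635, 0.535328, 0.154329)


def slater_1s(r, zeta=1.0):
    """Normalized 1s Slater-type orbital."""
    return math.sqrt(zeta**3 / math.pi) * math.exp(-zeta * r)


def gaussian_1s(r, alpha):
    """Normalized primitive s-type Gaussian."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)


def sto3g_1s(r):
    """Contracted basis function: a fixed linear combination of 3 primitives."""
    return sum(d * gaussian_1s(r, a) for d, a in zip(COEFFICIENTS, EXPONENTS))


def radial_overlap(f, g, r_max=30.0, steps=30000):
    """Midpoint-rule estimate of the overlap of two spherically symmetric functions."""
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r * r * f(r) * g(r)
    return 4.0 * math.pi * total * h


norm = radial_overlap(sto3g_1s, sto3g_1s)
overlap = radial_overlap(slater_1s, sto3g_1s)
print(f"<STO-3G|STO-3G> = {norm:.4f}")
print(f"<STO|STO-3G>    = {overlap:.4f}")
print(f"value at nucleus: STO {slater_1s(0.0):.4f}, STO-3G {sto3g_1s(0.0):.4f}")
```

The overlap comes out close to one, while the value at the nucleus is visibly lower for the Gaussian contraction, since Gaussians have no cusp at r = 0.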

## Split-valence basis sets

In most molecular bonding, it is principally the valence electrons that take part. In recognition of this fact, it is common to represent valence orbitals by more than one basis function (each of which can in turn be composed of a fixed linear combination of primitive Gaussian functions). Basis sets in which there are multiple basis functions corresponding to each valence atomic orbital are called valence double-, triple-, quadruple-zeta, and so on, basis sets (zeta, ζ, was commonly used to represent the exponent of an STO basis function[3]). Since the different orbitals of the split have different spatial extents, the combination allows the electron density to adjust its spatial extent to the particular molecular environment. Minimal basis sets are fixed and cannot adjust to different molecular environments.

### Pople basis sets

The notation for the split-valence basis sets arising from the group of John Pople is typically X-YZg.[4] Here, X represents the number of primitive Gaussians comprising each core atomic orbital basis function. The Y and Z indicate that the valence orbitals are composed of two basis functions each, the first a linear combination of Y primitive Gaussian functions and the second a linear combination of Z primitive Gaussian functions. The presence of two numbers after the hyphen implies that this basis set is a split-valence double-zeta basis set. Split-valence triple- and quadruple-zeta basis sets are also used, denoted as X-YZWg, X-YZWVg, etc. Commonly used split-valence basis sets of this type include:

• 3-21G
• 3-21G* - Polarized
• 3-21+G - Diffuse functions
• 3-21+G* - With polarization and diffuse functions
• 4-21G
• 4-31G
• 6-21G
• 6-31G
• 6-31G*
• 6-31+G*
• 6-31G(3df, 3pd)
• 6-311G
• 6-311G*
• 6-311+G*

The 6-31G* basis set (defined for the atoms H through Zn) is a valence double-zeta polarized basis set that adds to the 6-31G set six d-type Cartesian Gaussian polarization functions on each of the atoms Li through Ca and ten f-type Cartesian Gaussian polarization functions on each of the atoms Sc through Zn.
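The naming convention above can be unpacked mechanically. The following is a minimal sketch, not a general parser: the function names `describe_pople` and `count_second_period` are hypothetical, only simple names of the form X-YZ...G with optional + and a single * are handled (not forms such as 6-31G(3df, 3pd)), and the counts assume one second-period atom (Li-Ne) with six Cartesian d functions for the polarization shell, as described for 6-31G*.

```python
import re

# Simple Pople-style names only: core digit, valence digits, optional +/++, G, optional *.
POPLE_PATTERN = re.compile(r"^(\d)-(\d+)(\+{0,2})G(\*?)$")


def describe_pople(name):
    """Split a name such as '6-31+G*' into its parts."""
    match = POPLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a simple X-YZ...G name: {name!r}")
    core, valence, plus, star = match.groups()
    return {
        "core_primitives": int(core),
        "valence_primitives": [int(digit) for digit in valence],
        "diffuse": len(plus) >= 1,             # diffuse functions on heavy atoms
        "diffuse_on_light_atoms": len(plus) == 2,
        "polarized": star == "*",
    }


def count_second_period(name):
    """Return (basis functions, primitives) for one Li-Ne atom."""
    info = describe_pople(name)
    valence = info["valence_primitives"]
    # One core 1s function, then (s, px, py, pz) for each part of the valence split.
    functions = 1 + 4 * len(valence)
    primitives = info["core_primitives"] + 4 * sum(valence)
    if info["diffuse"]:
        functions += 4       # one diffuse s and one diffuse p shell, one primitive each
        primitives += 4
    if info["polarized"]:
        functions += 6       # assumption: six Cartesian d functions
        primitives += 6
    return functions, primitives


for basis in ("3-21G", "6-31G", "6-31G*", "6-31+G*"):
    print(basis, count_second_period(basis))
```

For example, 6-31G gives nine basis functions built from 22 primitives on a carbon atom: one core function of six primitives, plus two sets of s and p valence functions built from three primitives and one primitive respectively.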

## Correlation-consistent basis sets

Some of the most widely used basis sets are those developed by Dunning and coworkers,[5] since they are designed to converge systematically to the complete-basis-set (CBS) limit using empirical extrapolation techniques. For first- and second-row atoms, the basis sets are cc-pVNZ, where N = D, T, Q, 5, 6, ... (D = double, T = triple, etc.). The 'cc-p' stands for 'correlation-consistent polarized' and the 'V' indicates that they are valence-only basis sets. They include successively larger shells of polarization (correlating) functions (d, f, g, etc.). These basis sets are the current state of the art for correlated or post-Hartree–Fock calculations. Examples are:

• cc-pVDZ - Double-zeta
• cc-pVTZ - Triple-zeta
• cc-pV5Z - Quintuple-zeta, etc.
• aug-cc-pVDZ, etc. - Augmented versions of the preceding basis sets with added diffuse functions.

For period-3 atoms (Al-Ar), additional functions are necessary; these are the cc-pV(N+d)Z basis sets. Even larger atoms may employ pseudopotential basis sets, cc-pVNZ-PP, or relativistic-contracted Douglas-Kroll basis sets, cc-pVNZ-DK.

These basis sets can be augmented with core functions for geometric and nuclear property calculations, and with diffuse functions for electronic excited-state calculations, electric field property calculations, and long-range interactions, such as van der Waals forces. A recipe for constructing additional augmented functions exists; as many as five augmented functions have been used in second hyperpolarizability calculations in the literature. Because of the rigorous construction of these basis sets, extrapolation can be done for almost any energetic property, although care must be taken when extrapolating energy differences, as the individual energy components may converge at different rates.
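The text does not name a particular extrapolation scheme. As one illustration, the sketch below assumes the inverse-cubic form E(X) = E_CBS + A / X^3 in the cardinal number X (D = 2, T = 3, Q = 4, ...), a form often applied to correlation energies. The function name and the energies are hypothetical; the numbers are synthetic values generated from the assumed model, not results of any calculation.

```python
def extrapolate_two_point(energy_small, x_small, energy_large, x_large):
    """Two-point CBS estimate, assuming E(X) = E_CBS + A / X**3."""
    weight_small = x_small**3
    weight_large = x_large**3
    return (weight_large * energy_large - weight_small * energy_small) / (
        weight_large - weight_small
    )


# Synthetic data that follows the assumed model exactly (illustrative values only).
E_CBS_MODEL = -0.300
A_MODEL = 0.150


def model_energy(x):
    return E_CBS_MODEL + A_MODEL / x**3


estimate = extrapolate_two_point(model_energy(3), 3, model_energy(4), 4)
print(f"triple-zeta: {model_energy(3):.6f}")
print(f"quadruple-zeta: {model_energy(4):.6f}")
print(f"extrapolated: {estimate:.6f}")
```

Because the synthetic energies obey the model exactly, the estimate recovers the model limit; with real data the quality of the estimate depends on how well the assumed form holds, which is why energy differences need the care noted above.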

| Basis set | H-He | Li-Ne | Na-Ar |
|---|---|---|---|
| cc-pVDZ | [2s1p] → 5 func. | [3s2p1d] → 14 func. | [4s3p1d] → 18 func. |
| cc-pVTZ | [3s2p1d] → 14 func. | [4s3p2d1f] → 30 func. | [5s4p2d1f] → 34 func. |
| cc-pVQZ | [4s3p2d1f] → 30 func. | [5s4p3d2f1g] → 55 func. | [6s5p3d2f1g] → 59 func. |

To see how the number of functions is obtained, take the cc-pVDZ basis set for H: there are two s (L = 0) orbitals and one p (L = 1) orbital, which has three components along the z-axis (mL = -1, 0, 1) corresponding to px, py and pz. This gives five spatial orbitals in total. Note that each orbital can hold two electrons of opposite spin.

As another example, Ar [1s, 2s, 2p, 3s, 3p] has 3 s orbitals (L = 0) and 2 sets of p orbitals (L = 1). Using cc-pVDZ, the orbitals are [1s, 2s, 2p, 3s, 3s', 3p, 3p', 3d'] (where ' marks the added orbitals, including the polarization orbitals), giving 4 s orbitals, 3 sets of p orbitals and 1 set of d orbitals.
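The counting rule used in these examples, 2L + 1 components for each shell of angular momentum L, can be applied directly to the contraction labels. This is a small sketch; the function name `count_functions` is introduced here for illustration.

```python
import re

# Angular momentum quantum number L for each shell letter.
ANGULAR_MOMENTUM = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5}


def count_functions(contraction):
    """Count basis functions in a label such as '[3s2p1d]', using 2L + 1 per shell."""
    total = 0
    for count, letter in re.findall(r"(\d+)([spdfgh])", contraction):
        total += int(count) * (2 * ANGULAR_MOMENTUM[letter] + 1)
    return total


for label in ("[2s1p]", "[3s2p1d]", "[4s3p1d]", "[5s4p3d2f1g]"):
    print(label, count_functions(label))
```

For [4s3p1d], the Ar case above, this gives 4 × 1 + 3 × 3 + 1 × 5 = 18 functions.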

## Other kinds of basis sets

Some other basis sets are:

• SV(P)
• SVP
• DZV - Valence double-zeta
• TZV - Valence triple-zeta
• TZVPP - Valence triple-zeta plus polarization
• QZVPP - Valence quadruple-zeta plus polarization

## Plane-wave basis sets

In addition to localized basis sets, plane-wave basis sets can also be used in quantum-chemical simulations. Typically, a finite number of plane-wave functions are used, below a specific cutoff energy which is chosen for a certain calculation. These basis sets are popular in calculations involving periodic boundary conditions. Certain integrals and operations are much easier to code and carry out with plane-wave basis functions than with their localized counterparts.
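To make the role of the cutoff concrete, the sketch below counts the plane waves retained for a given cutoff energy. It rests on several simplifying assumptions not stated in the text: a cubic box of side L, wave vectors G = (2π/L)(n1, n2, n3) only (a single k-point at the zone center), and Hartree atomic units, so that the kinetic energy of a plane wave is |G|²/2. The function name is hypothetical.

```python
import math


def count_plane_waves(box_length, cutoff_energy):
    """Count wave vectors G of a cubic box with |G|**2 / 2 <= cutoff_energy."""
    unit = 2.0 * math.pi / box_length
    # Largest integer index that could possibly satisfy the cutoff, plus a margin of one.
    n_max = int(math.sqrt(2.0 * cutoff_energy) / unit) + 1
    count = 0
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            for n3 in range(-n_max, n_max + 1):
                g_squared = unit * unit * (n1 * n1 + n2 * n2 + n3 * n3)
                if 0.5 * g_squared <= cutoff_energy:
                    count += 1
    return count


for cutoff in (1.0, 2.0, 4.0, 8.0):
    print(f"cutoff {cutoff:4.1f} Ha -> {count_plane_waves(10.0, cutoff)} plane waves")
```

Raising the cutoff adds plane waves of shorter wavelength, so the basis set size is controlled by a single parameter.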

In practice, plane-wave basis sets are often used in combination with an 'effective core potential' or pseudopotential, so that the plane waves are only used to describe the valence charge density. This is because core electrons tend to be concentrated very close to the atomic nuclei, resulting in large wavefunction and density gradients near the nuclei which are not easily described by a plane-wave basis set unless a very high energy cutoff, and therefore small wavelength, is used. This combined method of a plane-wave basis set with a core pseudopotential is often abbreviated as a PSPW calculation.

Furthermore, as all functions in the basis are mutually orthogonal and are not associated with any particular atom, plane-wave basis sets do not exhibit basis-set superposition error. However, they are less well suited to gas-phase calculations. Using Fast Fourier Transforms, one can work with plane-wave basis sets in reciprocal space, where not only the aforementioned integrals, such as the kinetic energy, but also derivatives are computationally less demanding to evaluate. Another important advantage of a plane-wave basis is that it is guaranteed to converge in a smooth, monotonic manner to the target wavefunction, whereas Gaussian-type basis sets, when used in variational calculations, are guaranteed only monotonic convergence. (An exception to the latter point is the correlation-consistent basis sets.)

The properties of the Fourier Transform allow a vector representing the gradient of the total energy with respect to the plane-wave coefficients to be calculated with a computational effort that scales as NPW*ln(NPW), where NPW is the number of plane waves. When this property is combined with separable pseudopotentials of the Kleinman-Bylander type and pre-conditioned conjugate gradient solution techniques, the dynamic simulation of periodic problems containing hundreds of atoms becomes possible.
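The following one-dimensional sketch illustrates why derivatives are cheap in reciprocal space: after a Fourier transform, differentiation is a multiplication of each coefficient by i·k. It uses a small textbook radix-2 FFT written out in plain Python so that the example is self-contained; this is an illustration of the idea, not the algorithm of any particular plane-wave code, and the function names are introduced here.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two); unnormalized."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1.0 if inverse else -1.0
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectral_derivative(samples, box_length):
    """Differentiate a periodic function by multiplying its Fourier coefficients by i*k."""
    n = len(samples)
    coeffs = fft(samples)
    # Wave numbers in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1.
    wave_numbers = [
        (j if j < n // 2 else j - n) * 2.0 * math.pi / box_length for j in range(n)
    ]
    deriv_coeffs = [1j * k * c for k, c in zip(wave_numbers, coeffs)]
    deriv_coeffs[n // 2] = 0j  # drop the unpaired Nyquist mode
    return [value.real / n for value in fft(deriv_coeffs, inverse=True)]


N_POINTS = 32
LENGTH = 2.0 * math.pi
grid = [LENGTH * j / N_POINTS for j in range(N_POINTS)]
derivative = spectral_derivative([math.sin(x) for x in grid], LENGTH)
max_error = max(abs(d - math.cos(x)) for d, x in zip(derivative, grid))
print(f"max error of d/dx sin(x): {max_error:.2e}")
```

The same multiplication, with k²/2 in place of i·k, gives the kinetic energy operator, which is why it is diagonal in a plane-wave basis.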

## Real-space basis sets

Following the same principle as plane waves, but in real space, there are basis sets whose functions are centered on a uniform mesh in real space. This is the case for finite-difference methods, sinc functions, and wavelets. With wavelets, an adaptive mesh that is finer near the nucleus can be built using their scaling properties. These methods use localized functions, which allows the development of order-N methods.

## Lagrange functions

Lagrange functions are a type of real-space basis set. Unlike several other real-space basis sets, however, they are orthonormal, analytical, and complete. In addition, the convergence is systematic and relatively simple: it is controlled only by the dimension of the basis set, much as convergence is controlled by the energy cutoff in a plane-wave basis set.[citation needed]

## References

1. ^ Roman M. Balabin (2010). "Intramolecular basis set superposition error as a measure of basis set incompleteness: Can one reach the basis set limit without extrapolation?". J. Chem. Phys. 132 (21): 211103. Bibcode:2010JChPh.132u1103B. doi:10.1063/1.3430647. PMID 20528011.
2. ^ Errol G. Lewars. Computational Chemistry: Introduction to the Theory and Applications of Molecular and Quantum Mechanics (1st ed.). Springer. ISBN 978-1402072857.
3. ^ Davidson, Ernest; Feller, David (1986). "Basis set selection for molecular calculations". Chem. Rev. 86 (4): 681–696. doi:10.1021/cr00074a002.
4. ^ Ditchfield, R; Hehre, W.J; Pople, J. A. (1971). "Self‐Consistent Molecular‐Orbital Methods. IX. An Extended Gaussian‐Type Basis for Molecular‐Orbital Studies of Organic Molecules". J. Chem. Phys. 54 (2): 724–728. Bibcode:1971JChPh..54..724D. doi:10.1063/1.1674902.
5. ^ Dunning, Thomas H. (1989). "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen". J. Chem. Phys. 90 (2): 1007–1023. Bibcode:1989JChPh..90.1007D. doi:10.1063/1.456153.

The basis sets discussed here, along with many others, are covered in the references below, which in turn give references to the original journal articles:

• Levine, Ira N. (1991). Quantum Chemistry. Englewood Cliffs, New Jersey: Prentice Hall. pp. 461–466. ISBN 0-205-12770-3.
• Cramer, Christopher J. (2002). Essentials of Computational Chemistry. Chichester: John Wiley & Sons, Ltd. pp. 154–168. ISBN 0-471-48552-7.
• Jensen, Frank (1999). Introduction to Computational Chemistry. John Wiley and Sons. pp. 150–176. ISBN 978-0471980858.
• Leach, Andrew R. (1996). Molecular Modelling: Principles and Applications. Singapore: Longman. pp. 68–77. ISBN 0-582-23933-8.
• Hehre, Warren J. (2003). A Guide to Molecular Mechanics and Quantum Chemical Calculations. Irvine, California: Wavefunction, Inc. pp. 40–47. ISBN 1-890661-18-X.
• https://web.archive.org/web/20070830043639/http://www.chem.swin.edu.au/modules/mod8/basis1.html
• Moran, Damian; Simmonett, Andrew C.; Leach, Franklin E.; Allen, Wesley D.; Schleyer, Paul v. R.; Schaefer, Henry F. (2006). "Popular Theoretical Methods Predict Benzene and Arenes To Be Nonplanar". Journal of the American Chemical Society. 128 (29): 9342–3. doi:10.1021/ja0630285. PMID 16848464.
• Choi, Sunghwan; Hong, Kwangwoo; Kim, Jaewook; Kim, Woo Youn (2015). "Accuracy of Lagrange-sinc functions as a basis set for electronic structure calculations of atoms and molecules". The Journal of Chemical Physics. doi:10.1063/1.4913569.