pyvalem is a Python 3 package for handling chemical formulae. It defines a syntax for specifying a formula with some structural information, but it is not itself a format for representing arbitrary molecular structures: for that purpose there are already many standards, including InChI and SMILES. Rather, it provides a simple way to parse the chemical formulae of atoms, isotopes, atomic ions and small molecules, and to transform them into HTML for use on webpages, into URL-safe "slug" strings and into canonical stoichiometric formulae. It can also calculate masses, either as averages weighted by natural isotopic abundance or as exact values for specific isotopologues.
As an example, the L-tyrosine zwitterion may be represented as follows:
>>> from pyvalem.chem_formula import ChemFormula
>>> Ltyrosine = ChemFormula('L-(-)-(NH3+)CH(CH2C6H4OH)CO2-')
Its HTML representation is accessed with Ltyrosine.html (note that the D- and L- prefixes are rendered in small capitals). Other useful attributes and methods include:
>>> print(Ltyrosine.stoichiometric_formula())
H11C9NO3
>>> print(Ltyrosine.stoichiometric_formula('alphabetical'))
C9H11NO3
>>> print(Ltyrosine.slug)
L-m___NH3_p_CH_CH2C6H4OH_CO2_m
>>> print(Ltyrosine.rmm)  # relative molecular mass
181.18854
A ChemFormula object may be initialized by passing it a string consisting of element symbols and their stoichiometries. Any total charge on the species is indicated at the end of the string by -1 (or just -), -2, etc. for anions and by +1 (or just +), +2, etc. for cations. Do not use underscores (_) for subscripts or carets (^) for superscripts. For example,
>>> ethanol = ChemFormula('CH3CH2OH')
>>> carbonate = ChemFormula('CO3-2')
>>> hydronium = ChemFormula('H3O+')
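The trailing-charge convention can be illustrated with a small standalone function. This is a minimal sketch, not pyvalem's parser: split_charge and CHARGE_RE are hypothetical names, and the function only reads a single total charge at the end of the string. It ignores charges inside bracketed moieties, so for the tyrosine zwitterion above it would report -1 rather than a net charge of zero.

```python
import re

# A trailing sign, optionally followed by digits, anchored at the end.
CHARGE_RE = re.compile(r'([+-])(\d*)$')


def split_charge(formula):
    """Hypothetical helper: return (formula without charge, integer charge)."""
    m = CHARGE_RE.search(formula)
    if m is None:
        return formula, 0
    sign = 1 if m.group(1) == '+' else -1
    magnitude = int(m.group(2)) if m.group(2) else 1  # a bare sign means 1
    return formula[:m.start()], sign * magnitude


for s in ('CH3CH2OH', 'CO3-2', 'H3O+', '(235U)+4'):
    print(s, split_charge(s))
```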
Enclose specific isotopes in parentheses:
>>> f1 = ChemFormula('(235U)+4')
>>> f2 = ChemFormula('(12C)(16O)2')
>>> f3 = ChemFormula('(13C)HCl3')
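A parenthesized isotope can be told apart from an ordinary element symbol with a single regular expression. The sketch below is illustrative only (tokenize and TOKEN_RE are hypothetical names, not pyvalem's API) and assumes a charge-free formula without prefixes or bracketed moieties; each token records the mass number (None for natural abundance), the element symbol and the count.

```python
import re

# Either a parenthesized isotope such as (13C) or a plain element symbol,
# each followed by an optional stoichiometric count.
TOKEN_RE = re.compile(r'\((\d+)([A-Z][a-z]?)\)(\d*)|([A-Z][a-z]?)(\d*)')


def tokenize(formula):
    """Hypothetical helper: (mass number or None, symbol, count) tuples."""
    tokens = []
    pos = 0
    while pos < len(formula):
        m = TOKEN_RE.match(formula, pos)
        if m is None:
            raise ValueError(f'unexpected text at {formula[pos:]!r}')
        if m.group(2):  # isotope, e.g. (16O)2
            tokens.append((int(m.group(1)), m.group(2), int(m.group(3) or 1)))
        else:  # natural-abundance element, e.g. Cl3
            tokens.append((None, m.group(4), int(m.group(5) or 1)))
        pos = m.end()
    return tokens


print(tokenize('(13C)HCl3'))
```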
As of version 1.0b, prefixes and formulae including bracketed moieties are supported:
>>> isobutane = ChemFormula('CH(CH3)3')
>>> Dalanine = ChemFormula('D-CH3CH(NH2)COOH')
>>> chlorocarbon = ChemFormula('1,1,2-C2H3Cl3')
>>> beta_lysine = ChemFormula('β-H2NC3H6CH(NH2)CH2CO2H')
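Bracketed moieties with multipliers, as in isobutane's CH(CH3)3, are naturally handled with a stack of counters, one per open bracket. This minimal sketch (count_atoms is a hypothetical helper) counts the atoms of each element; it deliberately ignores prefixes, isotopes and charges, so prefixes such as D- would have to be stripped before using it on strings like 'D-CH3CH(NH2)COOH'.

```python
import re
from collections import Counter

# Tokens: an element symbol with an optional count, an opening bracket,
# or a closing bracket with an optional multiplier.
TOKEN_RE = re.compile(r'([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)')


def count_atoms(formula):
    """Hypothetical helper: element counts for a formula with nested brackets."""
    stack = [Counter()]  # one Counter per currently open bracket level
    pos = 0
    while pos < len(formula):
        m = TOKEN_RE.match(formula, pos)
        if m is None:
            raise ValueError(f'unexpected text at {formula[pos:]!r}')
        if m.group(1):  # element symbol with optional count, e.g. H3
            stack[-1][m.group(1)] += int(m.group(2)) if m.group(2) else 1
        elif m.group(3):  # '(' opens a new moiety
            stack.append(Counter())
        else:  # ')' closes the moiety: multiply and merge into the parent
            if len(stack) == 1:
                raise ValueError('unbalanced closing bracket')
            moiety = stack.pop()
            multiplier = int(m.group(5)) if m.group(5) else 1
            for symbol, n in moiety.items():
                stack[-1][symbol] += n * multiplier
        pos = m.end()
    if len(stack) != 1:
        raise ValueError('unbalanced opening bracket')
    return dict(stack[0])


print(count_atoms('CH(CH3)3'))
```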
The supported molecular formula prefixes are listed below. Multiple prefixes are separated by hyphens (as, for example, in '(L)-α-CH3CH(NH2)COOH'). Note that some of the prefixes require Unicode characters.
'(+)', '(-)', '(±)', 'D', 'L', '(R)', '(S)', '(E)', '(Z)', 'cis', 'trans', 's', 'a', 'Δ', 'Λ', 'α', 'β', 'γ', 'n', 'i', 't', 'neo', 'sec', 'o', 'm', 'p', 'ortho', 'meta', 'para'
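Because every prefix is followed by a hyphen, prefixes can be peeled off the front of a string one at a time. In this sketch split_prefixes and LOCANT_RE are hypothetical; the numeric locants of the 1,1,2-C2H3Cl3 example, which do not appear in the list above, are matched with a separate regular expression as an assumption about how such locants are written.

```python
import re

# The prefixes listed above.
PREFIXES = ['(+)', '(-)', '(±)', 'D', 'L', '(R)', '(S)', '(E)', '(Z)',
            'cis', 'trans', 's', 'a', 'Δ', 'Λ', 'α', 'β', 'γ', 'n', 'i',
            't', 'neo', 'sec', 'o', 'm', 'p', 'ortho', 'meta', 'para']
# Assumed form of numeric locants: comma-separated integers, then a hyphen.
LOCANT_RE = re.compile(r'\d+(?:,\d+)*-')


def split_prefixes(formula):
    """Hypothetical helper: return (list of prefixes, remaining formula)."""
    prefixes = []
    while True:
        m = LOCANT_RE.match(formula)
        if m:
            prefixes.append(m.group()[:-1])  # drop the separating hyphen
            formula = formula[m.end():]
            continue
        for prefix in PREFIXES:
            # Requiring the hyphen stops 'o' from matching the start of 'ortho-'.
            if formula.startswith(prefix + '-'):
                prefixes.append(prefix)
                formula = formula[len(prefix) + 1:]
                break
        else:
            return prefixes, formula


print(split_prefixes('L-(-)-(NH3+)CH(CH2C6H4OH)CO2-'))
```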
The string used to initialize the ChemFormula object is stored and is returned by the __str__() method. An HTML version, with the stoichiometric numbers as subscripts and the charge in its conventional form as a superscript, is stored in the attribute html. For example,
>>> print(carbonate)
CO3-2
>>> print(carbonate.html)
CO<sub>3</sub><sup>2-</sup>
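For a simple formula, the HTML conversion amounts to wrapping each run of digits in <sub> tags and rewriting the trailing charge with its magnitude before the sign. The to_html function below is a hypothetical sketch, not pyvalem's implementation; it assumes a formula without isotopes (whose mass numbers would wrongly become subscripts), prefixes or charged moieties, though it does reproduce the carbonate output shown above.

```python
import re


def to_html(formula):
    """Hypothetical helper: subscript the counts and superscript the charge."""
    charge_html = ''
    m = re.search(r'([+-])(\d*)$', formula)
    if m:
        # Conventional form: magnitude before sign (2-), and 1 is left implicit.
        magnitude = m.group(2) if m.group(2) not in ('', '1') else ''
        charge_html = f'<sup>{magnitude}{m.group(1)}</sup>'
        formula = formula[:m.start()]
    body = re.sub(r'(\d+)', r'<sub>\1</sub>', formula)
    return body + charge_html


print(to_html('CO3-2'))
```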
The slug attribute holds a URL-safe string representing the formula; this string is guaranteed to be unique only for formulae without isotopes, prefixes or bracketed moieties.
>>> print(hydronium.slug)
H3O_p
>>> print(chlorocarbon.slug)
1_1_2__C2H3Cl3
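One way to make a formula URL-safe is to map each awkward character to an underscore-based code, as the H3O_p example suggests. The substitution table in this sketch is a hypothetical simplification and make_slug is not pyvalem's API: it does not reproduce pyvalem's prefix handling (here the hyphen in 1,1,2-C2H3Cl3 would be read as a minus sign, giving 1_1_2_mC2H3Cl3 rather than 1_1_2__C2H3Cl3). Because several characters share the same code, the mapping discards information, so slugs built this way cannot be assumed unique once brackets are involved.

```python
import re

# Hypothetical substitution table: '+' and '-' become _p and _m, while
# brackets and commas collapse to '_'. Prefix handling is not reproduced.
SUBSTITUTIONS = {'+': '_p', '-': '_m', '(': '_', ')': '_', ',': '_'}


def make_slug(formula):
    """Replace characters that are awkward in URLs, then check the result."""
    slug = ''.join(SUBSTITUTIONS.get(ch, ch) for ch in formula)
    if not re.fullmatch(r'[A-Za-z0-9_]+', slug):
        raise ValueError(f'no slug form for {formula!r}')
    return slug


print(make_slug('H3O+'))
```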
The stoichiometric formula of a ChemFormula can be returned with the elements ordered by atomic number (the default), alphabetically, or in Hill notation: first the carbons, if any, then the hydrogens, then the other atoms in alphabetical order. For example,
>>> f = ChemFormula('CH2FCH2Cl')
>>> print(f.stoichiometric_formula())  # ordered by atomic number, the default
H4C2FCl
>>> print(f.stoichiometric_formula('alphabetical'))
C2ClFH4
>>> print(f.stoichiometric_formula('hill'))
C2H4ClF
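The three orderings differ only in the sort key applied to the same element counts. In this sketch, element_counts and stoichiometric_formula are hypothetical helpers that accept only simple formulae (no brackets, isotopes, prefixes or charges), and the atomic-number table covers just a few elements. For the Hill order it applies the standard Hill convention, under which a carbon-free formula is written entirely in alphabetical order, hydrogen included.

```python
import re
from collections import Counter

# Atomic numbers for the few elements used in this example; a real
# implementation would need the whole periodic table.
ATOMIC_NUMBER = {'H': 1, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Cl': 17}


def element_counts(formula):
    """Count atoms in a formula with no brackets, isotopes, prefixes or charge."""
    counts = Counter()
    for symbol, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[symbol] += int(n) if n else 1
    return counts


def stoichiometric_formula(formula, order='atomic number'):
    counts = element_counts(formula)
    if order == 'atomic number':
        symbols = sorted(counts, key=lambda s: ATOMIC_NUMBER[s])
    elif order == 'alphabetical':
        symbols = sorted(counts)
    elif order == 'hill':
        if 'C' in counts:
            # C first, then H (if present), then the rest alphabetically.
            rest = sorted(s for s in counts if s not in ('C', 'H'))
            symbols = ['C'] + (['H'] if 'H' in counts else []) + rest
        else:
            # With no carbon, every element (H included) is alphabetical.
            symbols = sorted(counts)
    else:
        raise ValueError(f'unknown order {order!r}')
    # A count of 1 is left implicit.
    return ''.join(s if counts[s] == 1 else f'{s}{counts[s]}' for s in symbols)


for order in ('atomic number', 'alphabetical', 'hill'):
    print(order, stoichiometric_formula('CH2FCH2Cl', order))
```

Run on CH2FCH2Cl, the three calls give H4C2FCl, C2ClFH4 and C2H4ClF, matching the outputs listed above.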
The relative molecular mass for average isotopic abundances, on the scale on which the mass of a ¹²C atom is exactly 12, is held in the attribute rmm:
>>> print(ethanol.rmm)
46.06844
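The average relative molecular mass is the sum, over elements, of each standard atomic weight multiplied by its count. This sketch (average_rmm and ATOMIC_WEIGHT are hypothetical names) uses the older single-valued IUPAC standard atomic weights for H, C, N and O, which reproduce both the ethanol value above and the L-tyrosine value of 181.18854 quoted earlier; current IUPAC tables give intervals for some of these elements, so the choice of table affects the last digits.

```python
import re

# Older single-valued IUPAC standard atomic weights (only the elements needed).
ATOMIC_WEIGHT = {'H': 1.00794, 'C': 12.0107, 'N': 14.0067, 'O': 15.9994}


def average_rmm(formula):
    """Sum atomic weight x count for a formula with no brackets or isotopes."""
    total = 0.0
    for symbol, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        total += ATOMIC_WEIGHT[symbol] * (int(n) if n else 1)
    return total


print(round(average_rmm('CH3CH2OH'), 5))
```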
Where specific isotopes are given, their isotopic masses are used instead. For example, for the most abundant isotopologue of ethanol, ¹²C₂¹H₅¹⁶O¹H:
>>> f = ChemFormula('(12C)2(1H)5(16O)(1H)')
>>> print(f.rmm)
46.041865
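For a specified isotopologue the same sum is taken over nuclide masses rather than average atomic weights. The masses in this sketch are Atomic Mass Evaluation values (as tabulated in AME2003) for ¹H and ¹⁶O, with ¹²C exactly 12 by definition; isotopologue_mass is a hypothetical helper that only accepts formulae in which every atom carries an isotope label, such as the one above.

```python
import re

# Nuclide masses in unified atomic mass units (AME2003 values); the mass
# of 12C is exactly 12 by definition of the scale.
NUCLIDE_MASS = {
    (12, 'C'): 12.0,
    (1, 'H'): 1.00782503207,
    (16, 'O'): 15.99491461956,
}

# One token per labelled nuclide, e.g. "(12C)2" -> mass number 12, C, count 2.
TOKEN_RE = re.compile(r'\((\d+)([A-Z][a-z]?)\)(\d*)')


def isotopologue_mass(formula):
    """Sum nuclide masses for a formula in which every atom is isotope-labelled."""
    total = 0.0
    pos = 0
    while pos < len(formula):
        m = TOKEN_RE.match(formula, pos)
        if m is None:
            raise ValueError(f'unlabelled or unexpected text at {formula[pos:]!r}')
        key = (int(m.group(1)), m.group(2))
        total += NUCLIDE_MASS[key] * (int(m.group(3)) if m.group(3) else 1)
        pos = m.end()
    return total


print(round(isotopologue_mass('(12C)2(1H)5(16O)(1H)'), 6))
```

Rounded to six decimal places this prints 46.041865, the value shown above.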
The support of the Atomic and Molecular Data Unit at the IAEA, the Data Center for Plasma Properties of the Korean National Fusion Research Institute, and the Virtual Atomic and Molecular Data Centre in the development of pyvalem is gratefully acknowledged.