Universe
If you wish to make an apple pie from scratch, you must first invent the universe.
—Carl Sagan, Cosmos
MDAnalysis is structured around two fundamental classes: the Universe and the AtomGroup. Almost all code in MDAnalysis begins with a Universe, which contains all the information describing a molecular dynamics system.
It has two key properties:
atoms: an AtomGroup of the system’s atoms, providing access to important analysis methods (described below)
trajectory: the currently loaded trajectory reader
A Universe ties the static information from the “topology” (e.g. atom identities) to dynamically updating information from the “trajectory” (e.g. coordinates). A key feature of MDAnalysis is that an entire trajectory is not loaded into memory (unless the user explicitly does so with MemoryReader). Instead, the trajectory attribute provides a view on a specific frame of the trajectory. This allows the analysis of arbitrarily long trajectories without a significant impact on memory.
Creating a Universe
Loading from files
A Universe is typically created from a “topology” file, with optional “trajectory” file/s. Trajectory files must have the coordinates in the same order as atoms in the topology. See Formats for the topology and trajectory formats supported by MDAnalysis, and how to load each specific format.
u = Universe(topology, trajectory)
u = Universe(pdbfile) # read atoms and coordinates from PDB or GRO
u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
u = Universe(topology, traj1, traj2, ...) # read from multiple trajectories
The line between topology and trajectory files is quite blurry. For example, a PDB or GRO file is considered both a topology and a trajectory file. The difference is that a topology file provides static information, such as atom identities (name, mass, etc.), charges, and bond connectivity. A trajectory file provides dynamic information, such as coordinates, velocities, forces, and box dimensions.
If only a single file is provided, MDAnalysis tries to read both topology and trajectory information from it. When multiple trajectory files are provided, coordinates are loaded in the order given.
The default arguments should create a Universe suited for most analysis applications. However, the Universe constructor also takes optional arguments.
The following options specify how to treat the input:
format: the file format of the trajectory file/s (default: None, formats are guessed)
topology_format: the file format of the topology file (default: None, formats are guessed)
all_coordinates: whether to read coordinate information from the first file (default: False; ignored when only one file is provided)
continuous: whether to give multiple trajectory files continuous time steps. This is currently only supported for XTC/TRR trajectories with a GRO/TPR topology, following the behaviour of gmx trjcat (default: False)
In [1]: import MDAnalysis as mda
In [2]: from MDAnalysis.tests.datafiles import PDB, GRO, XTC
In [3]: u1 = mda.Universe(GRO, XTC, XTC, all_coordinates=True)
In [4]: u1.trajectory
Out[4]: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.xtc, adk_oplsaa.xtc with 21 frames of 47681 atoms>
In [5]: u2 = mda.Universe(GRO, XTC, XTC, all_coordinates=False, continuous=False)
In [6]: print([int(ts.time) for ts in u2.trajectory])
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
The following options modify the created Universe:
guess_bonds: whether to guess connectivity between atoms (default: False)
vdwradii: a dictionary of {element: radius} of van der Waals radii for use in guessing bonds
transformations: a function or list of functions for on-the-fly trajectory transformation
in_memory: whether to load coordinates into memory (default: False)
in_memory_step: only read every nth frame into an in-memory representation (default: 1)
is_anchor: whether to consider this Universe when unpickling AtomGroups (default: True)
anchor_name: the name of this Universe when unpickling AtomGroups (default: None, automatically generated)
You can also pass in keywords for parsing the topology or coordinates. For example, many file formats do not specify the timestep for their trajectory. In these cases, MDAnalysis assumes a timestep of 1 ps. If this is incorrect, you can pass in a dt argument to set the timestep. This does not modify timesteps for formats that include time information.
In [7]: from MDAnalysis.tests.datafiles import PRM, TRJ
In [8]: default_timestep = mda.Universe(PRM, TRJ)
In [9]: default_timestep.trajectory.dt
Out[9]: 1.0
In [10]: user_timestep = mda.Universe(PRM, TRJ, dt=5) # ps
In [11]: user_timestep.trajectory.dt
Out[11]: 5
Constructing from AtomGroups
A new Universe can be created from one or more AtomGroup instances with Merge(). The AtomGroup instances can come from different Universes, meaning that this is one way to concatenate selections from different datasets.
For example, to combine a protein, ligand, and solvent from separate PDB files:
u1 = mda.Universe("protein.pdb")
u2 = mda.Universe("ligand.pdb")
u3 = mda.Universe("solvent.pdb")
u = mda.Merge(u1.select_atoms("protein"), u2.atoms, u3.atoms)
u.atoms.write("system.pdb")
Constructing from scratch
A Universe can be constructed from scratch with Universe.empty. There are three stages to this process:
1. Create the blank Universe with the specified number of atoms. If the Universe will hold coordinates, set trajectory=True.
2. Add topology attributes such as atom names.
3. (Optional) Load coordinates.
For example, to construct a universe with 6 atoms in 2 residues (np refers to NumPy, imported with import numpy as np):
In [12]: u = mda.Universe.empty(6, 2, atom_resindex=[0, 0, 0, 1, 1, 1], trajectory=True)
In [13]: u.add_TopologyAttr('masses')
In [14]: coordinates = np.empty((1000, # number of frames
....: u.atoms.n_atoms,
....: 3))
....:
In [15]: u.load_new(coordinates, order='fac')
Out[15]: <Universe with 6 atoms>
Constructing from SMILES
Thanks to the interoperability with RDKit, a Universe can be constructed from a SMILES string with Universe.from_smiles. Since a SMILES string usually contains information about heavy atoms only, this method adds the appropriate number of hydrogens to all atoms by default (addHs=True). It is also possible to generate coordinates with generate_coordinates=True. This allows RDKit to generate one or more conformers for the given molecule. For example, setting numConfs=10 will generate 10 conformers, and each conformer will be read as a frame by the Universe.
Here is a minimal example to create a Universe from one conformer of ethanol:
In [16]: u = mda.Universe.from_smiles("CCO")
In [17]: u
Out[17]: <Universe with 9 atoms>
In [18]: u.trajectory
Out[18]: <RDKitReader with 1 frames of 9 atoms>
Internally, calling the Universe.from_smiles method does the following:
In [19]: from rdkit import Chem
In [20]: from rdkit.Chem import AllChem
In [21]: mol = Chem.MolFromSmiles("CCO") # ethanol
In [22]: mol = Chem.AddHs(mol)
In [23]: confids = AllChem.EmbedMultipleConfs(mol, numConfs=1) # 1 conformer
In [24]: u = mda.Universe(mol, format="RDKIT")
In [25]: u
Out[25]: <Universe with 9 atoms>
In [26]: u.trajectory
Out[26]: <RDKitReader with 1 frames of 9 atoms>
Guessing topology attributes
MDAnalysis sometimes guesses information instead of reading it from a file. This happens with certain file formats and can lead to mistakes such as assigning atoms the wrong element or charge. See the available topology parsers for a case-by-case breakdown of which atom properties MDAnalysis guesses for each format. See Guessing for how attributes are guessed, and Default values and attribute levels for which attributes have default values.
Universe properties and methods
A Universe holds master groups of atoms and topology objects:
atoms: all Atoms in the system, in an AtomGroup
residues: all Residues in the system
segments: all Segments in the system
bonds: all bond TopologyObjects in the system
angles: all angle TopologyObjects in the system
dihedrals: all dihedral TopologyObjects in the system
impropers: all improper TopologyObjects in the system
Residues and Segments are chemically meaningful groups of Atoms.
Modifying a topology is typically done through the Universe, which contains several methods for adding properties, including add_TopologyAttr, add_Residue, and add_Segment.
See Topology attributes for more information on which topology attributes can be added, and examples/constructing_universe.ipynb for examples on adding attributes and Segments.