Universe¶

If you wish to make an apple pie from scratch, you must first invent the universe.

—Carl Sagan, Cosmos

MDAnalysis is structured around two fundamental classes: the Universe and the AtomGroup. Almost all code in MDAnalysis begins with Universe, which contains all the information describing a molecular dynamics system.

It has two key properties:

atoms: an AtomGroup of the system’s atoms, providing access to important analysis methods (described below)
trajectory: the currently loaded trajectory reader

A Universe ties the static information from the “topology” (e.g. atom identities) to dynamically updating information from the “trajectory” (e.g. coordinates). A key feature of MDAnalysis is that an entire trajectory is not loaded into memory (unless the user explicitly does so with MemoryReader). Instead, the trajectory attribute provides a view on a specific frame of the trajectory. This allows the analysis of arbitrarily long trajectories without a significant impact on memory.

Creating a Universe¶

Loading from files¶

A Universe is typically created from a “topology” file, with optional “trajectory” file/s. Trajectory files must have the coordinates in the same order as atoms in the topology. See Formats for the topology and trajectory formats supported by MDAnalysis, and how to load each specific format.

u = Universe(topology, trajectory)
u = Universe(pdbfile)                       # read atoms and coordinates from PDB or GRO
u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
u = Universe(topology, traj1, traj2, ...)   # read from multiple trajectories

The line between topology and trajectory files is quite blurry. For example, a PDB or GRO file is considered both a topology and a trajectory file. The difference is that a topology file provides static information, such as atom identities (name, mass, etc.), charges, and bond connectivity. A trajectory file provides dynamic information, such as coordinates, velocities, forces, and box dimensions.

If only a single file is provided, MDAnalysis tries to read both topology and trajectory information from it. When multiple trajectory files are provided, coordinates are loaded in the order given.

The default arguments should create a Universe suited for most analysis applications. However, the Universe constructor also takes optional arguments.

The following options specify how to treat the input:

format: the file format of the trajectory file/s. (default: None, formats are guessed)
topology_format: the file format of the topology file. (default: None, formats are guessed)
all_coordinates: whether to read coordinate information from the first file (default: False. Ignored when only one file is provided)
continuous: whether to give multiple trajectory files continuous time steps. This is currently only supported for XTC/TRR trajectories with a GRO/TPR topology, following the behaviour of gmx trjcat (default: False.)

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PDB, GRO, XTC

In [3]: u1 = mda.Universe(GRO, XTC, XTC, all_coordinates=True)

In [4]: u1.trajectory
Out[4]: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.xtc, adk_oplsaa.xtc with 21 frames of 47681 atoms>

In [5]: u2 = mda.Universe(GRO, XTC, XTC, all_coordinates=False, continuous=False)

In [6]: print([int(ts.time) for ts in u2.trajectory])
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

The following options modify the created Universe:

guess_bonds: whether to guess connectivity between atoms. (default: False)
vdwradii: a dictionary of {element: radius} of van der Waals’ radii for use in guessing bonds.
transformations: a function or list of functions for on-the-fly trajectory transformation.
in_memory: whether to load coordinates into memory (default: False)
in_memory_step: only read every nth frame into an in-memory representation. (default: 1)
is_anchor: whether to consider this Universe when unpickling AtomGroups (default: True)
anchor_name: the name of this Universe when unpickling AtomGroups (default: None, automatically generated)

You can also pass in keywords for parsing the topology or coordinates. For example, many file formats do not specify the timestep for their trajectory. In these cases, MDAnalysis assumes that the default timestep is 1 ps. If this is incorrect, you can pass in a dt argument to modify the timestep. This does not modify timesteps for formats that include time information.

In [7]: from MDAnalysis.tests.datafiles import PRM, TRJ

In [8]: default_timestep = mda.Universe(PRM, TRJ)

In [9]: default_timestep.trajectory.dt
Out[9]: 1.0

In [10]: user_timestep = mda.Universe(PRM, TRJ, dt=5)  # ps

In [11]: user_timestep.trajectory.dt
Out[11]: 5

Constructing from AtomGroups¶

A new Universe can be created from one or more AtomGroup instances with Merge(). The AtomGroup instances can come from different Universes, meaning that this is one way to concatenate selections from different datasets.

For example, to combine a protein, ligand, and solvent from separate PDB files:

u1 = mda.Universe("protein.pdb")
u2 = mda.Universe("ligand.pdb")
u3 = mda.Universe("solvent.pdb")
u = Merge(u1.select_atoms("protein"), u2.atoms, u3.atoms)
u.atoms.write("system.pdb")

Constructing from scratch¶

A Universe can be constructed from scratch with Universe.empty. There are three stages to this process:

Create the blank Universe with specified number of atoms. If coordinates, set trajectory=True.

Add topology attributes such as atom names.

(Optional) Load coordinates.

For example, to construct a universe with 6 atoms in 2 residues:

In [12]: u = mda.Universe.empty(6, 2, atom_resindex=[0, 0, 0, 1, 1, 1], trajectory=True)

In [13]: u.add_TopologyAttr('masses')

In [14]: coordinates = np.empty((1000,  # number of frames
   ....:                         u.atoms.n_atoms,
   ....:                         3))
   ....: 

In [15]: u.load_new(coordinates, order='fac')
Out[15]: <Universe with 6 atoms>

See this notebook tutorial for more information.

Constructing from SMILES¶

Thanks to the interoperability with RDKit, a Universe can be constructed from a SMILES string with Universe.from_smiles. Since the SMILES string usually contains information about heavy atoms only, this method automatically adds the appropriate number of hydrogens on all atoms, by setting addHs=True. It is also possible to generate coordinates with generate_coordinates=True. This will allow RDKit to generate one or more conformers for the given molecule. For example, setting numConfs=10 will generate 10 conformers, and each conformer will be read as a frame by the Universe.

Here is a minimal example to create a Universe from one conformer of ethanol:

In [16]: u = mda.Universe.from_smiles("CCO")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-32eb0824c794> in <module>
----> 1 u = mda.Universe.from_smiles("CCO")

AttributeError: type object 'Universe' has no attribute 'from_smiles'

In [17]: u
Out[17]: <Universe with 6 atoms>

In [18]: u.trajectory
Out[18]: <MemoryReader with 1000 frames of 6 atoms>

Internally, calling the Universe.from_smiles method does the following:

In [19]: from rdkit import Chem

In [20]: from rdkit.Chem import AllChem

In [21]: mol = Chem.MolFromSmiles("CCO") # ethanol

In [22]: mol = Chem.AddHs(mol)

In [23]: confids = AllChem.EmbedMultipleConfs(mol, numConfs=1) # 1 conformer

In [24]: u = mda.Universe(mol, format="RDKIT")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/core/_get_readers.py in get_parser_for(filename, format)
    228     try:
--> 229         return _PARSERS[format]
    230     except KeyError:

KeyError: 'RDKIT'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/lib/util.py in format_from_filename_extension(filename)
    981     try:
--> 982         root, ext = get_ext(filename)
    983     except:

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/lib/util.py in get_ext(filename)
    924     """
--> 925     root, ext = os.path.splitext(filename)
    926     if ext.startswith(os.extsep):

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/posixpath.py in splitext(p)
    117 def splitext(p):
--> 118     p = os.fspath(p)
    119     if isinstance(p, bytes):

TypeError: expected str, bytes or os.PathLike object, not Mol

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-24-9655b02def0e> in <module>
----> 1 u = mda.Universe(mol, format="RDKIT")

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/core/universe.py in __init__(self, *args, **kwargs)
    287                 else:
    288                     self.filename = args[0]
--> 289                 parser = get_parser_for(self.filename, format=topology_format)
    290                 try:
    291                     with parser(self.filename) as p:

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/core/_get_readers.py in get_parser_for(filename, format)
    230     except KeyError:
    231         try:
--> 232             rdr = get_reader_for(filename)
    233         except ValueError:
    234             raise ValueError(

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/core/_get_readers.py in get_reader_for(filename, format)
     93         else:
     94             # else let the guessing begin!
---> 95             format = util.guess_format(filename)
     96     format = format.upper()
     97     try:

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/lib/util.py in guess_format(filename)
   1030         # Note: at the moment the upper-case extension *is* the format specifier
   1031         # and list of filenames is handled by ChainReader
-> 1032         format = (format_from_filename_extension(filename)
   1033                   if not iterable(filename) else 'CHAIN')
   1034 

~/checkouts/readthedocs.org/user_builds/mdauserguide/conda/pr69/lib/python3.8/site-packages/MDAnalysis/lib/util.py in format_from_filename_extension(filename)
    982         root, ext = get_ext(filename)
    983     except:
--> 984         raise TypeError(
    985             "Cannot determine file format for file '{0}'.\n"
    986             "           You can set the format explicitly with "

TypeError: Cannot determine file format for file '<rdkit.Chem.rdchem.Mol object at 0x7f38699686c0>'.
           You can set the format explicitly with 'Universe(..., format=FORMAT)'.

In [25]: u
Out[25]: <Universe with 6 atoms>

In [26]: u.trajectory
Out[26]: <MemoryReader with 1000 frames of 6 atoms>

Guessing topology attributes¶

MDAnalysis can guess two kinds of information. Sometimes MDAnalysis guesses information instead of reading it from certain file formats, which can lead to mistakes such as assigning atoms the wrong element or charge. See the available topology parsers for a case-by-case breakdown of which atom properties MDAnalysis guesses for each format. See Guessing for how attributes are guessed, and Default values and attribute levels for which attributes have default values.

Universe properties and methods¶

A Universe holds master groups of atoms and topology objects:

atoms: all Atoms in the system, in an AtomGroup.

residues: all Residues in the system

segments: all Segments in the system

bonds: all bond TopologyObjects in the system

angles: all angle TopologyObjects in the system

dihedrals: all dihedral TopologyObjects in the system

impropers: all improper TopologyObjects in the system

Residues and Segments are chemically meaningful groups of Atoms.

Modifying a topology is typically done through the Universe, which contains several methods for adding properties:

add_TopologyAttr()

add_Residue()

add_Segment()

See Topology attributes for more information on which topology attributes can be added, and examples/constructing_universe.ipynb for examples on adding attributes and Segments.