molpx.generate
This module contains methods that generate the objects that the methods of molpx.visualize need in order to work.
molpx.generate.projection_paths(…[, …]) – Return a path along a given projection.
molpx.generate.sample(MD_trajectories, …) – Returns a sample of molecular geometries and their positions in the projected space.
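Both functions accept projected_trajectories either as arrays or as filenames (.npy, or ASCII files such as .dat and .txt) with no time column. The sketch below only illustrates that input convention, i.e. the expected (n_frames, n_dims) layout; load_projection is a hypothetical helper, not part of molpx.

```python
import numpy as np


def load_projection(path):
    """Return one projected trajectory as an (n_frames, n_dims) array.

    Hypothetical helper, not part of molpx. Files ending in .npy are read
    with np.load; anything else is parsed as whitespace-separated ASCII.
    No time column is expected: every column is a projected coordinate.
    """
    if str(path).endswith(".npy"):
        data = np.load(path)
    else:
        data = np.loadtxt(path)
    if data.ndim == 1:
        # A single projected coordinate comes back as 1D; make it a column
        data = data[:, np.newaxis]
    return data
```

If an ASCII file does contain a time column, it has to be dropped first (for example with data[:, 1:]), since every remaining column is treated as a projected coordinate.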

molpx.generate.projection_paths(MD_trajectories, MD_top, projected_trajectories, n_projs=1, proj_dim=2, proj_idxs=None, n_points=100, n_geom_samples=100, proj_stride=1, history_aware=True, verbose=False, minRMSD_selection='backbone')
Return a path along a given projection. More info on what this means exactly will follow soon.
Parameters:
 MD_trajectories (str, or list of strings with the filename(s) of the molecular dynamics (MD) trajectories) – Any file extension that mdtraj can read (.xtc, .dcd, etc.) is accepted. Alternatively, a single mdtraj.Trajectory object or a list of them can be given as input.
 MD_top (str with the topology filename, or directly an mdtraj.Topology object) – Topology matching the MD_trajectories.
 projected_trajectories (str with a filename, or numpy ndarray of shape (n_frames, n_dims)) – Time series with the projection(s) to be explored. If these have been computed externally, you can provide .npy filenames or readable ASCII files (.dat, .txt, etc.). NOTE: molpx assumes that there is no time column.
 n_projs (int, default is 1) – Number of projection paths to generate. If the input projected_trajectories are n-dimensional, in principle up to n paths can be generated.
 proj_dim (int, default is 2) – Dimensionality of the space in which distances will be computed.
 proj_idxs (int, default is None) – Selection of projection idxs (zero-indexed) to visualize. The default behaviour is that proj_idxs = range(n_projs). However, if proj_idxs != None, then n_projs is ignored and proj_dim is set automatically.
 n_points (int, default is 100) – Number of points along the projection path. The higher this number, the finer the resolution of the projected coordinate, at the cost of more computational effort. It's a trade-off parameter.
 n_geom_samples (int, default is 100) – For each of the n_points along the projection path, n_geom_samples geometries will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path, but also the longer it takes for the path to be computed.
 proj_stride (int, default is 1) – The stride of the projected_trajectories relative to the MD_trajectories. This plays a role particularly if projected_trajectories is already strided (because the user is holding it in memory) but the MD data on disk has not been strided.
 history_aware (bool, default is True) – The path-searching algorithm can either minimize the distances between adjacent points along the path, or minimize the distance between each point and the mean value of all the points chosen up to that point. Use this parameter to avoid a situation in which the path gets “derailed” because an outlier is chosen at a given point.
 verbose (bool, default is False) – The verbosity level
 minRMSD_selection (str, default is 'backbone') – When computing minRMSDs between a given point and adjacent candidates, use this string to select the atoms that will be considered. Check mdtraj’s selection language here http://mdtraj.org/latest/atom_selection.html
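The minRMSD used to compare candidate geometries is the root-mean-square deviation after optimal rigid-body superposition. Below is a minimal NumPy sketch of that quantity (the Kabsch algorithm) for two (n_atoms, 3) coordinate arrays, e.g. the atoms picked by minRMSD_selection. It illustrates the concept only and is not molpx's code; in practice mdtraj.rmsd performs this superposition for whole trajectories.

```python
import numpy as np


def min_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: remove translation by centering both sets, then find
    the rotation minimizing the RMSD via an SVD of the 3x3 covariance matrix.
    """
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q  # rotate P onto Q, then compare atom by atom
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```

Because translation and rotation are removed first, two geometries that differ only by a rigid-body motion have a minRMSD of zero.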
Returns:
 paths_dict – dictionary of dictionaries containing the projection paths, indexed as paths_dict[idxs][type_of_path], where:
  idxs represents the index of the projected coordinate ([0], [1]…)
  type_of_path is either “min_rmsd” or “min_disp”
 What the dictionary actually contains:
  paths_dict[idxs][type_of_path]["proj"]: ndarray of shape (n_points, proj_dim) with the coordinates of the projection along the path
  paths_dict[idxs][type_of_path]["geom"]: mdtraj.Trajectory with the geometries along the path
 idata – list of ndarrays with the data in projected_trajectories
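The difference between the two path-search strategies described under history_aware can be pictured with a greedy selection over candidates. This is a hypothetical illustration, not molpx's implementation: Euclidean distance stands in for the minRMSD between geometries, each path point has a small array of candidate feature vectors, and the path is assumed to start from the first candidate of the first point.

```python
import numpy as np


def greedy_path(candidates, history_aware=True):
    """Pick one candidate per path point, greedily.

    candidates: list of (n_candidates, n_features) arrays, one per point
    along the path. Returns the index of the chosen candidate at each point.
    """
    chosen_idx = [0]
    chosen = [candidates[0][0]]
    for cands in candidates[1:]:
        if history_aware:
            # Compare against the mean of everything chosen so far
            ref = np.mean(chosen, axis=0)
        else:
            # Compare only against the previously chosen point
            ref = chosen[-1]
        dists = np.linalg.norm(cands - ref, axis=1)
        best = int(np.argmin(dists))
        chosen_idx.append(best)
        chosen.append(cands[best])
    return chosen_idx
```

For example, with 1D candidates at 0, then {4, 10}, then {5, 1.5}, the previous-point rule picks 5 at the last step (closest to 4), whereas the running-mean rule picks 1.5 (closest to the mean 2), so a single far-off choice has less pull on the rest of the path.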

molpx.generate.sample(MD_trajectories, MD_top, projected_trajectories, atom_selection=None, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)
Returns a sample of molecular geometries and their positions in the projected space.
Parameters:
 MD_trajectories (list of strings) – Filenames (any extension that mdtraj can read is accepted) containing the trajectory data. There is an untested input mode where the user passes mdtraj.Trajectory objects directly.
 MD_top (str with the topology filename, or directly an mdtraj.Topology object) – Topology matching the MD_trajectories.
 projected_trajectories ((list of) strings or (list of) numpy ndarrays of shape (n_frames, n_dims)) – Time series with the projection(s) to be explored. You can provide .npy filenames or readable ASCII files (.dat, .txt, etc.). Alternatively, you can feed in your own PyEMMA clustering object. NOTE: molpx assumes that there is no time column.
 atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string that mdtraj.Topology.select can understand, or directly an iterable of integers. If MD_trajectories is already a (list of) md.Trajectory objects, the atom slicing can take place before calling this method.
 proj_idxs (list of ints, default is [0, 1]) – Selection of projection idxs (zero-indexed) to visualize.
 n_points (int, default is 100) – Number of points along the projection path. The higher this number, the finer the resolution of the projected coordinate, at the cost of more computational effort. It's a trade-off parameter.
 n_geom_samples (int, default is 1) – For each of the n_points along the projection path, n_geom_samples geometries will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path, but also the longer it takes for the path to be computed. This is a trade-off parameter between how smooth the transitions between geometries can be and how long it takes to generate the sample.
 keep_all_samples (boolean, default is False) – In principle, once the closest-to-ref geometry has been kept, the other geometries are discarded, and the output sample contains only n_points geometries. There are, however, special cases where the user might want to keep all sampled geometries. A typical use case is when n_points is low and many representatives per cluster center will be much more informative than the other way around. This is an advanced feature that other methods of molPX use internally for generating overlays. Be aware that it changes the return type of geom_smpl from the default (an mdtraj.Trajectory with n_points frames) to a list of length n_geom_samples, where each element is an mdtraj.Trajectory object of n_points frames.
 proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories. If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided; otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different numbers of frames.
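The proj_stride bookkeeping can be pictured as mapping projected frame i to MD frame i * proj_stride. A minimal sketch with a hypothetical helper (not a molpx function), assuming the projection was computed on the MD frames [::proj_stride] starting at frame 0:

```python
def strided_frame_indices(n_md_frames, n_proj_frames, proj_stride=1):
    """Map each projected frame to the MD frame it was computed from.

    Hypothetical helper: assumes the projected trajectory corresponds to
    md_frames[::proj_stride], starting at MD frame 0.
    """
    md_idxs = list(range(0, n_md_frames, proj_stride))
    if len(md_idxs) != n_proj_frames:
        # Same kind of mismatch that a wrong proj_stride produces
        raise ValueError(
            f"{n_md_frames} MD frames with proj_stride={proj_stride} give "
            f"{len(md_idxs)} frames, but the projection has {n_proj_frames}"
        )
    return md_idxs
```

With the numbers from the example above (MD frames every 5 ps, projection every 50 ps), 1000 MD frames and 100 projected frames only line up with proj_stride=10.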
Returns:
 pos – ndarray with the positions of the sample
 geom_smpl – sampled geometries. Can be of two types:
  default: mdtraj.Trajectory with n_points frames
  if keep_all_samples = True: list of length n_geom_samples. Each element is an mdtraj.Trajectory object of n_points frames.
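To make the shapes of pos and geom_smpl concrete, the sketch below samples frames along a single projected coordinate and returns frame indices where molpx would return mdtraj geometries. The evenly spaced grid and the nearest-frame criterion are assumptions made for illustration, not a description of molpx's actual sampling scheme, and sample_along_coordinate is a hypothetical name.

```python
import numpy as np


def sample_along_coordinate(proj, n_points=100, n_geom_samples=1,
                            keep_all_samples=False):
    """Sample frames close to an evenly spaced grid on one projected coordinate.

    Hypothetical helper, not part of molpx. proj is a 1D array with one
    projected value per frame. Returns (pos, frames): pos holds the projected
    values of the closest frame at each grid point; frames is an array of
    n_points frame indices, or, with keep_all_samples=True, a list of
    n_geom_samples such arrays (one per rank of closeness).
    """
    proj = np.asarray(proj, dtype=float)
    grid = np.linspace(proj.min(), proj.max(), n_points)
    # Rank all frames by their distance to each grid point: (n_points, n_frames)
    order = np.argsort(np.abs(proj[np.newaxis, :] - grid[:, np.newaxis]), axis=1)
    nearest = order[:, :n_geom_samples]  # (n_points, n_geom_samples)
    pos = proj[nearest[:, 0]]
    if keep_all_samples:
        return pos, [nearest[:, j] for j in range(n_geom_samples)]
    return pos, nearest[:, 0]
```

The default branch mirrors the documented default (one entry per point, n_points in total), while keep_all_samples=True mirrors the list of length n_geom_samples whose elements each cover all n_points.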