molpx.generate.sample
molpx.generate.sample(MD_trajectories, MD_top, projected_trajectories, atom_selection=None, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)
Returns a sample of molecular geometries and their positions in the projected space.
Parameters:
- MD_trajectories (list of strings) – Filenames (any extension that mdtraj can read is accepted) containing the trajectory data. There is an untested input mode in which the user passes mdtraj.Trajectory objects directly.
- MD_top (str with the topology filename, or an mdtraj.Topology object directly) –
- projected_trajectories ((list of) strings or (list of) numpy ndarrays of shape (n_frames, n_dims)) – Time series of the projection(s) to be explored. You can provide .npy filenames or readable ASCII files (.dat, .txt, etc.). Alternatively, you can pass in your own PyEMMA clustering object. NOTE: molpx assumes that there is no time column.
- atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string that mdtraj.Topology.select understands, or the iterable of integers directly. If MD_trajectories is already a (list of) md.Trajectory objects, the atom slicing can take place before calling this method.
- proj_idxs (int or list of ints, default is [0, 1]) – Selection of projection indices (zero-indexed) to visualize. If proj_idxs is None, the behaviour is proj_idxs = range(n_projs). However, if proj_idxs is not None, then n_projs is ignored and proj_dim is set automatically.
- n_points (int, default is 100) – Number of points along the projection path. The higher this number, the more finely the projected coordinate is resolved, at the cost of more computational effort. It is a trade-off parameter.
- n_geom_samples (int, default is 1) – For each of the n_points along the projection path, n_geom_samples will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path, but the longer the path takes to compute. This is a trade-off between how smooth the transitions between geometries can be and how long it takes to generate the sample.
- keep_all_samples (boolean, default is False) – In principle, once the closest-to-ref geometry has been kept, the other geometries are discarded, and the output sample contains only n_points geometries. There are, however, special cases where the user might want to keep all sampled geometries. A typical use case is when n_points is low and many representatives per cluster center are much more informative than the other way around. This is an advanced feature that other methods of molPX use internally for generating overlays. Be aware that it changes the return type of geom_smpl from the default (an mdtraj.Trajectory with n_points frames) to a list of length n_geom_samples, each element of which is an mdtraj.Trajectory object with n_points frames.
- proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories. If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided; otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different numbers of frames.
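The proj_stride requirement can be checked before calling the method. The following is a minimal sketch in plain Python, assuming that a strided projection keeps frames 0, proj_stride, 2*proj_stride, and so on. The helper names expected_proj_frames and check_frame_counts are hypothetical and not part of molPX, and the check molPX performs internally may differ in detail.

```python
def expected_proj_frames(n_md_frames, proj_stride=1):
    """Number of frames left when every proj_stride-th MD frame is kept
    (assumption: striding starts at frame 0)."""
    return len(range(0, n_md_frames, proj_stride))


def check_frame_counts(n_md_frames, n_proj_frames, proj_stride=1):
    """Raise ValueError if the projected trajectory does not have the number
    of frames implied by the MD trajectory and the stride."""
    expected = expected_proj_frames(n_md_frames, proj_stride)
    if expected != n_proj_frames:
        raise ValueError(
            "MD trajectory has %d frames (stride %d gives %d), "
            "but the projection has %d frames"
            % (n_md_frames, proj_stride, expected, n_proj_frames)
        )
    return True


# MD frames saved every 5 ps, projection saved every 50 ps
md_interval_ps = 5
proj_interval_ps = 50
proj_stride = proj_interval_ps // md_interval_ps

# A 1000-frame MD trajectory matches a 100-frame projection at this stride
stride_ok = check_frame_counts(1000, 100, proj_stride)
```

Running the same check with the default proj_stride=1 on these frame counts raises the ValueError, which mirrors the mismatch described for proj_stride.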
Returns:
- pos – ndarray with the positions of the sample
- geom_smpl – sampled geometries. Can be of two types:
  - default: mdtraj.Trajectory with n_points frames
  - if keep_all_samples = True: list of length n_geom_samples. Each element is an mdtraj.Trajectory object with n_points frames.
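Because the type of geom_smpl depends on keep_all_samples, calling code may want to normalize it. The sketch below shows one way to do that, together with a call that uses only the keyword arguments from the signature above. The names as_trajectory_list and run_sample are hypothetical helpers, the argument values are arbitrary, and the unpacking into pos and geom_smpl assumes the two return values listed under Returns with return_data left at its default.

```python
def as_trajectory_list(geom_smpl):
    """Return geom_smpl as a list, whether it is a single trajectory
    (default) or already a list (keep_all_samples=True)."""
    if isinstance(geom_smpl, (list, tuple)):
        return list(geom_smpl)
    return [geom_smpl]


def run_sample(traj_files, top_file, proj_files, keep_all=False):
    """Call molpx.generate.sample and always return a list of trajectories."""
    # Imported lazily so that as_trajectory_list works without molPX installed
    import molpx

    pos, geom_smpl = molpx.generate.sample(
        traj_files,
        top_file,
        proj_files,
        proj_idxs=[0],            # sample along the first projection only
        n_points=50,
        n_geom_samples=5,
        keep_all_samples=keep_all,
    )
    return pos, as_trajectory_list(geom_smpl)
```

With this wrapper, downstream code can iterate over the returned list without branching on keep_all_samples.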
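To make the roles of n_points and n_geom_samples concrete, here is a toy illustration in plain Python. It is not molPX's algorithm: it assumes a one-dimensional projection, places n_points evenly spaced positions between its minimum and maximum, and picks the n_geom_samples frames whose projected value is closest to each position. The function name sample_frames is hypothetical.

```python
def sample_frames(proj, n_points=5, n_geom_samples=2):
    """For each of n_points evenly spaced positions along a 1D projection,
    return the indices of the n_geom_samples closest frames."""
    lo, hi = min(proj), max(proj)
    if n_points == 1:
        centers = [(lo + hi) / 2.0]
    else:
        step = (hi - lo) / (n_points - 1)
        centers = [lo + i * step for i in range(n_points)]

    samples = []
    for center in centers:
        # Rank all frames by distance to this position and keep the closest
        ranked = sorted(range(len(proj)), key=lambda i: abs(proj[i] - center))
        samples.append(ranked[:n_geom_samples])
    return centers, samples


# Projected value of each frame of a short, made-up trajectory
proj = [0.0, 0.1, 0.9, 1.0, 0.5, 0.45]
centers, samples = sample_frames(proj, n_points=3, n_geom_samples=2)
# samples[k] lists the frame indices retrieved for the k-th point on the path
```

Raising n_points adds positions along the path, while raising n_geom_samples adds candidate frames per position, which is where the extra cost described for both parameters comes from.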