molpx.generate.sample

molpx.generate.sample(MD_trajectories, MD_top, projected_trajectories, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)

Returns a sample of molecular geometries and their positions in the projected space

Parameters:
  • MD_trajectories (list of strings) – Filenames containing the trajectory data. Any extension that mdtraj can read is accepted. There is an untested input mode in which the user passes mdtraj.Trajectory objects directly
  • MD_top (str or mdtraj.Topology) – Topology filename, or an mdtraj.Topology object passed directly
  • projected_trajectories ((lists of) strings or (lists of) numpy ndarrays of shape (n_frames, n_dims)) – Time series of the projection(s) to be explored. If these have been computed externally, you can provide .npy filenames or readable ASCII files (.dat, .txt etc). Alternatively, you can feed in your own clustering object. NOTE: molpx assumes that there is no time column.
  • proj_idxs (int or list of ints, default is [0, 1]) – Selection of projection indices (zero-indexed) to visualize. If proj_idxs is None, all projections are used, i.e. proj_idxs = range(n_projs). Otherwise, n_projs is ignored and proj_dim is set automatically
  • n_points (int, default is 100) – Number of points along the projection path. The higher this number, the more finely the projected coordinate is resolved, at the cost of more computational effort. It’s a trade-off parameter
  • n_geom_samples (int, default is 1) – For each of the n_points along the projection path, n_geom_samples geometries will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path, but the longer it takes to compute. This is a trade-off parameter between how smooth the transitions between geometries can be and how long it takes to generate the sample
  • keep_all_samples (boolean, default is False) – By default, once the closest-to-ref geometry has been kept for each point, the other sampled geometries are discarded and the output sample contains only n_points geometries. Set this to True to keep all sampled geometries instead. A typical use case is when n_points is low and several representatives per cluster center are more informative than a single one
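
As a minimal sketch of the file formats described for projected_trajectories, the following writes a toy projection to a .npy file and to a plain ASCII file without a time column, then reads both back. The file names and the array contents are made up for illustration. Only NumPy is used, and molpx itself is not called.

```python
import os
import tempfile

import numpy as np

# Toy projected trajectory: 50 frames, 2 projected dimensions, no time column.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))

tmpdir = tempfile.mkdtemp()
npy_file = os.path.join(tmpdir, "proj_traj_0.npy")  # hypothetical file name
dat_file = os.path.join(tmpdir, "proj_traj_0.dat")  # hypothetical file name

np.save(npy_file, Y)     # binary .npy file
np.savetxt(dat_file, Y)  # readable ASCII, one frame per row

# Both files load back as arrays of shape (n_frames, n_dims).
Y_npy = np.load(npy_file)
Y_dat = np.loadtxt(dat_file)

# If an externally produced array carries a leading time column,
# strip it before handing the data over, since no time column is expected.
with_time = np.column_stack([np.arange(len(Y)), Y])
Y_no_time = with_time[:, 1:]
```

Either file name, or the in-memory arrays themselves, would then be candidates for the projected_trajectories argument.
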
Returns:

  • pos – ndarray with the positions of the sample
  • geom_smpl – mdtraj.Trajectory object with the sampled geometries
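
To illustrate how n_points, n_geom_samples and keep_all_samples interact, here is a simplified NumPy sketch. It is not molpx's implementation: the helper sample_along_projection is hypothetical, it places the points on a regular grid along one projected coordinate, and it uses distance in projected space as a stand-in for the RMSD between geometries, since no molecular geometries are available in this toy setting. It returns positions and frame indices rather than an mdtraj.Trajectory.

```python
import numpy as np


def sample_along_projection(Y, proj_idx=0, n_points=100,
                            n_geom_samples=1, keep_all_samples=False):
    """Hypothetical helper, NOT part of molpx.

    Y is an ndarray of shape (n_frames, n_dims).
    Returns (pos, frame_idxs): sampled positions and their frame indices.
    """
    y = Y[:, proj_idx]
    # n_points regularly spaced reference values along the chosen coordinate
    refs = np.linspace(y.min(), y.max(), n_points)

    kept = []
    prev = None
    for ref in refs:
        # The n_geom_samples frames closest to this reference value
        cands = np.argsort(np.abs(y - ref), kind="stable")[:n_geom_samples]
        if keep_all_samples:
            kept.extend(cands)
            continue
        if prev is None:
            best = cands[0]
        else:
            # Stand-in for a minRMSD criterion: keep the candidate closest
            # (in the full projected space) to the previously kept frame
            dists = np.linalg.norm(Y[cands] - Y[prev], axis=1)
            best = cands[np.argmin(dists)]
        kept.append(best)
        prev = best

    frame_idxs = np.asarray(kept, dtype=int)
    return Y[frame_idxs], frame_idxs


rng = np.random.default_rng(1)
Y = rng.normal(size=(500, 2))

# One frame kept per point: 10 positions
pos, idxs = sample_along_projection(Y, n_points=10, n_geom_samples=5)

# All candidates kept: 10 * 5 = 50 positions
pos_all, idxs_all = sample_along_projection(
    Y, n_points=10, n_geom_samples=5, keep_all_samples=True)
```

In this sketch a larger n_geom_samples gives the selection step more candidates per point, which is what allows a smoother path at the price of more work per point.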