molpx.visualize

The core functionality is to link two interative figures, fig1 and fig2, inside an IPython/Jupyter notebook, so that an action in fig1 (e.g.a click of the mouse or a slide of a slidebar) will trigger an event in fig2 (e.g. a frame update or point moved) and vice versa. Usually, these two figures contain representations from:

  • molecules: an NGLview widget showing one (or more) molecular structure(s) that a particular value of the coordinate(s) is associated with and
  • projected coordinates: a matplotlib figure showing the projected coordinates (e.g. TICs or PCs or any other), \({Y_0, ..., Y_N}\), either as a 2D histogram, \(PDF(Y_i, Y_j)\) or as trajectory views \({Y_0(t), ...Y_N(t)}\)

You are strongly encouraged to check NGLview’s documentation, since its functionalities extend beyond the scope of this package and the molecular visualization universe is rich and complex (unlike this module).

The methods offered by this module are:

molpx.visualize.FES(MD_trajectories, MD_top, …) Return a molecular visualization widget connected with a free energy plot.
molpx.visualize.sample(positions, geom, ax) Visualize the geometries in geom according to the data in positions on an existing matplotlib axes ax
molpx.visualize.traj(MD_trajectories, …[, …]) Link one or many projected trajectories, [Y_0(t), Y_1(t)…], with the MD_trajectories that originated them.
molpx.visualize.correlations(correlation_input) Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features.
molpx.visualize.feature(feat, widget[, …]) Provide a visual representation of a PyEMMA feature.
molpx.visualize.FES(MD_trajectories, MD_top, projected_trajectories, proj_idxs=[0, 1], nbins=100, n_sample=100, proj_stride=1, weights=None, proj_labels='proj', n_overlays=1, atom_selection=None, **sample_kwargs)

Return a molecular visualization widget connected with a free energy plot.

Parameters:
  • MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) – Any file extension that mdtraj (.xtc, .dcd, etc) can read is accepted. Alternatively, a single mdtraj.Trajectory object or a list of them can be given as input.
  • MD_top (str to topology filename or directly an mdtraj.Topology object) –
  • projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
  • the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) – filenames (.dat, .txt etc)
  • NOTE (molpx assumes that there is no time column.) –
  • proj_idxs (int, list or ndarray) – Selection of projection idxs (zero-idxd) to visualize.
  • nbins (int, default 100) – The number of bins per axis to used in the histogram (FES)
  • n_sample (int, default is 100) – The number of geometries that will be used to represent the FES. The higher the number, the higher the spatial resolution of the “click”-action. (And the longer it will take to generate the plot)
  • proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided, otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different number of frames.
  • weights (iterable of floats (or list thereof) each of shape (n_frames, 1) or (n_frames)) – The sample weights can come from a metadynamics run or an MSM-object, e.g. via the method pyemma.msm.BayesianMSM.trajectory_weights. Has to have the same length as the projected_trajectories
  • proj_labels (either string or list of strings or (experimental) PyEMMA featurizer) – The projection plots will get this parameter for labeling their yaxis. If a str is provided, that will be the base name proj_labels='%s_%u'%(proj_labels,ii) for each projection. If proj_labels is a list, the list will be used as is. If there are not enough labels, the module will complain
  • n_overlays (int, default is 1) – The number of structures that will be simultaneously displayed as overlays for every sampled point of the FES. This parameter can seriously slow down the method, it is currently limited to a maximum value of 50
  • atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string that mdtraj.Topology.select could understand or directly the iterable of integers. If MD_trajectories is already a (list of) mdtraj.Trajectory objects, the atom-slicing can be done by the user place before calling this method.
  • sample_kwargs (dictionary of named arguments, optional) – named arguments for the function molpx.visualize.sample. Non-expert users can safely ignore this option. Examples are superpose or proj_idxs
Returns:

A molpx._linkutils.MolPXHBox-object.

It contains the NGLWidget and the AxesWidget (which is responsible for the interactive figure). It is child-class of the ipywidgets.HBox-class and has been monkey-patched to have the following extra attributes so that the user has access to all the information being displayed.

linked_axes :

list with all the pylab.Axis-objects contained in the widgetbox

linked_ax_wdgs :

list with all the matplotlib.widgets.AxesWidget`objects contained in the :obj:`widgetbox

linked_figs :

list with all the pylab.Figure-objects contained in the widgetbox

linked_ngl_wdgs :

list with all the nglview.NGLWidget-objects contained in the widgetbox

linked_data_arrays :

list with all the numpy ndarrays contained in the widgetbox

linked_mdgeoms:

list with all the mdtraj.Trajectory-objects contained in the widgetbox

Return type:

widgetbox

molpx.visualize.correlations(correlation_input, geoms=None, proj_idxs=None, feat_name=None, widget=None, proj_color_list=None, n_feats=1, verbose=False, featurizer=None)

Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features.

Parameters:
  • correlation_input (numpy ndarray or some PyEMMA objects) –
    if array :
    (m,m) correlation matrix, with a row for each feature and a column for each projection
    if PyEMMA-object :
    TICA, PCA or MDFeaturizer.
  • geoms (None or mdtraj.Trajectory, default is None) – The values of the most correlated features will be returned for the geometries in this object. If widget is left to its default, None, correlations will create a new widget and try to show the most correlated features on top of the widget.
  • widget (None or nglview.NGLWidget, default is None) –

    Provide an already existing widget to visualize the correlations on top of. This is only for expert use, because no checks are done to see if correlation_input and the geometry contained in the widget actually match. Use with caution.

    Note

    When objects geoms and widget are provided simultaneously, three things happen:
    • no new widget will be instantiated
    • the display of features will be on top of whatever geometry widget contains
    • the value of the features is computed for the geometry of geom

    Use with caution and clean bookkeeping!

  • proj_color_list (list, default is None) – projection specific list of colors to provide the representations with. The default None yields blue. In principle, the list can contain one color for each projection (= as many colors as len(proj_idxs) but if your list is short it will just default to the last color. This way, proj_color_list=[‘black’] will paint all black regardless len(proj_idxs). Input anything that yields matplotlib.colors.is_color_like == True
  • proj_idxs (None, or int, or iterable of integers, default is None) – The indices of the projections for which the most correlated feture will be returned. If none it will default to the dimension of the correlation_input object
  • feat_name (None or str, default is None) – The prefix with which to prepend the labels of the most correlated features. If left to None, the feature description found in correlation_input will be used, if available
  • n_feats (int, default is 1) – Number of most correlated features to return for each projection
  • featurizer (None or MDFeaturizer, default is None) – If correlation_input is not an MDFeaturizer itself, or doesn’t have a data_producer attribute, the user can input one here. If correlation_input and featurizer are both MDFeaturizer-objects, featurizer will be ignored.
  • verbose (Bool, default is True) – print to standard output
Returns:

  • corr_dict and ngl_wdg

  • corr_dict – Dictionary with items:

    idxs :

    List of length len(proj_idxs) with lists of length n_feat with the idxs of the most correlated features

    vals :

    List of length len(proj_idxs) with lists of length n_feat with the corelation values of the most correlated features

    labels :

    List of length len(proj_idxs) with lists of length n_feat with the labels of the most correlated features

    feats :

    If an mdtraj.Trajectory is passed as an geom argument, the most correlated features will be evaluated and returned as list of length len(proj_idxs). Each element in th elist is an arrays with shape (geom.n_frames, n_feats)

    atom_idxs :

    List of length len(proj_idxs) each with an nd.array of shape (nfeat, m), where m is the number of atoms needed to describe each feature (1 of cartesian, 2 for distances, 3 for angles, 4 for dihedrals)

    info :

    List of length len(proj_idxs) with lists of length n_feat with strings describing the correlations

  • ngl_wdgnglview.NGLWidget with the most correlated features (distances, angles, dihedrals, positions) visualized on top of it.

molpx.visualize.feature(feat, widget, idxs=0, color_list=None, **kwargs)

Provide a visual representation of a PyEMMA feature. PyEMMA’s features are found as a list of the MDFeaturizers’s active_features attribute

Parameters:
  • featurizer (py:obj:_MDFeautrizer) – A PyEMMA MDFeaturizer object (either a feature or a featurizer, works with both)
  • widget (None or nglview widget) – Provide an already existing widget to visualize the correlations on top of. This is only for expert use, because no checks are done to see if correlation_input and the geometry contained in the widget actually match. Use with caution.
  • idxs (int or iterable of integers, default is 0) – Features can have many contributions, e.g. a distance feature can include many distances. Use this parameter to control which one of them gets represented.
  • color_list (list, default is None) – list of colors to represent each feature in feat_idxs. The default None yields blue for everything. In principle, the list can contain one color for each projection (= as many colors as len(feat_idxs) but if your list is short it will just default to the last color. This way, color_list=[‘black’] will paint all black regardless len(proj_idxs)
  • **kwargs (optional keyword arguments for _bmutils.add_atom_idxs_widget) – currently, only “radius” is left for the user to determine
  • Returns
  • --------
  • widget – the input widget with the features in idxs represented as either distances (for distance features) or “spacefill” spheres (for angular features)
molpx.visualize.sample(positions, geom, ax, plot_path=False, clear_lines=True, n_smooth=0, ngl_wdg=None, superpose=True, projection=None, n_feats=1, sticky=False, list_of_repr_dicts=None, color_list=None, **link_ax2wdg_kwargs)

Visualize the geometries in geom according to the data in positions on an existing matplotlib axes ax

Use this method when the array of positions, the geometries, the axes (and the ngl_wdg, optionally) have already been generated elsewhere.

Parameters:
  • positions (numpy nd.array of shape (n_frames, 2)) – Contains the position associated the geometries in geom. See below for more details
  • geom (mdtraj.Trajectory object or a list thereof.) – The geometries associated with the the positions. # TODO: WRITE HOW THE LISTS-LENGTHS CORRESPONDS FOR THE STICKY OPTION
  • ax (matplotlib.pyplot.Axes object) – The axes to be linked with the NGLWidget
  • plot_path (bool, default is False) – whether to draw a line connecting the positions in positions
  • clear_lines (bool, default is True) – whether to clear all the 2D objects that were previously drawn in ax
  • n_smooth (int, default is 0,) – if n_smooth > 0, the shown geometries and paths will be smoothed out by 2* n_smooth +1. See molpx._bmutils.smooth_geom for more information
  • ngl_wdg (None or an existing NGLWidget, default is None) – you can provide an already instantiated NGLWidget here (avanced use)
  • superpose (boolean, default is True) – The geometries in geom may or may not be oriented, depending on how they were generated. Since this method is mostly for visualization purposes, the default behaviour is to orient them all to maximally overlap with the most compact frame available
  • projection (object that generated the positions, default is None) –

    The projected coordinates may come from a variety of sources. When working with PyEMMA, a number of objects might have generated this projection, like a

    Makes most sense when positions where generated with molpx.generate.projection_paths, otherwise might produce rubbish. See n_feats for more info. # TODO: delete from here below? The features most correlated with the positions will be shown in the widget # TODO CHECK THIS geometries in geom, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc).

  • n_feats (int, default is 1) – If a projection is passed along, the first n_feats features that most correlate the the projected trajectories will be represented, both in form of figures of feat(t) as well as in the on top of the ngl_wdg. If projection == None, nfeats will be ignored.
  • sticky (boolean, default is False) – If set to True, the ngl_wdg will be sticky in that every click adds a new molecular structure without deleting the previous one. Left-clicks adds a structure, right-clicks deletes a structure. Particularly useful for representing more minima simultaneously.
  • color_list (None, list of len(positions), or “random”) – The colors with which the sticky frames will be plotted. A color is anything that yields matplotlib.colors.is_color_like == True “None” defaults to coloring by element. “random” randomizes the color choice
  • list_of_repr_dicts (None or list of dictionaries, default is None) – Has an effect only for sticky == True, s.t. these reps instead of the default ones are used. The dictionaries must have at least the keys ‘repr_type’ and ‘selection’. Other key-value pairs are currently ignored but, will be implemented in the future. See nglview.NGLWidget.add_representation for more info).
  • link_ax2wdg_kwargs (dictionary of named arguments, optional) – named arguments for the function molpx._linkutils.link_ax_w_pos_2_nglwidget, which is the one that internally provides the interactivity. Non-expert users can safely ignore this option.
Returns:

molpx.visualize.traj(MD_trajectories, MD_top, projected_trajectories, max_frames=10000.0, stride=1, proj_stride=1, proj_idxs=[0, 1], proj_labels='proj', plot_FES=False, weights=None, panel_height=1, sharey_traj=True, dt=1.0, tunits='frames', traj_selection=None, projection=None, n_feats=1)

Link one or many projected trajectories, [Y_0(t), Y_1(t)…], with the MD_trajectories that originated them. Optionally plot also the resulting FES.

Parameters:
  • MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –

    Any file extension that mdtraj (.xtc, .dcd etc) can read is accepted.

    Alternatively, a single mdtraj.Trajectory object or a list of them can be given as input.

  • MD_top (str to topology filename or directly mdtraj.Topology object) –
  • projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
  • the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) –
  • txt etc) ((dat,) –
  • NOTE (molpx assumes that there is no time column.) –
  • max_frames (int, default is 1000) – If the trajectoy is longer than this, stride to this length (in frames)
  • stride (int, default is 1) – Stride value in case of large datasets. In case of having MD_trajectories and projected_trajectories in memory (and not on disk) the stride can take place also before calling this method.
  • proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided, otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different number of frames.
  • proj_idxs (iterable of ints, default is [0,1]) – Indices of the projected coordinates to use in the various representations
  • proj_labels (either string or list of strings) – The projection plots will get this paramter for labeling their yaxis. If a str is provided, that will be the base name proj_labels=’%s_%u’%(proj_labels,ii) for each projection. If a list, the list will be used. If not enough labels are there the module will complain
  • plot_FES (bool, default is False) – Plot (and interactively link) the FES as well
  • weights (ndarray(n_frames), default = None) – sample weights. By default all samples have the same weight (used for FES calculation only)
  • panel_height (int, default 1) – Height, in inches, of each panel of each trajectory subplots
  • sharey_traj (bool, default is True) – Force the panels of each projection to have the same yaxes across trajectories (Note: Not across coordinates)
  • dt (float, default is 1.0) – Physical time-unit equivalent to one frame of the projected_trajectories
  • tunits (str, default is 'frames') – Name of the physical time unit provided in dt
  • traj_selection (None, int, iterable of ints, default is None) – Don’t plot all trajectories but only few of them. The default None implies that all trajs will be plotted. Note: the data used for the FES will always include all trajectories, regardless of this value
  • projection (object that generated the projection, default is None) –

    The projected coordinates may come from a variety of sources. When working with pyemma a number of objects might have generated this projection, like a

    Pass this object along and observe and the features that are most correlated with the projections will be plotted for the active trajectory, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc)

  • n_feats (int, default is 1) – If a projection is passed along, the first n_feats features that most correlate the the projected trajectories will be represented, both in form of trajectories feat vs t as well as in the ngl_wdg. If projection is None, nfeats will be ignored.
Returns:

  • ax, ngl_wdg, data_sample, geoms – return _plt.gca(), _plt.gcf(), widget, geoms
  • axpylab.Axis object
  • figpylab.Figure object
  • ngl_wdgnglview.NGLWidget
  • geomsmdtraj.Trajectory object with the geometries n_sample geometries shown by the ngl_wdg