molpx.visualize¶
The core functionality is to link two interative figures, fig1 and fig2, inside an IPython/Jupyter notebook, so that an action in fig1 (e.g.a click of the mouse or a slide of a slidebar) will trigger an event in fig2 (e.g. a frame update or point moved) and vice versa. Usually, these two figures contain representations from:
- molecules: an NGLview widget showing one (or more) molecular structure(s) that a particular value of the coordinate(s) is associated with and
- projected coordinates: a matplotlib figure showing the projected coordinates (e.g. TICs or PCs or any other), \({Y_0, ..., Y_N}\), either as a 2D histogram, \(PDF(Y_i, Y_j)\) or as trajectory views \({Y_0(t), ...Y_N(t)}\)
You are strongly encouraged to check NGLview’s documentation, since its functionalities extend beyond the scope of this package and the molecular visualization universe is rich and complex (unlike this module).
The methods offered by this module are:
molpx.visualize.FES (MD_trajectories, MD_top, …) |
Return a molecular visualization widget connected with a free energy plot. |
molpx.visualize.sample (positions, geom, ax) |
Visualize the geometries in geom according to the data in positions on an existing matplotlib axes ax |
molpx.visualize.traj (MD_trajectories, …[, …]) |
Link one or many projected trajectories , [Y_0(t), Y_1(t)…], with the MD_trajectories that originated them. |
molpx.visualize.correlations (correlation_input) |
Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features. |
molpx.visualize.feature (feat, widget[, …]) |
Provide a visual representation of a PyEMMA feature. |
-
molpx.visualize.
FES
(MD_trajectories, MD_top, projected_trajectories, proj_idxs=[0, 1], nbins=100, n_sample=100, proj_stride=1, weights=None, proj_labels='proj', n_overlays=1, atom_selection=None, **sample_kwargs)¶ Return a molecular visualization widget connected with a free energy plot.
Parameters: - MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) – Any file extension that
mdtraj
(.xtc, .dcd, etc) can read is accepted. Alternatively, a singlemdtraj.Trajectory
object or a list of them can be given as input. - MD_top (str to topology filename or directly an
mdtraj.Topology
object) – - projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
- the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) – filenames (.dat, .txt etc)
- NOTE (molpx assumes that there is no time column.) –
- proj_idxs (int, list or ndarray) – Selection of projection idxs (zero-idxd) to visualize.
- nbins (int, default 100) – The number of bins per axis to used in the histogram (FES)
- n_sample (int, default is 100) – The number of geometries that will be used to represent the FES. The higher the number, the higher the spatial resolution of the “click”-action. (And the longer it will take to generate the plot)
- proj_stride (int, default is 1) – Stride value that was used in the
projected_trajectories
relative to theMD_trajectories
If the originalMD_trajectories
were stored every 5 ps but the projected trajectories were stored every 50 ps,proj_stride
= 10 has to be provided, otherwise an exception will be thrown informing the user that theMD_trajectories
and theprojected_trajectories
have different number of frames. - weights (iterable of floats (or list thereof) each of shape (n_frames, 1) or (n_frames)) – The sample weights can come from a metadynamics run or an MSM-object, e.g.
via the method
pyemma.msm.BayesianMSM.trajectory_weights
. Has to have the same length as theprojected_trajectories
- proj_labels (either string or list of strings or (experimental) PyEMMA featurizer) – The projection plots will get this parameter for labeling their yaxis. If a str is
provided, that will be the base name
proj_labels='%s_%u'%(proj_labels,ii)
for each projection. Ifproj_labels
is a list, the list will be used as is. If there are not enough labels, the module will complain - n_overlays (int, default is 1) – The number of structures that will be simultaneously displayed as overlays for every sampled point of the FES. This parameter can seriously slow down the method, it is currently limited to a maximum value of 50
- atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string
that
mdtraj.Topology.select
could understand or directly the iterable of integers. IfMD_trajectories
is already a (list of)mdtraj.Trajectory
objects, the atom-slicing can be done by the user place before calling this method. - sample_kwargs (dictionary of named arguments, optional) – named arguments for the function
molpx.visualize.sample
. Non-expert users can safely ignore this option. Examples aresuperpose
orproj_idxs
Returns: A
molpx._linkutils.MolPXHBox
-object.It contains the
NGLWidget
and theAxesWidget
(which is responsible for the interactive figure). It is child-class of theipywidgets.HBox
-class and has been monkey-patched to have the following extra attributes so that the user has access to all the information being displayed.- linked_axes :
list with all the
pylab.Axis
-objects contained in thewidgetbox
- linked_ax_wdgs :
list with all the
matplotlib.widgets.AxesWidget`objects contained in the :obj:`widgetbox
- linked_figs :
list with all the
pylab.Figure
-objects contained in thewidgetbox
- linked_ngl_wdgs :
list with all the
nglview.NGLWidget
-objects contained in thewidgetbox
- linked_data_arrays :
list with all the numpy ndarrays contained in the
widgetbox
- linked_mdgeoms:
list with all the
mdtraj.Trajectory
-objects contained in thewidgetbox
Return type: widgetbox
- MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) – Any file extension that
-
molpx.visualize.
correlations
(correlation_input, geoms=None, proj_idxs=None, feat_name=None, widget=None, proj_color_list=None, n_feats=1, verbose=False, featurizer=None)¶ Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features.
Parameters: - correlation_input (numpy ndarray or some PyEMMA objects) –
- if array :
- (m,m) correlation matrix, with a row for each feature and a column for each projection
- if PyEMMA-object :
TICA
,PCA
orMDFeaturizer
.
- geoms (None or
mdtraj.Trajectory
, default is None) – The values of the most correlated features will be returned for the geometries in this object. If widget is left to its default, None,correlations
will create a new widget and try to show the most correlated features on top of the widget. - widget (None or
nglview.NGLWidget
, default is None) –Provide an already existing widget to visualize the correlations on top of. This is only for expert use, because no checks are done to see if
correlation_input
and the geometry contained in the widget actually match. Use with caution.Note
- When objects
geoms
andwidget
are provided simultaneously, three things happen: - no new widget will be instantiated
- the display of features will be on top of whatever geometry
widget
contains - the value of the features is computed for the geometry of
geom
Use with caution and clean bookkeeping!
- When objects
- proj_color_list (list, default is None) – projection specific list of colors to provide the representations with. The default None yields blue.
In principle, the list can contain one color for each projection (= as many colors as len(proj_idxs)
but if your list is short it will just default to the last color. This way, proj_color_list=[‘black’] will paint
all black regardless len(proj_idxs). Input anything that yields
matplotlib.colors.is_color_like
== True - proj_idxs (None, or int, or iterable of integers, default is None) – The indices of the projections for which the most correlated feture will be returned. If none it will default to the dimension of the correlation_input object
- feat_name (None or str, default is None) – The prefix with which to prepend the labels of the most correlated features. If left to None, the feature
description found in
correlation_input
will be used, if available - n_feats (int, default is 1) – Number of most correlated features to return for each projection
- featurizer (None or
MDFeaturizer
, default is None) – Ifcorrelation_input
is not anMDFeaturizer
itself, or doesn’t have adata_producer
attribute, the user can input one here. Ifcorrelation_input
andfeaturizer
are bothMDFeaturizer
-objects,featurizer
will be ignored. - verbose (Bool, default is True) – print to standard output
Returns: corr_dict and ngl_wdg
corr_dict – Dictionary with items:
- idxs :
List of length len(proj_idxs) with lists of length n_feat with the idxs of the most correlated features
- vals :
List of length len(proj_idxs) with lists of length n_feat with the corelation values of the most correlated features
- labels :
List of length len(proj_idxs) with lists of length n_feat with the labels of the most correlated features
- feats :
If an
mdtraj.Trajectory
is passed as angeom
argument, the most correlated features will be evaluated and returned as list of length len(proj_idxs
). Each element in th elist is an arrays with shape (geom.n_frames
,n_feats
)- atom_idxs :
List of length len(proj_idxs) each with an nd.array of shape (nfeat, m), where m is the number of atoms needed to describe each feature (1 of cartesian, 2 for distances, 3 for angles, 4 for dihedrals)
- info :
List of length len(
proj_idxs
) with lists of lengthn_feat
with strings describing the correlations
ngl_wdg –
nglview.NGLWidget
with the most correlated features (distances, angles, dihedrals, positions) visualized on top of it.
- correlation_input (numpy ndarray or some PyEMMA objects) –
-
molpx.visualize.
feature
(feat, widget, idxs=0, color_list=None, **kwargs)¶ Provide a visual representation of a PyEMMA feature. PyEMMA’s features are found as a list of the MDFeaturizers’s active_features attribute
Parameters: - featurizer (py:obj:_MDFeautrizer) – A PyEMMA MDFeaturizer object (either a feature or a featurizer, works with both)
- widget (None or nglview widget) – Provide an already existing widget to visualize the correlations on top of. This is only for expert use,
because no checks are done to see if
correlation_input
and the geometry contained in the widget actually match. Use with caution. - idxs (int or iterable of integers, default is 0) – Features can have many contributions, e.g. a distance feature can include many distances. Use this parameter to control which one of them gets represented.
- color_list (list, default is None) – list of colors to represent each feature in feat_idxs. The default None yields blue for everything. In principle, the list can contain one color for each projection (= as many colors as len(feat_idxs) but if your list is short it will just default to the last color. This way, color_list=[‘black’] will paint all black regardless len(proj_idxs)
- **kwargs (optional keyword arguments for _bmutils.add_atom_idxs_widget) – currently, only “radius” is left for the user to determine
- Returns –
- -------- –
- widget – the input widget with the features in
idxs
represented as either distances (for distance features) or “spacefill” spheres (for angular features)
-
molpx.visualize.
sample
(positions, geom, ax, plot_path=False, clear_lines=True, n_smooth=0, ngl_wdg=None, superpose=True, projection=None, n_feats=1, sticky=False, list_of_repr_dicts=None, color_list=None, **link_ax2wdg_kwargs)¶ Visualize the geometries in
geom
according to the data inpositions
on an existing matplotlib axesax
Use this method when the array of positions, the geometries, the axes (and the ngl_wdg, optionally) have already been generated elsewhere.
Parameters: - positions (numpy nd.array of shape (n_frames, 2)) – Contains the position associated the geometries in
geom
. See below for more details - geom (
mdtraj.Trajectory
object or a list thereof.) – The geometries associated with the thepositions
. # TODO: WRITE HOW THE LISTS-LENGTHS CORRESPONDS FOR THE STICKY OPTION - ax (
matplotlib.pyplot.Axes
object) – The axes to be linked with theNGLWidget
- plot_path (bool, default is False) – whether to draw a line connecting the positions in
positions
- clear_lines (bool, default is True) – whether to clear all the 2D objects that were previously drawn in
ax
- n_smooth (int, default is 0,) – if
n_smooth
> 0, the shown geometries and paths will be smoothed out by 2*n_smooth
+1. Seemolpx._bmutils.smooth_geom
for more information - ngl_wdg (None or an existing
NGLWidget
, default is None) – you can provide an already instantiatedNGLWidget
here (avanced use) - superpose (boolean, default is True) – The geometries in
geom
may or may not be oriented, depending on how they were generated. Since this method is mostly for visualization purposes, the default behaviour is to orient them all to maximally overlap with the most compact frame available - projection (object that generated the
positions
, default is None) –The projected coordinates may come from a variety of sources. When working with PyEMMA, a number of objects might have generated this projection, like a
TICA
- or aPCA
- or anMDFeaturizer
-object.
Makes most sense when
positions
where generated withmolpx.generate.projection_paths
, otherwise might produce rubbish. Seen_feats
for more info. # TODO: delete from here below? The features most correlated with thepositions
will be shown in the widget # TODO CHECK THIS geometries ingeom
, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc). - n_feats (int, default is 1) – If a
projection
is passed along, the firstn_feats
features that most correlate the the projected trajectories will be represented, both in form of figures of feat(t) as well as in the on top of thengl_wdg
. Ifprojection == None
,nfeats
will be ignored. - sticky (boolean, default is False) – If set to True, the
ngl_wdg
will be sticky in that every click adds a new molecular structure without deleting the previous one. Left-clicks adds a structure, right-clicks deletes a structure. Particularly useful for representing more minima simultaneously. - color_list (None, list of len(
positions
), or “random”) – The colors with which the sticky frames will be plotted. A color is anything that yieldsmatplotlib.colors.is_color_like == True
“None” defaults to coloring by element. “random” randomizes the color choice - list_of_repr_dicts (None or list of dictionaries, default is None) – Has an effect only for
sticky == True
, s.t. these reps instead of the default ones are used. The dictionaries must have at least the keys ‘repr_type’ and ‘selection’. Other key-value pairs are currently ignored but, will be implemented in the future. Seenglview.NGLWidget.add_representation
for more info). - link_ax2wdg_kwargs (dictionary of named arguments, optional) – named arguments for the function
molpx._linkutils.link_ax_w_pos_2_nglwidget
, which is the one that internally provides the interactivity. Non-expert users can safely ignore this option.
Returns: - ngl_wdg (
NGLWidget
) - axes_wdg (
AxesWidget
)
- positions (numpy nd.array of shape (n_frames, 2)) – Contains the position associated the geometries in
-
molpx.visualize.
traj
(MD_trajectories, MD_top, projected_trajectories, max_frames=10000.0, stride=1, proj_stride=1, proj_idxs=[0, 1], proj_labels='proj', plot_FES=False, weights=None, panel_height=1, sharey_traj=True, dt=1.0, tunits='frames', traj_selection=None, projection=None, n_feats=1)¶ Link one or many
projected trajectories
, [Y_0(t), Y_1(t)…], with theMD_trajectories
that originated them. Optionally plot also the resulting FES.Parameters: - MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –
Any file extension that
mdtraj
(.xtc, .dcd etc) can read is accepted.Alternatively, a single
mdtraj.Trajectory
object or a list of them can be given as input. - MD_top (str to topology filename or directly
mdtraj.Topology
object) – - projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
- the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) –
- txt etc) ((dat,) –
- NOTE (molpx assumes that there is no time column.) –
- max_frames (int, default is 1000) – If the trajectoy is longer than this, stride to this length (in frames)
- stride (int, default is 1) – Stride value in case of large datasets. In case of having
MD_trajectories
andprojected_trajectories
in memory (and not on disk) the stride can take place also before calling this method. - proj_stride (int, default is 1) – Stride value that was used in the
projected_trajectories
relative to theMD_trajectories
If the originalMD_trajectories
were stored every 5 ps but the projected trajectories were stored every 50 ps,proj_stride
= 10 has to be provided, otherwise an exception will be thrown informing the user that theMD_trajectories
and theprojected_trajectories
have different number of frames. - proj_idxs (iterable of ints, default is [0,1]) – Indices of the projected coordinates to use in the various representations
- proj_labels (either string or list of strings) – The projection plots will get this paramter for labeling their yaxis. If a str is provided, that will be the base name proj_labels=’%s_%u’%(proj_labels,ii) for each projection. If a list, the list will be used. If not enough labels are there the module will complain
- plot_FES (bool, default is False) – Plot (and interactively link) the FES as well
- weights (ndarray(n_frames), default = None) – sample weights. By default all samples have the same weight (used for FES calculation only)
- panel_height (int, default 1) – Height, in inches, of each panel of each trajectory subplots
- sharey_traj (bool, default is True) – Force the panels of each projection to have the same yaxes across trajectories (Note: Not across coordinates)
- dt (float, default is 1.0) – Physical time-unit equivalent to one frame of the
projected_trajectories
- tunits (str, default is 'frames') – Name of the physical time unit provided in
dt
- traj_selection (None, int, iterable of ints, default is None) – Don’t plot all trajectories but only few of them. The default None implies that all trajs will be plotted. Note: the data used for the FES will always include all trajectories, regardless of this value
- projection (object that generated the projection, default is None) –
The projected coordinates may come from a variety of sources. When working with
pyemma
a number of objects might have generated this projection, like aPass this object along and observe and the features that are most correlated with the projections will be plotted for the active trajectory, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc)
- n_feats (int, default is 1) – If a
projection
is passed along, the first n_feats features that most correlate the the projected trajectories will be represented, both in form of trajectories feat vs t as well as in the ngl_wdg. Ifprojection
is None,nfeats
will be ignored.
Returns: - ax, ngl_wdg, data_sample, geoms – return _plt.gca(), _plt.gcf(), widget, geoms
- ax –
pylab.Axis
object - fig –
pylab.Figure
object - ngl_wdg –
nglview.NGLWidget
- geoms –
mdtraj.Trajectory
object with the geometries n_sample geometries shown by the ngl_wdg
- MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –