Sync with steven#27
Conversation
…lume parameters in protein_dna.py classes
… this maintains backward compatibility while allowing for future modification
…osition of the protein along the helical axis
Reviewer's GuideAdds tunable electrostatics/exclusion parameters, several new protein–DNA bias and measurement force classes, and a set of driver/analysis scripts around protein–DNA simulations, while slightly generalizing the existing string potential API. Flow diagram for new protein–DNA simulation and analysis scriptsflowchart LR
run["protein_DNA_run.py"]
energy["protein_DNA_energy.py"]
analysis["protein_DNA_analysis.py"]
forces["forces_setup.py (set_up_forces)"]
system["OpenMM System + Simulation"]
traj["output.dcd trajectory"]
run --> forces
energy --> forces
forces --> system
run -->|runs MD, writes| traj
energy -->|minimization/short MD, logs energies| system
analysis --> forces
analysis -->|reads| traj
analysis -->|recomputes energies via Simulation| system
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- Several core force classes and utilities now print debugging information directly (e.g., cutoff distances, k_ebias, centers) on reset/defineInteraction; consider removing or gating these behind a verbosity flag or logger to avoid noisy output in production runs.
- In multiple places you compare to None with '== None' (e.g., radius_override, cutoff_distance) and sometimes assume unit-bearing vs plain numeric values (e.g., cutoff_distance handling); using 'is None' and consistently enforcing/validating units at the API boundary would make the new parameters safer and less error-prone.
- forces_setup.py currently hardcodes AWSEM_folder and fragment paths and imports every term unconditionally; making these configurable (e.g., via function arguments or environment variables) and avoiding project-specific defaults would improve reuse of this helper in different environments.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Several core force classes and utilities now print debugging information directly (e.g., cutoff distances, k_ebias, centers) on reset/defineInteraction; consider removing or gating these behind a verbosity flag or logger to avoid noisy output in production runs.
- In multiple places you compare to None with '== None' (e.g., radius_override, cutoff_distance) and sometimes assume unit-bearing vs plain numeric values (e.g., cutoff_distance handling); using 'is None' and consistently enforcing/validating units at the API boundary would make the new parameters safer and less error-prone.
- forces_setup.py currently hardcodes AWSEM_folder and fragment paths and imports every term unconditionally; making these configurable (e.g., via function arguments or environment variables) and avoiding project-specific defaults would improve reuse of this helper in different environments.
## Individual Comments
### Comment 1
<location path="open3SPN2/force/protein_dna.py" line_range="129-134" />
<code_context>
+
ldby = ldby.in_units_of(unit.nanometer)
+
+ if self.cutoff_distance == None:
+ cutoff_distance = 5 # for backward compatibility
+ else:
+ cutoff_distance = self.cutoff_distance
+
+ cutoff_nm = cutoff_distance.value_in_unit(unit.nanometer)
+
denominator = 4 * np.pi * pv * dielectric / (Na * ec ** 2)
</code_context>
<issue_to_address>
**issue (bug_risk):** cutoff_distance default leads to unitless value and will break value_in_unit()
Here `cutoff_distance` becomes a bare float when `self.cutoff_distance is None`, so `cutoff_distance.value_in_unit(...)` will raise `AttributeError`. To preserve units and backward compatibility, make the default a `Quantity` (e.g. `self.cutoff_distance = 5*unit.nanometer` in `__init__`) and keep it as a `Quantity` in the fallback, or set `cutoff_distance = 5*unit.nanometer` here instead of `5`. Also check any other paths that might pass a plain float into this code.
</issue_to_address>
### Comment 2
<location path="open3SPN2/force/dna.py" line_range="518-523" />
<code_context>
+
ldby = ldby.in_units_of(unit.nanometer)
+
+ if self.cutoff_distance == None:
+ cutoff_distance = 5 # for backward compatibility
+ else:
+ cutoff_distance = self.cutoff_distance
+
+ cutoff_nm = cutoff_distance.value_in_unit(unit.nanometer)
+
denominator = 4 * np.pi * pv * dielectric / (Na * ec ** 2)
</code_context>
<issue_to_address>
**issue (bug_risk):** Electrostatics cutoff_distance has same unitless-default issue as protein-DNA electrostatics
Because the fallback uses a bare float (`5`), calling `cutoff_distance.value_in_unit(unit.nanometer)` will fail unless `cutoff_distance` is always an OpenMM Quantity. To align with the protein–DNA electrostatics fix and keep backward compatibility, either:
- set the `__init__` default to `cutoff_distance = 5*unit.nanometer`, or
- in `reset`, use `cutoff_distance = 5*unit.nanometer` in the backward-compatible branch.
This preserves the existing API while preventing runtime errors.
</issue_to_address>
### Comment 3
<location path="open3SPN2/force/dna.py" line_range="568-577" />
<code_context>
+class group_constraint_by_position(DNAForce):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Constraint uses sum of mass-weighted coordinates, not center of mass as advertised
The current CV uses `sum(m_i x_i)` directly, so the effective reference position scales with the number and masses of atoms instead of being invariant like a true COM (which would be `sum(m_i x_i) / sum(m_i)`). To match the documented behavior, either:
- divide each coordinate sum by `total_mass` in the `CustomCVForce` (e.g., `(sum_x/total_mass - x0)^2`), or
- normalize the per-particle weights to `mass/total_mass` and keep the existing expression.
Aligning the implementation with the docstring will prevent downstream confusion and unintended parameter changes.
Suggested implementation:
```python
class group_constraint_by_position(DNAForce):
"""
Apply a harmonic restraint on the center of mass of selected DNA atoms to keep them near (x0, y0, z0).
The actual CV is the center-of-mass (COM) position:
COM = sum_i(m_i * r_i) / sum_i(m_i)
not just the raw mass-weighted position sum_i(m_i * r_i).
"""
def __init__(self, dna, k=1*unit.kilocalorie_per_mole,
x0=10*unit.angstrom, y0=10*unit.angstrom, z0=10*unit.angstrom,
appliedToResidues=None, force_group=24, OpenCLPatch=True):
# Store configuration
self.dna = dna
self.force_group = force_group
self.k = k
self.x0 = x0
self.y0 = y0
self.z0 = z0
self.appliedToResidues = appliedToResidues
self.OpenCLPatch = OpenCLPatch
# Build the OpenMM force object
self._create_force()
def _create_force(self):
"""
Create a CustomCVForce that restrains the center of mass of the selected atoms.
"""
import openmm as mm
from openmm import unit as omm_unit
system = self.dna.system
topology = self.dna.topology
# Collect the particle indices that belong to the selected residues
selected_particle_indices = []
if self.appliedToResidues is None:
# Default: all DNA atoms; assumes dna.atom_list or equivalent is available.
# Adapt this selection logic to your existing DNAForce conventions.
for atom in topology.atoms():
if atom.residue.name.startswith("D") or atom.residue.name.startswith("R"):
selected_particle_indices.append(atom.index)
else:
residues_set = set(self.appliedToResidues)
for atom in topology.atoms():
if atom.residue.index in residues_set:
selected_particle_indices.append(atom.index)
if len(selected_particle_indices) == 0:
raise ValueError("group_constraint_by_position: no atoms selected for COM restraint.")
# Compute total mass of the group
masses = [system.getParticleMass(i) for i in selected_particle_indices]
total_mass = sum(masses) # this is an OpenMM unit-bearing quantity
# Create per-coordinate CVs: sum(m_i * x_i), sum(m_i * y_i), sum(m_i * z_i)
# These are still mass-weighted sums; we convert to COM by dividing by total_mass in the main CV expression.
sum_x_force = mm.CustomCVForce("sum(mass * x)")
sum_y_force = mm.CustomCVForce("sum(mass * y)")
sum_z_force = mm.CustomCVForce("sum(mass * z)")
sum_x_force.addPerParticleParameter("mass")
sum_y_force.addPerParticleParameter("mass")
sum_z_force.addPerParticleParameter("mass")
for idx, m in zip(selected_particle_indices, masses):
sum_x_force.addParticle(idx, [m])
sum_y_force.addParticle(idx, [m])
sum_z_force.addParticle(idx, [m])
# Main COM restraint CV:
# Use COM = sum(m_i * coord_i) / total_mass for each coordinate
com_restraint = mm.CustomCVForce(
"0.5 * k * ( (sum_x/total_mass - x0)^2 + (sum_y/total_mass - y0)^2 + (sum_z/total_mass - z0)^2 )"
)
# Add the collective variables that provide sum_x, sum_y, sum_z
com_restraint.addCollectiveVariable("sum_x", sum_x_force)
com_restraint.addCollectiveVariable("sum_y", sum_y_force)
com_restraint.addCollectiveVariable("sum_z", sum_z_force)
# Global parameters: force constant, reference COM position, and total mass of the group
# Note: total_mass is constant for this group; exposing it as a global parameter
# makes the COM definition explicit and invariant to group size.
com_restraint.addGlobalParameter("k", self.k)
com_restraint.addGlobalParameter("x0", self.x0)
com_restraint.addGlobalParameter("y0", self.y0)
com_restraint.addGlobalParameter("z0", self.z0)
com_restraint.addGlobalParameter("total_mass", total_mass)
com_restraint.setForceGroup(self.force_group)
# Register this force with the DNAForce infrastructure
self.force = com_restraint
self.addForce(self.force, self.OpenCLPatch)
```
1. The selection logic for `selected_particle_indices` must be aligned with how other `DNAForce` subclasses in this file choose their atoms (e.g., there might already be helper methods or precomputed atom lists you should reuse instead of the placeholder `topology`-based loop).
2. The current implementation of `group_constraint_by_position` in the PR likely already constructs `CustomCVForce` instances for `sum(m_i * x_i)` etc.; if so, you can instead **only** change:
- the main CV expression from `(sum_x - x0)^2` to `(sum_x/total_mass - x0)^2` (and similarly for y, z), and
- add a global parameter `total_mass` initialized to the Python-computed total mass for the group.
3. Ensure that any existing calls to `addForce` or class initialization in `DNAForce` are preserved; if `DNAForce` expects a different initialization pattern (e.g., overriding `setup()` rather than calling `_create_force()` in `__init__`), adapt `_create_force()` to that pattern instead of calling `addForce` directly as shown.
</issue_to_address>
### Comment 4
<location path="open3SPN2/force/dna.py" line_range="614-623" />
<code_context>
+
+ self.force = harmonic
+
+ def defineInteraction(self):
+ """
+ Adds selected DNA atoms to the sum_x, sum_y, sum_z forces with appropriate mass weighting.
+ """
+ total_mass = 0.0
+ for i in range(len(self.dna.atoms)):
+ #mass = self.dna.system.getParticleMass(i).value_in_unit(unit.dalton)
+ mass = 1
+
+ # Add particle to sum_x, sum_y, sum_z
+ self.force.getCollectiveVariable(0).addParticle(i, [mass])
+ self.force.getCollectiveVariable(1).addParticle(i, [mass])
+ self.force.getCollectiveVariable(2).addParticle(i, [mass])
+
+ total_mass += mass
+
+ if total_mass == 0:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** appliedToResidues and actual masses are ignored in group_constraint_by_position
In `defineInteraction`, all DNA atoms are added and `mass = 1` is used for every particle (with the `getParticleMass` call commented out). If residue-based selection is meant to matter, the indices should be filtered using `appliedToResidues` before adding them to the CVs. Also, using unit mass for all atoms turns the CV into an unweighted average position rather than a true COM. Please either restore `getParticleMass` (or equivalent) or clearly document that this CV is intentionally unweighted, and ensure `appliedToResidues` is respected when building the index list.
Suggested implementation:
```python
self.force = harmonic
def defineInteraction(self):
"""
Adds selected DNA atoms to the sum_x, sum_y, sum_z collective variables with mass weighting
to compute a true center-of-mass (COM). If ``appliedToResidues`` is provided, only atoms
belonging to those residues are included.
"""
total_mass = 0.0
# Build the list of atom indices to include, honoring appliedToResidues if provided.
if self.appliedToResidues is not None:
atom_indices = []
for i, atom in enumerate(self.dna.atoms):
# Assumes atoms expose a residue identifier as `resid`.
# If your topology uses a different attribute, update this accordingly.
resid = getattr(atom, "resid", None)
if resid in self.appliedToResidues:
atom_indices.append(i)
else:
atom_indices = range(len(self.dna.atoms))
for i in atom_indices:
# Use the actual particle mass so the CV represents a true COM, not an unweighted average.
mass = self.dna.system.getParticleMass(i).value_in_unit(unit.dalton)
# Add particle to sum_x, sum_y, sum_z with its mass as the weight
self.force.getCollectiveVariable(0).addParticle(i, [mass])
self.force.getCollectiveVariable(1).addParticle(i, [mass])
self.force.getCollectiveVariable(2).addParticle(i, [mass])
total_mass += mass
if total_mass == 0:
raise ValueError("No atoms were selected for the group constraint; check appliedToResidues.")
```
1. Ensure that the atom objects in `self.dna.atoms` expose the residue identifier as `atom.resid`. If a different attribute is used (e.g. `atom.residue.id` or `atom.residue.index`), adjust the `resid = getattr(atom, "resid", None)` line and the subsequent membership test accordingly.
2. Confirm that `unit` is imported in this module (e.g. `from openmm import unit` or `import openmm.unit as unit`) so that `value_in_unit(unit.dalton)` works as expected.
3. If the codebase already has a helper for residue-based atom selection (e.g. on `DNAForce` or `dna`), you may prefer to replace the explicit loop that builds `atom_indices` with that helper to keep residue-selection logic consistent across forces.
</issue_to_address>
### Comment 5
<location path="open3SPN2/force/protein_dna.py" line_range="39-42" />
<code_context>
exclusionForce.setNonbondedMethod(exclusionForce.CutoffPeriodic)
else:
exclusionForce.setNonbondedMethod(exclusionForce.CutoffNonPeriodic)
+ print(f"protein dna cutoff {exclusionForce.getCutoffDistance()}")
self.force = exclusionForce
</code_context>
<issue_to_address>
**suggestion:** Unconditional debug print statements in force construction may clutter output
These new print calls (e.g., for the protein-DNA cutoff) will fire on every force construction and can create a lot of noisy stdout in larger or batch runs. Please route them through a logger or guard them with a verbosity/debug flag so they can be disabled in production use.
Suggested implementation:
```python
if self.periodic:
exclusionForce.setNonbondedMethod(exclusionForce.CutoffPeriodic)
else:
exclusionForce.setNonbondedMethod(exclusionForce.CutoffNonPeriodic)
logger.debug("Protein-DNA cutoff distance: %s", exclusionForce.getCutoffDistance())
self.force = exclusionForce
```
To make this compile and behave as intended, you should also:
1. Import the `logging` module near the top of `open3SPN2/force/protein_dna.py` and define a module-level logger, for example:
- `import logging`
- `logger = logging.getLogger(__name__)`
2. Ensure your application or scripts configure the logging system (e.g., with `logging.basicConfig(...)` or a logging config file) so that debug logs can be enabled/disabled as needed.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| class group_constraint_by_position(DNAForce): | ||
| """ | ||
| Apply a harmonic restraint on the center of mass of selected DNA atoms to keep them near (x0, y0, z0). | ||
| """ | ||
|
|
||
| def __init__(self, dna, k=1*unit.kilocalorie_per_mole, | ||
| x0=10*unit.angstrom, y0=10*unit.angstrom, z0=10*unit.angstrom, | ||
| appliedToResidues=None, force_group=24, OpenCLPatch=True): | ||
|
|
||
| self.force_group = force_group |
There was a problem hiding this comment.
suggestion (bug_risk): Constraint uses sum of mass-weighted coordinates, not center of mass as advertised
The current CV uses sum(m_i x_i) directly, so the effective reference position scales with the number and masses of atoms instead of being invariant like a true COM (which would be sum(m_i x_i) / sum(m_i)). To match the documented behavior, either:
- divide each coordinate sum by
total_massin theCustomCVForce(e.g.,(sum_x/total_mass - x0)^2), or - normalize the per-particle weights to
mass/total_massand keep the existing expression.
Aligning the implementation with the docstring will prevent downstream confusion and unintended parameter changes.
Suggested implementation:
class group_constraint_by_position(DNAForce):
"""
Apply a harmonic restraint on the center of mass of selected DNA atoms to keep them near (x0, y0, z0).
The actual CV is the center-of-mass (COM) position:
COM = sum_i(m_i * r_i) / sum_i(m_i)
not just the raw mass-weighted position sum_i(m_i * r_i).
"""
def __init__(self, dna, k=1*unit.kilocalorie_per_mole,
x0=10*unit.angstrom, y0=10*unit.angstrom, z0=10*unit.angstrom,
appliedToResidues=None, force_group=24, OpenCLPatch=True):
# Store configuration
self.dna = dna
self.force_group = force_group
self.k = k
self.x0 = x0
self.y0 = y0
self.z0 = z0
self.appliedToResidues = appliedToResidues
self.OpenCLPatch = OpenCLPatch
# Build the OpenMM force object
self._create_force()
def _create_force(self):
"""
Create a CustomCVForce that restrains the center of mass of the selected atoms.
"""
import openmm as mm
from openmm import unit as omm_unit
system = self.dna.system
topology = self.dna.topology
# Collect the particle indices that belong to the selected residues
selected_particle_indices = []
if self.appliedToResidues is None:
# Default: all DNA atoms; assumes dna.atom_list or equivalent is available.
# Adapt this selection logic to your existing DNAForce conventions.
for atom in topology.atoms():
if atom.residue.name.startswith("D") or atom.residue.name.startswith("R"):
selected_particle_indices.append(atom.index)
else:
residues_set = set(self.appliedToResidues)
for atom in topology.atoms():
if atom.residue.index in residues_set:
selected_particle_indices.append(atom.index)
if len(selected_particle_indices) == 0:
raise ValueError("group_constraint_by_position: no atoms selected for COM restraint.")
# Compute total mass of the group
masses = [system.getParticleMass(i) for i in selected_particle_indices]
total_mass = sum(masses) # this is an OpenMM unit-bearing quantity
# Create per-coordinate CVs: sum(m_i * x_i), sum(m_i * y_i), sum(m_i * z_i)
# These are still mass-weighted sums; we convert to COM by dividing by total_mass in the main CV expression.
sum_x_force = mm.CustomCVForce("sum(mass * x)")
sum_y_force = mm.CustomCVForce("sum(mass * y)")
sum_z_force = mm.CustomCVForce("sum(mass * z)")
sum_x_force.addPerParticleParameter("mass")
sum_y_force.addPerParticleParameter("mass")
sum_z_force.addPerParticleParameter("mass")
for idx, m in zip(selected_particle_indices, masses):
sum_x_force.addParticle(idx, [m])
sum_y_force.addParticle(idx, [m])
sum_z_force.addParticle(idx, [m])
# Main COM restraint CV:
# Use COM = sum(m_i * coord_i) / total_mass for each coordinate
com_restraint = mm.CustomCVForce(
"0.5 * k * ( (sum_x/total_mass - x0)^2 + (sum_y/total_mass - y0)^2 + (sum_z/total_mass - z0)^2 )"
)
# Add the collective variables that provide sum_x, sum_y, sum_z
com_restraint.addCollectiveVariable("sum_x", sum_x_force)
com_restraint.addCollectiveVariable("sum_y", sum_y_force)
com_restraint.addCollectiveVariable("sum_z", sum_z_force)
# Global parameters: force constant, reference COM position, and total mass of the group
# Note: total_mass is constant for this group; exposing it as a global parameter
# makes the COM definition explicit and invariant to group size.
com_restraint.addGlobalParameter("k", self.k)
com_restraint.addGlobalParameter("x0", self.x0)
com_restraint.addGlobalParameter("y0", self.y0)
com_restraint.addGlobalParameter("z0", self.z0)
com_restraint.addGlobalParameter("total_mass", total_mass)
com_restraint.setForceGroup(self.force_group)
# Register this force with the DNAForce infrastructure
self.force = com_restraint
self.addForce(self.force, self.OpenCLPatch)- The selection logic for
selected_particle_indicesmust be aligned with how otherDNAForcesubclasses in this file choose their atoms (e.g., there might already be helper methods or precomputed atom lists you should reuse instead of the placeholdertopology-based loop). - The current implementation of
group_constraint_by_positionin the PR likely already constructsCustomCVForceinstances forsum(m_i * x_i)etc.; if so, you can instead only change:- the main CV expression from
(sum_x - x0)^2to(sum_x/total_mass - x0)^2(and similarly for y, z), and - add a global parameter
total_massinitialized to the Python-computed total mass for the group.
- the main CV expression from
- Ensure that any existing calls to
addForceor class initialization inDNAForceare preserved; ifDNAForceexpects a different initialization pattern (e.g., overridingsetup()rather than calling_create_force()in__init__), adapt_create_force()to that pattern instead of callingaddForcedirectly as shown.
| def defineInteraction(self): | ||
| """ | ||
| Adds selected DNA atoms to the sum_x, sum_y, sum_z forces with appropriate mass weighting. | ||
| """ | ||
| total_mass = 0.0 | ||
| for i in range(len(self.dna.atoms)): | ||
| #mass = self.dna.system.getParticleMass(i).value_in_unit(unit.dalton) | ||
| mass = 1 | ||
|
|
||
| # Add particle to sum_x, sum_y, sum_z |
There was a problem hiding this comment.
suggestion (bug_risk): appliedToResidues and actual masses are ignored in group_constraint_by_position
In defineInteraction, all DNA atoms are added and mass = 1 is used for every particle (with the getParticleMass call commented out). If residue-based selection is meant to matter, the indices should be filtered using appliedToResidues before adding them to the CVs. Also, using unit mass for all atoms turns the CV into an unweighted average position rather than a true COM. Please either restore getParticleMass (or equivalent) or clearly document that this CV is intentionally unweighted, and ensure appliedToResidues is respected when building the index list.
Suggested implementation:
self.force = harmonic
def defineInteraction(self):
"""
Adds selected DNA atoms to the sum_x, sum_y, sum_z collective variables with mass weighting
to compute a true center-of-mass (COM). If ``appliedToResidues`` is provided, only atoms
belonging to those residues are included.
"""
total_mass = 0.0
# Build the list of atom indices to include, honoring appliedToResidues if provided.
if self.appliedToResidues is not None:
atom_indices = []
for i, atom in enumerate(self.dna.atoms):
# Assumes atoms expose a residue identifier as `resid`.
# If your topology uses a different attribute, update this accordingly.
resid = getattr(atom, "resid", None)
if resid in self.appliedToResidues:
atom_indices.append(i)
else:
atom_indices = range(len(self.dna.atoms))
for i in atom_indices:
# Use the actual particle mass so the CV represents a true COM, not an unweighted average.
mass = self.dna.system.getParticleMass(i).value_in_unit(unit.dalton)
# Add particle to sum_x, sum_y, sum_z with its mass as the weight
self.force.getCollectiveVariable(0).addParticle(i, [mass])
self.force.getCollectiveVariable(1).addParticle(i, [mass])
self.force.getCollectiveVariable(2).addParticle(i, [mass])
total_mass += mass
if total_mass == 0:
raise ValueError("No atoms were selected for the group constraint; check appliedToResidues.")- Ensure that the atom objects in
self.dna.atomsexpose the residue identifier asatom.resid. If a different attribute is used (e.g.atom.residue.idoratom.residue.index), adjust theresid = getattr(atom, "resid", None)line and the subsequent membership test accordingly. - Confirm that
unitis imported in this module (e.g.from openmm import unitorimport openmm.unit as unit) so thatvalue_in_unit(unit.dalton)works as expected. - If the codebase already has a helper for residue-based atom selection (e.g. on
DNAForceordna), you may prefer to replace the explicit loop that buildsatom_indiceswith that helper to keep residue-selection logic consistent across forces.
Talked with @stevenluo22 about how we can merge his edits into this branch and this was the result
Summary by Sourcery
Add configurable electrostatics and biasing forces for protein–DNA and DNA-only systems, and introduce scripts to set up, run, and analyze protein–DNA OpenMM simulations.
New Features:
Enhancements: