
detectors+vizusalization#32

Open
KarlDeck wants to merge 8 commits into main from
karl-deck/detectors+vizusalization

Conversation

@KarlDeck
Collaborator

@KarlDeck KarlDeck commented Mar 31, 2026

Implement spike and drop detection, and add an interactive detector explorer

♻️ Current situation & Problem

This PR implements the previously stubbed spike detector in detectors/spike.py and adds an interactive visualization workflow for inspecting detector behavior on real rows.

Before this change:

  • SpikeDetector._detect() always returned an empty list, so structural spike/drop annotations were never produced even when SpikeDetector was configured for a signal.
  • there was no practical way to inspect which configured detectors fired on which parts of a signal beyond reading generated captions or looking at static plots.

⚙️ Release Notes

  • Implemented spike and drop detection in SpikeDetector
  • Added SciPy find_peaks-based detection for both positive spikes and negative drops
  • Used a robust prominence threshold so flat signals stay quiet while isolated peaks/dips can still be detected
  • Added an interactive explorer in explorer.py for browsing rows and signals
  • Added detector overlay visualization so trends, spikes, drops, gaps, and nonwear regions can be inspected directly on the time series
  • Added tabbed details views for stats, detector events, captions, and help
  • Updated README.md with explorer usage instructions

📚 Documentation

This PR adds two main improvements:

Spike/drop detection

detectors/spike.py now:

  • detects upward peaks as "spike" events
  • detects downward peaks as "drop" events
  • maps detections back to the original signal minute indices
  • avoids emitting events for flat / near-constant series

No public detector interface changes were introduced.
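The detection approach described above can be sketched as follows. This is an illustrative approximation, not the actual detectors/spike.py implementation: the function name, the MAD-based prominence rule, and the threshold constants are assumptions; only the use of SciPy's find_peaks on the signal and its negation is confirmed by the PR description.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_spikes_and_drops(values: np.ndarray) -> list[tuple[str, int]]:
    """Return (event_type, minute_index) pairs for isolated peaks and dips."""
    clean = values[~np.isnan(values)]
    if clean.size < 3:
        return []
    span = np.ptp(clean)
    if span == 0:
        return []  # flat / constant series: stay quiet
    # Robust prominence threshold: the MAD ignores the outliers we want to
    # find, and the range-based floor keeps near-constant noise from firing.
    mad = np.median(np.abs(clean - np.median(clean)))
    prominence = max(5.0 * mad, 0.1 * span)
    # Fill NaNs with the median so find_peaks sees a contiguous array while
    # indices still map back to the original minute positions.
    filled = np.where(np.isnan(values), np.median(clean), values)
    events = [("spike", int(i)) for i in find_peaks(filled, prominence=prominence)[0]]
    events += [("drop", int(i)) for i in find_peaks(-filled, prominence=prominence)[0]]
    return sorted(events, key=lambda e: e[1])
```

Negative drops are found by running the same peak search on the negated signal, so one prominence threshold governs both directions.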

Interactive visualization

explorer.py adds a local Matplotlib-based inspection tool that:

  • loads one dataset row at a time
  • lets the user switch signals from a signal list or channel overview
  • overlays configured detector outputs directly on the selected signal
  • provides a side panel for stats, detected events, and generated captions
  • supports zoom/pan with Matplotlib controls for detailed inspection

This is intended as a debugging and development tool for detector behavior, annotation quality, and general dataset exploration.
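The interactive loop rests on Matplotlib's event API (mpl_connect is the real Matplotlib call). The class below is a toy stand-in for illustration only; the actual tool is SensorExplorer in explorer.py, and the key bindings shown here are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

class MiniExplorer:
    """Toy signal browser: arrow keys cycle through signals and re-render."""

    def __init__(self, signals: list[np.ndarray]):
        self.signals = signals
        self.index = 0
        self.fig, self.ax = plt.subplots()
        # Same wiring the explorer uses: register a callback for key presses.
        self.fig.canvas.mpl_connect("key_press_event", self._on_key)
        self._render()

    def _on_key(self, event) -> None:
        if event.key == "right":
            self.index = (self.index + 1) % len(self.signals)
        elif event.key == "left":
            self.index = (self.index - 1) % len(self.signals)
        self._render()

    def _render(self) -> None:
        self.ax.clear()
        self.ax.plot(self.signals[self.index])
        self.ax.set_title(f"signal {self.index}")
        self.fig.canvas.draw_idle()
```

Zoom/pan come for free from Matplotlib's standard toolbar, which is why the explorer can stay a thin layer over plain Axes plotting.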

✅ Testing

Verified locally with small synthetic signal checks:

  • flat signals produce no detections
  • isolated upward peaks produce spike events
  • isolated downward peaks produce drop events

Also verified locally that:

  • the explorer loads rows from the configured dataset
  • signal switching works from the overview/list
  • overlay toggles update the plot correctly
  • the file compiles successfully with py_compile
  • the README instructions match the current workflow
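The py_compile check above can be scripted; the compiles_ok helper is a hypothetical wrapper, and the throwaway files below stand in for explorer.py.

```python
import pathlib
import py_compile
import tempfile

def compiles_ok(path: str) -> bool:
    """Byte-compile one file; return False instead of raising on syntax errors."""
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False

# Demo on throwaway files (stand-ins for the real module):
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "good.py").write_text("x = 1\n")
(tmp / "bad.py").write_text("def broken(:\n")
```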

Code of Conduct & Contributing Guidelines

Summary by CodeRabbit

  • New Features

    • Interactive Explorer GUI to browse dataset rows/signals with detector overlays, navigation, zoom/reset, captions, keyboard/mouse controls, dataset jump scanning, and save-to-file.
    • Detector upgrades: working spike/drop detection with adaptive thresholds and multi-window trend detection with merging.
    • Visualization mode to render per-channel detector events and summaries, with optional image export.
  • Documentation

    • README: explicit dependency install, updated run examples, and Explorer usage/controls.
  • Chores

    • requirements.txt updated (added pyarrow, scipy; reordered dependencies).

Implement TrendDetector using relative multi-scale windows, linear-fit scoring,
coverage checks, and segment merging.

Keep existing detector call sites unchanged so current channel configs continue
to work with the new defaults. Future tuning can override detector parameters
per dataset or per channel in mhc/constants.py.
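The commit message above can be illustrated with a simplified single-scale sketch. The function name, window/stride defaults, and the slope and R² thresholds are assumptions for illustration; the actual TrendDetector uses relative multi-scale windows and its own tuned parameters.

```python
import numpy as np

def detect_trends(values: np.ndarray,
                  window: int = 60,
                  stride: int = 30,
                  min_slope: float = 0.05,
                  min_r2: float = 0.6) -> list[tuple[int, int, str]]:
    """Windowed linear-fit trend scan with coverage filtering and merging."""
    segments: list[tuple[int, int, str]] = []
    for start in range(0, max(1, len(values) - window + 1), stride):
        chunk = values[start:start + window]
        mask = ~np.isnan(chunk)
        if mask.sum() < window // 2:  # coverage check: skip sparse windows
            continue
        x = np.arange(len(chunk))[mask]
        y = chunk[mask]
        slope, intercept = np.polyfit(x, y, 1)
        pred = slope * x + intercept
        ss_res = np.sum((y - pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
        if abs(slope) >= min_slope and r2 >= min_r2:
            direction = "up" if slope > 0 else "down"
            segments.append((start, start + len(chunk), direction))
    # Merge overlapping/adjacent windows that agree on direction.
    merged: list[tuple[int, int, str]] = []
    for seg in segments:
        if merged and merged[-1][2] == seg[2] and seg[0] <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], seg[1]), seg[2])
        else:
            merged.append(seg)
    return merged
```

A clean ramp collapses into one merged segment, while a flat series produces none, which mirrors the "quiet on flat signals" behavior the PR aims for.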
…e 128) so you can inspect which detector fired, on which channel, and at what time in the series.

How to use:
python visualizer.py --mode detectors --row-index 0 --save-path detector_debug_row_0.png --min-wear-pct 0
Introduce a Matplotlib-based explorer for browsing dataset rows,
switching signals, inspecting detector overlays, and reviewing
stats/captions in a side panel. Document how to launch and use the
explorer in the README.
@KarlDeck KarlDeck requested a review from milanagm March 31, 2026 18:30
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Adds working spike/drop and windowed trend detectors, a new interactive Matplotlib SensorExplorer CLI, detector-focused visualization mode, README updates, and dependency additions (pyarrow, scipy) in requirements.

Changes

  • Detection Algorithms — detectors/spike.py, detectors/trend.py: SpikeDetector now uses scipy.signal.find_peaks with adaptive prominence and deduplication, and emits spike/drop DetectionResults. TrendDetector implements windowed linear-regression classification, filters by coverage/effect/R², merges adjacent same-direction windows, and emits trend DetectionResults.
  • Interactive Explorer — explorer.py: New SensorExplorer module with CLI parsing, per-row cached loader, NaN-region detection, interactive Matplotlib UI (heatmap overview, per-signal plots), detector overlays (trend/plateau/spike/drop/gap/nonwear), keyboard/mouse navigation, details tabs, and optional snapshot save.
  • Visualizer & CLI — visualizer.py: Added plot_detector_view(...) and CLI parsing for --mode (captions | detectors).
  • Documentation & Dependencies — README.md, requirements.txt: README adds an explicit python3 -m pip install -r requirements.txt step, python3 run examples, and documented Explorer controls; requirements.txt reorders datasets and adds pyarrow and scipy.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Explorer as SensorExplorer
    participant Dataset as MHCDataset
    participant Transform as MHCTransformer
    participant Annotator as Annotator
    participant Detector as Detector (spike/trend/others)
    participant UI as Matplotlib UI

    User->>Explorer: launch explorer.py (--row-index, --signal-index)
    Explorer->>Dataset: _load_row_bundle(row_index)
    Dataset-->>Explorer: signals, metadata, cached annotations
    Explorer->>Transform: transform_row(signals)
    Transform-->>Explorer: transformed signals
    Explorer->>Annotator: annotate(signals)
    Annotator->>Detector: detect(series) per signal
    Detector-->>Annotator: DetectionResult[]
    Annotator-->>Explorer: captions, stats, events
    Explorer->>UI: render() — plot series, heatmap, overlays, details
    UI-->>User: interactive window
    User->>UI: keyboard/mouse (navigate, toggle overlays)
    UI->>Explorer: update selection/toggles
    Explorer->>Explorer: recompute view/state
    Explorer->>UI: render() updated view

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hop through peaks and gentle trends,
I sniff the drops where signal bends,
An explorer paints with keys and panes,
Spikes and spans in neat refrains,
Data carrots glow — the rabbit grins.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title 'detectors+vizusalization' is vague and contains a typo ('vizusalization' instead of 'visualization'). Resolution: revise the title to something descriptive, such as 'Implement spike/drop detection and add interactive explorer visualization', and fix the spelling.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@KarlDeck KarlDeck changed the title Karl deck/detectors+vizusalization detectors+vizusalization Mar 31, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (9)
visualizer.py (2)

155-155: Add strict=True to zip() for safety.

While the lengths are guaranteed to match by construction, adding strict=True provides better error messages if assumptions are violated in future refactors.

✨ Suggested fix
-    for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals)):
+    for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` at line 155, The loop using zip in the visualization code (for
i, (ax, signal) in enumerate(zip(axes[:n_channels], signals))) should pass
strict=True to zip so mismatched lengths raise a clear error; update that zip
call to zip(axes[:n_channels], signals, strict=True) in the function containing
this loop to enforce and surface any future length-compatibility regressions.

176-186: Consider bounds checking before array access.

Direct access to signal.data[minute] could raise an IndexError if minute exceeds the array length. While this shouldn't occur with valid detector output, a defensive check would prevent cryptic errors from malformed detector results.

🛡️ Proposed defensive check
                 elif result.event_type == "spike":
                     minute = int(result.spike_minute)
-                    value = signal.data[minute]
-                    if not np.isnan(value):
+                    if minute < len(signal.data):
+                        value = signal.data[minute]
+                        if not np.isnan(value):
-                        ax.scatter(minute, value, color="#2ca02c", marker="^", s=28, zorder=3)
+                            ax.scatter(minute, value, color="#2ca02c", marker="^", s=28, zorder=3)
                     rendered.append(f"spike @{minute}")
                 elif result.event_type == "drop":
                     minute = int(result.spike_minute)
-                    value = signal.data[minute]
-                    if not np.isnan(value):
+                    if minute < len(signal.data):
+                        value = signal.data[minute]
+                        if not np.isnan(value):
-                        ax.scatter(minute, value, color="#d62728", marker="v", s=28, zorder=3)
+                            ax.scatter(minute, value, color="#d62728", marker="v", s=28, zorder=3)
                     rendered.append(f"drop @{minute}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` around lines 176 - 186, The code directly indexes signal.data
with minute (from result.spike_minute) in both the "spike" and "drop" branches;
add a defensive bounds check before accessing signal.data — compute minute =
int(result.spike_minute) and verify 0 <= minute < len(signal.data) and not
np.isnan(signal.data[minute]) before calling ax.scatter and appending to
rendered; apply the same guard to both branches (where event_type == "spike" and
"drop") so malformed detector output won't raise IndexError.
detectors/trend.py (2)

80-80: Minor: Remove unnecessary int() cast.

Same as line 56 - round() already returns an integer.

✨ Suggested simplification
-            sizes.add(int(round(n_samples * frac)))
+            sizes.add(round(n_samples * frac))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@detectors/trend.py` at line 80, The sizes.add call wraps round(...) with an
unnecessary int() cast; update the code in detectors/trend.py where
sizes.add(int(round(n_samples * frac))) is used to remove the int() so it calls
sizes.add(round(n_samples * frac)) (matching the earlier occurrence at line 56),
keeping n_samples and frac as-is.

56-56: Minor: Remove unnecessary int() cast.

round() on a float already returns an int in Python 3, making the explicit int() call redundant.

✨ Suggested simplification
-            stride = max(1, int(round(window_size * self.stride_frac)))
+            stride = max(1, round(window_size * self.stride_frac))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@detectors/trend.py` at line 56, In the assignment that computes stride (the
line setting stride using window_size and self.stride_frac), remove the
redundant int() cast around round(...) so that stride uses the integer result
returned by round directly (i.e., keep max(1, round(window_size *
self.stride_frac))). Update the expression in the stride assignment (referencing
variables stride, window_size, and self.stride_frac) accordingly and run tests
to confirm no behavioral change.
requirements.txt (1)

1-6: Consider pinning dependency versions for reproducibility.

Unpinned dependencies can lead to non-reproducible builds and unexpected breakages when upstream packages release breaking changes. This is particularly important for torch, numpy, and scipy which have known compatibility matrices.

📦 Proposed fix: Pin versions (example)
-datasets
-matplotlib
-numpy
-pyarrow
-scipy
-torch
+datasets>=2.18.0,<3.0
+matplotlib>=3.8.0,<4.0
+numpy>=1.26.0,<2.0
+pyarrow>=15.0.0,<16.0
+scipy>=1.12.0,<2.0
+torch>=2.2.0,<3.0

Adjust versions to match your tested environment. Consider using pip freeze to capture exact versions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@requirements.txt` around lines 1 - 6, Pin the dependencies in
requirements.txt to specific, tested versions to ensure reproducible installs;
update the lines for numpy, scipy, torch (and other packages like datasets,
matplotlib, pyarrow) from unpinned names to explicit versions that match your
CI/local environment (use pip freeze or your tested env to obtain the exact
versions) and commit the updated requirements.txt so installs reproduce the
known-good dependency matrix.
explorer.py (4)

420-425: Minor: Add strict=True to zip() for consistency.

✨ Suggested fix
         handles, labels = self.ax_main.get_legend_handles_labels()
         deduped: dict[str, object] = {}
-        for handle, label in zip(handles, labels):
+        for handle, label in zip(handles, labels, strict=True):
             deduped.setdefault(label, handle)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 420 - 425, The loop that pairs handles and labels
uses zip(handles, labels) without strict checking; update the pairing to
zip(handles, labels, strict=True) in the block where
self.ax_main.get_legend_handles_labels() is processed (the variables handles,
labels and the deduped dict in the explorer.py section) so mismatched lengths
raise immediately and ensure consistent legend creation for
self.ax_main.legend(...).

36-49: Inconsistent _nan_regions implementation with visualizer.py.

This function returns inclusive end indices (i - 1, len(arr) - 1), while visualizer.py:61-74 returns exclusive end indices (i, len(arr)). This inconsistency could cause subtle off-by-one bugs if the functions are used interchangeably or code is refactored.

Consider extracting a shared utility function to avoid duplication and ensure consistency.
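A shared helper along these lines would resolve the inconsistency by standardizing on half-open (exclusive-end) regions; the name find_nan_regions is hypothetical.

```python
import numpy as np

def find_nan_regions(arr: np.ndarray) -> list[tuple[int, int]]:
    """Contiguous NaN runs as half-open (start, end) index pairs,
    matching the exclusive-end convention used in visualizer.py."""
    regions: list[tuple[int, int]] = []
    start = None
    for i, is_nan in enumerate(np.isnan(arr)):
        if is_nan and start is None:
            start = i                    # run begins
        elif not is_nan and start is not None:
            regions.append((start, i))   # run ends just before i
            start = None
    if start is not None:
        regions.append((start, len(arr)))  # run extends to the end
    return regions
```

With exclusive ends, `arr[start:end]` slices the NaN run directly, which is the usual argument for this convention.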

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 36 - 49, The _nan_regions function currently
returns inclusive end indices while visualizer.py uses exclusive end indices;
change _nan_regions (or replace it with a shared utility, e.g.,
find_nan_regions) to return exclusive end indices (use i and len(arr) rather
than i-1 and len(arr)-1) so both modules match, update any callers to expect
exclusive slices, and factor the logic into a common helper function referenced
by both explorer.py and visualizer.py to avoid future divergence.

167-172: lru_cache on instance method can cause memory leaks.

Using @lru_cache on a method causes the cache to hold strong references to self, preventing garbage collection of SensorExplorer instances. While this application likely has only one instance, it's a latent issue.

🔧 Proposed fix: Use a module-level cache function
+@lru_cache(maxsize=12)
+def _load_row_bundle_cached(
+    dataset_id: int,
+    row_index: int,
+    transformer: MHCTransformer,
+    annotator: Annotator,
+    dataset: MHCDataset,
+) -> tuple[list[Signal], list[Sample], list[Annotation]]:
+    row = dataset[row_index]
+    signals = transformer.transform_row(row)
+    samples, annotations = annotator.annotate(signals)
+    return signals, samples, annotations
+

 class SensorExplorer:
     ...
-    @lru_cache(maxsize=12)
-    def _load_row_bundle(self, row_index: int) -> tuple[list[Signal], list[Sample], list[Annotation]]:
-        row = self.dataset[row_index]
-        signals = self.transformer.transform_row(row)
-        samples, annotations = self.annotator.annotate(signals)
-        return signals, samples, annotations
+    def _load_row_bundle(self, row_index: int) -> tuple[list[Signal], list[Sample], list[Annotation]]:
+        return _load_row_bundle_cached(
+            id(self.dataset), row_index, self.transformer, self.annotator, self.dataset
+        )

Note: This approach still has caveats since MHCTransformer and Annotator aren't hashable by default. An alternative is to simply remove the cache if row loading is fast enough, or use a manual dict cache on the instance.
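The per-instance dict-cache alternative can be shown with a generic stand-in (the Loader class is illustrative, not the real SensorExplorer): the cache lives on the instance, so its entries are garbage-collected with it, unlike a global lru_cache whose keys keep self alive.

```python
class Loader:
    """Per-instance memoization of an expensive row load."""

    def __init__(self) -> None:
        self._cache: dict[int, tuple[str, int]] = {}
        self.loads = 0  # counts actual (uncached) loads

    def load_row(self, row_index: int) -> tuple[str, int]:
        if row_index not in self._cache:
            self.loads += 1  # stands in for dataset/transform/annotate work
            self._cache[row_index] = (f"row-{row_index}", row_index)
        return self._cache[row_index]
```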

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 167 - 172, The `@lru_cache` on the instance method
_load_row_bundle causes the cache to retain self and leak SensorExplorer; fix by
removing the decorator and either (A) implement a module-level cached function
(e.g., module_load_row_bundle(dataset, transformer, annotator, row_index)) that
is lru_cached and called from SensorExplorer._load_row_bundle, ensuring
transformer (MHCTransformer) and annotator (Annotator) are replaced with
hashable identifiers or their state is folded into the cache key, or (B) replace
the decorator with an explicit instance cache (self._row_bundle_cache: dict) in
SensorExplorer and implement lookup/populate logic in _load_row_bundle to avoid
capturing self in a global lru cache. Ensure you update calls to
_load_row_bundle accordingly.

221-225: Minor: Remove unnecessary int() casts.

round() already returns an integer in Python 3.

✨ Suggested simplification
     def _on_click(self, event) -> None:
         if event.inaxes is self.ax_overview and event.ydata is not None:
-            self._set_signal(int(round(event.ydata)))
+            self._set_signal(round(event.ydata))
         elif event.inaxes is self.ax_signal_list and event.ydata is not None:
-            self._set_signal(int(round(event.ydata)))
+            self._set_signal(round(event.ydata))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 221 - 225, The _on_click handler wraps
round(event.ydata) with unnecessary int(...) casts; remove the int(...) calls
and call self._set_signal(round(event.ydata)) in both branches so you pass the
integer returned by round directly (ensure event.ydata is still checked for None
and _set_signal accepts an int).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2be5bc20-2859-4272-89bc-159a64123910

📥 Commits

Reviewing files that changed from the base of the PR and between a1b684e and d703c93.

📒 Files selected for processing (6)
  • README.md
  • detectors/spike.py
  • detectors/trend.py
  • explorer.py
  • requirements.txt
  • visualizer.py


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
visualizer.py (3)

156-156: Add strict=True to zip() for safer iteration.

Both axes[:n_channels] and signals should have matching lengths, but adding strict=True provides a defensive check that will raise a ValueError if they unexpectedly differ.

♻️ Proposed fix
-    for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals)):
+    for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` at line 156, The loop using zip(axes[:n_channels], signals)
should use strict=True to ensure axes[:n_channels] and signals have matching
lengths; update the iteration in visualizer.py (the for i, (ax, signal) in
enumerate(...) loop that references axes, signals and n_channels) to call
zip(..., strict=True) so a ValueError is raised if lengths differ, preventing
silent mismatches during plotting.

275-286: Inconsistent empty signals handling between modes.

In detectors mode, plot_detector_view raises ValueError for empty signals (line 137). However, in captions mode, plot_row would fail with a less informative error from np.stack if signals is empty.

Consider adding consistent validation before entering either branch:

♻️ Proposed fix
     row = dataset[args.row_index]
     signals = MHCTransformer().transform_row(row)
+    if not signals:
+        raise ValueError(f"No signals found for row {args.row_index}")

     if args.mode == "detectors":
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` around lines 275 - 286, Validate that signals returned by
MHCTransformer().transform_row(row) is non-empty before branching on args.mode
and raise a consistent, informative ValueError if empty; specifically, add a
check after MHCTransformer.transform_row that asserts signals (or
signals.size/len) is > 0 and raise e.g. ValueError("Empty signals: cannot plot")
so both code paths (plot_detector_view and plot_row) receive the same
precondition, and ensure this handling is applied where signals is used earlier
(references: MHCTransformer.transform_row, plot_detector_view, plot_row, and
Annotator.annotate).

179-190: Consider adding bounds check before array access.

The code accesses signal.data[minute] without verifying that minute is within bounds. While the detector implementation should produce valid indices (since it operates on the same array), a bounds check would provide defensive protection against edge cases or future detector changes.

🛡️ Proposed defensive fix
                 elif result.event_type == "spike":
                     minute = int(result.spike_minute)
-                    value = signal.data[minute]
-                    if not np.isnan(value):
+                    if 0 <= minute < len(signal.data) and not np.isnan(signal.data[minute]):
+                        value = signal.data[minute]
                         ax.scatter(minute, value, color="#2ca02c", marker="^", s=28, zorder=3)
                     rendered.append(f"spike @{minute}")
                 elif result.event_type == "drop":
                     minute = int(result.spike_minute)
-                    value = signal.data[minute]
-                    if not np.isnan(value):
+                    if 0 <= minute < len(signal.data) and not np.isnan(signal.data[minute]):
+                        value = signal.data[minute]
                         ax.scatter(minute, value, color="#d62728", marker="v", s=28, zorder=3)
                     rendered.append(f"drop @{minute}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` around lines 179 - 190, Add a defensive bounds check before
indexing signal.data with minute in the visualizer branch handling
result.event_type ("spike" and "drop"): validate minute =
int(result.spike_minute) is >= 0 and < len(signal.data) before reading value =
signal.data[minute]; only perform the np.isnan check, ax.scatter call, and
append to rendered when the index is in range (otherwise skip or optionally
log/mark the event), so the code in the "spike" and "drop" branches never
accesses out-of-range indices.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b7e4d37b-d351-4a20-a35d-907c83f96271

📥 Commits

Reviewing files that changed from the base of the PR and between d703c93 and 35e9394.

📒 Files selected for processing (1)
  • visualizer.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
explorer.py (2)

266-270: Remove redundant int() casts.

In Python 3, round() returns an int when called with a single argument, so int(round(event.ydata)) is redundant.

♻️ Suggested fix
     def _on_click(self, event) -> None:
         if event.inaxes is self.ax_overview and event.ydata is not None:
-            self._set_signal(int(round(event.ydata)))
+            self._set_signal(round(event.ydata))
         elif event.inaxes is self.ax_signal_list and event.ydata is not None:
-            self._set_signal(int(round(event.ydata)))
+            self._set_signal(round(event.ydata))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 266 - 270, In _on_click, remove the redundant int()
wrappers around round(event.ydata) in both branches (the calls that pass values
to _set_signal); use round(event.ydata) directly when calling _set_signal since
round(x) already returns an int for a single argument in Python 3—update both
the ax_overview and ax_signal_list branches to pass round(event.ydata) to
_set_signal.
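As a quick sanity check of the claim (illustrative stand-in value for `event.ydata`):

```python
ydata = 2.4999  # stand-in for the float Matplotlib passes as event.ydata

result = round(ydata)
print(result, type(result).__name__)  # 2 int

# Note: round() uses banker's rounding at .5 midpoints:
print(round(2.5), round(3.5))  # 2 4
```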

526-531: Consider adding strict=True to zip() for defensive coding.

While get_legend_handles_labels() guarantees equal-length lists, adding strict=True makes the assumption explicit and catches any future issues.

♻️ Suggested fix
         handles, labels = self.ax_main.get_legend_handles_labels()
         deduped: dict[str, object] = {}
-        for handle, label in zip(handles, labels):
+        for handle, label in zip(handles, labels, strict=True):
             deduped.setdefault(label, handle)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 526 - 531, The zip between handles and labels
should be made strict to fail fast if lengths diverge; update the loop using
zip(handles, labels) in the block where get_legend_handles_labels() is called
(variables: handles, labels, deduped, and ax_main.legend) to use zip(handles,
labels, strict=True) so any future mismatch raises immediately (ensure runtime
on Python >=3.10).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cfc5efd-1bae-4983-bf67-760ba209882f

📥 Commits

Reviewing files that changed from the base of the PR and between 35e9394 and ce2818d.

📒 Files selected for processing (1)
  • explorer.py

Comment thread explorer.py
Comment on lines +52 to +61
def _format_detector_event(detector_name: str, result) -> str:
if result.event_type == "trend":
return f"{detector_name}: {result.direction} {result.start_minute}-{result.end_minute}"
if result.event_type == "plateau":
return f"{detector_name}: plateau {result.start_minute}-{result.end_minute}"
if result.event_type in {"spike", "drop"}:
return f"{detector_name}: {result.event_type} @{result.spike_minute}"
if result.event_type == "gap":
return f"{detector_name}: gap {result.start_minute}-{result.end_minute}"
return f"{detector_name}: {result.event_type}"
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for any usage of "plateau" or "gap" as event_type in the codebase
echo "=== Searching for plateau event_type usage ==="
rg -n '"plateau"' --type py

echo ""
echo "=== Searching for gap event_type usage ==="
rg -n '"gap"' --type py

echo ""
echo "=== Checking DetectionResult definition ==="
ast-grep --pattern 'class DetectionResult:
  $$$
'

Repository: StanfordBDHG/SensorTSLM

Length of output: 2886


🏁 Script executed:

#!/bin/bash
# Find where _format_detector_event is called
echo "=== Searching for callers of _format_detector_event ==="
rg -n '_format_detector_event' --type py -B 3 -A 3

echo ""
echo "=== Check if DetectionResult is extended or if there are other event sources ==="
rg -n 'DetectionResult' --type py | head -30

echo ""
echo "=== Search for where "plateau" or "gap" events might be created ==="
rg -n 'event_type.*=.*"plateau"' --type py
rg -n 'event_type.*=.*"gap"' --type py



🏁 Script executed:

#!/bin/bash
# Find where detector_events is populated in the context of line 577
echo "=== Context around line 577 in explorer.py ==="
sed -n '550,590p' explorer.py

echo ""
echo "=== Search for detector instantiation/usage ==="
rg -n 'Detector' explorer.py | head -20

echo ""
echo "=== Check what detectors are actually imported ==="
rg -n 'from detectors' explorer.py
rg -n 'import.*Detector' explorer.py



🏁 Script executed:

#!/bin/bash
# Find the function containing line 577
echo "=== Find function containing line 577 ==="
awk 'NR<=577 {if (/^def /) last_def=$0; last_line=NR} NR==577 {print "Line 577 is in: " last_def; exit}' explorer.py

echo ""
echo "=== Full function context from line 577 ==="
awk '/^def / {func=$0; start=NR} start && NR==577 {for(i=start; i<=NR; i++) print} start && NR>577 && /^def / {exit}' explorer.py | head -100

echo ""
echo "=== Search for detector_events assignment ==="
rg -n 'detector_events\s*=' explorer.py -B 2 -A 2



🏁 Script executed:

#!/bin/bash
# Find _detector_events method definition
echo "=== Find _detector_events method ==="
rg -n 'def _detector_events' explorer.py -A 30

echo ""
echo "=== Check if plateau/gap detectors exist ==="
find . -name '*detector*' -o -name '*Detector*' | grep -v __pycache__



🏁 Script executed:

#!/bin/bash
# Find MHC_CHANNEL_CONFIG definition
echo "=== Find MHC_CHANNEL_CONFIG ==="
rg -n 'MHC_CHANNEL_CONFIG' explorer.py | head -5

echo ""
echo "=== Find where MHC_CHANNEL_CONFIG is defined ==="
rg -n 'MHC_CHANNEL_CONFIG\s*=' --type py -A 5

echo ""
echo "=== Check imports in explorer.py ==="
head -30 explorer.py



🏁 Script executed:

#!/bin/bash
# Read mhc/constants.py to see full MHC_CHANNEL_CONFIG definition
echo "=== Full MHC_CHANNEL_CONFIG definition ==="
sed -n '47,150p' mhc/constants.py



Remove dead code for unsupported event types.

The _format_detector_event() function handles "plateau" and "gap" event types (lines 55-56, 59-60), but these are never produced. The DetectionResult type restricts event_type to Literal["trend", "spike", "drop"], and the detector configuration in mhc/constants.py instantiates only TrendDetector() and SpikeDetector(), which generate only those three event types.

Remove the dead code branches for "plateau" and "gap" from _format_detector_event(). Also remove corresponding dead code in render() (lines 499-500, 517-518) and any other locations handling these unsupported types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 52 - 61, The _format_detector_event function
contains dead branches for "plateau" and "gap" which are never produced by
DetectionResult (only "trend", "spike", "drop") given the configured
TrendDetector and SpikeDetector; remove the plateau and gap branches from
_format_detector_event and simplify the function to handle only "trend",
"spike"/"drop" and a final fallback, and then remove any matching plateau/gap
handling in the render() function and any other places that special-case those
event types so all code paths assume only "trend", "spike", or "drop" (keep the
existing formatting for trend and spike/drop and the generic fallback for
unknown types).

Comment thread explorer.py
Comment on lines +192 to +197
@lru_cache(maxsize=12)
def _load_row_bundle(self, row_index: int) -> tuple[list[Signal], list[Sample], list[Annotation]]:
row = self.dataset[row_index]
signals = self.transformer.transform_row(row)
samples, annotations = self.annotator.annotate(signals)
return signals, samples, annotations
⚠️ Potential issue | 🟠 Major

lru_cache on instance method can cause memory leaks.

Using @lru_cache on a method caches self as part of the key, preventing garbage collection of the SensorExplorer instance even after it's no longer referenced. While this may be acceptable for a single-instance GUI, it becomes problematic if instances are created/destroyed programmatically (e.g., in tests or batch processing with --save-path).

🛠️ Recommended fix: use a module-level cache or make the method static

Option 1: Module-level cache with explicit key:

+@lru_cache(maxsize=12)
+def _load_row_bundle_cached(
+    dataset_id: int, row_index: int, transformer: MHCTransformer, annotator: Annotator
+) -> tuple[list[Signal], list[Sample], list[Annotation]]:
+    # Note: requires dataset to be accessible or passed differently
+    ...

 class SensorExplorer:
-    @lru_cache(maxsize=12)
     def _load_row_bundle(self, row_index: int) -> tuple[list[Signal], list[Sample], list[Annotation]]:
-        row = self.dataset[row_index]
-        signals = self.transformer.transform_row(row)
-        samples, annotations = self.annotator.annotate(signals)
-        return signals, samples, annotations
+        return _load_row_bundle_cached(id(self.dataset), row_index, self.transformer, self.annotator)

Option 2: Use a dict-based cache on the instance:

+    def __init__(self, ...):
+        ...
+        self._row_cache: dict[int, tuple[list[Signal], list[Sample], list[Annotation]]] = {}
+
-    @lru_cache(maxsize=12)
     def _load_row_bundle(self, row_index: int) -> tuple[list[Signal], list[Sample], list[Annotation]]:
+        if row_index in self._row_cache:
+            return self._row_cache[row_index]
         row = self.dataset[row_index]
         signals = self.transformer.transform_row(row)
         samples, annotations = self.annotator.annotate(signals)
+        if len(self._row_cache) >= 12:
+            self._row_cache.pop(next(iter(self._row_cache)))
+        self._row_cache[row_index] = (signals, samples, annotations)
         return signals, samples, annotations
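The leak is easy to demonstrate in isolation (a self-contained sketch with a toy class, not the actual SensorExplorer):

```python
import gc
import weakref
from functools import lru_cache


class Leaky:
    @lru_cache(maxsize=12)
    def compute(self, x: int) -> int:
        return x * 2


obj = Leaky()
obj.compute(1)            # the cache key now holds a strong reference to obj
ref = weakref.ref(obj)
del obj
gc.collect()
print(ref() is not None)  # True: the instance cannot be collected

Leaky.compute.cache_clear()
gc.collect()
print(ref() is None)      # True: clearing the cache releases the instance
```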
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 192-192: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 192 - 197, The use of `@lru_cache` on the instance
method _load_row_bundle (in class SensorExplorer) captures self in the cache key
and can leak SensorExplorer instances; remove the `@lru_cache` decorator and
implement one of the recommended patterns: either create a module-level cached
helper that accepts a stable identifier plus row_index (so it does not capture
self) and call it from _load_row_bundle, or add an instance-level dict cache
(e.g., self._row_bundle_cache keyed by row_index) inside SensorExplorer and have
_load_row_bundle check/populate that dict before computing signals, samples,
annotations; ensure cache invalidation/clear method exists if
SensorExplorer.dataset changes.

Comment thread explorer.py
Comment on lines +310 to +313
@lru_cache(maxsize=64)
def _row_detector_events(self, row_index: int) -> tuple[tuple[tuple[str, object], ...], ...]:
signals, _, _ = self._load_row_bundle(row_index)
return tuple(tuple(self._detector_events(signal)) for signal in signals)

⚠️ Potential issue | 🟠 Major

Same lru_cache memory leak concern.

This method has the same issue as _load_row_bundle - @lru_cache on an instance method prevents garbage collection of the SensorExplorer instance.

Consider the same fix patterns: use a dict-based instance cache or move caching to module level.

🧰 Tools
🪛 Ruff (0.15.7)

[warning] 310-310: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@explorer.py` around lines 310 - 313, The `@lru_cache` on the instance method
_row_detector_events holds a strong reference to self and causes the same leak
as _load_row_bundle; remove the decorator and implement an instance-level cache
(e.g., a dict like self._row_detector_events_cache keyed by row_index) or
alternatively refactor _row_detector_events into a module-level function and
apply `@lru_cache` there; ensure cache lookup, population and invalidation are
handled (clear on relevant state changes) and reference the existing helper
_load_row_bundle and the method name _row_detector_events when making the
change.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d4c5f0c-13b4-4a01-b899-c1436df954ac

📥 Commits

Reviewing files that changed from the base of the PR and between ce2818d and 48e804b.

📒 Files selected for processing (2)
  • requirements.txt
  • visualizer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt

Comment thread visualizer.py
"gap": Patch(facecolor="#d62728", edgecolor="none", alpha=0.10, label="gap"),
}

for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals)):

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify Python version compatibility and lint config before applying strict=True.
fd -HI 'pyproject.toml|setup.cfg|setup.py'
rg -n --iglob 'pyproject.toml' 'requires-python|target-version|ruff'
rg -n --iglob 'setup.cfg' 'python_requires|target-version|\\[tool:ruff\\]'
rg -n --iglob 'setup.py' 'python_requires'
rg -nP '\\bzip\\s*\\(' visualizer.py



🏁 Script executed:

#!/bin/bash
# Find the visualizer.py file and project configuration
find . -name "visualizer.py" -type f
ls -la | head -20



🏁 Script executed:

#!/bin/bash
# Look for project configuration files
find . -maxdepth 2 -name "pyproject.toml" -o -name "setup.cfg" -o -name "setup.py" -o -name "setup.cfg"



🏁 Script executed:

#!/bin/bash
# Check the content around visualizer.py line 168
if [ -f "visualizer.py" ]; then
  echo "=== visualizer.py context around line 168 ==="
  sed -n '160,180p' visualizer.py
fi



🏁 Script executed:

#!/bin/bash
# Check for project configuration in current directory
cat pyproject.toml 2>/dev/null || echo "No pyproject.toml found"
cat setup.py 2>/dev/null || echo "No setup.py found"
cat setup.cfg 2>/dev/null || echo "No setup.cfg found"
cat requirements.txt 2>/dev/null || echo "No requirements.txt found"



🏁 Script executed:

#!/bin/bash
# Get broader context of the function containing line 168
sed -n '140,200p' visualizer.py



🏁 Script executed:

#!/bin/bash
# Check GitHub workflows for Python version hints
find .github -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -5



🏁 Script executed:

#!/bin/bash
# Check workflow files if they exist
if [ -d ".github/workflows" ]; then
  cat .github/workflows/*.yml .github/workflows/*.yaml 2>/dev/null | grep -i "python-version\|python_version" || echo "No Python version found in workflows"
fi



🏁 Script executed:

#!/bin/bash
# Read the workflow file to check for Python configuration
cat .github/workflows/static-analysis.yml



🏁 Script executed:

#!/bin/bash
# Check README for Python version requirements
head -50 README.md



🏁 Script executed:

#!/bin/bash
# Search for any Python version mentions in the repository
rg -i "python.*3\.\d+|requires.*python" --max-count=10 2>/dev/null || echo "No explicit Python version found"



🏁 Script executed:

#!/bin/bash
# Found ruff.toml - read the full file
cat ruff.toml



🏁 Script executed:

#!/bin/bash
# Check if there are any Python 3.10+ specific features used in the codebase
rg "match\s+|:=|str\(" visualizer.py | head -5



🏁 Script executed:

#!/bin/bash
# Check imports and type hints that might indicate Python version
head -30 visualizer.py



Add strict=True to enforce aligned iterable lengths.

The code slices axes to n_channels (which equals len(signals)) before zipping. Since both iterables are guaranteed to match in length by design, adding strict=True catches accidental divergence and satisfies Ruff rule B905 (already enabled in ruff.toml).

Suggested change
-    for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals)):
+    for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals, strict=True)):
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 168-168: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` at line 168, The zip over axes and signals in the for loop
should use strict=True to enforce both iterables are the same length; update the
loop `for i, (ax, signal) in enumerate(zip(axes[:n_channels], signals)):` to
include `strict=True` so any accidental length mismatch between
`axes[:n_channels]` and `signals` is raised (referencing the variables/loop
using axes, signals, n_channels and the for-loop in visualizer.py).

Comment thread visualizer.py
Comment on lines +191 to +202
elif result.event_type == "spike":
minute = int(result.spike_minute)
value = signal.data[minute]
if not np.isnan(value):
ax.scatter(minute, value, color="#2ca02c", marker="^", s=28, zorder=3)
rendered.append(f"spike @{minute}")
elif result.event_type == "drop":
minute = int(result.spike_minute)
value = signal.data[minute]
if not np.isnan(value):
ax.scatter(minute, value, color="#d62728", marker="v", s=28, zorder=3)
rendered.append(f"drop @{minute}")

⚠️ Potential issue | 🟠 Major

Guard spike_minute before indexing to avoid crashes/misplots.

int(result.spike_minute) and signal.data[minute] can fail (or silently misplot for negative indices) if a detector emits malformed/out-of-range minutes.

💡 Suggested hardening
-                elif result.event_type == "spike":
-                    minute = int(result.spike_minute)
-                    value = signal.data[minute]
-                    if not np.isnan(value):
-                        ax.scatter(minute, value, color="#2ca02c", marker="^", s=28, zorder=3)
-                    rendered.append(f"spike @{minute}")
-                elif result.event_type == "drop":
-                    minute = int(result.spike_minute)
-                    value = signal.data[minute]
-                    if not np.isnan(value):
-                        ax.scatter(minute, value, color="#d62728", marker="v", s=28, zorder=3)
-                    rendered.append(f"drop @{minute}")
+                elif result.event_type in {"spike", "drop"}:
+                    minute_raw = result.spike_minute
+                    if minute_raw is None:
+                        rendered.append(f"{result.event_type} `@invalid`")
+                        continue
+                    minute = int(minute_raw)
+                    if minute < 0 or minute >= len(series):
+                        rendered.append(f"{result.event_type} `@out_of_range`({minute})")
+                        continue
+                    value = series[minute]
+                    if not np.isnan(value):
+                        marker = "^" if result.event_type == "spike" else "v"
+                        color = "#2ca02c" if result.event_type == "spike" else "#d62728"
+                        ax.scatter(minute, value, color=color, marker=marker, s=28, zorder=3)
+                    rendered.append(f"{result.event_type} @{minute}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` around lines 191 - 202, Validate and guard result.spike_minute
before using it to index signal.data: retrieve result.spike_minute, ensure it
exists and is convertible to int, then check 0 <= minute < len(signal.data) to
avoid negative or out-of-range indexing; only then access signal.data[minute],
check not np.isnan(value) and call ax.scatter and appended rendered text (for
both the "spike" and "drop" branches); if the minute is invalid, skip
plotting/appending (or log a warning) to prevent crashes or silent misplots.

Comment thread visualizer.py
Comment on lines +283 to +285
if args.row_index < 0 or args.row_index >= len(dataset):
raise IndexError(f"row-index must be between 0 and {len(dataset) - 1}, got {args.row_index}")


⚠️ Potential issue | 🟡 Minor

Handle empty dataset explicitly for clearer CLI erroring.

When the dataset is empty, the current message reports a confusing range (0 to -1).

💡 Suggested change
-    dataset = MHCDataset(min_wear_pct=args.min_wear_pct)
-    if args.row_index < 0 or args.row_index >= len(dataset):
-        raise IndexError(f"row-index must be between 0 and {len(dataset) - 1}, got {args.row_index}")
+    dataset = MHCDataset(min_wear_pct=args.min_wear_pct)
+    dataset_len = len(dataset)
+    if dataset_len == 0:
+        raise ValueError("Dataset is empty after applying filters (check --min-wear-pct).")
+    if args.row_index < 0 or args.row_index >= dataset_len:
+        raise IndexError(f"row-index must be between 0 and {dataset_len - 1}, got {args.row_index}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@visualizer.py` around lines 283 - 285, The current bounds check for
args.row_index in visualizer.py uses len(dataset) which produces a confusing
message when the dataset is empty; update the logic to first check if
len(dataset) == 0 and raise a clear error (e.g., "dataset is empty, cannot
select a row") and otherwise perform the existing bounds check for
args.row_index (raise IndexError with the corrected range using len(dataset)-1).
Modify the block that references args.row_index and dataset to handle the
empty-dataset case before computing the upper bound.
