Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Adds a CrossChannelExtractor and three CrossChannelSynthesizers (Sleep, StationaryActivity, Cardio); extends templates and VALID_CAPTION_TYPES; wires the extractor into captionizer/visualizer/explorer; reworks explorer target navigation/rendering for cross-channel windows; updates spike/trend detectors and detector configs.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Explorer
    participant Annotator
    participant CCExtractor as CrossChannelExtractor
    participant SleepSynth as SleepSynthesizer
    participant StationarySynth as StationaryActivitySynthesizer
    participant CardioSynth as CardioSynthesizer
    participant Templates
    User->>Explorer: open recording / request annotations
    Explorer->>Annotator: extract(recording)
    Annotator->>CCExtractor: extract(recording)
    CCExtractor->>SleepSynth: synthesize(recording, config)
    CCExtractor->>StationarySynth: synthesize(recording, config)
    CCExtractor->>CardioSynth: synthesize(recording, config)
    SleepSynth->>Templates: load cross_channel.sleep templates
    StationarySynth->>Templates: load cross_channel.stationary_activity templates
    CardioSynth->>Templates: load cross_channel.cardio templates
    Templates-->>SleepSynth: template strings
    Templates-->>StationarySynth: template strings
    Templates-->>CardioSynth: template strings
    SleepSynth-->>CCExtractor: annotations (sleep windows)
    StationarySynth-->>CCExtractor: annotations (stationary activity windows)
    CardioSynth-->>CCExtractor: annotations (cardio windows)
    CCExtractor-->>Annotator: aggregated annotations
    Annotator-->>Explorer: annotated recording
    Explorer->>Explorer: render() draws axvspan overlays for cross-channel windows
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Force-pushed from 8a21172 to e457736
Actionable comments posted: 3
🧹 Nitpick comments (2)
.gitignore (1)
6-6: Consider adding a newline at end of file.

The file currently ends without a newline character. Adding one aligns with POSIX standards and prevents diff noise in future changes.
📝 Proposed fix

```diff
-data/
\ No newline at end of file
+data/
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.gitignore at line 6: add a trailing newline at the end of the file by ensuring the final "data/" entry is followed by a newline character (i.e., end the file with a blank line), so the file conforms to POSIX conventions and avoids future diff noise.

synthesizers/sleep.py (1)
35-37: Avoid reading and decoding `templates.json` for every row.

`SleepSynthesizer.synthesize()` runs on the per-row annotation path in `captionizer` and `explorer`. Re-reading the same template file here adds unnecessary I/O and JSON parsing overhead; load it once in `__init__` or memoize it by `config.templates_path`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@synthesizers/sleep.py` around lines 35 - 37, The synthesize method currently re-reads and parses the templates JSON for every row (templates = json.loads(config.templates_path.read_text())...), causing heavy I/O; move this load into SleepSynthesizer.__init__ (e.g., self.templates = json.loads(config.templates_path.read_text())["cross_channel"]["sleep"]) or implement a simple memoization keyed by config.templates_path so synthesize uses self.templates instead of re-reading; update references in SleepSynthesizer.synthesize() (and any callers) to use the cached templates while keeping the rest of the logic (including time_unit and CaptionExtractor._seed usage) unchanged.
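A minimal sketch of the memoization suggested here, assuming `config.templates_path` is a `pathlib.Path` and the JSON layout matches the prompt (`["cross_channel"]["sleep"]`); the class body is illustrative, not the project's actual code:

```python
import json
from pathlib import Path


class SleepSynthesizer:
    """Sketch: parse templates.json once per path instead of on every row."""

    # Class-level cache keyed by the templates path.
    _template_cache: dict[Path, dict] = {}

    @classmethod
    def _load_templates(cls, templates_path: Path) -> dict:
        # First call for a given path reads and parses the file;
        # every later call returns the cached dict without touching disk.
        if templates_path not in cls._template_cache:
            data = json.loads(templates_path.read_text())
            cls._template_cache[templates_path] = data["cross_channel"]["sleep"]
        return cls._template_cache[templates_path]

    def synthesize(self, row, config):
        templates = self._load_templates(config.templates_path)
        # ... per-row caption logic unchanged, now using the cached templates ...
        return templates
```

A per-instance attribute set in `__init__` would work equally well; the class-level cache just also survives across synthesizer instances.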
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@synthesizers/sleep.py`:
- Around line 19-20: The constructor for SleepSynthesizer currently defaults
min_duration to 5 which unintentionally filters short (<5-sample) intervals;
change the default in SleepSynthesizer.__init__ to a permissive value (e.g., 0)
so SleepSynthesizer() yields raw/permissive output and does not drop
sub-5-sample “in bed but awake” windows, and keep the min_duration parameter
available for callers that want to opt into filtering.
- Around line 31-33: The current guard requires asleep to have a positive value
before emitting awake-in-bed intervals; change it to only require that asleep
has any observed (non-NaN) samples. Replace the condition "if not
np.any((~np.isnan(asleep)) & (asleep > 0)):" with a check that tests for any
non-NaN in asleep (e.g., "if not np.any(~np.isnan(asleep)):"), leaving the
subsequent mask expression using asleep > 0 and in_bed unchanged so awake-in-bed
windows are emitted when asleep is observed but stays zero.
In `@visualizer.py`:
- Around line 125-129: The plot_row() function is only rendering captions for
annotation types 'statistical', 'structural', and 'semantic', so annotations
produced by CrossChannelExtractor (created in the Annotator instantiation
alongside StatisticalExtractor and StructuralExtractor) are being dropped;
update plot_row() to recognize and render the 'cross_channel' annotation type
(or whatever key CrossChannelExtractor uses) in the same way as the others—add
the caption/layout branch or include 'cross_channel' in the type-handling list
and ensure its text and markers are drawn to the saved plot so sleep-bundle
annotations become visible.
---
Nitpick comments:
In @.gitignore:
- Line 6: Add a trailing newline at the end of the .gitignore by ensuring the
final "data/" entry is followed by a newline character (i.e., end the file with
a blank line) so the file conforms to POSIX conventions and avoids future diff
noise.
In `@synthesizers/sleep.py`:
- Around line 35-37: The synthesize method currently re-reads and parses the
templates JSON for every row (templates =
json.loads(config.templates_path.read_text())...), causing heavy I/O; move this
load into SleepSynthesizer.__init__ (e.g., self.templates =
json.loads(config.templates_path.read_text())["cross_channel"]["sleep"]) or
implement a simple memoization keyed by config.templates_path so synthesize uses
self.templates instead of re-reading; update references in
SleepSynthesizer.synthesize() (and any callers) to use the cached templates
while keeping the rest of the logic (including time_unit and
CaptionExtractor._seed usage) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5a2f7a21-e5a2-422f-8152-55d48ebb151f
📒 Files selected for processing (9)
.gitignore, captionizer.py, explorer.py, extractors/__init__.py, extractors/cross_channel.py, synthesizers/__init__.py, synthesizers/sleep.py, templates/templates.json, visualizer.py
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@explorer.py`:
- Around line 414-452: The scan loop never revisits the starting row after a
full cycle because row_offset only iterates 0..n_rows-1 and the special-case
"elif row_offset == 0: continue" prevents wrapping back to the current row; fix
by changing the loop to iterate one extra step (use range(1, n_rows+1) or
otherwise include the wrap iteration) and remove the early-skip so that when
row_index == self.row_index and there are no candidates after the current signal
you still allow selecting hit_signals[0] (for step>0) or hit_signals[-1] (for
step<0) on the wrapped iteration; update the loop that uses start_row,
row_offset, and row_index and keep references to self._row_hit_signal_indices,
self.row_index, self.signal_index, and self.hit_target intact so the code will
correctly jump back to hits on the original row after scanning the rest of the
dataset.
In `@synthesizers/cardio.py`:
- Around line 19-21: The distance channel is hard-coded to DISTANCE_CHANNEL =
"hk_watch:HKQuantityTypeIdentifierDistanceWalkingRunning" and that same
distance_idx is reused for cycling, so cycling workouts miss distance data; add
a new constant (e.g., CYCLING_DISTANCE_CHANNEL =
"hk_watch:HKQuantityTypeIdentifierDistanceCycling"), update the lookup logic
where distance_idx is computed and used (references: DISTANCE_CHANNEL and
distance_idx) to choose the cycling channel when the workout type is cycling
(e.g., check the workout type or a workout.kind value) and fall back to the
walking/running channel otherwise, ensuring downstream code that reads distance
uses the correct index for cycling workouts.
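A minimal sketch of the suggested channel selection; `CYCLING_DISTANCE_CHANNEL` and the `workout_kind` check are assumptions taken from this comment, not existing project code:

```python
# Channel identifiers as named in the review comment.
DISTANCE_CHANNEL = "hk_watch:HKQuantityTypeIdentifierDistanceWalkingRunning"
CYCLING_DISTANCE_CHANNEL = "hk_watch:HKQuantityTypeIdentifierDistanceCycling"


def distance_channel_for(workout_kind: str) -> str:
    """Pick the distance channel matching the workout type,
    falling back to walking/running for everything else."""
    if workout_kind == "cycling":
        return CYCLING_DISTANCE_CHANNEL
    return DISTANCE_CHANNEL
```

Downstream, `distance_idx` would then be looked up from the returned channel name instead of the hard-coded constant.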
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c9366eb0-3611-4bb7-84ba-94c5b0b09c94
📒 Files selected for processing (5)
captionizer.py, explorer.py, synthesizers/cardio.py, templates/templates.json, visualizer.py
✅ Files skipped from review due to trivial changes (2)
- captionizer.py
- templates/templates.json
🚧 Files skipped from review as they are similar to previous changes (1)
- visualizer.py
🧹 Nitpick comments (2)
detectors/spike.py (1)
27-28: Remove unused parameters `max_cluster_peaks` and `soft_top_k_margin`.

These parameters are stored in `__init__` but never referenced in `_detector_build_results`. If they're placeholders for future functionality, consider adding a TODO comment; otherwise, remove them to avoid confusion.

♻️ Proposed fix to remove unused parameters

```diff
 def __init__(
     self,
     filter_zeros: bool = False,
     min_height: float | None = None,
     min_prominence: float = 0.0,
     min_threshold: float | None = None,
     min_distance: int = 1,
     min_width: int = 1,
     top_k: int | None = None,
-    max_cluster_peaks: int = 2,
-    soft_top_k_margin: float = 0.90,
 ) -> None:
     super().__init__(filter_zeros=filter_zeros)
     self.min_height = min_height
     self.min_prominence = min_prominence
     self.min_threshold = min_threshold
     self.min_distance = max(1, min_distance)
     self.min_width = max(1, min_width)
     self.top_k = top_k
-    self.max_cluster_peaks = max(1, max_cluster_peaks)
-    self.soft_top_k_margin = min(max(float(soft_top_k_margin), 0.0), 1.0)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@detectors/spike.py` around lines 27-28: the constructor currently accepts and stores `max_cluster_peaks` and `soft_top_k_margin` but they are never used; remove these parameters from the `__init__` signature and delete the corresponding attribute assignments, along with any related comments or default values. If these were meant as future hooks, instead leave a single TODO comment in `__init__` mentioning the planned clustering/soft-top-k behaviour, and remove the unused attributes to avoid confusion.

explorer.py (1)
403-419: `lru_cache` on an instance method may cause memory retention.

The `@lru_cache(maxsize=256)` decorator on `_row_hit_signal_indices` uses `self` as part of the cache key. While the bounded `maxsize` limits growth, the cache will hold references to `SensorExplorer` instances even after they should be garbage collected. For this interactive single-instance application, the impact is minimal. If multi-instance usage becomes a concern, consider using `functools.cached_property` patterns or a per-instance cache dict.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@explorer.py` around lines 403-419: the method `_row_hit_signal_indices` is decorated with `@lru_cache`, which captures `self` in the cache key and can retain `SensorExplorer` instances. Remove the `@lru_cache` decorator and implement a per-instance bounded cache instead (e.g. add `self._row_hit_signal_indices_cache = OrderedDict()` in the class init), then modify `_row_hit_signal_indices` to check the per-instance cache using `(row_index, target_name)` as the key, compute the result using the existing logic that calls `_load_row_bundle`, `_cross_channel_windows`, and `_row_detector_events` when missing, store the tuple in the per-instance cache (evicting the oldest entry when exceeding maxsize 256), and optionally clear the cache on instance cleanup.
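A sketch of the suggested per-instance bounded cache; the class body and the `_compute_hits` placeholder are illustrative only, standing in for the real `_load_row_bundle`/`_cross_channel_windows`/`_row_detector_events` logic:

```python
from collections import OrderedDict


class SensorExplorer:
    """Sketch of a per-instance LRU-style cache replacing @lru_cache."""

    _CACHE_MAX = 256

    def __init__(self) -> None:
        # Bounded per-instance cache: dies with the instance, so no
        # module-level references keep SensorExplorer objects alive.
        self._hit_cache: OrderedDict[tuple[int, str], tuple[int, ...]] = OrderedDict()

    def _row_hit_signal_indices(self, row_index: int, target_name: str) -> tuple[int, ...]:
        key = (row_index, target_name)
        if key in self._hit_cache:
            self._hit_cache.move_to_end(key)  # mark as recently used
            return self._hit_cache[key]
        result = self._compute_hits(row_index, target_name)
        self._hit_cache[key] = result
        if len(self._hit_cache) > self._CACHE_MAX:
            self._hit_cache.popitem(last=False)  # evict the oldest entry
        return result

    def _compute_hits(self, row_index: int, target_name: str) -> tuple[int, ...]:
        # Placeholder for the real per-row hit computation.
        return (row_index,)
```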
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@detectors/spike.py`:
- Around line 27-28: The constructor of the spike detector currently accepts and
stores max_cluster_peaks and soft_top_k_margin but they are never used; remove
these parameters from the __init__ signature and delete the corresponding
attribute assignments (search for max_cluster_peaks and soft_top_k_margin in the
class, e.g., in __init__ of the SpikeDetector) and any related comments or
default values; if these were meant as future hooks, instead leave a single TODO
comment in __init__ mentioning planned clustering/soft-top-k behaviour and
remove the unused attributes to avoid confusion.
In `@explorer.py`:
- Around line 403-419: The method _row_hit_signal_indices is decorated with
`@lru_cache` which captures self in the cache key and can retain SensorExplorer
instances; remove the `@lru_cache` decorator and implement a per-instance bounded
cache instead (e.g. add self._row_hit_signal_indices_cache = OrderedDict() or a
plain dict with manual eviction in the class init), then modify
_row_hit_signal_indices to check the per-instance cache using (row_index,
target_name) as the key, compute the result using the existing logic that calls
_load_row_bundle, _cross_channel_windows, and _row_detector_events when missing,
store the tuple in the per-instance cache (evict oldest when exceeding maxsize
256), and optionally clear the cache on instance cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e5a7aee8-f7a7-49eb-aa5a-ba3a9c9dce75
📒 Files selected for processing (10)
.gitignore, README.md, detectors/__init__.py, detectors/spike.py, detectors/trend.py, explorer.py, extractors/structural.py, mhc/constants.py, mhc_weekly/constants.py, templates/templates.json
💤 Files with no reviewable changes (1)
- extractors/structural.py
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitignore
```python
    hr_summary.append("the heartrate was elevated during this phase")
    hr_summary.append(f"peak HR {hr_peak:.0f} bpm")
else:
    hr_summary.append(f"avg HR {hr_mean:.0f} bpm")
```
The line `hr_summary.append(f"avg HR {hr_mean:.0f} bpm")` is added twice, once in each `if` branch.
I think more fluent German sentences would fit the current captions better than the current abbreviations:
"averaging a heart rate of HR bpm, peaking at HR..."
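One way to avoid the duplicated append and move toward fuller sentences, sketched with hypothetical names (the real branch structure in the synthesizer may differ):

```python
def hr_summary_parts(hr_mean: float, hr_peak: float, elevated: bool) -> list[str]:
    """Build the heart-rate summary once, so the average line is not
    appended separately in both branches of the elevated check."""
    parts = [f"averaging a heart rate of {hr_mean:.0f} bpm"]
    if elevated:
        parts.append("the heart rate was elevated during this phase")
        parts.append(f"peaking at {hr_peak:.0f} bpm")
    return parts
```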
Let's use the meta channel config for that; we defined it somewhere.
```python
parts.append(f"this means HR is {abs(hr_day_delta):.0f} bpm {direction} than the mean of the day")
channel_idxs.append(hr_idx)
```

```python
distance_mean = self._metric_mean(row, distance_idx, start, end)
```
max-rosenblattl left a comment

Thanks @KarlDeck 🚀 I think the functionality is mostly there, but a few more structural changes are necessary.
```python
    return float(np.sum(values))
```

```python
@classmethod
def _metric_day_mean_delta(cls, row: Recording, idx: int | None, start: int, end: int) -> float | None:
```
Just a question: should we exclude the window mean from the daily mean? Probably not, though.

No, I don't think so. But something I've been wanting to do for a while is to calculate the resting HR for the person, and then the RR, because that's a very informative medical measurement. This might require calculating the resting HR over multiple days to be more accurate, so we either do multi-day values or compute it for one day as an approximation. For the weekly data we can definitely do it, even for a single time series.
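A rough sketch of what a resting-HR approximation over pooled samples could look like; the low-quantile choice is an assumption for illustration, not an agreed design:

```python
import numpy as np


def resting_hr(hr_samples: np.ndarray, quantile: float = 0.05) -> float:
    """Rough resting-heart-rate estimate: a low quantile of observed HR.

    Over a single day this is only an approximation; pooling several
    days (or a weekly series) into hr_samples gives a steadier estimate.
    """
    observed = hr_samples[~np.isnan(hr_samples)]
    return float(np.quantile(observed, quantile))
```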
```python
from timef.schema import Annotation, Recording


class CrossChannelSynthesizer(abc.ABC):
```
I think all the helper methods in `__init__.py` would be better suited to a `_helper.py`.
```python
from util import seed_from_key


class StationaryActivitySynthesizer(CrossChannelSynthesizer):
```
I think we should consider having a base WorkoutSynthesizer/ActivitySynthesizer: Stationary and Cardio duplicate roughly 80% of their code, so a shared base class makes a lot of sense. It also helps with the other activities/workouts, where we would otherwise replicate a lot of code.
```python
    )
    return results
```

```python
def _metrics_suffix(
```
Currently the metrics suffix is deterministic. I think we should adopt the template-based behaviour from the other captions for greater variety. I'm not sure what the best format to encode it is, but something along these lines:
```json
"cross_channel": {
  "metrics": {
    "hr_summary": [
      "averaging a {name} of {mean:.0f} {unit}",
      "with a mean {name} of {mean:.0f} {unit}",
      "at roughly {mean:.0f} {unit} {name}"
    ],
    "hr_peak": ["peaking at {peak:.0f} {unit}", ...],
    "hr_elevated": [...],
    "hr_day_delta_higher": [...],
    "hr_day_delta_lower": [...],
    "distance_mean": [...],
    "distance_total": [...],
    "step_mean": [...],
    "step_total": [...]
  }
}
```
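Such metric templates could be picked deterministically per caption along these lines; the template table and seeding are hypothetical (the project may seed differently, e.g. via `seed_from_key`):

```python
import random

# Hypothetical template table mirroring the structure proposed above.
METRIC_TEMPLATES = {
    "hr_summary": [
        "averaging a {name} of {mean:.0f} {unit}",
        "with a mean {name} of {mean:.0f} {unit}",
    ],
    "hr_peak": ["peaking at {peak:.0f} {unit}"],
}


def render_metric(kind: str, seed: int, **values) -> str:
    """Pick one template variant deterministically from a seed and fill it,
    so captions vary across windows but stay reproducible per window."""
    rng = random.Random(seed)
    template = rng.choice(METRIC_TEMPLATES[kind])
    return template.format(**values)
```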
Also, important: currently I do see the cross_channel captions as more sophisticated semantic captions. Therefore, I would make them mutually exclusive (we don't have to handle that in code). But that requires us to have cross_channel captions for each activity. @KarlDeck, did you find anything that would speak against extending them (not in this PR, in a new one)?
Cross-channel extractor and sleep synthesizer
♻️ Current situation & Problem
This PR introduces an initial raw version of sleep-related bundle captions built on top of the current signal representation and annotation pipeline.
The goal is to capture higher-level multi-channel context, like:
- `in bed` but not `asleep`

This is intended as an exploratory first version; more cross-channel captions are to follow this example. In reviewing these specific bundle outputs, one thing that comes up is that the detector may sometimes fire during the day when someone briefly lies down. In those cases, the label can still be technically correct, because the person is in bed but not asleep. That said, this behavior could also be filtered later by adding a minimum bed-stay duration before such a caption qualifies. For this PR, the intention is to keep the behavior relatively raw rather than aggressively filtered.
⚙️ Release Notes
- `in bed but awake` style intervals

Potential follow-up refinement:

- Require a minimum `in bed` duration before generating a sleep-related bundle caption, to reduce short daytime lay-down detections
This PR is meant as a first-pass implementation of sleep bundles and focuses on getting the raw behavior into the pipeline so we can inspect outputs and iterate.
A notable design choice is that short daytime bed intervals are currently allowed to surface as captions when the underlying label is still semantically correct. We may later tighten this by introducing duration-based gating, but for now this version is intentionally permissive so we can better understand the behavior of the raw detector and captioning logic.
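The duration-based gating described above could be as simple as the following hypothetical helper (half-open `(start, end)` sample intervals assumed):

```python
def gate_by_duration(intervals: list[tuple[int, int]], min_samples: int) -> list[tuple[int, int]]:
    """Drop in-bed intervals shorter than min_samples before captioning.

    With min_samples=0 this is a no-op, matching the intentionally
    permissive behavior of the current PR.
    """
    return [(start, end) for start, end in intervals if end - start >= min_samples]
```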
✅ Testing
Testing for this PR focused primarily on implementation and output inspection of the raw caption behavior.
Important note:
Code of Conduct & Contributing Guidelines
By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines: