📝 Walkthrough

This PR centralizes label handling and prompt generation by exposing a public label-decoding API and introducing a reusable prompt-building helper function. The main training script is simplified by removing internal evaluation logic, delegating that responsibility to external evaluation. Dataset modules are updated to use the new public APIs instead of duplicating prompt-generation logic.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Force-pushed from edaeb06 to bb31baf.
Actionable comments posted: 1
♻️ Duplicate comments (1)
time_series_datasets/mhc_label_lookup.py (1)
Lines 165-175: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't instruct the model to emit the literal `<label>` token.

The shared rule currently says `ONLY <label>` verbatim, so every metabolic train/eval prompt now includes an output token that is not one of the listed valid answers. That makes the instruction self-contradictory and can steer generations toward the placeholder instead of `Male`/`Female`/etc.

Proposed fix:
```diff
 return (
     f"{cfg['question']}\n\n"
     f"Possible answers: {values_str}\n\n"
     "Rules:\n"
     "- Base your answer on the sensor patterns above.\n"
     "- You MUST give a classification even if the signal is unclear — "
     "state limitations but still make your best guess.\n"
     "- Never respond with a question.\n"
-    "- You MUST respond with ONLY <label>, exactly as written in the listed possible answers."
+    "- You MUST respond with ONLY one of the listed possible answers, exactly as written."
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@time_series_datasets/mhc_label_lookup.py` around lines 165-175: the prompt construction in the return block (using values_str and cfg) incorrectly instructs the model to output the literal token "<label>", which conflicts with the listed answers; change the final rule to require the model to respond with only one of the listed possible answers exactly as written (e.g., "You MUST respond with only one of the possible answers listed above, exactly as written."), referencing cfg["values"]/values_str and the return string so the sentence replaces the literal "<label>" placeholder with the correct instruction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/metabolic_finetune.py`:
- Around lines 11-14: The training targets still include the "Answer: " prefix
which mismatches the shared build_metabolic_post_prompt framing and the plain
str(value) outputs from time_series_datasets/mhc_multi_label_qa_dataset.py;
remove the hard-coded "Answer: " prefix where targets are constructed (e.g.,
replace occurrences like target = f"Answer: {label}" or post_prompt + "Answer: "
usage with just the bare label/str(value)), ensure the training loss uses the
bare label token sequence, and update the module docstring at the top (formerly
Lines 11-14) to describe the new bare-label target format; verify references to
build_metabolic_post_prompt and any variables named post_prompt/target in the
training loop (also the analogous code around Lines 142-149) are changed
consistently.
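
A minimal before/after sketch of the requested change (the variable names and construction site are illustrative, not taken from scripts/metabolic_finetune.py):

```python
# Before: the supervised target duplicates the "Answer:" cue that the shared
# post-prompt from build_metabolic_post_prompt(...) already provides.
target = f"Answer: {label}"

# After (as requested above): train on the bare canonical label so the loss
# covers only the answer tokens themselves.
target = str(label)
```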
---
Duplicate comments:
In `@time_series_datasets/mhc_label_lookup.py`:
- Around lines 165-175: The prompt construction in the return block (using values_str and cfg) incorrectly instructs the model to output the literal token "<label>", which conflicts with the listed answers; change the final rule to require the model to respond with only one of the listed possible answers exactly as written (e.g., "You MUST respond with only one of the possible answers listed above, exactly as written."), referencing cfg["values"]/values_str and the return string so the sentence replaces the literal "<label>" placeholder with the correct instruction.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ebe68d5d-e983-4651-9b4a-15db19c68ec3
📒 Files selected for processing (5)

- scripts/metabolic_finetune.py
- time_series_datasets/mhc_base_qa_dataset.py
- time_series_datasets/mhc_label_lookup.py
- time_series_datasets/mhc_metabolic_qa_dataset.py
- time_series_datasets/mhc_multi_label_qa_dataset.py
Switch metabolic training targets from ints to canonical strings
♻️ Current situation & Problem
Depends on / sits on top of the eval pipeline PR (#55, feature/evaluation-pipeline). Follow-up to PR #50, which introduced the curriculum-learning training stack with int-encoded answer targets ("0", "1", "2", …).
Two related issues motivating this PR:
1. OpenTSLM convention divergence — the upstream OpenTSLM reference datasets (HARAcc, Sleep, ECG) all train on canonical class strings ("biking", "Wake", "yes", …). Our metabolic_finetune.py and mhc_multi_label_qa_dataset.py deviated by using str(int(value)) as the training target.
2. Train ↔ eval prompt drift — eval (MHCMetabolicQADataset from PR #55) frames the task as "Is this person biologically male or female? … Answer: ", while training framed it as the terse "Based on the sensor data, predict the value of BiologicalSex (binary).".
⚙️ Release Notes
Training targets switch from int codes ("0", "1", …) to canonical label strings:

- Binary: "Male" / "Female" / "True" / "False"
- Ordinal: "Underweight" / "Normal weight" / "Overweight" / …
- Continuous: numeric string (unchanged)

Targets are wrapped as "Answer: {label}" so the loss target matches the prompt's Answer: instruction.
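
A minimal sketch of the int-to-string mapping this implies (the config schema, field names, and the decode_label helper here are illustrative assumptions, not the repo's actual API):

```python
# Stand-in config for illustration; the real METABOLIC_LABEL_CONFIG lives in
# time_series_datasets/mhc_label_lookup.py and its schema may differ.
METABOLIC_LABEL_CONFIG = {
    "BiologicalSex": {"kind": "binary", "values": ["Male", "Female"]},
}

def decode_label(label: str, value) -> str:
    """Map a raw dataset value to its canonical answer string (sketch)."""
    cfg = METABOLIC_LABEL_CONFIG[label]
    if cfg["kind"] in ("binary", "ordinal"):
        return cfg["values"][int(value)]  # int code indexes the class list
    return str(value)  # continuous targets stay numeric strings

assert decode_label("BiologicalSex", 0) == "Male"  # was "0" before this PR
```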
New shared helper build_metabolic_post_prompt(label) in time_series_datasets/mhc_label_lookup.py. Both MHCMetabolicQADataset (eval) and MHCMultiLabelQADataset (training) call it — train and eval now emit byte-identical post_prompts. It lives next to the METABOLIC_LABEL_CONFIG it formats.
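
Judging from the diff quoted in the review above, the helper plausibly has this shape (a sketch: the cfg lookup and the values_str join are assumptions; only the return block comes from the diff):

```python
def build_metabolic_post_prompt(label: str) -> str:
    """Shared post-prompt for one metabolic label (sketch, not repo code)."""
    cfg = METABOLIC_LABEL_CONFIG[label]    # assumed: question + values per label
    values_str = ", ".join(cfg["values"])  # assumed separator
    return (
        f"{cfg['question']}\n\n"
        f"Possible answers: {values_str}\n\n"
        "Rules:\n"
        # ... the behaviour rules quoted in the review diff above ...
        "- You MUST respond with ONLY one of the listed possible answers, "
        "exactly as written."
    )
```

Because both dataset classes call this single function, any future prompt tweak automatically stays in sync between training and eval.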
Removed the in-script evaluation path (_logits_over_candidates, _answer_prefix_embed), which was incompatible with multi-token canonical strings. Downstream metrics now flow through evaluation/run_eval.py (PR #55) instead. The --eval_only flag is removed; net −189 LOC.
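
For context on why that path had to go: "Normal weight" tokenizes into several tokens, so ranking candidates by a single next-token logit no longer works; candidates must be scored as whole sequences. A generic illustration of such scoring (not the actual evaluation/run_eval.py code):

```python
import math
from typing import Callable, List

def score_candidate(
    prompt_ids: List[int],
    candidate_ids: List[int],
    next_token_logprob: Callable[[List[int], int], float],
) -> float:
    """Sequence log-likelihood of a (possibly multi-token) candidate answer.

    score = sum_i log P(candidate_ids[i] | prompt_ids + candidate_ids[:i]);
    the old single next-token logit comparison is the one-token special case.
    """
    context = list(prompt_ids)
    total = 0.0
    for tok in candidate_ids:
        total += next_token_logprob(context, tok)
        context.append(tok)
    return total

# Toy check with a uniform stub "model": a two-token answer accumulates
# two log-probabilities.
uniform = lambda ctx, tok: math.log(1.0 / 50_000)
assert score_candidate([1, 2], [7, 8], uniform) == 2 * math.log(1.0 / 50_000)
```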
Example invocation (joint caption + metabolic stage, starting from a captioning checkpoint):

```bash
python curriculum_learning.py \
  --model OpenTSLMSP \
  --stages stage_joint_caption_metabolic \
  --batch_size 2 \
  --gradient_checkpointing
```