fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187). #2192
…wed by at least some actual timezone content.
Walkthrough
Expanded timestamp regex handling in LogConverter; added integration-test fixtures and a test that runs
Changes
Sequence Diagram(s)
sequenceDiagram
participant Test as Test Harness
participant LC as "log-converter (binary)"
participant FS as File System (conversion_dir)
participant CLP as "clp-s (binary)"
participant FS2 as File System (compression_dir)
Test->>LC: invoke with source log file(s)
LC-->>FS: write converted logs (conversion_dir)
Test->>CLP: invoke 'clp-s c' on conversion_dir (--timestamp-key timestamp)
CLP-->>FS2: write compressed output (compression_dir)
alt num_log_events provided
Test->>CLP: invoke 'clp-s s' to read compressed output (filter timestamp > 0)
CLP-->>Test: stdout lines (events)
Test->>Test: count lines and assert equals expected
end
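The flow in the diagram can be sketched as a small Python driver. This is only an illustration of the ordering and of the optional `alt` branch: `run_conversion_flow` and its three callables are hypothetical stand-ins for the `log-converter`, `clp-s c` and `clp-s s` invocations, and no real command-line arguments are modelled here.

```python
from typing import Callable, Optional


def run_conversion_flow(
    convert: Callable[[], None],
    compress: Callable[[], None],
    search: Callable[[], str],
    num_log_events: Optional[int] = None,
) -> None:
    """Run convert, then compress, then optionally search and check the event count."""
    convert()  # stand-in for invoking log-converter on the source logs
    compress()  # stand-in for `clp-s c` on conversion_dir
    if num_log_events is None:
        return  # the diagram's `alt` branch is skipped
    stdout = search()  # stand-in for `clp-s s` with a `timestamp > 0` filter
    num_events = len(stdout.splitlines())  # empty stdout counts as zero events
    if num_events != num_log_events:
        err_msg = f"Expected {num_log_events} events, found {num_events}."
        raise AssertionError(err_msg)
```

Passing the steps in as callables keeps the sketch independent of how the binaries are actually launched in the test harness.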
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Ruff (0.15.10)
integration-tests/tests/utils/config.py: ruff failed
integration-tests/tests/test_log_converter.py: ruff failed
Actionable comments posted: 1
📒 Files selected for processing (1)
components/core/src/clp_s/log_converter/LogConverter.cpp
R"(((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|)"
R"((( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|)"
R"(((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}))"
🧹 Nitpick | 🔵 Trivial
Add regression tests for the exact timezone edge cases fixed here.
Please add automated coverage for at least: no timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30" so this parser/serializer boundary doesn’t regress again.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/log_converter/LogConverter.cpp` around lines 33 -
35, Add automated regression tests that exercise the LogConverter
parsing/serialization boundary for the exact timezone edge cases mentioned: no
timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30". Create unit
tests that call the LogConverter::parseTimestamp (or the equivalent parsing
function) with inputs for each case and then call
LogConverter::serializeTimestamp (or the corresponding serializer) on the parsed
result, asserting round-trip equality (or normalized canonical form) for each
case so future changes to the regex in LogConverter.cpp are covered.
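A quick way to sanity-check the six cases listed in this comment is to run them through the three regex pieces shown in the diff. The sketch below does that with Python's `re` module rather than log-surgeon, so it assumes both engines treat this subset of syntax the same way. `TIMEZONE_FRAGMENT` and `accepts` are names invented for the example, and the final `)` of the last piece is dropped because it closes a group opened earlier in the full pattern, which is not shown here.

```python
import re

# Transcribed from the three raw-string pieces in LogConverter.cpp, minus the
# trailing ")" that belongs to a group opened outside this fragment.
TIMEZONE_FRAGMENT = (
    r"((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|"
    r"(( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|"
    r"((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}"
)


def accepts(suffix: str) -> bool:
    """Return True if the whole suffix matches the timezone fragment."""
    return re.fullmatch(TIMEZONE_FRAGMENT, suffix) is not None


# Edge cases from the review comment, with the outcome under Python's `re`.
EDGE_CASES = {
    "": True,  # no timezone
    " ": False,  # lone space must not count as a timezone
    " Z": True,
    " UTC": True,
    " +05:30": True,
    "UTC+05:30": True,
}

for suffix, expected in EDGE_CASES.items():
    assert accepts(suffix) == expected, repr(suffix)
```

A check like this only covers the fragment in isolation. It does not replace a regression test that goes through `log-converter` itself.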
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-55: The test currently calls
_convert_and_compress(clp_core_path_config, test_paths) then
test_paths.clear_test_outputs(), but if _convert_and_compress raises the cleanup
is skipped; wrap the call to _convert_and_compress in a try/finally so
test_paths.clear_test_outputs() is invoked in the finally block regardless of
errors. Locate the invocation of _convert_and_compress and replace the
sequential calls with a try { _convert_and_compress(...) } finally {
test_paths.clear_test_outputs() } pattern to guarantee cleanup.
- Around line 75-77: The event counting treats an empty stdout as one event
because len(output.stdout.strip().split("\n")) returns 1 for "", so update the
logic around run_and_log_subprocess/output.stdout to explicitly check for empty
or whitespace-only stdout: set num_events = 0 if output.stdout is None or
output.stdout.strip() == "" else len(output.stdout.strip().split("\n")) (or use
output.stdout.splitlines() after confirming it's non-empty); modify the code
that computes num_events (referencing variables output, num_events and
test_paths.num_log_events) so empty output yields 0.
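The empty-stdout pitfall in the second item comes from how `str.split` treats an empty string: splitting `""` on `"\n"` yields `[""]`, a list of length one. A minimal sketch of the guarded count follows; `count_events` is a hypothetical helper name, not a function from the test module.

```python
from typing import Optional


def count_events(stdout: Optional[str]) -> int:
    """Count output lines, treating None or whitespace-only output as zero events."""
    if stdout is None or stdout.strip() == "":
        return 0
    return len(stdout.strip().split("\n"))


# The unguarded expression reports one event for empty output.
assert len("".strip().split("\n")) == 1
# The guarded helper reports zero.
assert count_events("") == 0
assert count_events("event 1\nevent 2\n") == 2
```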
In `@integration-tests/tests/utils/config.py`:
- Around line 399-400: The docstring for
ConversionTestPathConfig.clear_test_outputs is misleadingly referring to a
"compression test"; update the docstring to reference the conversion workflow
instead—e.g., change "Remove any existing output directories created by this
compression test." to something like "Remove any existing output directories
created by this conversion workflow." Ensure you update the docstring text in
the ConversionTestPathConfig.clear_test_outputs method so it accurately reflects
the class purpose.
- Around line 386-398: The __post_init__ of ConversionTestPathConfig fails to
validate the provided logs_source_dir which leads to later subprocess failures;
update __post_init__ to retrieve integration_test_path_config.logs_source_dir,
call validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.
- Around line 318-319: The field num_log_events currently allows negative
values; add a validation on construction in the class that declares
num_log_events to ensure it is either None or a non-negative integer. If this is
a dataclass, implement a __post_init__ that raises ValueError when
self.num_log_events is not None and self.num_log_events < 0; if it’s a pydantic
model add a `@validator` for "num_log_events" performing the same check. Update
any callers/tests expecting errors accordingly.
📒 Files selected for processing (3)
integration-tests/tests/fixtures/integration_test_logs.py
integration-tests/tests/test_log_converter.py
integration-tests/tests/utils/config.py
_convert_and_compress(clp_core_path_config, test_paths)

test_paths.clear_test_outputs()
🧹 Nitpick | 🔵 Trivial
Ensure cleanup always runs on failure.
If _convert_and_compress raises, post-test cleanup is skipped. Wrap execution in try/finally to avoid leaking artefacts across runs.
Proposed fix
-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration-tests/tests/test_log_converter.py` around lines 52 - 55, The test
currently calls _convert_and_compress(clp_core_path_config, test_paths) then
test_paths.clear_test_outputs(), but if _convert_and_compress raises the cleanup
is skipped; wrap the call to _convert_and_compress in a try/finally so
test_paths.clear_test_outputs() is invoked in the finally block regardless of
errors. Locate the invocation of _convert_and_compress and replace the
sequential calls with a try { _convert_and_compress(...) } finally {
test_paths.clear_test_outputs() } pattern to guarantee cleanup.
#: Optional number of log events in the downloaded logs.
num_log_events: int | None = None
Validate num_log_events bounds at construction.
num_log_events currently accepts negative values, which can produce invalid expectations in event-count assertions downstream.
Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."
         raise ValueError(err_msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration-tests/tests/utils/config.py` around lines 318 - 319, The field
num_log_events currently allows negative values; add a validation on
construction in the class that declares num_log_events to ensure it is either
None or a non-negative integer. If this is a dataclass, implement a
__post_init__ that raises ValueError when self.num_log_events is not None and
self.num_log_events < 0; if it’s a pydantic model add a `@validator` for
"num_log_events" performing the same check. Update any callers/tests expecting
errors accordingly.
def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
    """Initialize and set required directory paths for conversion tests."""
    test_name = self.test_name.strip()
    if 0 == len(test_name):
        err_msg = "`test_name` cannot be empty."
        raise ValueError(err_msg)
    test_root_dir = integration_test_path_config.test_root_dir
    validate_dir_exists(test_root_dir)

    object.__setattr__(self, "test_name", test_name)
    object.__setattr__(self, "conversion_dir", test_root_dir / f"{test_name}-converted")
    object.__setattr__(self, "compression_dir", test_root_dir / f"{test_name}-archives")
Add existence validation for logs_source_dir.
ConversionTestPathConfig stores logs_source_dir but does not validate it. Failing fast here gives clearer diagnostics than letting subprocess calls fail later.
Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration-tests/tests/utils/config.py` around lines 386 - 398, The
__post_init__ of ConversionTestPathConfig fails to validate the provided
logs_source_dir which leads to later subprocess failures; update __post_init__
to retrieve integration_test_path_config.logs_source_dir, call
validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.
Actionable comments posted: 1
♻️ Duplicate comments (3)
integration-tests/tests/utils/config.py (2)
386-394: ⚠️ Potential issue | 🟡 Minor
Fail fast by validating `logs_source_dir` existence in `ConversionTestPathConfig`.
Without this check, failures are deferred to subprocess calls with less actionable errors.
Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/utils/config.py` around lines 386 - 394, In ConversionTestPathConfig.__post_init__ (the method currently validating test_name and test_root_dir), also validate that self.logs_source_dir (or the logs_source_dir obtained from the provided integration_test_path_config) exists by calling validate_dir_exists(logs_source_dir) and raise a clear ValueError if missing; add the check immediately after validate_dir_exists(test_root_dir) so failures fail fast with an actionable error message referencing logs_source_dir.
318-323: ⚠️ Potential issue | 🟡 Minor
Validate `num_log_events` as non-negative at construction.
`num_log_events` currently accepts negative values, which can make downstream event-count checks invalid.
Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/utils/config.py` around lines 318 - 323, The num_log_events attribute currently allows negative values; add a validation in the __post_init__ method to ensure that if self.num_log_events is not None it is >= 0, otherwise raise a ValueError with a clear message referencing num_log_events; update the IntegrationTestPathConfig initialization flow (within __post_init__) to perform this check so invalid negatives are rejected at construction.
integration-tests/tests/test_log_converter.py (1)
52-55: ⚠️ Potential issue | 🟡 Minor
Ensure cleanup always runs, even when conversion/compression fails.
If `_convert_and_compress` raises, post-test cleanup is skipped.
Proposed fix
-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/test_log_converter.py` around lines 52 - 55, The test currently calls _convert_and_compress(clp_core_path_config, test_paths) directly so if it raises the subsequent test_paths.clear_test_outputs() is skipped; wrap the conversion/compression call in a try/finally (or use a pytest fixture finalizer) so test_paths.clear_test_outputs() always runs in the finally block; reference the call to _convert_and_compress and the cleanup method test_paths.clear_test_outputs() when applying the change so cleanup is guaranteed even on exceptions.
📒 Files selected for processing (2)
integration-tests/tests/test_log_converter.py
integration-tests/tests/utils/config.py
"""
Integration tests verifying that CLP core compression binaries perform lossless round-trip
compression and decompression.
"""
Module docstring overstates what this test verifies.
The test does not validate decompression; it validates log conversion, clp-s compression, and searchable event count.
Proposed fix
-Integration tests verifying that CLP core compression binaries perform lossless round-trip
-compression and decompression.
+Integration tests verifying log-converter output can be compressed by clp-s and searched
+with expected event counts.
📝 Committable suggestion
"""
Integration tests verifying log-converter output can be compressed by clp-s and searched
with expected event counts.
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration-tests/tests/test_log_converter.py` around lines 1 - 4, The module
docstring currently claims the tests verify lossless compression and
decompression which is inaccurate; update the module-level docstring in
test_log_converter.py to state that the integration tests verify CLP core log
conversion, clp-s compression, and searchable event counts (not decompression),
so the description matches the actual assertions and scope of the tests.
Description
This PR fixes an issue with the timezone regex used by `log-converter` where " " on its own was accepted as a valid timezone. This leads to `log-converter` parsing timestamps without a timezone as the timestamp plus a trailing space. Since the new timestamp parser in clp-s is strict (in order to provide lossless compression), clp-s ends up rejecting the timestamp, because the trailing space prevents it from matching any known timestamp pattern.

Medium-term, once log-surgeon has stabilized a bit more, we can solve this problem by writing some utilities that convert clp-s timestamp patterns into log-surgeon regex. That way, we eliminate the possibility of `log-converter` parsing timestamps in a way that clp-s cannot interpret.

For now, the fix is to update the timezone regex in `log-converter` to only accept a space as part of the timezone if it is followed by at least some timezone content.

This PR also adds an integration test to give us some basic assurance that the log conversion flow is working as expected.
Checklist
breaking change.
Validation performed
`log-converter` could convert, and `clp-s` could subsequently compress and search the following unstructured log file, and added this test as a new integration test:
Summary by CodeRabbit
Bug Fixes
Tests
Chores