4 changes: 3 additions & 1 deletion components/core/src/clp_s/log_converter/LogConverter.cpp
@@ -30,7 +30,9 @@ constexpr std::string_view cTimestampSchema{
R"(((Jan(uary){0,1})|(Feb(ruary){0,1})|(Mar(ch){0,1})|(Apr(il){0,1})|(May)|(Jun(e){0,1})|)"
R"((Jul(y){0,1})|(Aug(ust){0,1})|(Sep(tember){0,1})|(Oct(ober){0,1})|(Nov(ember){0,1})|)"
R"((Dec(ember){0,1}))[ /\-]\d{2,4}))[ T:][ 0-9]{2}:[ 0-9]{2}:[ 0-9]{2}([,\.:]\d{1,9}){0,1})"
R"(([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1}))"
R"(((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|)"
R"((( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|)"
R"(((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}))"
Comment on lines +33 to +35
Contributor

🧹 Nitpick | 🔵 Trivial

Add regression tests for the exact timezone edge cases fixed here.

Please add automated coverage for at least: no timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30" so this parser/serializer boundary doesn’t regress again.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/log_converter/LogConverter.cpp` around lines 33 -
35, add automated regression tests that exercise the LogConverter
parsing/serialization boundary for the exact timezone edge cases mentioned: no
timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30". Create unit
tests that call the LogConverter::parseTimestamp (or the equivalent parsing
function) with inputs for each case and then call
LogConverter::serializeTimestamp (or the corresponding serializer) on the parsed
result, asserting round-trip equality (or normalized canonical form) for each
case so future changes to the regex in LogConverter.cpp are covered.
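The suffix cases the reviewer lists can also be smoke-tested outside the C++ build. A minimal sketch, assuming a hand-translated Python rendering of the timezone-suffix alternation; the real `cTimestampSchema` uses log-surgeon syntax, so this regex is an approximation for illustration, not the project code:

```python
import re

# Hypothetical Python approximation of the timezone-suffix alternation added
# in this diff; treat it as a translation, not the shipped schema.
TZ_SUFFIX = re.compile(
    r"(?:"
    r"(?:(?: UTC)?(?:[+\-]\d{2}(?::?\d{2})?)?Z?)"  # " UTC+05:30", " UTC"
    r"|(?:(?: [+\-]\d{2}(?::?\d{2})?)?Z?)"         # " +05:30", " +00Z"
    r"|(?: Z)?"                                    # " Z"
    r"|(?:(?:UTC)?(?:[+\-]\d{2}(?::?\d{2})?)?Z?)"  # "UTC+05:30", "+00Z"
    r")?$"
)

# Edge cases called out in the review, plus a couple from the fixture below.
for suffix in ["", " Z", " UTC", " +05:30", "UTC+05:30", "+00", " +00Z"]:
    assert TZ_SUFFIX.match(suffix) is not None, f"{suffix!r} should match"
```

Each string in the loop corresponds to one of the requested regression cases, so a parametrized pytest over the same list would cover the boundary directly.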

};

constexpr std::string_view cDelimiters{R"(delimiters: \t\r\n[(:)"};
37 changes: 37 additions & 0 deletions integration-tests/tests/fixtures/integration_test_logs.py
@@ -1,6 +1,7 @@
"""Session-scoped test log fixtures shared across integration tests."""

import logging
import pathlib
import subprocess

import pytest
@@ -45,6 +46,42 @@ def postgresql(
)


@pytest.fixture(scope="session")
def simple_unstructured(
request: pytest.FixtureRequest,
integration_test_path_config: IntegrationTestPathConfig,
) -> IntegrationTestLogs:
"""Provides a simple unstructured test log."""
name = "simple_unstructured"
integration_test_logs = IntegrationTestLogs(
name=name,
tarball_url=f"{name}.tar.gz",
integration_test_path_config=integration_test_path_config,
num_log_events=11,
)
remove_path(integration_test_logs.extraction_dir)
integration_test_logs.extraction_dir.mkdir(parents=True, exist_ok=False)

with pathlib.Path.open(integration_test_logs.extraction_dir / f"{name}.log", "w") as f:
f.write(
"2015-03-23 05:48:30,122 TEST1\n"
"2015-03-23 05:48:30,122Z TEST2\n"
"2015-03-23 05:48:30,122 Z TEST3\n"
"2015-03-23 05:48:30,122+00 TEST4\n"
"2015-03-23 05:48:30,122+00Z TEST5\n"
"2015-03-23 05:48:30,122 +00 TEST6\n"
"2015-03-23 05:48:30,122 +00Z TEST7\n"
"2015-03-23 05:48:30,122UTC+00 TEST8\n"
"2015-03-23 05:48:30,122UTC+00Z TEST9\n"
"2015-03-23 05:48:30,122 UTC+00 TEST10\n"
"2015-03-23 05:48:30,122 UTC+00Z TEST11\n"
)

logger.info("Set up logs for dataset `%s`.", name)
request.config.cache.set(name, True)
return integration_test_logs


def _download_and_extract_gzip_dataset(
request: pytest.FixtureRequest,
integration_test_path_config: IntegrationTestPathConfig,
81 changes: 81 additions & 0 deletions integration-tests/tests/test_log_converter.py
@@ -0,0 +1,81 @@
"""
Integration tests verifying that CLP core compression binaries perform lossless round-trip
compression and decompression.
"""
Comment on lines +1 to +4
Contributor

⚠️ Potential issue | 🟡 Minor

Module docstring overstates what this test verifies.

The test does not validate decompression; it validates log conversion, clp-s compression, and searchable event count.

Proposed fix
-Integration tests verifying that CLP core compression binaries perform lossless round-trip
-compression and decompression.
+Integration tests verifying log-converter output can be compressed by clp-s and searched
+with expected event counts.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""
Integration tests verifying that CLP core compression binaries perform lossless round-trip
compression and decompression.
"""
"""
Integration tests verifying log-converter output can be compressed by clp-s and searched
with expected event counts.
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/test_log_converter.py` around lines 1 - 4, The module
docstring currently claims the tests verify lossless compression and
decompression which is inaccurate; update the module-level docstring in
test_log_converter.py to state that the integration tests verify CLP core log
conversion, clp-s compression, and searchable event counts (not decompression),
so the description matches the actual assertions and scope of the tests.


import pytest

from tests.utils.config import (
ClpCorePathConfig,
ConversionTestPathConfig,
IntegrationTestLogs,
IntegrationTestPathConfig,
)
from tests.utils.subprocess_utils import run_and_log_subprocess

pytestmark = pytest.mark.core

text_datasets = pytest.mark.parametrize(
"test_logs_fixture",
[
"simple_unstructured",
],
)


@pytest.mark.clp_s
@text_datasets
def test_log_converter_transform(
request: pytest.FixtureRequest,
clp_core_path_config: ClpCorePathConfig,
integration_test_path_config: IntegrationTestPathConfig,
test_logs_fixture: str,
) -> None:
"""
Validate that converted logs from the core binary `log-converter` can be ingested successfully
by `clp-s`.

:param request:
:param clp_core_path_config:
:param integration_test_path_config:
:param test_logs_fixture:
"""
integration_test_logs: IntegrationTestLogs = request.getfixturevalue(test_logs_fixture)
test_logs_name = integration_test_logs.name

test_paths = ConversionTestPathConfig(
test_name=f"clp-s-{test_logs_name}",
logs_source_dir=integration_test_logs.extraction_dir,
num_log_events=integration_test_logs.num_log_events,
integration_test_path_config=integration_test_path_config,
)
_convert_and_compress(clp_core_path_config, test_paths)

test_paths.clear_test_outputs()

Comment on lines +52 to +55
Contributor

🧹 Nitpick | 🔵 Trivial

Ensure cleanup always runs on failure.

If _convert_and_compress raises, post-test cleanup is skipped. Wrap execution in try/finally to avoid leaking artefacts across runs.

Proposed fix
-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/test_log_converter.py` around lines 52 - 55, The test
currently calls _convert_and_compress(clp_core_path_config, test_paths) then
test_paths.clear_test_outputs(), but if _convert_and_compress raises the cleanup
is skipped; wrap the call to _convert_and_compress in a try/finally so
test_paths.clear_test_outputs() is invoked in the finally block regardless of
errors. Locate the invocation of _convert_and_compress and wrap it in a
try/finally block that calls test_paths.clear_test_outputs() in the finally
clause, so cleanup is guaranteed.
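The guarantee the reviewer wants is plain try/finally semantics. A stand-alone sketch with stand-in names (not the real test helpers) showing that cleanup runs even when the body raises:

```python
# Stand-ins for the real helpers; only the control flow matters here.
cleanup_ran = False

def clear_test_outputs() -> None:
    global cleanup_ran
    cleanup_ran = True

def run_conversion_test(should_fail: bool) -> None:
    try:
        if should_fail:
            raise RuntimeError("conversion failed")
    finally:
        # Runs whether or not the body raised.
        clear_test_outputs()

try:
    run_conversion_test(should_fail=True)
except RuntimeError:
    pass

assert cleanup_ran  # cleanup ran despite the failure
```

An alternative with the same effect would be a pytest yield fixture whose teardown calls `clear_test_outputs()`, which keeps the cleanup out of the test body entirely.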


def _convert_and_compress(
clp_core_path_config: ClpCorePathConfig,
test_paths: ConversionTestPathConfig,
) -> None:
test_paths.clear_test_outputs()
log_converter_bin_path = str(clp_core_path_config.log_converter_binary_path)
clp_s_bin_path = str(clp_core_path_config.clp_s_binary_path)
src_path = str(test_paths.logs_source_dir)
conversion_path = str(test_paths.conversion_dir)
compression_path = str(test_paths.compression_dir)
run_and_log_subprocess([log_converter_bin_path, src_path, "--output-dir", conversion_path])
run_and_log_subprocess(
[clp_s_bin_path, "c", compression_path, conversion_path, "--timestamp-key", "timestamp"]
)

if test_paths.num_log_events is None:
return

output = run_and_log_subprocess([clp_s_bin_path, "s", compression_path, "timestamp > 0"])
num_events = 0 if output.stdout is None else len(output.stdout.splitlines())
if num_events != test_paths.num_log_events:
pytest.fail(
f"Expected {test_paths.num_log_events} log events after conversion, "
f"but found {num_events}."
)
42 changes: 42 additions & 0 deletions integration-tests/tests/utils/config.py
@@ -61,6 +61,11 @@ def clp_s_binary_path(self) -> Path:
""":return: The absolute path to the core binary `clp-s`."""
return self.clp_core_bins_dir / "clp-s"

@property
def log_converter_binary_path(self) -> Path:
""":return: The absolute path to the core binary `log-converter`."""
return self.clp_core_bins_dir / "log-converter"


@dataclass(frozen=True)
class PackagePathConfig:
@@ -310,6 +315,8 @@ class IntegrationTestLogs:
tarball_path: Path = field(init=False, repr=True)
#:
extraction_dir: Path = field(init=False, repr=True)
#: Optional number of log events in the downloaded logs.
num_log_events: int | None = None
Comment on lines +318 to +319
Contributor

⚠️ Potential issue | 🟡 Minor

Validate num_log_events bounds at construction.

num_log_events currently accepts negative values, which can produce invalid expectations in event-count assertions downstream.

Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."
         raise ValueError(err_msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 318 - 319, The field
num_log_events currently allows negative values; add a validation on
construction in the class that declares num_log_events to ensure it is either
None or a non-negative integer. If this is a dataclass, implement a
__post_init__ that raises ValueError when self.num_log_events is not None and
self.num_log_events < 0; if it’s a pydantic model add a `@validator` for
"num_log_events" performing the same check. Update any callers/tests expecting
errors accordingly.


def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
"""Initialize and set tarball and extraction paths for integration test logs."""
@@ -358,3 +365,38 @@ def clear_test_outputs(self) -> None:
"""Remove any existing output directories created by this compression test."""
remove_path(self.compression_dir)
remove_path(self.decompression_dir)


@dataclass(frozen=True)
class ConversionTestPathConfig:
"""Per-test path configuration for conversion workflow artifacts."""

#:
test_name: str
#: Directory containing the original (uncompressed) log files used by this test.
logs_source_dir: Path
integration_test_path_config: InitVar[IntegrationTestPathConfig]
#: Path to store converted kv-ir files generated by the test.
conversion_dir: Path = field(init=False, repr=True)
#: Path to store compressed archives generated by the test.
compression_dir: Path = field(init=False, repr=True)
#: Optional number of log events in the converted logs.
num_log_events: int | None = None

def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
"""Initialize and set required directory paths for conversion tests."""
test_name = self.test_name.strip()
if 0 == len(test_name):
err_msg = "`test_name` cannot be empty."
raise ValueError(err_msg)
test_root_dir = integration_test_path_config.test_root_dir
validate_dir_exists(test_root_dir)

object.__setattr__(self, "test_name", test_name)
object.__setattr__(self, "conversion_dir", test_root_dir / f"{test_name}-converted")
object.__setattr__(self, "compression_dir", test_root_dir / f"{test_name}-archives")

Comment on lines +386 to +398
Contributor

⚠️ Potential issue | 🟡 Minor

Add existence validation for logs_source_dir.

ConversionTestPathConfig stores logs_source_dir but does not validate it. Failing fast here gives clearer diagnostics than letting subprocess calls fail later.

Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 386 - 398, the
__post_init__ of ConversionTestPathConfig does not validate the provided
logs_source_dir, which leads to later subprocess failures; update __post_init__
to call validate_dir_exists on self.logs_source_dir (as is already done for
test_root_dir) so the logs directory's existence is checked early.
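Failing fast on a missing source directory can be sketched with a helper shaped like the project's `validate_dir_exists`; the helper body here is assumed for illustration, not copied from the repo:

```python
import tempfile
from pathlib import Path

def validate_dir_exists(path: Path) -> None:
    # Assumed behaviour of the project's helper: raise with a clear message
    # instead of letting a later subprocess fail opaquely.
    if not path.is_dir():
        raise ValueError(f"Directory does not exist: {path}")

with tempfile.TemporaryDirectory() as tmp:
    validate_dir_exists(Path(tmp))  # exists: passes silently

try:
    validate_dir_exists(Path(tmp) / "missing")  # never created: fails fast
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

Calling such a check in `__post_init__` surfaces a bad `logs_source_dir` at fixture construction time, with a message naming the path, rather than deep inside `run_and_log_subprocess`.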

def clear_test_outputs(self) -> None:
"""Remove any existing output directories created by this conversion test."""
remove_path(self.conversion_dir)
remove_path(self.compression_dir)