feat(clp-s::log_converter): Add max-log-event-size argument; Make default max log event size 512 MiB to match clp-s (fixes #2176). #2193

Merged
gibber9809 merged 4 commits into y-scope:main from gibber9809:fix-2176
Apr 14, 2026

Conversation

@gibber9809
Contributor

@gibber9809 gibber9809 commented Apr 9, 2026

…MiB to match clp-s.

Description

This PR brings log-converter in line with clp-s for the maximum log event size. We add a command-line option to log-converter to make the limit configurable, and we set the default to 512 MiB to match clp-s.

It may be worth adding a per-compression-job config option to allow users to raise this limit for datasets they know will have large log lines, but that should be part of a separate PR.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Generated a one-line and a two-line log file, each containing log records larger than 64 MiB, and observed that log-converter now accepts these records by default.
  • Observed that setting --max-log-event-size to a value lower than 64 MiB causes conversion of both generated files to fail, as expected.
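
The validation above could be reproduced with a small generator along the following lines. This is a minimal sketch, not the script used for the PR: `write_large_record_log` is a hypothetical helper, the filler content is arbitrary, and whether each line is treated as a single log event depends on log-converter's parsing, which is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of CLP): writes `num_lines` lines to `path`, each made of
// `record_size` filler bytes followed by a newline.
inline void write_large_record_log(
        std::filesystem::path const& path,
        std::size_t record_size,
        std::size_t num_lines
) {
    std::ofstream out{path, std::ios::binary | std::ios::trunc};
    if (false == out.is_open()) {
        throw std::runtime_error{"Failed to open " + path.string()};
    }

    // Write in 1 MiB chunks so the whole record never has to be held in memory at once.
    std::string const chunk(std::size_t{1} << 20, 'x');
    for (std::size_t line{0}; line < num_lines; ++line) {
        std::size_t remaining{record_size};
        while (remaining > 0) {
            auto const num_bytes{std::min(remaining, chunk.size())};
            out.write(chunk.data(), static_cast<std::streamsize>(num_bytes));
            remaining -= num_bytes;
        }
        out.put('\n');
    }

    if (false == out.good()) {
        throw std::runtime_error{"Failed to write " + path.string()};
    }
}
```

For example, `write_large_record_log("two_lines.log", std::size_t{64} * 1024 * 1024 + 1, 2)` would produce a two-line file whose lines each exceed 64 MiB by one byte.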

Summary by CodeRabbit

  • New Features

    • Added a --max-log-event-size option (default 512 MiB) to configure the maximum size of individual log events; non‑positive values are rejected.
  • Behavior Change

    • Log conversion and buffer allocation now respect the configured maximum event size, limiting growth and allocation accordingly.

@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

Walkthrough

Adds a configurable maximum log event/buffer size: new CLI option --max-log-event-size, validation in argument parsing, CommandLineArguments getter and state, and refactors LogConverter to accept and use the configured max size instead of a fixed constant.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Command-line argument declaration<br>components/core/src/clp_s/log_converter/CommandLineArguments.hpp | Added m_max_log_event_size (default 512 MiB) and public getter get_max_log_event_size(). Added <cstddef> for size_t. |
| Command-line argument parsing<br>components/core/src/clp_s/log_converter/CommandLineArguments.cpp | Added --max-log-event-size CLI option (value name LOG_EVENT_SIZE) wired to m_max_log_event_size; added runtime validation that throws std::invalid_argument if value ≤ 0. |
| LogConverter API and internals<br>components/core/src/clp_s/log_converter/LogConverter.hpp, components/core/src/clp_s/log_converter/LogConverter.cpp | Changed constructor to explicit LogConverter(size_t max_buffer_size); replaced compile-time cMaxBufferSize with instance member m_max_buffer_size; updated grow_buffer_if_full() to use m_max_buffer_size. |
| LogConverter usage<br>components/core/src/clp_s/log_converter/log_converter.cpp | Construct LogConverter with command_line_arguments.get_max_log_event_size() instead of default construction. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: adding a --max-log-event-size argument with a 512 MiB default to align log-converter with clp-s, which is substantiated by all modified files. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/src/clp_s/log_converter/LogConverter.cpp (1)

138-143: ⚠️ Potential issue | 🟠 Major

Clamp buffer growth to the configured ceiling.

Many valid --max-log-event-size values are unreachable here. With a 96 MiB limit, the buffer still tops out at 64 MiB because the next growth step jumps to 128 MiB and returns result_out_of_range. 2 * m_buffer.size() can also wrap before the comparison when the configured ceiling is very large. Grow to the smaller of 2x and m_max_buffer_size, and only fail once the buffer is already at the cap.

Suggested fix
-    size_t const new_size{2 * m_buffer.size()};
-    if (new_size > m_max_buffer_size) {
+    auto const current_size{m_buffer.size()};
+    if (current_size >= m_max_buffer_size) {
         return std::errc::result_out_of_range;
     }
+    size_t const new_size{
+            current_size > m_max_buffer_size - current_size ? m_max_buffer_size
+                                                            : current_size + current_size
+    };
     ystdlib::containers::Array<char> new_buffer(new_size);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/log_converter/LogConverter.cpp` around lines 138 -
143, The current growth logic computes new_size as 2 * m_buffer.size() which can
overflow and skips reachable sizes below m_max_buffer_size; change it to first
check if m_buffer.size() >= m_max_buffer_size and return
std::errc::result_out_of_range only when already at the cap, otherwise compute
new_size safely as the smaller of 2*m_buffer.size() and m_max_buffer_size (avoid
overflow by testing m_buffer.size() > m_max_buffer_size/2 and using
m_max_buffer_size in that case), then allocate the new
ystdlib::containers::Array<char> with that new_size and memcpy
m_num_bytes_buffered bytes as before (symbols: m_buffer, m_max_buffer_size,
m_num_bytes_buffered).
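
The clamped growth rule described in this comment can be isolated into a small pure function, which makes the 96 MiB case and the overflow case easy to check. This is a sketch only: `next_buffer_size` is a hypothetical name, and the real code operates on `m_buffer` and `m_max_buffer_size` inside `grow_buffer_if_full()` rather than on free-standing arguments.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper illustrating the suggested growth policy.
// Precondition (assumed): current_size > 0, otherwise doubling never makes progress.
// Returns the next buffer size, or std::nullopt if the buffer is already at the cap.
inline auto next_buffer_size(std::size_t current_size, std::size_t max_size)
        -> std::optional<std::size_t> {
    if (current_size >= max_size) {
        // Already at (or beyond) the configured ceiling: the caller should report an error.
        return std::nullopt;
    }
    // `current_size > max_size - current_size` is equivalent to `2 * current_size > max_size`
    // but cannot wrap, because `max_size - current_size` is known to be positive here.
    if (current_size > max_size - current_size) {
        return max_size;
    }
    return current_size + current_size;
}
```

With a 96 MiB ceiling, a 64 MiB buffer grows to 96 MiB instead of failing at the 128 MiB step, and only a buffer that is already 96 MiB yields `std::nullopt`.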
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/log_converter/LogConverter.hpp`:
- Around line 20-22: The constructor for LogConverter currently always
initializes m_buffer with cDefaultBufferSize which allows records larger than a
configured max; update the LogConverter(size_t max_buffer_size) constructor to
enforce the limit by sizing m_buffer from the provided max_buffer_size (or the
lesser of max_buffer_size and cDefaultBufferSize) and/or validate/reject
max_buffer_size below cDefaultBufferSize by setting m_max_buffer_size and
throwing or asserting; specifically change the initialization logic that sets
m_buffer(cDefaultBufferSize) and adjust handling of m_max_buffer_size so the
initial buffer cannot exceed the configured max and small limits are enforced at
construction time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9137d52b-35d4-4a05-b0f6-1f558fd702ab

📥 Commits

Reviewing files that changed from the base of the PR and between 1e1329d and 8992827.

📒 Files selected for processing (5)
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp
  • components/core/src/clp_s/log_converter/CommandLineArguments.hpp
  • components/core/src/clp_s/log_converter/LogConverter.cpp
  • components/core/src/clp_s/log_converter/LogConverter.hpp
  • components/core/src/clp_s/log_converter/log_converter.cpp

Comment on lines +20 to +22
explicit LogConverter(size_t max_buffer_size)
: m_buffer(cDefaultBufferSize),
m_max_buffer_size{max_buffer_size} {}
Member

Shall we check that the max buffer size is at least cDefaultBufferSize?
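
One way such a check could look is sketched below. It is illustrative only and does not show what was actually committed: `BoundedBuffer` is a hypothetical stand-in for `LogConverter`, `std::vector<char>` replaces `ystdlib::containers::Array<char>`, and the 64 KiB value of `cDefaultBufferSize` is an assumption, since the real constant's value does not appear in this conversation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for LogConverter's buffer handling.
class BoundedBuffer {
public:
    // Assumed value for illustration; the real cDefaultBufferSize may differ.
    static constexpr std::size_t cDefaultBufferSize{64 * 1024};

    // Rejects ceilings smaller than the initial allocation, so the buffer can never start out
    // larger than the configured maximum.
    explicit BoundedBuffer(std::size_t max_buffer_size)
            : m_max_buffer_size{checked_max(max_buffer_size)},
              m_buffer(cDefaultBufferSize) {}

    [[nodiscard]] auto size() const -> std::size_t { return m_buffer.size(); }

    [[nodiscard]] auto max_size() const -> std::size_t { return m_max_buffer_size; }

private:
    static auto checked_max(std::size_t max_buffer_size) -> std::size_t {
        if (max_buffer_size < cDefaultBufferSize) {
            throw std::invalid_argument{"max_buffer_size is smaller than the default buffer size"};
        }
        return max_buffer_size;
    }

    // Declared before m_buffer so validation runs before the buffer is allocated.
    std::size_t m_max_buffer_size;
    std::vector<char> m_buffer;
};
```

An alternative with the same effect on the limit would be to size the initial buffer as the smaller of the two values instead of throwing; which behavior is preferable depends on whether very small limits should be usable at all.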

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/log_converter/CommandLineArguments.cpp`:
- Around line 198-200: The check against m_max_log_event_size should explicitly
test for zero because it's declared as size_t (unsigned); in
CommandLineArguments.cpp replace the condition "m_max_log_event_size <= 0" with
an explicit "m_max_log_event_size == 0" to make intent clear and avoid
misleading unsigned comparisons, leaving the thrown std::invalid_argument
message and surrounding logic in the constructor/validation unchanged.
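
The point about unsigned comparisons can be made concrete with a short sketch. `validate_max_log_event_size` and its error message are hypothetical and are not taken from `CommandLineArguments.cpp`; the sketch only illustrates that, for a `size_t`, `<= 0` and `== 0` accept and reject exactly the same values.

```cpp
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// size_t is an unsigned type, so it can never hold a negative value.
static_assert(std::is_unsigned_v<std::size_t>);

// Hypothetical validation helper: zero is the only value that `<= 0` could ever reject for an
// unsigned type, so the check is written as an explicit comparison against zero.
inline void validate_max_log_event_size(std::size_t max_log_event_size) {
    if (0 == max_log_event_size) {
        throw std::invalid_argument{"max-log-event-size must be greater than zero."};
    }
}
```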
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 79f20945-6939-4893-b130-7970f4f07c05

📥 Commits

Reviewing files that changed from the base of the PR and between 8992827 and feb681b.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp

Member

@LinZhihao-723 LinZhihao-723 left a comment

For the PR title, how about:

feat(clp-s::log_converter): Add `max-log-event-size` argument; Make default max log event size 512 MiB to match `clp-s` (fixes #2176).

I think this PR is adding a feature, not strictly fixing an issue...

@gibber9809 gibber9809 changed the title from "fix(clp-s::log_converter): Add max-log-event-size argument; make default max log event size 512 MiB to match clp-s (fixes #2176)." to "feat(clp-s::log_converter): Add max-log-event-size argument; Make default max log event size 512 MiB to match clp-s (fixes #2176)." Apr 14, 2026
@gibber9809 gibber9809 merged commit 5901049 into y-scope:main Apr 14, 2026
29 checks passed

Development

Successfully merging this pull request may close these issues.

CLP-JSON package fails to ingest unstructured text log files with events larger than 64 MiB

3 participants