feat(clp-s::ffi::sfa): Add ClpArchiveDecoder for iterating log events from decompression and search.#2163

Open
Bill-hbrhbr wants to merge 3 commits into y-scope:main from Bill-hbrhbr:clp-s-ffi-sfa/decoder

Conversation

@Bill-hbrhbr
Contributor

@Bill-hbrhbr Bill-hbrhbr commented Apr 1, 2026

Description

This PR adds ClpArchiveDecoder as the utility class for iterating log events decoded from archives. The decoder is created from ClpArchiveReader, and supports both in-order and out-of-order decoding.

Once log events are decoded, they are cached inside the decoder to avoid repeating decoding work.
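The caching behaviour described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the PR's implementation: `SketchLogEvent`, `CachingDecoderSketch`, and `get_num_decoded()` are invented names, "decoding" is simulated by copying from an in-memory vector, and the sketch returns a `std::vector` reference where the walkthrough shows a span.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's LogEvent; the field names are assumptions.
struct SketchLogEvent {
    std::string message;
    int64_t timestamp{0};
    int64_t log_event_idx{0};
};

// Hypothetical decoder: "decoding" here just copies from an in-memory source.
class CachingDecoderSketch {
public:
    explicit CachingDecoderSketch(std::vector<SketchLogEvent> source)
            : m_source{std::move(source)} {}

    // Decodes one more event into the cache. Returns std::nullopt once exhausted.
    auto get_next_log_event() -> std::optional<SketchLogEvent> {
        if (m_num_decoded >= m_source.size()) {
            return std::nullopt;
        }
        m_cache.push_back(m_source[m_num_decoded]);
        ++m_num_decoded;
        return m_cache.back();
    }

    // Decodes whatever is left, then returns every cached event.
    auto collect_log_events() -> std::vector<SketchLogEvent> const& {
        while (get_next_log_event().has_value()) {
        }
        return m_cache;
    }

    // Number of decode steps performed so far; never exceeds the source size.
    auto get_num_decoded() const -> std::size_t { return m_num_decoded; }

private:
    std::vector<SketchLogEvent> m_source;
    std::vector<SketchLogEvent> m_cache;
    std::size_t m_num_decoded{0};
};
```

In this sketch, calling `collect_log_events()` a second time performs no further decode steps, because every event is already in the cache.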

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

Release Notes

  • New Features

    • Added archive decoding functionality to extract and retrieve log events from archives
    • Implemented log event data structure containing message, timestamp, and event index information
    • Added support for both ordered and sequential log event consumption
  • Bug Fixes

    • Improved error handling with additional failure diagnostics

@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

Walkthrough

Introduces a new ClpArchiveDecoder class for decoding log events from CLP archives, including support for both ordered and sequential event iteration. Adds a LogEvent data structure, expands error handling with a Failure code, integrates the decoder with ClpArchiveReader via a decode_all() method, and updates the build system to include new SFA source files.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build Configuration: components/core/src/clp_s/ffi/CMakeLists.txt | Adds ClpArchiveDecoder.cpp, ClpArchiveDecoder.hpp, and LogEvent.hpp to the CLP_S_FFI_SFA_SOURCES list for compilation when CLP_BUILD_CLP_S_ARCHIVEREADER is enabled. |
| Core Decoder Implementation: components/core/src/clp_s/ffi/sfa/ClpArchiveDecoder.hpp, components/core/src/clp_s/ffi/sfa/ClpArchiveDecoder.cpp | Introduces ClpArchiveDecoder class with static factory create(), move semantics, lifecycle management (close(), destructor), and event decoding APIs (get_next_log_event(), collect_log_events()). Supports both unordered and ordered decoding strategies based on archive log ordering availability. Internal helpers manage schema table iteration and event accumulation. |
| Archive Reader Integration: components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp, components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp | Adds public decode_all() method returning Result<ClpArchiveDecoder>, extends precompute_archive_metadata() to initialize dictionaries and packed streams, grants ClpArchiveDecoder friend class access, and includes the decoder header. |
| Data Structure: components/core/src/clp_s/ffi/sfa/LogEvent.hpp | Defines LogEvent class storing a log message, timestamp, and log event index. Provides move-enabled constructor and const accessor methods. |
| Error Handling: components/core/src/clp_s/ffi/sfa/SfaErrorCode.hpp, components/core/src/clp_s/ffi/sfa/SfaErrorCode.cpp | Adds Failure enum member to SfaErrorCodeEnum and corresponding error message mapping in SfaErrorCategory::message(). |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ClpArchiveReader
    participant ClpArchiveDecoder
    participant SchemaReader as SchemaReader(s)
    participant LogEventStorage as m_log_events

    Client->>ClpArchiveReader: decode_all()
    ClpArchiveReader->>ClpArchiveDecoder: create(reader)
    ClpArchiveDecoder->>ClpArchiveReader: Read all tables
    ClpArchiveDecoder->>SchemaReader: Determine log ordering availability
    alt Log Ordering Available
        SchemaReader-->>ClpArchiveDecoder: has_log_order = true
    else No Log Ordering
        SchemaReader-->>ClpArchiveDecoder: has_log_order = false
    end
    ClpArchiveDecoder-->>ClpArchiveReader: Return Result<Decoder>

    Client->>ClpArchiveDecoder: collect_log_events()
    loop While events remain
        ClpArchiveDecoder->>ClpArchiveDecoder: get_next_log_event()
        alt has_log_order
            ClpArchiveDecoder->>SchemaReader: decode_next_log_event_in_order()
        else
            ClpArchiveDecoder->>SchemaReader: decode_next_log_event()
        end
        SchemaReader->>ClpArchiveDecoder: LogEvent
        ClpArchiveDecoder->>LogEventStorage: Append event
    end
    ClpArchiveDecoder-->>Client: Return span<LogEvent const>

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: introducing ClpArchiveDecoder for iterating log events from decompression and search, which aligns directly with the changeset content. |


@Bill-hbrhbr Bill-hbrhbr requested a review from hoophalab April 1, 2026 21:25
@Bill-hbrhbr Bill-hbrhbr requested a review from junhaoliao April 2, 2026 18:33
@Bill-hbrhbr Bill-hbrhbr marked this pull request as ready for review April 2, 2026 18:35
@Bill-hbrhbr Bill-hbrhbr requested review from a team and gibber9809 as code owners April 2, 2026 18:35
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/ffi/sfa/ClpArchiveDecoder.cpp`:
- Around line 96-107: In ClpArchiveDecoder::append_next_log_event, avoid copying
the local message when adding to m_log_events by moving it into the LogEvent;
update the emplace_back call that currently passes message as an lvalue to use
std::move(message) so the temporary local string is moved into the new LogEvent
(leave timestamp and log_event_idx unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4fee67aa-6cc1-4d53-b5e5-ad265d62d50b

📥 Commits

Reviewing files that changed from the base of the PR and between 22c1275 and 8a3b6ef.

📒 Files selected for processing (8)
  • components/core/src/clp_s/ffi/CMakeLists.txt
  • components/core/src/clp_s/ffi/sfa/ClpArchiveDecoder.cpp
  • components/core/src/clp_s/ffi/sfa/ClpArchiveDecoder.hpp
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp
  • components/core/src/clp_s/ffi/sfa/LogEvent.hpp
  • components/core/src/clp_s/ffi/sfa/SfaErrorCode.cpp
  • components/core/src/clp_s/ffi/sfa/SfaErrorCode.hpp

int64_t log_event_idx{0};

if (table.get_next_message_with_metadata(message, timestamp, log_event_idx)) {
m_log_events.emplace_back(message, timestamp, log_event_idx);
Contributor Author

Suggested change
- m_log_events.emplace_back(message, timestamp, log_event_idx);
+ m_log_events.emplace_back(std::move(message), timestamp, log_event_idx);
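For context, a minimal sketch of why the move helps, assuming a `LogEvent`-like type that takes its message by value as the PR's constructor signature does. `SketchLogEvent` and `append_log_event` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for LogEvent: takes the message by value and moves it
// into the member, so an rvalue argument is never copied.
class SketchLogEvent {
public:
    SketchLogEvent(std::string message, int64_t timestamp, int64_t log_event_idx)
            : m_message{std::move(message)},
              m_timestamp{timestamp},
              m_log_event_idx{log_event_idx} {}

    auto get_message() const -> std::string const& { return m_message; }
    auto get_timestamp() const -> int64_t { return m_timestamp; }
    auto get_log_event_idx() const -> int64_t { return m_log_event_idx; }

private:
    std::string m_message;
    int64_t m_timestamp;
    int64_t m_log_event_idx;
};

// Hypothetical helper. `message` is a local that is not read again after the
// call, so moving it hands its buffer to the new element instead of copying
// the characters. Passing `message` as an lvalue would copy it once.
auto append_log_event(
        std::vector<SketchLogEvent>& log_events,
        std::string message,
        int64_t timestamp,
        int64_t log_event_idx
) -> void {
    log_events.emplace_back(std::move(message), timestamp, log_event_idx);
}
```

The saving grows with message length; for short strings the two forms likely cost about the same.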

Contributor

@hoophalab hoophalab left a comment


Some comments.

One high-level question:

It might be best to turn the decoder into an iterator. That way, get_next_log_event doesn't need to copy and return a LogEvent from the internal vector.

The behaviour of collect_log_events is also a concern. I haven't looked into it in detail, but it appears that if a user calls get_next_log_event first and then collect_log_events, and get_next_log_event fails, collect_log_events might return a span of LogEvents that silently skips the failed event.

With an iterator, the user could simply collect LogEvents into a vector themselves, avoiding this issue.
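One possible shape for the iterator idea, as a rough sketch only. `LogEventRangeSketch`, its nested `Iterator`, `decode_next`, and `collect` are all hypothetical names; decoding is simulated with an in-memory vector, and error reporting is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's LogEvent; the field names are assumptions.
struct SketchLogEvent {
    std::string message;
    int64_t timestamp{0};
    int64_t log_event_idx{0};
};

// Hypothetical single-pass range: each increment "decodes" one event on demand
// and exposes it by const reference, without keeping an internal cache.
class LogEventRangeSketch {
public:
    class Iterator {
    public:
        Iterator(LogEventRangeSketch* range, bool at_end) : m_range{range}, m_at_end{at_end} {
            if (false == m_at_end) {
                advance();
            }
        }

        auto operator*() const -> SketchLogEvent const& {
            assert(false == m_at_end);
            return m_current;
        }

        auto operator++() -> Iterator& {
            advance();
            return *this;
        }

        // Only "at end" vs. "not at end" is compared, which is all a range-for needs.
        auto operator!=(Iterator const& other) const -> bool {
            return m_at_end != other.m_at_end;
        }

    private:
        auto advance() -> void { m_at_end = (false == m_range->decode_next(m_current)); }

        LogEventRangeSketch* m_range;
        bool m_at_end;
        SketchLogEvent m_current;
    };

    explicit LogEventRangeSketch(std::vector<SketchLogEvent> source)
            : m_source{std::move(source)} {}

    auto begin() -> Iterator { return Iterator{this, false}; }
    auto end() -> Iterator { return Iterator{this, true}; }

private:
    // Stand-in for real decoding; returns false once the source is exhausted.
    auto decode_next(SketchLogEvent& out) -> bool {
        if (m_next_idx >= m_source.size()) {
            return false;
        }
        out = std::move(m_source[m_next_idx]);
        ++m_next_idx;
        return true;
    }

    std::vector<SketchLogEvent> m_source;
    std::size_t m_next_idx{0};
};

// The caller decides whether to keep the events, e.g. by collecting them.
auto collect(LogEventRangeSketch& range) -> std::vector<SketchLogEvent> {
    std::vector<SketchLogEvent> events;
    for (auto const& event : range) {
        events.push_back(event);
    }
    return events;
}
```

With this shape the caller owns any collected vector, so there is no shared internal cache that a failed decode could leave in a partially filled state. How a decode failure would surface through the iterator is a design question the sketch does not answer.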

* error code indicating the failure:
* - Forwards `ClpArchiveDecoder::create`'s return values on failure.
*/
[[nodiscard]] auto decode_all() -> ystdlib::error_handling::Result<ClpArchiveDecoder>;
Contributor

  1. Is this function const?
  2. Is the argument of create const?
    [[nodiscard]] static auto create(ClpArchiveReader const& reader)
  3. Can we return the following directly inside auto decode_all() to avoid friend class ClpArchiveDecoder?
    ClpArchiveDecoder{
            m_archive_reader->read_all_tables(),
            m_archive_reader->has_log_order()
    }

Comment on lines +33 to +36
} catch (std::bad_alloc const&) {
SPDLOG_ERROR("Failed to create ClpArchiveDecoder: out of memory.");
return SfaErrorCode{SfaErrorCodeEnum::NoMemory};
} catch (std::exception const& ex) {
Contributor

Not sure if catching bad_alloc here and in other functions, and having a dedicated SfaErrorCodeEnum::NoMemory, is necessary.


#include <ystdlib/error_handling/Result.hpp>

#include "ClpArchiveDecoder.hpp"
Contributor

Should this be a forward declaration instead?

namespace clp_s::ffi::sfa {
class LogEvent {
public:
LogEvent(std::string message, int64_t timestamp, int64_t log_event_idx)
Contributor

Suggested change
- LogEvent(std::string message, int64_t timestamp, int64_t log_event_idx)
+ explicit LogEvent(std::string message, int64_t timestamp, int64_t log_event_idx)
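A small sketch of what `explicit` changes for a multi-argument constructor: it rules out copy-list-initialization, so a braced list can no longer silently become an event. `ImplicitEventSketch`, `ExplicitEventSketch`, and the two factory functions are hypothetical names for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical type with a non-explicit three-argument constructor.
class ImplicitEventSketch {
public:
    ImplicitEventSketch(std::string message, int64_t timestamp, int64_t log_event_idx)
            : m_message{std::move(message)},
              m_timestamp{timestamp},
              m_log_event_idx{log_event_idx} {}

    auto get_message() const -> std::string const& { return m_message; }
    auto get_timestamp() const -> int64_t { return m_timestamp; }
    auto get_log_event_idx() const -> int64_t { return m_log_event_idx; }

private:
    std::string m_message;
    int64_t m_timestamp;
    int64_t m_log_event_idx;
};

// Same type, but with the constructor marked explicit as suggested.
class ExplicitEventSketch {
public:
    explicit ExplicitEventSketch(std::string message, int64_t timestamp, int64_t log_event_idx)
            : m_message{std::move(message)},
              m_timestamp{timestamp},
              m_log_event_idx{log_event_idx} {}

    auto get_message() const -> std::string const& { return m_message; }
    auto get_timestamp() const -> int64_t { return m_timestamp; }
    auto get_log_event_idx() const -> int64_t { return m_log_event_idx; }

private:
    std::string m_message;
    int64_t m_timestamp;
    int64_t m_log_event_idx;
};

// Compiles: a bare braced list is implicitly converted to the return type.
auto make_implicit() -> ImplicitEventSketch {
    return {"started", 1, 0};
}

// With an explicit constructor, the type has to be named at the call site.
auto make_explicit() -> ExplicitEventSketch {
    // return {"started", 1, 0};  // ill-formed: copy-list-initialization would
    //                            // select an explicit constructor
    return ExplicitEventSketch{"started", 1, 0};
}
```

Calls such as `emplace_back(std::move(message), timestamp, log_event_idx)` use direct initialization, so they work with either version.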

* Error code enum for SFA API operations.
*/
enum class SfaErrorCodeEnum : uint8_t {
Failure,
Contributor

How about

Suggested change
- Failure,
+ DecodeFailure,

Comment on lines +122 to +141
std::shared_ptr<clp_s::SchemaReader> next_table;
int64_t next_log_event_idx{0};
bool found_next_table{false};

for (auto const& table : m_tables) {
    if (table->done()) {
        continue;
    }

    auto const table_log_event_idx{table->get_next_log_event_idx()};
    if (false == found_next_table || table_log_event_idx < next_log_event_idx) {
        next_table = table;
        next_log_event_idx = table_log_event_idx;
        found_next_table = true;
    }
}

if (false == found_next_table) {
    return false;
}
Contributor

Suggested change
- std::shared_ptr<clp_s::SchemaReader> next_table;
- int64_t next_log_event_idx{0};
- bool found_next_table{false};
- for (auto const& table : m_tables) {
-     if (table->done()) {
-         continue;
-     }
-     auto const table_log_event_idx{table->get_next_log_event_idx()};
-     if (false == found_next_table || table_log_event_idx < next_log_event_idx) {
-         next_table = table;
-         next_log_event_idx = table_log_event_idx;
-         found_next_table = true;
-     }
- }
- if (false == found_next_table) {
-     return false;
- }
+ std::shared_ptr<clp_s::SchemaReader> next_table;
+ int64_t next_log_event_idx{INT64_MAX};
+ for (auto const& table : m_tables) {
+     if (table->done()) {
+         continue;
+     }
+     auto const table_log_event_idx{table->get_next_log_event_idx()};
+     if (table_log_event_idx < next_log_event_idx) {
+         next_table = table;
+         next_log_event_idx = table_log_event_idx;
+     }
+ }
+ if (nullptr == next_table) {
+     return false;
+ }
  1. Does removing found_next_table make this function more understandable?
  2. The function looks correct to me but I'm not fully confident. Best to ask @gibber9809 to take a look.
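To make the selection logic easier to check in isolation, here is a self-contained sketch of the suggested variant. `TableSketch` is a hypothetical stand-in for `clp_s::SchemaReader` that only tracks a sorted list of log event indices; `pop`, `find_next_table`, and `merge_in_order` are invented for the example. It assumes every real index is strictly below the `int64_t` maximum, since a table whose next index equals the sentinel would never be picked by a strict `<` comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for clp_s::SchemaReader: holds the (ascending) log
// event indices of one schema table and a cursor into them.
class TableSketch {
public:
    explicit TableSketch(std::vector<int64_t> log_event_idxs)
            : m_log_event_idxs{std::move(log_event_idxs)} {}

    auto done() const -> bool { return m_pos >= m_log_event_idxs.size(); }

    auto get_next_log_event_idx() const -> int64_t {
        assert(false == done());
        return m_log_event_idxs[m_pos];
    }

    // Consumes the next event and returns its index.
    auto pop() -> int64_t {
        assert(false == done());
        return m_log_event_idxs[m_pos++];
    }

private:
    std::vector<int64_t> m_log_event_idxs;
    std::size_t m_pos{0};
};

// Returns the unfinished table whose next log event index is smallest, or
// nullptr if every table is done. Mirrors the sentinel-based suggestion.
auto find_next_table(std::vector<std::shared_ptr<TableSketch>> const& tables)
        -> std::shared_ptr<TableSketch> {
    std::shared_ptr<TableSketch> next_table;
    int64_t next_log_event_idx{std::numeric_limits<int64_t>::max()};
    for (auto const& table : tables) {
        if (table->done()) {
            continue;
        }
        auto const table_log_event_idx{table->get_next_log_event_idx()};
        if (table_log_event_idx < next_log_event_idx) {
            next_table = table;
            next_log_event_idx = table_log_event_idx;
        }
    }
    return next_table;
}

// Drains all tables, always taking the globally smallest next index.
auto merge_in_order(std::vector<std::shared_ptr<TableSketch>> const& tables)
        -> std::vector<int64_t> {
    std::vector<int64_t> order;
    while (auto table = find_next_table(tables)) {
        order.push_back(table->pop());
    }
    return order;
}
```

Each call scans every table, so draining N events over T tables costs on the order of N × T comparisons. A min-heap keyed on the next index would reduce that, though whether it matters depends on how many schema tables a typical archive has.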
