Skip to content

fix: eliminate flaky OnetimeConfigUpdateUnittest via injectable clock#2551

Open
Copilot wants to merge 7 commits intomainfrom
copilot/fix-pipelineconfig-access-error
Open

fix: eliminate flaky OnetimeConfigUpdateUnittest via injectable clock#2551
Copilot wants to merge 7 commits intomainfrom
copilot/fix-pipelineconfig-access-error

Conversation

Copy link
Copy Markdown

Copilot AI commented Apr 15, 2026

OnetimeConfigUpdateUnittest::OnCollectionConfigUpdate was intermittently off by 1 second in CI because the production code and the test assertion each called time(nullptr) independently, and a second boundary could tick between them.

Changes

  • common/TimeUtil.h — Under APSARA_UNIT_TEST_MAIN, adds:

    • thread_local std::function<uint64_t()> gCurrentTimeNs — injectable clock returning nanoseconds since epoch, thread_local so background threads spawned by pipelines always use the real clock while the test thread can safely override without data races
    • ScopedClockOverride — RAII guard that accepts a std::chrono::system_clock::time_point, pins the test thread's clock to a fixed nanosecond value for the duration of a test, and restores it on destruction; copy/assign deleted
  • common/TimeUtil.cpp — Defines the thread_local gCurrentTimeNs default (real system_clock::now()). All four GetCurrentTimeIn*() functions derive from it in UT builds by dividing to the appropriate precision.

  • PipelineConfig.cpp — Both time(nullptr) calls in GetExpireTimeIfOneTime() replaced with GetCurrentTimeInSeconds(), routing through the injectable clock in test builds.

  • PipelineConfig.h — Retains only friend class PipelineConfigUnittest (removed unnecessary friend class OnetimeConfigUpdateUnittest, which never accesses protected/private members).

  • OnetimeConfigUpdateUnittest.cpp — Captures system_clock::now() before each UpdatePipelines() call, wraps the call in ScopedClockOverride, and asserts using duration_cast<seconds> of the captured value — making the expected value identical to what the production code computes.

  • PipelineConfigUnittest.cpp — Same pattern applied to all mOnetimeExpireTime assertions (including the previously missed "new config" scenario) that formerly compared against an independent time(nullptr) call.

// Before (flaky — two independent time(nullptr) calls)
CollectionPipelineManager::GetInstance()->UpdatePipelines(diff.first);
APSARA_TEST_EQUAL(time(nullptr) + 1200U, item.mExpireTime);  // may differ by 1 s

// After (deterministic)
auto updateNow = std::chrono::system_clock::now();
{
    ScopedClockOverride clockGuard(updateNow);
    CollectionPipelineManager::GetInstance()->UpdatePipelines(diff.first);
}
APSARA_TEST_EQUAL(
    static_cast<uint32_t>(
        std::chrono::duration_cast<std::chrono::seconds>(updateNow.time_since_epoch()).count()) + 1200U,
    item.mExpireTime);

This pull request was created from Copilot chat.

Copilot AI and others added 5 commits April 15, 2026 06:38
… non-test impls with system_clock

Agent-Logs-Url: https://github.com/alibaba/loongcollector/sessions/c9045f85-6c3a-4a59-9cb4-21453adba656

Co-authored-by: yyuuttaaoo <1827594+yyuuttaaoo@users.noreply.github.com>
…point, update test call sites

Agent-Logs-Url: https://github.com/alibaba/loongcollector/sessions/c9205b39-a601-47e8-9b61-7580709691c6

Co-authored-by: yyuuttaaoo <1827594+yyuuttaaoo@users.noreply.github.com>
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 15, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yyuuttaaoo
❌ Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

@yyuuttaaoo yyuuttaaoo changed the title refactor: unify injectable test clock to nanosecond precision via gCurrentTimeNs fix: eliminate flaky OnetimeConfigUpdateUnittest via injectable clock Apr 15, 2026
Removed duplicate include of 'common/TimeUtil.h'.
@yyuuttaaoo yyuuttaaoo marked this pull request as ready for review April 15, 2026 15:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

该 PR 旨在消除 onetime 配置相关单测在 CI 中因跨秒导致的偶现失败,通过在 UT 构建下提供可控“当前时间”,并让过期时间计算与断言使用同一时间源,从而使断言确定化。

Changes:

  • common/TimeUtil 中增加 UT 可注入时钟(gCurrentTimeNs)及 RAII 覆盖器 ScopedClockOverride,并让 GetCurrentTimeIn*() 在 UT 下走该注入点。
  • PipelineConfig::GetExpireTimeIfOneTime() 使用 GetCurrentTimeInSeconds() 替代 time(nullptr),避免生产代码与测试各自取时导致 off-by-1。
  • 更新 OnetimeConfigUpdateUnittest / PipelineConfigUnittest 的部分断言:在调用更新逻辑时固定 now,并用固定值计算期望过期时间。

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
core/common/TimeUtil.h UT 下引入可注入时钟声明与 ScopedClockOverride
core/common/TimeUtil.cpp UT 下实现 gCurrentTimeNs,并让 GetCurrentTimeIn*() 可被覆盖
core/config/PipelineConfig.cpp onetime 过期时间计算改用 GetCurrentTimeInSeconds()
core/config/PipelineConfig.h 新增 OnetimeConfigUpdateUnittest friend 声明
core/unittest/config/OnetimeConfigUpdateUnittest.cpp ScopedClockOverride 固定更新时刻,断言使用固定 now
core/unittest/config/PipelineConfigUnittest.cpp ScopedClockOverride 固定部分场景 now,断言使用固定 now

Comment thread core/config/PipelineConfig.h
Comment thread core/common/TimeUtil.h
Comment thread core/common/TimeUtil.cpp
Comment thread core/unittest/config/PipelineConfigUnittest.cpp
Comment thread core/config/PipelineConfig.h Outdated
@yyuuttaaoo yyuuttaaoo added bug Something isn't working test test and removed bug Something isn't working labels Apr 15, 2026
@yyuuttaaoo
Copy link
Copy Markdown
Collaborator

@copilot apply changes based on the comments in this thread

… friend, fix remaining flaky assertion

Agent-Logs-Url: https://github.com/alibaba/loongcollector/sessions/4d1f4f26-225f-43cd-a0e1-199437031b90

Co-authored-by: yyuuttaaoo <1827594+yyuuttaaoo@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Apr 15, 2026

@copilot apply changes based on the comments in this thread

已在 c32a7a4 中应用了以下修改:

  1. gCurrentTimeNs 改为 thread_localTimeUtil.h + TimeUtil.cpp):后台线程各自持有独立副本并始终使用真实时钟,测试线程的覆盖操作不会跨线程传播,从根本上消除了对全局 std::function 的并发读写竞争。

  2. 移除 friend class OnetimeConfigUpdateUnittestPipelineConfig.h):确认该测试类未访问 PipelineConfig 的任何受保护/私有成员,friend 声明多余。

  3. 修复 PipelineConfigUnittest.cpp 中剩余的 flaky 断言("new config" 场景,原第 163 行):用 ScopedClockOverride + chrono 替换 time(nullptr) + 3600U,使该处断言与其他场景一致。

Copilot AI requested a review from yyuuttaaoo April 15, 2026 15:39
Comment thread core/common/TimeUtil.cpp
return std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now().time_since_epoch())
.count();
#ifdef APSARA_UNIT_TEST_MAIN
return gCurrentTimeNs() / 1000ULL;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UT为什么要增加特殊分支?

Copy link
Copy Markdown
Collaborator

@yyuuttaaoo yyuuttaaoo Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UT为什么要增加特殊分支?

因为UT要用override的time函数,可能不是system_clock::now(),非UT不允许override time函数。
这可以完全解决

  1. 两个system_clock::now()差了几十毫秒导致差值多了一秒或少了一秒,导致UT失败的问题
  2. 其他依赖sleep的UT,可以利用这个util改造成不依赖sleep的,减少UT运行时间

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants