enh(test_common): add profiler-safe HIP-event timing path to run_perftest #656
Draft
Arist12 wants to merge 1 commit into
…test

Add FLYDSL_PERFTEST_USE_EVENTS=1 to time benchmark iterations with a pair of HIP events rather than torch.profiler. When set, each iteration is bracketed by Event.record() / Event.synchronize() and the mean latency is returned as usual, but torch.profiler is never entered.

This is necessary when running benchmarks under an external rocprofv3 session: nesting torch.profiler (ROCTracer) inside rocprofv3 produces duplicate-flow warnings and can perturb timing. With the events path the benchmark command line stays identical; only the internal timing backend changes.

Lazy-import torch.profiler in the default path so the module-level import no longer pulls in ROCTracer on every test collection. The testGraph path gets the same lazy import.

FLYDSL_PERFTEST_USE_EVENTS is not set in any test; default behavior is unchanged.
Problem
`run_perftest` times benchmark iterations by running them under `torch.profiler` (ROCTracer). When the benchmark is driven by an external `rocprofv3` session, nesting `torch.profiler` inside `rocprofv3` causes duplicate-flow warnings from ROCTracer and can perturb the measured kernel time. There was no way to time the same workload without entering the profiler.

The module-level `import torch.profiler as tpf` also caused ROCTracer to be imported eagerly on every pytest collection.

Solution
Add a `FLYDSL_PERFTEST_USE_EVENTS=1` environment variable that switches the internal timing backend to paired HIP events, bypassing `torch.profiler` entirely. When set:

- Each of the `num_iters` iterations is bracketed by `torch.cuda.Event(enable_timing=True)` record/synchronize pairs.
- … `rocprofv3 --stats` reports per dispatch.
- `torch.profiler` is never imported or entered.

When the variable is unset (default), behavior is identical to before.
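A minimal sketch of the events backend described above, assuming a zero-argument callable `fn` and returning `(data, avg_us)` the way `run_perftest` does. `events_backend_enabled`, `time_with_events`, and the `event_factory` parameter are hypothetical names for illustration, not the code in this PR; the factory parameter exists only so the loop can be exercised without a GPU.

```python
import functools
import os
from typing import Any, Callable, Optional, Tuple


def events_backend_enabled() -> bool:
    """Hypothetical helper: True when FLYDSL_PERFTEST_USE_EVENTS=1."""
    return os.environ.get("FLYDSL_PERFTEST_USE_EVENTS", "0") == "1"


def time_with_events(
    fn: Callable[[], Any],
    num_iters: int,
    event_factory: Optional[Callable[[], Any]] = None,
) -> Tuple[Any, float]:
    """Time fn() with paired device events; return (last result, mean microseconds)."""
    if num_iters <= 0:
        raise ValueError("num_iters must be positive")
    if event_factory is None:
        # Lazy import: torch is only loaded once this path is actually taken.
        # On ROCm builds of PyTorch, torch.cuda.Event is backed by a HIP event.
        import torch

        event_factory = functools.partial(torch.cuda.Event, enable_timing=True)

    data = None
    total_ms = 0.0
    for _ in range(num_iters):
        start, end = event_factory(), event_factory()
        start.record()
        data = fn()
        end.record()
        end.synchronize()  # block the host until the end event has completed
        total_ms += start.elapsed_time(end)  # elapsed_time() reports milliseconds
    return data, total_ms / num_iters * 1000.0
```

Because each iteration synchronizes, the mean can include any GPU idle time between the two records (for example a host-side launch gap), so it may read slightly higher than the per-dispatch kernel time a profiler reports.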
`torch.profiler` is now lazy-imported in both the default timing branch and the `testGraph` branch, so it is only pulled in when actually used.

Typical usage
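A sketch of launching a benchmark under `rocprofv3` with the events backend, assuming kernel tracing with `--stats` is the desired collection mode (check `rocprofv3 --help` for your ROCm version). The benchmark script path and the `build_profiled_run` helper are hypothetical. The point it illustrates is that the benchmark's own argv is passed through unchanged; only the environment gains `FLYDSL_PERFTEST_USE_EVENTS=1`.

```python
import os
import shlex
from typing import Dict, List, Mapping, Optional, Tuple


def build_profiled_run(
    bench_argv: List[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: wrap an unchanged benchmark command in rocprofv3."""
    env = dict(os.environ if base_env is None else base_env)
    # Switch run_perftest to HIP-event timing so torch.profiler is not nested.
    env["FLYDSL_PERFTEST_USE_EVENTS"] = "1"
    # "--" separates rocprofv3's own options from the application command line.
    cmd = ["rocprofv3", "--kernel-trace", "--stats", "--", *bench_argv]
    return cmd, env


if __name__ == "__main__":
    # Hypothetical benchmark invocation; substitute the real script and arguments.
    bench = ["python", "tests/kernels/bench_example.py"]
    cmd, env = build_profiled_run(bench)
    print("FLYDSL_PERFTEST_USE_EVENTS=1", shlex.join(cmd))
    # To launch it: subprocess.run(cmd, env=env, check=True)
```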
Testing
- `FLYDSL_PERFTEST_USE_EVENTS=1` and `=0` paths verified to return valid `(data, avg_us)` tuples.
- `FLYDSL_LOG_MORE=1` combined with `FLYDSL_PERFTEST_USE_EVENTS=1` does not crash.
- `tests/unit/test_compile_hints.py::TestCacheDisabledRegression` passes (this test directly calls `run_perftest`).
- `python -m pytest tests/unit/ -m "not l2_device and not rocm_lower"`: 356 passed, 2 pre-existing failures unrelated to this change.