Fix bug 5811173: update cufile tests and configuration#1821
Fix bug 5811173: update cufile tests and configuration#1821rsarpangalav wants to merge 4 commits intoNVIDIA:mainfrom
Conversation
|
/ok to test |
|
/ok to test 1b672c9 |
|
|
@rsarpangalav let us know if this is ready, thanks! |
test_cufile.py: skip compat bool params in set_get_parameter_bool
Avoid setting allow_compat_mode/force_compat_mode before driver_open; pending
values can be applied on first open and interact badly with cufile.json when
nvidia-fs is not loaded (DRIVER_NOT_INITIALIZED). Compat behavior remains
covered elsewhere.
cufile.json: Set allow_compat_mode to true
Open driver once to read size_t/bool/string originals, then close before set/get/restore round-trips so pending does not restore invalid pre-open values (e.g. per-buffer cache 0). Aligns with review feedback.
1b672c9 to
1b513d5
Compare
|
@rsarpangalav plz resolve conflicts, thx |
rwgk
left a comment
There was a problem hiding this comment.
Cursor GPT-5.4 Extra High Fast
As generated:
Findings
-
High: the new post-
driver_open()baseline snapshots in the parameter API tests do not load the compat-enabled test config.test_set_get_parameter_size_t(),test_set_get_parameter_bool(), andtest_set_get_parameter_string()now call_cufile_driver_session()before collecting their original values. However, those tests still only opt intoctx, notcufile_env_json, so their firstdriver_open()does not consumecuda_bindings/tests/cufile.json.That means the new
allow_compat_mode: truesetting only affects the async tests, while the parameter tests can still hit the sameDRIVER_NOT_INITIALIZEDpath on systems withoutnvidia-fs. In other words, the PR adds the config needed to avoid the failure, but does not wire that config into the newly introduceddriver_open()calls that need it most. -
Medium: the bool round-trip test drops direct coverage for the compat parameters.
test_set_get_parameter_bool()removesPROPERTIES_ALLOW_COMPAT_MODEandFORCE_COMPAT_MODEfromparam_val_pairs. I could not find another test in the repo that round-trips those two bool config APIs, so the current PR avoids the failure by skipping the affected parameters instead of exercising them in the intended compat-enabled environment.
Suggested Fix
The minimal local fix is:
- add
cufile_env_jsonto the three parameter snapshot tests so their firstdriver_open()actually sees the compat-enabled test JSON - restore the compat bool parameters to
test_set_get_parameter_bool()now that the compat-enabled config is guaranteed to be active
diff --git a/cuda_bindings/tests/test_cufile.py b/cuda_bindings/tests/test_cufile.py
index 70a7bc80c8..e1da6a3349 100644
--- a/cuda_bindings/tests/test_cufile.py
+++ b/cuda_bindings/tests/test_cufile.py
@@ -1387,7 +1387,7 @@ def test_batch_io_large_operations():
@pytest.mark.skipif(
cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 1.14.0 or later"
)
-@pytest.mark.usefixtures("ctx")
+@pytest.mark.usefixtures("ctx", "cufile_env_json")
def test_set_get_parameter_size_t():
"""Test setting and getting size_t parameters with cuFile validation."""
param_val_pairs = (
@@ -1424,17 +1424,11 @@ def test_set_get_parameter_size_t():
@pytest.mark.skipif(
cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 1.14.0 or later"
)
-@pytest.mark.usefixtures("ctx")
+@pytest.mark.usefixtures("ctx", "cufile_env_json")
def test_set_get_parameter_bool():
"""Test setting and getting boolean parameters with cuFile validation."""
- # Do not exercise allow/force compat via set_parameter_bool before any driver_open:
- # pending API values are applied after JSON load on first open and can overwrite
- # cufile.json (e.g. allow_compat_mode: true), causing DRIVER_NOT_INITIALIZED when
- # nvidia-fs is not loaded. Other tests cover compat behavior where appropriate.
- _COMPAT_PARAMS = (
- cufile.BoolConfigParameter.PROPERTIES_ALLOW_COMPAT_MODE,
- cufile.BoolConfigParameter.FORCE_COMPAT_MODE,
- )
+ # Load the compat-enabled test config before the first driver_open so the compat
+ # bool params can still be round-tripped on systems without nvidia-fs.
param_val_pairs = (
(cufile.BoolConfigParameter.PROPERTIES_USE_POLL_MODE, True),
(cufile.BoolConfigParameter.PROPERTIES_ALLOW_COMPAT_MODE, False),
@@ -1449,12 +1443,9 @@ def test_set_get_parameter_bool():
(cufile.BoolConfigParameter.SKIP_TOPOLOGY_DETECTION, False),
(cufile.BoolConfigParameter.STREAM_MEMOPS_BYPASS, True),
)
- param_val_pairs = tuple((p, v) for p, v in param_val_pairs if p not in _COMPAT_PARAMS)
# PROFILE_NVTX is deprecated (CTK 13.1.0+); cuFile >= 1.16 rejects bool getters for it.
if cufile.get_version() >= 1160:
- param_val_pairs = tuple(
- (p, v) for p, v in param_val_pairs if p is not cufile.BoolConfigParameter.PROFILE_NVTX
- )
+ param_val_pairs = tuple((p, v) for p, v in param_val_pairs if p is not cufile.BoolConfigParameter.PROFILE_NVTX)
with _cufile_driver_session():
originals = {param: cufile.get_parameter_bool(param) for param, _ in param_val_pairs}
@@ -1474,7 +1465,7 @@ def test_set_get_parameter_bool():
@pytest.mark.skipif(
cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 1.14.0 or later"
)
-@pytest.mark.usefixtures("ctx")
+@pytest.mark.usefixtures("ctx", "cufile_env_json")
def test_set_get_parameter_string(tmp_path):
"""Test setting and getting string parameters with cuFile validation."""
temp_dir = tempfile.gettempdir()Local Validation
I applied the suggested patch locally and ran the focused cuFile cases in TestVenv:
TestVenv/bin/python -m pytest -q -ra \
cuda_bindings/tests/test_cufile.py::test_set_get_parameter_size_t \
cuda_bindings/tests/test_cufile.py::test_set_get_parameter_bool \
cuda_bindings/tests/test_cufile.py::test_set_get_parameter_string \
cuda_bindings/tests/test_cufile.py::test_cufile_write_async \
cuda_bindings/tests/test_cufile.py::test_cufile_read_async \
cuda_bindings/tests/test_cufile.py::test_cufile_async_read_writeResult:
...... [100%]
6 passed in 7.02s
warn: error opening log file: No such file or directory, logging will be disabled
I also ran pre-commit run --all-files locally after the patch, and it passed.
|
@rsarpangalav @sourabgupta3 It'd be very helpful if this is merged by tomorrow afternoon, so I can integrate it for the next cycle of QA testing. |
Description
closes
Checklist