The FX-level GatherToLDS op has no MLIR counterpart. This means any pipeline path that touches it in MLIR -- roundtrip, lowering, analysis -- is blocked.
What's needed:
- Define wave.gather_to_lds in WaveOps.td with appropriate interfaces. Possibly, it should not inherit HasWaveIndexMapping since it lacks a standard index attribute.
- Implement a C++ lowering pattern to amdgpu.gather_to_lds in LowerReadWriteOps.cpp.
- Verify the op interacts correctly with existing MLIR passes and propagations.
- Add emission (water_emitter.py) and import (fx_emitter.py) support for MLIR roundtrip.
Currently, the MXFP4 roundtrip test works around the absence of this op by setting use_global_to_shared=False and schedule=SchedulingType.NONE, and water_e2e_test.py does the same. The manual schedule (gemm_mxfp4_double_buffer.py) depends on GatherToLDS nodes, so SchedulingType.MANUAL cannot be used until the op exists in MLIR.
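To make the constraint concrete, here is a minimal, self-contained sketch of which option combinations can currently go through MLIR. It does not use the real compiler API: RoundtripOptions and both helper functions are hypothetical names invented for illustration, SchedulingType is a local stand-in for the real enum, and the rule that use_global_to_shared=True produces GatherToLDS nodes is an assumption inferred from the workaround described above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SchedulingType(Enum):
    """Local stand-in for the real enum; only the two members named above."""
    NONE = auto()
    MANUAL = auto()


@dataclass(frozen=True)
class RoundtripOptions:
    """Hypothetical container for the two options the tests currently override."""
    use_global_to_shared: bool
    schedule: SchedulingType


def emits_gather_to_lds(opts: RoundtripOptions) -> bool:
    # Assumption: global-to-shared transfers are expressed as GatherToLDS nodes.
    # The manual schedule (gemm_mxfp4_double_buffer.py) depends on those nodes.
    return opts.use_global_to_shared or opts.schedule is SchedulingType.MANUAL


def can_roundtrip_through_mlir(opts: RoundtripOptions,
                               mlir_has_gather_to_lds: bool = False) -> bool:
    # Without an MLIR counterpart, any graph containing GatherToLDS is blocked.
    return mlir_has_gather_to_lds or not emits_gather_to_lds(opts)


# The workaround used by the MXFP4 roundtrip test and water_e2e_test.py.
workaround = RoundtripOptions(use_global_to_shared=False,
                              schedule=SchedulingType.NONE)
# The configuration that stays blocked until wave.gather_to_lds exists.
manual = RoundtripOptions(use_global_to_shared=False,
                          schedule=SchedulingType.MANUAL)

print(can_roundtrip_through_mlir(workaround))
print(can_roundtrip_through_mlir(manual))
print(can_roundtrip_through_mlir(manual, mlir_has_gather_to_lds=True))
```

Once the op is defined, lowered, and wired into the emitters, the mlir_has_gather_to_lds flag in this sketch corresponds to dropping both overrides from the tests.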