[PyTorch] Support scaled + clamped SwiGLU in te.ops and enable fused MXFP8 grouped MLP
#2855
+299 −58