Add external C functions support for JIT (#2475)
Conversation
Looks good, I have some questions:
Echoing that I am interested in Jeff's questions and casting a vote for
For the first and last comments: the implementation in the PR inherits the capabilities of the Kernel and Worker classes.
Technically, yes, you could have multiple kernels, but at the moment you only declare one function that you will use. We could turn the
Yeah I like
Looks like yes! I have just tried what is below. I am not suggesting that the user go find the mangled name, but I am not sure yet what a nice way to resolve that would be (i.e., what the user would type for the name), or, even better, how to look up the function when generating the MLIR (not sure if that is straightforward). Any ideas? (I would love to discuss this more, but prefer to keep it to a different PR.)

add_one_templated = ExternalKernel(
    "_Z7add_oneIiEvPT_S1_i",
    source_string="""
    template<typename T>
    void add_one(T* input, T* output, int tile_size) {
        for (int i = 0; i < tile_size; i++) {
            output[i] = input[i] + 1;
        }
    }
    template void add_one<int>(int* input, int* output, int tile_size);
    """,
    arg_types=[
        np.ndarray[(16,), np.dtype[np.int32]],
        np.ndarray[(16,), np.dtype[np.int32]],
        np.int32,
    ],
)
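For reference, one way to discover the mangled names is to list the symbols of the compiled object with standard binutils-style tools (a sketch only, not part of this PR; it assumes an nm-compatible tool can read the object file, and exported_symbols is a hypothetical helper):

import subprocess

def exported_symbols(object_file: str) -> dict[str, str]:
    """Map each defined mangled symbol to its demangled signature."""
    nm_out = subprocess.run(
        ["nm", "--defined-only", object_file],
        capture_output=True, text=True, check=True,
    ).stdout
    # The symbol name is the last field of each nm output line.
    mangled = [line.split()[-1] for line in nm_out.splitlines() if line.strip()]
    # c++filt demangles one name per input line.
    demangled = subprocess.run(
        ["c++filt"],
        input="\n".join(mangled),
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return dict(zip(mangled, demangled))

With something like this, a user (or the JIT itself) could match "add_one" against the demangled signatures instead of typing the mangled name.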
Also, CI is failing on NPU2 for some reason -- I will need to debug that; I think I am passing the correct compiler options.
Some comments on the code. The biggest one is that I don't see how this can be used outside iron.jit, or beyond relatively simple core functions.
Things that I'd be looking for as a developer:
- Ability to separate the C and Python code for ease of development and testability.
- Ability to compile an external function / kernel outside iron.jit.
- Maintaining the flexibility of compiling the C code manually.
The alternative I have been using is this one:

def unary_op(input_tensor, output_tensor, core_function_info: CoreFunctionInfo):
    """Implements output = op(input)."""

def unary_op_core_function_info(
    op_name: str, device, input_tensors: list, output_tensor
):
    """Returns a compilation specification for unary ops."""
    current_dir = path.dirname(path.realpath(__file__))
    return CoreFunctionInfo(
        source_file=path.join(current_dir, "file_with_ops.cc"),
        exported_function=op_name,
        compile_args=[
            f"-DCOMPILE_{op_name.upper()}=1",
            f"-DINPUT_DTYPE={dtype_to_str(input_tensors[0].dtype)}",
            f"-DOUTPUT_DTYPE={dtype_to_str(output_tensor.dtype)}",
        ],
    )

@core_function(partial(unary_op_core_function_info, op_name="sqrt"))
def sqrt(
    input_tensors: list, output_tensor, core_function_info: CoreFunctionInfo
):
    """SQRT implementation."""
    return unary_op(*input_tensors, output_tensor, core_function_info)

A compilation function is responsible for getting the attribute
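To make the mechanics concrete, here is a minimal sketch of what a core_function decorator along these lines could look like (my assumption of the wiring, not the commenter's actual implementation; device resolution and error handling are elided):

import functools

def core_function(info_factory):
    """Attach a compilation-spec factory to a Python core function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(input_tensors, output_tensor, device=None):
            # Build the CoreFunctionInfo from the actual tensors at call
            # time, then pass it through to the wrapped implementation.
            info = info_factory(
                device=device,
                input_tensors=input_tensors,
                output_tensor=output_tensor,
            )
            return func(input_tensors, output_tensor, core_function_info=info)
        return wrapper
    return decorator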
Thanks for the feedback @ypapadop-amd. I addressed some comments and moved some other discussion to the main thread here (naming and C++ support). Regarding the naming, I am open to ideas. This abstraction is intended for the JIT module only; there is already a
We should explore unification and simplification for external functions to focus on supporting unplaced and JIT Python
I think when we have the opportunity to converge to a single design that fulfills all use cases, we should seize it, rather than keep diverging.
We can get a list of exported functions from the

add_one_templated = ExternalKernel(
    "void add_one<int>(int*, int*, int)",
    source_string="""
    template<typename T>
    void add_one(T* input, T* output, int tile_size) {
        for (int i = 0; i < tile_size; i++) {
            output[i] = input[i] + 1;
        }
    }
    template void add_one<int>(int* input, int* output, int tile_size);
    """,
    arg_types=[
        np.ndarray[(16,), np.dtype[np.int32]],
        np.ndarray[(16,), np.dtype[np.int32]],
        np.int32,
    ],
)
OK, I finally got CI working. The NPU2 runner was somehow hitting cached NPU1 code (is there shared storage?). The hash now contains enough information to disambiguate architectures (a sketch of the idea is below). Please review the PR again. Regarding the suggested improvements (ordered by what I think is important):
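A sketch of the kind of architecture-aware cache key described above (an illustration only -- kernel_cache_key and its parameters are my assumptions, not the PR's actual code):

import hashlib

def kernel_cache_key(source: str, target_arch: str, compile_args: list[str]) -> str:
    """Fold the target architecture and compiler options into the cache
    hash so that NPU1 and NPU2 artifacts can never collide."""
    h = hashlib.sha256()
    h.update(source.encode())
    h.update(target_arch.encode())  # e.g. "npu1" vs. "npu2"
    for arg in compile_args:
        h.update(arg.encode())
    return h.hexdigest()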
@fifield @mawad-amd I made a prototype to support mangled names in #2480. I need to figure out why I need explicit paths, but it seems to work.
jgmelber left a comment:
I think this LGTM. I'd like to see how one of the existing programming examples would work with this, compiling from a file in aie_kernels (a hypothetical sketch follows below).
Let's discuss a plan to convert more/all of the programming examples to JIT once this lands.
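For illustration, compiling from a file might look something like this (hypothetical -- a source_file parameter, the kernel name scale_int32, and the path are all assumptions; only source_string appears in this thread):

scale = ExternalKernel(
    "scale_int32",
    source_file="aie_kernels/scale.cc",  # hypothetical path under aie_kernels
    arg_types=[
        np.ndarray[(1024,), np.dtype[np.int32]],
        np.ndarray[(1024,), np.dtype[np.int32]],
    ],
)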
This PR adds:
- tensor.fill_function and a test for it

The syntax looks like: