
Add external C functions support for JIT #2475

Merged
mawad-amd merged 42 commits into main from muhaawad/extern-func
Aug 8, 2025

Conversation

@mawad-amd (Collaborator)

This PR adds:

  1. External function support for JIT-compiled designs
  2. Tests for external functions
  3. Helper tensor.fill_ function and test for it

The syntax looks like:

add_value = ExternalKernel(
    "add_value",
    source_string="""extern "C" {
        void add_value(int* input, int* output, int tile_size) {
            for (int i = 0; i < tile_size; i++) {
                output[i] = input[i] + ADD_VALUE;
            }
        }
    }""",
    arg_types=[
        np.ndarray[(16,), np.dtype[np.int32]],
        np.ndarray[(16,), np.dtype[np.int32]],
        np.int32,
    ],
    compile_flags=["-DADD_VALUE=2"],
)
transform(input_tensor, output_tensor, add_value)
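The PR doesn't spell out how ExternalKernel turns these arguments into a build step. As a rough, hypothetical sketch, a wrapper like this could assemble a compiler invocation from the name, source string, and flags (the function name, the host clang invocation, and the stdin-based input are all assumptions here, not the PR's actual AIE toolchain):

```python
def build_compile_command(name, source_string, compile_flags, compiler="clang"):
    """Hypothetical sketch: assemble a compile command for an external kernel.

    The real toolchain targets AIE cores and is not shown in the PR; this
    only illustrates how the name, source, and flags could combine.
    """
    obj = f"{name}.o"  # one object file per external kernel
    # "-x c++ -c -" compiles the source string read from stdin
    cmd = [compiler, "-x", "c++", "-c", "-", "-o", obj, *compile_flags]
    return cmd, obj

cmd, obj = build_compile_command(
    "add_value", "/* source string above */", ["-DADD_VALUE=2"]
)
```

The `-D` flags flow straight through, which is how `ADD_VALUE` in the C source gets its value at compile time.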

@fifield (Collaborator)

fifield commented Aug 5, 2025

Looks good, I have some questions:

  • Can there be multiple top level functions in one source string, which go into a single object file? Or will multiple ExternalKernel be combined into one .o?
  • At the risk of bikeshedding a bit, why ExternalKernel instead of ExternalFunction (if a single function)? or ExternalModule (if multiple functions per source string)?
  • Does C++ mangling work? i.e. could I drop the extern "C"?

@jgmelber (Collaborator)

jgmelber commented Aug 5, 2025

Echoing that I am interested in Jeff's questions and casting a vote for ExternalFunction.

@mawad-amd (Collaborator, Author)

mawad-amd commented Aug 5, 2025

For the first and last questions: the implementation in this PR inherits the capabilities of the Kernel and Worker classes.

  • Can there be multiple top level functions in one source string, which go into a single object file? Or will multiple ExternalKernel be combined into one .o?

Technically, yes, you could have multiple kernels in one source string, but at the moment you declare only the one function you will use. We could turn the name into a list of names (my changes should still work), but the Worker class supports only a single external function call. It seems like we should be able to support multiple functions in the Worker abstraction, but please correct me if not (I didn't author the Worker class).

  • At the risk of bikeshedding a bit, why ExternalKernel instead of ExternalFunction (if a single function)? or ExternalModule (if multiple functions per source string)?

Yeah I like ExternalFunction too. Will make the change when we finish discussing Function vs. Module (I think we have 3 votes).

  • Does C++ mangling work? i.e. could I drop the extern "C"?

Looks like yes! I have just tried what is below. I'm not suggesting that the user go find the mangled name, but I'm not sure yet what would be a nice way to resolve that (i.e., what would the user type for the name), or, even better, to look up the function when generating the MLIR (not sure if it is straightforward). Any ideas? (I'd love to discuss this more but prefer to keep it to a different PR.)

    add_one_templated = ExternalKernel(
        "_Z7add_oneIiEvPT_S1_i",
        source_string="""
            template<typename T>
            void add_one(T* input, T* output, int tile_size) {
                for (int i = 0; i < tile_size; i++) {
                    output[i] = input[i] + 1;
                }
            }

            template void add_one<int>(int* input, int* output, int tile_size);
        """,
        arg_types=[
            np.ndarray[(16,), np.dtype[np.int32]],
            np.ndarray[(16,), np.dtype[np.int32]],
            np.int32,
        ],
    )
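A mangled symbol like `_Z7add_oneIiEvPT_S1_i` follows the Itanium C++ ABI: after the `_Z` prefix, a plain identifier is encoded as its length followed by its characters (`7add_one`). A small sketch that extracts just that base name (handling only simple, non-nested names, which covers the example above):

```python
def mangled_base_name(symbol: str) -> str:
    """Extract the base identifier from a simple Itanium-mangled symbol.

    E.g. _Z7add_oneIiEvPT_S1_i encodes the 7-character name "add_one".
    Nested names, operators, etc. are not handled; this is only a sketch.
    """
    if not symbol.startswith("_Z"):
        return symbol  # unmangled (e.g. extern "C") names pass through
    rest = symbol[2:]
    digits = ""
    while rest and rest[0].isdigit():
        digits += rest[0]
        rest = rest[1:]
    if not digits:
        raise ValueError(f"unsupported mangled name: {symbol}")
    return rest[: int(digits)]
```

This is the kind of parsing a demangler does in full; in practice one would call a real demangler rather than hand-roll it.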

@mawad-amd (Collaborator, Author)

Also CI is failing on NPU2 for some reason -- I will need to debug that, I think I am passing the correct compiler options.

@ypapadop-amd left a comment (Collaborator)
Some comments on the code. The biggest one is that I don't see how this can be used outside iron.jit or for anything but relatively simple core functions.

Things that I'd be looking for as a developer:

  • Ability to separate the C and Python code for ease of development and testability.
  • Ability to compile an external function / kernel outside iron.jit.
  • Maintaining the flexibility of compiling the C code manually.

@ypapadop-amd (Collaborator)

ypapadop-amd commented Aug 5, 2025

The alternative I have been using is this one:

def unary_op(input_tensor, output_tensor, core_function_info: CoreFunctionInfo):
    """Implements output = op(input)."""

def unary_op_core_function_info(
    op_name: str, device, input_tensors: list, output_tensor
):
    """Returns a compilation specification for unary ops."""

    current_dir = path.dirname(path.realpath(__file__))
    return CoreFunctionInfo(
        source_file=path.join(current_dir, "file_with_ops.cc"),
        exported_function=op_name,
        compile_args=[
            f"-DCOMPILE_{op_name.upper()}=1",
            f"-DINPUT_DTYPE={dtype_to_str(input_tensors[0].dtype)}",
            f"-DOUTPUT_DTYPE={dtype_to_str(output_tensor.dtype)}",
        ],
    )

@core_function(partial(unary_op_core_function_info, op_name="sqrt"))
def sqrt(
    input_tensors: list, output_tensor, core_function_info: CoreFunctionInfo
):
    """SQRT implementation."""
    return unary_op(*input_tensors, output_tensor, core_function_info)

A compilation function is responsible for getting the core_function attribute from sqrt. The core function intentionally knows about the args and the device. We could add additional attributes to determine the tile sizes, etc.
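The decorator pattern described above can be sketched in a few lines; everything here (the CoreFunctionInfo fields, the attribute name, the elided dtype flags) follows the commenter's snippet but is an illustrative reconstruction, not the actual implementation:

```python
from functools import partial

class CoreFunctionInfo:
    """Stand-in for the compilation spec in the comment above."""
    def __init__(self, source_file, exported_function, compile_args):
        self.source_file = source_file
        self.exported_function = exported_function
        self.compile_args = compile_args

def core_function(info_factory):
    # Attach the spec factory as an attribute that compilation
    # machinery can later retrieve from the decorated function.
    def decorate(fn):
        fn.core_function = info_factory
        return fn
    return decorate

def unary_op_core_function_info(op_name, device, input_tensors, output_tensor):
    # Dtype flags from the original snippet are elided for brevity.
    return CoreFunctionInfo(
        source_file="file_with_ops.cc",
        exported_function=op_name,
        compile_args=[f"-DCOMPILE_{op_name.upper()}=1"],
    )

@core_function(partial(unary_op_core_function_info, op_name="sqrt"))
def sqrt(input_tensors, output_tensor, core_function_info):
    """SQRT implementation (body elided)."""
```

A compiler driver would then call `sqrt.core_function(device=..., input_tensors=..., output_tensor=...)` to obtain the spec before building the kernel.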

mawad-amd and others added 4 commits August 5, 2025 14:36
@mawad-amd (Collaborator, Author)

mawad-amd commented Aug 5, 2025

Thanks for the feedback @ypapadop-amd. I addressed some comments and moved some other discussion to the main thread here (naming and C++ support). Regarding the naming, I am open to ideas; we currently think ExternalFunction is best. Regarding C++, we can support it, but it opens other questions (see above).

This abstraction is intended for the JIT module only. There is already a Kernel abstraction that users can use for non-JIT designs. I would like the JIT module to have full visibility over the entire Python program, and I generally think some unification and cleanup is needed at the backend (for placed vs. unplaced) -- a longer discussion.

@jgmelber (Collaborator)

jgmelber commented Aug 5, 2025

There is already a Kernel abstraction that users can use for non-JIT designs. I would like the JIT module to have full visibility over the entire Python program, and I generally think some unification and cleanup is needed at the backend (for placed vs. unplaced) -- a longer discussion.

We should explore unification and simplification for external functions, focusing on supporting unplaced and JIT Python.

@ypapadop-amd (Collaborator)

Thanks for the feedback @ypapadop-amd. I addressed some comments and moved some other discussion to the main thread here (naming and C++ support). Regarding the naming, I am open to ideas; we currently think ExternalFunction is best. Regarding C++, we can support it, but it opens other questions (see above).

This abstraction is intended for the JIT module only. There is already a Kernel abstraction that users can use for non-JIT designs. I would like the JIT module to have full visibility over the entire Python program, and I generally think some unification and cleanup is needed at the backend (for placed vs. unplaced) -- a longer discussion.

I think when we have the opportunity to converge to a single design that fulfills all use-cases, we should seize it, rather than keep diverging.

@ypapadop-amd (Collaborator)

ypapadop-amd commented Aug 6, 2025

For the first and last questions: the implementation in this PR inherits the capabilities of the Kernel and Worker classes.

  • Can there be multiple top level functions in one source string, which go into a single object file? Or will multiple ExternalKernel be combined into one .o?

Technically, yes, you could have multiple kernels in one source string, but at the moment you declare only the one function you will use. We could turn the name into a list of names (my changes should still work), but the Worker class supports only a single external function call. It seems like we should be able to support multiple functions in the Worker abstraction, but please correct me if not (I didn't author the Worker class).

A .o can have multiple exported symbols, and we should be able to call them if they are stored as key-value pairs (the key being a "templated" function type and the value the strongly typed one).

  • Does C++ mangling work? i.e. could I drop the extern "C"?

Looks like yes! I have just tried what is below. I'm not suggesting that the user go find the mangled name, but I'm not sure yet what would be a nice way to resolve that (i.e., what would the user type for the name), or, even better, to look up the function when generating the MLIR (not sure if it is straightforward). Any ideas? (I'd love to discuss this more but prefer to keep it to a different PR.)

    add_one_templated = ExternalKernel(
        "_Z7add_oneIiEvPT_S1_i",
        source_string="""
            template<typename T>
            void add_one(T* input, T* output, int tile_size) {
                for (int i = 0; i < tile_size; i++) {
                    output[i] = input[i] + 1;
                }
            }

            template void add_one<int>(int* input, int* output, int tile_size);
        """,
        arg_types=[
            np.ndarray[(16,), np.dtype[np.int32]],
            np.ndarray[(16,), np.dtype[np.int32]],
            np.int32,
        ],
    )

We can get a list of exported functions from the .o and demangle them using cxxfilt (https://pypi.org/project/cxxfilt/) into a dict of demangled-to-mangled names. Then, at the call site, you can give the demangled name and match it against one of them. E.g.,

    add_one_templated = ExternalKernel(
        "void add_one<int>(int*, int*, int)",
        source_string="""
            template<typename T>
            void add_one(T* input, T* output, int tile_size) {
                for (int i = 0; i < tile_size; i++) {
                    output[i] = input[i] + 1;
                }
            }

            template void add_one<int>(int* input, int* output, int tile_size);
        """,

@mawad-amd (Collaborator, Author)

mawad-amd commented Aug 6, 2025

OK, I finally got CI working. The NPU2 runner was somehow hitting cached NPU1 code (is there shared storage?). The hash now contains enough information to disambiguate architectures. Please review the PR again.
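A minimal sketch of the kind of cache key that avoids such cross-architecture collisions (the function and parameter names are illustrative, not the PR's actual implementation):

```python
import hashlib

def cache_key(source_string, compile_flags, arch):
    """Hash source, flags, and target arch into one cache key.

    Hashing only the source and flags would let an NPU2 build hit an
    NPU1 artifact; folding the arch into the digest keeps cached
    objects separate per target.
    """
    h = hashlib.sha256()
    for part in (arch, source_string, *compile_flags):
        h.update(part.encode())
        h.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()
```

The NUL separator between fields matters: without it, different field boundaries can concatenate to the same byte stream and collide.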

Regarding the suggested improvements (ordered by what I think is important):

  1. Backend improvements: There are some ownership issues at the moment (even without considering the JIT module). I think we need to restructure the entire backend (for unplaced and JIT) so that we have a Program that owns Kernel/Function objects and provides APIs for compilation and iterators over its objects. I can discuss this more and am happy to help, but it will require a decent amount of work (more than one person); it is the most important item IMO.
  2. Support for multiple functions within the same object file: once the Worker class supports multiple functions (see 1), we can pursue this.
  3. C++ support would be a nice improvement. Let's open an issue for this.

@ypapadop-amd (Collaborator)

@fifield @mawad-amd I made a prototype to support mangled names in #2480. I need to figure out why I need explicit paths, but it seems to work.

@jgmelber left a comment (Collaborator)

I think this LGTM. I'd like to see how one of the existing programming examples would work with this, compiling from a file in aie_kernels.

Let's discuss a plan to convert more/all programming examples to JIT once this lands.

@mawad-amd mawad-amd enabled auto-merge August 8, 2025 20:09
@mawad-amd mawad-amd added this pull request to the merge queue Aug 8, 2025
Merged via the queue into main with commit cea3392 Aug 8, 2025
51 checks passed
@mawad-amd mawad-amd deleted the muhaawad/extern-func branch August 8, 2025 20:48
fifield pushed a commit to fifield/mlir-aie that referenced this pull request Nov 12, 2025
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>