
Add external C functions support for JIT #2475

Merged
mawad-amd merged 42 commits into main from muhaawad/extern-func
Aug 8, 2025

Conversation

@mawad-amd (Collaborator)

This PR adds:

  1. External function support for JIT-compiled designs
  2. Tests for external functions
  3. Helper tensor.fill_ function and test for it

The syntax looks like:

add_value = ExternalKernel(
    "add_value",
    source_string="""extern "C" {
        void add_value(int* input, int* output, int tile_size) {
            for (int i = 0; i < tile_size; i++) {
                output[i] = input[i] + ADD_VALUE;
            }
        }
    }""",
    arg_types=[
        np.ndarray[(16,), np.dtype[np.int32]],
        np.ndarray[(16,), np.dtype[np.int32]],
        np.int32,
    ],
    compile_flags=["-DADD_VALUE=2"],
)
transform(input_tensor, output_tensor, add_value)
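The PR doesn't spell out how ExternalKernel turns these arguments into a build step. As a rough, hypothetical sketch, a wrapper like this could assemble a compiler invocation from the name, source string, and flags (the function name, the host clang invocation, and the stdin-based input are all assumptions here, not the PR's actual AIE toolchain):

```python
def build_compile_command(name, source_string, compile_flags, compiler="clang"):
    """Hypothetical sketch: assemble a compile command for an external kernel.

    The real toolchain targets AIE cores and is not shown in the PR; this
    only illustrates how the name, source, and flags could combine.
    """
    obj = f"{name}.o"  # one object file per external kernel
    # "-x c++ -c -" compiles the source string read from stdin
    cmd = [compiler, "-x", "c++", "-c", "-", "-o", obj, *compile_flags]
    return cmd, obj

cmd, obj = build_compile_command(
    "add_value", "/* source string above */", ["-DADD_VALUE=2"]
)
```

The `-D` flags flow straight through, which is how `ADD_VALUE` in the C source gets its value at compile time.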

@fifield (Collaborator)

fifield commented Aug 5, 2025

Looks good, I have some questions:

  • Can there be multiple top level functions in one source string, which go into a single object file? Or will multiple ExternalKernel be combined into one .o?
  • At the risk of bikeshedding a bit, why ExternalKernel instead of ExternalFunction (if a single function)? or ExternalModule (if multiple functions per source string)?
  • Does C++ mangling work? i.e. could I drop the extern "C"?

@jgmelber (Collaborator)

jgmelber commented Aug 5, 2025

Echoing that I am interested in Jeff's questions and casting a vote for ExternalFunction.

@mawad-amd (Collaborator, Author)

mawad-amd commented Aug 5, 2025

For the first and last questions: the implementation in this PR inherits the capabilities of the Kernel and Worker classes.

  • Can there be multiple top level functions in one source string, which go into a single object file? Or will multiple ExternalKernel be combined into one .o?

Technically, yes, you could have multiple kernels in one source string, but at the moment you declare only the one function you will use. We could turn the name into a list of names (my changes should still work), but the Worker class supports only a single external function call. It seems like we should be able to support multiple functions in the Worker abstraction, but please correct me if not (I didn't author the Worker class).

  • At the risk of bikeshedding a bit, why ExternalKernel instead of ExternalFunction (if a single function)? or ExternalModule (if multiple functions per source string)?

Yeah I like ExternalFunction too. Will make the change when we finish discussing Function vs. Module (I think we have 3 votes).

  • Does C++ mangling work? i.e. could I drop the extern "C"?

Looks like yes! I have just tried what is below. I'm not suggesting that the user go find the mangled name, but I'm not sure yet what would be a nice way to resolve that (i.e., what would the user type for the name), or, even better, to look up the function when generating the MLIR (not sure if it is straightforward). Any ideas? (I'd love to discuss this more but prefer to keep it to a different PR.)

    add_one_templated = ExternalKernel(
        "_Z7add_oneIiEvPT_S1_i",
        source_string="""
            template<typename T>
            void add_one(T* input, T* output, int tile_size) {
                for (int i = 0; i < tile_size; i++) {
                    output[i] = input[i] + 1;
                }
            }

            template void add_one<int>(int* input, int* output, int tile_size);
        """,
        arg_types=[
            np.ndarray[(16,), np.dtype[np.int32]],
            np.ndarray[(16,), np.dtype[np.int32]],
            np.int32,
        ],
    )
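A mangled symbol like `_Z7add_oneIiEvPT_S1_i` follows the Itanium C++ ABI: after the `_Z` prefix, a plain identifier is encoded as its length followed by its characters (`7add_one`). A small sketch that extracts just that base name (handling only simple, non-nested names, which covers the example above):

```python
def mangled_base_name(symbol: str) -> str:
    """Extract the base identifier from a simple Itanium-mangled symbol.

    E.g. _Z7add_oneIiEvPT_S1_i encodes the 7-character name "add_one".
    Nested names, operators, etc. are not handled; this is only a sketch.
    """
    if not symbol.startswith("_Z"):
        return symbol  # unmangled (e.g. extern "C") names pass through
    rest = symbol[2:]
    digits = ""
    while rest and rest[0].isdigit():
        digits += rest[0]
        rest = rest[1:]
    if not digits:
        raise ValueError(f"unsupported mangled name: {symbol}")
    return rest[: int(digits)]
```

This is the kind of parsing a demangler does in full; in practice one would call a real demangler rather than hand-roll it.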

@mawad-amd (Collaborator, Author)

Also CI is failing on NPU2 for some reason -- I will need to debug that, I think I am passing the correct compiler options.

@ypapadop-amd left a comment (Collaborator)
Some comments on the code. The biggest one is that I don't see how this can be used outside iron.jit or for anything but relatively simple core functions.

Things that I'd be looking for as a developer:

  • Ability to separate the C and Python code for ease of development and testability.
  • Ability to compile an external function / kernel outside iron.jit.
  • Maintaining the flexibility of compiling the C code manually.

@ypapadop-amd (Collaborator)

ypapadop-amd commented Aug 5, 2025

The alternative I have been using is this one:

def unary_op(input_tensor, output_tensor, core_function_info: CoreFunctionInfo):
    """Implements output = op(input)."""

def unary_op_core_function_info(
    op_name: str, device, input_tensors: list, output_tensor
):
    """Returns a compilation specification for unary ops."""

    current_dir = path.dirname(path.realpath(__file__))
    return CoreFunctionInfo(
        source_file=path.join(current_dir, "file_with_ops.cc"),
        exported_function=op_name,
        compile_args=[
            f"-DCOMPILE_{op_name.upper()}=1",
            f"-DINPUT_DTYPE={dtype_to_str(input_tensors[0].dtype)}",
            f"-DOUTPUT_DTYPE={dtype_to_str(output_tensor.dtype)}",
        ],
    )

@core_function(partial(unary_op_core_function_info, op_name="sqrt"))
def sqrt(
    input_tensors: list, output_tensor, core_function_info: CoreFunctionInfo
):
    """SQRT implementation."""
    return unary_op(*input_tensors, output_tensor, core_function_info)

A compilation function is responsible for getting the core_function attribute from sqrt. The core function intentionally knows about the args and the device. We could add additional attributes to determine the tile sizes, etc.
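The decorator pattern described above can be sketched in a few lines; everything here (the CoreFunctionInfo fields, the attribute name, the elided dtype flags) follows the commenter's snippet but is an illustrative reconstruction, not the actual implementation:

```python
from functools import partial

class CoreFunctionInfo:
    """Stand-in for the compilation spec in the comment above."""
    def __init__(self, source_file, exported_function, compile_args):
        self.source_file = source_file
        self.exported_function = exported_function
        self.compile_args = compile_args

def core_function(info_factory):
    # Attach the spec factory as an attribute that compilation
    # machinery can later retrieve from the decorated function.
    def decorate(fn):
        fn.core_function = info_factory
        return fn
    return decorate

def unary_op_core_function_info(op_name, device, input_tensors, output_tensor):
    # Dtype flags from the original snippet are elided for brevity.
    return CoreFunctionInfo(
        source_file="file_with_ops.cc",
        exported_function=op_name,
        compile_args=[f"-DCOMPILE_{op_name.upper()}=1"],
    )

@core_function(partial(unary_op_core_function_info, op_name="sqrt"))
def sqrt(input_tensors, output_tensor, core_function_info):
    """SQRT implementation (body elided)."""
```

A compiler driver would then call `sqrt.core_function(device=..., input_tensors=..., output_tensor=...)` to obtain the spec before building the kernel.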

mawad-amd and others added 4 commits August 5, 2025 14:36
@mawad-amd (Collaborator, Author)

mawad-amd commented Aug 5, 2025

Thanks for the feedback @ypapadop-amd. I addressed some comments and moved some other discussion to the main thread here (naming and C++ support). Regarding the naming, I am open to ideas; we currently think ExternalFunction is best. Regarding C++, we can support it, but it opens other questions (see above).

This abstraction is intended for the JIT module only. There is already a Kernel abstraction that users can use for non-JIT designs. I would like the JIT module to have full visibility over the entire Python program, and I generally think some unification and cleanup is needed at the backend (for placed vs. unplaced) -- a longer discussion.

@jgmelber (Collaborator)

jgmelber commented Aug 5, 2025

There is already a Kernel abstraction that users can use for non-JIT designs. I would like the JIT module to have full visibility over the entire Python program, and I generally think some unification and cleanup is needed at the backend (for placed vs. unplaced) -- a longer discussion.

We should explore unification and simplification for external functions, focusing on supporting unplaced and JIT Python.

@ypapadop-amd (Collaborator)

Thanks for the feedback @ypapadop-amd. I addressed some comments and moved some other discussion to the main thread here (naming and C++ support). Regarding the naming, I am open to ideas; we currently think ExternalFunction is best. Regarding C++, we can support it, but it opens other questions (see above).

This abstraction is intended for the JIT module only. There is already a Kernel abstraction that users can use for non-JIT designs. I would like the JIT module to have full visibility over the entire Python program, and I generally think some unification and cleanup is needed at the backend (for placed vs. unplaced) -- a longer discussion.

I think when we have the opportunity to converge to a single design that fulfills all use-cases, we should seize it, rather than keep diverging.

@ypapadop-amd (Collaborator)

ypapadop-amd commented Aug 6, 2025

For the first and last questions: the implementation in this PR inherits the capabilities of the Kernel and Worker classes.

  • Can there be multiple top level functions in one source string, which go into a single object file? Or will multiple ExternalKernel be combined into one .o?

Technically, yes, you could have multiple kernels in one source string, but at the moment you declare only the one function you will use. We could turn the name into a list of names (my changes should still work), but the Worker class supports only a single external function call. It seems like we should be able to support multiple functions in the Worker abstraction, but please correct me if not (I didn't author the Worker class).

A .o can have multiple exported symbols, and we should be able to call them if they are stored as key-value pairs (the key being a "templated" function type and the value the strongly typed one).

  • Does C++ mangling work? i.e. could I drop the extern "C"?

Looks like yes! I have just tried what is below. I'm not suggesting that the user go find the mangled name, but I'm not sure yet what would be a nice way to resolve that (i.e., what would the user type for the name), or, even better, to look up the function when generating the MLIR (not sure if it is straightforward). Any ideas? (I'd love to discuss this more but prefer to keep it to a different PR.)

    add_one_templated = ExternalKernel(
        "_Z7add_oneIiEvPT_S1_i",
        source_string="""
            template<typename T>
            void add_one(T* input, T* output, int tile_size) {
                for (int i = 0; i < tile_size; i++) {
                    output[i] = input[i] + 1;
                }
            }

            template void add_one<int>(int* input, int* output, int tile_size);
        """,
        arg_types=[
            np.ndarray[(16,), np.dtype[np.int32]],
            np.ndarray[(16,), np.dtype[np.int32]],
            np.int32,
        ],
    )

We can get a list of exported functions from the .o and demangle them using cxxfilt (https://pypi.org/project/cxxfilt/) into a dict of demangled-to-mangled names. Then, at the call site, you can give the demangled name and match it against one of them. E.g.,

    add_one_templated = ExternalKernel(
        "void add_one<int>(int*, int*, int)",
        source_string="""
            template<typename T>
            void add_one(T* input, T* output, int tile_size) {
                for (int i = 0; i < tile_size; i++) {
                    output[i] = input[i] + 1;
                }
            }

            template void add_one<int>(int* input, int* output, int tile_size);
        """,

@mawad-amd (Collaborator, Author)

mawad-amd commented Aug 6, 2025

OK, I finally got CI working. The NPU2 runner was somehow hitting cached NPU1 code (is there shared storage?). The hash now contains enough information to disambiguate architectures. Please review the PR again.
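A minimal sketch of the kind of cache key that avoids such cross-architecture collisions (the function and parameter names are illustrative, not the PR's actual implementation):

```python
import hashlib

def cache_key(source_string, compile_flags, arch):
    """Hash source, flags, and target arch into one cache key.

    Hashing only the source and flags would let an NPU2 build hit an
    NPU1 artifact; folding the arch into the digest keeps cached
    objects separate per target.
    """
    h = hashlib.sha256()
    for part in (arch, source_string, *compile_flags):
        h.update(part.encode())
        h.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()
```

The NUL separator between fields matters: without it, different field boundaries can concatenate to the same byte stream and collide.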

Regarding the suggested improvements (ordered by what I think is important):

  1. Backend improvements: There are some ownership issues at the moment (even without considering the JIT module). I think we need to restructure the entire backend (for unplaced and JIT) so that we have a Program that owns Kernel/Function objects and provides APIs for compilation and iterators over its objects. I can discuss this more and am happy to help, but it will require a decent amount of work (more than one person); it is the most important item IMO.
  2. Support for multiple functions within the same object file: once the Worker class supports multiple functions (see 1), we can pursue this.
  3. C++ support would be a nice improvement. Let's open an issue for this.

@ypapadop-amd (Collaborator)

@fifield @mawad-amd I made a prototype to support mangled names in #2480. I need to figure out why I need explicit paths, but it seems to work.

@jgmelber left a comment (Collaborator)

I think this LGTM. I'd like to see how one of the existing programming examples would work with this, compiling from a file in aie_kernels.

Let's discuss a plan to convert more/all programming examples to JIT once this lands.

@mawad-amd mawad-amd enabled auto-merge August 8, 2025 20:09
@mawad-amd mawad-amd added this pull request to the merge queue Aug 8, 2025
Merged via the queue into main with commit cea3392 Aug 8, 2025
51 checks passed
@mawad-amd mawad-amd deleted the muhaawad/extern-func branch August 8, 2025 20:48
fifield pushed a commit to fifield/mlir-aie that referenced this pull request Nov 12, 2025
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>