58 changes: 58 additions & 0 deletions src/runtime/HalideRuntimeVulkan.h
@@ -105,6 +105,64 @@ extern int halide_vulkan_release_context(void *user_context,
VkDevice device,
VkQueue queue,
VkDebugUtilsMessengerEXT messenger);

typedef int (*halide_vulkan_acquire_context_t)(void *user_context,
struct halide_vulkan_memory_allocator **allocator,
VkInstance *instance,
VkDevice *device,
VkPhysicalDevice *physical_device,
VkQueue *queue,
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create);
typedef int (*halide_vulkan_release_context_t)(void *user_context,
VkInstance instance,
VkDevice device,
VkQueue queue,
VkDebugUtilsMessengerEXT messenger);

/** Override the Vulkan context acquisition callback. Returns the previous
* handler. If unset, Halide uses its built-in Vulkan context management.
*/
extern halide_vulkan_acquire_context_t halide_set_vulkan_acquire_context(halide_vulkan_acquire_context_t handler);
Contributor

No... I don't think we can allow this. It doesn't match the runtime interface design: these methods are overridden via weak linking.

Author

I added the setter callbacks to support embedding environments where weak-symbol interposition is unreliable or unavailable, especially Windows-style linkage. Isn't Vulkan cross-platform? If you don't want this in this PR, I can move it out, but I think it is important to discuss.

Contributor

@derek-gerstmann, May 4, 2026

Yes, let's move the setter callbacks into a separate PR.

I'll raise this at the next community dev meeting.

I believe the CUDA get/set acquire/release context methods were added to support JIT compilation many years ago, but we really don't want to force an indirect call for everyone if we don't have to. With AOT, you can always override this method yourself regardless of the weak symbols.
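
For readers following the trade-off, here is a minimal sketch of the setter pattern under discussion, with the Vulkan handle parameters dropped for brevity. Every `sketch_*` name is hypothetical and not part of the Halide API. The point is only that each acquire goes through a function pointer, which is the indirect call mentioned above.

```cpp
#include <cassert>

// Hypothetical, simplified callback type. The real callbacks in this PR
// also take Vulkan handles and an allocator pointer.
using sketch_acquire_fn = int (*)(void *user_context, bool create);

// Stand-in for the built-in context management path.
int sketch_default_acquire(void *, bool) { return 0; }

// Stand-in for an embedder-supplied handler.
int sketch_embedder_acquire(void *, bool) { return 42; }

// The currently installed handler. It is never null.
sketch_acquire_fn sketch_acquire_handler = sketch_default_acquire;

// Installs a handler and returns the previous one. Passing nullptr
// restores the default, mirroring the setter bodies in this diff.
sketch_acquire_fn sketch_set_acquire_handler(sketch_acquire_fn handler) {
    sketch_acquire_fn previous = sketch_acquire_handler;
    sketch_acquire_handler = handler ? handler : sketch_default_acquire;
    return previous;
}

// Public entry point: every call pays one function-pointer indirection.
int sketch_acquire_context(void *user_context, bool create) {
    return sketch_acquire_handler(user_context, create);
}
```

Installing `sketch_embedder_acquire` makes `sketch_acquire_context` return 42, and passing `nullptr` to the setter puts the default back.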


/** Override the Vulkan context release callback. Returns the previous handler. */
extern halide_vulkan_release_context_t halide_set_vulkan_release_context(halide_vulkan_release_context_t handler);
Contributor

Same as above.

Author

Windows doesn't handle WEAK well, and Vulkan should eventually be supported there, or am I mistaken?
To give you some context: I'm building a cross-platform studio that uses Halide as the recommended way to implement image-processing kernels. The studio owns the VkDevice, the VkInstance, and related objects, but the intention is to use the memory allocator inside Halide safely. This leads me to:

  1. First, introduce these APIs the way it was done for CUDA (I think), without relying on weak linkage, so that Windows is supported.
  2. Key the GPU compilation cache by allocator instead of VkDevice, because Halide doesn't own the device.

Contributor

This is more a design decision for the Halide runtime than anything else. All the runtimes follow the same interface, which has been very stable for a long, long time. Yes, the MSVS toolchain is a pain to deal with for weak linking, but that doesn't prevent you from writing your own custom runtime, which is what most integrators do when they wish to customize the behavior of the runtime for their app or framework.

My main concern is forcing an indirect call for all acquire/release context invocations. I'll raise this at the dev meeting this week and let you know how to proceed!
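
A minimal sketch of the weak-linking override mechanism referred to in this thread, assuming a GCC- or Clang-compatible toolchain. `SKETCH_WEAK` and `sketch_weak_acquire_context` are hypothetical names, not Halide's, and the strong override is shown only as a comment because it has to live in a separate translation unit.

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_WEAK __attribute__((weak))
#else
// Assumption: other toolchains need a different mechanism, which is the
// portability concern raised in this thread.
#define SKETCH_WEAK
#endif

// Weak default. When no other definition is linked in, calls resolve here.
extern "C" SKETCH_WEAK int sketch_weak_acquire_context(void *user_context) {
    (void)user_context;
    return 0;  // stands in for the built-in context path
}

// An embedder overrides the default by linking a strong definition from
// another translation unit, for example:
//
//   extern "C" int sketch_weak_acquire_context(void *user_context) {
//       return 1;  // embedder-managed context path
//   }
//
// The linker then prefers the strong symbol, so callers make a direct
// call with no function-pointer indirection.
```

With only this translation unit linked, `sketch_weak_acquire_context(nullptr)` returns the default value 0.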

Contributor

@xFile3160 Okay, chatted with the rest of the team, and adding the get/set acquire/release access methods is fine if it makes things easier to use. Feel free to leave them in this PR!


/** Ensure a Halide Vulkan memory allocator exists for an externally-managed
* Vulkan context. Intended for embedders that override
* halide_vulkan_acquire_context()/halide_vulkan_release_context().
*
* The embedder should store the returned allocator with the same object that
* owns the external context, return it from later acquire-context calls for
* that context, and release it when that external context is torn down.
*
* This call refreshes Halide's Vulkan dispatch tables for the supplied
* instance/device. If `*allocator` is null, a new allocator bound to
* `device`/`physical_device` is created and stored back. If `*allocator` is
* non-null, it must already be bound to the supplied device.
*/
extern int halide_vulkan_acquire_memory_allocator(void *user_context,
struct halide_vulkan_memory_allocator **allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device);

/** Destroy a Halide Vulkan memory allocator created for an externally-managed
* Vulkan context after the embedder has ensured no in-flight Halide work is
* using it. This only releases Halide-owned allocator and shader-module state;
* it does not destroy the Vulkan instance, device, queue, or any
* embedder-owned debug messenger.
*
* This call refreshes Halide's Vulkan dispatch tables for the supplied
* instance/device. The supplied device and physical_device must match the
* allocator's context.
*/
extern int halide_vulkan_release_memory_allocator(void *user_context,
struct halide_vulkan_memory_allocator *allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device);
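
The ownership rules in the two doc comments above can be sketched with stand-in types, assuming no real Vulkan objects. `FakeDevice`, `FakeAllocator`, `EmbedderContext`, and the `sketch_*` functions are all hypothetical. The sketch only mirrors the documented contract: create on a null `*allocator`, validate on a non-null one, store the allocator next to the context that owns the device, and release it at teardown.

```cpp
#include <cassert>

// Hypothetical stand-ins for a Vulkan device and Halide's opaque allocator.
struct FakeDevice { int id; };
struct FakeAllocator { const FakeDevice *device; };

int live_allocators = 0;  // lets callers observe creation and destruction

// Mirrors the documented acquire contract. Returns 0 on success.
int sketch_acquire_allocator(FakeAllocator **allocator, const FakeDevice *device) {
    if (allocator == nullptr || device == nullptr) return -1;
    if (*allocator != nullptr) {
        // An existing allocator must already be bound to this device.
        return ((*allocator)->device == device) ? 0 : -2;
    }
    *allocator = new FakeAllocator{device};
    ++live_allocators;
    return 0;
}

// Mirrors the documented release contract. Releasing null is a no-op.
int sketch_release_allocator(FakeAllocator *allocator, const FakeDevice *device) {
    if (allocator == nullptr) return 0;
    if (allocator->device != device) return -2;
    delete allocator;
    --live_allocators;
    return 0;
}

// The embedder keeps the allocator with the object that owns the device.
struct EmbedderContext {
    FakeDevice device{1};
    FakeAllocator *allocator = nullptr;

    // What an acquire-context override would do on each call.
    int acquire(FakeAllocator **out) {
        assert(out != nullptr);
        int err = sketch_acquire_allocator(&allocator, &device);
        if (err == 0) *out = allocator;
        return err;
    }

    // Called once, when the external context is torn down.
    int teardown() {
        int err = sketch_release_allocator(allocator, &device);
        if (err == 0) allocator = nullptr;
        return err;
    }
};
```

Two consecutive `acquire` calls on the same `EmbedderContext` hand back the same allocator, and `teardown` destroys it without touching the device.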
// --

// Override the default allocation callbacks (default uses Vulkan runtime implementation)
2 changes: 1 addition & 1 deletion src/runtime/gpu_context_common.h
@@ -127,7 +127,7 @@ class GPUCompilationCache {
}

for (int i = 0; i < (1 << log2_compilations_size); i++) {
-if (compilations[i].kernel_id > kInvalidId &&
+if (compilations[i].kernel_id > kDeletedId &&
Contributor

Why are you changing things in the GPUCompilationCache?

Author

Shader modules cached in GPUCompilationCache are owned by the Halide runtime state associated with the allocator used to create and destroy them. For externally managed contexts, the VkDevice lifetime and the Halide allocator lifetime are not the same boundary. Keying by allocator lets halide_vulkan_release_memory_allocator delete only the cache entries owned by that allocator. I did this to prevent stale shader modules and cache entries when an external context tears down Halide allocator state without destroying the VkDevice, which Halide does not own.

Contributor

There are two separate issues here. The first is the line I commented on in gpu_context_common.h. Why are you changing anything in this file?

The second issue is the change in the type definition for the GPUCompilationCache Key being used in the Vulkan runtime. The reason it was specified with the Device pointer was to allow sharing across contexts for the same devices created by the same instance to minimize kernel launch overhead.

Changing this to the allocator pointer now means the compilation cache isn't shared for all contexts for the same device, since the allocator pointer is created dynamically for the context.

I'd suggest leaving it as it is and releasing the compilation cache inside halide_vulkan_release_memory_allocator() to detach the external VkDevice.
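
To make the sharing argument concrete, here is a small sketch of a cache keyed by an opaque context handle plus a kernel id. `SketchCache`, `SketchHandle`, and `SketchKey` are hypothetical and use `std::map` for clarity; they are not the layout of GPUCompilationCache. The use count is always zero here, so `release` removes every entry for the given key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

using SketchHandle = std::uintptr_t;             // device or allocator handle
using SketchKey = std::pair<SketchHandle, int>;  // (context key, kernel id)

struct SketchCache {
    std::map<SketchKey, int> modules;  // value: use count (always 0 here)
    int compiles = 0;                  // how many "compilations" were needed

    // Returns true when the kernel had to be compiled for this key.
    bool setup(SketchHandle context_key, int kernel_id) {
        auto result = modules.emplace(SketchKey{context_key, kernel_id}, 0);
        if (result.second) ++compiles;
        return result.second;
    }

    // Drops all unused entries for one context key. Returns the count removed.
    std::size_t release(SketchHandle context_key) {
        std::size_t removed = 0;
        for (auto it = modules.begin(); it != modules.end();) {
            if (it->first.first == context_key && it->second == 0) {
                it = modules.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }
};
```

If two contexts share one device handle, keying by device compiles a kernel once, while keying by each context's own allocator handle compiles it twice. In exchange, the allocator key lets one context release its entries without affecting the other.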

(all || (compilations[i].context == context)) &&
compilations[i].use_count == 0) {
debug(user_context) << "Releasing cached compilation: " << compilations[i].module_state
4 changes: 4 additions & 0 deletions src/runtime/runtime_api.cpp
@@ -213,10 +213,14 @@ extern "C" __attribute__((used)) void *halide_runtime_api_functions[] = {
(void *)&halide_d3d12compute_release_context,
(void *)&halide_d3d12compute_run,
(void *)&halide_vulkan_acquire_context,
(void *)&halide_vulkan_acquire_memory_allocator,
(void *)&halide_vulkan_device_interface,
(void *)&halide_vulkan_initialize_kernels,
(void *)&halide_vulkan_release_memory_allocator,
(void *)&halide_vulkan_release_context,
(void *)&halide_vulkan_run,
(void *)&halide_set_vulkan_acquire_context,
(void *)&halide_set_vulkan_release_context,
(void *)&halide_webgpu_device_interface,
(void *)&halide_webgpu_initialize_kernels,
(void *)&halide_webgpu_finalize_kernels,
177 changes: 162 additions & 15 deletions src/runtime/vulkan.cpp
@@ -13,9 +13,34 @@ using namespace Halide::Runtime::Internal::Vulkan;

// --------------------------------------------------------------------------

extern "C" {
namespace Halide {
namespace Runtime {
namespace Internal {
namespace Vulkan {

// --------------------------------------------------------------------------
ALWAYS_INLINE int vk_load_external_context_functions(void *user_context, VkInstance instance, VkDevice device) {
Contributor

This isn't really specific to external contexts ... just call it vk_load_vulkan_interface

Contributor

This should only be done once per context creation, not repeatedly.

Author

This helper was introduced because, with the new API for acquiring an external context, the caller returns an already-created VkInstance, VkDevice, and so on. Halide still needs the device functions internally, so the helper makes sure Halide's dispatch table is initialized for the supplied external instance and device. But you're right: vk_load_vulkan_interface is a better name, and it should not be called every time.

Contributor

We really don't want to be reloading dispatch tables. They should be loaded once, when the context is created.
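
One possible way to avoid repeated loads when the context is created outside Halide is to remember which instance and device the tables were resolved for. This is a sketch under that assumption, not what the reviewer prescribes (loading at context creation). `SketchDispatch` and `sketch_ensure_loaded` are hypothetical names, and the actual function-pointer resolution is reduced to a counter.

```cpp
#include <cassert>

// Hypothetical record of what the dispatch tables were last loaded for.
struct SketchDispatch {
    const void *loaded_instance = nullptr;
    const void *loaded_device = nullptr;
    int load_count = 0;  // number of times the tables were resolved
};

// Returns -1 for null handles, 0 if the tables already match the supplied
// handles, and 1 if a load was performed.
int sketch_ensure_loaded(SketchDispatch &table, const void *instance, const void *device) {
    if (instance == nullptr || device == nullptr) return -1;
    if (table.loaded_instance == instance && table.loaded_device == device) {
        return 0;  // already resolved for this context, nothing to do
    }
    // Placeholder for resolving loader, instance, and device functions.
    table.loaded_instance = instance;
    table.loaded_device = device;
    ++table.load_count;
    return 1;
}
```

Repeated calls with the same handles leave `load_count` unchanged, so only a change of instance or device triggers a reload.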

if (vkGetInstanceProcAddr == nullptr) {
vk_load_vulkan_loader_functions(user_context);
if (vkGetInstanceProcAddr == nullptr) {
error(user_context) << "Vulkan: Failed to resolve loader functions for external context!\n";
return halide_error_code_symbol_not_found;
}
}

vk_load_vulkan_instance_functions(user_context, instance);
if (vkGetPhysicalDeviceProperties == nullptr || vkGetDeviceProcAddr == nullptr) {
error(user_context) << "Vulkan: Failed to resolve instance functions for external context!\n";
return halide_error_code_symbol_not_found;
}

vk_load_vulkan_device_functions(user_context, device);
if (vkCreateBuffer == nullptr || vkAllocateMemory == nullptr) {
error(user_context) << "Vulkan: Failed to resolve device functions for external context!\n";
return halide_error_code_symbol_not_found;
}

return halide_error_code_success;
}

// The default implementation of halide_acquire_vulkan_context uses
// the global pointers above, and serializes access with a spin lock.
@@ -29,15 +54,15 @@ extern "C" {
// call to halide_release_vulkan_context. halide_acquire_vulkan_context
// should block while a previous call (if any) has not yet been
// released via halide_release_vulkan_context.
-WEAK int halide_vulkan_acquire_context(void *user_context,
Contributor

@derek-gerstmann, May 4, 2026

These can't be changed ... they need to match all the other runtimes.

-halide_vulkan_memory_allocator **allocator,
-VkInstance *instance,
-VkDevice *device,
-VkPhysicalDevice *physical_device,
-VkQueue *queue,
-uint32_t *queue_family_index,
-VkDebugUtilsMessengerEXT *messenger,
-bool create) {
+WEAK int default_vulkan_acquire_context(void *user_context,
+halide_vulkan_memory_allocator **allocator,
+VkInstance *instance,
+VkDevice *device,
+VkPhysicalDevice *physical_device,
+VkQueue *queue,
+uint32_t *queue_family_index,
+VkDebugUtilsMessengerEXT *messenger,
+bool create) {
#ifdef DEBUG_RUNTIME
halide_start_clock(user_context);
#endif
@@ -74,11 +99,133 @@ WEAK int halide_vulkan_acquire_context(void *user_context,
return halide_error_code_success;
}

-WEAK int halide_vulkan_release_context(void *user_context, VkInstance instance, VkDevice device, VkQueue queue, VkDebugUtilsMessengerEXT messenger) {
+WEAK int default_vulkan_release_context(void *user_context, VkInstance instance, VkDevice device, VkQueue queue, VkDebugUtilsMessengerEXT messenger) {
halide_mutex_unlock(&thread_lock);
return halide_error_code_success;
}

WEAK halide_vulkan_acquire_context_t vulkan_acquire_context_handler =
default_vulkan_acquire_context;
WEAK halide_vulkan_release_context_t vulkan_release_context_handler =
default_vulkan_release_context;

} // namespace Vulkan
} // namespace Internal
} // namespace Runtime
} // namespace Halide

// --------------------------------------------------------------------------

extern "C" {

// --------------------------------------------------------------------------

WEAK int halide_vulkan_acquire_context(void *user_context,
halide_vulkan_memory_allocator **allocator,
VkInstance *instance,
VkDevice *device,
VkPhysicalDevice *physical_device,
VkQueue *queue,
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create) {
return vulkan_acquire_context_handler(user_context, allocator, instance, device,
Contributor

@derek-gerstmann, May 4, 2026

This shouldn't be changed.

physical_device, queue, queue_family_index,
messenger, create);
}

WEAK int halide_vulkan_release_context(void *user_context, VkInstance instance, VkDevice device, VkQueue queue, VkDebugUtilsMessengerEXT messenger) {
return vulkan_release_context_handler(user_context, instance, device, queue, messenger);
}

WEAK halide_vulkan_acquire_context_t halide_set_vulkan_acquire_context(halide_vulkan_acquire_context_t handler) {
Contributor

This isn't allowed. Overriding is currently handled via weak linking, and this needs to be the same across all runtimes.

halide_vulkan_acquire_context_t result = vulkan_acquire_context_handler;
vulkan_acquire_context_handler = handler ? handler : default_vulkan_acquire_context;
return result;
}

WEAK halide_vulkan_release_context_t halide_set_vulkan_release_context(halide_vulkan_release_context_t handler) {
Contributor

Same as above.

halide_vulkan_release_context_t result = vulkan_release_context_handler;
vulkan_release_context_handler = handler ? handler : default_vulkan_release_context;
return result;
}

WEAK int halide_vulkan_acquire_memory_allocator(void *user_context,
halide_vulkan_memory_allocator **allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device) {
if (allocator == nullptr) {
error(user_context) << "Vulkan: allocator output pointer is null!\n";
return halide_error_code_buffer_argument_is_null;
}
if (instance == VK_NULL_HANDLE || device == VK_NULL_HANDLE || physical_device == VK_NULL_HANDLE) {
error(user_context) << "Vulkan: invalid external context handles for allocator acquisition!\n";
return halide_error_code_device_interface_no_device;
}

int error_code = vk_load_external_context_functions(user_context, instance, device);
Contributor

This should only ever be called during context creation, and only once.

if (error_code != halide_error_code_success) {
return error_code;
}

VulkanMemoryAllocator *runtime_allocator =
reinterpret_cast<VulkanMemoryAllocator *>(*allocator);
if (runtime_allocator != nullptr) {
if (runtime_allocator->current_device() != device ||
runtime_allocator->current_physical_device() != physical_device) {
error(user_context) << "Vulkan: external allocator does not match supplied device handles!\n";
return halide_error_code_internal_error;
}
return halide_error_code_success;
Contributor

This doesn't actually return the allocator pointer. What's the intent?

}

const VkAllocationCallbacks *alloc_callbacks =
halide_vulkan_get_allocation_callbacks(user_context);
runtime_allocator =
vk_create_memory_allocator(user_context, device, physical_device, alloc_callbacks);
if (runtime_allocator == nullptr) {
error(user_context) << "Vulkan: Failed to create memory allocator for external context!\n";
return halide_error_code_out_of_memory;
}

*allocator = reinterpret_cast<halide_vulkan_memory_allocator *>(runtime_allocator);
return halide_error_code_success;
}

WEAK int halide_vulkan_release_memory_allocator(void *user_context,
halide_vulkan_memory_allocator *allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device) {
VulkanMemoryAllocator *runtime_allocator =
reinterpret_cast<VulkanMemoryAllocator *>(allocator);
if (runtime_allocator == nullptr) {
return halide_error_code_success;
}
if (instance == VK_NULL_HANDLE || device == VK_NULL_HANDLE || physical_device == VK_NULL_HANDLE) {
error(user_context) << "Vulkan: invalid external context handles for allocator release!\n";
return halide_error_code_device_interface_no_device;
}
if (runtime_allocator->current_device() != device ||
runtime_allocator->current_physical_device() != physical_device) {
error(user_context) << "Vulkan: external allocator does not match supplied device handles during release!\n";
return halide_error_code_internal_error;
}

int error_code = vk_load_external_context_functions(user_context, instance, device);
Contributor

Again, this should only be called during context creation, and only once.

if (error_code != halide_error_code_success) {
return error_code;
}
if (vkDestroyShaderModule == nullptr || vkFreeMemory == nullptr) {
error(user_context) << "Vulkan: Failed to resolve device functions for external allocator release!\n";
return halide_error_code_symbol_not_found;
}

vk_destroy_shader_modules(user_context, runtime_allocator);
Contributor

Why are you destroying shader modules in this method?

return vk_destroy_memory_allocator(user_context, runtime_allocator);
}

WEAK bool halide_vulkan_is_initialized() {
halide_mutex_lock(&thread_lock);
bool is_initialized = (cached_instance != nullptr) && (cached_device != nullptr);
@@ -159,7 +306,7 @@ WEAK int halide_vulkan_initialize_kernels(void *user_context, void **state_ptr,
debug(user_context) << "halide_vulkan_initialize_kernels got compilation_cache mutex.\n";

VulkanCompilationCacheEntry *cache_entry = nullptr;
-if (!compilation_cache.kernel_state_setup(user_context, state_ptr, ctx.device, cache_entry,
+if (!compilation_cache.kernel_state_setup(user_context, state_ptr, ctx.allocator, cache_entry,
Halide::Runtime::Internal::Vulkan::vk_compile_kernel_module,
user_context, ctx.allocator, src, size)) {
error(user_context) << "Vulkan: Failed to setup compilation cache!\n";
@@ -185,7 +332,7 @@ WEAK void halide_vulkan_finalize_kernels(void *user_context, void *state_ptr) {

VulkanContext ctx(user_context);
if (ctx.error == halide_error_code_success) {
-compilation_cache.release_hold(user_context, ctx.device, state_ptr);
+compilation_cache.release_hold(user_context, ctx.allocator, state_ptr);
}

#ifdef DEBUG_RUNTIME
@@ -1151,7 +1298,7 @@ WEAK int halide_vulkan_run(void *user_context,

// 1. Get the shader module cache entry
VulkanCompilationCacheEntry *cache_entry = nullptr;
-bool found = compilation_cache.lookup(ctx.device, state_ptr, cache_entry);
+bool found = compilation_cache.lookup(ctx.allocator, state_ptr, cache_entry);
if (!found || (cache_entry == nullptr)) {
error(user_context) << "Vulkan: Failed to locate shader module! Unable to proceed!\n";
return halide_error_code_internal_error;