GPU offloading preconditioner #4953
Conversation
Thanks for this, it looks really promising. I have some suggestions about the interface; it seems to me that the current implementation might be misusing it.
I should have marked this as a draft from the start; apologies for missing that. @pbrubeck thanks for the detailed feedback, I like your suggested approach.
The CI failure seems to indicate that the host the GPU tests attempted to run on did not have Nvidia drivers installed. Is it possible some of the runners are missing them?
I've passed this on so it should get investigated and fixed shortly. |
Have you tried setting up PETSc inside the Nvidia-provided CUDA Docker image locally and running the tests there, instead of against your local installation, just as a check? I attempted this before (at the time it was PETSc installed inside a clean `cuda:12.9` Ubuntu Docker image from Nvidia, though I can check my Dockerfile to confirm if needed), and I couldn't get PETSc to work properly in that setup. When passing the `--download-mpi` flag to PETSc and then checking `ompi_info`, I saw the following MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat. It seemed odd that both rocm and cuda appeared, especially since this was inside the CUDA container with no other prior installations. Could you check whether you're seeing the same extensions on your side? Note: this was on an older PETSc version.
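For reference, a quick way to dump those extensions for comparison (a minimal sketch; it assumes `ompi_info` is on the PATH):

```python
import subprocess

# Print Open MPI's reported MPI extensions so they can be compared with
# the "affinity, cuda, ftmpi, rocm, shortfloat" list mentioned above.
out = subprocess.run(["ompi_info"], capture_output=True, text=True, check=True)
for line in out.stdout.splitlines():
    if "extensions" in line.lower():
        print(line.strip())
```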
Thanks for the info @Olender. My testing setup is by necessity a little more complicated, as I don't have access to a system that both runs Docker and has Nvidia hardware. I'm testing the build setup from the base Ubuntu image in Docker because, in the longer term, we're trying to augment the current installation instructions to include GPU builds. For that reason I'm cross-compiling: I download the full CUDA software stack into the container (stuck at 12.9 until this PETSc issue is resolved) and build PETSc, though I can't run the resulting build.
connorjward left a comment
I think this is getting better. The CI error now appears to be a more mundane missing `apt-get update`.
Yes, it's looking good now. After a few iterations we have a working build. Now I can work on integrating a GPU test into CI.
I commented on the distinction between A and P because I saw it come up in one of the use cases.
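For context, a minimal petsc4py sketch of that distinction (the class and method bodies below are illustrative, not the PR's actual code): a PC carries both the system operator A and the matrix P the preconditioner is built from, and an offloading PC may need to treat them separately.

```python
# Hedged sketch: inside a python-type preconditioner's setUp, petsc4py
# exposes both operators. A defines the linear system; P is what the
# preconditioner is actually built from, and may differ from A.
class OffloadSketchPC:
    def setUp(self, pc):
        A, P = pc.getOperators()
        # An offloading PC would convert P (and possibly A) to a
        # device matrix type here, e.g. "aijcusparse" for CUDA.
        self.P = P
```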
Yeah this is the approach I was imagining. We already have to play conditional import games for things like JAX and PyTorch. I'm not overly concerned about this.
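A minimal sketch of the conditional-import pattern being referred to (the module and flag names here are purely illustrative):

```python
# Guarded import so the module still loads on machines without the
# optional GPU dependency; "cupy" is used only as an example.
try:
    import cupy  # noqa: F401
    HAVE_CUPY = True
except ImportError:
    HAVE_CUPY = False
```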
Fair enough, that is challenging. You could probably achieve this in a reasonable way by adding a step to the GPU CI very much like the one in petsctools. Note that I'm running the test file as a script and therefore relying on its `__main__` block.
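A hedged sketch of that run-as-a-script pattern (the file and test names are made up for illustration):

```python
# test_offload_smoke.py -- illustrative only; not the PR's actual test.
import pytest


def test_smoke():
    # Stand-in assertion; a real test would build a solver with the
    # offload preconditioner and check convergence.
    assert 1 + 1 == 2


if __name__ == "__main__":
    # Lets a CI step invoke `python test_offload_smoke.py` directly
    # instead of going through the pytest CLI.
    raise SystemExit(pytest.main([__file__]))
```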
@pbrubeck and @dsroberts I want to make sure that we're not being too perfectionist here. This is brand new functionality and doesn't have to be absolutely perfect in all circumstances from the outset. Once this is merged we have six months until it is in a release, so we can change the API as much as we like until then.
…n OffloadPC is used
@connorjward you make a good point. I think what we have now is a good starting point and a clear idea of what work will come next. Since G-ADOPT is my priority, the next round of GPU work from me will be working out how to offload after a fieldsplit has been performed, with HIP/ROCm to come after that. Also, please ensure @Olender is listed as a co-author in the merge commit; this work borrows a lot from #4166, and @Olender absolutely deserves credit for that in the next round of change logs.
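For concreteness, the sort of configuration that fieldsplit work is aiming at (a sketch only: this PR does not implement fieldsplit offloading, and the inner solver choices below are placeholders):

```python
# Illustrative Firedrake solver parameters offloading just one split.
# Firedrake flattens nested dicts into PETSc option prefixes.
parameters = {
    "ksp_type": "gmres",
    "pc_type": "fieldsplit",
    "fieldsplit_0": {
        "ksp_type": "preonly",
        "pc_type": "python",
        "pc_python_type": "firedrake.OffloadPC",
    },
    "fieldsplit_1": {
        "ksp_type": "preonly",
        "pc_type": "jacobi",
    },
}
```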
connorjward left a comment
Sorry it took me a couple of days to get to this. I think once these very minor things are addressed I am happy to merge.
```python
prefix = pc.getOptionsPrefix() or ""
options_prefix = prefix + self._prefix
...
self.device_mat = device_matrix_type(pc.comm.rank == 0)
```
I think this touches on a bigger open question about how we should do logging and warnings, and whether we want to force them to be collective or not. I am happy to continue passing the rank here for the moment.

That said, I think making `warn` a kwarg here is a good idea. Just add `*,` to the argument list.
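A minimal sketch of that suggestion (the function body and fallback values are stand-ins, not the PR's actual code):

```python
import warnings


def device_matrix_type(on_rank_zero, *, warn=True):
    # The bare `*` makes `warn` keyword-only: callers must spell it
    # out as warn=False rather than passing an extra positional flag.
    available = False  # stand-in for a real GPU-availability check
    if available:
        return "aijcusparse"  # illustrative CUDA matrix type
    if warn and on_rank_zero:
        # Only rank 0 warns, matching the rank check in the call above.
        warnings.warn("GPU offload unavailable, falling back to host")
    return "aij"
```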
Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>
…est for offload preconditioner. Add single-process tests to GPU github action.
@connorjward I know what you mean regarding logging; I don't think there is a nice one-size-fits-all solution. This was an easy one, as it's a collectively consistent condition that will never change throughout a job run. Thanks for pointing out that test; I've attempted to add it in.
connorjward left a comment
I am happy with this. Thank you so much for doing this! It's only a small PC but the surrounding work is very involved. This will be very useful for later GPU work.
Description
First pass at `firedrake-configure` for GPU-enabled PETSc builds, as well as a corresponding GitHub Actions workflow. An optional `--gpu-arch` flag has been added which adds all the necessary configuration to go from a fresh Ubuntu installation to a working Firedrake build, (mostly) following the existing build instructions. `OffloadPC` itself is more or less unchanged from @Olender's work for now, but with a few extra checks for whether GPU offload is available. I expect it will take a few iterations to get this going under CI, so once that's up and running reliably we can look at further development of the `OffloadPC` functionality and testing.

I've tried to put this together in a way that can be expanded upon fairly easily. We're interested in ROCm/HIP as well as CUDA, so there are a couple of dictionaries that just have a `cuda` key for now and will be expanded upon in time. I also thought that `OffloadPC` should be a no-op with a warning if there are any issues around GPUs, rather than crashing out. I think this is the right choice for workflow portability and heterogeneous systems, but it would be just as easy to raise exceptions there.
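As an illustration of that extensible layout (the dictionary names, and the `hip` values, are guesses at plausible PETSc type names rather than the PR's actual contents):

```python
# Per-backend PETSc type names, keyed by GPU architecture. Only "cuda"
# exists for now; a "hip" entry would be added alongside ROCm support.
DEVICE_MAT_TYPES = {
    "cuda": "aijcusparse",
    # "hip": "aijhipsparse",
}
DEVICE_VEC_TYPES = {
    "cuda": "cuda",
    # "hip": "hip",
}
```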