posix: disk_space_check thread unsafe removal in posix_fini


## Summary

`posix_fini` removes a brick from the shared `disk_space_check` thread using a direct `diskxl_count` decrement instead of the safe removal protocol (`delete_posix_diskxl`). This causes two bugs:

1. **Use-after-free (brick multiplexing + init failure):** The brick's `posix_diskxl` entry stays on the shared iteration list after `posix_fini` frees `priv`. The disk thread continues visiting the freed brick's entry, dereferencing `priv->base_path`, `priv->disk_reserve`, and `priv->disk_unit_percent`.

2. **Reference count double-decrement (all multi-brick shutdowns):** When both `posix_notify(PARENT_DOWN)` and `posix_fini` run (normal operational shutdown), `ctx->diskxl_count` is decremented twice per brick. The `uint32_t` underflow to `UINT32_MAX` prevents a double `pthread_join` by arithmetic coincidence, but the ref-count is structurally wrong.

## How to reproduce

### Bug 1: UAF on init failure in brick mux

1. Enable brick multiplexing: `gluster volume set all cluster.brick-multiplex on`
2. Start a volume with 2+ bricks on the same node
3. Cause one brick's `posix_init` to fail after `posix_spawn_disk_space_check_thread` succeeds but before init completes (e.g., remove the brick directory's volume-id xattr, or make the health check test directory unwritable)
4. The process continues serving other bricks. The disk_space_check thread visits the failed brick's freed `priv` pointer.

The race window is narrow (between the failed brick registering with the disk thread and the cleanup thread's `PARENT_DOWN` delivery), but the UAF is structurally present.

### Bug 2: double-decrement on normal shutdown

1. Run any multi-brick glusterfsd (with or without brick mux)
2. Stop the volume: `gluster volume stop <vol>`
3. `posix_notify(PARENT_DOWN)` calls `delete_posix_diskxl` → decrements `diskxl_count`
4. `posix_fini` directly decrements `diskxl_count` again
5. With 2 bricks: `diskxl_count` goes 2→1 (PARENT_DOWN) →0 (fini, joins thread) → underflow to `UINT32_MAX` (brick 2's PARENT_DOWN)

No crash occurs because `UINT32_MAX != 0`, so the `count == 0` join check doesn't trigger again. The thread was already joined by brick 1's fini.

## Root cause and history

Commit [`3f93be77e1`](https://github.com/gluster/glusterfs/commit/3f93be77e1) (Mohit Agrawal, 2020-11-09, PR #1595, fixes #1482) refactored the disk_space_check thread from per-brick to per-process to solve a real scaling problem: with brick mux at 250 bricks, 250 threads were spawned just for disk space checks.

**Before the refactor**, the thread was per-brick (`priv->disk_space_check`), and `posix_fini` cleaned it up the same way as all per-brick threads — set a flag, cancel, join:

```c
if (priv->disk_space_check) {
    priv->disk_space_check_active = _gf_false;
    (void)gf_thread_cleanup_xint(priv->disk_space_check);
    priv->disk_space_check = 0;
}
```

Simple and correct — no shared state, no ref-counting, no iteration list.

**The refactor** introduced two code paths for removing a brick from the now-shared thread:

**Safe path ([`delete_posix_diskxl`](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L130-L155)):**
- Sets `pxl->detach_notify = true`
- Waits for `pxl->is_use == false` (disk thread signals when done with this brick)
- Removes pxl from `ctx->diskth_xl` list
- Sets `priv->pxl = NULL` (idempotent guard for subsequent calls)
- Decrements `diskxl_count`
- Frees pxl struct
- Joins thread if last brick

Called by: `posix_notify(PARENT_DOWN)` ([line 213](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L213)), `posix_reconfigure` ([line 426](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L426)).

**Shortcut in `posix_fini` ([lines 1301-1312](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L1301-L1312)):**
- Directly decrements `diskxl_count`
- Joins thread if last brick
- Does NOT: set `detach_notify`, wait for `is_use`, remove from list, set `priv->pxl = NULL`, free pxl

The author correctly used the safe function for paths where the process stays alive (PARENT_DOWN, reconfigure), and the shortcut for `posix_fini` — the terminal cleanup path. The implicit assumption was that by the time `posix_fini` runs, PARENT_DOWN has already safely removed the brick from the list, so the full handshake is unnecessary overhead and fini only needs to handle the ref-count for the shared join.

This assumption holds for normal operational shutdown (Path A) and brick mux detach (Path C), where PARENT_DOWN always precedes fini. It does **not** hold for init-failure (Path B), where `posix_fini` runs without a prior PARENT_DOWN, and the brick's pxl entry stays on the iteration list while `priv` gets freed.

The idempotent guard (`priv->pxl == NULL` check at [line 132](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L132)) that makes `delete_posix_diskxl` safe to call from both paths was already present in the function the author wrote — it just wasn't used from fini.

## Fix

Replace the 12-line unsafe block in `posix_fini` with a single call to the existing safe function:

```diff
-    pthread_mutex_lock(&ctx->xl_lock);
-    {
-        count = --ctx->diskxl_count;
-        if (count == 0)
-            pthread_cond_signal(&ctx->xl_cond);
-    }
-    pthread_mutex_unlock(&ctx->xl_lock);
-
-    if (count == 0) {
-        pthread_join(ctx->disk_space_check, NULL);
-        ctx->disk_space_check = 0;
-    }
+    delete_posix_diskxl(priv, ctx);
```

`delete_posix_diskxl` is already idempotent: it checks `priv->pxl` at entry ([line 132](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L132)) and returns immediately if NULL. On paths where `PARENT_DOWN` already ran, `priv->pxl` was set to NULL ([line 142](https://github.com/gluster/glusterfs/blob/ae1d69672f/xlators/storage/posix/src/posix-common.c#L142)), so fini's call is a safe no-op. This eliminates both the UAF (brick removed from list before `priv` freed) and the double-decrement (single decrement point per brick).

## Version range and history

**Affected:** all releases since GlusterFS 8 (November 2020, commit [`3f93be77e1`](https://github.com/gluster/glusterfs/commit/3f93be77e1)).

**Notably,** the same author (Mohit Agrawal) had previously fixed the *identical* bug class for the janitor thread in [`f138d3fa22`](https://github.com/gluster/glusterfs/commit/f138d3fa22) (2019-07-16, "posix: In brick_mux brick is crashed while start/stop volume in loop"):

> Brick is crashed in janitor task at the time of accessing priv. If posix priv is cleaned up before call janitor task then janitor task is crashed.

That fix introduced the `janitor_task_stop` flag and the PARENT_DOWN drain protocol — the exact safe-removal pattern that `delete_posix_diskxl` implements for the disk thread. When the disk thread was refactored to per-process 16 months later ([`3f93be77e1`](https://github.com/gluster/glusterfs/commit/3f93be77e1)), the safe function was created but not used from `posix_fini`, re-introducing the pattern that the janitor fix had resolved.

This is part of a recurring pattern in brick mux: shared resources that work fine per-brick become UAF hazards when refactored to per-process, because `posix_fini`'s teardown was designed for the per-brick model. Related upstream history:
- [`e589d8de66`](https://github.com/gluster/glusterfs/commit/e589d8de66) (2020-03-04): Mohit fixes multiple brick mux teardown leaks (xlator, local_pool, rpc, inode)
- #3600 / [`04413e71aa`](https://github.com/gluster/glusterfs/commit/04413e71aa) (2023-05-11): Xavi Hernandez fixes two more brick mux crash bugs (auth module + stale pointer)
- #4666: gfapi use-after-free race between glfs_fini and glfs_open
- #4159: SIGABRT in gf_io_uring_cq_process_some (io_uring lifecycle)

No prior upstream report exists for the disk_space_check UAF or double-decrement specifically.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

posix: disk_space_check thread unsafe removal in posix_fini #4684

Summary

How to reproduce

Bug 1: UAF on init failure in brick mux

Bug 2: double-decrement on normal shutdown

Root cause and history

Fix

Version range and history

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

posix: disk_space_check thread unsafe removal in posix_fini #4684

Description

Summary

How to reproduce

Bug 1: UAF on init failure in brick mux

Bug 2: double-decrement on normal shutdown

Root cause and history

Fix

Version range and history

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions