dht: xattr-name option is a tiering fossil — configurable internally, broken externally, test is stale #4674

@ThalesBarretto

Related: #4673 / #4672 (wrong GF_FREE on conf->xattr_name — same commit provenance)

Summary

The xattr-name option (cluster.dht-xattr-name, NO_DOC, op_version 3) was introduced to support "DHT over DHT" tiering configurations. The tiering feature was the only consumer that ever exercised a non-default value. Tiering was removed in 2019–2020, but the option survived because dht_init initializes it unconditionally. Meanwhile, three external consumers hardcode names built on the default trusted.glusterfs.dht, so the configurability is silently broken: changing the option causes unlink guards, geo-replication, and the rebalance tool to malfunction without any error message.

Affected code

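The option: xattr-name, defined in dht_options[] in dht-shared.c (line 1005 on devel), initialized in dht_init (line 793 on devel), and exposed to administrators as cluster.dht-xattr-name through glusterd-volume-set.c.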

Hardcoded consumers that bypass the option:

  1. xlators/storage/posix/src/posix.h, line 54: #define DHT_LINKTO "trusted.glusterfs.dht.linkto"
  2. geo-replication/syncdaemon/resource.py, line 136: 'trusted.glusterfs.dht.linkto'
  3. extras/rebalance.py, lines 84 and 300: trusted.glusterfs.dht in getfattr/setfattr commands

Stale test:

  • tests/bugs/distribute/bug-924265.t — sets cluster.dht-xattr-name to trusted.foo.bar, verifies the custom xattr appears on the brick. Does not test rebalance, linkfile cleanup, unlink guards, or geo-rep with the non-default name.

The problem

DHT internally is consistent. All 40+ usage sites across dht-common.c (20), dht-shared.c (6), dht-selfheal.c (6), switch.c (4), nufa.c (2), and dht-layout.c (2) use conf->xattr_name and its four derived fields (mds_xattr_key, link_xattr_name, commithash_xattr_name, wild_xattr_name). Zero hardcoded strings in DHT source.
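A minimal Python model of that internal consistency, showing how every derived name follows the base name. The `.linkto` suffix is confirmed by the default `trusted.glusterfs.dht.linkto`; the `.mds` and `.commithash` suffixes and the trailing `*` wildcard are assumptions inferred from the field names, and `derive_xattr_names` is a hypothetical helper, not a GlusterFS function.

```python
DEFAULT_XATTR_NAME = "trusted.glusterfs.dht"


def derive_xattr_names(xattr_name=DEFAULT_XATTR_NAME):
    """Model of the four fields DHT derives from conf->xattr_name at init."""
    return {
        "mds_xattr_key": f"{xattr_name}.mds",                 # suffix assumed
        "link_xattr_name": f"{xattr_name}.linkto",            # matches the documented default
        "commithash_xattr_name": f"{xattr_name}.commithash",  # suffix assumed
        "wild_xattr_name": f"{xattr_name}*",                  # wildcard form assumed
    }


default_names = derive_xattr_names()
custom_names = derive_xattr_names("trusted.foo.bar")  # value used by bug-924265.t

print(default_names["link_xattr_name"])
print(custom_names["link_xattr_name"])
```

Because all four fields are computed from one base string, a non-default value moves every DHT xattr at once, which is exactly what the external consumers cannot follow.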

External consumers are not. None of the three hardcoded consumers has any mechanism to discover the configured value of xattr-name; they all assume the default. If an administrator changes the option:

  1. posix unlink guard breaks: posix_skip_non_linkto_unlink (posix-entry-ops.c:1285) calls sys_lgetxattr(real_path, "trusted.glusterfs.dht.linkto", NULL, 0) to check whether a file is a linkfile before deleting it during rebalance. Wrong xattr name → sys_lgetxattr returns -1 (xattr not found under hardcoded name) → skip_unlink = true → posix returns EBUSY → stale linkfile accumulation. Real migrated files are protected by the IS_DHT_LINKFILE_MODE check (mode bits, independent of xattr name) and are never at risk.

  2. geo-rep linkfile detection breaks: resource.py checks trusted.glusterfs.dht.linkto to identify linkfiles during sync. Wrong name → linkfile stubs replicated to secondary volume as real files.

  3. rebalance.py reads/writes wrong xattr: The admin tool hardcodes the xattr name in shell getfattr/setfattr commands. Wrong name → getfattr returns nothing, setfattr writes xattrs under the old name that DHT never reads → silent no-ops (the real layout under the configured name is unaffected).
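The first failure mode can be sketched as a small simulation. This is a simplified model, not the posix code: a plain dict stands in for the on-disk xattrs of one brick file, only the xattr lookup is modelled (the mode-bit check is omitted), and `skip_non_linkto_unlink` is a hypothetical stand-in for the decision made in posix_skip_non_linkto_unlink.

```python
# Value of DHT_LINKTO as hardcoded in posix.h.
HARDCODED_DHT_LINKTO = "trusted.glusterfs.dht.linkto"


def skip_non_linkto_unlink(file_xattrs):
    """Return True when the guard refuses the unlink (posix then answers EBUSY).

    file_xattrs is a dict standing in for the xattrs stored on one brick file.
    """
    # Models sys_lgetxattr() failing because the hardcoded name is absent.
    return HARDCODED_DHT_LINKTO not in file_xattrs


# Linkfile written by DHT with the default xattr-name.
default_linkfile = {"trusted.glusterfs.dht.linkto": b"subvol-1"}
# Linkfile written by DHT after xattr-name was set to trusted.foo.bar.
custom_linkfile = {"trusted.foo.bar.linkto": b"subvol-1"}

for label, xattrs in (("default", default_linkfile), ("custom", custom_linkfile)):
    outcome = "EBUSY, linkfile left behind" if skip_non_linkto_unlink(xattrs) else "unlinked"
    print(f"{label}: {outcome}")
```

The same lookup-by-fixed-name pattern is what makes the geo-replication check and the rebalance.py commands miss xattrs written under a custom name.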

The posix coupling is architectural. DHT communicates with posix via xdata-based protocol keys (e.g., DHT_SKIP_NON_LINKTO_UNLINK tells posix "only unlink if linkfile"). These keys are in the shared header glusterfs.h — a legitimate cross-layer protocol. But the DHT_LINKTO constant is defined only in posix.h — posix independently knows the on-disk xattr name and reads it directly via sys_lgetxattr, bypassing the xlator stack. DHT could have passed conf->link_xattr_name in the xdata dict alongside the request key; it doesn't.

When the tiering feature needed a different xattr name in 2015, posix had to be patched with a parallel hardcoded constant (TIER_LINKTO "trusted.tier.tier-dht.linkto") — proving the abstraction leaked at first real use.

Timeline

| Date | Commit | Author | Event |
|---|---|---|---|
| 2012-06-14 | 08d63afa1b | Jeff Darcy | extras/rebalance.py created with hardcoded trusted.glusterfs.dht (predates the option) |
| 2013-03-19 | 76bc5d1b2d | Jeff Darcy | xattr-name option introduced for "DHT over DHT" tiering vision. Test bug-924265.t included. |
| 2013-03-21 | 0106fce7fe | Jeff Darcy | Option moves to dht-shared.c for NUFA/Switch sharing |
| 2014-01-29 | 7289fa908b | Venky Shankar | geo-rep hardcodes trusted.glusterfs.dht.linkto |
| 2014-07-15 | 037811033f | Venkatesh Somyajulu | posix.h adds LINKTO "trusted.glusterfs.dht.linkto" for rebalance race fix (BZ 1110694) |
| 2015-10-13 | 0243085e40 | N Balachandran | Tiering arrives: the only consumer that ever exercised a non-default value. glusterd sets xattr-name=trusted.tier.tier-dht for tier-DHT layer |
| 2015-11-30 | 1498103e7d | Mohammed Rafi KC | posix.h renames LINKTO → DHT_LINKTO, adds TIER_LINKTO "trusted.tier.tier-dht.linkto" |
| 2019-05-02 | e1cc427558 | Hari Gowtham | Tiering removed from glusterd. Commit message: "The tier changes in DHT still remain as such." |
| 2019-12-29 | e21a5a1b1e | Yaniv Kaul | TIER_LINKTO removed from posix.h; DHT_LINKTO remains |
| 2020-09-16 | d63b97c93a | Barak Sason Rofman | Tier options removed from DHT under #1097. xattr-name survives: it is still initialized unconditionally in dht_init. |

Why the option cannot be removed

  1. dht_init calls GF_OPTION_INIT("xattr-name", conf->xattr_name, str, err) at line 793. If the option is removed from dht_options[], xlator_volume_option_get() returns NULL → GF_OPTION_INIT returns -1 → goto err → every volume fails to start.

  2. Existing volumes may have cluster.dht-xattr-name persisted in glusterd's option store. Removing the option from glusterd-volume-set.c prevents future volume set calls but does not remove already-stored values.

  3. The four derived names (mds_xattr_key, link_xattr_name, commithash_xattr_name, wild_xattr_name) are computed from conf->xattr_name at init time (lines 794–799). Hardcoding the default requires replacing these gf_asprintf calls too.
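The first point can be illustrated with a rough Python model of the init path. It mirrors only the control flow described above (missing table entry, failed lookup, jump to the error path); `option_init` and `dht_init_model` are hypothetical names, and the dict-based option table is an assumption made for the sketch.

```python
def option_init(options_table, key):
    """Model of GF_OPTION_INIT: -1 when the key has no entry in the option table."""
    if key not in options_table:  # models xlator_volume_option_get() returning NULL
        return -1, None
    return 0, options_table[key]


def dht_init_model(options_table):
    """Return a conf dict on success, or None to model `goto err`."""
    ret, xattr_name = option_init(options_table, "xattr-name")
    if ret != 0:
        return None  # init fails, so the volume does not start
    # Derived names are computed from the base name right after the lookup.
    return {"xattr_name": xattr_name, "link_xattr_name": xattr_name + ".linkto"}


with_option = dht_init_model({"xattr-name": "trusted.glusterfs.dht"})
without_option = dht_init_model({})  # option deleted from the table, call left in place

print(with_option)
print(without_option)
```

In this model, removing the table entry is only safe if the lookup and the derived-name computation are replaced in the same change.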

Suggested actions

This is a discussion issue — no single patch resolves it cleanly. Options:

Minimal (documentation):

  • Add a comment in dht-shared.c at the option definition explaining it's a tiering fossil that can't be removed
  • Mark bug-924265.t as known-stale or strengthen it to test the cross-layer contract

Moderate (lock down):

  • Log a warning if a non-default value is detected at init time
  • Add a note to the option description that non-default values are unsupported
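A sketch of the "log a warning" idea, written in Python for brevity; the real change would live in the C init path. The logger name, the message text, and `warn_if_non_default` are all hypothetical.

```python
import logging

DEFAULT_XATTR_NAME = "trusted.glusterfs.dht"
log = logging.getLogger("dht")  # logger name is illustrative only


def warn_if_non_default(xattr_name):
    """Emit a warning and return True when xattr-name differs from the default."""
    if xattr_name == DEFAULT_XATTR_NAME:
        return False
    log.warning(
        "xattr-name is set to %r; non-default values are unsupported because "
        "posix, geo-replication and rebalance.py assume %r",
        xattr_name,
        DEFAULT_XATTR_NAME,
    )
    return True


warned_default = warn_if_non_default("trusted.glusterfs.dht")
warned_custom = warn_if_non_default("trusted.foo.bar")
```

A warning does not fix the coupling, but it turns a silent malfunction into something visible in the logs.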

Thorough (fix the coupling):

  • Have DHT pass conf->link_xattr_name via xdata alongside DHT_SKIP_NON_LINKTO_UNLINK, and have posix read it from xdata instead of hardcoding DHT_LINKTO
  • This makes posix xattr-name-agnostic and eliminates the coupling — but there's no current consumer that needs it
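The thorough option might look roughly like this, modelled in Python with dicts standing in for the xdata dict and the on-disk xattrs. `SKIP_KEY` is a placeholder for the existing DHT_SKIP_NON_LINKTO_UNLINK key, `LINK_NAME_KEY` is a hypothetical new key, and the fallback to the hardcoded name for callers that omit the new key is a design assumption, not something the issue specifies.

```python
SKIP_KEY = "DHT_SKIP_NON_LINKTO_UNLINK"   # placeholder for the existing request key
LINK_NAME_KEY = "dht-link-xattr-name"     # hypothetical new xdata key
FALLBACK_LINKTO = "trusted.glusterfs.dht.linkto"


def dht_build_unlink_xdata(link_xattr_name):
    """DHT side: send the configured linkto name alongside the request key."""
    return {SKIP_KEY: 1, LINK_NAME_KEY: link_xattr_name}


def posix_should_unlink(xdata, file_xattrs):
    """posix side: unlink only if the file carries the linkto xattr named in xdata."""
    if SKIP_KEY not in xdata:
        return True  # no guard requested
    name = xdata.get(LINK_NAME_KEY, FALLBACK_LINKTO)  # assumed fallback for older callers
    return name in file_xattrs


custom_linkfile = {"trusted.foo.bar.linkto": b"subvol-1"}
new_style = posix_should_unlink(dht_build_unlink_xdata("trusted.foo.bar.linkto"), custom_linkfile)
old_style = posix_should_unlink({SKIP_KEY: 1}, custom_linkfile)

print(new_style, old_style)
```

With the name carried in xdata, the custom linkfile is recognized; a caller that sends only the old key still gets today's behavior.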

Additional xattr keys that are NOT configurable

Two operational xattr keys use the translator name distribute and are always hardcoded regardless of xattr-name:

Cross-references

Affected versions

All versions since GlusterFS 3.4 (2013). Confirmed present on:

  • devel (option at line 1005, init at line 793)
  • release-11 (option at line 1005, init at line 793)
  • release-10 (option at line 1034, init at line 831)
