Related: #4673 / #4672 (wrong GF_FREE on conf->xattr_name — same commit provenance)
Summary
The xattr-name option (cluster.dht-xattr-name, NO_DOC, op_version 3) was introduced to support "DHT over DHT" tiering configurations, and tiering was the only consumer that ever exercised a non-default value. Tiering was removed in 2019–2020, but the option survived because dht_init reads it unconditionally. Meanwhile, three external consumers hardcode the default trusted.glusterfs.dht, so the configurability is silently broken: changing the option causes the posix unlink guard, geo-replication, and the rebalance tool to malfunction without any error message.
Affected code
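The option:
- xlators/cluster/dht/src/dht-shared.c line 1005: option definition, default "trusted.glusterfs.dht"
- xlators/cluster/dht/src/dht-shared.c line 793: GF_OPTION_INIT("xattr-name", conf->xattr_name, str, err)
- xlators/mgmt/glusterd/src/glusterd-volume-set.c line 833: glusterd-side cluster.dht-xattr-name entry
Hardcoded consumers that bypass the option:
- xlators/storage/posix/src/posix.h line 54: #define DHT_LINKTO "trusted.glusterfs.dht.linkto"
- geo-replication/syncdaemon/resource.py line 136: 'trusted.glusterfs.dht.linkto'
- extras/rebalance.py lines 84, 300: trusted.glusterfs.dht in getfattr/setfattr commands
Stale test:
- tests/bugs/distribute/bug-924265.t: sets cluster.dht-xattr-name to trusted.foo.bar and verifies that the custom xattr appears on the brick. It does not test rebalance, linkfile cleanup, unlink guards, or geo-rep with the non-default name.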
The problem
DHT internally is consistent. All 40+ usage sites across dht-common.c (20), dht-shared.c (6), dht-selfheal.c (6), switch.c (4), nufa.c (2), and dht-layout.c (2) use conf->xattr_name and its four derived fields (mds_xattr_key, link_xattr_name, commithash_xattr_name, wild_xattr_name). Zero hardcoded strings in DHT source.
External consumers are not. None of the three hardcoded consumers have any mechanism to discover what xattr-name is set to. They all assume the default. If an administrator changes the option:
- posix unlink guard breaks: posix_skip_non_linkto_unlink (posix-entry-ops.c:1285) calls sys_lgetxattr(real_path, "trusted.glusterfs.dht.linkto", NULL, 0) to check whether a file is a linkfile before deleting it during rebalance. Wrong xattr name → sys_lgetxattr returns -1 (xattr not found under hardcoded name) → skip_unlink = true → posix returns EBUSY → stale linkfile accumulation. Real migrated files are protected by the IS_DHT_LINKFILE_MODE check (mode bits, independent of xattr name) and are never at risk.
- geo-rep linkfile detection breaks: resource.py checks trusted.glusterfs.dht.linkto to identify linkfiles during sync. Wrong name → linkfile stubs replicated to secondary volume as real files.
- rebalance.py reads/writes wrong xattr: The admin tool hardcodes the xattr name in shell getfattr/setfattr commands. Wrong name → getfattr returns nothing, setfattr writes xattrs under the old name that DHT never reads → silent no-ops (the real layout under the configured name is unaffected).
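The posix failure mode above can be illustrated with a minimal sketch. It is a model, not the posix source: has_xattr is a hypothetical stand-in for the sys_lgetxattr probe, the xattr list is plain strings, and guard_skips_unlink only mirrors the "not found under the hardcoded name → skip the unlink" decision described in this issue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same value as the constant in posix.h quoted in this issue. */
#define DHT_LINKTO "trusted.glusterfs.dht.linkto"

/* Hypothetical stand-in for sys_lgetxattr(path, name, NULL, 0):
 * reports whether `name` is among the xattrs stored on a file. */
static bool has_xattr(const char *const *xattrs, size_t count,
                      const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(xattrs[i], name) == 0)
            return true;
    }
    return false;
}

/* Models the guard's decision: if the linkto xattr is not found under
 * the hardcoded name, the unlink is skipped. */
static bool guard_skips_unlink(const char *const *xattrs, size_t count)
{
    return !has_xattr(xattrs, count, DHT_LINKTO);
}
```

With the default name, a linkfile carries trusted.glusterfs.dht.linkto and the guard lets the unlink proceed. With xattr-name set to trusted.foo.bar, DHT writes trusted.foo.bar.linkto instead, the probe misses it, and the guard skips the unlink even though the file is a genuine linkfile.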
The posix coupling is architectural. DHT communicates with posix via xdata-based protocol keys (e.g., DHT_SKIP_NON_LINKTO_UNLINK tells posix "only unlink if linkfile"). These keys are in the shared header glusterfs.h — a legitimate cross-layer protocol. But the DHT_LINKTO constant is defined only in posix.h — posix independently knows the on-disk xattr name and reads it directly via sys_lgetxattr, bypassing the xlator stack. DHT could have passed conf->link_xattr_name in the xdata dict alongside the request key; it doesn't.
When the tiering feature needed a different xattr name in 2015, posix had to be patched with a parallel hardcoded constant (TIER_LINKTO "trusted.tier.tier-dht.linkto") — proving the abstraction leaked at first real use.
Timeline
| Date | Commit | Author | Event |
| --- | --- | --- | --- |
| 2012-06-14 | 08d63afa1b | Jeff Darcy | extras/rebalance.py created with hardcoded trusted.glusterfs.dht (predates the option) |
| 2013-03-19 | 76bc5d1b2d | Jeff Darcy | xattr-name option introduced for "DHT over DHT" tiering vision. Test bug-924265.t included. |
| 2013-03-21 | 0106fce7fe | Jeff Darcy | Option moves to dht-shared.c for NUFA/Switch sharing |
| 2014-01-29 | 7289fa908b | Venky Shankar | geo-rep hardcodes trusted.glusterfs.dht.linkto |
| 2014-07-15 | 037811033f | Venkatesh Somyajulu | posix.h adds LINKTO "trusted.glusterfs.dht.linkto" for rebalance race fix (BZ 1110694) |
| 2015-10-13 | 0243085e40 | N Balachandran | Tiering arrives, the only consumer that ever exercised a non-default value. glusterd sets xattr-name=trusted.tier.tier-dht for tier-DHT layer |
| 2015-11-30 | 1498103e7d | Mohammed Rafi KC | posix.h renames LINKTO → DHT_LINKTO, adds TIER_LINKTO "trusted.tier.tier-dht.linkto" |
| 2019-05-02 | e1cc427558 | Hari Gowtham | Tiering removed from glusterd. Commit message: "The tier changes in DHT still remain as such." |
| 2019-12-29 | e21a5a1b1e | Yaniv Kaul | TIER_LINKTO removed from posix.h; DHT_LINKTO remains |
| 2020-09-16 | d63b97c93a | Barak Sason Rofman | Tier options removed from DHT under #1097. xattr-name survives because dht_init still reads it unconditionally. |
Why the option cannot be removed
- dht_init calls GF_OPTION_INIT("xattr-name", conf->xattr_name, str, err) at line 793. If the option is removed from dht_options[], xlator_volume_option_get() returns NULL → GF_OPTION_INIT returns -1 → goto err → every volume fails to start.
- Existing volumes may have cluster.dht-xattr-name persisted in glusterd's option store. Removing the option from glusterd-volume-set.c prevents future volume set calls but does not remove already-stored values.
- The four derived names (mds_xattr_key, link_xattr_name, commithash_xattr_name, wild_xattr_name) are computed from conf->xattr_name at init time (lines 794–799). Hardcoding the default requires replacing these gf_asprintf calls too.
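To make the dependency in the last point concrete, here is a rough sketch of deriving the four names from one base string. It uses malloc and snprintf as stand-ins for gf_asprintf, and struct xattr_names is a hypothetical stand-in for the corresponding dht_conf_t fields. Only the ".linkto" suffix is confirmed by the strings quoted in this issue; the ".mds", ".commithash" and "*" patterns are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the derived fields of dht_conf_t. */
struct xattr_names {
    char *link_xattr_name;
    char *mds_xattr_key;
    char *commithash_xattr_name;
    char *wild_xattr_name;
};

/* Allocate "<base><suffix>"; returns NULL if allocation fails. */
static char *join_name(const char *base, const char *suffix)
{
    size_t len = strlen(base) + strlen(suffix) + 1;
    char *out = malloc(len);

    if (out)
        snprintf(out, len, "%s%s", base, suffix);
    return out;
}

/* Release all derived names and reset the struct. */
static void free_xattr_names(struct xattr_names *n)
{
    free(n->link_xattr_name);
    free(n->mds_xattr_key);
    free(n->commithash_xattr_name);
    free(n->wild_xattr_name);
    memset(n, 0, sizeof(*n));
}

/* Derive every name from the single base; returns 0 on success and
 * -1 (with nothing left allocated) if any allocation failed. */
static int derive_xattr_names(const char *base, struct xattr_names *n)
{
    n->link_xattr_name = join_name(base, ".linkto");
    n->mds_xattr_key = join_name(base, ".mds");                /* assumed */
    n->commithash_xattr_name = join_name(base, ".commithash"); /* assumed */
    n->wild_xattr_name = join_name(base, "*");                 /* assumed */

    if (!n->link_xattr_name || !n->mds_xattr_key ||
        !n->commithash_xattr_name || !n->wild_xattr_name) {
        free_xattr_names(n);
        return -1;
    }
    return 0;
}
```

Because every derived name flows from the one base string, hardcoding the default means touching each derivation, not just deleting the option lookup.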
Suggested actions
This is a discussion issue — no single patch resolves it cleanly. Options:
Minimal (documentation):
- Add a comment in dht-shared.c at the option definition explaining it's a tiering fossil that can't be removed
- Mark bug-924265.t as known-stale or strengthen it to test the cross-layer contract
Moderate (lock down):
- Log a warning if a non-default value is detected at init time
- Add a note to the option description that non-default values are unsupported
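A lock-down check of this kind could look roughly like the sketch below. It is an illustration only: DHT_DEFAULT_XATTR_NAME and warn_if_nondefault_xattr_name are hypothetical names, and fprintf to a FILE stream stands in for whatever GlusterFS logging call the real change would use.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro name; the value is the documented default. */
#define DHT_DEFAULT_XATTR_NAME "trusted.glusterfs.dht"

/* Emit a warning when a non-default xattr-name is configured.
 * Returns true if a warning was written, false for the default. */
static bool warn_if_nondefault_xattr_name(const char *xattr_name, FILE *log)
{
    if (strcmp(xattr_name, DHT_DEFAULT_XATTR_NAME) == 0)
        return false;

    fprintf(log,
            "W [dht] xattr-name is set to \"%s\"; non-default values are "
            "unsupported because posix, geo-replication and "
            "extras/rebalance.py assume \"%s\"\n",
            xattr_name, DHT_DEFAULT_XATTR_NAME);
    return true;
}
```

Calling this once after the option is read at init time would surface the misconfiguration without changing behavior for existing volumes.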
Thorough (fix the coupling):
- Have DHT pass conf->link_xattr_name via xdata alongside DHT_SKIP_NON_LINKTO_UNLINK, and have posix read it from xdata instead of hardcoding DHT_LINKTO
- This makes posix xattr-name-agnostic and eliminates the coupling — but there's no current consumer that needs it
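The posix side of that idea might be sketched as follows. Everything beyond DHT_LINKTO is an assumption for illustration: the xdata key "dht-linkto-xattr-name" does not exist in GlusterFS, and struct kv with kv_get is a simplified stand-in for a dict_t string lookup.

```c
#include <stddef.h>
#include <string.h>

/* Same value as the constant in posix.h quoted in this issue. */
#define DHT_LINKTO "trusted.glusterfs.dht.linkto"
/* Hypothetical xdata key; no such key exists today. */
#define DHT_LINKTO_NAME_KEY "dht-linkto-xattr-name"

/* Simplified stand-in for string entries in an xdata dict. */
struct kv {
    const char *key;
    const char *value;
};

/* Return the value stored under `key`, or NULL if it is absent. */
static const char *kv_get(const struct kv *xdata, size_t count,
                          const char *key)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(xdata[i].key, key) == 0)
            return xdata[i].value;
    }
    return NULL;
}

/* Pick the linkto xattr name for the unlink guard: prefer the name
 * DHT sent in xdata, fall back to the hardcoded default otherwise. */
static const char *posix_linkto_name(const struct kv *xdata, size_t count)
{
    const char *name = xdata ? kv_get(xdata, count, DHT_LINKTO_NAME_KEY)
                             : NULL;

    return name ? name : DHT_LINKTO;
}
```

Falling back to DHT_LINKTO when the key is absent would keep the current behavior for callers that do not send the name.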
Additional xattr keys that are NOT configurable
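Two operational xattr keys use the translator name distribute and are always hardcoded regardless of xattr-name:
- GF_XATTR_FIX_LAYOUT_KEY = "distribute.fix.layout" (dht-common.h:24)
- GF_XATTR_FILE_MIGRATE_KEY = "trusted.distribute.migrate-data" (dht-common.h:25)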
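Cross-references
- #4673 / #4672: GF_FREE(conf->xattr_name) wrong-free in the same error path (same commit provenance: 76bc5d1b2d)
- DHT_LINKTO hardcoding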
Affected versions
All versions since GlusterFS 3.4 (2013). Confirmed present on:
- devel (option at line 1005, init at line 793)
- release-11 (option at line 1005, init at line 793)
- release-10 (option at line 1034, init at line 831)