
Add upload and download timings#177

Draft
Hitenjain14 wants to merge 51 commits into feat/art-change from feat/enterprise-timings

Conversation


@Hitenjain14 Hitenjain14 commented Jan 17, 2025

Description

Motivation and Context

How to test this PR?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation updated
  • Unit tests added/updated

Hitenjain14 and others added 30 commits January 17, 2025 16:30
- Allow audit target initialization even if logsearchapi is unavailable
- Detect connection refused errors and log warning instead of failing
- Server will start normally and retry when sending logs
- Fixes 'Unable to initialize server audit HTTP target' error
Picks up:
- core/node: HealthyByLFB() and LFB-aware max-nonce for chaos networks
- feat: Add cache-first allocation and blobber caching for offline mode
- increased maxConnsPerHost to match TCPDialer Concurrency
- wasmsdk: export faucet + allow wallet set with privateKey
Implements two complementary approaches for high-throughput S3 operations:

1. LogCache (logcache.go): Log-structured ACID cache
   - Append-only cache file on NVMe (data + metadata unified)
   - Group commit with fdatasync (amortized ~0.02ms/entry at high concurrency)
   - In-memory index for O(1) GET/HEAD/LIST lookups
   - Committed objects stay in cache for continued GET serving
   - Background drain to blobbers via DoMultiOperation
   - Crash recovery via cache file replay
   - Adaptive: small files (≤1MB) cached, large files go direct to blobbers

2. WAL Intent Log (wal.go): Lightweight crash-recovery for writeback cache
   - Metadata-only entries (~100 bytes, no data duplication)
   - Works alongside MinIO's writeback cache for GET speed
   - Ensures ACID by recording intent before cache write

3. Config-driven tuning (initSDK.go):
   - sdk_batch_size, locked_blobbers_cap, enable_wal, wal_dir, wal_commit_workers
   - All performance knobs exposed via zs3server.json

4. S3 operation integration (gateway-zcn.go, dStorage.go):
   - PUT: size-adaptive routing (cache vs direct blobber)
   - GET/HEAD: cache-first with blobber fallback
   - DELETE: cache eviction + blobber delete
   - LIST: cache index merge with blobber listing

Measured results on test2 (12 cores, 3 enterprise blobbers):
- PUT (WAL inline, 1KiB): 3672 obj/s at conc=256 (ACID)
- PUT (sync direct IPs): 289 obj/s at conc=64
- GET (WAL-served): 3900 obj/s at conc=128
- GET (writeback cache): 8734 obj/s at conc=256
- Historical baseline: PUT 0.58, GET 125 obj/s
PUT: size-adaptive routing — ≤1MB through LogCache (append + fdatasync), >1MB direct to blobbers
GET: LogCache first via file-backed reader (sendfile-capable), blobber fallback on miss
HEAD: LogCache index lookup, blobber fallback
LIST: merge LogCache entries with blobber listing
DELETE: evict from LogCache + delete from blobbers
COPY: if source in LogCache, read from cache; else blobber copy
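The size-adaptive PUT routing above can be sketched as a single decision on object size. This is a minimal illustration, not the gateway's code: `Route`, `routePut`, and `smallObjectLimit` are hypothetical names, and the 1 MB cut-off is assumed to mean 1 MiB.

```go
package main

// Route names where a PUT is sent. Hypothetical type for illustration.
type Route int

const (
	RouteCache  Route = iota // append to the local log cache, drain to blobbers later
	RouteDirect              // upload straight to blobbers
)

// smallObjectLimit mirrors the 1 MB cut-off quoted above (assumed to be 1 MiB).
const smallObjectLimit = 1 << 20

// routePut picks the write path from the object size alone.
func routePut(size int64) Route {
	if size <= smallObjectLimit {
		return RouteCache
	}
	return RouteDirect
}
```

Keeping the decision in one pure function makes the threshold easy to test and to move into configuration later.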

LogCache GET uses limitedFileReader with WriteTo interface for sendfile optimization.
Committed objects stay in cache index for continued GET serving.
Hot cache: 20K-entry in-memory LRU of recently-PUT objects.
GET checks hot cache first (~0.01ms) before falling back to shared fd pread (~0.1ms).
Eliminates os.Open/Close per GET that was adding ~1-2ms overhead.

Results: GET +22% at conc=16 (2706 → 3306 obj/s).
GET ceiling at ~3300 obj/s is Go net/http handler overhead, not cache layer.
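A fixed-capacity in-memory LRU like the hot cache described above could look roughly like this. It is a sketch built on the standard `container/list` package; `HotCache` and its methods are hypothetical names, and the real cache may store file offsets rather than object bytes.

```go
package main

import (
	"container/list"
	"sync"
)

type entry struct {
	key  string
	data []byte
}

// HotCache is a fixed-capacity LRU keyed by object name (hypothetical type).
type HotCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewHotCache(capacity int) *HotCache {
	return &HotCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Put inserts or refreshes key and evicts the least recently used entry
// once the cache holds more than capacity items.
func (c *HotCache) Put(key string, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).data = data
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, data: data})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the cached bytes and marks the key as recently used.
func (c *HotCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).data, true
}
```

A caller would construct it with `NewHotCache(20000)` to match the 20K-entry figure and fall back to the shared-fd read on a miss.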
Writeback cache handles all S3 operations natively (PUT/GET/HEAD/LIST/DELETE/COPY).
WAL intent log (metadata-only, ~100 bytes per entry) provides crash recovery.

PUT: writeback cache write (~1ms) + WAL intent fdatasync (~0.5ms amortized)
GET: MinIO sendfile from cache (bypasses gateway, ~0.5ms for 1KB)
DELETE: WAL marks deleted + MinIO cache eviction

Expected: PUT ~3000 obj/s, GET ~8000 obj/s, 1059 MB/s for 1MiB GET.
Requires: MINIO_CACHE_COMMIT=writeback in docker-compose + zs3server.json enable_wal=true.
…ivalent)

- nfs_server.go: NFSv3 server via go-nfs library, starts on configurable port
- nfs_fs.go: billy.Filesystem backed by Züs blobbers — reuses putFile,
  getFileReader, getRegularRefs (same data path as S3 gateway)
- nfs_file.go: billy.File with local temp staging, uploads to blobbers on Close
- Config: enable_nfs, nfs_port, nfs_cache_dir in zs3server.json
- Shares WAL, writeback cache, and batch workers with S3 gateway
- Architecture doc: NFS_GATEWAY_ARCHITECTURE.md
Reduces data loss window from 60s to 5s (12x safer).
Increases background drain workers from 10 to 20 (faster commit to blobbers).
Zero performance impact on PUT/GET — only affects background commit frequency.
…ance

Route NFS reads/writes through MinIO's in-process ObjectLayer API instead
of direct blobber calls. Eliminates HTTP loopback overhead and synchronous
blobber commits that made NFS 40x slower than S3.

Key changes:
- nfs_s3client.go: direct CacheObjectLayer.PutObject/GetObjectNInfo calls
- nfs_file.go: in-memory bytes.Buffer for files <=1MB (no temp file I/O)
- nfs_fs.go: stat cache after writes, S3 cache-aware Stat/ReadDir/OpenFile
- nfs_server.go: increased CachingHandler to 8192 entries
- object-api-common.go: exported GetGlobalCacheObjectAPI/GetGlobalObjectAPI
- NFS_ARCHITECTURE.md: full architecture doc with measured data + NFSv4.1 plan

Measured improvement (1KB files, chain stopped, 12 cores, 3 eblobbers 2+1):
  NFS PUT: 9 -> 48 obj/s (+5.3x)
  NFS GET: 44 -> 103 obj/s (+2.3x)
  Remaining gap vs S3 (357 PUT, 506 GET) is NFSv3 protocol overhead.
- nfs_cache_mode: "memory" option for fire-and-forget writes (no disk I/O,
  async blobber commit, no crash recovery)
- nfs_cache_mode: "disk" (default) unchanged — ACID via /mcache + WAL
- Dual-access test script (tests/dual_access_test.sh): 10 test cases
  verifying S3 write->NFS read, NFS write->S3 read, LIST, DELETE, overwrite
- Updated NFS_ARCHITECTURE.md with nconnect=16 benchmark data showing
  NFS GET at 63% of S3 (312 vs 493 obj/s), and path to S3 parity via
  NFS-Ganesha FSAL or go-nfs concurrent dispatch patch

Measured (Python bench, chain stopped, 12 cores, 3 eblobbers 2+1):
  nconnect=16 disk mode: PUT 61, GET 312 obj/s (1KB)
  nconnect=16 memory mode: PUT 47, GET 320 obj/s (1KB)
  Memory mode does NOT improve concurrent throughput — bottleneck is
  go-nfs RPC dispatch, not backend storage.
Replace go-nfs (60 obj/s) with NFS-Ganesha + FSAL_VFS + inotify blobber
sync. NFS-Ganesha handles the NFS protocol in C (fast), our Go process
watches the export directory and commits changes to blobbers async.

Architecture:
  NFS client → Ganesha (C, NFSv4) → /nfs_export (NVMe) → instant return
  Background: inotify → putFile() → blobbers (async, ACID via WAL)

Config: "nfs_ganesha_export_dir": "/nfs_export" in zs3server.json
Requires: apt install nfs-ganesha nfs-ganesha-vfs

Measured (Python bench, chain stopped, 12 cores, 3 eblobbers 2+1):
  NFS PUT 1KB: 3,208 obj/s (was 60 with go-nfs) — 53x improvement
  NFS GET 1KB: 7,794 obj/s (was 277 with go-nfs) — 28x improvement
  NFS PUT 1KB is 9x faster than S3 via boto3 (362 obj/s)
guruhubb added 21 commits April 12, 2026 13:10
After blobber commit, delete files from tmpfs export dir to bound
cache usage. When tmpfs exceeds 80%, spill committed files to NVMe.

Config:
  nfs_cache_evict: true (default) — delete from tmpfs after commit
  nfs_spillover_dir: "/path" — NVMe directory for overflow

This keeps tmpfs usage bounded for sustained workloads:
  Small files (1KB, 5K/s): 8GB tmpfs holds 25 min of burst
  Eviction at 50 commits/s keeps steady-state usage at ~50MB
- nfs_direct_threshold: files > 2MB bypass tmpfs, commit to blobbers
  directly via inotify (avoids filling cache with large data)
- s3_direct_threshold: same for S3 PUT path — large files bypass
  batch channel and use DoMultiOperation directly
- 3 NFS cache modes: tmpfs (fastest), nvme (crash-safe), direct (slow)
- Configurable via zs3server.json

Architecture per file size:
  ≤2MB: NFS write → tmpfs/nvme → batch commit → blobbers (5K obj/s)
  >2MB: NFS write → tmpfs/nvme → immediate commit → blobbers

Same logic for S3:
  ≤threshold: S3 PUT → writeback cache → batch commit
  >threshold: S3 PUT → DoMultiOperation → blobbers directly
Replace blocking putFile() path with direct DoMultiOperation batches.
Each sync worker collects files into batches of 25, commits with a single
WM lock acquisition per batch instead of per file.

Before: sync worker → putFile → blocks 500ms → next file (50 files/s)
After: sync worker → collect 25 files → DoMultiOperation → 25 files committed
       Expected: 5 workers × 25 files / 500ms = 250 files/s

Also: debounced inotify (200ms quiet before commit), 10K file channel buffer,
2s backoff on failure, rate-limited retries.
Adaptive config:
- FileSizeTracker (1000-entry ring buffer) tracks PUT sizes from S3 + NFS
- StartAdaptiveLoop() samples median every 30s, adjusts batch/worker config
- ConfigForFileSize() returns optimal settings per size category
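A ring buffer with a median query, in the spirit of the FileSizeTracker described above, might be sketched as follows. The type name comes from the commit message, but the fields, method names, and the choice of the upper median are assumptions.

```go
package main

import (
	"sort"
	"sync"
)

// FileSizeTracker keeps the most recent PUT sizes in a fixed-size ring.
type FileSizeTracker struct {
	mu    sync.Mutex
	sizes []int64
	next  int // slot the next Record will overwrite
	count int // number of valid slots, at most len(sizes)
}

func NewFileSizeTracker(capacity int) *FileSizeTracker {
	return &FileSizeTracker{sizes: make([]int64, capacity)}
}

// Record stores one size, overwriting the oldest sample when the ring is full.
func (t *FileSizeTracker) Record(size int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.sizes[t.next] = size
	t.next = (t.next + 1) % len(t.sizes)
	if t.count < len(t.sizes) {
		t.count++
	}
}

// Median returns the upper median of the recorded sizes, or 0 when empty.
func (t *FileSizeTracker) Median() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.count == 0 {
		return 0
	}
	window := append([]int64(nil), t.sizes[:t.count]...) // copy so sorting leaves the ring intact
	sort.Slice(window, func(i, j int) bool { return window[i] < window[j] })
	return window[len(window)/2]
}
```

Sorting a 1000-entry copy every 30 s is cheap, so a more elaborate streaming median is probably not needed at this sample size.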

Cache router:
- TryCacheRead() checks NFS export + /mcache before blobber download
- Enables cross-protocol visibility (NFS write → S3 GET from local cache)
- Tracks hit/miss stats with periodic logging

Benchmark cleanup:
- tests/cleanup_bench.sh: cleans caches, truncates logs, reports disk usage
- Supports --dry-run flag
…e architecture doc

The cache_router inside zs3server only checks /nfs_export for cross-protocol
reads (NFS write → S3 GET). MinIO's cache layer already handles /mcache
lookups before our handler runs — no need to duplicate that.

Updated NFS_ARCHITECTURE.md with full multi-level cache diagram:
  Level 1: Router (/router repo) — compute node local NVMe cache
  Level 2: zs3server — /mcache (S3) + /nfs_export (NFS) + CacheRouter
  Level 3: Blobbers — erasure-coded persistent storage

Added data flow diagrams for:
  - App requests S3 object (full cache hierarchy)
  - App writes via NFS (Ganesha → tmpfs → blobber sync)
  - External S3 object not on Züs (Router pulls from AWS → stores in Züs)
…3-upstream fallback, router-order fix, stub xattr defense

New internal HTTP endpoints consumed by the FSAL_ZUS Ganesha plugin to implement
NFS read-through-blobbers + directory listing union.

- POST /internal/prewarm {bucket, key} — fetch object from Züs into /nfs_export
  via existing getFileReader, using singleflight dedup + MarkCommitted anti-loop
  + os.Rename for atomicity. Fast-path when file already non-stub-cached.
- GET /internal/list?bucket=X&prefix=Y[&stub=1] — returns Züs directory listing.
  With stub=1, creates sparse placeholders with real sizes + user.zus.stub xattr
  so apps see all entries via readdir; actual content arrives on first open via
  prewarm. 30s TTL cache.
- S3-upstream fallback (config-gated) — on Züs 404, fetch from configured
  external S3 (AWS/MinIO), stream to client via io.Pipe + TeeReader, async
  cache-back to Züs. minio-go client, singleflight dedup on bucket/key.
  Replaces the role of the hypothetical /router repo; single source of truth.

BlobberSync hardening
- nfs_blobber_sync.go: new MarkCommitted(relPath) method.
- nfs_blobber_sync.go: always skip files with user.zus.stub xattr (defense in
  depth — MarkCommitted alone insufficient due to fsnotify Create firing before
  caller can populate the committed map).

Router-order bug fix (gateway-main.go)
- GatewayExtraRouters must register BEFORE registerSTSRouter. STS uses
  PathPrefix(/) + MatcherFunc(POST && form-urlencoded && no queries), which
  hijacked POST /internal/* whenever the client omitted Content-Type: application/json
  (curl's default is form-urlencoded); those requests came back as 400
  MissingParameter STS errors.

Measured results (SF=10 TPC-DS, 3.8 GB parquet, 452 files, 4+1 enterprise alloc)
- TPC-DS 10 queries, via /mnt/zus_nfs + FSAL_ZUS: 222.24s (vs NVMe 180.05s, +23%)
- S3-API cold: 223.08s; warm: 221.43s (warm≈cold, workload is CPU-bound at SF=10)
- fio seq_read 1M NFS: 577 MB/s (45% of NVMe 1294 MB/s)
- Prewarm throughput: 547-624 MB/s sustained at 16-way concurrency (Züs→tmpfs)
- Cold single-file NFS read (stub→prewarm→retry): 389ms for 1.9MB

Architecture documented in FSAL_ZUS_ARCHITECTURE.md.
Mount options in FSAL_ZUS_MOUNT.md.
Architecture:
  App → zs3server → /nfs_export (tmpfs, unified cache for S3+NFS)
    HIT → serve from tmpfs (5K+ obj/s)
    MISS → blobbers (GoSDK)
    MISS → external S3 (AWS) → write to /nfs_export → blobber sync

Key changes:
- external_s3.go: Router function fetches from AWS S3 on Züs miss,
  writes to /nfs_export so blobber sync commits to Züs
- gateway-zcn.go PutObject: S3 PUT now writes to /nfs_export
  (unified cache) so NFS clients see S3-written files instantly
- gateway-zcn.go GetObjectNInfo: 3-level lookup:
  /nfs_export → blobbers → external S3
- cache_router.go: simplified to check /nfs_export only
  (MinIO /mcache handled by MinIO's own cache layer)
- NFS_ARCHITECTURE.md: full multi-level cache diagram

Config:
  "external_s3_endpoint": "https://s3.amazonaws.com"
  "external_s3_region": "us-east-1"
…-safe eviction

Architecture
- Three tiers: tmpfs (/nfs_export) → spillover (/root/nfs_spillover) → blobber.
- Unified layout: S3 and NFS share /nfs_export paths.
- Per-file state via xattrs: user.zus.stub (placeholder) / user.zus.committed (real).
- Fully configurable: nfs_tmpfs_cache_enabled, nfs_spillover_cache_enabled,
  nfs_spillover_max_bytes, nfs_cache_disabled.

Routes (new)
- cache_router.go: TryCacheRead — full-file serve from tmpfs or spillover
  without going through gosdk. Skips stubs via user.zus.stub xattr.
- cache_clear_router.go, cache_stats_router.go: /internal/cache_clear,
  /internal/cache_stats operational endpoints.
- commit_router.go: /internal/commit for NFS write-marker integration.
- fallback_s3.go: upstream S3 fallback on Züs 404 (replaces external_s3.go).
- list_router.go: /internal/list + stub materialisation for NFS readdir.
- prewarm_router.go: /internal/prewarm fetch-from-blobber, io.Copy to tmpfs,
  xattr gating, short-read / ENOSPC stub restoration.
- inode_rel_map.go: inode → rel_path map for FSAL_ZUS handle-to-path recovery.
- mirror_s3_to_export.go: S3 PUT → /nfs_export tree mirroring for NFS visibility.

Read path (S3 — gateway-zcn.go)
- Fix A: local-file fast path. Serves ranges directly from tmpfs fd when
  file has user.zus.committed + blocks>0. Bypasses gosdk, consensus,
  erasure decode.
- Lazy cache-back on range-read miss: getFileReader serves the range +
  spawns cacheBackFullFetch in background. Deduped via cacheBackInflight
  sync.Map, gated by RecentlyEvicted (120 s cool-off).
- Full-file read cache-back-tee: TeeReader fills tmpfs as bytes flow to client.
- s3ContentHash used only for directory ETag; never written as file body.
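Deduplicating background fetches with a `sync.Map`, as the cacheBackInflight bullet describes, usually comes down to `LoadOrStore`. The sketch below uses hypothetical names (`inflight`, `tryStart`, `done`) and leaves out the RecentlyEvicted cool-off.

```go
package main

import "sync"

// inflight tracks keys that already have a background fetch running.
type inflight struct {
	keys sync.Map
}

// tryStart reports true only for the first caller of a given key;
// later callers see the stored marker and skip the fetch.
func (f *inflight) tryStart(key string) bool {
	_, loaded := f.keys.LoadOrStore(key, struct{}{})
	return !loaded
}

// done clears the marker so a later miss may fetch the key again.
func (f *inflight) done(key string) {
	f.keys.Delete(key)
}
```

A range-read miss would call `tryStart` before spawning the full-file fetch and `defer done(key)` inside the goroutine.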

NFS write path
- PutObject: MarkCommitted before open (inotify skip gate) → io.Copy to
  /nfs_export → putFile to blobbers → Setxattr committed after success.
- nfs_blobber_sync: inotify watcher + batched upload with per-path locks
  serialising S3-write / NFS-write / spill / cache-back / commit on one
  relPath.
- Skip gates: HasPrefix(basename, "."), .cacheback suffix, .cachefetch
  suffix (so temp files mid-rename never upload to blobber).

Eviction (nfs_blobber_sync.go)
- spilloverMonitor: 1 s tick; if tmpfs > 60% → SpillNow.
- spillCommittedFiles: oldest-mtime candidate; skip if openFdInodes hits,
  atime < 120 s, or committed xattr missing. Copies to spillover, then
  under cross-process OFD F_WRLCK on the tmpfs inode: Removexattr
  committed → O_TRUNC → Truncate(origSize) sparse stub → Setxattr stub →
  release. Paired with FSAL_ZUS F_RDLCK so spill blocks while reader
  holds fd.
- EnsureFreeTmpfs: skip-failed-candidate loop (was bailing on first
  spill error, leaving prewarm ENOSPC-stuck).
- evictSpilloverOldest: skip files with mtime < 120 s OR atime < 120 s
  (strictatime mount bumps atime on every read; catches active spillover
  readers).
- OFD locks (F_OFD_SETLKW/F_OFD_SETLK) everywhere; POSIX fcntl locks are
  released on any same-process close of the file, which was dropping the
  guard earlier than intended.
- RecentlyEvicted TTL 30 s → 120 s (longer cool-off against re-cache-back
  of just-evicted keys).

SF10 TPC-DS benches (10-query, SF10 ≈ 3.8 GB, test2 on localhost):
- NFS 8G/0:  228 s  (best; near Apr-14 baseline 223 s)
- NFS 1G/1G: 366 s  (no spillover benefit on this setup)
- S3  8G/0:  386 s  (Fix A fast path, no thrash)
- S3  0/1G:  385 s  (spillover-only; TryCacheRead serves full-file reads)
- S3  1G/1G: 752 s  (tmpfs thrash actively harmful at dataset > cache)
- S3  1G/0:  1135 s (worst — cache thrash + Fix A lock overhead > raw blobber)

Conclusion: single-tier tmpfs sized ≥ working set is the best config;
spillover becomes valuable only when blobber fetch is expensive (WAN) or
working set >> tmpfs.
Full 9-config sweep on SF10 dataset (test2, single node):
- Best: NFS 8G tmpfs = 228 s (near Apr-14 baseline 223 s)
- Worst: S3 1G tmpfs no-spill = 1135 s (cache thrash > no cache)
- Spillover adds zero benefit when blobbers are on localhost; only
  wins for WAN blobbers or dataset >> cache.
- S3 "no-thrash" configs converge at ~385 s — per-range HTTP/MinIO
  overhead dominates once cache stops thrashing.

Documents the cache architecture, read paths, eviction policy, and
all fixes landed this session (lazy cache-back, OFD locks,
atime-based eviction grace, EnsureFreeTmpfs skip-fail, prewarm
ENOSPC stub restore, FSAL_ZUS EIO fallback).

Recommended default: single-tier tmpfs sized ≥ working set.
…r skip

wal.go:
- Delete keeps the walDeleted entry alive with a Timestamp so a racing
  GET can detect 'recently deleted' for 60s (WasRecentlyDeleted).
- ClearTombstone removes the marker on successful PUT so visibility is
  restored after a DELETE → PUT sequence.
- maybeCompact harvests expired (>60s) tombstones.

gateway-zcn.go:
- GetObjectNInfo and GetObjectInfo consult WasRecentlyDeleted and
  return 404 while the tombstone is live, closing the read-after-
  delete race at the gateway layer before gosdk consensus.
- PUT now writes to a tmp path and renames in-place so a racing GET
  never sees a partial file (closes truncate-race on overwrite).
- ClearTombstone fires on successful PUT (and on CompleteMultipart).
- ShouldCacheFile admission gate wired into both range-miss and
  full-miss cache-back paths (size-fits + hit-rate + not-recently-
  evicted), stopping the thrash ratio from climbing above ~1.05 on
  S3 1G/0 configs.
- largeObjectCacheSkipBytes=1 GiB: objects ≥1 GiB skip tmpfs entirely
  and spool to spillover NVMe — fixes the Llama3 1.5 GB PUT ENOSPC
  crash caused by peak 2× usage during write-then-rename.
- RecordGet at the top of GetObjectNInfo so the predictor sees every
  request, not just cache-hits.
- NFS counters (tmpfs/spillover/prewarm) wired via prewarmHandler.

multipart.go: CompleteMultipartUpload clears the tombstone on success.

Validated: Porcupine 10/10 LINEARIZABLE at w=4/400 and w=8/400 ops on
put/get/del/copy/move/rename; S3 thrash ratio 2.95× → 1.06× after
admission gate.
prefetch_predictor.go (new, ~200 LOC):
- Per-dir ring buffer of recent GET keys.
- Sorted-run detection: when the last N requests in a dir form a
  monotonic sequence, dispatch cacheBackFullFetch for the next N keys
  via alloc.GetRefs so sequential workloads (MLPerf readouts, TPC-DS
  full-table scans) stream through tmpfs with no blobber round-trips.
- Thread-safe per-dir locks, bounded buffer size, backoff on evict.
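Sorted-run detection can be reduced to a check over the tail of the per-directory key history. This sketch assumes "monotonic" means strictly ascending in lexicographic order; `isSortedRun` is a hypothetical name and the real predictor may compare differently.

```go
package main

// isSortedRun reports whether the last n keys are strictly increasing.
// It returns false when fewer than n keys are available or n < 2.
func isSortedRun(keys []string, n int) bool {
	if n < 2 || len(keys) < n {
		return false
	}
	tail := keys[len(keys)-n:]
	for i := 1; i < len(tail); i++ {
		if tail[i-1] >= tail[i] {
			return false
		}
	}
	return true
}
```

When the check passes, the predictor would list the directory and schedule fetches for the keys that sort after the most recent one.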

cache_router.go:
- Wrap served readers in klauspost/readahead so OS-level readers see
  1 MiB prefetched chunks regardless of caller buffer size.
- Issue FADV_SEQUENTIAL + FADV_WILLNEED via unix.Fadvise on cache
  files before handing the fd to the HTTP writer — doubles effective
  sequential MB/s on spillover-NVMe hits.
initSDK.go:
- FallbackS3WriteThrough config field accepts "async" | "mirror" |
  "" (off). async posts to a bounded worker pool; mirror fails the
  PUT if the upstream S3 write fails.
- FallbackS3WriteMaxBytes caps per-object mirror size so huge
  checkpoints do not block the PUT ack path.

fallback_s3_write.go (new, ~250 LOC):
- syncPutToUpstream / syncPutStreamToUpstream / syncDeleteFromUpstream
  with retry + exponential backoff and a 16-way bounded concurrency
  channel.
- Optional mirror mode surfaces upstream errors on the PUT response
  so multi-region deployments can guarantee cross-region landing
  before ack.

nfs_blobber_sync.go:
- spilloverMonitor is now gated on NFSSpilloverCacheEnabled so
  deployments with tmpfs-only cache skip the monitor goroutine.
- commitBatch and commitDeleteBatch invoke the write-through hooks
  from fallback_s3_write.go so the NFS path also fans out to the
  upstream S3 when configured.
- Removed duplicate stubbuf declaration that tripped the linter.
cache_stats_router.go: expose nfs_tmpfs_hits, nfs_spillover_hits, and
nfs_prewarm_fetches in the /cache-stats JSON body so operators can
distinguish tmpfs-only hits from spillover-NVMe promotion and from
cold cacheBack fetches.

prewarm_router.go: atomic.AddInt64 bumps at the 4 prewarm entry
points feed the counters above.
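Counters bumped with `atomic.AddInt64` and exposed as JSON could be wired roughly like this. The JSON field names are taken from the commit message; the struct and method names are hypothetical.

```go
package main

import (
	"encoding/json"
	"sync/atomic"
)

// nfsCounters holds the three hit counters; fields are updated atomically.
type nfsCounters struct {
	tmpfsHits      int64
	spilloverHits  int64
	prewarmFetches int64
}

func (c *nfsCounters) hitTmpfs()     { atomic.AddInt64(&c.tmpfsHits, 1) }
func (c *nfsCounters) hitSpillover() { atomic.AddInt64(&c.spilloverHits, 1) }
func (c *nfsCounters) prewarmFetch() { atomic.AddInt64(&c.prewarmFetches, 1) }

// snapshotJSON reads each counter atomically and encodes them as one JSON object.
func (c *nfsCounters) snapshotJSON() ([]byte, error) {
	return json.Marshal(map[string]int64{
		"nfs_tmpfs_hits":      atomic.LoadInt64(&c.tmpfsHits),
		"nfs_spillover_hits":  atomic.LoadInt64(&c.spilloverHits),
		"nfs_prewarm_fetches": atomic.LoadInt64(&c.prewarmFetches),
	})
}
```

Each counter is read independently, so a snapshot taken under load is not a single consistent point in time, which is normally acceptable for operational stats.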
go.work + go.work.sum: go 1.22.5 → 1.24.0, toolchain go1.22.11 →
go1.24.5. Required by the new dependencies pulled in by the
prefetch predictor (klauspost/readahead) and aligns the workspace
with the system_test toolchain bump.
…ture

DEPLOYMENT.md (new, ~400 lines): three deployment topologies
(co-located, remote-gateway, containerised) with ports, setup
commands, data-flow diagrams, and pricing model for multi-gateway
fan-out.

BENCHMARKS_2026_04_19.md: extended with PM addendum, kNFSd +
pipelined-AU section, methodology correction (NFS path was /nfs_export
raw tmpfs, now /mnt/zus_nfs through Ganesha), and final NFS re-bench
under /mnt/zus_nfs — SF10 NFS 1G/0 now 206s (vs 363s historic baseline).

BENCHMARKS_2026_04_15.md: SF10 baseline + cache-tier matrix (tmpfs +
spillover combinations) captured before the 04-19 fixes landed.

CLAUDE.md: Claude Code build/test/lint playbook for this repo.

LOGCACHE_ARCHITECTURE.md: design notes for the prospective log-cache
layer (not yet implemented).
…rm size-mismatch accept

- gateway-zcn.go: ListBuckets switches from GetRefs to ListDir so that
  directories created implicitly by rclone uploads (no explicit mkdir
  WriteMarker) are discovered; GetBucketInfo also probes ListDir as an
  implicit-dir fallback before returning BucketNotFound
- initSDK.go: pre-seed gosdk node cache with SetNetwork before
  InitStorageSDK so that InitNetworkDetails can use the cached nodes
  as a fallback when 0DNS is transiently unreachable at startup
- prewarm_router.go: split empty-read (n==0, restore stub) from
  size-mismatch (n>0 but n!=expected, accept actual bytes) — stale
  ActualFileSize in the best-effort filerefsworker fallback ref was
  triggering stub restore and starving subsequent reads
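The split between an empty read and a size mismatch in the prewarm path can be expressed as a small classifier. This is a sketch of the rule as stated above, with hypothetical names; how the real handler treats a genuinely zero-length object is not specified, so here any zero-byte read restores the stub.

```go
package main

type prewarmAction int

const (
	acceptBytes prewarmAction = iota // keep the bytes that were fetched
	restoreStub                      // put the sparse placeholder back
)

// classifyPrewarm decides what to do after copying n bytes for a file whose
// reference claimed expected bytes. sizeMismatch is informational (for logging).
func classifyPrewarm(n, expected int64) (action prewarmAction, sizeMismatch bool) {
	if n == 0 {
		return restoreStub, expected != 0
	}
	return acceptBytes, n != expected
}
```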


2 participants