From c1135c911aed09a1b45b5f4f42c4dc049dfcf507 Mon Sep 17 00:00:00 2001 From: Rajat Arya Date: Tue, 17 Mar 2026 19:33:28 -0700 Subject: [PATCH 1/5] Add mermaid diagrams to Xet protocol specification docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - index.md: block diagram showing overall Xet architecture (file → chunks → xorbs → shard → CAS) - xorb.md: packet diagram for chunk header wire layout (replaces ASCII art) - shard.md: packet diagrams for all binary structures — header, FileDataSequenceHeader/Entry, FileVerificationEntry, FileMetadataExt, CASChunkSequenceHeader/Entry, footer (replaces ASCII art) - chunking.md: flowchart for the CDC boundary decision algorithm - hashing.md: flowchart showing the four hash computation paths (chunk, xorb, file, verification) - file-id.md: sequence diagram for the resolve URL → X-Xet-Hash flow Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/xet/chunking.md | 18 ++++++ docs/xet/file-id.md | 14 +++++ docs/xet/hashing.md | 15 +++++ docs/xet/index.md | 45 ++++++++++++++ docs/xet/shard.md | 138 +++++++++++++++++++++++-------------------- docs/xet/xorb.md | 16 ++--- 6 files changed, 175 insertions(+), 71 deletions(-) diff --git a/docs/xet/chunking.md b/docs/xet/chunking.md index 1d501e044f..16db8fcc11 100644 --- a/docs/xet/chunking.md +++ b/docs/xet/chunking.md @@ -48,6 +48,24 @@ When a boundary found or taken: At end-of-file, if `start_offset < len(data)`, emit the final chunk `[start_offset, len(data))`. 
+### Decision Flowchart + +```mermaid +flowchart TD + A["Read next byte b"] --> B["h = (h << 1) + TABLE[b]"] + B --> C["size = offset - start + 1"] + C --> D{"size < MIN_CHUNK_SIZE\n(8 KiB)?"} + D -->|Yes| A + D -->|No| E{"size >= MAX_CHUNK_SIZE\n(128 KiB)?"} + E -->|Yes| G["Emit chunk, reset h = 0"] + E -->|No| F{"(h & MASK) == 0?"} + F -->|Yes| G + F -->|No| A + G --> H{"End of file?"} + H -->|No| A + H -->|Yes| I["Emit final chunk if data remains"] +``` + ### Pseudocode ```text diff --git a/docs/xet/file-id.md b/docs/xet/file-id.md index 9e4bae8afd..8673596e7c 100644 --- a/docs/xet/file-id.md +++ b/docs/xet/file-id.md @@ -31,3 +31,17 @@ This is the string representation of the hash and can be used directly in the fi > [!NOTE] > The resolve URL will return a 302 redirect http status code, following the redirect will download the content via the old LFS compatible route rather than through the Xet protocol. In order to use the Xet protocol you MUST NOT follow this redirect. + +```mermaid +sequenceDiagram + autonumber + actor C as Client + participant Hub as Hugging Face Hub + + C->>Hub: GET /namespace/repo/resolve/branch/filepath
Authorization: Bearer + Hub-->>C: 302 Redirect + X-Xet-Hash header + + Note over C: Extract X-Xet-Hash value = Xet File ID
Do NOT follow the 302 redirect + + C->>C: Use File ID with CAS Reconstruction API +``` diff --git a/docs/xet/hashing.md b/docs/xet/hashing.md index 6b2c3ab7c9..0e6f67a9ac 100644 --- a/docs/xet/hashing.md +++ b/docs/xet/hashing.md @@ -9,6 +9,21 @@ The Xet protocol utilizes a few different hashing types. All hashes referenced are 32 bytes (256 bits) long. +```mermaid +flowchart LR + subgraph Input + CD["Chunk Data"] + CH["Chunk Hashes"] + end + + CD -->|"blake3(data, DATA_KEY)"| ChunkHash["Chunk Hash"] + ChunkHash --> CH + + CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY"| XorbHash["Xorb Hash"] + CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY\nthen blake3(root, zeros)"| FileHash["File Hash"] + CH -->|"blake3(concat hashes,\nVERIFICATION_KEY)"| VerifHash["Term Verification Hash"] +``` + ## Chunk Hashes After cutting a chunk of data, the chunk hash is computed via a blake3 keyed hash with the following key (DATA_KEY): diff --git a/docs/xet/index.md b/docs/xet/index.md index 18faa58822..b2daeb4fff 100644 --- a/docs/xet/index.md +++ b/docs/xet/index.md @@ -19,6 +19,51 @@ Implementors can create their own clients, SDKs, and tools that speak the Xet pr ## Overall Xet Architecture +```mermaid +block + columns 3 + File["📄 File"] + space + space + + CDC["Chunking (CDC)"] + space + space + + block:chunks + columns 5 + C0["Chunk 0"] C1["Chunk 1"] C2["Chunk 2"] C3["..."] C4["Chunk N"] + end + + space + space + space + + block:xorbs + columns 2 + X0["Xorb A\n(chunks 0–1023)"] + X1["Xorb B\n(chunks 1024–N)"] + end + + space + Shard["Shard\n(file reconstructions\n+ xorb metadata)"] + + space + space + space + + CAS["CAS Server\n(Content Addressable Storage)"] + space + space + + File --> CDC + CDC --> chunks + chunks --> xorbs + xorbs --> Shard + xorbs --> CAS + Shard --> CAS +``` + - [Content-Defined Chunking](./chunking): Gearhash-based CDC with parameters, boundary rules, and performance optimizations. 
- [Hashing Methods](./hashing): Descriptions and definitions of the different hashing functions used for chunks, xorbs and term verification entries. - [File Reconstruction](./file-reconstruction): Defining "term"-based representation of files using xorb hash + chunk ranges. diff --git a/docs/xet/shard.md b/docs/xet/shard.md index ab62b370e1..0e05e88620 100644 --- a/docs/xet/shard.md +++ b/docs/xet/shard.md @@ -116,12 +116,14 @@ struct MDBShardFileHeader { **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬───────────┬───────────┐ -│ tag (32 bytes) │ version │ footer_sz │ -│ Magic Number Identifier │ (8 bytes) │ (8 bytes) │ -└────────────────────────────────────────────────────────────────┴───────────┴───────────┘ -0 32 40 48 +```mermaid +--- +title: "MDBShardFileHeader (48 bytes)" +--- +packet + 0-255: "tag (32 bytes) — Magic Number Identifier" + 256-319: "version (u64)" + 320-383: "footer_size (u64)" ``` **Deserialization steps**: @@ -220,12 +222,15 @@ Given the `file_data_sequence_header.file_flags & MASK` (bitwise AND) operations **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬──────────┬───────────┬────────────┐ -│ file_hash (32 bytes) │file_flags│num_entries│ _unused │ -│ File Hash Value │(4 bytes) │(4 bytes) │ (8 bytes) │ -└────────────────────────────────────────────────────────────────┴──────────┴───────────┴────────────┘ -0 32 36 40 48 +```mermaid +--- +title: "FileDataSequenceHeader (48 bytes)" +--- +packet + 0-255: "file_hash (32 bytes)" + 256-287: "file_flags (u32)" + 288-319: "num_entries (u32)" + 320-383: "_unused (8 bytes)" ``` ### FileDataSequenceEntry @@ -247,13 +252,16 @@ struct FileDataSequenceEntry { **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐ -│ cas_hash (32 bytes) │cas_flags│unpacked │chunk_idx│chunk_idx│ -│ CAS Block Hash │(4 bytes)│seg_bytes│start │end │ -│ │ │(4 
bytes)│(4 bytes)│(4 bytes)│ -└────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘ -0 32 36 40 44 48 +```mermaid +--- +title: "FileDataSequenceEntry (48 bytes)" +--- +packet + 0-255: "cas_hash (32 bytes) — Xorb Hash" + 256-287: "cas_flags (u32)" + 288-319: "unpacked_segment_bytes (u32)" + 320-351: "chunk_index_start (u32)" + 352-383: "chunk_index_end (u32)" ``` ### FileVerificationEntry (OPTIONAL) @@ -271,12 +279,13 @@ struct FileVerificationEntry { **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬────────────────────────────────┐ -│ range_hash (32 bytes) │ _unused (16 bytes) │ -│ Verification Hash │ Reserved Space │ -└────────────────────────────────────────────────────────────────┴────────────────────────────────┘ -0 32 48 +```mermaid +--- +title: "FileVerificationEntry (48 bytes)" +--- +packet + 0-255: "range_hash (32 bytes) — Verification Hash" + 256-383: "_unused (16 bytes)" ``` When a shard has verification entries, all file info sections MUST have verification entries. 
@@ -302,12 +311,13 @@ struct FileMetadataExt { **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬────────────────────────────────┐ -│ sha256 (32 bytes) │ _unused (16 bytes) │ -│ SHA256 Hash │ Reserved Space │ -└────────────────────────────────────────────────────────────────┴────────────────────────────────┘ -0 32 48 +```mermaid +--- +title: "FileMetadataExt (48 bytes)" +--- +packet + 0-255: "sha256 (32 bytes) — SHA256 Hash" + 256-383: "_unused (16 bytes)" ``` ### File Info Bookend @@ -381,13 +391,16 @@ struct CASChunkSequenceHeader { **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐ -│ cas_hash (32 bytes) │cas_flags│num_ │num_bytes│num_bytes│ -│ CAS Block Hash │(4 bytes)│entries │in_cas │on_disk │ -│ │ │(4 bytes)│(4 bytes)│(4 bytes)│ -└────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘ -0 32 36 40 44 48 +```mermaid +--- +title: "CASChunkSequenceHeader (48 bytes)" +--- +packet + 0-255: "cas_hash (32 bytes) — Xorb Hash" + 256-287: "cas_flags (u32)" + 288-319: "num_entries (u32)" + 320-351: "num_bytes_in_cas (u32)" + 352-383: "num_bytes_on_disk (u32)" ``` ### CASChunkSequenceEntry @@ -406,15 +419,15 @@ struct CASChunkSequenceEntry { **Memory Layout**: -```txt -┌────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────────────┐ -│ chunk_hash (32 bytes) │chunk_ │unpacked │ _unused │ -│ Chunk Hash │byte_ │segment_ │ (8 bytes) │ -│ │range_ │bytes │ │ -│ │start │(4 bytes)│ │ -│ │(4 bytes)│ │ │ -└────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────────────┘ -0 32 36 40 48 +```mermaid +--- +title: "CASChunkSequenceEntry (48 bytes)" +--- +packet + 0-255: "chunk_hash (32 bytes)" + 256-287: "chunk_byte_range_start (u32)" + 288-319: "unpacked_segment_bytes (u32)" + 320-383: "_unused (8 bytes)" ``` ### CAS Info Bookend 
@@ -451,23 +464,20 @@ struct MDBShardFileFooter { **Memory Layout**: -> [!NOTE] -> Fields are not exactly to scale - -```txt -┌─────────┬─────────┬─────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────────┐ -│ version │file_info│cas_info │ _buffer (reserved) │ chunk_hash_hmac_key │ -│(8 bytes)│offset │offset │ (48 bytes) │ (32 bytes) │ -│ │(8 bytes)│(8 bytes)│ │ │ -└─────────┴─────────┴─────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────────┘ -0 8 16 24 72 104 - -┌─────────┬──────────┬─────────────────────────────────────────────────────────────────────────────┬─────────┐ -│creation │shard_ │ _buffer (reserved) │footer_ │ -│timestamp│key_expiry│ (72 bytes) │offset │ -│(8 bytes)│ (8 bytes)│ │(8 bytes)│ -└─────────┴──────────┴─────────────────────────────────────────────────────────────────────────────┴─────────┘ -104 112 120 192 200 +```mermaid +--- +title: "MDBShardFileFooter (200 bytes)" +--- +packet + 0-63: "version (u64)" + 64-127: "file_info_offset (u64)" + 128-191: "cas_info_offset (u64)" + 192-575: "_buffer (48 bytes reserved)" + 576-831: "chunk_hash_hmac_key (32 bytes)" + 832-895: "shard_creation_timestamp (u64)" + 896-959: "shard_key_expiry (u64)" + 960-1535: "_buffer2 (72 bytes reserved)" + 1536-1599: "footer_offset (u64)" ``` **Deserialization steps**: diff --git a/docs/xet/xorb.md b/docs/xet/xorb.md index e043b129ec..6ab1fe1c54 100644 --- a/docs/xet/xorb.md +++ b/docs/xet/xorb.md @@ -58,13 +58,15 @@ the uncompressed size also being at a maximum of 128KiB. 
#### Chunk Header Layout -```txt -┌─────────┬─────────────────────────────────┬──────────────┬─────────────────────────────────┐ -│ Version │ Compressed Size │ Compression │ Uncompressed Size │ -│ 1 byte │ 3 bytes │ Type │ 3 bytes │ -│ │ (little-endian) │ 1 byte │ (little-endian) │ -└─────────┴─────────────────────────────────┴──────────────┴─────────────────────────────────┘ -0 1 4 5 8 +```mermaid +--- +title: "Chunk Header (8 bytes)" +--- +packet + 0-7: "Version" + 8-31: "Compressed Size (LE)" + 32-39: "Compression Type" + 40-63: "Uncompressed Size (LE)" ``` ### Chunk Compression Schemes From 2b8ab4095d394bf58195b572089de35c9337b1b4 Mon Sep 17 00:00:00 2001 From: Rajat Arya Date: Tue, 17 Mar 2026 19:39:49 -0700 Subject: [PATCH 2/5] Switch packet diagrams to byte-level addressing for readability All packet diagrams now use 1 unit = 1 byte instead of 1 unit = 1 bit. This prevents 32-byte hash fields from spanning 8 rows with repeated labels, making the diagrams much more compact and readable. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/xet/shard.md | 68 +++++++++++++++++++++++------------------------ docs/xet/xorb.md | 8 +++--- 2 files changed, 38 insertions(+), 38 deletions(-) diff --git a/docs/xet/shard.md b/docs/xet/shard.md index 0e05e88620..31d3a87e9d 100644 --- a/docs/xet/shard.md +++ b/docs/xet/shard.md @@ -121,9 +121,9 @@ struct MDBShardFileHeader { title: "MDBShardFileHeader (48 bytes)" --- packet - 0-255: "tag (32 bytes) — Magic Number Identifier" - 256-319: "version (u64)" - 320-383: "footer_size (u64)" + 0-31: "tag (32 bytes) — Magic Number Identifier" + 32-39: "version (u64)" + 40-47: "footer_size (u64)" ``` **Deserialization steps**: @@ -227,10 +227,10 @@ Given the `file_data_sequence_header.file_flags & MASK` (bitwise AND) operations title: "FileDataSequenceHeader (48 bytes)" --- packet - 0-255: "file_hash (32 bytes)" - 256-287: "file_flags (u32)" - 288-319: "num_entries (u32)" - 320-383: "_unused (8 bytes)" + 0-31: "file_hash (32 bytes)" + 32-35: "file_flags (u32)" + 36-39: "num_entries (u32)" + 40-47: "_unused (8 bytes)" ``` ### FileDataSequenceEntry @@ -257,11 +257,11 @@ struct FileDataSequenceEntry { title: "FileDataSequenceEntry (48 bytes)" --- packet - 0-255: "cas_hash (32 bytes) — Xorb Hash" - 256-287: "cas_flags (u32)" - 288-319: "unpacked_segment_bytes (u32)" - 320-351: "chunk_index_start (u32)" - 352-383: "chunk_index_end (u32)" + 0-31: "cas_hash (32 bytes) — Xorb Hash" + 32-35: "cas_flags (u32)" + 36-39: "unpacked_segment_bytes (u32)" + 40-43: "chunk_index_start (u32)" + 44-47: "chunk_index_end (u32)" ``` ### FileVerificationEntry (OPTIONAL) @@ -284,8 +284,8 @@ struct FileVerificationEntry { title: "FileVerificationEntry (48 bytes)" --- packet - 0-255: "range_hash (32 bytes) — Verification Hash" - 256-383: "_unused (16 bytes)" + 0-31: "range_hash (32 bytes) — Verification Hash" + 32-47: "_unused (16 bytes)" ``` When a shard has verification entries, all file info sections MUST have verification entries. 
@@ -316,8 +316,8 @@ struct FileMetadataExt { title: "FileMetadataExt (48 bytes)" --- packet - 0-255: "sha256 (32 bytes) — SHA256 Hash" - 256-383: "_unused (16 bytes)" + 0-31: "sha256 (32 bytes) — SHA256 Hash" + 32-47: "_unused (16 bytes)" ``` ### File Info Bookend @@ -396,11 +396,11 @@ struct CASChunkSequenceHeader { title: "CASChunkSequenceHeader (48 bytes)" --- packet - 0-255: "cas_hash (32 bytes) — Xorb Hash" - 256-287: "cas_flags (u32)" - 288-319: "num_entries (u32)" - 320-351: "num_bytes_in_cas (u32)" - 352-383: "num_bytes_on_disk (u32)" + 0-31: "cas_hash (32 bytes) — Xorb Hash" + 32-35: "cas_flags (u32)" + 36-39: "num_entries (u32)" + 40-43: "num_bytes_in_cas (u32)" + 44-47: "num_bytes_on_disk (u32)" ``` ### CASChunkSequenceEntry @@ -424,10 +424,10 @@ struct CASChunkSequenceEntry { title: "CASChunkSequenceEntry (48 bytes)" --- packet - 0-255: "chunk_hash (32 bytes)" - 256-287: "chunk_byte_range_start (u32)" - 288-319: "unpacked_segment_bytes (u32)" - 320-383: "_unused (8 bytes)" + 0-31: "chunk_hash (32 bytes)" + 32-35: "chunk_byte_range_start (u32)" + 36-39: "unpacked_segment_bytes (u32)" + 40-47: "_unused (8 bytes)" ``` ### CAS Info Bookend @@ -469,15 +469,15 @@ struct MDBShardFileFooter { title: "MDBShardFileFooter (200 bytes)" --- packet - 0-63: "version (u64)" - 64-127: "file_info_offset (u64)" - 128-191: "cas_info_offset (u64)" - 192-575: "_buffer (48 bytes reserved)" - 576-831: "chunk_hash_hmac_key (32 bytes)" - 832-895: "shard_creation_timestamp (u64)" - 896-959: "shard_key_expiry (u64)" - 960-1535: "_buffer2 (72 bytes reserved)" - 1536-1599: "footer_offset (u64)" + 0-7: "version (u64)" + 8-15: "file_info_offset (u64)" + 16-23: "cas_info_offset (u64)" + 24-71: "_buffer (48 bytes reserved)" + 72-103: "chunk_hash_hmac_key (32 bytes)" + 104-111: "shard_creation_timestamp (u64)" + 112-119: "shard_key_expiry (u64)" + 120-191: "_buffer2 (72 bytes reserved)" + 192-199: "footer_offset (u64)" ``` **Deserialization steps**: diff --git a/docs/xet/xorb.md 
b/docs/xet/xorb.md index 6ab1fe1c54..57dcbe1231 100644 --- a/docs/xet/xorb.md +++ b/docs/xet/xorb.md @@ -63,10 +63,10 @@ the uncompressed size also being at a maximum of 128KiB. title: "Chunk Header (8 bytes)" --- packet - 0-7: "Version" - 8-31: "Compressed Size (LE)" - 32-39: "Compression Type" - 40-63: "Uncompressed Size (LE)" + 0: "Version" + 1-3: "Compressed Size (LE)" + 4: "Compression Type" + 5-7: "Uncompressed Size (LE)" ``` ### Chunk Compression Schemes From f85bad439915b81f1908a54dd6705f3570978559 Mon Sep 17 00:00:00 2001 From: Rajat Arya Date: Tue, 17 Mar 2026 19:42:56 -0700 Subject: [PATCH 3/5] Use bit-level addressing for xorb chunk header packet diagram The 8-byte chunk header is too small for byte-level units (cells are unreadably tiny on the 32-unit row). Bit-level gives 2 well-proportioned rows of 32 bits each with readable labels. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/xet/xorb.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/xet/xorb.md b/docs/xet/xorb.md index 57dcbe1231..b1f399dcb5 100644 --- a/docs/xet/xorb.md +++ b/docs/xet/xorb.md @@ -63,10 +63,10 @@ the uncompressed size also being at a maximum of 128KiB. title: "Chunk Header (8 bytes)" --- packet - 0: "Version" - 1-3: "Compressed Size (LE)" - 4: "Compression Type" - 5-7: "Uncompressed Size (LE)" + 0-7: "Version (1 byte)" + 8-31: "Compressed Size (3 bytes, LE)" + 32-39: "Compression Type (1 byte)" + 40-63: "Uncompressed Size (3 bytes, LE)" ``` ### Chunk Compression Schemes From 3ac4b62cfd21d72c73ffac943ec1520d0f1bd63c Mon Sep 17 00:00:00 2001 From: Rajat Arya Date: Tue, 17 Mar 2026 19:47:38 -0700 Subject: [PATCH 4/5] Remove blank lines inside mermaid code blocks Some markdown-to-HTML converters split on blank lines within fenced code blocks and inject
<p>
tags before mermaid processes the content. Removing all blank lines inside mermaid blocks fixes this. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/xet/file-id.md | 3 --- docs/xet/hashing.md | 2 -- docs/xet/index.md | 8 -------- 3 files changed, 13 deletions(-) diff --git a/docs/xet/file-id.md b/docs/xet/file-id.md index 8673596e7c..d6ab8bd70b 100644 --- a/docs/xet/file-id.md +++ b/docs/xet/file-id.md @@ -37,11 +37,8 @@ sequenceDiagram autonumber actor C as Client participant Hub as Hugging Face Hub - C->>Hub: GET /namespace/repo/resolve/branch/filepath
Authorization: Bearer Hub-->>C: 302 Redirect + X-Xet-Hash header - Note over C: Extract X-Xet-Hash value = Xet File ID
Do NOT follow the 302 redirect - C->>C: Use File ID with CAS Reconstruction API ``` diff --git a/docs/xet/hashing.md b/docs/xet/hashing.md index 0e6f67a9ac..db157ba5c5 100644 --- a/docs/xet/hashing.md +++ b/docs/xet/hashing.md @@ -15,10 +15,8 @@ flowchart LR CD["Chunk Data"] CH["Chunk Hashes"] end - CD -->|"blake3(data, DATA_KEY)"| ChunkHash["Chunk Hash"] ChunkHash --> CH - CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY"| XorbHash["Xorb Hash"] CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY\nthen blake3(root, zeros)"| FileHash["File Hash"] CH -->|"blake3(concat hashes,\nVERIFICATION_KEY)"| VerifHash["Term Verification Hash"] diff --git a/docs/xet/index.md b/docs/xet/index.md index b2daeb4fff..532852eb62 100644 --- a/docs/xet/index.md +++ b/docs/xet/index.md @@ -25,37 +25,29 @@ block File["📄 File"] space space - CDC["Chunking (CDC)"] space space - block:chunks columns 5 C0["Chunk 0"] C1["Chunk 1"] C2["Chunk 2"] C3["..."] C4["Chunk N"] end - space space space - block:xorbs columns 2 X0["Xorb A\n(chunks 0–1023)"] X1["Xorb B\n(chunks 1024–N)"] end - space Shard["Shard\n(file reconstructions\n+ xorb metadata)"] - space space space - CAS["CAS Server\n(Content Addressable Storage)"] space space - File --> CDC CDC --> chunks chunks --> xorbs From 0f4a765692585a69fd1fc9471b13fc8dc7afca7a Mon Sep 17 00:00:00 2001 From: Rajat Arya Date: Tue, 17 Mar 2026 19:49:36 -0700 Subject: [PATCH 5/5] Quote node labels in deduplication flowchart to prevent
<p>
injection Unquoted square brackets in mermaid flowchart nodes (e.g. A[Text]) can be interpreted as markdown link references by some parsers, causing
<p>
tags to wrap each label. Using quoted strings (A["Text"]) fixes this. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/xet/deduplication.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/xet/deduplication.md b/docs/xet/deduplication.md index 1f00cefa10..4c625bdb6f 100644 --- a/docs/xet/deduplication.md +++ b/docs/xet/deduplication.md @@ -56,10 +56,10 @@ When a file is processed for upload, it undergoes the following steps: ```mermaid graph TD - A[File Input] --> B[Content-Defined Chunking] - B --> C[Hash Computation] - C --> D[Chunk Creation] - D --> E[Deduplication Query] + A["File Input"] --> B["Content-Defined Chunking"] + B --> C["Hash Computation"] + C --> D["Chunk Creation"] + D --> E["Deduplication Query"] ``` 1. **Chunking**: Content-defined chunking using GearHash algorithm creates variable-sized chunks of file data
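
The chunking step above follows the boundary decision flowchart added to `docs/xet/chunking.md` in patch 1/5. A minimal Python sketch of that loop is shown here for illustration only. The `TABLE` and `MASK` values are hypothetical placeholders (the real constants are defined by the chunking specification, not by this patch), and the 64-bit wrapping of `h` is an assumption. The function name `chunk_boundaries` is likewise made up for this example.

```python
import random

MIN_CHUNK_SIZE = 8 * 1024     # 8 KiB, as labelled in the flowchart
MAX_CHUNK_SIZE = 128 * 1024   # 128 KiB, as labelled in the flowchart
U64 = (1 << 64) - 1           # assumption: h is a wrapping 64-bit value

# Hypothetical placeholders: the real gear table and mask come from the spec.
_rng = random.Random(0)
TABLE = [_rng.getrandbits(64) for _ in range(256)]
MASK = 0xFFFF_0000_0000_0000  # illustrative value only


def chunk_boundaries(data: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges, one per chunk."""
    chunks = []
    start = 0
    h = 0
    for offset, b in enumerate(data):
        # h = (h << 1) + TABLE[b], kept within 64 bits
        h = ((h << 1) + TABLE[b]) & U64
        size = offset - start + 1
        if size < MIN_CHUNK_SIZE:
            continue  # too small to cut, keep reading
        if size >= MAX_CHUNK_SIZE or (h & MASK) == 0:
            chunks.append((start, offset + 1))  # emit chunk
            start = offset + 1
            h = 0  # reset the rolling hash
    if start < len(data):
        chunks.append((start, len(data)))  # final chunk if data remains
    return chunks
```

With these placeholder constants the exact cut points are not meaningful. The sketch only shows the control flow: no cut below the minimum size, a forced cut at the maximum size, and a hash-driven cut in between.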