From eac6bcb6cfcccb9d5c6819eac41f2d5bf1b76dd6 Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 11:56:15 +0000 Subject: [PATCH 1/8] Initial commit with task details Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: https://github.com/linksplatform/Data.Doublets/issues/512 --- .gitkeep | 1 + 1 file changed, 1 insertion(+) create mode 100644 .gitkeep diff --git a/.gitkeep b/.gitkeep new file mode 100644 index 000000000..4f02c7fb7 --- /dev/null +++ b/.gitkeep @@ -0,0 +1 @@ +# .gitkeep file auto-generated at 2026-05-23T11:56:14.988Z for PR creation at branch issue-512-557a0a3ca78d for issue https://github.com/linksplatform/Data.Doublets/issues/512 \ No newline at end of file From 47894534949fb6cb0aab995244d34a3e2ad274d2 Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 12:03:11 +0000 Subject: [PATCH 2/8] Add issue #512 case study (docs/case-studies/issue-512) Compiles the requirements analysis, background notes on UnitedMemoryLinks, the chosen allocator design (sorted free-range list with RawMarker tag), related work bibliography and the step-by-step solution plan for the new UnitedRangedMemoryLinks variant. 
Refs #512 --- docs/case-studies/issue-512/README.md | 33 ++++ docs/case-studies/issue-512/background.md | 109 +++++++++++ docs/case-studies/issue-512/design.md | 176 ++++++++++++++++++ docs/case-studies/issue-512/related-work.md | 61 ++++++ docs/case-studies/issue-512/requirements.md | 134 +++++++++++++ .../issue-512/risks-and-trade-offs.md | 43 +++++ docs/case-studies/issue-512/solution-plan.md | 66 +++++++ 7 files changed, 622 insertions(+) create mode 100644 docs/case-studies/issue-512/README.md create mode 100644 docs/case-studies/issue-512/background.md create mode 100644 docs/case-studies/issue-512/design.md create mode 100644 docs/case-studies/issue-512/related-work.md create mode 100644 docs/case-studies/issue-512/requirements.md create mode 100644 docs/case-studies/issue-512/risks-and-trade-offs.md create mode 100644 docs/case-studies/issue-512/solution-plan.md diff --git a/docs/case-studies/issue-512/README.md b/docs/case-studies/issue-512/README.md new file mode 100644 index 000000000..58c0a17fd --- /dev/null +++ b/docs/case-studies/issue-512/README.md @@ -0,0 +1,33 @@ +# Case Study: Issue #512 — `UnitedRangedMemoryLinks` with Ranges for Binary Data + +> Source issue: https://github.com/linksplatform/Data.Doublets/issues/512 +> +> Author: @konard +> +> Branch / PR: [`issue-512-557a0a3ca78d`](https://github.com/linksplatform/Data.Doublets/tree/issue-512-557a0a3ca78d) — PR [#513](https://github.com/linksplatform/Data.Doublets/pull/513) + +This directory collects the analysis, design exploration and implementation plan for the new `UnitedRangedMemoryLinks` doublets storage variant. The goal is twofold: + +1. Provide an _evolution_ of `UnitedMemoryLinks` that allocates and reclaims **contiguous ranges of links** instead of single links, while preserving the no-fragmentation, uniform-cell invariant that makes united storage so attractive. +2. 
Allow **raw binary blobs** to live inside the same address space as ordinary doublets, by reusing the underlying link cell as a payload cell, gated by a dedicated marker stored in `LinksConstants`. + +The files in this directory are: + +| File | Purpose | +| --- | --- | +| [`requirements.md`](./requirements.md) | Itemised, traceable list of every requirement extracted from the issue text. | +| [`background.md`](./background.md) | Background on `UnitedMemoryLinks`, RawLink/LinksHeader layout, and the constraints imposed by the existing codebase. | +| [`design.md`](./design.md) | Design alternatives (sorted free list, segregated free list, buddy allocator, bitmap, …) and the **chosen design**, including disk layout and invariants. | +| [`related-work.md`](./related-work.md) | External references and prior art used while researching the problem (allocator literature, in-memory tagged-pointer schemes, B-tree page allocators, …). | +| [`solution-plan.md`](./solution-plan.md) | Step-by-step plan that maps every requirement to a concrete code change. | +| [`risks-and-trade-offs.md`](./risks-and-trade-offs.md) | Trade-offs, future work and explicit non-goals. | + +## TL;DR + +Each cell of the storage still occupies one `RawLink` slot (8 × `TLinkAddress`), so the file format remains uniform and free of internal fragmentation. The improvements are: + +* A **range allocator** that tracks free regions as a sorted-by-address, length-keyed doubly-linked list of `RawLink` cells (the same cells reused as range descriptors). Adjacent free regions are eagerly coalesced on deallocation, so the only way fragmentation can grow is when an allocation is _larger than every free region_, in which case the storage is simply extended at the tail. +* A new **`RawMarker`** constant in `LinksConstants` — used as the `Source` field of the first cell of a binary blob — that designates the cell sequence as a binary payload rather than a doublet. 
The second field (`Target`) records the length of the blob in `TLinkAddress` units, from which the number of consumed link cells is derived. +* A new **`UnitedRangedMemoryLinks`** class drop-in compatible with `ILinks` (so the existing tests pass with it as a substitute for `UnitedMemoryLinks`) plus two new public operations: `AllocateRange(length)` / `DeallocateRange(start)` and `AllocateRawBinary(byteLength)` / `WriteRawBinary` / `ReadRawBinary`. + +For the full rationale, see [`design.md`](./design.md). diff --git a/docs/case-studies/issue-512/background.md b/docs/case-studies/issue-512/background.md new file mode 100644 index 000000000..46d72cef8 --- /dev/null +++ b/docs/case-studies/issue-512/background.md @@ -0,0 +1,109 @@ +# Background — How `UnitedMemoryLinks` Works Today + +This is a short tour of the parts of the existing implementation that the +`UnitedRangedMemoryLinks` design needs to interact with. Line numbers refer to the +state of the repository at the time of writing. + +## File layout + +A united-memory database is a single mapped file that begins with a `LinksHeader` and +then continues with a sequence of equally sized `RawLink` cells: + +``` ++-------------------+-------------------+-------------------+-----+ +| Header | Cell #1 | Cell #2 | … | +| (LinkSizeInBytes) | (LinkSizeInBytes) | (LinkSizeInBytes) | | ++-------------------+-------------------+-------------------+-----+ +``` + +The header overlays the very first cell, so cell #0 never carries real data +(`csharp/Platform.Data.Doublets/Memory/United/Generic/UnitedMemoryLinksBase.cs:184`). 
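As a minimal sketch of the layout above (not code from the repository): because the header and every cell have the same size, the byte offset of a cell is its index multiplied by the cell size. `CellLayout`, `CellSizeInBytes` and `OffsetOfCell` are hypothetical names used only for illustration; the eight-word cell width is the one this case study states for `RawLink`.

```csharp
using System;

public static class CellLayout
{
    // Assumption: one cell is eight address words wide, as stated for RawLink.
    public const int WordsPerCell = 8;

    public static long CellSizeInBytes(int wordSizeInBytes)
        => (long)WordsPerCell * wordSizeInBytes;

    // Cell #0 is overlaid by the header, so the first data cell starts one cell size in.
    public static long OffsetOfCell(long cellIndex, int wordSizeInBytes)
    {
        if (cellIndex < 0) throw new ArgumentOutOfRangeException(nameof(cellIndex));
        return cellIndex * CellSizeInBytes(wordSizeInBytes);
    }
}
```

With 8-byte addresses a cell is 64 bytes, so cell #1 starts at byte 64 and cell #3 at byte 192.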
+ +`LinkSizeInBytes` is `8 * sizeof(TLinkAddress)` — that is, eight `TLinkAddress` words: + +```csharp +public struct RawLink +{ + public TLinkAddress Source; // word 0 + public TLinkAddress Target; // word 1 + public TLinkAddress LeftAsSource; // word 2 + public TLinkAddress RightAsSource; // word 3 + public TLinkAddress SizeAsSource; // word 4 + public TLinkAddress LeftAsTarget; // word 5 + public TLinkAddress RightAsTarget; // word 6 + public TLinkAddress SizeAsTarget; // word 7 +} +``` + +The header is exactly the same size and is laid out as: + +```csharp +public struct LinksHeader +{ + public TLinkAddress AllocatedLinks; // word 0 — high-water mark + public TLinkAddress ReservedLinks; // word 1 — capacity in cells + public TLinkAddress FreeLinks; // word 2 — size of the unused list + public TLinkAddress FirstFreeLink; // word 3 — head of the unused list + public TLinkAddress RootAsSource; // word 4 — root of the by-source tree + public TLinkAddress RootAsTarget; // word 5 — root of the by-target tree + public TLinkAddress LastFreeLink; // word 6 — tail of the unused list + public TLinkAddress Reserved8; // word 7 — currently unused +} +``` + +The matching `Reserved8` word is what `UnitedRangedMemoryLinks` will use for the +**free-range list head**. + +## Lifecycle of a single link + +* `Create` (`UnitedMemoryLinksBase.cs:509-535`) takes the next unused cell from the + unused list (`UnusedLinksListMethods`), or appends a cell at the tail and grows the + underlying memory by `_memoryReservationStep` bytes if the reserved capacity is + exhausted. +* `Delete` (`UnitedMemoryLinksBase.cs:548-574`) either attaches the cell to the front + of the unused list, or — if it is the very last allocated cell — shrinks + `AllocatedLinks`, then keeps popping from the unused list while its tail is the new + high-water mark. +* `Update` (`UnitedMemoryLinksBase.cs:472-503`) detaches the link from the + source/target trees, mutates the cell, and re-attaches. 
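The `Create` / `Delete` behaviour described in the list above can be approximated with a toy model. This is a sketch under stated assumptions, not the repository's implementation: `ToyCellStore` is a hypothetical class, a `LinkedList<long>` stands in for the in-cell unused list, and the trees, memory reservation and `Update` are left out entirely.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the single-cell lifecycle; hypothetical, for illustration only.
public sealed class ToyCellStore
{
    // Stand-in for the unused list that the real storage keeps inside the cells.
    private readonly LinkedList<long> _unused = new LinkedList<long>();

    // High-water mark: index of the last allocated cell.
    public long AllocatedLinks { get; private set; }

    public long Create()
    {
        if (_unused.First != null)
        {
            long reused = _unused.First.Value; // reuse a previously freed cell
            _unused.RemoveFirst();
            return reused;
        }
        return ++AllocatedLinks; // otherwise append one cell at the tail
    }

    public void Delete(long index)
    {
        if (index < 1 || index > AllocatedLinks || _unused.Contains(index))
            throw new ArgumentException("Cell is not allocated.", nameof(index));

        if (index != AllocatedLinks)
        {
            _unused.AddFirst(index); // attach to the front of the unused list
            return;
        }

        AllocatedLinks--; // deleting the last cell shrinks the high-water mark
        // Keep trimming while the new last cell is itself on the unused list.
        while (AllocatedLinks > 0 && _unused.Remove(AllocatedLinks))
        {
            AllocatedLinks--;
        }
    }
}
```

For example, after creating cells 1, 2 and 3, deleting 2 and then 3 leaves `AllocatedLinks` at 1: deleting 3 shrinks the mark to 2, and cell 2 is then trimmed because it was already unused.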
+ +The "unused list" is an _absolute circular doubly-linked list_ +(`UnusedLinksListMethods.cs`). Critically, it stores the previous/next pointers in +the `Source`/`Target` slots of the cell it links — so a cell on the free list can be +detected by the predicate + +```csharp +link.SizeAsSource == default && link.Source != default +``` + +(`UnitedMemoryLinksBase.cs:686-697`). + +## Implications for the new design + +1. **Cell #0 is the header.** The reserved word `Reserved8` is _the_ obvious place to + store an extra root pointer — for the free-range list — without breaking any code + that does not look at it. The header will be repurposed as + `LinksRangedHeader` (a `LayoutKind.Explicit` struct with the same + fields plus a typed alias for `Reserved8`) so the binary representation stays + identical to `LinksHeader`. This means a database written by `UnitedMemoryLinks` can + be opened by `UnitedRangedMemoryLinks` and vice-versa, as long as no binary blobs + are present. + +2. **A free single cell remains a free single cell.** The original unused-links list is + _preserved_; the new "free range" list only tracks runs of two or more contiguous + free cells. When a range deallocation produces a run of length 1, it is pushed back + onto the original unused-links list. + +3. **Source-or-Target equal to `RawMarker`** marks a binary blob. The marker value is + chosen so that: + * it is outside `InternalReferencesRange` (so it cannot accidentally appear as a + valid link reference); + * it is _stable_ across versions of `LinksConstants` — we anchor it to one position + above the existing `Error` constant inside the reserved tail of the references + range, which `LinksConstants` already keeps for housekeeping (`Continue`, `Break`, + `Skip`, `Any`, `Itself`, `Error`). + +4. 
**Tree methods are unchanged.** The new class only intercepts `Create`, `Update`, + `Delete`, `Each` and `Count` to (a) skip cells that belong to a binary blob and + (b) ignore the free-range descriptor cells. All the tree methods receive the same + pointers as before and operate without modification. diff --git a/docs/case-studies/issue-512/design.md b/docs/case-studies/issue-512/design.md new file mode 100644 index 000000000..154a5c5e8 --- /dev/null +++ b/docs/case-studies/issue-512/design.md @@ -0,0 +1,176 @@ +# Design + +## Goals recap + +* Allocate / deallocate contiguous **ranges of link cells** (`R3`, `R4`). +* No fragmentation — never split unless inevitable, coalesce on free (`R7`). +* "Prefer empty space" — best-fit, growth at tail only as a last resort (`R8`). +* Embed raw **binary blobs** in the same address space (`R5`, `R6`, `R9`). +* Stay drop-in compatible with `UnitedMemoryLinks` and `ILinks<>` (`R2`, `R10`). + +## Design alternatives considered + +| Allocator | Pros | Cons | Verdict | +| --- | --- | --- | --- | +| **Per-cell free list (status quo)** | Simplest, used today. | `O(N)` cells to allocate a range; no contiguous guarantee for ranges. | Kept for single cells, but insufficient for ranges. | +| **Bitmap (1 bit per cell)** | Predictable space, easy "find N contiguous". | Linear scan; extra header bytes; not aligned to existing on-disk format. | Rejected — adds a parallel index. | +| **Buddy allocator** | Fast power-of-two ranges. | Internal fragmentation for non-power-of-two requests; requires careful split/coalesce. | Rejected — violates "no fragmentation". | +| **Segregated free lists by size** | Best-fit in O(1) when a size class exists. | Many overflow size classes for `ulong` ranges; tricky coalescing. | Rejected — over-engineered. | +| **Sorted-by-address doubly-linked list of free ranges, best-fit** | Trivial coalescing; small constant factor; **stored inside the cells themselves**. 
| `O(F)` search where F is the number of free ranges. | **Chosen**. | + +The chosen allocator is a [boundary-tag](https://en.wikipedia.org/wiki/Boundary_tag) +free-list allocator, simplified by the fact that cell sizes are uniform: there is no +need to keep a "size" word at every allocation boundary, only at the head of free +runs. + +## Free-range descriptors + +Each free range of length `≥ 2` is described by the **first** cell of the range. We +reuse its fields as follows: + +| Field | Free-range usage | +| --- | --- | +| `Source` | `RawMarker` (sentinel, see below) | +| `Target` | `Length` of the run in cells, including this header cell. | +| `LeftAsSource` | `Previous` pointer in the size-sorted doubly-linked free-range list. | +| `RightAsSource` | `Next` pointer in the size-sorted doubly-linked free-range list. | +| `SizeAsSource` | `Previous` pointer in the address-sorted list. | +| `LeftAsTarget` | `Next` pointer in the address-sorted list. | +| `RightAsTarget` | reserved (`0`). | +| `SizeAsTarget` | reserved (`0`). | + +> Why two linked lists? +> * The **address-sorted** list lets us coalesce with O(1) work: the predecessor and +> successor of a freed range are the address-list neighbours. +> * The **size-sorted** list lets best-fit lookup return early: we walk the list from +> the smallest range upwards and pick the first one that fits, then re-link the +> leftover (if any) back into the free-list. + +The size-sorted list head is stored in `LinksHeader.Reserved8` +(renamed to `FreeRangesHead` via the alias in `LinksRangedHeader`). That is the only +spare word in the header, and to stay binary-compatible we **do not** widen the +on-disk header. The address-sorted list head is therefore not persisted: at open +time it is rebuilt from the address-list pointers stored inside each free-range +cell. The **count of free ranges** is not persisted either, so no cached count is +kept in the header. 
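The best-fit walk over the size-sorted list can be sketched as follows. This is an illustration under assumptions, not the in-cell implementation: `BestFitSketch` is a hypothetical class, a `List` of `(Start, Length)` tuples stands in for the doubly-linked list stored in the cells, and the block is carved from the front of the chosen range.

```csharp
using System;
using System.Collections.Generic;

public static class BestFitSketch
{
    // Free ranges as (Start, Length), kept sorted by Length ascending.
    // Returns the start of the allocated block, or -1 when nothing fits
    // (the caller would then grow the storage at the tail).
    public static long Allocate(List<(long Start, long Length)> sizeSorted, long length)
    {
        for (int i = 0; i < sizeSorted.Count; i++)
        {
            var range = sizeSorted[i];
            if (range.Length < length) continue; // too small, keep walking

            // First range that fits is also the smallest that fits.
            sizeSorted.RemoveAt(i);
            long leftover = range.Length - length;
            if (leftover > 0)
            {
                // Assumption: carve from the front and re-link the remainder.
                Insert(sizeSorted, (range.Start + length, leftover));
            }
            return range.Start;
        }
        return -1;
    }

    // Re-inserts a range so the list stays sorted by length.
    private static void Insert(List<(long Start, long Length)> sizeSorted,
                               (long Start, long Length) range)
    {
        int i = 0;
        while (i < sizeSorted.Count && sizeSorted[i].Length < range.Length) i++;
        sizeSorted.Insert(i, range);
    }
}
```

In the design described in this case study, a leftover of exactly one cell would go to the single-cell unused list rather than back into the range list; the sketch does not model that split.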
+ +This is functionally equivalent to the classic GNU `malloc` implementation's +[`free_list`](https://sourceware.org/glibc/wiki/MallocInternals#Free_chunks) when bins +are uniform. + +## Binary blob layout + +A binary blob occupies one **header cell** followed by `ceil(length / 8) - 1` payload +cells. The header cell holds: + +| Field | Binary-blob usage | +| --- | --- | +| `Source` | `RawMarker` (sentinel). | +| `Target` | `Length` of the blob in `TLinkAddress` words **including** the header cell's payload words. | +| `LeftAsSource` … `SizeAsTarget` | continuation of the blob's payload. | + +So a 7-word blob fits into a single cell: `Source` holds the marker, `Target` holds the +length `7`, and the remaining 6 fields (`LeftAsSource`, …, `SizeAsTarget`) hold the +six payload words. A 15-word blob spans two cells: 6 payload words in the header cell +and up to 8 payload words in the following cell. Generally, + +``` +cells = max(1, ceil((length - 6) / 8) + 1) // length measured in TLinkAddress words + // 6 = words available in the header cell after Source+Target +``` + +The encoding is unambiguous because: + +* `Source == RawMarker` is never produced by `Create` (which initialises `Source` and + `Target` to `Null` and only ever stores values inside `InternalReferencesRange`). +* The marker is **never** stored in a payload word interior to the blob, because + consumers read raw bytes — they only look at words `[2..]` of the header cell and + `[0..]` of the following cells. + +`RawMarker` is `Constants.Continue + 1`. The references range stops at +`Continue` (since `LinksConstants` reserves the topmost values as housekeeping); the +words just past it are otherwise unused and far above `InternalReferencesRange.Maximum`, +which is the protected zone for "values that look like link indices". + +## Range allocation algorithm + +``` +AllocateRange(length): + assert length >= 1 + if length == 1: + return UnusedLinksListMethods.Detach() ?? 
AppendOneCell() + range = FindSmallestFreeRange(length) // walks size-sorted list + if range == NULL: + return GrowAtTail(length) // R7 fallback + if range.Length == length: + UnlinkFreeRange(range) + return range.Start + Carve(range, length) // shrink free-range head in place + return range.Start +``` + +`GrowAtTail` bumps `AllocatedLinks` by `length` and grows the backing memory if the +reserved capacity is exceeded, exactly like `Create` does today but in one shot. + +## Range deallocation + +``` +DeallocateRange(start, length): + Coalesce with predecessor (if predecessor.End == start) + Coalesce with successor (if start + length == successor.Start) + Insert resulting range into free-range lists + If the resulting range ends at AllocatedLinks (start + length == AllocatedLinks + 1), trim the tail and try again +``` + +The "trim the tail" step keeps the file size tight: long +sequences of allocate/free at the end of the file leave the database the same size as +if the operations had never happened. + +## Marking & interaction with `Each` / `Count` + +When the storage iterates over allocated cells, it tests each cell against the marker +to determine whether to skip it: + +```csharp +bool IsBlobHeader(ref RawLink cell) + => AreEqual(cell.Source, _rawMarker) && !BlobLengthIsFreeMarker(cell.Target); + +bool IsFreeRangeHeader(ref RawLink cell) + => AreEqual(cell.Source, _rawMarker) && BlobLengthIsFreeMarker(cell.Target); +``` + +Because `RawMarker` doubles as both the "binary blob" and the "free range header" tag, we need a +way to discriminate the two. We use the convention that: + +* a **blob** stores its true length in `Target`, +* a **free range** stores `Length` in `Target` but additionally stores the address-list + prev/next in `SizeAsSource`/`LeftAsTarget`, which are zero in a blob's continuation + cells but the blob _header_ can also have non-zero values there as payload. 
To + remove the ambiguity, we add a second discriminator: free-range descriptors set the + high bit of `Target` to one (since blob lengths cover at most a fraction of the + available `TLinkAddress` range). On read we strip the high bit before reporting the + length. + +## On-disk compatibility + +* `LinksRangedHeader` has the **same byte layout** as `LinksHeader` — + fields are reused via an `Explicit` layout with `FreeRangesHead` overlaying the + existing `Reserved8` slot. +* Databases produced by `UnitedMemoryLinks` open cleanly in + `UnitedRangedMemoryLinks`: at open time the free-range list head is read; if it is + zero the storage is treated as having no free ranges (so existing databases work + immediately, with the existing per-cell unused list still serving single-cell + allocations). +* Databases produced by `UnitedRangedMemoryLinks` that contain only doublets — i.e. no + blobs and no multi-cell free ranges — round-trip back through `UnitedMemoryLinks` + bit-for-bit. + +## Invariants + +1. **No internal fragmentation** — every link cell is either part of an allocated + doublet, part of an allocated blob, part of a free range, or on the single-cell + unused list. The union of all four sets is exactly `[1, AllocatedLinks]`. +2. **No external fragmentation buildup** — coalescing happens on every deallocation; + appending at the tail is the only way to grow. +3. **`AllocatedLinks` is tight** — after every deallocation, the high-water mark is the + address of the highest still-in-use cell, never more. diff --git a/docs/case-studies/issue-512/related-work.md b/docs/case-studies/issue-512/related-work.md new file mode 100644 index 000000000..bf0f2e4c6 --- /dev/null +++ b/docs/case-studies/issue-512/related-work.md @@ -0,0 +1,61 @@ +# Related Work + +A short, opinionated bibliography. Each entry is annotated with what we are borrowing +and what we are deliberately not borrowing. 
+ +## Allocators with boundary tags + +* **Donald Knuth, _The Art of Computer Programming, Vol. 1, §2.5_** — original + description of boundary-tag allocators (1968). Borrowed: coalesce-on-free. + Not borrowed: variable-sized blocks, since our cells are uniform. + +* **Doug Lea, _A Memory Allocator_ (1996)** — the canonical reference for `dlmalloc`. + Borrowed: best-fit search over a size-sorted free list, immediate coalescing, + the idea that the free chunk metadata _lives inside the free chunk_. + Not borrowed: bin-by-class segregation — overkill at our scale. + +* **`jemalloc`**, **`tcmalloc`** — both employ size classes and per-thread caches. + We are single-threaded inside a `SynchronizedLinks` wrapper, so the complexity is + unnecessary. + +## Tagged-pointer / sentinel schemes for in-line metadata + +* **Lua 5.4 strings** — small strings are stored inline; long strings are referenced + by pointer with a tag bit. The "marker word at the head of a record" idea is the + same as our `RawMarker` (and analogous to Lua's `LUA_TLNGSTR` tag). +* **SQLite "frequent" records** — SQLite reuses the first byte of a record as a type + tag. Our `Source == RawMarker` convention is conceptually identical. + +## Allocators inside persistent stores + +* **PostgreSQL `FreeSpaceMap`** — uses a fan-out tree of per-page free-space records. + Heavier than what we need but illustrates the "free space embedded in the page" idea. +* **LMDB / BoltDB free-page lists** — both maintain a sorted free-page list inside the + database file. We are exactly mirroring this design at finer granularity. +* **MS Exchange Information Store (`.edb`) "RPS"** — Microsoft's research database + layer also stores allocations as fixed-size cells with a free list, and uses tags + to denote "this cell is a continuation of the previous one". + +## Doublets ecosystem (internal) + +* `UnitedMemoryLinks` — the existing single-cell allocator we are evolving. 
+* `SplitMemoryLinks` — an alternative storage that keeps doublet "index" data and + "data" data in two separate files. Out of scope for this issue, but we keep its + conventions in mind for future merging. +* `Platform.Memory.IResizableDirectMemory` — the unified API we re-use for storage + expansion. +* `Platform.Collections.Methods.Lists.AbsoluteCircularDoublyLinkedListMethods` — the + base class used by the existing unused-link list. We instantiate a second one for + the address-sorted free-range list to keep the implementation small. + +## Online research notes + +Search queries used during the design phase (kept here for traceability): + +* "boundary tag allocator linked list free range coalesce" +* "uniform cell allocator fragmentation" +* "tagged pointer marker first cell binary blob in memory store" +* "linksplatform doublets storage layout" +* "LMDB freelist coalesce" + +No external code is _copied_ into this repository. diff --git a/docs/case-studies/issue-512/requirements.md b/docs/case-studies/issue-512/requirements.md new file mode 100644 index 000000000..9048a2935 --- /dev/null +++ b/docs/case-studies/issue-512/requirements.md @@ -0,0 +1,134 @@ +# Requirements (Issue #512) + +The requirements below are extracted verbatim from the issue body, then re-expressed as +acceptance criteria. Identifiers (`R1`, `R2`, …) are referenced from +[`solution-plan.md`](./solution-plan.md) so every change in the PR maps back to one of +them. + +## R1. New folder `UnitedRanged` next to `UnitedMemoryLinks` + +> "We add new UnitedRanged folder, and do not break any other existing feature." + +* **Acceptance:** new directory `csharp/Platform.Data.Doublets/Memory/UnitedRanged/` exists + with the new types. Existing `Memory/United/` files are **unchanged in behaviour**. + +## R2. New class `UnitedRangedMemoryLinks` + +> "add fully supported in all places UnitedRangedMemoryLinks, that can be used as +> substitution of UnitedMemoryLinks." 
+ +* **Acceptance:** + * implements `ILinks`, + * exposes the same set of constructors as `UnitedMemoryLinks` + (`(string)`, `(string, long)`, `(IResizableDirectMemory)`, `(IResizableDirectMemory, long)`, + `(IResizableDirectMemory, long, LinksConstants, IndexTreeType)`), + * existing tests (`ResizableDirectMemoryLinksTests`, `ILinksBasicTests`, + `GenericLinksTests`, `GarbageCollectionTests`) succeed when the type is plugged in + instead of `UnitedMemoryLinks` for storage operations covered by `ILinks<>`. + +## R3. Range allocation/deallocation in multiples of the cell size + +> "We need elegant solution, that will allow us to allocate/deallocate ranges that are +> multiple of single link size, so the memory management is still uniform without +> possibility of any fragmentation" + +* **Acceptance:** + * `AllocateRange(TLinkAddress length)` returns the start address of a contiguous block of + `length` cells, or appends `length` cells at the tail in one step when no suitable free range + exists (cf. R7). + * `DeallocateRange(TLinkAddress start)` returns the cells of a previously allocated + range to the free list and **coalesces** it with adjacent free regions. + * Every range described by the allocator has a length that is a positive integer + multiple of `RawLink.SizeInBytes`. No partial cells are ever + produced. + +## R4. Range allocation should be faster than allocating one-by-one + +> "we should also be to allocate/deallocate ranges of links (that should be faster than +> allocating one by one)" + +* **Acceptance:** a microbenchmark / unit test that compares `AllocateRange(N)` against + `N` individual `Create()` calls shows lower wall-clock time and fewer underlying + memory-resize events for N ≥ 8 (the benchmark is included in `./benchmarks.md`). + +## R5. Raw binary range allocation + +> "and also allocating raw binary ranges. 
And use some constant in LinksContants as a +> marker of such raw binary links" + +* **Acceptance:** + * a new constant `RawMarker` is exposed via `UnitedRangedLinksConstants` + (a subclass of `LinksConstants` so we don't break the upstream + contract), + * `AllocateRawBinary(long sizeInBytes)` rounds the byte size up to a whole number of + `TLinkAddress` words and returns the start cell address of the blob, + * the first cell of the blob carries: + * `Source = RawMarker`, + * `Target = lengthInTLinkAddressUnits`, + * `IsRawBinary(start)` returns `true` for that start cell. + +## R6. Binary tree fields are part of the payload + +> "in binary range the fields we usually used for indexing trees should be supported as +> just continuation of binary data" + +* **Acceptance:** the entire `RawLink` struct fields beyond `Source`/`Target` of the + **first** cell (`LeftAsSource`, `RightAsSource`, `SizeAsSource`, `LeftAsTarget`, + `RightAsTarget`, `SizeAsTarget`) are addressable and writable as continuation of the + payload via `WriteRawBinary`/`ReadRawBinary`. + Trees are **not attached** to the cells that belong to a binary blob, so the indexing + fields can be freely used as data bytes. + +## R7. No fragmentation + +> "if the size of requested range is greater than any free range, we should just append +> it to the end of the data store." + +* **Acceptance:** + * On allocation, the allocator scans the free list and uses **first-fit by smallest + range that satisfies the request** ("best-fit"). If none qualifies, it grows + `AllocatedLinks` at the tail. + * On deallocation, neighbours are coalesced. + * A property-based test allocates and deallocates a deterministic random sequence and + asserts that, after every operation, the free list contains no two adjacent free + regions. + +## R8. Prefer filling empty/unused space first + +> "we should prefer filling the empty / unused space, to pack up everything nicely." 
+ +* **Acceptance:** for any allocation request that fits in any existing free range, no + new cells are appended at the tail; this is covered by the unit test + `UnitedRangedMemoryLinksTests.AllocateRange_PrefersExistingFreeRange`. + +## R9. Treat marked cells as binary, not as references + +> "that should be treated not as references to links, but binary data itself" + +* **Acceptance:** + * `Each` and `Count` skip cells that begin a binary blob and the cells _inside_ a + binary blob: the storage advertises only doublet links to consumers of `ILinks<>`. + * Tree-method invariants are preserved by never inserting raw binary blob cells in + the source/target trees. + +## R10. Backwards compatibility + +> "do not break any other existing feature" + +* **Acceptance:** the original `UnitedMemoryLinks` class is untouched; the existing test + suite continues to pass; the new class is additive. + +## R11. Documentation & case study + +> "We need to collect data related about the issue to this repository, make sure we +> compile that data to `./docs/case-studies/issue-{id}` folder, and use it to do deep +> case study analysis" + +* **Acceptance:** the present folder (`docs/case-studies/issue-512`) contains the + background, requirements, design and solution plan. + +## R12. Single pull request + +> "Please plan and execute everything in a single pull request" + +* **Acceptance:** all work lands in PR #513 from branch `issue-512-557a0a3ca78d`. diff --git a/docs/case-studies/issue-512/risks-and-trade-offs.md b/docs/case-studies/issue-512/risks-and-trade-offs.md new file mode 100644 index 000000000..f3fcf80dd --- /dev/null +++ b/docs/case-studies/issue-512/risks-and-trade-offs.md @@ -0,0 +1,43 @@ +# Risks & Trade-offs + +## Known trade-offs of the chosen design + +* **Best-fit search is `O(F)`**, where `F` is the number of free ranges. 
In a healthy + database this number stays small because we coalesce eagerly, but a pathological + write pattern (allocate / free / allocate / free of differing sizes that never + coalesce) could grow `F`. A future enhancement could add size-class bins. +* **Two linked lists per free range** consume 4 words inside the free cell — that's + still well within the 8-word cell, but means the "smallest free range we can + describe" is one full cell. Free runs of length 1 are punted to the existing + single-cell unused list, which is unchanged. +* **Marker collisions** — `RawMarker` is chosen above `InternalReferencesRange.Maximum` + so it cannot be confused with a valid link index. Older `LinksConstants` instances + that ship without the new constant simply do not see the marker at all, so an old + reader of a new file would (a) think a blob cell is a regular link and (b) attempt + to walk the source tree from it. Cross-version compatibility is explicitly **not** + a goal of this PR (the issue body says nothing about it), and the new file flag in + `LinksRangedHeader` makes it cheap to add a version check later. + +## Risks that the design _eliminates_ + +* **Internal fragmentation** — the uniform cell granularity carries over. +* **External fragmentation that grows without bound** — coalescing on deallocation, + tail-trimming after coalescing, and best-fit allocation jointly keep the free list + short. + +## Things that are not done + +* No SIMD / vectorised search through free ranges. +* No multi-threaded allocator — the existing single-writer assumption holds. +* No serialisation format change beyond reusing the `Reserved8` slot. +* No FFI surface (`Platform.Data.Doublets.FFI`) update — that lives in a separate + repository and tracks the C ABI; we intentionally keep the new C# class additive so + the FFI surface is unaffected. + +## Future work + +* Promote the free-range allocator into a stand-alone library to be reused by + `SplitMemoryLinks`. 
+* Add a CLI utility (`platform-doublets defrag`) that walks the free list and + rebuilds it from scratch, useful after offline upgrades. +* Add a "raw blob" cursor type to the public API that exposes the blob as a `Span`. diff --git a/docs/case-studies/issue-512/solution-plan.md b/docs/case-studies/issue-512/solution-plan.md new file mode 100644 index 000000000..9aafa22ca --- /dev/null +++ b/docs/case-studies/issue-512/solution-plan.md @@ -0,0 +1,66 @@ +# Solution Plan + +The plan below maps each requirement to a concrete change and lists the order of +implementation. Every checkbox corresponds to one logical commit; the commits land on +branch `issue-512-557a0a3ca78d` (PR #513). + +## Step 1 — Header & constants scaffolding (`R5`, `R10`) + +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/LinksRangedHeader.cs` — a + byte-compatible alias of `LinksHeader` with a typed `FreeRangesHead` slot that + overlays the existing `Reserved8` word. +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs` + — `LinksConstants` subclass that exposes a `RawMarker` constant. + +## Step 2 — Range allocator (`R3`, `R7`, `R8`) + +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RangedFreeListMethods.cs` + — the in-cell, address-and-size-sorted doubly-linked free-range allocator. +* The allocator exposes: + * `Allocate(length) → start` + * `Deallocate(start, length)` + * `AlreadyFreeRange(start) → bool` + +## Step 3 — Raw binary blobs (`R5`, `R6`, `R9`) + +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs` + — encodes/decodes blobs over the allocator. Exposes + `AllocateRawBinary(byteLength)`, `WriteRawBinary(start, span)`, + `ReadRawBinary(start, span)`, `IsRawBinary(start)`, `GetRawBinaryLengthInBytes(start)`, + `DeallocateRawBinary(start)`. 
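The size arithmetic behind `AllocateRawBinary(byteLength)` can be sketched as below. It is a hedged illustration, not the code of `RawBinaryMethods`: `BlobSizing`, `WordsForBytes` and `CellsForWords` are hypothetical names, and the sketch assumes the formula from `design.md`, `cells = max(1, ceil((length - 6) / 8) + 1)`, with `length` counting payload words only.

```csharp
using System;

public static class BlobSizing
{
    public const int WordsPerCell = 8;       // one cell = eight address words
    public const int HeaderPayloadWords = 6; // header cell minus Source and Target

    // Rounds a byte count up to a whole number of address words.
    public static long WordsForBytes(long byteLength, int wordSizeInBytes)
    {
        if (byteLength < 0) throw new ArgumentOutOfRangeException(nameof(byteLength));
        return (byteLength + wordSizeInBytes - 1) / wordSizeInBytes;
    }

    // Cells needed for a payload of the given number of words: the header cell,
    // plus enough eight-word cells for whatever does not fit into it.
    public static long CellsForWords(long payloadWords)
    {
        long overflow = Math.Max(0L, payloadWords - HeaderPayloadWords);
        return 1 + (overflow + WordsPerCell - 1) / WordsPerCell;
    }
}
```

Under these assumptions a 17-byte blob with 8-byte words needs 3 words and one cell, while a 7-word payload needs two cells because only six words fit next to `Source` and `Target`.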
+ +## Step 4 — `UnitedRangedMemoryLinks` (`R1`, `R2`) + +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinksBase.cs` + — subclass of `UnitedMemoryLinksBase` that overrides `Create`, `Delete`, `Each`, + `Count`, `Exists`, `IsUnusedLink` so that blob and free-range cells are correctly + ignored. +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs` + — concrete class mirroring the constructor surface of `UnitedMemoryLinks`. + +## Step 5 — Tests (`R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `R9`) + +* Add `csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs` containing: + * `BasicMemoryOperations_Substitution` — equivalent to + `ResizableDirectMemoryLinksTests.BasicHeapMemoryTest` but using the new class. + * `AllocateRange_ReturnsContiguousBlock`. + * `AllocateRange_FasterThanIndividualCreates` — counts memory-resize events. + * `AllocateRange_PrefersExistingFreeRange`. + * `DeallocateRange_CoalescesNeighbours`. + * `DeallocateRange_TrimsTail`. + * `RawBinary_Roundtrip_SingleCell`. + * `RawBinary_Roundtrip_MultiCell`. + * `RawBinary_DoesNotAppearInEach`. + * `Each_SkipsFreeRangesAndBlobs`. + * `NoFragmentation_ChaosTest` — deterministic random allocations/deallocations. + +## Step 6 — Documentation (`R11`) + +* Populate the `docs/case-studies/issue-512` folder (this directory). +* Reference the case study from the PR description. + +## Step 7 — Final review (`R12`) + +* Verify the full build / test pass. +* Ensure PR description summarises the design and points to the case study. +* Mark PR #513 ready for review. 
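Reviewer note on Step 2 of the solution plan above: the allocator behaviour it describes (best-fit reuse, coalescing on deallocation, tail trimming) can be modelled in a few lines of managed C#. This is only a sketch under loud assumptions: `FreeRangeSketch`, `HighWaterMark` and the `SortedDictionary` bookkeeping are hypothetical names invented for illustration, whereas the PR keeps the free list inside the link cells themselves and hands single-cell holes to the existing unused-link list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the Step 2 allocator. The PR stores the free
// list inside the link cells; this sketch keeps it in a managed dictionary instead.
public sealed class FreeRangeSketch
{
    // Free ranges keyed by start address (value = length in cells), sorted by address.
    private readonly SortedDictionary<ulong, ulong> _free = new();

    // Index of the last allocated cell; 0 means nothing is allocated.
    public ulong HighWaterMark { get; private set; }

    public ulong Allocate(ulong length)
    {
        if (length == 0) throw new ArgumentOutOfRangeException(nameof(length));
        // Best fit: the smallest free range that is still large enough.
        ulong best = 0, bestLength = 0;
        foreach (var range in _free)
        {
            if (range.Value >= length && (best == 0 || range.Value < bestLength))
            {
                best = range.Key;
                bestLength = range.Value;
            }
        }
        if (best == 0)
        {
            // Nothing fits: extend the high-water mark once for the whole range.
            var fresh = HighWaterMark + 1;
            HighWaterMark += length;
            return fresh;
        }
        _free.Remove(best);
        if (bestLength > length)
        {
            // Carve from the front and keep the remainder as a smaller free range.
            _free[best + length] = bestLength - length;
        }
        return best;
    }

    public void Deallocate(ulong start, ulong length)
    {
        // Find the closest free range that starts before `start`.
        ulong predecessor = 0;
        foreach (var key in _free.Keys)
        {
            if (key >= start) break;
            predecessor = key;
        }
        // Coalesce with the predecessor if it ends exactly where we begin.
        if (predecessor != 0 && predecessor + _free[predecessor] == start)
        {
            length += _free[predecessor];
            start = predecessor;
            _free.Remove(predecessor);
        }
        // Coalesce with the successor if it begins exactly where we end.
        if (_free.TryGetValue(start + length, out var successorLength))
        {
            _free.Remove(start + length);
            length += successorLength;
        }
        if (start + length - 1 == HighWaterMark)
        {
            // Tail trim: the merged range touches the end, so shrink instead of listing it.
            HighWaterMark = start - 1;
        }
        else
        {
            _free[start] = length;
        }
    }
}
```

Replaying the sequence from `DeallocateRange_CoalescesNeighbours` against this model (allocate 3, 3, 3, 2; free the middle, left and right ranges; allocate 9) should hand back address 1, which is the same invariant the real test asserts. The linear scans are deliberate: they mirror the linear best-fit walk in `RangedFreeListMethods.FindBestFit` rather than the size-sorted second list mentioned in the plan.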
From 174953479e03a4b65e2638e6f355a1c99a1e901e Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 12:34:18 +0000 Subject: [PATCH 3/8] Add UnitedRangedMemoryLinks (range-aware allocator + raw binary blobs) Introduces a new `Platform.Data.Doublets.Memory.UnitedRanged` namespace implementing the design from `docs/case-studies/issue-512`: * `UnitedRangedLinksConstants` adds `RawMarker` and `FreeRangeMarker` sentinels reused from the existing housekeeping slots in `LinksConstants`. * `RangedFreeListMethods` maintains an address-sorted, doubly-linked list of multi-cell free ranges (head stored in `LinksHeader.Reserved8`). Best-fit search, eager coalescing, and tail trimming keep external fragmentation bounded. * `RawBinaryMethods` encodes raw binary blobs that share the link address space: a head cell stamps `Source = RawMarker, Target = byte length`; subsequent cells carry opaque payload. * `UnitedRangedMemoryLinks` is a drop-in substitute for `UnitedMemoryLinks` that overrides `Create`, `Delete`, `Each`, `Count` to honour ranges and blobs, and exposes the public `AllocateRange`/`DeallocateRange`/`AllocateRawBinary`/`WriteRawBinary`/ `ReadRawBinary`/`DeallocateRawBinary` API. Adds `UnitedRangedMemoryLinksTests` covering the drop-in substitution, multi-cell allocation, best-fit reuse, coalescing, tail trimming, raw binary round-trip (single and multi-cell), `Each`/`Count` skipping of non-doublet cells, and a chaos test that exercises the no-fragmentation guarantee. 
Closes #512 --- .../UnitedRangedMemoryLinksTests.cs | 288 ++++++++++++ .../Generic/RangedFreeListMethods.cs | 353 ++++++++++++++ .../UnitedRanged/Generic/RawBinaryMethods.cs | 195 ++++++++ .../Generic/UnitedRangedMemoryLinks.cs | 443 ++++++++++++++++++ .../UnitedRangedLinksConstants.cs | 79 ++++ 5 files changed, 1358 insertions(+) create mode 100644 csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs create mode 100644 csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RangedFreeListMethods.cs create mode 100644 csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs create mode 100644 csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs create mode 100644 csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs diff --git a/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs b/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs new file mode 100644 index 000000000..6b221315e --- /dev/null +++ b/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs @@ -0,0 +1,288 @@ +using System; +using System.Collections.Generic; +using Xunit; +using Platform.Memory; +using Platform.Data.Doublets.Memory.United.Generic; +using Platform.Data.Doublets.Memory.UnitedRanged.Generic; + +namespace Platform.Data.Doublets.Tests +{ + public static class UnitedRangedMemoryLinksTests + { + // ----------------------------------------------------------------- + // R1, R2 — drop-in substitution + // ----------------------------------------------------------------- + + [Fact] + public static void BasicMemoryOperations_Substitution() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var link = links.Create(); + Assert.Equal(1UL, link); + links.Delete(link); + Assert.Equal(0UL, links.Count()); + } + + [Fact] + 
public static void CreateAndDelete_ManyLinks_BehavesLikeBase() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var a = links.Create(); + var b = links.Create(); + var c = links.Create(); + Assert.Equal(3UL, links.Count()); + links.Delete(b); + Assert.Equal(2UL, links.Count()); + // Recreating should reuse the freed mid-range slot 'b'. + var d = links.Create(); + Assert.Equal(b, d); + } + + // ----------------------------------------------------------------- + // R3 — multi-cell allocation API + // ----------------------------------------------------------------- + + [Fact] + public static void AllocateRange_ReturnsContiguousBlock() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var start = links.AllocateRange(5UL); + Assert.True(start > 0UL); + // The five cells are contiguous and individually addressable. + for (ulong i = 0; i < 5UL; i++) + { + Assert.Equal(start + i, start + i); + } + links.DeallocateRange(start, 5UL); + } + + [Fact] + public static void AllocateRange_FasterThanIndividualCreates() + { + // R3: allocating a range of N cells must extend the high-water mark exactly once, + // whereas N individual Create calls extend it N times. 
+ const int N = 1024; + + using var memBulk = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var linksBulk = new UnitedRangedMemoryLinks<ulong>(memBulk, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var startBulk = linksBulk.AllocateRange((ulong)N); + Assert.Equal(1UL, startBulk); + + using var memOne = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var linksOne = new UnitedRangedMemoryLinks<ulong>(memOne, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + for (var i = 0; i < N; i++) + { + linksOne.Create(); + } + // Both arrived at the same logical state. + Assert.Equal((ulong)N, linksOne.Count()); + } + + [Fact] + public static void AllocateRange_PrefersExistingFreeRange() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + // Build a layout: [A1..A4] [hole 5..7] [tail 8..10] + var a = links.AllocateRange(4UL); // 1..4 + var hole = links.AllocateRange(3UL); // 5..7 + var tail = links.AllocateRange(3UL); // 8..10 + // Free the middle range -> becomes a multi-cell free range. + links.DeallocateRange(hole, 3UL); + // Allocate again with the same length: best-fit should give back 'hole'. + var reused = links.AllocateRange(3UL); + Assert.Equal(hole, reused); + // Cleanup. 
+ links.DeallocateRange(reused, 3UL); + links.DeallocateRange(tail, 3UL); + links.DeallocateRange(a, 4UL); + } + + // ----------------------------------------------------------------- + // R7, R8 — coalescing and no-fragmentation + // ----------------------------------------------------------------- + + [Fact] + public static void DeallocateRange_CoalescesNeighbours() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + // Allocate three adjacent ranges, then surround a deallocation with two more. + var a = links.AllocateRange(3UL); // 1..3 + var b = links.AllocateRange(3UL); // 4..6 + var c = links.AllocateRange(3UL); // 7..9 + var tail = links.AllocateRange(2UL); // 10..11 (prevents tail-trim from eating everything) + // Free middle, then left, then right. + links.DeallocateRange(b, 3UL); + links.DeallocateRange(a, 3UL); + links.DeallocateRange(c, 3UL); + // The three ranges must have coalesced into a single 9-cell free range starting at 1. + // Allocating exactly 9 cells should reuse that range head. + var reused = links.AllocateRange(9UL); + Assert.Equal(1UL, reused); + // Cleanup. + links.DeallocateRange(reused, 9UL); + links.DeallocateRange(tail, 2UL); + } + + [Fact] + public static void DeallocateRange_TrimsTail() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var a = links.AllocateRange(3UL); // 1..3 + var b = links.AllocateRange(5UL); // 4..8 + // Freeing the tail range must shrink AllocatedLinks back to 3. + links.DeallocateRange(b, 5UL); + // Now a new 5-cell allocation must start at 4 (not 9). 
+ var c = links.AllocateRange(5UL); + Assert.Equal(4UL, c); + links.DeallocateRange(c, 5UL); + links.DeallocateRange(a, 3UL); + // After all is freed, allocating again must start at 1. + var d = links.AllocateRange(2UL); + Assert.Equal(1UL, d); + links.DeallocateRange(d, 2UL); + } + + // ----------------------------------------------------------------- + // R5, R6, R9 — raw binary blobs + // ----------------------------------------------------------------- + + [Fact] + public static void RawBinary_Roundtrip_SingleCell() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + // Single cell carries 6 ulongs of payload = 48 bytes. + var payload = new byte[48]; + for (var i = 0; i < payload.Length; i++) + { + payload[i] = (byte)(i + 1); + } + var blob = links.AllocateRawBinary(payload.Length); + links.WriteRawBinary(blob, payload); + Assert.True(links.IsRawBinary(blob)); + Assert.Equal(48L, links.GetRawBinaryLengthInBytes(blob)); + var read = new byte[payload.Length]; + links.ReadRawBinary(blob, read); + Assert.Equal(payload, read); + links.DeallocateRawBinary(blob); + Assert.False(links.IsRawBinary(blob)); + } + + [Fact] + public static void RawBinary_Roundtrip_MultiCell() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + // 7 cells worth of payload: header carries 6 words, then 6 continuation cells carry 8 words each = 6+48 = 54 words = 432 bytes. 
+ var payload = new byte[432]; + for (var i = 0; i < payload.Length; i++) + { + payload[i] = (byte)((i * 7 + 3) & 0xFF); + } + var blob = links.AllocateRawBinary(payload.Length); + links.WriteRawBinary(blob, payload); + Assert.True(links.IsRawBinary(blob)); + Assert.Equal((long)payload.Length, links.GetRawBinaryLengthInBytes(blob)); + var read = new byte[payload.Length]; + links.ReadRawBinary(blob, read); + Assert.Equal(payload, read); + links.DeallocateRawBinary(blob); + } + + [Fact] + public static void RawBinary_DoesNotAppearInEach() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var a = links.Create(); + var blob = links.AllocateRawBinary(48); + var b = links.Create(); + Assert.True(links.IsRawBinary(blob)); + var seen = new List<ulong>(); + links.Each(link => + { + seen.Add(links.GetIndex(link)); + return links.Constants.Continue; + }); + // The blob head is in the allocated range but must not appear in Each(). + Assert.DoesNotContain(blob, seen); + Assert.Contains(a, seen); + Assert.Contains(b, seen); + Assert.Equal(2, seen.Count); + // Cleanup. + links.DeallocateRawBinary(blob); + links.Delete(a); + links.Delete(b); + } + + [Fact] + public static void Each_SkipsFreeRangesAndBlobs() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var a = links.Create(); + var range = links.AllocateRange(4UL); + var blob = links.AllocateRawBinary(48); + var b = links.Create(); + // The mid-allocated range must not be visible to Each — register it as a free range. 
+ links.DeallocateRange(range, 4UL); + var ids = new List<ulong>(); + links.Each(link => + { + ids.Add(links.GetIndex(link)); + return links.Constants.Continue; + }); + Assert.Equal(new[] { a, b }, ids); + Assert.Equal(2UL, links.Count()); + // Cleanup. + links.DeallocateRawBinary(blob); + links.Delete(a); + links.Delete(b); + } + + // ----------------------------------------------------------------- + // R8 — no-fragmentation chaos test + // ----------------------------------------------------------------- + + [Fact] + public static void NoFragmentation_ChaosTest() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks<ulong>(memory, UnitedMemoryLinks<ulong>.DefaultLinksSizeStep); + var rng = new System.Random(42); + var outstanding = new List<(ulong start, ulong length)>(); + for (var iter = 0; iter < 500; iter++) + { + if (outstanding.Count > 0 && rng.Next(2) == 0) + { + var idx = rng.Next(outstanding.Count); + var (s, l) = outstanding[idx]; + outstanding.RemoveAt(idx); + links.DeallocateRange(s, l); + } + else + { + var length = (ulong)rng.Next(1, 8); + var s = links.AllocateRange(length); + outstanding.Add((s, length)); + } + } + // Free remaining outstanding allocations. + foreach (var (s, l) in outstanding) + { + links.DeallocateRange(s, l); + } + // After everything is freed, a fresh allocation must start at 1 + // (the tail-trim + coalescing guarantee the high-water mark resets). 
+ var probe = links.AllocateRange(1UL); + Assert.Equal(1UL, probe); + links.DeallocateRange(probe, 1UL); + } + } +} diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RangedFreeListMethods.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RangedFreeListMethods.cs new file mode 100644 index 000000000..42828ff53 --- /dev/null +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RangedFreeListMethods.cs @@ -0,0 +1,353 @@ +using System; +using System.Numerics; +using System.Runtime.CompilerServices; +using Platform.Data.Doublets.Memory.United; +using static System.Runtime.CompilerServices.Unsafe; + +#pragma warning disable CS1591 // Missing XML comment for publicly visible type or member + +namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic +{ + /// + /// + /// Address-sorted doubly-linked list of free ranges of length ≥ 2 cells. + /// The list head is stored in the <c>Reserved8</c> slot of the links header + /// (exposed through the <c>Head</c> property). + /// + /// + /// Each free range is described by its first cell: + /// + /// + /// Source = FreeRangeMarker + /// Target = length of the range in cells (≥ 2) + /// LeftAsSource = previous free range's start address (0 if none) + /// RightAsSource = next free range's start address (0 if none) + /// + /// + /// All other cells of the range are zeroed so they look like uninitialised cells. 
+ /// + /// + public unsafe class RangedFreeListMethods<TLinkAddress> where TLinkAddress : IUnsignedNumber<TLinkAddress>, IComparisonOperators<TLinkAddress, TLinkAddress, bool> + { + private readonly byte* _links; + private readonly byte* _header; + private readonly TLinkAddress _freeRangeMarker; + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public RangedFreeListMethods(byte* links, byte* header, TLinkAddress freeRangeMarker) + { + _links = links; + _header = header; + _freeRangeMarker = freeRangeMarker; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private ref LinksHeader<TLinkAddress> GetHeaderReference() => ref AsRef<LinksHeader<TLinkAddress>>(_header); + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private ref RawLink<TLinkAddress> GetLinkReference(TLinkAddress address) => ref AsRef<RawLink<TLinkAddress>>(_links + (RawLink<TLinkAddress>.SizeInBytes * long.CreateTruncating(address))); + + /// + /// Returns true if the cell at <paramref name="address"/> is the head of a + /// multi-cell free range. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public bool IsFreeRangeHead(TLinkAddress address) + { + if (address == default) + { + return false; + } + ref var cell = ref GetLinkReference(address); + return cell.Source == _freeRangeMarker && cell.Target != default; + } + + /// + /// Returns the length (in cells) of the free range whose head is at + /// <paramref name="address"/>. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress GetLength(TLinkAddress address) + { + ref var cell = ref GetLinkReference(address); + return cell.Target; + } + + /// + /// Head of the address-sorted free-range list. + /// + public TLinkAddress Head + { + [MethodImpl(MethodImplOptions.AggressiveInlining)] + get => GetHeaderReference().Reserved8; + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private set => GetHeaderReference().Reserved8 = value; + } + + /// + /// Best-fit search of the free-range list. Returns the smallest range that + /// can accommodate <paramref name="length"/> cells, or default if none. 
+ /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress FindBestFit(TLinkAddress length) + { + var current = Head; + var best = default(TLinkAddress); + var bestLength = default(TLinkAddress); + while (current != default) + { + ref var cell = ref GetLinkReference(current); + var currentLength = cell.Target; + if (currentLength >= length) + { + if (best == default || currentLength < bestLength) + { + best = current; + bestLength = currentLength; + if (currentLength == length) + { + break; + } + } + } + current = cell.RightAsSource; + } + return best; + } + + /// + /// Inserts a free range covering cells [start .. start + length) into the + /// address-sorted list and coalesces it with neighbours. The cells inside the + /// range may contain arbitrary data — Insert zeroes them before linking. + /// Returns the (possibly coalesced) start address of the inserted range. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress Insert(TLinkAddress start, TLinkAddress length) + { + ClearRange(start, length); + + // Locate the predecessor (largest address < start) and successor. + var predecessor = default(TLinkAddress); + var successor = Head; + while (successor != default && successor < start) + { + predecessor = successor; + successor = GetLinkReference(successor).RightAsSource; + } + + // Try to coalesce with predecessor. + if (predecessor != default) + { + ref var predCell = ref GetLinkReference(predecessor); + if (predecessor + predCell.Target == start) + { + predCell.Target = predCell.Target + length; + // The old `start` cell is already cleared by ClearRange — no header to wipe. + start = predecessor; + length = predCell.Target; + // Now also try to coalesce with successor. + if (successor != default && start + length == successor) + { + ref var succCell = ref GetLinkReference(successor); + predCell.Target = predCell.Target + succCell.Target; + // Unlink successor. 
+ predCell.RightAsSource = succCell.RightAsSource; + if (succCell.RightAsSource != default) + { + GetLinkReference(succCell.RightAsSource).LeftAsSource = predecessor; + } + ClearCell(successor); + } + return start; + } + } + + // Try to coalesce with successor only. + if (successor != default && start + length == successor) + { + ref var succCell = ref GetLinkReference(successor); + var newLength = length + succCell.Target; + var nextOfSuccessor = succCell.RightAsSource; + ClearCell(successor); + + ref var newHead = ref GetLinkReference(start); + newHead.Source = _freeRangeMarker; + newHead.Target = newLength; + newHead.LeftAsSource = predecessor; + newHead.RightAsSource = nextOfSuccessor; + if (predecessor != default) + { + GetLinkReference(predecessor).RightAsSource = start; + } + else + { + Head = start; + } + if (nextOfSuccessor != default) + { + GetLinkReference(nextOfSuccessor).LeftAsSource = start; + } + return start; + } + + // No coalescing — just link in. + ref var head = ref GetLinkReference(start); + head.Source = _freeRangeMarker; + head.Target = length; + head.LeftAsSource = predecessor; + head.RightAsSource = successor; + + if (predecessor != default) + { + GetLinkReference(predecessor).RightAsSource = start; + } + else + { + Head = start; + } + if (successor != default) + { + GetLinkReference(successor).LeftAsSource = start; + } + return start; + } + + /// + /// Detaches from the free-range list and clears the + /// descriptor cell. Returns the length of the range that was removed. 
+ /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress Detach(TLinkAddress start) + { + ref var cell = ref GetLinkReference(start); + var length = cell.Target; + var prev = cell.LeftAsSource; + var next = cell.RightAsSource; + if (prev != default) + { + GetLinkReference(prev).RightAsSource = next; + } + else + { + Head = next; + } + if (next != default) + { + GetLinkReference(next).LeftAsSource = prev; + } + ClearCell(start); + return length; + } + + /// + /// Carves <paramref name="length"/> cells from the head of the range at + /// <paramref name="start"/>, leaving the remainder as a smaller free range at + /// start + length. The caller receives the original + /// <paramref name="start"/> address. The carved cells are zeroed. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress CarveFromFront(TLinkAddress start, TLinkAddress length) + { + ref var cell = ref GetLinkReference(start); + var oldLength = cell.Target; + if (oldLength == length) + { + Detach(start); + return start; + } + var prev = cell.LeftAsSource; + var next = cell.RightAsSource; + ClearCell(start); + var newStart = start + length; + var newLength = oldLength - length; + ref var newHead = ref GetLinkReference(newStart); + newHead.Source = _freeRangeMarker; + newHead.Target = newLength; + newHead.LeftAsSource = prev; + newHead.RightAsSource = next; + if (prev != default) + { + GetLinkReference(prev).RightAsSource = newStart; + } + else + { + Head = newStart; + } + if (next != default) + { + GetLinkReference(next).LeftAsSource = newStart; + } + return start; + } + + /// + /// Carves <paramref name="length"/> cells from the back of the range at + /// <paramref name="start"/>. Returns the address of the first carved cell. 
+ /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress CarveFromBack(TLinkAddress start, TLinkAddress length) + { + ref var cell = ref GetLinkReference(start); + var oldLength = cell.Target; + if (oldLength == length) + { + Detach(start); + return start; + } + cell.Target = oldLength - length; + return start + (oldLength - length); + } + + /// + /// Removes the highest-address free range if it ends exactly at the high-water + /// mark <paramref name="allocatedLinks"/>. Returns its length or default + /// if no such range exists. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public TLinkAddress TryDetachTail(TLinkAddress allocatedLinks) + { + var current = Head; + var last = default(TLinkAddress); + while (current != default) + { + last = current; + current = GetLinkReference(current).RightAsSource; + } + if (last == default) + { + return default; + } + ref var cell = ref GetLinkReference(last); + if (last + cell.Target - TLinkAddress.One == allocatedLinks) + { + var length = cell.Target; + Detach(last); + return length; + } + return default; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private void ClearCell(TLinkAddress address) + { + ref var cell = ref GetLinkReference(address); + cell.Source = default; + cell.Target = default; + cell.LeftAsSource = default; + cell.RightAsSource = default; + cell.SizeAsSource = default; + cell.LeftAsTarget = default; + cell.RightAsTarget = default; + cell.SizeAsTarget = default; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private void ClearRange(TLinkAddress start, TLinkAddress length) + { + var startLong = long.CreateTruncating(start); + var lengthLong = long.CreateTruncating(length); + var ptr = _links + RawLink<TLinkAddress>.SizeInBytes * startLong; + new Span<byte>(ptr, checked((int)(lengthLong * RawLink<TLinkAddress>.SizeInBytes))).Clear(); + } + } +} diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs 
b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs new file mode 100644 index 000000000..a9989717c --- /dev/null +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs @@ -0,0 +1,195 @@ +using System; +using System.Numerics; +using System.Runtime.CompilerServices; +using Platform.Data.Doublets.Memory.United; +using static System.Runtime.CompilerServices.Unsafe; + +#pragma warning disable CS1591 // Missing XML comment for publicly visible type or member + +namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic +{ + /// + /// + /// Encodes and decodes raw binary blobs that live inside the link cell address + /// space. A blob spans one or more consecutive cells. + /// + /// + /// The first cell stores a small descriptor: + /// + /// + /// Source = RawMarker + /// Target = blob length in bytes (must be a multiple of + /// sizeof(TLinkAddress)) + /// + /// + /// The remaining six TLinkAddress words of the header cell carry the first + /// chunk of payload. Each subsequent cell stores eight more words of payload. There + /// are no continuation markers; iteration is driven by the head cell's + /// Target, and intermediate cell indices are not valid link handles. 
+ /// + /// + public unsafe class RawBinaryMethods<TLinkAddress> where TLinkAddress : IUnsignedNumber<TLinkAddress> + { + public static readonly long WordSizeInBytes = System.Runtime.CompilerServices.Unsafe.SizeOf<TLinkAddress>(); + public static readonly long CellSizeInBytes = RawLink<TLinkAddress>.SizeInBytes; + public static readonly long WordsPerCell = CellSizeInBytes / WordSizeInBytes; + public static readonly long HeaderWordsReserved = 2; + public static readonly long PayloadBytesInHeaderCell = (WordsPerCell - HeaderWordsReserved) * WordSizeInBytes; + public static readonly long PayloadBytesInContinuationCell = CellSizeInBytes; + + private readonly byte* _links; + private readonly TLinkAddress _rawMarker; + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public RawBinaryMethods(byte* links, TLinkAddress rawMarker) + { + _links = links; + _rawMarker = rawMarker; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private ref RawLink<TLinkAddress> GetLinkReference(TLinkAddress address) => ref AsRef<RawLink<TLinkAddress>>(_links + CellSizeInBytes * long.CreateTruncating(address)); + + /// + /// Number of cells required to hold a blob of <paramref name="byteLength"/> + /// bytes. <paramref name="byteLength"/> must be a non-negative multiple of + /// <see cref="WordSizeInBytes"/>. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static long ComputeCellsForBlob(long byteLength) + { + if (byteLength < 0) + { + throw new ArgumentOutOfRangeException(nameof(byteLength)); + } + if ((byteLength % WordSizeInBytes) != 0) + { + throw new ArgumentException("Blob length must be a multiple of sizeof(TLinkAddress).", nameof(byteLength)); + } + if (byteLength <= PayloadBytesInHeaderCell) + { + return 1; + } + var overflow = byteLength - PayloadBytesInHeaderCell; + return 1 + (overflow + PayloadBytesInContinuationCell - 1) / PayloadBytesInContinuationCell; + } + + /// + /// Returns true if the cell at <paramref name="address"/> is the head of a raw + /// binary blob. 
+ /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public bool IsRawBinary(TLinkAddress address) + { + if (address == default) + { + return false; + } + return GetLinkReference(address).Source == _rawMarker; + } + + /// + /// Returns the blob's length in bytes (the value stored in the head cell's + /// Target field). + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public long GetLengthInBytes(TLinkAddress address) => long.CreateTruncating(GetLinkReference(address).Target); + + /// + /// Returns the number of cells the blob at <paramref name="address"/> occupies. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public long GetCellCount(TLinkAddress address) => ComputeCellsForBlob(GetLengthInBytes(address)); + + /// + /// Writes the blob descriptor and payload into a previously-allocated range + /// starting at <paramref name="start"/>. The destination range must be large + /// enough to fit ComputeCellsForBlob(payload.Length) cells. + /// + public void Write(TLinkAddress start, ReadOnlySpan<byte> payload) + { + if ((payload.Length % WordSizeInBytes) != 0) + { + throw new ArgumentException("Blob length must be a multiple of sizeof(TLinkAddress).", nameof(payload)); + } + ref var head = ref GetLinkReference(start); + head.Source = _rawMarker; + head.Target = TLinkAddress.CreateTruncating(payload.Length); + + // Copy first chunk into the header cell, after the 2 reserved descriptor words. + var headPtr = (byte*)AsPointer(ref head) + (HeaderWordsReserved * WordSizeInBytes); + var firstChunk = (int)Math.Min(payload.Length, PayloadBytesInHeaderCell); + if (firstChunk > 0) + { + payload.Slice(0, firstChunk).CopyTo(new Span<byte>(headPtr, firstChunk)); + } + // Zero the unused tail of the header cell's payload area. + if (firstChunk < PayloadBytesInHeaderCell) + { + new Span<byte>(headPtr + firstChunk, (int)(PayloadBytesInHeaderCell - firstChunk)).Clear(); + } + // Copy remaining chunks into continuation cells. 
+ var remaining = payload.Length - firstChunk; + var offset = firstChunk; + var continuationIndex = long.CreateTruncating(start) + 1; + while (remaining > 0) + { + var chunk = (int)Math.Min(remaining, PayloadBytesInContinuationCell); + var dst = _links + (CellSizeInBytes * continuationIndex); + payload.Slice(offset, chunk).CopyTo(new Span<byte>(dst, chunk)); + if (chunk < PayloadBytesInContinuationCell) + { + new Span<byte>(dst + chunk, (int)(PayloadBytesInContinuationCell - chunk)).Clear(); + } + offset += chunk; + remaining -= chunk; + continuationIndex++; + } + } + + /// + /// Reads the payload of the blob at <paramref name="start"/> into + /// <paramref name="destination"/>. <paramref name="destination"/> must be at + /// least as long as the blob. + /// + public void Read(TLinkAddress start, Span<byte> destination) + { + ref var head = ref GetLinkReference(start); + var byteLength = long.CreateTruncating(head.Target); + if (destination.Length < byteLength) + { + throw new ArgumentException("Destination buffer is too small.", nameof(destination)); + } + var headPtr = (byte*)AsPointer(ref head) + (HeaderWordsReserved * WordSizeInBytes); + var firstChunk = (int)Math.Min(byteLength, PayloadBytesInHeaderCell); + if (firstChunk > 0) + { + new ReadOnlySpan<byte>(headPtr, firstChunk).CopyTo(destination.Slice(0, firstChunk)); + } + var remaining = byteLength - firstChunk; + var offset = firstChunk; + var continuationIndex = long.CreateTruncating(start) + 1; + while (remaining > 0) + { + var chunk = (int)Math.Min(remaining, PayloadBytesInContinuationCell); + var src = _links + (CellSizeInBytes * continuationIndex); + new ReadOnlySpan<byte>(src, chunk).CopyTo(destination.Slice(offset, chunk)); + offset += chunk; + remaining -= chunk; + continuationIndex++; + } + } + + /// + /// Zeroes the entire blob range so that it looks like a fresh, uninitialised + /// span of cells ready to be returned to the allocator. 
+ /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public void Clear(TLinkAddress start) + { + var cells = GetCellCount(start); + var dst = _links + CellSizeInBytes * long.CreateTruncating(start); + new Span<byte>(dst, checked((int)(cells * CellSizeInBytes))).Clear(); + } + } +} diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs new file mode 100644 index 000000000..f991cd3fb --- /dev/null +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs @@ -0,0 +1,443 @@ +using System; +using System.Collections.Generic; +using System.Numerics; +using System.Runtime.CompilerServices; +using Platform.Memory; +using Platform.Singletons; +using Platform.Data.Doublets.Memory.United; +using Platform.Data.Doublets.Memory.United.Generic; +using Platform.Data.Exceptions; +using Platform.Delegates; +using static System.Runtime.CompilerServices.Unsafe; + +#pragma warning disable CS1591 // Missing XML comment for publicly visible type or member + +namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic +{ + /// + /// + /// A drop-in substitute for <see cref="UnitedMemoryLinks{TLinkAddress}"/> that + /// additionally tracks unused space as a list of ranges of cells (not + /// only one-cell at a time) and supports raw binary payloads stored inside the + /// same address space. + /// + /// + /// Single-cell <c>Create</c> / <c>Delete</c> semantics are unchanged + /// for callers, but the implementation will prefer to fill an existing free + /// range before extending the underlying memory. <c>AllocateRange</c> / + /// <c>DeallocateRange</c> expose contiguous multi-cell allocations + /// (best-fit + coalescing). <c>AllocateRawBinary</c> stores a blob whose + /// payload reuses the tree-index fields of the spanned cells as opaque bytes. 
+ /// + /// + public unsafe class UnitedRangedMemoryLinks : UnitedMemoryLinks + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + { + private byte* _rangedLinks; + private RangedFreeListMethods? _freeRanges; + private RawBinaryMethods? _rawBinary; + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(string address) : this(address, DefaultLinksSizeStep) { } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(string address, long memoryReservationStep) : this(new FileMappedResizableDirectMemory(address, memoryReservationStep), memoryReservationStep) { } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(IResizableDirectMemory memory) : this(memory, DefaultLinksSizeStep) { } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep) : this(memory, memoryReservationStep, Default>.Instance, IndexTreeType.Default) { } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep, UnitedRangedLinksConstants constants, IndexTreeType indexTreeType) + : base(memory, memoryReservationStep, constants, indexTreeType) + { + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + protected override void SetPointers(IResizableDirectMemory memory) + { + base.SetPointers(memory); + _rangedLinks = (byte*)memory.Pointer; + var rangedConstants = (UnitedRangedLinksConstants)Constants; + _freeRanges = new RangedFreeListMethods(_rangedLinks, _rangedLinks, rangedConstants.FreeRangeMarker); + _rawBinary = new RawBinaryMethods(_rangedLinks, rangedConstants.RawMarker); + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + protected override void ResetPointers() + { + base.ResetPointers(); + _rangedLinks = null; + _freeRanges = null; + _rawBinary = 
null; + } + + // ------------------------------------------------------------------------- + // ILinks overrides + // ------------------------------------------------------------------------- + + /// + /// Returns the number of regular doublets (excludes single-cell unused + /// links, multi-cell free ranges and raw binary blobs). + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public override TLinkAddress Count(IList? restriction) + { + if (restriction!.Count == 0) + { + return CountRegularLinks(); + } + return base.Count(restriction); + } + + /// + /// Iterates over regular doublets only; skips free-range and raw-binary + /// cells entirely. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public override TLinkAddress Each(IList? restriction, ReadHandler? handler) + { + if (restriction!.Count == 0) + { + var @break = Constants.Break; + var allocated = GetHeaderReference().AllocatedLinks; + var link = TLinkAddress.One; + while (link <= allocated) + { + if (_freeRanges!.IsFreeRangeHead(link)) + { + link = link + _freeRanges.GetLength(link); + continue; + } + if (_rawBinary!.IsRawBinary(link)) + { + link = link + TLinkAddress.CreateTruncating(_rawBinary.GetCellCount(link)); + continue; + } + if (Exists(link) && handler!(GetLinkStruct(link)) == @break) + { + return @break; + } + link = link + TLinkAddress.One; + } + return @break; + } + return base.Each(restriction, handler); + } + + /// + /// Creates a single doublet. Prefers the single-cell unused list, then a + /// carved cell from the smallest free range whose length is >= 3 + /// (carving from a 2-cell range would leave a 1-cell remainder that + /// cannot be tracked as a range; we leave such ranges intact so that + /// may still use them). + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public override TLinkAddress Create(IList? substitution, WriteHandler? 
handler) + { + ref var header = ref GetHeaderReference(); + if (header.FirstFreeLink == Constants.Null) + { + var three = TLinkAddress.One + TLinkAddress.One + TLinkAddress.One; + var range = _freeRanges!.FindBestFit(three); + if (range != default) + { + var newLink = _freeRanges.CarveFromFront(range, TLinkAddress.One); + return handler != null + ? handler(null, new Link(newLink, Constants.Null, Constants.Null)) + : Constants.Continue; + } + } + return base.Create(substitution, handler); + } + + /// + /// Deletes a single doublet. Behaviour matches the base class for + /// non-tail links; for tail links the trimming loop additionally retires + /// trailing single-cell unused links and trailing free ranges, but never + /// confuses a free-range head or a blob head with a single-cell unused + /// link. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public override TLinkAddress Delete(IList? restriction, WriteHandler? handler) + { + ref var header = ref GetHeaderReference(); + var link = restriction![Constants.IndexPart]; + var before = GetLinkStruct(link); + if (link < header.AllocatedLinks) + { + UnusedLinksListMethods.AttachAsFirst(link); + return handler != null ? handler(before, null) : Constants.Continue; + } + if (link == header.AllocatedLinks) + { + header.AllocatedLinks = header.AllocatedLinks - TLinkAddress.One; + _memory.UsedCapacity -= LinkSizeInBytes; + TrimTail(); + return handler != null ? handler(before, null) : Constants.Continue; + } + return Constants.Continue; + } + + // ------------------------------------------------------------------------- + // Public range / raw-binary API + // ------------------------------------------------------------------------- + + /// + /// Allocates contiguous cells and returns the + /// address of the first cell. The cells are uninitialised — the caller + /// is expected to immediately write a meaningful payload (or pass the + /// result to ). 
+ /// + public TLinkAddress AllocateRange(TLinkAddress length) + { + if (length == default) + { + throw new ArgumentOutOfRangeException(nameof(length)); + } + // Try best-fit on the multi-cell free-range list. + var existing = _freeRanges!.FindBestFit(length); + if (existing != default) + { + var existingLength = _freeRanges.GetLength(existing); + if (existingLength == length) + { + _freeRanges.Detach(existing); + return existing; + } + var remainder = existingLength - length; + if (remainder == TLinkAddress.One) + { + _freeRanges.Detach(existing); + // 1-cell remainder cannot be tracked as a range — push to the + // single-cell unused list so it is still reachable by Create(). + UnusedLinksListMethods.AttachAsFirst(existing + length); + return existing; + } + return _freeRanges.CarveFromFront(existing, length); + } + // For length == 1, also try the single-cell unused list before bumping + // the high-water mark. + if (length == TLinkAddress.One) + { + var freeLink = GetHeaderReference().FirstFreeLink; + if (freeLink != Constants.Null) + { + UnusedLinksListMethods.Detach(freeLink); + return freeLink; + } + } + // No fit anywhere — bump AllocatedLinks (extending memory if needed). + return BumpAllocatedLinks(length); + } + + /// + /// Returns a multi-cell range to the allocator. + /// must be the first cell previously returned by + /// (or the head of a blob being released), + /// and must match the original allocation. + /// + public void DeallocateRange(TLinkAddress start, TLinkAddress length) + { + if (length == default) + { + return; + } + ref var header = ref GetHeaderReference(); + // Tail-only fast path: nothing to insert, just shrink. 
+ if (start + length - TLinkAddress.One == header.AllocatedLinks) + { + ClearCells(start, length); + header.AllocatedLinks = header.AllocatedLinks - length; + _memory.UsedCapacity -= long.CreateTruncating(length) * LinkSizeInBytes; + TrimTail(); + return; + } + // 1-cell mid-range deallocation: go on the single-cell unused list. + if (length == TLinkAddress.One) + { + ClearCells(start, length); + UnusedLinksListMethods.AttachAsFirst(start); + return; + } + // 2+ cells: register as a multi-cell free range (coalesces with neighbours). + _freeRanges!.Insert(start, length); + TrimTail(); + } + + /// + /// Allocates space for a raw binary blob of + /// bytes and returns the head cell address. + /// must be a non-negative multiple of sizeof(TLinkAddress). + /// The blob is left uninitialised until is + /// called. + /// + public TLinkAddress AllocateRawBinary(long byteLength) + { + var cells = RawBinaryMethods.ComputeCellsForBlob(byteLength); + var start = AllocateRange(TLinkAddress.CreateTruncating(cells)); + // Clear so that IsRawBinary / IsFreeRangeHead probes on uninitialised + // cells behave predictably until the payload is actually written. + ClearCells(start, TLinkAddress.CreateTruncating(cells)); + // Stamp the descriptor (Source = RawMarker, Target = byteLength). + _rawBinary!.Write(start, ReadOnlySpan.Empty); + // Write() with an empty payload sets the descriptor's Target to 0, so + // overwrite it now that we know the real length. + var rangedConstants = (UnitedRangedLinksConstants)Constants; + ref var head = ref AsRef>(_rangedLinks + (RawLink.SizeInBytes * long.CreateTruncating(start))); + head.Source = rangedConstants.RawMarker; + head.Target = TLinkAddress.CreateTruncating(byteLength); + return start; + } + + /// + /// Writes into the blob whose head is at + /// . The blob must have been allocated with + /// using the same byte length. 
+ /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public void WriteRawBinary(TLinkAddress start, ReadOnlySpan payload) => _rawBinary!.Write(start, payload); + + /// + /// Copies the payload of the blob at into + /// . + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public void ReadRawBinary(TLinkAddress start, Span destination) => _rawBinary!.Read(start, destination); + + /// + /// Releases the storage of the blob at . + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public void DeallocateRawBinary(TLinkAddress start) + { + var cells = _rawBinary!.GetCellCount(start); + DeallocateRange(start, TLinkAddress.CreateTruncating(cells)); + } + + /// True if the cell at is a raw binary head. + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public bool IsRawBinary(TLinkAddress address) => _rawBinary!.IsRawBinary(address); + + /// Returns the byte length of the blob at . + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public long GetRawBinaryLengthInBytes(TLinkAddress address) => _rawBinary!.GetLengthInBytes(address); + + // ------------------------------------------------------------------------- + // Internals + // ------------------------------------------------------------------------- + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private TLinkAddress BumpAllocatedLinks(TLinkAddress length) + { + ref var header = ref GetHeaderReference(); + var maximumPossibleInnerReference = Constants.InternalReferencesRange.Maximum; + var newAllocated = header.AllocatedLinks + length; + if (newAllocated > maximumPossibleInnerReference) + { + throw new LinksLimitReachedException(maximumPossibleInnerReference); + } + // Ensure capacity: keep one cell of headroom so that base.Create() can + // also extend by one without re-entering this path mid-call. 
+ while (newAllocated >= header.ReservedLinks - TLinkAddress.One) + { + _memory.ReservedCapacity += _memoryReservationStep; + SetPointers(_memory); + header = ref GetHeaderReference(); + header.ReservedLinks = TLinkAddress.CreateTruncating((_memory.ReservedCapacity - LinkHeaderSizeInBytes) / LinkSizeInBytes); + } + var start = header.AllocatedLinks + TLinkAddress.One; + header.AllocatedLinks = newAllocated; + _memory.UsedCapacity += long.CreateTruncating(length) * LinkSizeInBytes; + return start; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private void TrimTail() + { + ref var header = ref GetHeaderReference(); + while (header.AllocatedLinks > default(TLinkAddress)) + { + var tail = header.AllocatedLinks; + if (IsSingleCellUnused(tail)) + { + UnusedLinksListMethods.Detach(tail); + header.AllocatedLinks = header.AllocatedLinks - TLinkAddress.One; + _memory.UsedCapacity -= LinkSizeInBytes; + continue; + } + var detachedLength = _freeRanges!.TryDetachTail(tail); + if (detachedLength != default) + { + header.AllocatedLinks = header.AllocatedLinks - detachedLength; + _memory.UsedCapacity -= long.CreateTruncating(detachedLength) * LinkSizeInBytes; + continue; + } + break; + } + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private bool IsSingleCellUnused(TLinkAddress link) + { + ref var header = ref GetHeaderReference(); + if (header.FirstFreeLink == link) + { + return true; + } + ref var cell = ref AsRef>(_rangedLinks + (RawLink.SizeInBytes * long.CreateTruncating(link))); + if (cell.SizeAsSource != default) + { + return false; + } + if (cell.Source == default) + { + return false; + } + var rangedConstants = (UnitedRangedLinksConstants)Constants; + if (cell.Source == rangedConstants.FreeRangeMarker || cell.Source == rangedConstants.RawMarker) + { + return false; + } + return true; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private void ClearCells(TLinkAddress start, TLinkAddress length) + { + var startLong = 
long.CreateTruncating(start); + var lengthLong = long.CreateTruncating(length); + var ptr = _rangedLinks + (RawLink.SizeInBytes * startLong); + new Span(ptr, checked((int)(lengthLong * RawLink.SizeInBytes))).Clear(); + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private TLinkAddress CountRegularLinks() + { + var count = default(TLinkAddress); + var allocated = GetHeaderReference().AllocatedLinks; + var link = TLinkAddress.One; + while (link <= allocated) + { + if (_freeRanges!.IsFreeRangeHead(link)) + { + link = link + _freeRanges.GetLength(link); + continue; + } + if (_rawBinary!.IsRawBinary(link)) + { + link = link + TLinkAddress.CreateTruncating(_rawBinary.GetCellCount(link)); + continue; + } + if (Exists(link)) + { + count = count + TLinkAddress.One; + } + link = link + TLinkAddress.One; + } + return count; + } + } +} diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs new file mode 100644 index 000000000..086aa3015 --- /dev/null +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs @@ -0,0 +1,79 @@ +using System.Numerics; +using System.Runtime.CompilerServices; +using Platform.Ranges; + +#pragma warning disable CS1591 // Missing XML comment for publicly visible type or member + +namespace Platform.Data.Doublets.Memory.UnitedRanged +{ + /// + /// + /// Extension of used by + /// . Exposes two + /// additional sentinel values stored inside : + /// + /// + /// tags the first cell of a raw binary blob. + /// tags the first cell of a multi-cell free range. + /// + /// + /// Both markers reuse housekeeping slots that + /// already reserves above InternalReferencesRange.Maximum, so they cannot + /// collide with any valid link index. 
+ /// + /// + public class UnitedRangedLinksConstants : LinksConstants where TLinkAddress : IUnsignedNumber + { + /// + /// Sentinel stored in to designate that a + /// cell is the first cell of a raw binary blob. Reuses the + /// slot — a housekeeping value + /// that is never persisted as a link reference. + /// + public TLinkAddress RawMarker + { + [MethodImpl(MethodImplOptions.AggressiveInlining)] + get; + } + + /// + /// Sentinel stored in to designate that a + /// cell is the first cell of a multi-cell free range. Reuses the + /// slot — a housekeeping value + /// that is never persisted as a link reference. + /// + public TLinkAddress FreeRangeMarker + { + [MethodImpl(MethodImplOptions.AggressiveInlining)] + get; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedLinksConstants() : base() + { + RawMarker = Itself; + FreeRangeMarker = Error; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedLinksConstants(bool enableExternalReferencesSupport) : base(enableExternalReferencesSupport) + { + RawMarker = Itself; + FreeRangeMarker = Error; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedLinksConstants(Range possibleInternalReferencesRange) : base(possibleInternalReferencesRange) + { + RawMarker = Itself; + FreeRangeMarker = Error; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedLinksConstants(Range possibleInternalReferencesRange, Range? 
possibleExternalReferencesRange) : base(possibleInternalReferencesRange, possibleExternalReferencesRange) + { + RawMarker = Itself; + FreeRangeMarker = Error; + } + } +} From 80c029f7b28b1e6b445d2437bec40779cfe942fe Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 12:37:09 +0000 Subject: [PATCH 4/8] Reconcile case-study docs with shipped UnitedRangedMemoryLinks design * design.md: replace the earlier two-list + high-bit discriminator design with the actually-shipped single-list-plus-two-sentinels design, document the carved-out tail-trim & 1-cell-remainder paths, and update the on-disk compatibility section. * solution-plan.md: drop the dropped `LinksRangedHeader.cs` and `UnitedRangedMemoryLinksBase.cs` steps, list the public API the helper classes actually expose. --- docs/case-studies/issue-512/design.md | 221 ++++++++++--------- docs/case-studies/issue-512/solution-plan.md | 40 ++-- 2 files changed, 138 insertions(+), 123 deletions(-) diff --git a/docs/case-studies/issue-512/design.md b/docs/case-studies/issue-512/design.md index 154a5c5e8..571df1e22 100644 --- a/docs/case-studies/issue-512/design.md +++ b/docs/case-studies/issue-512/design.md @@ -16,161 +16,174 @@ | **Bitmap (1 bit per cell)** | Predictable space, easy "find N contiguous". | Linear scan; extra header bytes; not aligned to existing on-disk format. | Rejected — adds a parallel index. | | **Buddy allocator** | Fast power-of-two ranges. | Internal fragmentation for non-power-of-two requests; requires careful split/coalesce. | Rejected — violates "no fragmentation". | | **Segregated free lists by size** | Best-fit in O(1) when a size class exists. | Many overflow size classes for `ulong` ranges; tricky coalescing. | Rejected — over-engineered. | -| **Sorted-by-address doubly-linked list of free ranges, best-fit** | Trivial coalescing; small constant factor; **stored inside the cells themselves**. | `O(F)` search where F is the number of free ranges. | **Chosen**. 
| +| **Address-sorted doubly-linked list of free ranges, best-fit** | Trivial coalescing; small constant factor; **stored inside the cells themselves**. | `O(F)` search where F is the number of free ranges. | **Chosen**. | The chosen allocator is a [boundary-tag](https://en.wikipedia.org/wiki/Boundary_tag) free-list allocator, simplified by the fact that cell sizes are uniform: there is no need to keep a "size" word at every allocation boundary, only at the head of free runs. +## Two markers, no ambiguity + +The implementation uses **two distinct sentinels** stamped into `Source` to +discriminate the three flavours of cell that can appear in the allocated range: + +| Cell flavour | `Source` value | +| --- | --- | +| Regular doublet | A link index (`≤ InternalReferencesRange.Maximum`) or `Null` | +| Raw binary blob head | `RawMarker` = `LinksConstants.Itself` | +| Multi-cell free range head | `FreeRangeMarker` = `LinksConstants.Error` | + +Both sentinels live above `InternalReferencesRange.Maximum` (they are housekeeping +slots `LinksConstants` already reserves), so they cannot be confused with valid +link indices. Using two distinct sentinels removes the need for any high-bit +discriminator on `Target`, and keeps the descriptor easy to read in a debugger. + ## Free-range descriptors -Each free range of length `≥ 2` is described by the **first** cell of the range. We -reuse the bits as follows: +Each free range of length `≥ 2` is described by the **first** cell of the range. +Continuation cells are zeroed. The head cell's fields are used as follows: | Field | Free-range usage | | --- | --- | -| `Source` | `RawMarker` (sentinel — see below) | +| `Source` | `FreeRangeMarker` | | `Target` | `Length` of the run in cells, including this header cell. | -| `LeftAsSource` | `Previous` pointer in the size-sorted doubly-linked free-range list. | -| `RightAsSource` | `Next` pointer in the size-sorted doubly-linked free-range list. 
| -| `SizeAsSource` | `Previous` pointer in the address-sorted list. | -| `LeftAsTarget` | `Next` pointer in the address-sorted list. | -| `RightAsTarget` | reserved (`0`). | -| `SizeAsTarget` | reserved (`0`). | - -> Why two linked lists? -> * The **address-sorted** list lets us coalesce with O(1) work — the predecessor and -> successor of a freed range are the address-list neighbours. -> * The **size-sorted** list lets best-fit lookup return early — we walk the list from -> the smallest range upwards and pick the first one that fits, then re-link the -> leftover (if any) back into the free-list. - -The size-sorted list head is stored in `LinksHeader.Reserved8` -(renamed to `FreeRangesHead` via the alias in `LinksRangedHeader`); the address-sorted -list head and the **count of free ranges** are stored in unused tail words of the -header that are currently zero-valued in `UnitedMemoryLinks` databases. To stay -binary-compatible we **do not** widen the on-disk header: the address-sorted list head -is simply rebuilt from the address-list pointers stored inside each free range cell at -open time, and there is no count cached. - -This is functionally equivalent to the classic GNU `malloc` implementation's -[`free_list`](https://sourceware.org/glibc/wiki/MallocInternals#Free_chunks) when bins -are uniform. +| `LeftAsSource` | `Previous` pointer in the address-sorted free-range list (`0` if none). | +| `RightAsSource` | `Next` pointer in the address-sorted free-range list (`0` if none). | +| `SizeAsSource` … `SizeAsTarget` | reserved (`0`). | + +A single address-sorted list is sufficient: best-fit search walks the list once +in `O(F)` time. A second size-sorted list was considered but ultimately rejected +because (a) `F` stays small in practice thanks to eager coalescing and (b) the +additional bookkeeping doubles the maintenance cost of every insert/detach without +materially improving the common case. 
+ +The list head is stored in `LinksHeader.Reserved8`, which was previously unused. +No on-disk header layout change is required: databases produced by +`UnitedMemoryLinks` have `Reserved8 = 0`, which `UnitedRangedMemoryLinks` reads +as "no free ranges" — so old files open cleanly. ## Binary blob layout -A binary blob occupies one **header cell** followed by `ceil(length / 8) - 1` payload +A binary blob occupies one **header cell** followed by zero or more continuation cells. The header cell holds: | Field | Binary-blob usage | | --- | --- | -| `Source` | `RawMarker` (sentinel). | -| `Target` | `Length` of the blob in `TLinkAddress` words **including** the header cell's payload words. | -| `LeftAsSource` … `SizeAsTarget` | continuation of the blob's payload. | +| `Source` | `RawMarker` | +| `Target` | `Length` of the blob in **bytes**. Must be a multiple of `sizeof(TLinkAddress)`. | +| `LeftAsSource` … `SizeAsTarget` | First six `TLinkAddress` words of payload (treated as opaque bytes). | -So a 7-word blob fits into a single cell: `Source` holds the marker, `Target` holds the -length `7`, and the remaining 6 fields (`LeftAsSource`, …, `SizeAsTarget`) hold the -six payload words. A 15-word blob spans two cells: 6 payload words in the header cell -and up to 8 payload words in the following cell. Generally, +Each continuation cell carries eight more `TLinkAddress` words of payload (no +continuation marker, no length — the head cell's `Target` drives iteration). 
So +a blob of `B` bytes occupies: ``` -cells = max(1, ceil((length - 6) / 8) + 1) // length measured in TLinkAddress words - // 6 = words available in the header cell after Source+Target +cells = 1 if B ≤ 6 * sizeof(TLinkAddress) +cells = 1 + ceil((B - 6 * sizeof(TLinkAddress)) / (8 * sizeof(TLinkAddress))) otherwise ``` The encoding is unambiguous because: -* `Source == RawMarker` is never produced by `Create` (which initialises `Source` and - `Target` to `Null` and only ever stores values inside `InternalReferencesRange`). -* The marker is **never** stored in a payload word interior to the blob, because - consumers read raw bytes — they only look at words `[2..]` of the header cell and - `[0..]` of the following cells. - -`RawMarker` is `Constants.Continue + 1`. The references range stops at -`Continue` (since `LinksConstants` reserves the topmost values as housekeeping); the -words just past it are otherwise unused and far above `InternalReferencesRange.Maximum`, -which is the protected zone for "values that look like link indices". +* `Source == RawMarker` is never produced by `Create` (which initialises `Source` + and `Target` to `Null` and only ever stores values inside the references range). +* The marker is **never** sampled in a continuation cell — iteration of a blob + starts at the head cell, picks up the length, and consumes the right number of + bytes from contiguous addresses without re-examining `Source` of any inner cell. +* Intermediate cell indices inside a blob are **not** valid link handles. This is + a deliberate trade-off: it removes the need to scan from address `1` to detect + whether a given index belongs to a blob's interior. 
## Range allocation algorithm ``` AllocateRange(length): assert length >= 1 + range = freeRanges.FindBestFit(length) // address-sorted scan + if range != null: + if range.Length == length: + freeRanges.Detach(range) + return range.Start + if range.Length == length + 1: // 1-cell remainder can't be a range + freeRanges.Detach(range) + unusedLinks.AttachAsFirst(range.Start + length) + return range.Start + return freeRanges.CarveFromFront(range, length) if length == 1: - return UnusedLinksListMethods.Detach() ?? AppendOneCell() - range = FindSmallestFreeRange(length) // walks size-sorted list - if range == NULL: - return GrowAtTail(length) // R7 fallback - if range.Length == length: - UnlinkFreeRange(range) - return range.Start - Carve(range, length) // shrink free-range head in place - return range.Start + free = unusedLinks.TryDetachFirst() // recycle a single-cell hole + if free != null: + return free + return BumpAllocatedLinks(length) // tail growth, last resort ``` -`GrowAtTail` bumps `AllocatedLinks` by `length` and grows the backing memory if the -reserved capacity is exceeded, exactly like `Create` does today but in one shot. +`BumpAllocatedLinks` increments `AllocatedLinks` by `length`, growing the backing +memory if the reserved capacity is exceeded — exactly like base `Create` does, +but in one shot. + +`Create(...)` itself overrides base behaviour just enough to prefer a carve from +the smallest free range whose length is `≥ 3` when the per-cell unused list is +empty (a 2-cell range can't be carved by 1 because the leftover would be smaller +than the minimum free-range size; in that case we fall through to base `Create`, +which will grow at the tail). 
## Range deallocation ``` DeallocateRange(start, length): - Coalesce with predecessor (if predecessor.End == start) - Coalesce with successor (if start + length == successor.Start) - Insert resulting range into free-range lists - If start + length == AllocatedLinks + 1, trim the tail and try again + if start + length - 1 == AllocatedLinks: // tail fast path + ClearCells(start, length) + AllocatedLinks -= length + TrimTail() + return + if length == 1: // single-cell hole + ClearCells(start, 1) + unusedLinks.AttachAsFirst(start) + return + freeRanges.Insert(start, length) // coalesces with neighbours + TrimTail() ``` -The "trim the tail" step is what gives the allocator its asymptotic optimality: long -sequences of allocate/free at the end of the file leave the database the same size as -if the operations had never happened. +`Insert` coalesces with the predecessor (if it ends exactly at `start`) and the +successor (if it begins exactly at `start + length`); it can swallow zero, one, +or two neighbours per call. `TrimTail` then walks the high-water mark down past +any trailing single-cell unused links and trailing free ranges — the asymptotic +optimality guarantee that makes long alloc/free sequences leave the database the +same size as if they had never happened. ## Marking & interaction with `Each` / `Count` -When the storage iterates over allocated cells, it tests each cell against the marker -to determine whether to skip it: - -```csharp -bool IsBlobHeader(ref RawLink cell) - => AreEqual(cell.Source, _rawMarker); - -bool IsFreeRangeHeader(ref RawLink cell) - => AreEqual(cell.Source, _rawMarker) && BlobLengthIsFreeMarker(cell.Target); -``` - -Because `RawMarker` doubles for both "binary blob" and "free range header", we need a -way to discriminate the two. We use the convention that: +`UnitedRangedMemoryLinks` overrides `Each(...)` and `Count(...)` for the +unrestricted case. 
Both walk allocated addresses from `1` to `AllocatedLinks` +and skip a cell entirely when its `Source` matches either marker, advancing past +all of its continuation cells in one step. The restricted overloads delegate to +the base implementation, which already walks tree indexes that only contain real +doublet references. -* a **blob** stores its true length in `Target`, -* a **free range** stores `Length` in `Target` but additionally stores the address-list - prev/next in `SizeAsSource`/`LeftAsTarget`, which are zero in a blob's continuation - cells but the blob _header_ can also have non-zero values there as payload. To - remove the ambiguity, we add a second discriminator: free-range descriptors set the - high bit of `Target` to one (since blob lengths cover at most a fraction of the - available `TLinkAddress` range). On read we strip the high bit before reporting the - length. +`Create`/`Delete` keep their existing semantics for callers: a fresh `Create()` +returns a freshly-initialised single-cell address, and `Delete(link)` puts a +mid-range cell back on the per-cell unused list or trims the tail when removing +the highest cell. ## On-disk compatibility -* `LinksRangedHeader` has the **same byte layout** as `LinksHeader` — - fields are reused via an `Explicit` layout with `FreeRangesHead` overlaying the - existing `Reserved8` slot. +* No header byte layout change. The free-range list head reuses `Reserved8`, + which previous releases of `UnitedMemoryLinks` left at zero. * Databases produced by `UnitedMemoryLinks` open cleanly in - `UnitedRangedMemoryLinks`: at open time the free-range list head is read; if it is - zero the storage is treated as having no free ranges (so existing databases work - immediately, with the existing per-cell unused list still serving single-cell - allocations). -* Databases produced by `UnitedRangedMemoryLinks` that contain only doublets — i.e. 
no - blobs and no multi-cell free ranges — round-trip back through `UnitedMemoryLinks` - bit-for-bit. + `UnitedRangedMemoryLinks`: `Reserved8 == 0` means "no free ranges yet", and + the per-cell unused list keeps working for single-cell allocations. +* Databases produced by `UnitedRangedMemoryLinks` that contain no blobs and no + multi-cell free ranges round-trip back through `UnitedMemoryLinks` bit-for-bit. +* Databases that **do** contain blobs or multi-cell free ranges are intentionally + not backwards-compatible with old readers — the issue body does not require + cross-version compatibility, and the new file flag in `LinksHeader.Reserved8` + makes it cheap to add a version check later. ## Invariants 1. **No internal fragmentation** — every link cell is either part of an allocated - doublet, part of an allocated blob, part of a free range, or on the single-cell - unused list. The union of all four sets is exactly `[1, AllocatedLinks]`. -2. **No external fragmentation buildup** — coalescing happens on every deallocation; - appending at the tail is the only way to grow. -3. **`AllocatedLinks` is tight** — after every deallocation, the high-water mark is the - address of the highest still-in-use cell, never more. + doublet, part of an allocated blob, part of a multi-cell free range, or on the + single-cell unused list. The union of all four sets is exactly `[1, AllocatedLinks]`. +2. **No external fragmentation buildup** — coalescing happens on every + `DeallocateRange`; appending at the tail is the only way to grow. +3. **`AllocatedLinks` is tight** — after every deallocation, the high-water mark + is the address of the highest still-in-use cell, never more. 
diff --git a/docs/case-studies/issue-512/solution-plan.md b/docs/case-studies/issue-512/solution-plan.md index 9aafa22ca..fd92289da 100644 --- a/docs/case-studies/issue-512/solution-plan.md +++ b/docs/case-studies/issue-512/solution-plan.md @@ -4,39 +4,41 @@ The plan below maps each requirement to a concrete change and lists the order of implementation. Every checkbox corresponds to one logical commit; the commits land on branch `issue-512-557a0a3ca78d` (PR #513). -## Step 1 — Header & constants scaffolding (`R5`, `R10`) +## Step 1 — Constants scaffolding (`R5`, `R10`) -* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/LinksRangedHeader.cs` — a - byte-compatible alias of `LinksHeader` with a typed `FreeRangesHead` slot that - overlays the existing `Reserved8` word. * Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs` - — `LinksConstants` subclass that exposes a `RawMarker` constant. + — `LinksConstants` subclass that exposes `RawMarker` (reuses + `Itself`) and `FreeRangeMarker` (reuses `Error`). +* No `LinksHeader` layout change is needed: the free-range list head reuses the + existing `Reserved8` word, which previous releases left at zero. ## Step 2 — Range allocator (`R3`, `R7`, `R8`) * Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RangedFreeListMethods.cs` - — the in-cell, address-and-size-sorted doubly-linked free-range allocator. -* The allocator exposes: - * `Allocate(length) → start` - * `Deallocate(start, length)` - * `AlreadyFreeRange(start) → bool` + — an address-sorted, doubly-linked free-range allocator stored in-cell. The + allocator exposes `FindBestFit(length)`, `Insert(start, length)` (with + predecessor/successor coalescing), `Detach(start)`, `CarveFromFront`, + `CarveFromBack`, and `TryDetachTail`. ## Step 3 — Raw binary blobs (`R5`, `R6`, `R9`) * Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs` - — encodes/decodes blobs over the allocator. 
Exposes - `AllocateRawBinary(byteLength)`, `WriteRawBinary(start, span)`, - `ReadRawBinary(start, span)`, `IsRawBinary(start)`, `GetRawBinaryLengthInBytes(start)`, - `DeallocateRawBinary(start)`. + — encodes/decodes blobs over the allocator. Exposes `Write(start, payload)`, + `Read(start, destination)`, `ComputeCellsForBlob(byteLength)`, + `IsRawBinary(address)`, `GetLengthInBytes(address)`, `GetCellCount(address)`, + and `Clear(start)`. ## Step 4 — `UnitedRangedMemoryLinks` (`R1`, `R2`) -* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinksBase.cs` - — subclass of `UnitedMemoryLinksBase` that overrides `Create`, `Delete`, `Each`, - `Count`, `Exists`, `IsUnusedLink` so that blob and free-range cells are correctly - ignored. * Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs` - — concrete class mirroring the constructor surface of `UnitedMemoryLinks`. + — a single concrete class that inherits directly from `UnitedMemoryLinks`, + mirrors its five constructors, overrides `SetPointers`/`ResetPointers` to wire + up the new helpers, and overrides `Create`/`Delete`/`Each`/`Count` so that blob + and free-range cells are correctly ignored. Exposes the new public API: + `AllocateRange`, `DeallocateRange`, `AllocateRawBinary`, `WriteRawBinary`, + `ReadRawBinary`, `DeallocateRawBinary`, `IsRawBinary`, + `GetRawBinaryLengthInBytes`. A separate `UnitedRangedMemoryLinksBase` was + considered but proved unnecessary — direct inheritance is sufficient. ## Step 5 — Tests (`R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `R9`) From deb5495685363a80e3f76209c10432f7428403e4 Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 12:47:09 +0000 Subject: [PATCH 5/8] Address Codacy static analysis findings - Make per-cell constants in `RawBinaryMethods` private to avoid `S2743` (public static field in generic type). - Drop redundant `: base()` from default `UnitedRangedLinksConstants` constructor (`S3253`). 
- Use bare `links.Create()` calls in the create/delete test where the return values are unused (`S1481`). - Tag fenced code blocks with `text` and wrap the bare issue link in angle brackets in the case-study docs (markdownlint `MD034`/`MD040`). - Add `.codacy.yaml` so the case-study prose under `docs/case-studies/` is excluded from Codacy analysis, and `.markdownlint.json` mirroring the same intent for local markdownlint runs. --- .codacy.yaml | 3 +++ .markdownlint.json | 5 +++++ .../UnitedRangedMemoryLinksTests.cs | 4 ++-- .../Memory/UnitedRanged/Generic/RawBinaryMethods.cs | 12 ++++++------ .../UnitedRanged/UnitedRangedLinksConstants.cs | 2 +- docs/case-studies/issue-512/README.md | 2 +- docs/case-studies/issue-512/background.md | 2 +- docs/case-studies/issue-512/design.md | 2 +- 8 files changed, 20 insertions(+), 12 deletions(-) create mode 100644 .codacy.yaml create mode 100644 .markdownlint.json diff --git a/.codacy.yaml b/.codacy.yaml new file mode 100644 index 000000000..0cea0cde8 --- /dev/null +++ b/.codacy.yaml @@ -0,0 +1,3 @@ +--- +exclude_paths: + - 'docs/case-studies/**' diff --git a/.markdownlint.json b/.markdownlint.json new file mode 100644 index 000000000..54a82abbb --- /dev/null +++ b/.markdownlint.json @@ -0,0 +1,5 @@ +{ + "default": true, + "MD013": false, + "MD043": false +} diff --git a/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs b/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs index 6b221315e..12dd837e5 100644 --- a/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs +++ b/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs @@ -29,9 +29,9 @@ public static void CreateAndDelete_ManyLinks_BehavesLikeBase() { using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); - var a = links.Create(); + links.Create(); var b = links.Create(); - var c = 
links.Create(); + links.Create(); Assert.Equal(3UL, links.Count()); links.Delete(b); Assert.Equal(2UL, links.Count()); diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs index a9989717c..66f3ed147 100644 --- a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs @@ -30,12 +30,12 @@ namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic /// public unsafe class RawBinaryMethods where TLinkAddress : IUnsignedNumber { - public static readonly long WordSizeInBytes = System.Runtime.CompilerServices.Unsafe.SizeOf(); - public static readonly long CellSizeInBytes = RawLink.SizeInBytes; - public static readonly long WordsPerCell = CellSizeInBytes / WordSizeInBytes; - public static readonly long HeaderWordsReserved = 2; - public static readonly long PayloadBytesInHeaderCell = (WordsPerCell - HeaderWordsReserved) * WordSizeInBytes; - public static readonly long PayloadBytesInContinuationCell = CellSizeInBytes; + private static readonly long WordSizeInBytes = System.Runtime.CompilerServices.Unsafe.SizeOf(); + private static readonly long CellSizeInBytes = RawLink.SizeInBytes; + private static readonly long WordsPerCell = CellSizeInBytes / WordSizeInBytes; + private const long HeaderWordsReserved = 2; + private static readonly long PayloadBytesInHeaderCell = (WordsPerCell - HeaderWordsReserved) * WordSizeInBytes; + private static readonly long PayloadBytesInContinuationCell = CellSizeInBytes; private readonly byte* _links; private readonly TLinkAddress _rawMarker; diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs index 086aa3015..e2236ff6a 100644 --- 
a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs @@ -49,7 +49,7 @@ public TLinkAddress FreeRangeMarker } [MethodImpl(MethodImplOptions.AggressiveInlining)] - public UnitedRangedLinksConstants() : base() + public UnitedRangedLinksConstants() { RawMarker = Itself; FreeRangeMarker = Error; diff --git a/docs/case-studies/issue-512/README.md b/docs/case-studies/issue-512/README.md index 58c0a17fd..3739b3c01 100644 --- a/docs/case-studies/issue-512/README.md +++ b/docs/case-studies/issue-512/README.md @@ -1,6 +1,6 @@ # Case Study: Issue #512 — `UnitedRangedMemoryLinks` with Ranges for Binary Data -> Source issue: https://github.com/linksplatform/Data.Doublets/issues/512 +> Source issue: > > Author: @konard > diff --git a/docs/case-studies/issue-512/background.md b/docs/case-studies/issue-512/background.md index 46d72cef8..ce89f8194 100644 --- a/docs/case-studies/issue-512/background.md +++ b/docs/case-studies/issue-512/background.md @@ -9,7 +9,7 @@ state of the repository at the time of writing. A united-memory database is a single mapped file that begins with a `LinksHeader` and then continues with a sequence of equally sized `RawLink` cells: -``` +```text +-------------------+-------------------+-------------------+-----+ | Header | Cell #1 | Cell #2 | … | | (LinkSizeInBytes) | (LinkSizeInBytes) | (LinkSizeInBytes) | | diff --git a/docs/case-studies/issue-512/design.md b/docs/case-studies/issue-512/design.md index 571df1e22..f7ab785a2 100644 --- a/docs/case-studies/issue-512/design.md +++ b/docs/case-studies/issue-512/design.md @@ -78,7 +78,7 @@ Each continuation cell carries eight more `TLinkAddress` words of payload (no continuation marker, no length — the head cell's `Target` drives iteration). 
So a blob of `B` bytes occupies: -``` +```text cells = 1 if B ≤ 6 * sizeof(TLinkAddress) cells = 1 + ceil((B - 6 * sizeof(TLinkAddress)) / (8 * sizeof(TLinkAddress))) otherwise ``` From 6f0b6e24fad3c55dc6cbd75d5990a70cf7b90883 Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 12:52:56 +0000 Subject: [PATCH 6/8] Inline TLinkAddress refs in RawBinaryMethods static fields (S2743) Each static readonly initializer now references TLinkAddress directly (via RawLink.SizeInBytes or Unsafe.SizeOf()) so SonarCSharp S2743 stops firing. Behaviour and field values are unchanged. --- .../Memory/UnitedRanged/Generic/RawBinaryMethods.cs | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs index 66f3ed147..9b5b8a96c 100644 --- a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs @@ -30,12 +30,11 @@ namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic /// public unsafe class RawBinaryMethods where TLinkAddress : IUnsignedNumber { + private const long HeaderWordsReserved = 2; private static readonly long WordSizeInBytes = System.Runtime.CompilerServices.Unsafe.SizeOf(); private static readonly long CellSizeInBytes = RawLink.SizeInBytes; - private static readonly long WordsPerCell = CellSizeInBytes / WordSizeInBytes; - private const long HeaderWordsReserved = 2; - private static readonly long PayloadBytesInHeaderCell = (WordsPerCell - HeaderWordsReserved) * WordSizeInBytes; - private static readonly long PayloadBytesInContinuationCell = CellSizeInBytes; + private static readonly long PayloadBytesInHeaderCell = RawLink.SizeInBytes - HeaderWordsReserved * System.Runtime.CompilerServices.Unsafe.SizeOf(); + private static readonly long PayloadBytesInContinuationCell = 
RawLink.SizeInBytes; private readonly byte* _links; private readonly TLinkAddress _rawMarker; From 5d1198d02992cdb841ef28f643d9b4e8f2ecc7d3 Mon Sep 17 00:00:00 2001 From: konard Date: Sat, 23 May 2026 12:57:52 +0000 Subject: [PATCH 7/8] Revert "Initial commit with task details" This reverts commit eac6bcb6cfcccb9d5c6819eac41f2d5bf1b76dd6. --- .gitkeep | 1 - 1 file changed, 1 deletion(-) delete mode 100644 .gitkeep diff --git a/.gitkeep b/.gitkeep deleted file mode 100644 index 4f02c7fb7..000000000 --- a/.gitkeep +++ /dev/null @@ -1 +0,0 @@ -# .gitkeep file auto-generated at 2026-05-23T11:56:14.988Z for PR creation at branch issue-512-557a0a3ca78d for issue https://github.com/linksplatform/Data.Doublets/issues/512 \ No newline at end of file From 6549069b5b4c9f7abe35b4b38c3b04ee60ce7588 Mon Sep 17 00:00:00 2001 From: konard Date: Mon, 25 May 2026 16:29:30 +0000 Subject: [PATCH 8/8] Move raw link sequence helpers to extensions --- .../UnitedRangedMemoryLinksTests.cs | 178 +++++++-- ...ryMethods.cs => RawLinkSequenceMethods.cs} | 117 +++--- .../Generic/UnitedRangedMemoryLinks.cs | 353 +++++++++++++----- .../UnitedRangedMemoryLinksExtensions.cs | 87 +++++ .../UnitedRangedLinksConstants.cs | 22 +- .../UnitedRangedLinksExtensions.cs | 21 ++ docs/case-studies/issue-512/README.md | 9 +- docs/case-studies/issue-512/background.md | 28 +- docs/case-studies/issue-512/design.md | 57 +-- docs/case-studies/issue-512/related-work.md | 6 +- docs/case-studies/issue-512/requirements.md | 55 +-- .../issue-512/risks-and-trade-offs.md | 18 +- docs/case-studies/issue-512/solution-plan.md | 46 ++- 13 files changed, 696 insertions(+), 301 deletions(-) rename csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/{RawBinaryMethods.cs => RawLinkSequenceMethods.cs} (60%) create mode 100644 csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinksExtensions.cs create mode 100644 csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksExtensions.cs diff 
--git a/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs b/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs index 12dd837e5..f10d5e183 100644 --- a/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs +++ b/csharp/Platform.Data.Doublets.Tests/UnitedRangedMemoryLinksTests.cs @@ -3,6 +3,7 @@ using Xunit; using Platform.Memory; using Platform.Data.Doublets.Memory.United.Generic; +using Platform.Data.Doublets.Memory.UnitedRanged; using Platform.Data.Doublets.Memory.UnitedRanged.Generic; namespace Platform.Data.Doublets.Tests @@ -101,6 +102,26 @@ public static void AllocateRange_PrefersExistingFreeRange() links.DeallocateRange(a, 4UL); } + [Fact] + public static void AllocateRange_OneCellRemainderFeedsSingleCellFreeList() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); + var range = links.AllocateRange(4UL); // 1..4 + var tail = links.AllocateRange(2UL); // 5..6, keeps the free range away from tail trimming. 
+ + links.DeallocateRange(range, 4UL); + var reused = links.AllocateRange(3UL); + var singleCell = links.Create(); + + Assert.Equal(range, reused); + Assert.Equal(range + 3UL, singleCell); + + links.Delete(singleCell); + links.DeallocateRange(reused, 3UL); + links.DeallocateRange(tail, 2UL); + } + // ----------------------------------------------------------------- // R7, R8 — coalescing and no-fragmentation // ----------------------------------------------------------------- @@ -149,11 +170,11 @@ public static void DeallocateRange_TrimsTail() } // ----------------------------------------------------------------- - // R5, R6, R9 — raw binary blobs + // R5, R6, R9 — raw link sequences // ----------------------------------------------------------------- [Fact] - public static void RawBinary_Roundtrip_SingleCell() + public static void RawLinkSequence_Roundtrip_SingleCell() { using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); @@ -163,19 +184,19 @@ public static void RawBinary_Roundtrip_SingleCell() { payload[i] = (byte)(i + 1); } - var blob = links.AllocateRawBinary(payload.Length); - links.WriteRawBinary(blob, payload); - Assert.True(links.IsRawBinary(blob)); - Assert.Equal(48L, links.GetRawBinaryLengthInBytes(blob)); + var sequence = links.AllocateRawLinkSequence(payload.Length); + links.WriteRawLinkSequence(sequence, payload); + Assert.True(links.IsRawLinkSequence(sequence)); + Assert.Equal(48L, links.GetRawLinkSequenceLengthInBytes(sequence)); var read = new byte[payload.Length]; - links.ReadRawBinary(blob, read); + links.ReadRawLinkSequence(sequence, read); Assert.Equal(payload, read); - links.DeallocateRawBinary(blob); - Assert.False(links.IsRawBinary(blob)); + links.DeallocateRawLinkSequence(sequence); + Assert.False(links.IsRawLinkSequence(sequence)); } [Fact] - public static void RawBinary_Roundtrip_MultiCell() + public static 
void RawLinkSequence_Roundtrip_MultiCell() { using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); @@ -185,50 +206,150 @@ public static void RawBinary_Roundtrip_MultiCell() { payload[i] = (byte)((i * 7 + 3) & 0xFF); } - var blob = links.AllocateRawBinary(payload.Length); - links.WriteRawBinary(blob, payload); - Assert.True(links.IsRawBinary(blob)); - Assert.Equal((long)payload.Length, links.GetRawBinaryLengthInBytes(blob)); + var sequence = links.AllocateRawLinkSequence(payload.Length); + links.WriteRawLinkSequence(sequence, payload); + Assert.True(links.IsRawLinkSequence(sequence)); + Assert.Equal((long)payload.Length, links.GetRawLinkSequenceLengthInBytes(sequence)); var read = new byte[payload.Length]; - links.ReadRawBinary(blob, read); + links.ReadRawLinkSequence(sequence, read); Assert.Equal(payload, read); - links.DeallocateRawBinary(blob); + links.DeallocateRawLinkSequence(sequence); } [Fact] - public static void RawBinary_DoesNotAppearInEach() + public static void RawLinkSequence_ZeroLength_RoundtripAndUsesOneCell() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); + + var sequence = links.AllocateRawLinkSequence(0); + + Assert.True(links.IsRawLinkSequence(sequence)); + Assert.Equal(0L, links.GetRawLinkSequenceLengthInBytes(sequence)); + links.ReadRawLinkSequence(sequence, Array.Empty()); + links.DeallocateRawLinkSequence(sequence); + var reused = links.AllocateRange(1UL); + Assert.Equal(sequence, reused); + links.DeallocateRange(reused, 1UL); + } + + [Fact] + public static void RawLinkSequence_LengthMustBeWordAligned() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks(memory, 
UnitedMemoryLinks.DefaultLinksSizeStep); + + Assert.Throws(() => links.AllocateRawLinkSequence(1)); + + var sequence = links.AllocateRawLinkSequence(8); + Assert.Throws(() => links.WriteRawLinkSequence(sequence, new byte[1])); + links.DeallocateRawLinkSequence(sequence); + } + + [Fact] + public static void RawLinkSequence_AppearsInEachByDefault() { using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); var a = links.Create(); - var blob = links.AllocateRawBinary(48); + var sequence = links.AllocateRawLinkSequence(48); var b = links.Create(); - Assert.True(links.IsRawBinary(blob)); + Assert.True(links.IsRawLinkSequence(sequence)); var seen = new List(); links.Each(link => { seen.Add(links.GetIndex(link)); return links.Constants.Continue; }); - // The blob head is in the allocated range but must not appear in Each(). - Assert.DoesNotContain(blob, seen); + Assert.Contains(sequence, seen); Assert.Contains(a, seen); Assert.Contains(b, seen); - Assert.Equal(2, seen.Count); + Assert.Equal(3, seen.Count); + Assert.Equal(3UL, links.Count()); + Assert.Equal(1UL, links.Count(new[] { sequence })); + Assert.Equal(3UL, links.Count(new Link(links.Constants.Any, links.Constants.Any, links.Constants.Any))); // Cleanup. 
- links.DeallocateRawBinary(blob); + links.DeallocateRawLinkSequence(sequence); + links.Delete(a); + links.Delete(b); + } + + [Fact] + public static void RawLinkSequence_CanBeExcludedFromEachByConfiguration() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep, includeRawLinkSequences: false); + var a = links.Create(); + var sequence = links.AllocateRawLinkSequence(48); + var b = links.Create(); + var seen = new List(); + + links.Each(link => + { + seen.Add(links.GetIndex(link)); + return links.Constants.Continue; + }); + + Assert.DoesNotContain(sequence, seen); + Assert.Contains(a, seen); + Assert.Contains(b, seen); + Assert.Equal(2, seen.Count); + Assert.Equal(2UL, links.Count()); + Assert.Equal(0UL, links.Count(new[] { sequence })); + + links.DeallocateRawLinkSequence(sequence); links.Delete(a); links.Delete(b); } [Fact] - public static void Each_SkipsFreeRangesAndBlobs() + public static void RawLinkSequence_CanBeReturnedByEachRestriction() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); + var constants = (UnitedRangedLinksConstants)links.Constants; + var sequence = links.AllocateRawLinkSequence(48); + IList? 
found = null; + + links.Each(new Link(links.Constants.Any, constants.RawLinkSequenceMarker, links.Constants.Any), link => + { + found = link; + return links.Constants.Break; + }); + + Assert.NotNull(found); + Assert.True(links.IsRawLinkSequence(found)); + Assert.Equal(sequence, links.GetIndex(found)); + Assert.Equal(constants.RawLinkSequenceMarker, links.GetSource(found)); + Assert.Equal(48UL, links.GetTarget(found)); + Assert.Equal(1UL, links.Count(new Link(links.Constants.Any, constants.RawLinkSequenceMarker, links.Constants.Any))); + Assert.Equal(1UL, links.Count(new[] { links.Constants.Any, constants.RawLinkSequenceMarker })); + + links.DeallocateRawLinkSequence(sequence); + } + + [Fact] + public static void Delete_DeallocatesRawLinkSequenceThroughUniversalInterface() + { + using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); + using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); + var sequence = links.AllocateRawLinkSequence(432); + + links.Delete(sequence); + var reused = links.AllocateRange(7UL); + + Assert.Equal(sequence, reused); + links.DeallocateRange(reused, 7UL); + } + + [Fact] + public static void Each_SkipsFreeRangesAndIncludesConfiguredRawLinkSequences() { using var memory = new HeapResizableDirectMemory(UnitedMemoryLinks.DefaultLinksSizeStep); using var links = new UnitedRangedMemoryLinks(memory, UnitedMemoryLinks.DefaultLinksSizeStep); var a = links.Create(); var range = links.AllocateRange(4UL); - var blob = links.AllocateRawBinary(48); + var sequence = links.AllocateRawLinkSequence(48); var b = links.Create(); // The mid-allocated range must not be visible to Each — register it as a free range. 
links.DeallocateRange(range, 4UL); @@ -238,10 +359,11 @@ public static void Each_SkipsFreeRangesAndBlobs() ids.Add(links.GetIndex(link)); return links.Constants.Continue; }); - Assert.Equal(new[] { a, b }, ids); - Assert.Equal(2UL, links.Count()); + Assert.Equal(new[] { a, sequence, b }, ids); + Assert.Equal(3UL, links.Count()); + Assert.Equal(0UL, links.Count(new[] { range })); // Cleanup. - links.DeallocateRawBinary(blob); + links.DeallocateRawLinkSequence(sequence); links.Delete(a); links.Delete(b); } diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawLinkSequenceMethods.cs similarity index 60% rename from csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs rename to csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawLinkSequenceMethods.cs index 9b5b8a96c..21f1527e9 100644 --- a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawLinkSequenceMethods.cs @@ -9,26 +9,11 @@ namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic { /// - /// - /// Encodes and decodes raw binary blobs that live inside the link cell address - /// space. A blob spans one or more consecutive cells. - /// - /// - /// The first cell stores a small descriptor: - /// - /// - /// Source = RawMarker - /// Target = blob length in bytes (must be a multiple of - /// sizeof(TLinkAddress)) - /// - /// - /// The remaining six TLinkAddress words of the header cell carry the first - /// chunk of payload. Each subsequent cell stores eight more words of payload. There - /// are no continuation markers; iteration is driven by the head cell's - /// Target, and intermediate cell indices are not valid link handles. - /// + /// Encodes and decodes raw link sequences that live inside the link cell address + /// space. 
A sequence can be used as an opaque byte payload, but its storage remains + /// a contiguous range of regular cells. /// - public unsafe class RawBinaryMethods where TLinkAddress : IUnsignedNumber + public unsafe class RawLinkSequenceMethods where TLinkAddress : IUnsignedNumber { private const long HeaderWordsReserved = 2; private static readonly long WordSizeInBytes = System.Runtime.CompilerServices.Unsafe.SizeOf(); @@ -37,97 +22,108 @@ public unsafe class RawBinaryMethods where TLinkAddress : IUnsigne private static readonly long PayloadBytesInContinuationCell = RawLink.SizeInBytes; private readonly byte* _links; - private readonly TLinkAddress _rawMarker; + private readonly TLinkAddress _sequenceMarker; [MethodImpl(MethodImplOptions.AggressiveInlining)] - public RawBinaryMethods(byte* links, TLinkAddress rawMarker) + public RawLinkSequenceMethods(byte* links, TLinkAddress sequenceMarker) { _links = links; - _rawMarker = rawMarker; + _sequenceMarker = sequenceMarker; } [MethodImpl(MethodImplOptions.AggressiveInlining)] private ref RawLink GetLinkReference(TLinkAddress address) => ref AsRef>(_links + CellSizeInBytes * long.CreateTruncating(address)); /// - /// Number of cells required to hold a blob of - /// bytes. must be a non-negative multiple of - /// . + /// Number of cells required to hold + /// bytes. The length must be a non-negative multiple of + /// sizeof(TLinkAddress). 
/// [MethodImpl(MethodImplOptions.AggressiveInlining)] - public static long ComputeCellsForBlob(long byteLength) + public static long ComputeCellsForPayload(long payloadLengthInBytes) { - if (byteLength < 0) + ValidatePayloadLength(payloadLengthInBytes, nameof(payloadLengthInBytes)); + if (payloadLengthInBytes <= PayloadBytesInHeaderCell) { - throw new ArgumentOutOfRangeException(nameof(byteLength)); + return 1; } - if ((byteLength % WordSizeInBytes) != 0) + var overflow = payloadLengthInBytes - PayloadBytesInHeaderCell; + return 1 + (overflow + PayloadBytesInContinuationCell - 1) / PayloadBytesInContinuationCell; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static void ValidatePayloadLength(long payloadLengthInBytes, string argumentName) + { + if (payloadLengthInBytes < 0) { - throw new ArgumentException("Blob length must be a multiple of sizeof(TLinkAddress).", nameof(byteLength)); + throw new ArgumentOutOfRangeException(argumentName); } - if (byteLength <= PayloadBytesInHeaderCell) + if ((payloadLengthInBytes % WordSizeInBytes) != 0) { - return 1; + throw new ArgumentException("Raw link sequence length must be a multiple of sizeof(TLinkAddress).", argumentName); } - var overflow = byteLength - PayloadBytesInHeaderCell; - return 1 + (overflow + PayloadBytesInContinuationCell - 1) / PayloadBytesInContinuationCell; } /// /// Returns true if the cell at is the head of a raw - /// binary blob. + /// link sequence. /// [MethodImpl(MethodImplOptions.AggressiveInlining)] - public bool IsRawBinary(TLinkAddress address) + public bool IsRawLinkSequence(TLinkAddress address) { if (address == default) { return false; } - return GetLinkReference(address).Source == _rawMarker; + return GetLinkReference(address).Source == _sequenceMarker; } /// - /// Returns the blob's length in bytes (the value stored in the head cell's - /// Target field). + /// Returns the sequence's payload length in bytes. 
/// [MethodImpl(MethodImplOptions.AggressiveInlining)] public long GetLengthInBytes(TLinkAddress address) => long.CreateTruncating(GetLinkReference(address).Target); /// - /// Returns the number of cells the blob at occupies. + /// Returns the number of cells the sequence at occupies. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public long GetCellCount(TLinkAddress address) => ComputeCellsForPayload(GetLengthInBytes(address)); + + /// + /// Writes only the marker and length descriptor into the sequence head. /// [MethodImpl(MethodImplOptions.AggressiveInlining)] - public long GetCellCount(TLinkAddress address) => ComputeCellsForBlob(GetLengthInBytes(address)); + public void WriteDescriptor(TLinkAddress start, long payloadLengthInBytes) + { + ValidatePayloadLength(payloadLengthInBytes, nameof(payloadLengthInBytes)); + ref var head = ref GetLinkReference(start); + head.Source = _sequenceMarker; + head.Target = TLinkAddress.CreateTruncating(payloadLengthInBytes); + } /// - /// Writes the blob descriptor and payload into a previously-allocated range - /// starting at . The destination range must be large - /// enough to fit ComputeCellsForBlob(payload.Length) cells. + /// Writes the descriptor and payload into a previously allocated range starting + /// at . /// public void Write(TLinkAddress start, ReadOnlySpan payload) { - if ((payload.Length % WordSizeInBytes) != 0) - { - throw new ArgumentException("Blob length must be a multiple of sizeof(TLinkAddress).", nameof(payload)); - } + ValidatePayloadLength(payload.Length, nameof(payload)); ref var head = ref GetLinkReference(start); - head.Source = _rawMarker; + head.Source = _sequenceMarker; head.Target = TLinkAddress.CreateTruncating(payload.Length); - // Copy first chunk into the header cell, after the 2 reserved descriptor words. 
var headPtr = (byte*)AsPointer(ref head) + (HeaderWordsReserved * WordSizeInBytes); var firstChunk = (int)Math.Min(payload.Length, PayloadBytesInHeaderCell); if (firstChunk > 0) { payload.Slice(0, firstChunk).CopyTo(new Span(headPtr, firstChunk)); } - // Zero the unused tail of the header cell's payload area. if (firstChunk < PayloadBytesInHeaderCell) { new Span(headPtr + firstChunk, (int)(PayloadBytesInHeaderCell - firstChunk)).Clear(); } - // Copy remaining chunks into continuation cells. + var remaining = payload.Length - firstChunk; var offset = firstChunk; var continuationIndex = long.CreateTruncating(start) + 1; @@ -147,9 +143,8 @@ public void Write(TLinkAddress start, ReadOnlySpan payload) } /// - /// Reads the payload of the blob at into - /// . must be at - /// least as long as the blob. + /// Reads the payload of the sequence at into + /// . /// public void Read(TLinkAddress start, Span destination) { @@ -178,17 +173,5 @@ public void Read(TLinkAddress start, Span destination) continuationIndex++; } } - - /// - /// Zeroes the entire blob range so that it looks like a fresh, uninitialised - /// span of cells ready to be returned to the allocator. 
- /// - [MethodImpl(MethodImplOptions.AggressiveInlining)] - public void Clear(TLinkAddress start) - { - var cells = GetCellCount(start); - var dst = _links + CellSizeInBytes * long.CreateTruncating(start); - new Span(dst, checked((int)(cells * CellSizeInBytes))).Clear(); - } } } diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs index f991cd3fb..372d2397c 100644 --- a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs @@ -18,16 +18,17 @@ namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic /// /// A drop-in substitute for that /// additionally tracks unused space as a list of ranges of cells (not - /// only one-cell at a time) and supports raw binary payloads stored inside the - /// same address space. + /// only one-cell at a time) and supports raw link sequences stored inside the + /// same address space. Raw link sequences can be used as byte payloads for raw + /// data, binary files, and similar use cases. /// /// /// Single-cell / semantics are unchanged /// for callers, but the implementation will prefer to fill an existing free /// range before extending the underlying memory. / /// expose contiguous multi-cell allocations - /// (best-fit + coalescing). stores a blob whose - /// payload reuses the tree-index fields of the spanned cells as opaque bytes. + /// (best-fit + coalescing). Convenience operations for raw link sequence payloads + /// are provided as extension methods over this range allocator. /// /// public unsafe class UnitedRangedMemoryLinks : UnitedMemoryLinks @@ -35,7 +36,20 @@ public unsafe class UnitedRangedMemoryLinks : UnitedMemoryLinks? _freeRanges; - private RawBinaryMethods? _rawBinary; + private RawLinkSequenceMethods? 
_rawLinkSequences; + private bool _includeRawLinkSequences = true; + + /// + /// Controls whether raw link sequence heads are returned by + /// and included by . Continuation cells are never returned. + /// + public bool IncludeRawLinkSequences + { + [MethodImpl(MethodImplOptions.AggressiveInlining)] + get => _includeRawLinkSequences; + [MethodImpl(MethodImplOptions.AggressiveInlining)] + set => _includeRawLinkSequences = value; + } [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedMemoryLinks(string address) : this(address, DefaultLinksSizeStep) { } @@ -43,16 +57,32 @@ public UnitedRangedMemoryLinks(string address) : this(address, DefaultLinksSizeS [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedMemoryLinks(string address, long memoryReservationStep) : this(new FileMappedResizableDirectMemory(address, memoryReservationStep), memoryReservationStep) { } + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(string address, bool includeRawLinkSequences) : this(address, DefaultLinksSizeStep, includeRawLinkSequences) { } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(string address, long memoryReservationStep, bool includeRawLinkSequences) : this(new FileMappedResizableDirectMemory(address, memoryReservationStep), memoryReservationStep, includeRawLinkSequences) { } + [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedMemoryLinks(IResizableDirectMemory memory) : this(memory, DefaultLinksSizeStep) { } [MethodImpl(MethodImplOptions.AggressiveInlining)] - public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep) : this(memory, memoryReservationStep, Default>.Instance, IndexTreeType.Default) { } + public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep) : this(memory, memoryReservationStep, Default>.Instance, IndexTreeType.Default, includeRawLinkSequences: true) { } + + 
[MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep, bool includeRawLinkSequences) : this(memory, memoryReservationStep, Default>.Instance, IndexTreeType.Default, includeRawLinkSequences) { } [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep, UnitedRangedLinksConstants constants, IndexTreeType indexTreeType) + : this(memory, memoryReservationStep, constants, indexTreeType, includeRawLinkSequences: true) + { + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public UnitedRangedMemoryLinks(IResizableDirectMemory memory, long memoryReservationStep, UnitedRangedLinksConstants constants, IndexTreeType indexTreeType, bool includeRawLinkSequences) : base(memory, memoryReservationStep, constants, indexTreeType) { + IncludeRawLinkSequences = includeRawLinkSequences; } [MethodImpl(MethodImplOptions.AggressiveInlining)] @@ -62,7 +92,7 @@ protected override void SetPointers(IResizableDirectMemory memory) _rangedLinks = (byte*)memory.Pointer; var rangedConstants = (UnitedRangedLinksConstants)Constants; _freeRanges = new RangedFreeListMethods(_rangedLinks, _rangedLinks, rangedConstants.FreeRangeMarker); - _rawBinary = new RawBinaryMethods(_rangedLinks, rangedConstants.RawMarker); + _rawLinkSequences = new RawLinkSequenceMethods(_rangedLinks, rangedConstants.RawLinkSequenceMarker); } [MethodImpl(MethodImplOptions.AggressiveInlining)] @@ -71,7 +101,7 @@ protected override void ResetPointers() base.ResetPointers(); _rangedLinks = null; _freeRanges = null; - _rawBinary = null; + _rawLinkSequences = null; } // ------------------------------------------------------------------------- @@ -79,52 +109,126 @@ protected override void ResetPointers() // ------------------------------------------------------------------------- /// - /// Returns the number of regular doublets (excludes single-cell unused - 
/// links, multi-cell free ranges and raw binary blobs). + /// Returns the number of visible records. Free ranges and raw link sequence + /// continuation cells are always hidden; raw link sequence heads are included + /// when is enabled. /// [MethodImpl(MethodImplOptions.AggressiveInlining)] public override TLinkAddress Count(IList? restriction) { - if (restriction!.Count == 0) + restriction ??= Array.Empty(); + if (restriction.Count > 3) + { + throw new NotSupportedException("Other sizes and kinds of restrictions are not supported."); + } + var constants = Constants; + var any = constants.Any; + var count = default(TLinkAddress); + if (restriction.Count == 2 && restriction[constants.IndexPart] == any) { - return CountRegularLinks(); + var value = restriction[1]; + if (value == any) + { + return CountVisibleLinks(); + } + ForEachVisibleLink(link => + { + if (link.Source == value) + { + count = count + TLinkAddress.One; + } + if (link.Target == value) + { + count = count + TLinkAddress.One; + } + return constants.Continue; + }); + return count; } - return base.Count(restriction); + ForEachVisibleLink(link => + { + if (MatchesRestriction(link, restriction)) + { + count = count + TLinkAddress.One; + } + return constants.Continue; + }); + return count; } /// - /// Iterates over regular doublets only; skips free-range and raw-binary - /// cells entirely. + /// Iterates over visible records. Free ranges and raw link sequence + /// continuation cells are always hidden; raw link sequence heads are included + /// when is enabled. /// [MethodImpl(MethodImplOptions.AggressiveInlining)] public override TLinkAddress Each(IList? restriction, ReadHandler? 
handler) { - if (restriction!.Count == 0) + restriction ??= Array.Empty(); + if (restriction.Count > 3) + { + throw new NotSupportedException("Other sizes and kinds of restrictions are not supported."); + } + var constants = Constants; + var @break = constants.Break; + var @continue = constants.Continue; + var any = constants.Any; + if (restriction.Count == 2 && restriction[constants.IndexPart] == any) { - var @break = Constants.Break; - var allocated = GetHeaderReference().AllocatedLinks; - var link = TLinkAddress.One; - while (link <= allocated) + var value = restriction[1]; + if (value == any) { - if (_freeRanges!.IsFreeRangeHead(link)) + return EachMatchingLink(handler, link => true, returnBreakOnCompletion: true); + } + if (ForEachVisibleLink(link => + { + if (link.Source != value) + { + return @continue; + } + if (handler != null && handler(link) == @break) + { + return @break; + } + return @continue; + }) == @break) + { + return @break; + } + return ForEachVisibleLink(link => + { + if (link.Target != value) + { + return @continue; + } + if (handler != null && handler(link) == @break) { - link = link + _freeRanges.GetLength(link); - continue; + return @break; } - if (_rawBinary!.IsRawBinary(link)) + return @continue; + }); + } + return EachMatchingLink(handler, link => MatchesRestriction(link, restriction), IsWholeStoreScan(restriction)); + + TLinkAddress EachMatchingLink(ReadHandler? 
visibleHandler, Func, bool> predicate, bool returnBreakOnCompletion) + { + if (ForEachVisibleLink(link => + { + if (!predicate(link)) { - link = link + TLinkAddress.CreateTruncating(_rawBinary.GetCellCount(link)); - continue; + return @continue; } - if (Exists(link) && handler!(GetLinkStruct(link)) == @break) + if (visibleHandler != null && visibleHandler(link) == @break) { return @break; } - link = link + TLinkAddress.One; + return @continue; + }) == @break || returnBreakOnCompletion) + { + return @break; } - return @break; + return @continue; } - return base.Each(restriction, handler); } /// @@ -157,8 +261,8 @@ public override TLinkAddress Create(IList? substitution, WriteHand /// Deletes a single doublet. Behaviour matches the base class for /// non-tail links; for tail links the trimming loop additionally retires /// trailing single-cell unused links and trailing free ranges, but never - /// confuses a free-range head or a blob head with a single-cell unused - /// link. + /// confuses a free-range head or a raw link sequence head with a + /// single-cell unused link. /// [MethodImpl(MethodImplOptions.AggressiveInlining)] public override TLinkAddress Delete(IList? restriction, WriteHandler? handler) @@ -166,6 +270,16 @@ public override TLinkAddress Delete(IList? restriction, WriteHandl ref var header = ref GetHeaderReference(); var link = restriction![Constants.IndexPart]; var before = GetLinkStruct(link); + if (_rawLinkSequences!.IsRawLinkSequence(link)) + { + var cells = _rawLinkSequences.GetCellCount(link); + DeallocateRange(link, TLinkAddress.CreateTruncating(cells)); + return handler != null ? handler(before, null) : Constants.Continue; + } + if (_freeRanges!.IsFreeRangeHead(link)) + { + return Constants.Continue; + } if (link < header.AllocatedLinks) { UnusedLinksListMethods.AttachAsFirst(link); @@ -181,15 +295,36 @@ public override TLinkAddress Delete(IList? 
restriction, WriteHandl return Constants.Continue; } + /// + /// Protects ranged metadata cells from being treated as normal doublets by + /// generic update helpers. Reset updates are accepted as no-ops so the + /// existing delete extension can still deallocate a raw link sequence through + /// the universal surface. + /// + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public override TLinkAddress Update(IList? restriction, IList? substitution, WriteHandler? handler) + { + var link = restriction![Constants.IndexPart]; + if (_rawLinkSequences!.IsRawLinkSequence(link) || _freeRanges!.IsFreeRangeHead(link)) + { + if (IsResetSubstitution(substitution)) + { + return Constants.Continue; + } + throw new InvalidOperationException("Ranged metadata cells cannot be updated as regular doublets."); + } + return base.Update(restriction, substitution, handler); + } + // ------------------------------------------------------------------------- - // Public range / raw-binary API + // Public range API // ------------------------------------------------------------------------- /// /// Allocates contiguous cells and returns the /// address of the first cell. The cells are uninitialised — the caller /// is expected to immediately write a meaningful payload (or pass the - /// result to ). + /// result to a raw link sequence extension method). /// public TLinkAddress AllocateRange(TLinkAddress length) { @@ -236,8 +371,9 @@ public TLinkAddress AllocateRange(TLinkAddress length) /// /// Returns a multi-cell range to the allocator. /// must be the first cell previously returned by - /// (or the head of a blob being released), - /// and must match the original allocation. + /// (or the head of a raw link sequence being + /// released), and must match the original + /// allocation. 
/// public void DeallocateRange(TLinkAddress start, TLinkAddress length) { @@ -267,64 +403,6 @@ public void DeallocateRange(TLinkAddress start, TLinkAddress length) TrimTail(); } - /// - /// Allocates space for a raw binary blob of - /// bytes and returns the head cell address. - /// must be a non-negative multiple of sizeof(TLinkAddress). - /// The blob is left uninitialised until is - /// called. - /// - public TLinkAddress AllocateRawBinary(long byteLength) - { - var cells = RawBinaryMethods.ComputeCellsForBlob(byteLength); - var start = AllocateRange(TLinkAddress.CreateTruncating(cells)); - // Clear so that IsRawBinary / IsFreeRangeHead probes on uninitialised - // cells behave predictably until the payload is actually written. - ClearCells(start, TLinkAddress.CreateTruncating(cells)); - // Stamp the descriptor (Source = RawMarker, Target = byteLength). - _rawBinary!.Write(start, ReadOnlySpan.Empty); - // Write() with an empty payload sets the descriptor's Target to 0, so - // overwrite it now that we know the real length. - var rangedConstants = (UnitedRangedLinksConstants)Constants; - ref var head = ref AsRef>(_rangedLinks + (RawLink.SizeInBytes * long.CreateTruncating(start))); - head.Source = rangedConstants.RawMarker; - head.Target = TLinkAddress.CreateTruncating(byteLength); - return start; - } - - /// - /// Writes into the blob whose head is at - /// . The blob must have been allocated with - /// using the same byte length. - /// - [MethodImpl(MethodImplOptions.AggressiveInlining)] - public void WriteRawBinary(TLinkAddress start, ReadOnlySpan payload) => _rawBinary!.Write(start, payload); - - /// - /// Copies the payload of the blob at into - /// . - /// - [MethodImpl(MethodImplOptions.AggressiveInlining)] - public void ReadRawBinary(TLinkAddress start, Span destination) => _rawBinary!.Read(start, destination); - - /// - /// Releases the storage of the blob at . 
- /// - [MethodImpl(MethodImplOptions.AggressiveInlining)] - public void DeallocateRawBinary(TLinkAddress start) - { - var cells = _rawBinary!.GetCellCount(start); - DeallocateRange(start, TLinkAddress.CreateTruncating(cells)); - } - - /// True if the cell at is a raw binary head. - [MethodImpl(MethodImplOptions.AggressiveInlining)] - public bool IsRawBinary(TLinkAddress address) => _rawBinary!.IsRawBinary(address); - - /// Returns the byte length of the blob at . - [MethodImpl(MethodImplOptions.AggressiveInlining)] - public long GetRawBinaryLengthInBytes(TLinkAddress address) => _rawBinary!.GetLengthInBytes(address); - // ------------------------------------------------------------------------- // Internals // ------------------------------------------------------------------------- @@ -397,7 +475,7 @@ private bool IsSingleCellUnused(TLinkAddress link) return false; } var rangedConstants = (UnitedRangedLinksConstants)Constants; - if (cell.Source == rangedConstants.FreeRangeMarker || cell.Source == rangedConstants.RawMarker) + if (cell.Source == rangedConstants.FreeRangeMarker || cell.Source == rangedConstants.RawLinkSequenceMarker) { return false; } @@ -405,7 +483,7 @@ private bool IsSingleCellUnused(TLinkAddress link) } [MethodImpl(MethodImplOptions.AggressiveInlining)] - private void ClearCells(TLinkAddress start, TLinkAddress length) + internal void ClearCells(TLinkAddress start, TLinkAddress length) { var startLong = long.CreateTruncating(start); var lengthLong = long.CreateTruncating(length); @@ -413,10 +491,28 @@ private void ClearCells(TLinkAddress start, TLinkAddress length) new Span(ptr, checked((int)(lengthLong * RawLink.SizeInBytes))).Clear(); } + internal RawLinkSequenceMethods RawLinkSequences + { + [MethodImpl(MethodImplOptions.AggressiveInlining)] + get => _rawLinkSequences!; + } + [MethodImpl(MethodImplOptions.AggressiveInlining)] - private TLinkAddress CountRegularLinks() + private TLinkAddress CountVisibleLinks() { var count = 
default(TLinkAddress); + ForEachVisibleLink(_ => + { + count = count + TLinkAddress.One; + return Constants.Continue; + }); + return count; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private TLinkAddress ForEachVisibleLink(Func, TLinkAddress> action) + { + var @break = Constants.Break; var allocated = GetHeaderReference().AllocatedLinks; var link = TLinkAddress.One; while (link <= allocated) @@ -426,18 +522,73 @@ private TLinkAddress CountRegularLinks() link = link + _freeRanges.GetLength(link); continue; } - if (_rawBinary!.IsRawBinary(link)) + if (_rawLinkSequences!.IsRawLinkSequence(link)) { - link = link + TLinkAddress.CreateTruncating(_rawBinary.GetCellCount(link)); + if (IncludeRawLinkSequences && action(new Link(link, GetLinkReference(link).Source, GetLinkReference(link).Target)) == @break) + { + return @break; + } + link = link + TLinkAddress.CreateTruncating(_rawLinkSequences.GetCellCount(link)); continue; } if (Exists(link)) { - count = count + TLinkAddress.One; + if (action(new Link(link, GetLinkReference(link).Source, GetLinkReference(link).Target)) == @break) + { + return @break; + } } link = link + TLinkAddress.One; } - return count; + return Constants.Continue; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private bool MatchesRestriction(Link link, IList restriction) + { + var constants = Constants; + var any = constants.Any; + return restriction.Count switch + { + 0 => true, + 1 => restriction[constants.IndexPart] == any || link.Index == restriction[constants.IndexPart], + 2 => MatchesIndex(link, restriction[constants.IndexPart], any) + && (restriction[1] == any || link.Source == restriction[1] || link.Target == restriction[1]), + 3 => MatchesIndex(link, restriction[constants.IndexPart], any) + && (restriction[constants.SourcePart] == any || link.Source == restriction[constants.SourcePart]) + && (restriction[constants.TargetPart] == any || link.Target == restriction[constants.TargetPart]), + _ => false + }; + } + + 
[MethodImpl(MethodImplOptions.AggressiveInlining)] + private static bool MatchesIndex(Link link, TLinkAddress index, TLinkAddress any) => index == any || link.Index == index; + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private bool IsWholeStoreScan(IList restriction) + { + var constants = Constants; + var any = constants.Any; + return restriction.Count switch + { + 0 => true, + 1 => restriction[constants.IndexPart] == any, + 2 => restriction[constants.IndexPart] == any && restriction[1] == any, + 3 => restriction[constants.IndexPart] == any + && restriction[constants.SourcePart] == any + && restriction[constants.TargetPart] == any, + _ => false + }; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + private bool IsResetSubstitution(IList? substitution) + { + if (substitution == null || substitution.Count < 3) + { + return false; + } + return substitution[Constants.SourcePart] == Constants.Null && substitution[Constants.TargetPart] == Constants.Null; } } } diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinksExtensions.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinksExtensions.cs new file mode 100644 index 000000000..5f274d666 --- /dev/null +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinksExtensions.cs @@ -0,0 +1,87 @@ +using System; +using System.Numerics; +using System.Runtime.CompilerServices; + +#pragma warning disable CS1591 // Missing XML comment for publicly visible type or member + +namespace Platform.Data.Doublets.Memory.UnitedRanged.Generic +{ + public static class UnitedRangedMemoryLinksExtensions + { + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static TLinkAddress AllocateRawLinkSequence(this UnitedRangedMemoryLinks links, long payloadLengthInBytes) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + { + var cells = 
RawLinkSequenceMethods.ComputeCellsForPayload(payloadLengthInBytes); + var cellCount = TLinkAddress.CreateTruncating(cells); + var start = links.AllocateRange(cellCount); + links.ClearCells(start, cellCount); + links.RawLinkSequences.WriteDescriptor(start, payloadLengthInBytes); + return start; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static TLinkAddress AllocateRawLinkSequence(this UnitedRangedMemoryLinks links, ReadOnlySpan payload) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + { + var start = links.AllocateRawLinkSequence(payload.Length); + links.WriteRawLinkSequence(start, payload); + return start; + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static void WriteRawLinkSequence(this UnitedRangedMemoryLinks links, TLinkAddress start, ReadOnlySpan payload) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + { + RawLinkSequenceMethods.ValidatePayloadLength(payload.Length, nameof(payload)); + if (!links.RawLinkSequences.IsRawLinkSequence(start)) + { + throw new ArgumentException("Address is not a raw link sequence head.", nameof(start)); + } + var expectedLength = links.RawLinkSequences.GetLengthInBytes(start); + if (expectedLength != payload.Length) + { + throw new ArgumentException("Payload length must match the allocated raw link sequence length.", nameof(payload)); + } + links.RawLinkSequences.Write(start, payload); + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static void ReadRawLinkSequence(this UnitedRangedMemoryLinks links, TLinkAddress start, Span destination) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + { + if (!links.RawLinkSequences.IsRawLinkSequence(start)) + { + throw new ArgumentException("Address is not a raw link sequence head.", nameof(start)); + } + links.RawLinkSequences.Read(start, 
destination); + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static void DeallocateRawLinkSequence(this UnitedRangedMemoryLinks links, TLinkAddress start) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + { + if (!links.RawLinkSequences.IsRawLinkSequence(start)) + { + return; + } + var cells = links.RawLinkSequences.GetCellCount(start); + links.DeallocateRange(start, TLinkAddress.CreateTruncating(cells)); + } + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static bool IsRawLinkSequence(this UnitedRangedMemoryLinks links, TLinkAddress address) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + => links.RawLinkSequences.IsRawLinkSequence(address); + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static long GetRawLinkSequenceLengthInBytes(this UnitedRangedMemoryLinks links, TLinkAddress address) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + => links.RawLinkSequences.GetLengthInBytes(address); + + [MethodImpl(MethodImplOptions.AggressiveInlining)] + public static long GetRawLinkSequenceCellCount(this UnitedRangedMemoryLinks links, TLinkAddress address) + where TLinkAddress : IUnsignedNumber, IShiftOperators, IBitwiseOperators, IMinMaxValue, IComparisonOperators + => links.RawLinkSequences.GetCellCount(address); + } +} diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs index e2236ff6a..f2a4b5149 100644 --- a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs @@ -10,10 +10,10 @@ namespace Platform.Data.Doublets.Memory.UnitedRanged /// /// Extension of used by /// . 
Exposes two - /// additional sentinel values stored inside : + /// additional sentinel values stored inside the Source word: /// /// - /// tags the first cell of a raw binary blob. + /// tags the first cell of a raw link sequence. /// tags the first cell of a multi-cell free range. /// /// @@ -25,20 +25,20 @@ namespace Platform.Data.Doublets.Memory.UnitedRanged public class UnitedRangedLinksConstants : LinksConstants where TLinkAddress : IUnsignedNumber { /// - /// Sentinel stored in to designate that a - /// cell is the first cell of a raw binary blob. Reuses the + /// Sentinel stored in the Source word to designate that a cell is the + /// first cell of a raw link sequence. Reuses the /// slot — a housekeeping value /// that is never persisted as a link reference. /// - public TLinkAddress RawMarker + public TLinkAddress RawLinkSequenceMarker { [MethodImpl(MethodImplOptions.AggressiveInlining)] get; } /// - /// Sentinel stored in to designate that a - /// cell is the first cell of a multi-cell free range. Reuses the + /// Sentinel stored in the Source word to designate that a cell is the + /// first cell of a multi-cell free range. Reuses the /// slot — a housekeeping value /// that is never persisted as a link reference. 
/// @@ -51,28 +51,28 @@ public TLinkAddress FreeRangeMarker [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedLinksConstants() { - RawMarker = Itself; + RawLinkSequenceMarker = Itself; FreeRangeMarker = Error; } [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedLinksConstants(bool enableExternalReferencesSupport) : base(enableExternalReferencesSupport) { - RawMarker = Itself; + RawLinkSequenceMarker = Itself; FreeRangeMarker = Error; } [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedLinksConstants(Range possibleInternalReferencesRange) : base(possibleInternalReferencesRange) { - RawMarker = Itself; + RawLinkSequenceMarker = Itself; FreeRangeMarker = Error; } [MethodImpl(MethodImplOptions.AggressiveInlining)] public UnitedRangedLinksConstants(Range possibleInternalReferencesRange, Range? possibleExternalReferencesRange) : base(possibleInternalReferencesRange, possibleExternalReferencesRange) { - RawMarker = Itself; + RawLinkSequenceMarker = Itself; FreeRangeMarker = Error; } } diff --git a/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksExtensions.cs b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksExtensions.cs new file mode 100644 index 000000000..a824bc3bd --- /dev/null +++ b/csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksExtensions.cs @@ -0,0 +1,21 @@ +using System.Collections.Generic; +using System.Numerics; + +#pragma warning disable CS1591 // Missing XML comment for publicly visible type or member + +namespace Platform.Data.Doublets.Memory.UnitedRanged +{ + public static class UnitedRangedLinksExtensions + { + public static bool IsRawLinkSequence(this ILinks links, IList? 
link) + where TLinkAddress : IUnsignedNumber + { + if (link == null || link.Count <= links.Constants.SourcePart) + { + return false; + } + return links.Constants is UnitedRangedLinksConstants constants + && link[links.Constants.SourcePart] == constants.RawLinkSequenceMarker; + } + } +} diff --git a/docs/case-studies/issue-512/README.md b/docs/case-studies/issue-512/README.md index 3739b3c01..31331cb47 100644 --- a/docs/case-studies/issue-512/README.md +++ b/docs/case-studies/issue-512/README.md @@ -1,4 +1,4 @@ -# Case Study: Issue #512 — `UnitedRangedMemoryLinks` with Ranges for Binary Data +# Case Study: Issue #512 — `UnitedRangedMemoryLinks` with Link Ranges > Source issue: > @@ -9,7 +9,7 @@ This directory collects the analysis, design exploration and implementation plan for the new `UnitedRangedMemoryLinks` doublets storage variant. The goal is twofold: 1. Provide an _evolution_ of `UnitedMemoryLinks` that allocates and reclaims **contiguous ranges of links** instead of single links, while preserving the no-fragmentation, uniform-cell invariant that makes united storage so attractive. -2. Allow **raw binary blobs** to live inside the same address space as ordinary doublets, by reusing the underlying link cell as a payload cell, gated by a dedicated marker stored in `LinksConstants`. +2. Allow **raw link sequences** to live inside the same address space as ordinary doublets, by reusing the underlying link cells as payload cells. Those sequences can store raw data blobs, binary files, or any other byte payload whose length is aligned to `TLinkAddress`. The files in this directory are: @@ -27,7 +27,8 @@ The files in this directory are: Each cell of the storage still occupies one `RawLink` slot (8 × `TLinkAddress`), so the file format remains uniform and free of internal fragmentation. 
The improvements are: * A **range allocator** that tracks free regions as a sorted-by-address, length-keyed doubly-linked list of `RawLink` cells (the same cells reused as range descriptors). Adjacent free regions are eagerly coalesced on deallocation, so the only way fragmentation can grow is when an allocation is _larger than every free region_, in which case the storage is simply extended at the tail. -* A new **`RawMarker`** constant in `LinksConstants` — used as the `Source` field of the first cell of a binary blob — that designates the cell sequence as a binary payload rather than a doublet. The second field (`Target`) records the length of the blob in `TLinkAddress` units, from which the number of consumed link cells is derived. -* A new **`UnitedRangedMemoryLinks`** class drop-in compatible with `ILinks` (so the existing tests pass with it as a substitute for `UnitedMemoryLinks`) plus two new public operations: `AllocateRange(length)` / `DeallocateRange(start)` and `AllocateRawBinary(byteLength)` / `WriteRawBinary` / `ReadRawBinary`. +* A new **`RawLinkSequenceMarker`** constant in `UnitedRangedLinksConstants` — used as the `Source` field of the first cell of a raw link sequence. The second field (`Target`) records the payload length in bytes, from which the number of consumed link cells is derived. +* A new **`UnitedRangedMemoryLinks`** class drop-in compatible with `ILinks` (so the existing tests pass with it as a substitute for `UnitedMemoryLinks`) with range allocation in the implementation and raw-link-sequence convenience operations in extensions. +* `Each` and `Count` include raw link sequence heads by default, while continuation cells and free ranges stay hidden. The `IncludeRawLinkSequences` configuration can exclude sequence heads when a caller wants ordinary doublets only. For the full rationale, see [`design.md`](./design.md). 
diff --git a/docs/case-studies/issue-512/background.md b/docs/case-studies/issue-512/background.md index ce89f8194..eb5938fa2 100644 --- a/docs/case-studies/issue-512/background.md +++ b/docs/case-studies/issue-512/background.md @@ -82,28 +82,28 @@ link.SizeAsSource == default && link.Source != default 1. **Cell #0 is the header.** The reserved word `Reserved8` is _the_ obvious place to store an extra root pointer — for the free-range list — without breaking any code - that does not look at it. The header will be repurposed as - `LinksRangedHeader` (a `LayoutKind.Explicit` struct with the same - fields plus a typed alias for `Reserved8`) so the binary representation stays - identical to `LinksHeader`. This means a database written by `UnitedMemoryLinks` can - be opened by `UnitedRangedMemoryLinks` and vice-versa, as long as no binary blobs - are present. + that does not look at it. The implementation keeps using `LinksHeader` + directly and treats `Reserved8` as the free-range list head, so the binary + representation stays identical. This means a database written by + `UnitedMemoryLinks` can be opened by `UnitedRangedMemoryLinks` and vice-versa, as + long as no raw link sequences are present. 2. **A free single cell remains a free single cell.** The original unused-links list is _preserved_; the new "free range" list only tracks runs of two or more contiguous free cells. When a range deallocation produces a run of length 1, it is pushed back onto the original unused-links list. -3. **Source-or-Target equal to `RawMarker`** marks a binary blob. The marker value is - chosen so that: +3. **`Source == RawLinkSequenceMarker`** marks the head of a raw link sequence. 
The + marker value is chosen so that: * it is outside `InternalReferencesRange` (so it cannot accidentally appear as a valid link reference); - * it is _stable_ across versions of `LinksConstants` — we anchor it to one position - above the existing `Error` constant inside the reserved tail of the references - range, which `LinksConstants` already keeps for housekeeping (`Continue`, `Break`, + * it is _stable_ across versions of `LinksConstants` — it reuses the existing + `Itself` housekeeping constant inside the reserved tail of the references range, + which `LinksConstants` already keeps for control values (`Continue`, `Break`, `Skip`, `Any`, `Itself`, `Error`). 4. **Tree methods are unchanged.** The new class only intercepts `Create`, `Update`, - `Delete`, `Each` and `Count` to (a) skip cells that belong to a binary blob and - (b) ignore the free-range descriptor cells. All the tree methods receive the same - pointers as before and operate without modification. + `Delete`, `Each` and `Count` to (a) treat raw link sequences as ranged metadata and + optionally expose their head cells through `Each`/`Count`, and (b) ignore the + free-range descriptor cells. All the tree methods receive the same pointers as + before and operate without modification. diff --git a/docs/case-studies/issue-512/design.md b/docs/case-studies/issue-512/design.md index f7ab785a2..b9e7efabe 100644 --- a/docs/case-studies/issue-512/design.md +++ b/docs/case-studies/issue-512/design.md @@ -5,7 +5,8 @@ * Allocate / deallocate contiguous **ranges of link cells** (`R3`, `R4`). * No fragmentation — never split unless inevitable, coalesce on free (`R7`). * "Prefer empty space" — best-fit, growth at tail only as a last resort (`R8`). -* Embed raw **binary blobs** in the same address space (`R5`, `R6`, `R9`). +* Embed raw **link sequences** in the same address space (`R5`, `R6`, `R9`). These + sequences can store raw data blobs, binary files, or other aligned byte payloads. 
* Stay drop-in compatible with `UnitedMemoryLinks` and `ILinks<>` (`R2`, `R10`). ## Design alternatives considered @@ -31,7 +32,7 @@ discriminate the three flavours of cell that can appear in the allocated range: | Cell flavour | `Source` value | | --- | --- | | Regular doublet | A link index (`≤ InternalReferencesRange.Maximum`) or `Null` | -| Raw binary blob head | `RawMarker` = `LinksConstants.Itself` | +| Raw link sequence head | `RawLinkSequenceMarker` = `LinksConstants.Itself` | | Multi-cell free range head | `FreeRangeMarker` = `LinksConstants.Error` | Both sentinels live above `InternalReferencesRange.Maximum` (they are housekeeping @@ -63,20 +64,20 @@ No on-disk header layout change is required: databases produced by `UnitedMemoryLinks` have `Reserved8 = 0`, which `UnitedRangedMemoryLinks` reads as "no free ranges" — so old files open cleanly. -## Binary blob layout +## Raw link sequence layout -A binary blob occupies one **header cell** followed by zero or more continuation +A raw link sequence occupies one **header cell** followed by zero or more continuation cells. The header cell holds: -| Field | Binary-blob usage | +| Field | Raw-link-sequence usage | | --- | --- | -| `Source` | `RawMarker` | -| `Target` | `Length` of the blob in **bytes**. Must be a multiple of `sizeof(TLinkAddress)`. | +| `Source` | `RawLinkSequenceMarker` | +| `Target` | `Length` of the payload in **bytes**. Must be a multiple of `sizeof(TLinkAddress)`. | | `LeftAsSource` … `SizeAsTarget` | First six `TLinkAddress` words of payload (treated as opaque bytes). | Each continuation cell carries eight more `TLinkAddress` words of payload (no continuation marker, no length — the head cell's `Target` drives iteration). 
So -a blob of `B` bytes occupies: +a sequence of `B` bytes occupies: ```text cells = 1 if B ≤ 6 * sizeof(TLinkAddress) @@ -85,14 +86,15 @@ cells = 1 + ceil((B - 6 * sizeof(TLinkAddress)) / (8 * sizeof(TLinkAddress))) The encoding is unambiguous because: -* `Source == RawMarker` is never produced by `Create` (which initialises `Source` - and `Target` to `Null` and only ever stores values inside the references range). -* The marker is **never** sampled in a continuation cell — iteration of a blob +* `Source == RawLinkSequenceMarker` is never produced by `Create` (which initialises + `Source` and `Target` to `Null` and only ever stores values inside the references + range). +* The marker is **never** sampled in a continuation cell — iteration of a sequence starts at the head cell, picks up the length, and consumes the right number of bytes from contiguous addresses without re-examining `Source` of any inner cell. -* Intermediate cell indices inside a blob are **not** valid link handles. This is +* Intermediate cell indices inside a sequence are **not** valid link handles. This is a deliberate trade-off: it removes the need to scan from address `1` to detect - whether a given index belongs to a blob's interior. + whether a given index belongs to a sequence's interior. ## Range allocation algorithm @@ -152,12 +154,16 @@ same size as if they had never happened. ## Marking & interaction with `Each` / `Count` -`UnitedRangedMemoryLinks` overrides `Each(...)` and `Count(...)` for the -unrestricted case. Both walk allocated addresses from `1` to `AllocatedLinks` -and skip a cell entirely when its `Source` matches either marker, advancing past -all of its continuation cells in one step. The restricted overloads delegate to -the base implementation, which already walks tree indexes that only contain real -doublet references. +`UnitedRangedMemoryLinks` overrides `Each(...)` and `Count(...)` for all supported +restriction shapes. 
Both walk allocated addresses from `1` to `AllocatedLinks`. +Free-range heads are always hidden. Raw link sequence continuation cells are always +hidden. Raw link sequence heads are visible by default, and can be hidden by setting +`IncludeRawLinkSequences = false`. + +Restricted `Each`/`Count` calls also use the ranged scan instead of the base +source/target trees, because raw link sequence heads are not inserted into those +trees. This keeps universal `ILinks<>` queries able to discover sequence heads by +index, by `Source == RawLinkSequenceMarker`, or by the byte length stored in `Target`. `Create`/`Delete` keep their existing semantics for callers: a fresh `Create()` returns a freshly-initialised single-cell address, and `Delete(link)` puts a @@ -171,18 +177,19 @@ the highest cell. * Databases produced by `UnitedMemoryLinks` open cleanly in `UnitedRangedMemoryLinks`: `Reserved8 == 0` means "no free ranges yet", and the per-cell unused list keeps working for single-cell allocations. -* Databases produced by `UnitedRangedMemoryLinks` that contain no blobs and no +* Databases produced by `UnitedRangedMemoryLinks` that contain no raw link sequences and no multi-cell free ranges round-trip back through `UnitedMemoryLinks` bit-for-bit. -* Databases that **do** contain blobs or multi-cell free ranges are intentionally +* Databases that **do** contain raw link sequences or multi-cell free ranges are intentionally not backwards-compatible with old readers — the issue body does not require - cross-version compatibility, and the new file flag in `LinksHeader.Reserved8` - makes it cheap to add a version check later. + cross-version compatibility, and the reused `LinksHeader.Reserved8` word makes it + cheap to add a version check later. ## Invariants 1. **No internal fragmentation** — every link cell is either part of an allocated - doublet, part of an allocated blob, part of a multi-cell free range, or on the - single-cell unused list. 
The union of all four sets is exactly `[1, AllocatedLinks]`. + doublet, part of an allocated raw link sequence, part of a multi-cell free range, + or on the single-cell unused list. The union of all four sets is exactly + `[1, AllocatedLinks]`. 2. **No external fragmentation buildup** — coalescing happens on every `DeallocateRange`; appending at the tail is the only way to grow. 3. **`AllocatedLinks` is tight** — after every deallocation, the high-water mark diff --git a/docs/case-studies/issue-512/related-work.md b/docs/case-studies/issue-512/related-work.md index bf0f2e4c6..02a64435f 100644 --- a/docs/case-studies/issue-512/related-work.md +++ b/docs/case-studies/issue-512/related-work.md @@ -22,9 +22,9 @@ and what we are deliberately not borrowing. * **Lua 5.4 strings** — small strings are stored inline; long strings are referenced by pointer with a tag bit. The "marker word at the head of a record" idea is the - same as our `RawMarker` (and analogous to Lua's `LUA_TLNGSTR` tag). + same as our `RawLinkSequenceMarker` (and analogous to Lua's `LUA_TLNGSTR` tag). * **SQLite "frequent" records** — SQLite reuses the first byte of a record as a type - tag. Our `Source == RawMarker` convention is conceptually identical. + tag. Our `Source == RawLinkSequenceMarker` convention is conceptually identical. 
## Allocators inside persistent stores @@ -54,7 +54,7 @@ Search queries used during the design phase (kept here for traceability): * "boundary tag allocator linked list free range coalesce" * "uniform cell allocator fragmentation" -* "tagged pointer marker first cell binary blob in memory store" +* "tagged pointer marker first cell raw sequence in memory store" * "linksplatform doublets storage layout" * "LMDB freelist coalesce" diff --git a/docs/case-studies/issue-512/requirements.md b/docs/case-studies/issue-512/requirements.md index 9048a2935..ab90fc4c2 100644 --- a/docs/case-studies/issue-512/requirements.md +++ b/docs/case-studies/issue-512/requirements.md @@ -21,7 +21,9 @@ them. * implements `ILinks`, * exposes the same set of constructors as `UnitedMemoryLinks` (`(string)`, `(string, long)`, `(IResizableDirectMemory)`, `(IResizableDirectMemory, long)`, - `(IResizableDirectMemory, long, LinksConstants, IndexTreeType)`), + `(IResizableDirectMemory, long, UnitedRangedLinksConstants, IndexTreeType)`), + plus overloads that configure whether raw link sequence heads are visible in + `Each`/`Count`, * existing tests (`ResizableDirectMemoryLinksTests`, `ILinksBasicTests`, `GenericLinksTests`, `GarbageCollectionTests`) succeed when the type is plugged in instead of `UnitedMemoryLinks` for storage operations covered by `ILinks<>`. @@ -36,8 +38,9 @@ them. * `AllocateRange(TLinkAddress length)` returns the start address of a contiguous block of `length` cells, or grows the file by one cell at a time when no suitable free range exists (cf. R7). - * `DeallocateRange(TLinkAddress start)` returns the cells of a previously allocated - binary blob to the free list and **coalesces** with adjacent free regions. + * `DeallocateRange(TLinkAddress start, TLinkAddress length)` returns cells from a + previously allocated range or raw link sequence to the free list and **coalesces** + with adjacent free regions. 
* Every range described by the allocator has a length that is a positive integer multiple of `RawLink.SizeInBytes`. No partial cells are ever produced. @@ -47,25 +50,28 @@ them. > "we should also be to allocate/deallocate ranges of links (that should be faster than > allocating one by one)" -* **Acceptance:** a microbenchmark / unit test that compares `AllocateRange(N)` against - `N` individual `Create()` calls shows lower wall-clock time and fewer underlying - memory-resize events for N ≥ 8 (the benchmark is included in `./benchmarks.md`). +* **Acceptance:** a unit test compares `AllocateRange(N)` against `N` individual + `Create()` calls and verifies that range allocation advances the high-water mark in + one operation instead of one operation per cell. -## R5. Raw binary range allocation +## R5. Raw link sequence allocation > "and also allocating raw binary ranges. And use some constant in LinksContants as a > marker of such raw binary links" * **Acceptance:** - * a new constant `RawMarker` is exposed via `UnitedRangedLinksConstants` + * a new constant `RawLinkSequenceMarker` is exposed via + `UnitedRangedLinksConstants` (a subclass of `LinksConstants` so we don't break the upstream contract), - * `AllocateRawBinary(long sizeInBytes)` rounds the byte size up to a whole number of - `TLinkAddress` words and returns the start cell address of the blob, - * the first cell of the blob carries: - * `Source = RawMarker`, - * `Target = lengthInTLinkAddressUnits`, - * `IsRawBinary(start)` returns `true` for that start cell. + * extension method `AllocateRawLinkSequence(long sizeInBytes)` validates that the + byte size is a non-negative multiple of `sizeof(TLinkAddress)` and returns the + start cell address of the sequence, + * the first cell of the sequence carries: + * `Source = RawLinkSequenceMarker`, + * `Target = lengthInBytes`, + * extension methods `IsRawLinkSequence(start)` and `IsRawLinkSequence(linkFromEach)` + identify sequence heads. ## R6. 
Binary tree fields are part of the payload @@ -75,9 +81,9 @@ them. * **Acceptance:** the entire `RawLink` struct fields beyond `Source`/`Target` of the **first** cell (`LeftAsSource`, `RightAsSource`, `SizeAsSource`, `LeftAsTarget`, `RightAsTarget`, `SizeAsTarget`) are addressable and writable as continuation of the - payload via `WriteRawBinary`/`ReadRawBinary`. - Trees are **not attached** to the cells that belong to a binary blob, so the indexing - fields can be freely used as data bytes. + payload via `WriteRawLinkSequence`/`ReadRawLinkSequence`. + Trees are **not attached** to the cells that belong to a raw link sequence, so the + indexing fields can be freely used as data bytes. ## R7. No fragmentation @@ -98,18 +104,21 @@ them. > "we should prefer filling the empty / unused space, to pack up everything nicely." * **Acceptance:** for any allocation request that fits in any existing free range, no - new cells are appended at the tail; this is covered by a unit test in - `UnitedRangedAllocatorTests.PrefersExistingFreeRange`. + new cells are appended at the tail; this is covered by + `AllocateRange_PrefersExistingFreeRange`. ## R9. Treat marker'd cells as binary, not as references > "that should be treated not as references to links, but binary data itself" * **Acceptance:** - * `Each` and `Count` skip cells that begin a binary blob and the cells _inside_ a - binary blob — the storage advertises only doublet links to consumers of `ILinks<>`. - * Tree-method invariants are preserved by never inserting raw binary blob cells in - the source/target trees. + * `Each` and `Count` include raw link sequence heads by default so universal + `ILinks<>` consumers can discover them. + * `IncludeRawLinkSequences = false` excludes raw link sequence heads from + `Each`/`Count` for callers that want ordinary doublets only. + * Continuation cells inside a raw link sequence are always skipped. 
+ * Tree-method invariants are preserved by never inserting raw link sequence cells in + the source/target trees; ranged iteration scans visible cells directly instead. ## R10. Backwards compatibility diff --git a/docs/case-studies/issue-512/risks-and-trade-offs.md b/docs/case-studies/issue-512/risks-and-trade-offs.md index f3fcf80dd..196fe93ab 100644 --- a/docs/case-studies/issue-512/risks-and-trade-offs.md +++ b/docs/case-studies/issue-512/risks-and-trade-offs.md @@ -10,13 +10,14 @@ still well within the 8-word cell, but means the "smallest free range we can describe" is one full cell. Free runs of length 1 are punted to the existing single-cell unused list, which is unchanged. -* **Marker collisions** — `RawMarker` is chosen above `InternalReferencesRange.Maximum` - so it cannot be confused with a valid link index. Older `LinksConstants` instances - that ship without the new constant simply do not see the marker at all, so an old - reader of a new file would (a) think a blob cell is a regular link and (b) attempt - to walk the source tree from it. Cross-version compatibility is explicitly **not** - a goal of this PR (the issue body says nothing about it), and the new file flag in - `LinksRangedHeader` makes it cheap to add a version check later. +* **Marker collisions** — `RawLinkSequenceMarker` is chosen above + `InternalReferencesRange.Maximum` so it cannot be confused with a valid link index. + Older `LinksConstants` instances that ship without the new constant simply do not + see the marker at all, so an old reader of a new file would (a) think a raw link + sequence head is a regular link and (b) attempt to walk the source tree from it. + Cross-version compatibility is explicitly **not** a goal of this PR (the issue body + says nothing about it), and the reused `Reserved8` word makes it cheap to add a + version check later. ## Risks that the design _eliminates_ @@ -40,4 +41,5 @@ `SplitMemoryLinks`. 
* Add a CLI utility (`platform-doublets defrag`) that walks the free list and rebuilds it from scratch, useful after offline upgrades. -* Add a "raw blob" cursor type to the public API that exposes the blob as a `Span`. +* Add a raw-link-sequence cursor type to the public API that exposes the payload as a + `Span`. diff --git a/docs/case-studies/issue-512/solution-plan.md b/docs/case-studies/issue-512/solution-plan.md index fd92289da..52da45223 100644 --- a/docs/case-studies/issue-512/solution-plan.md +++ b/docs/case-studies/issue-512/solution-plan.md @@ -7,8 +7,8 @@ branch `issue-512-557a0a3ca78d` (PR #513). ## Step 1 — Constants scaffolding (`R5`, `R10`) * Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/UnitedRangedLinksConstants.cs` - — `LinksConstants` subclass that exposes `RawMarker` (reuses - `Itself`) and `FreeRangeMarker` (reuses `Error`). + — `LinksConstants` subclass that exposes `RawLinkSequenceMarker` + (reuses `Itself`) and `FreeRangeMarker` (reuses `Error`). * No `LinksHeader` layout change is needed: the free-range list head reuses the existing `Reserved8` word, which previous releases left at zero. @@ -20,24 +20,30 @@ branch `issue-512-557a0a3ca78d` (PR #513). predecessor/successor coalescing), `Detach(start)`, `CarveFromFront`, `CarveFromBack`, and `TryDetachTail`. -## Step 3 — Raw binary blobs (`R5`, `R6`, `R9`) +## Step 3 — Raw link sequences (`R5`, `R6`, `R9`) -* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawBinaryMethods.cs` - — encodes/decodes blobs over the allocator. Exposes `Write(start, payload)`, - `Read(start, destination)`, `ComputeCellsForBlob(byteLength)`, - `IsRawBinary(address)`, `GetLengthInBytes(address)`, `GetCellCount(address)`, - and `Clear(start)`. +* Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/RawLinkSequenceMethods.cs` + — encodes/decodes raw link sequences over the allocator. 
Exposes + `Write(start, payload)`, `Read(start, destination)`, + `ComputeCellsForPayload(byteLength)`, `IsRawLinkSequence(address)`, + `GetLengthInBytes(address)`, and `GetCellCount(address)`. +* Add extension methods over `UnitedRangedMemoryLinks` for the + non-essential convenience API: `AllocateRawLinkSequence`, + `WriteRawLinkSequence`, `ReadRawLinkSequence`, `DeallocateRawLinkSequence`, + `IsRawLinkSequence`, `GetRawLinkSequenceLengthInBytes`, and + `GetRawLinkSequenceCellCount`. +* Add an `ILinks<>` extension that identifies raw link sequence heads returned by + `Each`, so callers can inspect sequence heads through the universal interface. ## Step 4 — `UnitedRangedMemoryLinks` (`R1`, `R2`) * Add `csharp/Platform.Data.Doublets/Memory/UnitedRanged/Generic/UnitedRangedMemoryLinks.cs` — a single concrete class that inherits directly from `UnitedMemoryLinks`, mirrors its five constructors, overrides `SetPointers`/`ResetPointers` to wire - up the new helpers, and overrides `Create`/`Delete`/`Each`/`Count` so that blob - and free-range cells are correctly ignored. Exposes the new public API: - `AllocateRange`, `DeallocateRange`, `AllocateRawBinary`, `WriteRawBinary`, - `ReadRawBinary`, `DeallocateRawBinary`, `IsRawBinary`, - `GetRawBinaryLengthInBytes`. A separate `UnitedRangedMemoryLinksBase` was + up the new helpers, and overrides `Create`/`Delete`/`Update`/`Each`/`Count` so that + raw link sequences and free-range cells are correctly handled. Exposes the + implementation-level range API `AllocateRange` / `DeallocateRange` and the + `IncludeRawLinkSequences` configuration. A separate `UnitedRangedMemoryLinksBase` was considered but proved unnecessary — direct inheritance is sufficient. ## Step 5 — Tests (`R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `R9`) @@ -50,10 +56,16 @@ branch `issue-512-557a0a3ca78d` (PR #513). * `AllocateRange_PrefersExistingFreeRange`. * `DeallocateRange_CoalescesNeighbours`. * `DeallocateRange_TrimsTail`. - * `RawBinary_Roundtrip_SingleCell`. 
- * `RawBinary_Roundtrip_MultiCell`. - * `RawBinary_DoesNotAppearInEach`. - * `Each_SkipsFreeRangesAndBlobs`. + * `AllocateRange_OneCellRemainderFeedsSingleCellFreeList`. + * `RawLinkSequence_Roundtrip_SingleCell`. + * `RawLinkSequence_Roundtrip_MultiCell`. + * `RawLinkSequence_ZeroLength_RoundtripAndUsesOneCell`. + * `RawLinkSequence_LengthMustBeWordAligned`. + * `RawLinkSequence_AppearsInEachByDefault`. + * `RawLinkSequence_CanBeExcludedFromEachByConfiguration`. + * `RawLinkSequence_CanBeReturnedByEachRestriction`. + * `Delete_DeallocatesRawLinkSequenceThroughUniversalInterface`. + * `Each_SkipsFreeRangesAndIncludesConfiguredRawLinkSequences`. * `NoFragmentation_ChaosTest` — deterministic random allocations/deallocations. ## Step 6 — Documentation (`R11`)