Support numbering system overrides in datetime by Manishearth · Pull Request #7905 · unicode-org/icu4x

Manishearth · 2026-04-22T17:42:01Z

Multiple of these are used in a year or day context, so the original suggestion of "generate string month data for these" does not work. In theory, we could handle most of these in one-off ways: jpnyear is a simple hardcoded if, the month-only ones like romanlow can go in data, etc. But since we need this infrastructure anyway for hanidec/hebr years, I don't think it's worth implementing this three different ways.

This is a pretty nontrivial PR that was also almost entirely agent-written. I put a lot of work into talking to the agent, linking it to docs, and getting it to diagnose various issues. In the actual code, most of my work was minor cleanups and reorganizing the jj history.

(I used an agent instead of doing this myself because I am attending UTC this week and I cannot do focused coding work but I can prompt agents)

The agent was pretty good at doing its own printf debugging to figure out the source of various bugs. The initial naïve implementation, which just applied the preexisting pattern code, was insufficient: there were a number of bugs, and it took a lot of back-and-forth with the agent to diagnose them. There were a lot of cycles of the agent implementing an incorrect fix (e.g. by force-overriding things in datagen), me saying we don't want to do that and undoing the change, and then trying again with more prompting.

A thing I did not do (that the agent wanted to do) was support explicit user-driven overrides, like he-u-calendar-hebrew-nu-hebr. I didn't do this because I'd need to support hanidays for a wider range. However, it's not a complicated change to make.

I have been reviewing this code extensively as I go. However, this is a draft PR because I haven't done a proper end-to-end review. I also want to triple-check the Hebrew code against sources: it matches my understanding of the formatting, but it might not be 100% correct.

Changelog

icu_datetime: Support numbering system overrides for datetime patterns when found in data

robertbastian

looks cool

tell your agent to not use Display, and to not allocate intermediate strings

robertbastian · 2026-04-22T18:59:49Z

+                    let s = o.format_number(year_val as usize);
+                    w.with_part(PART, |w| w.write_str(&s))?;


issue: you're allocating an intermediate string. instead, let format_number write to w
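A minimal sketch of what "let format_number write to w" could look like, using `std::fmt::Write` as a stand-in for the sink type. The function name `write_hanidec` and its signature are hypothetical and not taken from the PR; the digit table is the `hanidec` entry from CLDR's numberingSystems.xml. It emits digits most-significant first, without building a `String` and without going through `Display`:

```rust
use std::fmt::{self, Write};

/// Digits of the CLDR `hanidec` numbering system, indexed by value.
const HANIDEC: [char; 10] = ['〇', '一', '二', '三', '四', '五', '六', '七', '八', '九'];

/// Hypothetical helper: writes `n` in hanidec digits directly into `w`.
fn write_hanidec<W: Write>(mut n: usize, w: &mut W) -> fmt::Result {
    // Find the largest power of ten that is <= n (or 1 when n == 0).
    let mut div = 1usize;
    while n / div >= 10 {
        div *= 10;
    }
    // Emit one digit per power of ten, from the most significant down.
    while div > 0 {
        w.write_char(HANIDEC[n / div])?; // n / div is always in 0..=9 here
        n %= div;
        div /= 10;
    }
    Ok(())
}
```

With this shape, the call site in the diff above would pass its writer through instead of calling `write_str(&s)` on a temporary.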

robertbastian · 2026-04-22T21:34:25Z

    }
+
+    /// <https://github.com/unicode-org/cldr/blob/main/common/rbnf/root.xml#L522>
+    fn format_hanidec(number: usize) -> String {


don't allocate this intermediate string. we implement an allocation-free Writeable for the numeric types, you could wrap that to replace the digits

Yeah, I was hoping to not have to do that. The hebrew stuff specifically is complex enough to benefit from intermediates.

I made this use Writeables for everything except Hebrew.

Drive-by: Datetime formatting is zero-alloc. Adding an allocation is a non-starter for me.
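The "wrap that to replace the digits" idea can be sketched with plain `std::fmt::Write` standing in for ICU4X's `Writeable` machinery. `DigitReplacer` and `write_with_digits` are hypothetical names, not the PR's code: an adapter sits between the integer formatter and the real sink and swaps each ASCII digit as it streams through, so the adapter itself needs no intermediate buffer.

```rust
use std::fmt::{self, Write};

/// Digits of the CLDR `hanidec` numbering system, indexed by value.
const HANIDEC: [char; 10] = ['〇', '一', '二', '三', '四', '五', '六', '七', '八', '九'];

/// Hypothetical adapter: forwards everything to `inner`, replacing ASCII digits.
struct DigitReplacer<'a, W: Write> {
    inner: &'a mut W,
    digits: &'a [char; 10],
}

impl<W: Write> Write for DigitReplacer<'_, W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for c in s.chars() {
            // `to_digit(10)` is `Some` only for '0'..='9'.
            let out = match c.to_digit(10) {
                Some(d) => self.digits[d as usize],
                None => c,
            };
            self.inner.write_char(out)?;
        }
        Ok(())
    }
}

/// Writes `n` using the given digit table, streaming into `w`.
fn write_with_digits<W: Write>(n: u32, digits: &[char; 10], w: &mut W) -> fmt::Result {
    let mut adapter = DigitReplacer { inner: w, digits };
    write!(adapter, "{}", n)
}
```

This only covers purely positional systems such as hanidec; algorithmic systems like Hebrew need more than a digit swap, which is the point raised in the replies above.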

robertbastian · 2026-04-22T21:39:33Z

    0
  ],
  "elements": [
+    "ddddddddddddddddd日",


that's a lot of ds, can we still tweak the encoding? I find it hard to believe that we've used all the other lengths

This is based on preexisting agreement with CLDR

icu4x/components/datetime/src/provider/fields/length.rs

Lines 62 to 66 in 3ba5bf6

/// First index used for numeric overrides in compact [`FieldLength`] representation
///
/// Currently 17 due to decision in <https://unicode-org.atlassian.net/browse/CLDR-17217>,
/// may become 16 if the `> 16` is updated to a ` >= 16`
const FIRST_NUMERIC_OVERRIDE: u8 = 17;

well, we could still use 7, and if CLDR ever uses 7, shift that to 8 until we shift something to 16

That's a data stability issue.

I consider this encoding out of scope for this PR. We got agreement on this a while ago; if we want to change it, we can get agreement on that later. If we don't care about data stability, then we can change from 17 to 7 as well. If we do, then we will regret picking 7.

Note that these are almost never stored as strings, they are just stored as "17 d" in the baked data. It's only the rendered/JSON patterns.

CLDR defined >16 (17 and higher) as the private use area.

These private-use field lengths are basically never stringified except in FsDataProvider JSON data.
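To make the encoding in this thread concrete, here is a rough sketch of how a private-use field length might be interpreted and rendered. Only the constant `FIRST_NUMERIC_OVERRIDE` comes from the quoted source; `override_index` and `render_field` are hypothetical helpers, and the mapping "length 17 is override slot 0" is an assumption based on the doc comment, not confirmed against the actual `FieldLength` implementation.

```rust
/// From the quoted length.rs: lengths at or above this are numeric overrides.
const FIRST_NUMERIC_OVERRIDE: u8 = 17;

/// Hypothetical: maps a raw field length to an override slot, if it is one.
/// Assumes slot numbering starts at 0 for length 17.
fn override_index(len: u8) -> Option<u8> {
    len.checked_sub(FIRST_NUMERIC_OVERRIDE)
}

/// Hypothetical: renders a field the way a stringified pattern would,
/// by repeating the field symbol `len` times.
fn render_field(symbol: char, len: u8) -> String {
    std::iter::repeat(symbol).take(len as usize).collect()
}
```

Under these assumptions, `render_field('d', 17)` is the run of 17 `d`s seen in the JSON diff, while compact storage only needs the pair (symbol, 17).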

Manishearth · 2026-04-22T21:56:27Z

Using Display and intermediate strings were actually deliberate choices to keep the impl simple, especially for Hebrew.

There are probably ways to optimize format_number but I'm wary of making it too complicated.

Manishearth · 2026-04-22T22:00:02Z

I will try to make it use writeables, though.

robertbastian · 2026-04-22T22:03:04Z

> Using Display and intermediate strings were actually deliberate choices to keep the impl simple, especially for Hebrew.
>
> There are probably ways to optimize format_number but I'm wary of making it too complicated.

We've put a lot of effort into keeping this code alloc free. Please try. Your code already does sequential pushes into a string, it shouldn't be hard to change that to writeable.

Manishearth · 2026-04-22T22:06:53Z

> We've put a lot of effort into keeping this code alloc free. Please try. Your code already does sequential pushes into a string, it shouldn't be hard to change that to writeable.

The Hebrew code has different behavior based on how many letters there are in the final string, and involves inserting a mark inside the string.

So yes, I could switch to writeable for most of this (and it's a followup I was hoping to play around with later), but I really don't want to try to do that for Hebrew just yet.

Manishearth · 2026-04-22T22:10:57Z

To be clear, it's possible to write the Hebrew stuff in an alloc-free way, just that it would further complicate code that is already pretty complicated.
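One way the alloc-free Hebrew variant could look is to collect letters in a small stack array and only then decide where the mark goes. The sketch below is an illustration under assumptions, not the PR's code: it follows the conventional Hebrew numeral formation for 1 to 999 (geresh after a single letter, gershayim before the last letter otherwise, 15 and 16 written as 9+6 and 9+7), has not been checked against CLDR's `hebr` rules, and ignores thousands entirely. The function name is hypothetical.

```rust
use std::fmt::{self, Write};

// Hebrew letters by value, as escapes to avoid bidi confusion in source.
const ONES: [char; 9] = [ // alef..tet = 1..9
    '\u{05D0}', '\u{05D1}', '\u{05D2}', '\u{05D3}', '\u{05D4}',
    '\u{05D5}', '\u{05D6}', '\u{05D7}', '\u{05D8}',
];
const TENS: [char; 9] = [ // yod, kaf, lamed, mem, nun, samekh, ayin, pe, tsadi
    '\u{05D9}', '\u{05DB}', '\u{05DC}', '\u{05DE}', '\u{05E0}',
    '\u{05E1}', '\u{05E2}', '\u{05E4}', '\u{05E6}',
];
const HUNDREDS: [char; 4] = ['\u{05E7}', '\u{05E8}', '\u{05E9}', '\u{05EA}']; // 100..400
const GERESH: char = '\u{05F3}';
const GERSHAYIM: char = '\u{05F4}';

/// Hypothetical helper covering 1..=999; other values fall back to Latin digits.
fn write_hebrew_1_to_999<W: Write>(n: u16, w: &mut W) -> fmt::Result {
    if n == 0 || n > 999 {
        return write!(w, "{}", n);
    }
    let mut buf = ['\0'; 5]; // 999 needs five letters, the maximum in this range
    let mut len = 0;
    let mut rest = n as usize;
    while rest >= 400 {
        buf[len] = HUNDREDS[3];
        len += 1;
        rest -= 400;
    }
    if rest >= 100 {
        buf[len] = HUNDREDS[rest / 100 - 1];
        len += 1;
        rest %= 100;
    }
    if rest == 15 || rest == 16 {
        // Written as 9+6 and 9+7 rather than 10+5 and 10+6.
        buf[len] = ONES[8];
        buf[len + 1] = ONES[rest - 10];
        len += 2;
    } else {
        if rest >= 10 {
            buf[len] = TENS[rest / 10 - 1];
            len += 1;
            rest %= 10;
        }
        if rest >= 1 {
            buf[len] = ONES[rest - 1];
            len += 1;
        }
    }
    // The mark's position depends on the final letter count.
    let (head, last) = buf[..len].split_at(len - 1);
    for &c in head {
        w.write_char(c)?;
    }
    if len == 1 {
        w.write_char(last[0])?;
        w.write_char(GERESH)
    } else {
        w.write_char(GERSHAYIM)?;
        w.write_char(last[0])
    }
}
```

The fixed-size buffer is what replaces the intermediate `String`: the letter count is known before anything reaches the sink, so the mark can be placed without editing text already written.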

Manishearth · 2026-04-22T22:13:28Z

> So yes, I could switch to writeable for most of this (and it's a followup I was hoping to play around with later), but I really don't want to try to do that for Hebrew just yet.

Done

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>

Manishearth · 2026-04-29T00:10:54Z

I discussed the errors with @sffc: I don't really want to introduce new error types for "we didn't implement a full RBNF here". Instead, I made the romanlow/hebrew implementations fully match what is specced in CLDR (which eventually falls back to the default), and filed a TODO for hanidays #7922.

My general position is we shouldn't be returning an error here, this is more of a GIGO situation and Latin fallback is reasonable.

robertbastian · 2026-04-29T14:26:08Z

+        return w.write_char('n'); // null
+    }
+    if n >= 5000 {
+        // romanlow falls back to the default past 5000.


The rule here says `5000: =#,##0=;`. To me that looks like you should use the DecimalFormatter with grouping.

We can't until we know which DecimalFormatter to use, that is part of
https://unicode-org.atlassian.net/browse/CLDR-19424

I also don't particularly think we should optimize for this GIGO case.
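For readers following this thread, a rough sketch of the behavior under discussion: lowercase Roman numerals below 5000, and a plain ungrouped Latin-digit fallback from 5000 up. `write_romanlow` is a hypothetical name and this uses the textbook subtractive algorithm, so it is not guaranteed to match CLDR's `roman-lower` rules value for value. Treating the `'n'` branch in the diff as the zero case is also an assumption, since the condition is not visible in the excerpt.

```rust
use std::fmt::{self, Write};

/// Value/numeral pairs in descending order, including subtractive forms.
const ROMAN: [(u32, &str); 13] = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
];

/// Hypothetical helper: writes `n` as lowercase Roman numerals into `w`.
fn write_romanlow<W: Write>(mut n: u32, w: &mut W) -> fmt::Result {
    if n == 0 {
        return w.write_char('n'); // assumed zero case ("null" in the diff)
    }
    if n >= 5000 {
        // Fallback without grouping separators, since the choice of
        // DecimalFormatter is the open question in this thread.
        return write!(w, "{}", n);
    }
    for &(value, numeral) in ROMAN.iter() {
        while n >= value {
            w.write_str(numeral)?;
            n -= value;
        }
    }
    Ok(())
}
```

The difference from the quoted rule is only visible at 5000 and above, where `#,##0` would group digits (for example with a thousands separator) and this sketch does not.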

robertbastian · 2026-04-29T14:31:37Z

+    Ok(())
+}
+
+/// <https://github.com/unicode-org/cldr/blob/main/common/rbnf/root.xml#L522>


Suggested change

-/// <https://github.com/unicode-org/cldr/blob/main/common/rbnf/root.xml#L522>
+/// <https://github.com/unicode-org/cldr/blob/fb0b4f0cb809cac10e8539dcba669c1d27d8e70c/common/main/root.xml#L3169-L3171>

this seems to be locale-sensitive?

Not exactly, and that's somewhat of an open question.

https://unicode-org.atlassian.net/browse/CLDR-19424

ah I looked at the wrong thing, the symbols to use with hanidec are locale dependent. as we don't do grouping, that doesn't matter.

link should be https://github.com/unicode-org/cldr/blob/c6d4b3579d2ee196ad0f9c3a9adb608a55ddf99b/common/supplemental/numberingSystems.xml#L39

Manishearth force-pushed the datetime-numsys branch 13 times, most recently from 4e18fdf to 7ebc672 Compare April 22, 2026 21:19

Manishearth marked this pull request as ready for review April 22, 2026 21:23

Manishearth requested review from a team, robertbastian, sffc and zbraniecki as code owners April 22, 2026 21:23

robertbastian previously requested changes Apr 22, 2026

Manishearth force-pushed the datetime-numsys branch 2 times, most recently from cbf8bac to 1911b22 Compare April 22, 2026 22:25

robertbastian reviewed Apr 22, 2026

Manishearth force-pushed the datetime-numsys branch from 1911b22 to 90bddd9 Compare April 22, 2026 22:42

robertbastian reviewed Apr 22, 2026

Comment thread components/datetime/src/provider/fields/length.rs Outdated

Comment thread components/datetime/src/provider/fields/length.rs Outdated

Comment thread components/datetime/src/provider/fields/length.rs Outdated

Comment thread components/datetime/src/provider/fields/length.rs Outdated

agent-driven: mutate numeric overrides in construction loop

f9c1f37

Manishearth requested a review from sffc April 28, 2026 05:07

Manishearth added 4 commits April 28, 2026 05:15

regen

1197d19

re-add import

64b9065

undo diff

854da4b

diff

915ab72

robertbastian reviewed Apr 28, 2026

Manishearth and others added 2 commits April 28, 2026 16:04

Update components/datetime/src/format/numeric_override.rs

036227c

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>

Update components/datetime/src/format/numeric_override.rs

a1b2392

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>

sffc approved these changes Apr 28, 2026

Manishearth and others added 3 commits April 28, 2026 16:06

Update components/datetime/src/format/numeric_override.rs

590d68a

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>

Update components/datetime/src/format/numeric_override.rs

46ec9b0

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>

some review comments

7501585

Manishearth mentioned this pull request Apr 28, 2026

Implement hanidays spellout rules #7922

Open

Manishearth added 2 commits April 29, 2026 00:04

manishearth: Remove error cases

54d48be

don't need data for overrides

42984d5

Manishearth requested review from robertbastian and sffc April 29, 2026 00:05

sffc approved these changes Apr 29, 2026

robertbastian reviewed Apr 29, 2026

Manishearth requested a review from robertbastian April 29, 2026 14:56

links

22483c1

Manishearth force-pushed the datetime-numsys branch from 34101f2 to 22483c1 Compare April 29, 2026 15:04

link

c052d73

robertbastian approved these changes Apr 29, 2026

Manishearth merged commit d0644e1 into unicode-org:main Apr 29, 2026
33 checks passed

Manishearth deleted the datetime-numsys branch April 29, 2026 15:57

Manishearth linked an issue Apr 29, 2026 that may be closed by this pull request

Support numbering system variations in datetime patterns #308

Closed

Manishearth mentioned this pull request May 5, 2026

Support numeric overrides in DateTimeFormat #4242

Closed
