Reproduction steps
The Issue
As discussed in Yokohama at the MM and also in several personal talks, we see many cases where an OMR prefix change in a Thread network with multiple BRs leads to problems.
An OMR prefix change is effectively an IPv6 address change for the device. The general assumption is that, given subscriptions from a device to a client are still active, the next DataReport from the device would simply arrive at the client with the changed IPv6 source address.
In reality this does not seem to happen: the packets still arrive with the old IPv6 address as the source address.
@bzbarsky-apple already checked the SDK code and did not find a place where the UDP socket is bound to a specific address.
@jwhui suspects: Looking into this a bit, many Matter/Thread platforms insert yet another IPv6 stack in between (e.g. Zephyr or LwIP). If there's another such IPv6 stack, then that IPv6 stack will be performing the IPv6 Source Address selection.
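To illustrate why an intermediate IPv6 stack matters here: RFC 6724 rule 8 says a host should prefer the candidate source address with the longest prefix match against the destination. The minimal sketch below (addresses as arrays of eight 16-bit groups; all names are illustrative, not tied to any particular stack) shows that for a destination under the new fd50 prefix the new source address should win, which makes the observed stale selection look like caching below the selection logic:

```typescript
// Sketch of RFC 6724 rule 8 (longest matching prefix) for source selection.
// Addresses are arrays of eight 16-bit groups; all names here are illustrative.

function toBits(groups: number[]): string {
  // Expand each 16-bit group to a binary string so we can compare bitwise.
  return groups.map((g) => g.toString(2).padStart(16, "0")).join("");
}

function commonPrefixLen(a: number[], b: number[]): number {
  const bitsA = toBits(a);
  const bitsB = toBits(b);
  let n = 0;
  while (n < 128 && bitsA[n] === bitsB[n]) n++;
  return n;
}

function selectSource(candidates: number[][], dest: number[]): number[] {
  // Prefer the candidate sharing the longest prefix with the destination.
  return candidates.reduce((best, c) =>
    commonPrefixLen(c, dest) > commonPrefixLen(best, dest) ? c : best
  );
}

// Old OMR address (fd2e:...) vs new OMR address (fd50:...) for a peer
// that is itself reachable under the new fd50 prefix:
const oldSrc = [0xfd2e, 0xd139, 0xe716, 0x0001, 0, 0, 0, 0x0001];
const newSrc = [0xfd50, 0x6be7, 0x01f0, 0x0001, 0, 0, 0, 0x0001];
const client = [0xfd50, 0x6be7, 0x01f0, 0x0001, 0, 0, 0, 0x0002];
const chosen = selectSource([oldSrc, newSrc], client);
// chosen shares the full /64 with the peer, i.e. the fd50 address wins.
```

One caveat: if the client's own prefix did not change (it sits on the infrastructure link, not in the mesh), neither candidate source address wins rule 8 by much, and the result can depend on tie-breakers or cached state inside the stack.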
In fact, I analyze this effect with real users in their homes, so the mix of devices and Matter versions is huge, and I have never seen a single case where the source IP got updated! I cannot list all the tested devices tbh, but given the "real world" focus of these checks, it seems to affect "all" of them.
Summary
The attached log is from matter.js used as a client in a network with 135 Thread nodes (and some WiFi nodes). The network uses 5 Google Nest BRs (latest Fuchsia release with a Thread version from October 2025, as far as I know) and one OpenThread based BR in the same network, via which we have some limited insights. The log covers a wide timeframe around an OMR/prefix change from fd2e to fd50 that happened at roughly April 12, 21:59:38.
matter.js has multiple ways to detect changed IPs of devices:
- a new session is created (either by the client or by the device pushing a new session) - the IP of the session exchange is compared to the "last known good address" for the node, and if it does not match, it is updated
- a new incoming message from the device arrives that contains a new exchange - then the source IP of that message is checked against the session IP, and if it does not match, we update the session IP and also the last known address of the peer node.
The relevant log line for such a change is "Operational address changed ...". To find the reason for a change, the log context around that line needs to be checked.
Additionally we also monitor mDNS changes during active sessions, but these are just logged and used to probe whether the old address still works.
The log file (extending up to 11h after the prefix change!) shows, for a Thread network with 135 nodes:
- We "only" logged 45 instances of "Session address ... no longer in mDNS results", which means the BRs actively announced new addresses after the prefix change, but only for a subset of the nodes!
- We "only" logged 47 instances of "Operational address changed ...". This includes the 45 from above plus 2 more where the device pushed a new session from a new IP (@1:1c1 because of a reboot around the prefix change time, @1:218 also because of a reboot shortly after the prefix change).
- 31 nodes had subscription timeouts in a 15-minute window after the prefix change; the others timed out at some point later.
That means:
- In the whole timeframe after the prefix change, "only" 47 nodes got re-subscribed; all other roughly 88 nodes still have a working subscription.
- We can even still talk to them via the old address, because in some cases we also sent Invokes to them that worked!
- All recorded address changes happened because of a new session establishment in that timeframe.
- NO IP change originated from "a DataReport for a working subscription arrived with a new source IP address"!
Additionally, we had only two cases, starting 4:30h after the prefix change, where we got a "Network unreachable" error while trying to reach the "old" IP for a new session. These cases showed that NO valid IPs were known from mDNS before we actively queried for them!
Bottom line
As discussed, and assuming everything really is as the logs state:
- IP changes caused by such OMR/prefix changes do not in all cases result in new IP mDNS records being sent. In some cases no new IPs are announced actively by the BRs at all, and they need to be queried.
- Somehow the old IPs stay reachable for a long time, but eventually produce "network unreachable" errors.
- DataReports for active subscriptions from the devices still reach the client with the "old" source IPs.
More details and other insights from the logs
The detailed analysis below covers one such case from a user; this is the change we examine:
The OMR prefix table as of 12. Apr 09:18:02 CEST 2026:
| Age                 | OMR Prefix             | Pref | IsLocal |
+---------------------+------------------------+------+---------+
| 11:18:24.516        | fd50:6be7:1f0:1::/64   | low  | no      |
| 5 days 21:46:16.980 | fd2e:d139:e716:1::/64  | low  | no      |
So the change we check is from fd2e to fd50 at roughly April 12, 21:59:38. The client has connections to 135 Thread devices.
Out of scope:
Directly around the prefix change (actually 21:59:15.694 till 22:00:24.383), the log contains 25 cases where the StatusResponse to a subscription DataReport could not be delivered to the device (the MRP-based delivery was basically tried at 21:59, because the error came from the final wait for device-side MRP acks).
Maybe something was already happening there that in the end led to the prefix change. So let's ignore those, because they all basically happened before the prefix change.
After Prefix change:
After the prefix change, basically starting at 22:00:25.201, we see several cases like:
2026-04-11 22:00:25.201 Subscription 5de4a782 to peer @1:68 timed out after 1m 36s
2026-04-11 22:00:25.204 Replacing subscription to @1:68 due to timeout
2026-04-11 22:00:25.207 Probe » @1:68•48b3⇵e764
2026-04-11 22:00:29.207 Probe « @1:68•48b3⇵e764 1↔2+1 (success)
which means a subscription missed its DataReport - usually these are wired Thread devices with a subscription time of 1 min, where we wait 1:36 (in this case) before declaring the subscription timed out. Then we send an empty Read as a probe packet (still to the OLD IP, because we do not know the new one), and the probe got correctly delivered. So the old IP was still routed and reached the device even after the prefix change. The "+1" in the probe log means that 1 message retransmission happened.
We also have cases where a probe fails.
At 22:00:34.372 in the log we see the first nodes getting new mDNS announcements in which the IP of our current session is no longer included. This also triggers a probe we added to gain more details (that's why a failed probe does not invalidate the subscription).
In some cases we cannot reach the old address anymore and the device also does not send anything anymore; that looks like this:
2026-04-11 22:00:34.372 @1:19f•489a Session address udp://[fd2e:d139:e716:1:6562:4ba6:2fe4:39de]:5540 no longer in mDNS results, probing
2026-04-11 22:00:34.378 Probe » @1:19f•489a⇵e769
2026-04-11 22:01:10.612 Probe « @1:19f•489a⇵e769 0↔5+5 (failed)
2026-04-11 22:09:09.268 Subscription d4ef21ed to peer @1:19f timed out after 10m 53s
...
2026-04-11 22:09:09.275 Establish new Session to udp://[fd50:6be7:1f0:1:9e72:31d2:b7fd:7dbb]:5540 ...
2026-04-11 22:09:17.375 Operational address changed for @1:19f from udp://[fd2e:d139:e716:1:6562:4ba6:2fe4:39de]:5540 to udp://[fd50:6be7:1f0:1:9e72:31d2:b7fd:7dbb]:5540
But also cases like:
2026-04-11 22:00:34.397 @1:212•480f Session address udp://[fd2e:d139:e716:1:abd9:ba85:fd0b:1ecc]:5540 no longer in mDNS results, probing
2026-04-11 22:00:49.378 Probe » @1:212•480f⇵e76c
2026-04-11 22:00:50.846 Probe « @1:212•480f⇵e76c (success)
... and from then on the device KEEPS the subscription active successfully, because until the end of the log (11h later!) nothing was logged for this device again.
Especially note that the second case is missing a log line like "Operational address changed", unlike the case above where we "learned" the new IP address via the newly created session, because there we used the new mDNS address to connect to the device.
Other interesting facts from the log file:
- We "only" logged 45 instances of "Session address ... no longer in mDNS results" for a network with 135 Thread devices - matter.js always keeps mDNS records updated based on incoming messages, even if we do not run active queries.
- We "only" logged 47 instances of "Operational address changed ...". This includes the 45 from above plus 2 more where the device pushed a new session from a new IP (@1:1c1 because of a reboot around the prefix change time, @1:218 also because of a reboot shortly after the prefix change).
- In fact the overlap was "just" 35 nodes that were logged as "address changed" AND also got a subscription timeout and thus a new address.
At 02:30:05.878 we saw the first cases of "ENETUNREACH while trying to send a probe ...", which also shows that for some devices NO mDNS addresses were valid anymore:
2026-04-12 02:29:22.026 Subscription 8fee9cbd to peer @1:1f6 timed out after 1m 36s
2026-04-12 02:29:22.032 Probe » @1:1f6•4802⇵ed20
2026-04-12 02:30:05.856 @1:1f6•4802 Session ended
2026-04-12 02:30:05.859 Probe « @1:1f6•4802⇵ed20 0↔5+5 (failed)
2026-04-12 02:30:05.862 @1:1f6 Resolving (no address known) <-- we query MDNS now but have no known address!
2026-04-12 02:30:05.866 @1:1f6•unsecured#39feb176c3542b93⇵ed21 udp://[fd2e:d139:e716:1:e387:dcd1:ae67:7174]:5540 <-- try to reach "last known" address
2026-04-12 02:30:05.878 General connection error (retry in 2m): network-unreachable send ENETUNREACH fd2e:d139:e716:1:e387:dcd1:ae67:7174:5540
... then the MDNS query delivered a new address
2026-04-12 02:31:17.010 CASE establishment to udp://[fd50:6be7:1f0:1:225f:7e5e:7316:91f9]:5540
2026-04-12 02:31:21.330 Operational address changed for @1:1f6 from udp://[fd2e:d139:e716:1:e387:dcd1:ae67:7174]:5540 to udp://[fd50:6be7:1f0:1:225f:7e5e:7316:91f9]:5540
... so in that case the subscription was still active and without any IP change for 4:30h after the prefix change ...
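The recovery sequence in this log excerpt (probe fails, the last known address yields ENETUNREACH, an active mDNS query then delivers the new address) could be sketched as follows; `mdnsQuery` and `connect` are hypothetical placeholders, not matter.js APIs:

```typescript
// Sketch of the observed recovery flow: try the last known address first,
// and on a network-unreachable error fall back to an active mDNS query.
// mdnsQuery/connect are hypothetical callbacks, not real matter.js APIs.

async function connectWithFallback(
  lastKnown: string | undefined,
  mdnsQuery: () => Promise<string>,
  connect: (address: string) => Promise<void>
): Promise<string> {
  if (lastKnown !== undefined) {
    try {
      await connect(lastKnown);
      return lastKnown; // old address still routed, keep using it
    } catch (err: any) {
      // Only the "network unreachable" case triggers rediscovery here.
      if (err?.code !== "ENETUNREACH") throw err;
    }
  }
  const fresh = await mdnsQuery(); // actively query for the current address
  await connect(fresh);
  return fresh;
}
```

In the log, the ENETUNREACH on the old fd2e address is what finally forced the active mDNS query that produced the new fd50 address.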
Conclusion:
I validated it: every "Operational address changed" event followed a new session establishment.
matter.js also checks for each new exchange whether the address changed, and would trigger such an address change without a new session IF we received a UDP packet for a new exchange with another source address. But the log does not show that.
This means, from all the data in this (and many other comparable) logs, that on an OMR/prefix change the devices keep talking to the client with the "old" IP until the subscription times out, at which point we get a new session and discover the new IP.
LogFile:
prefix-change-log.log
Bug prevalence
Reproducible
GitHub hash of the SDK that was being used
multiple, real world devices
Platform
core
Platform Version(s)
1.x
Anything else?
No response