[OUTAGE] AKL-IX | Layer 2 reachability
Incident Report for New Zealand Internet Exchange
Resolved
The incident has been resolved; however, the root cause is still under investigation. If any impactful or hazardous works are required for the permanent fix, notices will be sent out.

Today, one of our redundant direct point-to-point paths between MDR and DataCentre220 was taken offline due to an unrelated layer 1 fault. This occurs from time to time and is normally neither an issue nor communicated to peers, as we have secondary paths ready to take over in the event of a failure. OTDR tests were conducted and repair work was underway, which took some time. When disruptive works were required at 10:42am (NZST), we shut this path down, and our secondary RSVP-TE LSP took over as expected, shifting that path's traffic between MDR and DataCentre220 via VDC Albany.

A short time later, some peers reported reachability issues between the two sites, with bilateral sessions dropping and ARP failing. At this time the AKL-IX layer 2 fabric was checked and no immediate issue stood out. We could not replicate the issues experienced, and the signalled RSVP-TE LSP was performing as expected. We asked specific peers for further detail to help identify the issue. With the switch control planes reporting everything in order but these peers clearly still having issues, the failed-over LSP was removed to force traffic over the remaining site-to-site path, to no avail. Thankfully, the original layer 1 fault was resolved, and the path and its accompanying LSPs were brought back up at ~2:03pm (NZST), which resolved the issue.

So far, this has been isolated to a one-way MAC learning event on our switches between pe3.akl1 (DataCentre220) and pe2.akl3 (MDR), whereby ARP requests originating from MDR toward pe3.akl1 at DataCentre220 were being dropped. Apologies for the inconvenience caused by this outage; we're working hard to ensure this won't occur again.
Posted Apr 03, 2023 - 19:17 NZST
Update
A fix has been implemented as of 2:03pm (NZST). Impacts were seen between peers on pe2.akl3 at MDR and pe3.akl1 at DataCentre220; the root cause is still under analysis.
Posted Apr 03, 2023 - 14:28 NZST
Investigating
We are currently investigating reports of partial breaks in layer 2 communications to peers between MDR and DataCentre220.
Posted Apr 03, 2023 - 12:41 NZST
This incident affected: NZIX (AKL-IX).