Friday, June 3, 2011

Debugging IPv6 MTU issues in Windows

A common problem you might face soon (World IPv6 Day is 5 days away) is that some IPv6 sites are unreachable because of MTU issues. ICMPv6 has a nice built-in mechanism which is supposed to help hosts overcome these issues, but as in the IPv4 world, not everything is perfect.

Let's suppose that an IPv6 subscriber is using a DSL router and is connected through PPPoE to a BRAS.

TARGET <=> BRAS <=> DSL-ROUTER <=> HOST

The usual MTU for PPPoE connections is 1492 bytes, as shown below.

1500 bytes = Ethernet Payload
-     6 bytes = PPPoE header
-     2 bytes = PPP ID
---------------------------------
1492 bytes = IPv6 Packet that can be carried over a PPPoE connection

If your host is configured with an MTU of 1492 (or something lower) on its LAN interface, then the OS running on it will automatically take care of "fragmentation", so you don't need to worry about anything. Unfortunately, this isn't the default in most setups. You either have to configure it manually on the host or, if you are lucky and the DSL router advertises the MTU on its LAN interface through RA messages (and your host accepts them), it will happen automatically.

If your host is configured with anything larger than 1492 on its LAN interface (in most cases it's the default of 1500), problems might arise.

Users with hosts running Windows can try to ping an IPv6 address (e.g. the next hop after the DSL router) in order to find possible issues with the MTU. The closer the target is, the easier it will be to troubleshoot the problem. You then move hop by hop towards the final target until you hit the issue.

First, some numbers you will need regarding the various headers:

1492 bytes = IPv6 Packet
-  40 bytes = IPv6 Header
-   8 bytes = ICMPv6 Header
-------------------------------
1444 bytes = ICMPv6 payload data

Since Windows ping takes the size of the ICMPv6 payload data (not of the whole packet) as its argument, if you want to send a total of 1492 bytes, you have to send 1492-40-8=1444 bytes of ICMPv6 payload data. Anything larger will lead either to a problem or to fragmentation.
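The same arithmetic can be written down as a tiny helper, in case you want to try other MTU values (e.g. a tunnel). This is just a sketch of the subtraction above; the function name and constants are my own and not part of any tool.

```python
IPV6_HEADER = 40    # fixed IPv6 header, in bytes
ICMPV6_HEADER = 8   # ICMPv6 echo request header, in bytes


def ping_payload_for_mtu(mtu):
    """Largest value for `ping -l` that still fits in one unfragmented IPv6 packet."""
    if mtu < 1280:
        # IPv6 requires every link to support an MTU of at least 1280 bytes
        raise ValueError("IPv6 MTU cannot be lower than 1280 bytes")
    return mtu - IPV6_HEADER - ICMPV6_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1492, 1280):
        print(mtu, "->", ping_payload_for_mtu(mtu))
```

For the PPPoE case above, ping_payload_for_mtu(1492) gives the 1444 bytes used in the ping command that follows.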

Windows>ping -l 1444 x:x::x

Pinging target [x:x:xx] with 1444 bytes of data:

Reply from x:x:xx: time=53ms
Reply from x:x:xx: time=51ms
Reply from x:x:xx: time=54ms
Reply from x:x:xx: time=53ms

Ping statistics for x:x:xx:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 51ms, Maximum = 54ms, Average = 52ms

These are the relevant Wireshark captures.

The ICMP conversation between all involved devices


1444 bytes ICMP request from HOST to TARGET


If you increase the above number, you should start looking for incoming ICMPv6 "Too Big" messages from any hop towards the target. If none arrive, you are in trouble.

For example, if you ping with 1446 bytes of data, you get the following:

Windows>ping -l 1446 x:x:xx

Pinging target [x:x:xx] with 1446 bytes of data:

Packet needs to be fragmented but DF set.
Reply from x:x:xx: time=53ms
Reply from x:x:xx: time=55ms
Reply from x:x:xx: time=57ms

Ping statistics for x:x:xx:
    Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),
Approximate round trip times in milli-seconds:
    Minimum = 53ms, Maximum = 57ms, Average = 55ms

These are the relevant Wireshark captures.

The ICMP conversation between all involved devices (fragmentation included)

1446 bytes ICMP request from HOST to TARGET
ICMP reply ("Too big") from DSL-ROUTER to HOST (original truncated message included)

As you can see, DSL-ROUTER answers the first packet with a "Too Big" message to the HOST and informs it about the MTU (1492) supported on the next-hop link (see RFC 4443 for ICMPv6 info); that's the WAN link towards the BRAS, on which PPPoE is running.
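If you want to pull the advertised MTU out of such a message yourself (e.g. from bytes exported from a capture), the layout from RFC 4443 is simple: type (2), code (0), checksum, then a 32-bit MTU, followed by as much of the original packet as fits. Below is a minimal sketch, assuming the input starts at the ICMPv6 header (i.e. after the 40-byte IPv6 header); the function name is hypothetical and the checksum is not verified.

```python
import struct

ICMPV6_PACKET_TOO_BIG = 2  # ICMPv6 type of the "Too Big" message (RFC 4443)


def parse_packet_too_big(icmpv6):
    """Return the MTU advertised in an ICMPv6 Packet Too Big message.

    `icmpv6` must start at the ICMPv6 header. The checksum is ignored.
    """
    if len(icmpv6) < 8:
        raise ValueError("truncated ICMPv6 header")
    # type (1 byte), code (1 byte), checksum (2 bytes), MTU (4 bytes), network byte order
    msg_type, code, checksum, mtu = struct.unpack("!BBHI", icmpv6[:8])
    if msg_type != ICMPV6_PACKET_TOO_BIG:
        raise ValueError("not a Packet Too Big message (type %d)" % msg_type)
    return mtu


if __name__ == "__main__":
    # Hand-built sample header: type 2, code 0, dummy checksum, MTU 1492
    sample = struct.pack("!BBHI", 2, 0, 0, 1492)
    print(parse_packet_too_big(sample))
```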

If you are in the unfortunate position of not getting any incoming packets at all (neither replies nor "Too Big" messages), you can safely assume (if everything else is fine) that someone in the path is blocking ICMPv6 messages.

The reply message is exactly 1280 bytes, which is the minimum MTU every IPv6 link must support and also the maximum size allowed for an ICMPv6 error message. This leads to the original message being truncated in the reply message: 1280-40=1240 bytes are left for the ICMPv6 packet, of which 1240-8-40-8=1184 bytes are the actual payload data. So you lose 1446-1184=262 bytes of payload data in the reply message.
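The truncation arithmetic above, spelled out step by step. Again just a sketch with names of my own choosing; it assumes the original packet is a plain echo request with no extension headers.

```python
IPV6_MIN_MTU = 1280  # an ICMPv6 error message must not exceed this size
IPV6_HEADER = 40     # fixed IPv6 header, in bytes
ICMPV6_HEADER = 8    # ICMPv6 header (error message or echo request), in bytes


def quoted_echo_payload(ping_payload):
    """Bytes of the original echo payload that survive inside a "Too Big" message."""
    # Room left for the quoted original packet after the error's own headers
    room = IPV6_MIN_MTU - IPV6_HEADER - ICMPV6_HEADER   # 1232
    # The quoted packet starts with its own IPv6 and ICMPv6 echo headers
    room_for_data = room - IPV6_HEADER - ICMPV6_HEADER  # 1184
    return min(ping_payload, room_for_data)


if __name__ == "__main__":
    kept = quoted_echo_payload(1446)
    print("kept:", kept, "lost:", 1446 - kept)
```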

The next packets get a successful answer from the target, because they are sent fragmented (1432+14 bytes).

1432 bytes ICMP request from HOST to TARGET

14 bytes ICMP request from HOST to TARGET
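Where do 1432 and 14 come from? Each fragment carries an extra 8-byte Fragment header, and every fragment except the last must carry a multiple of 8 bytes. The sketch below reproduces the split under the assumption that the sender fills each fragment as much as the MTU allows (other stacks may split differently); the function name is hypothetical.

```python
IPV6_HEADER = 40      # fixed IPv6 header, in bytes
FRAGMENT_HEADER = 8   # IPv6 Fragment extension header, in bytes
ICMPV6_HEADER = 8     # ICMPv6 echo request header, in bytes


def echo_fragments(ping_payload, mtu):
    """Echo payload bytes carried by each fragment of an ICMPv6 echo request."""
    total = ICMPV6_HEADER + ping_payload  # the fragmentable part of the packet
    if IPV6_HEADER + total <= mtu:
        return [ping_payload]             # fits in one packet, no fragmentation
    # Non-final fragments must carry a multiple of 8 bytes
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    sizes = []
    offset = 0
    while offset < total:
        chunk = min(per_fragment, total - offset)
        sizes.append(chunk)
        offset += chunk
    sizes[0] -= ICMPV6_HEADER             # the first fragment also holds the ICMPv6 header
    return sizes


if __name__ == "__main__":
    print(echo_fragments(1446, 1492))
```

With a payload of 1446 bytes and an MTU of 1492, each non-final fragment can carry (1492-40-8) rounded down to a multiple of 8, i.e. 1440 bytes. The first fragment therefore holds the 8-byte ICMPv6 header plus 1432 bytes of data, and the remaining 14 bytes go into the second one.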

Windows is "smart" enough to keep track of this status for some minutes (in the so-called destination or route cache), so the next time you send large packets, the first packet is not lost, because fragmentation happens right away.

Windows>ping -l 1446 x:x:xx

Pinging target [x:x:xx] with 1446 bytes of data:

Reply from x:x:xx: time=54ms
Reply from x:x:xx: time=53ms
Reply from x:x:xx: time=55ms
Reply from x:x:xx: time=52ms

Ping statistics for x:x:xx:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 52ms, Maximum = 55ms, Average = 53ms




Imho, it's better to make your host use the appropriate MTU from the beginning (i.e. hardcode 1492 or use the RA's value) and not depend on ICMPv6 messages to do fragmentation. Some people (Geoff Huston, Tore Anderson) have proposed always using the minimum of 1280, in order to be safe in every possible case (e.g. when tunnels are involved). I generally prefer to use the maximum possible, hoping that no one in the middle will mess with ICMPv6 messages. I know that currently this is not the case (so stick with something lower, like 1280, for now), but this will probably change as native IPv6 gets deployed. Unless we start filtering ICMPv6 messages uncontrollably...like many do on IPv4. Does "Internet Control Message Protocol" say anything to you?

Notes

1) RFC 1981 describes Path MTU Discovery (PMTUD) for IPv6.
2) RFC 4821 will help a lot in PMTUD, when and if all vendors start implementing it.
3) In order to see the fragmented IPv6 packets clearly in Wireshark, you have to disable reassembly in the preferences.
4) You can use the commands "ipv6 rc" and "ipv6 rcf" to view and clear the destination/route cache in Windows XP.

 
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License.