Monday, May 23, 2011

To forward, to peer, or to tunnel?

In an imaginary Cisco world every device would be able to talk with every other device in various layers. In the actual Cisco world, some devices can talk to some devices, while they can't talk to some other devices.

I'm talking specifically about L2 Control Protocols (L2CPs), when they need to be exchanged between different devices in order to support a requirement (e.g. having spanning tree detect and block a loop). Cisco's L2 Protocol Tunneling (L2PT) can help in some of these cases.

So let's suppose you have a scenario like the following.

When using the simplest form of devices (L2 switches like 3750), you can just tunnel the L2CPs between devices S1 and S2 and everything will be fine. Spanning tree running on devices C1-C4 will see a loop and will block a port depending on various parameters (priority, cost, etc).

As you move ahead and start to replace the S1 or S2 device with another (usually better), you realize that the new device supports a different way of handling L2CPs, which might be "incompatible" with the old way.

Generally, you can do the following actions on L2CPs as they enter a port:

forward: frame is forwarded to another device without any change (no local processing takes place)
drop: frame is dropped
peer: frame is processed/terminated locally
tunnel: frame is tunneled to another device after changing the destination mac address (L2PT)

Tunneling is quite common in scenarios like the above, where you need to pass the L2 frame across an L2 domain without having the intermediate devices act upon it.

You can also achieve the same result with forwarding, as long as you don't have a native L2 domain in between; otherwise you might end up mixing local protocols with protocols that just pass over.

It's obvious that you cannot have tunneling on one side and forwarding on the other side, because the exchanged frames won't be able to "talk" to each other. For example, for STP one side will tunnel the frame by changing the destination mac address from 01-00-0c-cc-cc-cd (or 01-80-c2-00-00-00) to 01-00-0c-cd-cd-d0, while the other side will just forward the frame, keeping the original destination address of 01-00-0c-cc-cc-cd (or 01-80-c2-00-00-00).
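A minimal Python sketch can make this mismatch concrete. It is only a toy model: the function names and action strings are hypothetical, and only the destination mac address is modeled (the addresses are the ones mentioned above).

```python
# Toy model of what an ingress port does with an L2CP frame.
# Only the destination MAC is modeled; all names are illustrative.

STP_DMACS = {"01-00-0c-cc-cc-cd", "01-80-c2-00-00-00"}  # original STP destinations
L2PT_DMAC = "01-00-0c-cd-cd-d0"                          # L2PT tunnel destination


def handle_l2cp(dst_mac, action):
    """Return the destination MAC that leaves the device, or None if
    the frame goes no further (dropped or terminated locally)."""
    if action == "forward":
        return dst_mac          # passed on unchanged
    if action == "tunnel":
        return L2PT_DMAC        # destination rewritten for L2PT
    if action in ("drop", "peer"):
        return None             # dropped, or processed locally
    raise ValueError(f"unknown action: {action}")


def far_end_understands(sent_dst, far_end_action):
    """A tunneling far end expects the L2PT address; a forwarding
    far end expects an original STP address (simplifying assumption)."""
    if far_end_action == "tunnel":
        return sent_dst == L2PT_DMAC
    if far_end_action == "forward":
        return sent_dst in STP_DMACS
    return False
```

With this model, `far_end_understands(handle_l2cp(bpdu, "tunnel"), "forward")` is false for any STP frame, which is exactly the tunnel-versus-forward mismatch described above.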

Below you'll find a list with all available options regarding the handling of L2CPs on some known platforms:

Device    | Interface           | forward                | drop            | peer            | tunnel
3750      | L2 switchport       | -                      | -               | -               | l2protocol-tunnel
ME-3400   | L2 switchport       | -                      | -               | -               | l2protocol-tunnel
ME-3800X  | L2 switchport       | -                      | l2protocol drop | l2protocol peer | -
ME-3800X  | L2 service instance | l2protocol forward (1) | -               | l2protocol peer | l2protocol tunnel
7600/67xx | L2 switchport       | -                      | -               | -               | l2protocol-tunnel
7600/ES   | L2 switchport       | -                      | -               | -               | l2protocol-tunnel
7600/ES   | L3                  | -                      | l2protocol drop | l2protocol peer | -
7600/ES   | L3 service instance | l2protocol forward     | -               | -               | -
ASR9000   | L2 transport        | by default (2)         | -               | -               | l2protocol tunnel

As you can see, you cannot have L2 communication between a service instance on a 7600/ES and one of the smaller platforms, because the 7600/ES doesn't support tunneling on service instances and the smaller platforms do not support forwarding. Actually, the biggest surprise to me was the lack of L2PT support on the 7600 with the ES cards when using service instances. I had the impression that this would be the most feature-rich platform.
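The same argument can be sketched in a few lines of Python. The dictionary below is a transcription of the list above and rests on one assumption: that "by default" for the ASR9000 belongs to the forward column. `TRANSPORT` and `common_modes` are made-up names for illustration.

```python
# Ways each (device, interface) can carry an L2CP across, per the list above.
# Only "forward" and "tunnel" transport the frame; "drop" and "peer" do not,
# so they are left out here.
TRANSPORT = {
    ("3750", "L2 switchport"): {"tunnel"},
    ("ME-3400", "L2 switchport"): {"tunnel"},
    ("ME-3800X", "L2 switchport"): set(),
    ("ME-3800X", "L2 service instance"): {"forward", "tunnel"},  # forward: note 1
    ("7600/67xx", "L2 switchport"): {"tunnel"},
    ("7600/ES", "L2 switchport"): {"tunnel"},
    ("7600/ES", "L3"): set(),
    ("7600/ES", "L3 service instance"): {"forward"},
    ("ASR9000", "L2 transport"): {"forward", "tunnel"},  # forward assumed default, note 2
}


def common_modes(a, b):
    """Transport modes that both ends support; empty means no L2CP exchange."""
    return TRANSPORT[a] & TRANSPORT[b]
```

For example, `common_modes(("7600/ES", "L3 service instance"), ("3750", "L2 switchport"))` is empty, while an ME-3800X service instance and a 3750 still share `tunnel`.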

Cisco's proposal is to use the same platform for such scenarios, because they haven't verified anything else and some platforms were built to be used in specific ways. So instead of supporting the same feature (L2PT was their idea after all) along the range of platforms, you should always replace them in pairs. And if by accident, you happen to have more S devices serving many overlapping rings, then you have to replace all of them.

I would prefer, instead of promoting new platforms or new designs, to focus on fixing the existing platforms, so they can cooperate with each other. After all, if a platform is good enough, it will get its share in the market.

Also, the online documentation is quite incomplete in this area. You have to guess what will happen in most cases. We had to open 3 different cases and involve our account team in order to clarify things and push for fixing the documentation. Not surprisingly, the peering functionality is another mess. I'll probably need to write another post describing all available options (which lead to different behavior) on these platforms.


1) "l2protocol forward" on ME-3800X will become available in the next major release. Thanks to Cisco for the chance to try it earlier.
2) This is the default behavior according to Xander's doc here.
3) Arie asked me to add some extra information about PW/MST/REP/PVST-AG (and all these L2 HA) scenarios. I'll try to write a new post as soon as I find enough free time to test them.

Thursday, May 5, 2011

How Multi is MP-BGP in IOS-XR?

This caught me by surprise. I had the impression that IOS-XR, as an advanced operating system, would support all kinds of multi-protocol transport over BGP.

It seems there is an issue when carrying IPv6 prefixes over an IPv4 peering or IPv4 prefixes over an IPv6 peering. This definitely happens on the ASR9k running the latest release, 4.1.0, but I haven't verified it on the CRS yet.

IPv4 prefixes over IPv6 peering
This doesn't seem to be supported based on the available configuration options.
What is even more worrying is that no other address family is supported either.

RP/0/RSP0/CPU0:ASR#conf t
RP/0/RSP0/CPU0:ASR(config)#router bgp 100
RP/0/RSP0/CPU0:ASR(config-bgp)#neighbor 2001::1:2:3
RP/0/RSP0/CPU0:ASR(config-bgp-nbr)#address-family ?
  ipv6  IPv6 Address Family

IPv6 prefixes over IPv4 peering
This is supported according to the configuration options, but it doesn't work.
Cisco also insists that this is definitely supported.

RP/0/RSP0/CPU0:ASR#conf t
RP/0/RSP0/CPU0:ASR(config)#router bgp 100
RP/0/RSP0/CPU0:ASR(config-bgp-nbr)#address-family ?
  ipv4   IPv4 Address Family
  ipv6   IPv6 Address Family
  l2vpn  L2VPN Address Family
  vpnv4  VPNv4 Address Family
  vpnv6  VPNv6 Address Family

As soon as you enable the IPv6 address family under the IPv4 neighbor, the BGP session is dropped and it never comes up.

RP/0/RSP0/CPU0:ASR#sh bgp sum
BGP router identifier, local AS number 100
BGP generic scan interval 60 secs
BGP table state: Active
Table ID: 0xe0000000   RD version: 1
BGP main routing table version 1
BGP scan interval 60 secs

BGP is operating in STANDALONE mode.

Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker               1          1          1          1           1           1

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
                  0  100        0       0        0    0    0 00:00:00 Idle

Also, debug shows that BGP makes no attempt to establish a session. It's as if BGP gets disabled.

The only doc that mentions such a limitation (for IOS-XR 3.3 on the CRS) says the following:

A given address family is only supported with a neighbor whose address is from that address family. For instance, IPv4 neighbors support IPv4 unicast and multicast address families, and IPv6 neighbors support IPv6 unicast and multicast address families. However, you cannot exchange IPv6 routing information with an IPv4 neighbor and vice versa.

I searched all of CCO for more information, but I didn't manage to find anything useful. Does anyone have extra information to share? TAC is struggling (as usual) to find an answer...

Update #1
Cisco verified once more that this is a supported configuration. Arie Vayner (and later TAC) proposed adding an IPv6 address to the interface being used as the IPv4 next-hop. Indeed, this solved the problem and the BGP session came up. But then it became even more interesting...

Two IPv6 prefixes are learned from the IPv4 neighbor. Next-hop is an IPv6 address.

RP/0/RSP0/CPU0:ASR#sh bgp ipv6 uni 
BGP router identifier, local AS number 100
BGP generic scan interval 60 secs
BGP table state: Active
Table ID: 0xe0800000   RD version: 5
BGP main routing table version 5
BGP scan interval 60 secs

Status codes: s suppressed, d damped, h history, * valid, > best
i - internal, r RIB-failure, S stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network            Next Hop            Metric LocPrf Weight Path
* i2001::1:2:3/128    2003::1:2:3              0    100      0 ?
* i2003::/64          2003::1:2:3              0    100      0 ?

Processed 2 prefixes, 2 paths

If I remove the IPv6 address from the interface that is being used as next-hop (the one I added before), then I automatically get an IPv6 prefix with an IPv4 next-hop!!!

RP/0/RSP0/CPU0:core-distr-kln-02#sh bgp ipv6 uni 
BGP router identifier, local AS number 100
BGP generic scan interval 60 secs
BGP table state: Active
Table ID: 0xe0800000   RD version: 6
BGP main routing table version 6
BGP scan interval 60 secs

Status codes: s suppressed, d damped, h history, * valid, > best
i - internal, r RIB-failure, S stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network            Next Hop            Metric LocPrf Weight Path
*>i2001::1:2:3/128             0    100      0 ?

Processed 1 prefixes, 1 paths

The BGP session stays up until something happens that resets it. Then it stays down forever, as it did in the beginning.

I must say that I cannot endorse such an implementation. Using exactly the same configuration, you get different results, depending on the order of (un)configuring things. Also, I cannot understand why the establishment of an IPv4 BGP session that is going to negotiate IPv4/IPv6 address-family capabilities should depend on whether an IPv6 next-hop exists or not. That should be left for the NLRI exchange routine.

After all, RFC 4271 defines among others two error conditions for the NEXT_HOP attribute:

If the NEXT_HOP attribute field is syntactically incorrect, then the Error Subcode MUST be set to Invalid NEXT_HOP Attribute. The Data field MUST contain the incorrect attribute (type, length, and value). Syntactic correctness means that the NEXT_HOP attribute represents a valid IP host address.

If the NEXT_HOP attribute is semantically incorrect, the error SHOULD be logged, and the route SHOULD be ignored. In this case, a NOTIFICATION message SHOULD NOT be sent, and the connection SHOULD NOT be closed.
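A rough Python sketch of how these two error conditions differ in their outcome. It covers an IPv4 NEXT_HOP only, `check_next_hop` is a hypothetical name, and two things are assumptions: that "valid IP host address" means 4 octets that are neither unspecified, multicast nor broadcast, and that the only semantic rule modeled is that the next hop must not be the receiving speaker's own address.

```python
import ipaddress

BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def check_next_hop(raw, local_addresses):
    """Return 'notify', 'ignore' or 'accept' for an IPv4 NEXT_HOP value.

    raw             -- the attribute value as bytes
    local_addresses -- set of IPv4Address objects owned by the receiver
    """
    # Syntactic check (assumed interpretation of "valid IP host address").
    if len(raw) != 4:
        return "notify"      # NOTIFICATION, Invalid NEXT_HOP Attribute
    next_hop = ipaddress.IPv4Address(raw)
    if next_hop.is_unspecified or next_hop.is_multicast or next_hop == BROADCAST:
        return "notify"
    # Semantic check: log and ignore the route, keep the session up.
    if next_hop in local_addresses:
        return "ignore"
    return "accept"
```

The point of the quoted text is visible in the return values: only a syntactic problem touches the session, while a semantic problem costs you just the route.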

Update #2
After the developers got involved, we ended up with the following:

  1. In IOS-XR you need an IPv6 NH in order to activate the IPv6 AF for an IPv4 BGP session.
  2. If you don't have an IPv6 NH, then the IPv4 BGP session won't even come up.
  3. The above was done to protect against misconfiguration, because otherwise you would get a misleading v4 mapped v6 address as NH.
  4. If you have an IPv6 NH, then the IPv4 BGP session with the IPv6 AF will come up.
  5. If afterwards you remove the IPv6 NH, then the session deliberately remains up and you get a misleading v4 mapped v6 address as NH.
Cisco agreed (thx Xander) that the behaviors in 3 and 5 contradict each other, so a short-term solution (update the documentation and print a warning message) got recorded in CSCtq26829.
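The order dependence in points 1-5 can be shown with a toy state model in Python. This is only a sketch of the described behavior, not how IOS-XR is implemented; the class name, method names and the 192.0.2.1 address used in the usage note are all made up.

```python
import ipaddress


class V4SessionWithV6AF:
    """Toy model of an IPv4 BGP session with the IPv6 AF enabled."""

    def __init__(self, v4_nh, v6_nh=None):
        self.v4_nh = ipaddress.IPv4Address(v4_nh)
        self.v6_nh = ipaddress.IPv6Address(v6_nh) if v6_nh else None
        self.up = False

    def try_establish(self):
        # Points 1, 2, 4: the session only comes up if an IPv6 NH exists.
        self.up = self.v6_nh is not None
        return self.up

    def remove_v6_nh(self):
        # Point 5: an already established session deliberately stays up.
        self.v6_nh = None

    def reset(self):
        # After a reset the session has to be established from scratch.
        self.up = False
        return self.try_establish()

    def next_hop(self):
        if not self.up:
            return None
        if self.v6_nh is not None:
            return self.v6_nh
        # Point 5: fall back to a v4-mapped v6 address (::ffff:a.b.c.d).
        return ipaddress.IPv6Address(f"::ffff:{self.v4_nh}")
```

Starting with `V4SessionWithV6AF("192.0.2.1", "2003::1:2:3")`, calling `try_establish()`, then `remove_v6_nh()`, leaves the session up with a v4-mapped next hop. The identical end configuration created without the IPv6 NH never comes up, and neither does the first one after a `reset()`.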

Wednesday, May 4, 2011

BRAS/Server initiated renewal for DHCPv6-PD leases - When?

One major issue when dealing with IPv6 CPEs is the currently missing capability to renew automatically the IPv6 addresses on the CPE's LAN after a disconnect/reconnect of the subscriber's dynamic session.

Although there are some tricks (#1, #2) for client (subscriber) initiated renewal, not all CPE vendors support those tricks. Also many times it is preferable to have the BRAS/BNG, or generally the ISPs, control this renewal, since all the AAA (and BSS/OSS) systems are usually managed by them.

The DHCPv6 "Reconfigure" message was made to help in the above case. According to RFC 3315:

RECONFIGURE (10) A server sends a Reconfigure message to a client to inform the client that the server has new or updated configuration parameters, and that the client is to initiate a Renew/Reply or Information-request/Reply transaction with the server in order to receive the updated information.
The client includes a Reconfigure Accept option if the client is willing to accept Reconfigure messages from the server.

It's obvious that without this support, a client must wait until it renews its lease to get configuration updates, which might take anywhere from a few hours to many days. Btw, shouldn't the change of the WAN interface state on the CPE automatically cause the renewal of the delegated prefix on its LAN???

Also, according to the recently approved informational RFC 6204, the support of the DHCPv6 Reconfigure option is a MUST for IPv6 CPEs.

WAA-4: The IPv6 CE router MUST be able to support the following DHCPv6 options: IA_NA, Reconfigure Accept and DNS_SERVERS.

Now, someone malicious might translate the above "MUST be able to support" phrase into "ok, it's not actually required to support it now, but you must be able to support it in the future". It definitely would be better to have it as "MUST support".

A recent "IPv6 CE Router Interoperability Whitepaper" from UNH-IOL shows that none of the CPEs that were tested supported this option.

The last issue discovered during the testing was IPv6 CE router lack of support for DHCP Reconfigure. According to draft-ietf-v6ops-ipv6-cpe-router-09, “WAA-4: The IPv6 CE router MUST be able to support the following DHCPv6 options: IA_NA, Reconfigure Accept [RFC3315], DNS_SERVERS [RFC3646].” Therefore the IPv6 CE routers should have included the Reconfigure Accept in DHCPv6 Request or Solicit messages.
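Checking whether a CPE really includes the option is easy to do on a captured Solicit or Request. Here is a minimal Python sketch, assuming you already have the raw DHCPv6 payload (UDP data) as bytes; the function names are made up, and the option code 20 is the Reconfigure Accept option from RFC 3315.

```python
import struct

OPTION_RECONF_ACCEPT = 20  # Reconfigure Accept option (RFC 3315)


def dhcpv6_option_codes(message):
    """Return the top-level option codes of a DHCPv6 client/server message."""
    codes = []
    offset = 4  # skip msg-type (1 octet) and transaction-id (3 octets)
    while offset + 4 <= len(message):
        # Each option is: code (2 octets), length (2 octets), data.
        code, length = struct.unpack_from("!HH", message, offset)
        offset += 4 + length
        if offset > len(message):
            raise ValueError("truncated option")
        codes.append(code)
    return codes


def accepts_reconfigure(message):
    """True if the client signalled it will accept Reconfigure messages."""
    return OPTION_RECONF_ACCEPT in dhcpv6_option_codes(message)
```

Options nested inside other options (e.g. inside an IA_PD) are not walked, which is fine here because Reconfigure Accept is a top-level option.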

It gets a little bit more complicated, if you check what RFC 3633 says about the "Reconfigure" message when it is used for Prefix Delegation:

13.1. Delegating Router behavior

The delegating router initiates a configuration message exchange with a requesting router, as described in section 19, "DHCP Server-Initiated Configuration Exchange" of RFC 3315, by sending a Reconfigure message (acting as a DHCP server) to the requesting router, as described in section 19.1, "Server Behavior" of RFC 3315. The delegating router specifies the IA_PD option in the Option Request option to cause the requesting router to include an IA_PD option to obtain new information about delegated prefix(es).

13.2. Requesting Router behavior

The requesting router responds to a Reconfigure message, acting as a DHCP client, received from a delegating router as described in section 19.4, "Client Behavior" of RFC 3315. The requesting router MUST include the IA_PD Prefix option(s) (in an IA_PD option) for prefix(es) that have been delegated to the requesting router by the delegating router from which the Reconfigure message was received.
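To show what the delegating router would actually send, here is a hedged Python sketch that builds a bare Reconfigure message asking for a Renew of the IA_PD. The numeric codes are the ones defined in RFC 3315 and RFC 3633; the helper names are hypothetical. Note that RFC 3315 also requires an Authentication option in a Reconfigure, which is left out here to keep the sketch short, so this is not a message a conforming client would accept as-is.

```python
import struct

RECONFIGURE = 10        # DHCPv6 message type
RENEW = 5               # message type the client is asked to respond with
OPTION_CLIENTID = 1
OPTION_SERVERID = 2
OPTION_ORO = 6          # Option Request option
OPTION_RECONF_MSG = 19  # Reconfigure Message option
OPTION_IA_PD = 25       # RFC 3633


def option(code, data):
    """Encode one DHCPv6 option: code, length, data."""
    return struct.pack("!HH", code, len(data)) + data


def build_reconfigure(server_duid, client_duid):
    """Build a Reconfigure asking for a Renew that includes IA_PD.
    The mandatory Authentication option is deliberately omitted."""
    header = bytes([RECONFIGURE, 0, 0, 0])  # transaction-id is 0 in a Reconfigure
    return (
        header
        + option(OPTION_SERVERID, server_duid)
        + option(OPTION_CLIENTID, client_duid)
        + option(OPTION_RECONF_MSG, bytes([RENEW]))
        + option(OPTION_ORO, struct.pack("!H", OPTION_IA_PD))
    )
```

The IA_PD code inside the Option Request option is the part RFC 3633 adds on top of the plain RFC 3315 Reconfigure.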

So, if someone claims support of the "Reconfigure" option, what does that refer to? DHCPv6 or DHCPv6-PD? What about Relay?

On the server side, the Juniper MX series already supports it (it's called "dynamic reconfiguration for DHCPv6"), but the Cisco ASR1k doesn't. Cisco CNR 7.x also supports it, and so does (or will) Dibbler 0.8.0. The ISC DHCPv6 server and Windows Server 2008 probably don't.


  • Our experience with IPv6 CPEs until now is disappointing on this matter. Although we have feedback from various CPE vendors that they will support it, none of them actually supports it now.
  • Wouldn't it be interesting to have the "Reconfigure" message sent by the BRAS/BNG DHCPv6 server to the client when the router receives a RADIUS CoA (RFC 3576) packet for this specific subscriber?

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License.