DCI using VXLAN with MP-BGP EVPN and Ingress Replication on a Nexus 9K

I have been messing around with my new Nexus 9000vs and wanted to have a crack at setting up VXLAN using MP-BGP EVPN as the control plane. There is a lot of literature available on the topic; however, the topology I had in mind didn’t seem to be covered in any detail. I wanted to use it as a simple two-site DCI solution to stretch some VLANs, with just a pair of vPC-connected leafs in each site. I didn’t want to use multicast for BUM (broadcast, unknown unicast and multicast) traffic, and I wanted to keep the BGP configuration as simple as possible. After a lot of reading and piecing together of various configuration snippets, this is what I came up with. It is based on two sites connected by a pair of port-based point-to-point Ethernet services and uses ingress replication instead of multicast to handle BUM traffic. This type of use case is pretty common in the small-to-medium enterprise world. There is a good guide available here which gives an overview of VXLAN BGP EVPN.

This post assumes you already have a pair of Nexus 9Ks configured with vPC in each site.
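
If you need a reference point, a very stripped-back vPC skeleton looks something along these lines. The domain ID, keepalive addressing and port-channel number are placeholders only; your existing vPC configuration can stay exactly as it is.

! example values only
feature vpc
feature lacp

vpc domain 10
 peer-keepalive destination 10.255.255.2 source 10.255.255.1 vrf management

interface port-channel1
 description vPC peer-link
 switchport mode trunk
 vpc peer-link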

The first step is to enable all of the required features on the switches with the following commands:

feature ospf
feature bgp
feature udld
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn
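
A quick way to confirm everything took effect is to check the running config; the feature lines and the nv overlay evpn knob should all be present.

show run | include feature
show run | include overlay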

 

Now we need to set up the L3 underlay to provide basic connectivity and IGP routing between all of the switches. My protocol of choice is OSPF; however, IS-IS and other IGPs will work fine.

Start by configuring a loopback interface with a unique IP for each switch.

interface loopback0
ip address 172.16.1.1/32
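
For reference, the loopback0 addressing used through the rest of this post works out roughly as below. The 172.16.2.2, 172.16.3.3 and 172.16.4.4 addresses reappear later as the BGP neighbors; which switch gets which address is just my assumption, use whatever unique /32s you have free.

SW-NEXUS01  loopback0  172.16.1.1/32
SW-NEXUS02  loopback0  172.16.2.2/32
SW-NEXUS03  loopback0  172.16.3.3/32
SW-NEXUS04  loopback0  172.16.4.4/32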

 

Next we configure OSPF using the loopback address as the router-id.

router ospf L3Core
 router-id 172.16.1.1
 log-adjacency-changes
 passive-interface default

 

And then we add the loopback interface to the OSPF process.

interface loopback0 
ip address 172.16.1.1/32
ip router ospf L3Core area 0.0.0.0

 

Now we need to configure L3 interfaces on our switches to terminate the inter-site port services and the connections between the switches within each DC. I used additional dedicated L3 links between the switches within each site. You could also use an SVI over the existing vPC peer-link to connect them; you just need to make sure you allow the VLAN across the peer-link and do NOT configure that VLAN on any other ports (i.e. the VLAN must be dedicated to the L3 connection only). A sketch of that option follows the interface example below. Notice the MTU is set to 9216; it is best to set this as high as the underlying network allows, as the VXLAN encapsulation adds extra bytes to each packet.

interface Ethernet1/3
 description L3 to SW-NEXUS03
 no switchport
 mtu 9216
 ip address 10.10.0.97/30
 ip ospf dead-interval 2
 ip ospf hello-interval 1
 ip ospf network point-to-point
 no ip ospf passive-interface
 ip router ospf L3Core area 0.0.0.0
 no shutdown
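
If you go down the SVI-over-peer-link path instead of a dedicated link, a rough sketch is below. VLAN 999 and the addressing are placeholders; the VLAN must be allowed on the peer-link trunk and used nowhere else, and depending on the platform and release you may also need to declare it as an NVE infra VLAN as shown.

! example values only
system nve infra-vlans 999

vlan 999
 name L3-peer-link-underlay

interface Vlan999
 description L3 underlay over vPC peer-link
 no shutdown
 mtu 9216
 ip address 10.10.0.101/30
 ip ospf network point-to-point
 no ip ospf passive-interface
 ip router ospf L3Core area 0.0.0.0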

 

Once you have finished configuring all of the L3 interconnects, the topology should resemble the diagram below. If you do a show ip route you should see all of the loopback and point-to-point IPs listed.
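
A couple of quick checks worth running at this point: confirm the OSPF adjacencies have formed and that a remote loopback is in the routing table (swap the address for one of your own remote loopbacks).

show ip ospf neighbors
show ip route 172.16.4.4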

 

That’s the underlay sorted; now it is time for the overlay.

We start with an additional loopback interface on each switch. As we are using vPC, this interface will have two IP addresses: the first is unique to each switch, and the second should be the same on both switches in a vPC pair and will be used as the VTEP (VXLAN tunnel endpoint) address.

interface loopback1
 ip address 192.168.1.1/32
 ip address 192.168.1.10/32 secondary
 ip router ospf L3Core area 0.0.0.0
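
To make the vPC behaviour clearer, here is roughly how the loopback1 addressing works out across the four switches. The exact values are illustrative; the key point is that the secondary (anycast VTEP) address is shared within a vPC pair but different per site, and this shared secondary is the address the other site will later see in show nve peers. Remember each loopback1 also needs ip router ospf L3Core area 0.0.0.0 so the VTEP addresses are reachable over the underlay.

SW-NEXUS01  loopback1  192.168.1.1/32  secondary 192.168.1.10/32
SW-NEXUS02  loopback1  192.168.1.2/32  secondary 192.168.1.10/32
SW-NEXUS03  loopback1  192.168.2.1/32  secondary 192.168.2.10/32
SW-NEXUS04  loopback1  192.168.2.2/32  secondary 192.168.2.10/32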

 

The next step is to map the VLANs that we wish to stretch between the data centres to VXLAN VNIDs (Virtual Network Identifiers):

vlan 100
 name Servers
 vn-segment 1100

vlan 200
 name Management
 vn-segment 1200

vlan 300
 name Backups
 vn-segment 1300
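
On the NX-OS images I have used you can sanity check the VLAN-to-VNI mapping at this point; if show vxlan isn't available on your release, show nve vni will show the same mapping once the NVE interface is configured later on.

show vxlan
show vlan id 100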

 

Now we configure BGP. We will be using the loopback0 address as the router-id and defining each of the other switches as a neighbor, so the config on each switch will be slightly different.

router bgp 100
 router-id 172.16.1.1
 address-family ipv4 unicast
 address-family l2vpn evpn
 neighbor 172.16.2.2
  remote-as 100
  update-source loopback0
  address-family ipv4 unicast
  address-family l2vpn evpn
   send-community
   send-community extended
 neighbor 172.16.3.3
  remote-as 100
  update-source loopback0
  address-family ipv4 unicast
  address-family l2vpn evpn
   send-community
   send-community extended
 neighbor 172.16.4.4
  remote-as 100
  update-source loopback0
  address-family ipv4 unicast
  address-family l2vpn evpn
   send-community
   send-community extended
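
The same router bgp block (with the neighbor addresses adjusted) goes on all four switches. Once that is in place, a couple of checks worth running; each switch should show three established iBGP sessions with the l2vpn evpn address family negotiated.

show bgp sessions
show bgp l2vpn evpn summary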

 

And now the EVPN configuration:

evpn
 vni 1100 l2
   rd auto
   route-target import auto
   route-target export auto
 vni 1200 l2
   rd auto
   route-target import auto
   route-target export auto
 vni 1300 l2
   rd auto
   route-target import auto
   route-target export auto
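
With rd auto and route-target auto, NX-OS derives these values for you; with a 2-byte ASN the auto route-target works out as ASN:VNI, i.e. 100:1100 for the first VNI here. If you ever need to set them manually, for example to interoperate with a platform that doesn't support the auto values, the explicit equivalent looks something like this (shown for VNI 1100 only, values illustrative):

evpn
 vni 1100 l2
  rd 172.16.1.1:100
  route-target import 100:1100
  route-target export 100:1100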

 

The last step is the NVE interface:

interface nve1
 no shutdown
 source-interface loopback1
 host-reachability protocol bgp
 source-interface hold-down-time 30
 member vni 1100
   ingress-replication protocol bgp
 member vni 1200
   ingress-replication protocol bgp
 member vni 1300
   ingress-replication protocol bgp

 

If you run a show nve peers, you should now see the IP of the other site's VTEP.
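
Beyond show nve peers, a few other commands worth running to confirm the overlay is passing traffic; remote MAC addresses learned via EVPN should show up against the remote VTEP.

show nve vni
show bgp l2vpn evpn
show l2route evpn mac all
show mac address-table vlan 100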

 

Hosts connected to one of the stretched VLANs in DC A should now be able to communicate with hosts on the same VLAN in DC B.

Job done!

8 Comments on "DCI using VXLAN with MP-BGP EVPN and Ingress Replication on a Nexus 9K"


  1. Nice Lab!

    I did almost exactly the same lab in GNS3 before finding your page; the only difference is I'm using multicast for BUM. Now I have also tried your config. The only issue I see is that when using iperf to send multicasts from an Ubuntu server at one site, I receive duplicates at the other side. I wonder if this is due to my setup or an actual problem? I am using 7.0(3)I7(3) on the 9000v.

    Thanks
    Nikas


    1. Hey Niklas
      I don't recall experiencing that issue with my setup; when I next run it up I will check it out.
      Regards
      Kirin


      1. The secondary link is there as an alternate route in case the primary goes down. For the use case I had, I needed to steer traffic over the primary unless there was an outage. This was done by adding a cost to the secondary link route so that the primary, via the local L3 point-to-point link, became the favoured route from the switches directly connected via the secondary. I am not sure if you could do true ECMP, as I am not fully across the behaviour of the VTEP in a vPC situation (i.e. whether the VTEP interface is available simultaneously on both switches in the vPC pair). Might be something worth testing out!

      Regards

      Kirin


  2. Hi Kirin,

    If I have two Nexus switches in each site, do all switches have to be interconnected? For example, does SW-NEXUS01 in one DC have to connect to both SW-NEXUS03 and SW-NEXUS04? Just looking for a clarification.


    1. Hi Ashley

      Each switch participating in the overlay needs to be able to reach each other via the underlay. The network topology doesn’t have to look like the diagram in the post but there does need to be Layer 3 connectivity between each switch.

      Regards

      Kirin


  3. Hello all
    Please let me know some details. I have a similar topology, but instead of two vPC pairs I have a vPC pair in my primary DC and only a single Nexus in my secondary DC (no vPC). Could I configure VXLAN between my DCs?

