Understanding Layer 3 Packet Flow in an EVPN VXLAN Network

In the previous post, we walked through an Intra-VNI packet flow within a single bridge domain configured in an EVPN VXLAN fabric. This time, we’ll review how the same packet will be routed through the EVPN fabric between different subnets and, generally speaking, between different VNIs.

We’re still using the same network topology. However, this time Server10 (10.10.100.10) will be communicating with Server17 (10.10.101.17).

And again, to simplify the packet capture process, I shut down the second link between Server10 and the dc01-r01-leaf02 switch (e1 <==> E1/10).

As usual, everything starts with Server10 attempting to send an ICMP request to Server17. Since the destination host is in a different subnet, the traffic is sent to the default gateway configured on Server10, which is 10.10.100.254. This IP address is configured on dc01-r01-leaf01 and dc01-r01-leaf02 as an anycast gateway:

dc01-r01-leaf01# show run int vl100

!Command: show running-config interface Vlan100
!No configuration change since last restart
!Time: Thu Jun 26 17:41:41 2025

version 9.3(3) Bios:version  

interface Vlan100
  no shutdown
  vrf member DB
  no ip redirects
  ip address 10.10.100.254/24
  no ipv6 redirects
  fabric forwarding mode anycast-gateway
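
Stepping back to the host for a moment, the on-link versus off-link decision that sends this packet to the anycast gateway can be sketched in a few lines of Python. This is only an illustration: the /24 mask on Server10 is assumed to mirror the SVI configuration, and the 10.10.100.20 neighbour is hypothetical.

```python
import ipaddress

# Values taken from the post: Server10's address and the anycast gateway on Vlan100.
# The /24 mask on the host is an assumption that mirrors the SVI configuration.
HOST_IF = ipaddress.ip_interface("10.10.100.10/24")
DEFAULT_GW = ipaddress.ip_address("10.10.100.254")


def next_hop(dst: str) -> ipaddress.IPv4Address:
    """Return the address the host would resolve via ARP for this destination."""
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip in HOST_IF.network:
        return dst_ip      # on-link: ARP for the destination itself
    return DEFAULT_GW      # off-link: ARP for the default gateway


print(next_hop("10.10.101.17"))  # Server17 is off-link
print(next_hop("10.10.100.20"))  # hypothetical neighbour in the same subnet
```

Because the frame is addressed to the gateway MAC, the leaf treats it as traffic to be routed rather than bridged.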

So, our leaf switch knows that the packet should be routed and will look up the destination in the tenant routing table:


dc01-r01-leaf01# show ip route 10.10.101.0/24 longer-prefixes vrf DB
IP Route Table for VRF "DB"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.10.101.0/24, ubest/mbest: 2/0
    *via 10.255.255.5%default, [200/0], 10w3d, bgp-65000, internal, tag 65000, segid: 9003911 tunnelid: 0xaffff05 encap: VXLAN
    *via 10.255.255.6%default, [200/0], 10w3d, bgp-65000, internal, tag 65000, segid: 9003911 tunnelid: 0xaffff06 encap: VXLAN
 
10.10.101.17/32, ubest/mbest: 1/0
    *via 10.255.255.102%default, [200/0], 00:12:23, bgp-65000, internal, tag 65000, segid: 9003911 tunnelid: 0xaffff66 encap: VXLAN

The output above shows several related routing entries. The /24 entry points to the prefix received from 10.255.255.5 (dc01-r02-leaf01) and 10.255.255.6 (dc01-r02-leaf02), indicating that VLAN 101 (10.10.101.0/24) exists on those switches.
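
A side observation on the output above: in this lab, each tunnelid value looks like the next-hop VTEP address written as a hexadecimal integer. I am treating that as something visible in this output rather than as documented behaviour, and the helper name below is made up for illustration.

```python
import ipaddress


def tunnelid_to_ip(tunnelid: str) -> str:
    """Interpret a tunnelid such as '0xaffff05' as a 32-bit IPv4 address."""
    return str(ipaddress.IPv4Address(int(tunnelid, 16)))


# The three tunnelid values from the routing table output
for tid in ("0xaffff05", "0xaffff06", "0xaffff66"):
    print(tid, "->", tunnelid_to_ip(tid))
```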

In addition to this route, we can also see a host route, 10.10.101.17/32, which is available via 10.255.255.102. This IP is defined on the loopback0 interfaces of both dc01-r02 leaf switches, as they’re configured as a vPC pair.
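
Since both the /24 and the /32 cover Server17, the switch picks the most specific one. Here is a minimal longest-prefix-match sketch over the two entries from the output; the dictionary is a toy stand-in for the VRF routing table, not how the switch stores routes.

```python
import ipaddress

# Toy copy of the two VRF "DB" entries shown above: prefix -> next hops
RIB = {
    "10.10.101.0/24": ["10.255.255.5", "10.255.255.6"],
    "10.10.101.17/32": ["10.255.255.102"],
}


def lookup(dst: str):
    """Return (prefix, next_hops) for the longest matching prefix, or None."""
    dst_ip = ipaddress.ip_address(dst)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in RIB
        if dst_ip in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), RIB[str(best)]


print(lookup("10.10.101.17"))  # matches both entries, the /32 wins
print(lookup("10.10.101.50"))  # hypothetical host with no host route, falls back to the /24
```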

Let’s review the host route in detail:

dc01-r01-leaf01# show bgp l2vpn evpn 10.10.101.17 
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 10.255.255.5:32868
BGP routing table entry for [2]:[0]:[0]:[48]:[5001.0011.0000]:[32]:[10.10.101.17]/272, version 9602
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: iBGP

  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    10.255.255.102 (metric 81) from 10.255.255.2 (10.255.255.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 900101 9003911
      Extcommunity: RT:65000:900101 RT:65000:9003911 SOO:10.255.255.102:0 ENCAP:8
          Router MAC:0200.0aff.ff66
      Originator: 10.255.255.5 Cluster list: 10.255.255.2 

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 2 destination(s)
             Imported paths list: DB default
  AS-Path: NONE, path sourced internal to AS
    10.255.255.102 (metric 81) from 10.255.255.1 (10.255.255.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 900101 9003911
      Extcommunity: RT:65000:900101 RT:65000:9003911 SOO:10.255.255.102:0 ENCAP:8
          Router MAC:0200.0aff.ff66
      Originator: 10.255.255.5 Cluster list: 10.255.255.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 10.255.255.3:3    (L3VNI 9003911)
BGP routing table entry for [2]:[0]:[0]:[48]:[5001.0011.0000]:[32]:[10.10.101.17]/272, version 9605
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: iBGP

  Path type: internal, path is valid, not best reason: Router Id, no labeled nexthop
             Imported from 10.255.255.6:32868:[2]:[0]:[0]:[48]:[5001.0011.0000]:[32]:[10.10.101.17]/272 
  AS-Path: NONE, path sourced internal to AS
    10.255.255.102 (metric 81) from 10.255.255.1 (10.255.255.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 900101 9003911
      Extcommunity: RT:65000:900101 RT:65000:9003911 SOO:10.255.255.102:0 ENCAP:8
          Router MAC:0200.0aff.ff66
      Originator: 10.255.255.6 Cluster list: 10.255.255.1 

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from 10.255.255.5:32868:[2]:[0]:[0]:[48]:[5001.0011.0000]:[32]:[10.10.101.17]/272 
  AS-Path: NONE, path sourced internal to AS
    10.255.255.102 (metric 81) from 10.255.255.1 (10.255.255.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 900101 9003911
      Extcommunity: RT:65000:900101 RT:65000:9003911 SOO:10.255.255.102:0 ENCAP:8
          Router MAC:0200.0aff.ff66
      Originator: 10.255.255.5 Cluster list: 10.255.255.1 

  Path-id 1 not advertised to any peer

By default, the command shows two entries: one as received within the EVPN AFI, and the other after it was imported into the L3 VRF. We’re interested in the first one, within the EVPN AFI. The record says that two paths were available and one was considered the best (#2). The next hop is 10.255.255.102, which is available in the default routing table:
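
To make the bracketed route key in that output easier to read, here is a small parser for the type 2 (MAC/IP) entry. The field names reflect my reading of the NX-OS notation (route type, ESI, Ethernet tag, MAC length, MAC, IP length, IP, then the key length after the slash); treat them as assumptions, and note that the class and function names are invented for this sketch.

```python
import re
from typing import NamedTuple


class MacIpRoute(NamedTuple):
    route_type: int
    esi: str
    eth_tag: int
    mac_len: int
    mac: str
    ip_len: int
    ip: str
    key_len: int


# Seven bracketed fields separated by ':' and followed by '/<length>'
PATTERN = re.compile(
    r"\[(\d+)\]:\[([^\]]+)\]:\[(\d+)\]:\[(\d+)\]:"
    r"\[([0-9a-fA-F.]+)\]:\[(\d+)\]:\[([^\]]+)\]/(\d+)"
)


def parse_type2(key: str) -> MacIpRoute:
    """Split a type 2 route key, as printed by the show command, into fields."""
    match = PATTERN.fullmatch(key.strip())
    if match is None or match.group(1) != "2":
        raise ValueError(f"not a type 2 route key: {key!r}")
    rtype, esi, tag, mac_len, mac, ip_len, ip, key_len = match.groups()
    return MacIpRoute(int(rtype), esi, int(tag), int(mac_len), mac,
                      int(ip_len), ip, int(key_len))


route = parse_type2("[2]:[0]:[0]:[48]:[5001.0011.0000]:[32]:[10.10.101.17]/272")
print(route.mac, route.ip)
```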

dc01-r01-leaf01# show ip route 10.255.255.102
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.255.255.102/32, ubest/mbest: 2/0
    *via 172.16.1.1, Eth1/1, [110/81], 10w3d, ospf-UNDERLAY, intra
    *via 172.16.1.25, Eth1/2, [110/81], 10w3d, ospf-UNDERLAY, intra

So, per the default routing table, the traffic will be balanced between two different interfaces pointing to the spine switches. The hash calculation mechanism is quite complicated and hard to demonstrate in a virtual environment, so to simplify, let’s agree that the traffic will be forwarded to spine01 (172.16.1.1):
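
The general idea behind flow-based ECMP can still be sketched, even though the real hash cannot. The example below uses CRC32 over the outer 5-tuple purely as a stand-in; the actual hash function and input fields used by the switch are not shown in this post and will differ.

```python
import zlib

# The two equal-cost next hops from the default routing table above
NEXT_HOPS = [("172.16.1.1", "Eth1/1"), ("172.16.1.25", "Eth1/2")]


def pick_next_hop(src_ip: str, dst_ip: str, proto: int,
                  src_port: int, dst_port: int):
    """Map a flow to one next hop. CRC32 is an illustrative stand-in hash."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return NEXT_HOPS[zlib.crc32(key) % len(NEXT_HOPS)]


# Outer header values of the encapsulated packet: VTEP to VTEP, UDP to port 4789
print(pick_next_hop("10.255.255.101", "10.255.255.102", 17, 56021, 4789))
```

The point of hashing per flow is that every packet of the same flow takes the same path. For VXLAN, RFC 7348 recommends deriving the outer UDP source port from a hash of the inner packet, which gives the underlay some entropy even though the outer IP addresses are always the two VTEPs.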

dc01-r01-leaf01# show forwarding ipv4 10.255.255.102/32 

slot  1
=======


IPv4 routes for table default/base

------------------+-----------------------------------------+----------------------+-----------------+-----------------
Prefix            | Next-hop                                | Interface            | Labels          | Partial Install 
------------------+-----------------------------------------+----------------------+-----------------+-----------------
*10.255.255.102/32   172.16.1.1                                Ethernet1/1      

Further down the path, when dc01-spine01 receives the packet, it no longer cares about the payload. It just needs to send it on to the destination, which is still 10.255.255.102.

The routing table on dc01-spine01 shows similar output with ECMP routes:

dc01-spine01# show ip route 10.255.255.102
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.255.255.102/32, ubest/mbest: 2/0
    *via 172.16.1.10, Eth1/2, [110/41], 10w3d, ospf-UNDERLAY, intra
    *via 172.16.1.14, Eth1/5, [110/41], 10w3d, ospf-UNDERLAY, intra

dc01-spine01# show forwarding ipv4 10.255.255.102 detail 

slot  1
=======


Prefix 10.255.255.102/32, No of paths: 1, Update time: Mon Apr 14 18:23:33 2025
   172.16.1.10                               Ethernet1/2         

A packet captured on the downstream interface (Eth1/2) proves that the traffic was encapsulated in VXLAN and forwarded to the dc01-r02-leaf01 switch:

Frame 5: 148 bytes on wire (1184 bits), 148 bytes captured (1184 bits) on interface eth0, id 0
Ethernet II, Src: 50:01:00:00:1b:08 (50:01:00:00:1b:08), Dst: 50:05:00:00:1b:08 (50:05:00:00:1b:08)
Internet Protocol Version 4, Src: 10.255.255.101, Dst: 10.255.255.102
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 134
    Identification: 0x0000 (0)
    000. .... = Flags: 0x0
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 254
    Protocol: UDP (17)
    Header Checksum: 0xa79c [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.255.255.101
    Destination Address: 10.255.255.102
User Datagram Protocol, Src Port: 56021, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 9003911
    Reserved: 0
Ethernet II, Src: 02:00:0a:ff:ff:65 (02:00:0a:ff:ff:65), Dst: 02:00:0a:ff:ff:66 (02:00:0a:ff:ff:66)
Internet Protocol Version 4, Src: 10.10.100.10, Dst: 10.10.101.17
Internet Control Message Protocol
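
For reference, the 8-byte VXLAN header seen in this capture can be rebuilt and parsed with a few lines of Python. This is a sketch based on the RFC 7348 layout (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits); the function names are my own. The last lines also check the length arithmetic of the capture.

```python
import struct

VNI_VALID_FLAG = 0x08  # the "I" bit: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags and a 24-bit VNI into the 8-byte VXLAN header."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VNI_VALID_FLAG:
        raise ValueError("VNI flag is not set")
    return word2 >> 8


header = build_vxlan_header(9003911)   # the L3VNI from the capture
print(header.hex(), parse_vxlan_header(header))

# Length check against the capture: outer IP total length is 134 bytes
OUTER_IP, OUTER_UDP, VXLAN, INNER_ETH = 20, 8, 8, 14
inner_ip_len = 134 - (OUTER_IP + OUTER_UDP + VXLAN + INNER_ETH)
print(inner_ip_len)                    # bytes left for the inner IP packet
```

Note that the VNI in the header is the L3VNI (9003911), not the L2VNI of either VLAN, and the inner Ethernet addresses are the router MACs of the two VTEPs rather than the MAC addresses of the servers.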

At this point, we know that Server17 is connected to dc01-r02-leaf02, so the traffic will have to cross the vPC peer link to reach the end host. Before that, however, dc01-r02-leaf01 strips the VXLAN header, and the packet is then forwarded per local routing and switching rules.

L3VNI BGP table:

dc01-r02-leaf01# show bgp l2vpn evpn vni-id 9003911
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 7443, Local Router ID is 10.255.255.5
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10.255.255.5:3    (L3VNI 9003911)
*>i[2]:[0]:[0]:[48]:[5003.0000.1b08]:[0]:[0.0.0.0]/216
                      10.255.255.101                    100          0 i
*>i[2]:[0]:[0]:[48]:[5004.0000.1b08]:[0]:[0.0.0.0]/216
                      10.255.255.101                    100          0 i
*>l[2]:[0]:[0]:[48]:[5005.0000.1b08]:[0]:[0.0.0.0]/216
                      10.255.255.102                    100      32768 i
*>i[2]:[0]:[0]:[48]:[7ed8.f7ad.d433]:[32]:[10.10.100.10]/272
                      10.255.255.101                    100          0 i
* i                   10.255.255.101                    100          0 i
* i[5]:[0]:[0]:[24]:[10.10.101.0]/224
                      10.255.255.6                      100          0 i
*>l                   10.255.255.5                      100      32768 i

VRF Routing table:

dc01-r02-leaf01# show ip route 10.10.101.17/24 longer-prefixes vrf DB
IP Route Table for VRF "DB"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.10.101.0/24, ubest/mbest: 1/0, attached
    *via 10.10.101.254, Vlan101, [0/0], 10w3d, direct
     via 10.255.255.6%default, [200/0], 10w3d, bgp-65000, internal, tag 65000 (backup), segid: 9003911 tunnelid: 0xaffff06 encap: VXLAN

VRF ARP table:

dc01-r02-leaf01# show ip arp vrf DB

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context DB
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
10.10.101.17    00:03:44  5001.0011.0000  Vlan101         + 

This record finally points to dc01-r02-leaf02 through the vPC peer link, and from there to the end host port:

dc01-r02-leaf01# show mac address-table address 5001.0011.0000
Legend: 
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
+  101     5001.0011.0000   dynamic  0         F      F    vPC Peer-Link

Final destination:

dc01-r02-leaf02# show mac address-table address 5001.0011.0000
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*  101     5001.0011.0000   dynamic  0         F      F    Eth1/10
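
The chain of lookups after decapsulation (ARP entry, then MAC table, then the peer link, then the MAC table on the peer) can be modelled with a few dictionaries. The tables below copy the values from the outputs above; the data structures and the function are a simplified illustration, not how the switch implements forwarding.

```python
# ARP entry in VRF DB on dc01-r02-leaf01: IP -> (MAC, SVI)
ARP_VRF_DB = {"10.10.101.17": ("5001.0011.0000", "Vlan101")}

# MAC tables per switch: (VLAN, MAC) -> egress port
MAC_TABLE = {
    "dc01-r02-leaf01": {(101, "5001.0011.0000"): "vPC Peer-Link"},
    "dc01-r02-leaf02": {(101, "5001.0011.0000"): "Eth1/10"},
}

# Which switch sits on the other end of the peer link
VPC_PEER = {
    "dc01-r02-leaf01": "dc01-r02-leaf02",
    "dc01-r02-leaf02": "dc01-r02-leaf01",
}


def deliver(dst_ip: str, ingress_switch: str):
    """Follow ARP and MAC lookups until the frame leaves on a host port."""
    mac, svi = ARP_VRF_DB[dst_ip]
    vlan = int(svi[len("Vlan"):])
    hops = []
    switch = ingress_switch
    for _ in range(2):  # at most one trip across the peer link
        port = MAC_TABLE[switch][(vlan, mac)]
        hops.append((switch, port))
        if port != "vPC Peer-Link":
            break
        switch = VPC_PEER[switch]
    return hops


for switch, port in deliver("10.10.101.17", "dc01-r02-leaf01"):
    print(f"{switch}: forward out {port}")
```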

That is how our request reached the destination server. Server17 replies using the reverse path, with the traffic sent directly to dc01-spine01 or dc01-spine02, depending on the applied hash; the process itself remains unchanged. This time, I intentionally kept all spine/leaf links up to demonstrate the scenario where traffic arrives at a switch other than the one the orphan server is connected to.
