SDN

18 Jan

vRouter Agent and Memory Allocation

When vRouter starts, the kernel module allocates memory for both the Flow and Bridge tables, which are hash tables. This allocation must be contiguous: the memory backing these tables has to consist of consecutive blocks, not fragments scattered across the memory space. How does this affect vRouter as a process? Well, if you try to restart it on a system that is short on memory, there is a high probability that it won't come up, primarily because the allocation fails. In my own experience, this tends to happen on compute nodes with less than ~15 GB of free memory. It may not happen, but it can. Again, the root cause is not low free memory as such, but the lack of a sufficient number of contiguous pages for these allocations; having more free memory simply improves the odds. When the error triggers, something along the lines of the following shows up in contrail-vrouter-agent.log:

contrail-vrouter-agent: controller/src/vnsw/agent/vrouter/ksync/ksync_memory.cc:100: void KSyncMemory::Mmap(bool): Assertion `table_size_ != 0' failed.
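
Since the failure is about higher-order contiguous blocks rather than total free memory, the output of free alone tells you little. /proc/buddyinfo is more useful: it lists, per memory zone, how many free blocks of each size (order) remain. The following is a minimal sketch that sums up memory still sitting in larger contiguous blocks; the order threshold is an arbitrary illustration of mine, not a documented vRouter requirement:

```python
#!/usr/bin/env python3
"""Summarize contiguous free memory per zone from /proc/buddyinfo."""

PAGE_SIZE_KB = 4  # typical x86_64 page size


def read_buddyinfo(path="/proc/buddyinfo"):
    """Parse lines like: 'Node 0, zone Normal 120 45 12 ...' (counts per order)."""
    zones = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            node, zone = parts[1].rstrip(","), parts[3]
            counts = [int(c) for c in parts[4:]]  # free blocks of order 0..N
            zones.append((node, zone, counts))
    return zones


def report(min_order=4):
    """Sum memory available in contiguous blocks of at least 2**min_order pages."""
    for node, zone, counts in read_buddyinfo():
        kib = sum(n * (2 ** order) * PAGE_SIZE_KB
                  for order, n in enumerate(counts) if order >= min_order)
        block_kib = (2 ** min_order) * PAGE_SIZE_KB
        print(f"node {node} zone {zone}: "
              f"{kib / 1024:.1f} MiB free in blocks >= {block_kib} KiB")


if __name__ == "__main__":
    report()
```

A heavily fragmented node can show plenty of free memory overall while the higher-order columns are all zeros, which is exactly the situation the assertion above points to.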

This greatly limits the possibilities of live-patching a system that is running vRouter. The usual workaround for this kind of memory fragmentation is to reboot the node, forcing the vRouter kernel module to be inserted right after OS boot, before memory gets drained or fragmented by other processes. If you have workloads running over vRouter in kernel mode, it comes down to how long you can tolerate having vRouter down.
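
Before resorting to a reboot, it can be worth asking the kernel to defragment memory itself. Linux exposes vm.drop_caches and vm.compact_memory (the latter requires CONFIG_COMPACTION) for exactly this; neither guarantees that enough contiguous blocks will appear, but they can improve the odds before reloading the module. A minimal sketch, to be run as root:

```python
#!/usr/bin/env python3
"""Best-effort memory defragmentation before restarting vRouter (run as root)."""


def write_procfs(path, value):
    with open(path, "w") as f:
        f.write(value)


# Flush the page cache, dentries and inodes (3 = all of them).
write_procfs("/proc/sys/vm/drop_caches", "3")
# Ask the kernel to compact memory across all zones (needs CONFIG_COMPACTION).
write_procfs("/proc/sys/vm/compact_memory", "1")
```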

You might think: “So where is the resiliency / auto-healing / VM migration here?” Trust me when I tell you that I’ve seen production environments where these options were simply not possible due to infrastructure design constraints. One of those use cases did not even involve vRouter directly: it revolved around a critical VM running on a compute node with both vRouter and SR-IOV interfaces at the same time, and no option for a node reboot following a minor release upgrade (VM2 in the following picture):

vRouter SR-IOV Mode

source: https://tungstenfabric.github.io/website/Tungsten-Fabric-Architecture.html#vrouter-deployment-options

Working around the limitations of this specific use case is covered a bit further down in the post, but first let’s explore the options recent vRouter releases offer to navigate around this issue.

Read More »

1 Oct

BGP as a Service (BGPaaS) Between Contrail, vSRX and Quagga

In the previous post, we discussed how to deploy vSRX on Openstack. Since I’m using Contrail as my SDN for this setup, this one will be about setting up BGP as a Service (BGPaaS) between Contrail, vSRX, and Quagga.

The scenario I’m trying to demonstrate is the following:

  • vSRX to advertise the pool 90.90.90.0/29
  • Quagga to advertise the pool 80.80.80.0/29
  • Both pools should be exchanged between the nodes and Contrail using BGPaaS.

Contrail Configuration:

The vSRX and Quagga VMs are attached to ports on the ZoneA_MGMT Virtual Network (172.90.90.0/24), as follows:

  • Virtual Network Gateway: 172.90.90.1 [AS64512]
  • vSRX: 172.90.90.10 [AS35500]
  • Quagga: 172.90.90.20 [AS42750]

BGPaaS_Topology
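
Before diving into the Contrail side, it helps to picture the VM side of these sessions; with BGPaaS the guest peers with the virtual network's gateway address, 172.90.90.1 here. The following are minimal sketches based on the addressing above, not the exact configurations used later in the post. First, Quagga's bgpd.conf:

```
! bgpd.conf sketch: advertise 80.80.80.0/29 towards Contrail
router bgp 42750
 bgp router-id 172.90.90.20
 neighbor 172.90.90.1 remote-as 64512
 network 80.80.80.0/29
! add "no bgp network import-check" if 80.80.80.0/29 is not in the local RIB
```

And a rough vSRX counterpart in set-command form:

```
set routing-options autonomous-system 35500
set routing-options static route 90.90.90.0/29 discard
set protocols bgp group contrail type external
set protocols bgp group contrail neighbor 172.90.90.1 peer-as 64512
set protocols bgp group contrail export ADVERTISE-POOL
set policy-options policy-statement ADVERTISE-POOL term POOL from route-filter 90.90.90.0/29 exact
set policy-options policy-statement ADVERTISE-POOL term POOL then accept
```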

We will set up two BGP as a Service objects in Contrail under Configure > Services > BGP as a Service, like this:

Read More »

22 Jul

Deploying vSRX 3.0 (19.1R1) on Openstack

vSRX is the virtual edition of Juniper’s SRX Series physical firewalls, offering the same features in a much lighter package suitable for virtual and cloud environments. vSRX 3.0 is the new vSRX architecture that was introduced back in 18.4R1. Many features came with that architecture change, including a greatly improved boot time compared to the old one. I’m experimenting with it to demonstrate some Contrail features, so here’s how to deploy it on Openstack environments.

Get vSRX 3.0 Image

You can download vSRX 3.0 images directly from the Juniper support website; make sure to grab the qcow2 image file. For this post, I’ll be using 19.1R1-S1.3, but the procedure should be similar across all vSRX 3.0 releases. You can also obtain an evaluation license from Juniper.

Create the Config File

When creating the instance, you should provide the configuration file to be applied to vSRX. You can boot the instance without a config, but then you would have to do everything manually after boot-up; not fun.

The configuration file must start with #junos-config, which tells cloud-init to apply it during deployment. The following is a sample configuration file; the password for the contrail user is c0ntrail123:
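
The full sample sits behind the Read More link; purely to illustrate the shape such a file takes, a skeleton might look like the following (the host-name and the truncated password hashes are placeholders of mine, not values from the original post):

```
#junos-config
system {
    host-name vsrx3-lab;
    root-authentication {
        encrypted-password "$6$...";    /* replace with a real hash */
    }
    login {
        user contrail {
            class super-user;
            authentication {
                encrypted-password "$6$...";    /* hash of c0ntrail123 */
            }
        }
    }
    services {
        ssh;
        netconf {
            ssh;
        }
    }
}
```

The file is then handed to the instance at creation time, for example with openstack server create --user-data <config-file> alongside the usual image, flavor, and network arguments.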

Read More »

28 Mar

Resolving config_database_cassandra Container Restart Loop

A not-so-fun error occurred earlier today when my standalone Contrail/Tungsten Fabric controller host went down. After bringing it back up, Cassandra was reporting errors.

Note: a standalone controller is not a recommended design, due to the nature of the components running on it and the way vRouter connects to controllers. This post discusses a PoC setup.

Using the contrail-status command to display all services on that node (output will differ on vRouter nodes):

== Contrail control ==
control: initializing (Database:Cassandra connection down)
nodemgr: initializing

== Contrail config-database ==
nodemgr: initializing (Cassandra state detected DOWN. )

== Contrail database ==
nodemgr: initializing

== Contrail analytics ==
snmp-collector: initializing (Database:Cassandra[] connection down)
query-engine: initializing
alarm-gen: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
collector: initializing (Database:Cassandra, Database:contrail-01.ameen.lab:Global connection down)
topology: initializing (Database:Cassandra[] connection down)

== Contrail webui ==

== Contrail config ==
svc-monitor: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
device-manager: initializing (ApiServer:ApiServer[] connection down)
api: initializing (Database:Cassandra[] connection down)
schema: initializing (ApiServer:ApiServer[] connection down)

Also, some services were reporting state UP for less than 2 minutes, while the controller node itself was up for almost an hour:

Pod              Service         Original Name                          State    Status      
config-database  cassandra       contrail-external-cassandra            running  Up 11 seconds  
database         cassandra       contrail-external-cassandra            running  Up About a minute  
control          nodemgr         contrail-nodemgr                       running  Up About a minute 
config-database  nodemgr         contrail-nodemgr                       running  Up 34 seconds 
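
Those uptimes come from contrail-status, but you can also watch the restart loop straight from Docker (assuming a Docker-based deployment such as contrail-ansible-deployer):

```
docker ps --filter name=cassandra --format "table {{.Names}}\t{{.Status}}"
```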

Checking on the Cassandra container revealed the issue:

Read More »

18 Mar

Generate Link-Local Mapping for VMs on Tungsten Fabric

Lately, I’ve been fiddling around with Juniper Contrail (available as the upstream project Tungsten Fabric). So, I’ll be posting about different things I learn about it, SDN in general, and Openstack as well.

One thing I find myself doing often is testing connectivity between different network resources, primarily VMs. Sometimes that means verifying end-to-end connectivity, which requires accessing the VM and running something as simple as a ping command to see what happens.

However, the VNC console (or direct connectivity from my workstation towards the overlay/virtual networks that the virtual machines are connected to) may not be available. In that case, I need to connect to the VM using its link-local IP address directly from the vRouter / compute node.

I wrote a Python script that uses the Contrail Introspect API service to fetch info about compute nodes, then prints the info for the VMs hosted on each of them. In this example, I need to access a VM called AAP_02, so I use the script to find which vRouter / compute node hosts it, then access it directly from there without needing to source Openstack credentials:
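
That script is behind the Read More link, but as a flavor of the approach, here is a separate minimal sketch that scrapes each vRouter agent's Introspect service (port 8085) instead. The compute host names are placeholders, and the Snh_ItfReq element and field names (vm_name, mdata_ip_addr; the latter being the link-local metadata address) may vary between releases, so treat them as assumptions:

```python
#!/usr/bin/env python3
"""Find which compute hosts a VM and print its link-local (metadata) IP
by scraping the vRouter agents' Introspect pages. A sketch, not the
author's script; element/field names may differ across releases."""

import sys
import urllib.request
import xml.etree.ElementTree as ET

COMPUTES = ["compute-01.lab", "compute-02.lab"]  # placeholder vRouter nodes
AGENT_PORT = 8085  # default vRouter agent Introspect port


def interfaces(host):
    """Yield interface records from the agent's Snh_ItfReq Introspect page."""
    url = f"http://{host}:{AGENT_PORT}/Snh_ItfReq"
    with urllib.request.urlopen(url, timeout=5) as resp:
        tree = ET.parse(resp)
    for itf in tree.iter("ItfSandeshData"):
        yield {child.tag: (child.text or "").strip() for child in itf}


def find_vm(vm_name):
    for host in COMPUTES:
        for itf in interfaces(host):
            if vm_name in itf.get("vm_name", ""):
                print(f"{vm_name} is on {host}: vm ip={itf.get('ip_addr')}, "
                      f"link-local={itf.get('mdata_ip_addr')}")
                return
    print(f"{vm_name} not found on any listed compute", file=sys.stderr)


if __name__ == "__main__":
    find_vm(sys.argv[1] if len(sys.argv) > 1 else "AAP_02")
```

With the link-local address in hand, a plain ping or ssh to it from that compute node reaches the VM without touching the overlay from outside.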

Read More »
