28Jun

Junos Telemetry Interface (JTI) Ingest using Telegraf

There are multiple official products from Juniper that allow you to ingest Junos Telemetry Interface (JTI) data. However, I ran into a couple of situations where I needed to do very quick testing of JTI without a full product setup (due to licensing, test scale, and sometimes air-gapped infrastructure).

Telegraf is an extensible, open-source server agent that helps you collect metrics from your stacks, sensors, and systems. With Telegraf, you can use input plugins to ingest data from plenty of sources and formats, including OpenTelemetry over gRPC, and then forward it to outputs like InfluxDB or Splunk. The good news is that a plugin exists for parsing UDP-based Native JTI with Telegraf; unfortunately, it has not been maintained for over four years.
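
Since Native JTI here is streamed over plain UDP, a quick sanity check that the device is actually sending packets (before involving Telegraf at all) can save time. Below is a minimal sketch of such a listener; port 50000 is just a placeholder for whatever port your sensor export configuration uses:

#!/usr/bin/env python3
# Minimal sanity check: confirm Native JTI packets are arriving over UDP
# before pointing Telegraf at them. Port 50000 is just an example value.
import socket

LISTEN_PORT = 50000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", LISTEN_PORT))
print(f"Listening for JTI packets on UDP/{LISTEN_PORT} ...")

while True:
    payload, (src_ip, src_port) = sock.recvfrom(65535)
    # Payloads are GPB-encoded; here we only care that something is arriving.
    print(f"{len(payload)} bytes from {src_ip}:{src_port}")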

I created a fork of the plugin and merged it into Telegraf v1.22, and got it to a point where it serves me well for my quick test cases. Here's how to do quick Junos Telemetry Interface ingest testing using Telegraf: Read More »

18Jan

vRouter Agent and Memory Allocation

When vRouter starts, the kernel module allocates memory for both the Flow and Bridge tables, which are hash tables. This allocation must be contiguous, meaning that the memory allocated for the process consists of consecutive blocks rather than being fragmented across the memory space. How does this affect vRouter as a process? Well, if you try to restart it on a system that is short on memory, there is a high probability that it won't come up, primarily due to memory allocation failure. In my own observation, this behavior tends to occur when compute nodes have less than ~15 GB of free memory. It may not happen, but it can. Again, not directly because the system is low on free memory, but because it does not have a sufficient number of contiguous pages for such allocations. Having more free memory definitely improves the odds, though. When the error triggers, something along these lines gets written to contrail-vrouter-agent.log:

contrail-vrouter-agent: controller/src/vnsw/agent/vrouter/ksync/ksync_memory.cc:100: void KSyncMemory::Mmap(bool): Assertion `table_size_ != 0' failed.

This greatly limits the possibilities of live-patching a system that is running vRouter. The workaround usually recommended to avoid this memory fragmentation situation is to reboot the node, forcing the vRouter kernel module to be inserted immediately after the OS boots, before memory gets drained or fragmented by other processes. If you have workloads running over vRouter in kernel mode, it comes down to how long you can sustain having vRouter down.
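
Since the failure is about contiguous pages rather than total free memory, one rough way to gauge fragmentation before attempting a restart is /proc/buddyinfo, which lists how many free blocks of each order (2^order contiguous pages) every memory zone still has. This is just a generic Linux check, not Contrail tooling; a small sketch of the idea:

#!/usr/bin/env python3
# Rough fragmentation check: /proc/buddyinfo lists, per zone, the number of
# free blocks of order 0..N, where a block of order k is 2**k contiguous pages.
# Few or no high-order blocks means large contiguous allocations may fail even
# when plenty of memory is reported as free.

PAGE_KIB = 4  # typical x86_64 page size

with open("/proc/buddyinfo") as f:
    for line in f:
        fields = line.split()
        node, zone = fields[1].rstrip(","), fields[3]
        counts = [int(c) for c in fields[4:]]
        largest = max((i for i, c in enumerate(counts) if c > 0), default=-1)
        largest_kib = (2 ** largest) * PAGE_KIB if largest >= 0 else 0
        print(f"Node {node} zone {zone:8} largest free block: order {largest} "
              f"(~{largest_kib} KiB), blocks of order >= 8: {sum(counts[8:])}")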

You might think: “So where is the resiliency / auto-healing / VM migration here?” Trust me when I tell you that I’ve seen production environments where these options were simply not possible due to infrastructure design constraints. One of those use cases did not even involve vRouter directly; it was a critical VM running on a compute node with both vRouter and SR-IOV interfaces at the same time, and no option to reboot the node following a minor release upgrade (VM2 in the following picture):

vRouter SR-IOV Mode

source: https://tungstenfabric.github.io/website/Tungsten-Fabric-Architecture.html#vrouter-deployment-options

Working around the limitations of this specific use case is covered a bit further down in the post, but first let's explore the options that recent vRouter releases provide to navigate around this issue.

Read More »

13Jan

Expanding Homelab Subnet Using VyOS

I switched my internet service provider a while ago. The router I received from my new provider came with no option to change the LAN subnet addressing or prefix length. Being stuck with a /24 prefix for both personal devices and homelab needs was a hard pill to swallow, even though I have never actually utilized an entire /24 before. My labs were already set up with a larger, different subnet on my old connection, and a larger subnet gives more flexibility in how things can be segmented. Unfortunately, changing the firmware to something like DD-WRT was not an option either. Therefore, I had to rely on a virtual router to expand my local network.

I chose VyOS, an open-source router, to do this. It is really lightweight and simple to configure, and it fits my use case exactly the way I wanted. I deployed it on my oVirt cluster with the following setup in mind:

VyOS Gateway Setup

As displayed above, I would like the VMs in the 172.20.20.0/20 prefix to be able to reach the internet or any other service available on the main LAN (10.0.1.0/24). Therefore, I'm going to let VyOS act as the NAT gateway for those VMs. Read More »

21Apr

Building Advanced Labs Using EVE-NG

Here's a tiny backstory: looking back over the past five to six years, homelabbing has been one of the greatest pleasures, fiddling around and coming up with the right setup for whatever I'm learning. I've gone from VMs on my laptop, to Xen on some old machine, to VMware ESXi. ESXi is great, but with time I kept finding myself doing a lot of the same things over and over, and although I automated most of the repetitive parts, I felt I wanted something a bit quicker that just works.

For around two years, oVirt has been my choice for hosting my run-of-the-mill (mostly Linux) labs. It works well with images, it can be automated, and it does nested virtualization very, very well. Nested virtualization and the speed of bringing VMs up were my two absolute favorite parts, especially as I started working on Contrail and OpenStack. I still happily run oVirt today for everything except networking labs. That's where I realized I needed something fresh that tackles the problem of network device labs in a smarter way.

Enter EVE-NG

My problem began when I needed to run labs containing network devices (Juniper vMX, vSRX, Cumulus, VyOS, etc.). Unfortunately, creating such labs on oVirt, or even nested over KVM within a Linux host, can be very tricky and time-consuming, with lots of moving parts that can easily break. It just didn't make sense to me that running a back-to-back connection between two devices meant creating a bridge and attaching both VMs to it. This is apparently how ESXi does it as well, and it is a tedious process that I don't want to go through when I just want to quickly spin something up and try it. That's when EVE-NG came into the picture.

Emulated Virtual Environment – Next Generation (EVE-NG) simplifies the process of running labs containing network devices and interconnecting them with other virtual nodes. It supplies the missing piece that makes such labs easier to build, and it does so in an intuitive way.

EVE-NG supports a whole bunch of disk images for building labs. You have Linux (of course), plus a growing list of supported virtual network devices including Juniper, Cisco, F5, and many others; you can check the full list here. EVE-NG's user interface makes the topology you see actually work: you can place devices, drag and drop cables to establish connectivity between them and select the desired ports, and re-organize and label things. It presents everything in a clean way that also allows a speedy lab-building process. There are many other features that you should definitely check out on their website.

EVE-NG Deployment Options

Read More »

25Jan
Velocifire TKL71WS

OSX: Fix tilde/accent key on Velocifire TKL71WS Keyboard

I recently got the Velocifire TKL71WS wireless keyboard for my daily use. I like it a lot so far (although battery life could be a little better; I'm still testing with different illumination profiles), but overall it does the job for me.

The only caveat I have with it is that it is missing a dedicated tilde/accent key (~/`). To type them, you have to hold the Function key (the second key to the right of the space bar) and press Escape. That is quite a finger-twister, and I found myself struggling with it a lot. It is also the way to type the Arabic character “ذ”, which is equally inconvenient.

Karabiner-Elements: Read More »

1Oct

BGP as a Service (BGPaaS) Between Contrail, vSRX and Quagga

In the previous post, we discussed how to deploy vSRX on OpenStack. Since I'm using Contrail as my SDN for this setup, this post will be about setting up BGP as a Service (BGPaaS) between Contrail, vSRX, and Quagga.

The scenario I’m trying to demonstrate is the following:

  • vSRX to advertise the pool 90.90.90.0/29
  • Quagga to advertise the pool 80.80.80.0/29
  • Both pools should be exchanged between the nodes and Contrail using BGPaaS.

Contrail Configuration:

The vSRX and Quagga VMs are attached to ports on the ZoneA_MGMT virtual network (172.90.90.0/24), as follows:

  • Virtual Network Gateway: 172.90.90.1 [AS64512]
  • vSRX: 172.90.90.10 [AS35500]
  • Quagga: 172.90.90.20 [AS42750]

BGPaaS_Topology

We will set up two BGP as a Service objects in Contrail under Configure > Services > BGP as a Service, like this:

Read More »

22Jul

Deploying vSRX 3.0 (19.1R1) on Openstack

vSRX is the virtual edition of Juniper's SRX Series physical firewalls, offering the same features in a much lighter package suitable for virtual and cloud environments. vSRX 3.0 is the new vSRX architecture that was introduced back in 18.4R1. Many features came with that architecture change, including a greatly improved boot time compared to the old one. I'm experimenting with it to demonstrate some Contrail features, so here's how to deploy it in OpenStack environments.

Get vSRX 3.0 Image

You can download vSRX 3.0 images directly from the Juniper support website here. Make sure to download the qcow2 image file. For this post, I'll be using 19.1R1-S1.3, but the procedure will likely be similar for all vSRX 3.0 releases. You can also obtain an evaluation license from here.

Create the Config File

When creating the instance, you should provide it with the configuration file to be applied to vSRX. You can boot the instance without a config, but then you would have to do everything manually after boot-up; not fun.

The configuration file must start with #junos-config, which cloud-init interprets in order to apply the configuration during deployment. The following is a sample configuration file; the password for the contrail user is c0ntrail123:

Read More »

28Mar

Resolving config_database_cassandra Container Restart Loop

A not-so-fun error occurred earlier today when my standalone Contrail/Tungsten Fabric Controller host went down. After bringing it back up, Cassandra DB was reporting the following errors:

A standalone controller is not a recommended design, due to the nature of the components running on it and how vRouter connects to controllers. This post discusses a PoC setup.

Using the contrail-status command to display all services on that node (the output will differ on vRouter nodes):

== Contrail control ==
control: initializing (Database:Cassandra connection down)
nodemgr: initializing

== Contrail config-database ==
nodemgr: initializing (Cassandra state detected DOWN. )

== Contrail database ==
nodemgr: initializing

== Contrail analytics ==
snmp-collector: initializing (Database:Cassandra[] connection down)
query-engine: initializing
alarm-gen: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
collector: initializing (Database:Cassandra, Database:contrail-01.ameen.lab:Global connection down)
topology: initializing (Database:Cassandra[] connection down)

== Contrail webui ==

== Contrail config ==
svc-monitor: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
device-manager: initializing (ApiServer:ApiServer[] connection down)
api: initializing (Database:Cassandra[] connection down)
schema: initializing (ApiServer:ApiServer[] connection down)

Also, some services had been in the UP state for less than two minutes, while the controller node itself had been up for almost an hour:

Pod              Service         Original Name                          State    Status      
config-database  cassandra       contrail-external-cassandra            running  Up 11 seconds  
database         cassandra       contrail-external-cassandra            running  Up About a minute  
control          nodemgr         contrail-nodemgr                       running  Up About a minute 
config-database  nodemgr         contrail-nodemgr                       running  Up 34 seconds 

Checking on the Cassandra container revealed the issue: Read More »

18Mar

Generate Link-Local Mapping for VMs on Tungsten Fabric

Lately, I've been fiddling around with Juniper Contrail (available upstream as the Tungsten Fabric project). So, I'll be posting about different things I learn about it, SDN in general, and OpenStack as well.

One thing I find myself doing often is testing connectivity between different network resources, primarily VMs. Sometimes this means testing end-to-end connectivity, which requires accessing the VM and running something as simple as a ping to see what happens.

However, the VNC console (or direct connectivity from my workstation to the overlay/virtual networks that the virtual machines are connected to) may not be available. In that case, I need to connect to the VM using its link-local IP address directly from the vRouter / compute node.

I wrote a Python script that uses the Contrail introspect API to fetch info about compute nodes, then prints the info for the VMs hosted on each of them. In this example, I need to access a VM called AAP_02, so I use the script to find which vRouter / compute node hosts it, then access it directly from there without needing to source OpenStack credentials.
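
The script itself is behind the link. Purely as an illustration of the idea (this is not the actual script), the vRouter agent introspect can be queried over HTTP to list interfaces together with their metadata/link-local addresses. The port (8085), request name (Snh_ItfReq), and field names below are assumptions based on a typical Contrail agent introspect and may differ between releases:

#!/usr/bin/env python3
# Illustrative sketch: query a vRouter agent introspect endpoint and print VM
# interface names with their link-local (metadata) IPs. The port, request name
# and XML field names are assumptions and may vary across Contrail releases.
import sys
import urllib.request
import xml.etree.ElementTree as ET

AGENT_PORT = 8085  # assumed default introspect port of the vRouter agent

def list_vm_interfaces(compute_host):
    url = f"http://{compute_host}:{AGENT_PORT}/Snh_ItfReq"
    with urllib.request.urlopen(url, timeout=5) as resp:
        root = ET.fromstring(resp.read())
    for itf in root.iter("ItfSandeshData"):
        vm_name = itf.findtext("vm_name", default="-")
        ip_addr = itf.findtext("ip_addr", default="-")
        mdata_ip = itf.findtext("mdata_ip_addr", default="-")
        print(f"{vm_name:20} ip={ip_addr:16} link-local={mdata_ip}")

if __name__ == "__main__":
    list_vm_interfaces(sys.argv[1] if len(sys.argv) > 1 else "localhost")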

Read More »

6Oct

Using Reboot Module in Ansible 2.7

For a while, rebooting a Linux machine using Ansible has been done primarily with a combination of a shell task issuing the reboot command and a wait_for task that pauses execution of further tasks until the machine has come back up. This is also the method taught in the currently available revision of Red Hat's DO407 Automation with Ansible course (based on Ansible 2.3). It can be done as follows:

---
- name: Reboot and wait until the server is up
  hosts: server1
  tasks:

    - name: reboot machine
      shell: sleep 2; shutdown -r now "Ansible triggered reboot"
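      # async: 1 with poll: 0 fires the shutdown and returns immediately, so the
      # play does not hang on a connection that is about to drop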
      async: 1
      poll: 0
      ignore_errors: true

    - name: Wait for server to come back
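      # The rebooting host is unreachable, so run this check from the control
      # node and poll SSH (port 22) until the host answers again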
      wait_for:
        host: "{{ inventory_hostname }}"
        state: started
        delay: 30
        timeout: 300
        port: 22
      delegate_to: localhost

New in Ansible 2.7: reboot module

Read More »

Written with love ♥