
18Jan

vRouter Agent and Memory Allocation

When vRouter starts, the kernel module allocates memory for both the Flow and Bridge tables, which are hash tables. As a result, the allocation must be contiguous, meaning that the memory handed to the module consists of consecutive blocks rather than fragments scattered across the memory space. How does this affect vRouter as a process? If you try to restart it on a system that is short on memory, there is a high probability that it will not come up, primarily because the memory allocation fails. In my own experience, this happens on compute nodes with less than roughly 15 GB of free memory. It may not happen, but it can. Again, the failure is not directly caused by the system being low on free memory, but by the lack of enough contiguous pages for such allocations. Having more free memory definitely improves the odds, though. When the error triggers, something along the lines of the following shows up in contrail-vrouter-agent.log:

contrail-vrouter-agent: controller/src/vnsw/agent/vrouter/ksync/ksync_memory.cc:100: void KSyncMemory::Mmap(bool): Assertion `table_size_ != 0' failed.

This failure mode greatly limits the possibilities of live-patching a system that is running vRouter. The workaround usually recommended to avoid this kind of memory fragmentation is to reboot the node, so that the vRouter kernel module is inserted immediately after the OS boots, before memory gets drained or fragmented by other processes. If you have workloads running over vRouter in kernel mode, it comes down to how much vRouter downtime you can tolerate.
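To get a rough sense of how fragmented a node is before attempting a restart, Linux exposes the number of free contiguous blocks per size class in /proc/buddyinfo. The following is a minimal Python sketch that parses that file's format and sums the free memory held in blocks of at least a given order. Note the assumptions: the function names are made up for illustration, a 4 KiB base page size is assumed, and min_order is a hypothetical threshold, since the allocation size vRouter actually needs is not stated here. The kernel can also compact memory on demand, so treat the result as a hint, not a guarantee.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB base pages (typical on x86_64)


def parse_buddyinfo(text):
    """Parse /proc/buddyinfo content.

    Returns {(node, zone): [free block count for order 0, 1, 2, ...]}.
    """
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal 12 7 3 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        zones[(node, zone)] = [int(n) for n in parts[4:]]
    return zones


def free_kib_at_or_above(counts, min_order):
    """Free memory (KiB) held in contiguous blocks of >= 2**min_order pages."""
    return sum(
        count * (2 ** order) * PAGE_SIZE_KIB
        for order, count in enumerate(counts)
        if order >= min_order
    )


def summarize(text, min_order):
    """Return {(node, zone): KiB free in blocks of at least min_order}."""
    return {
        key: free_kib_at_or_above(counts, min_order)
        for key, counts in parse_buddyinfo(text).items()
    }
```

On a compute node, you could call summarize(open("/proc/buddyinfo").read(), 10) to see how much memory per zone is still available in the largest block class. A node can report plenty of free memory overall and still show very little at the higher orders, which is exactly the situation described above.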

You might think: “So where is the resiliency / auto-healing / migration of VMs here?” Trust me when I tell you that I’ve seen production environments where these options were simply not possible due to infrastructure design constraints. One of those use cases did not even involve vRouter directly. It had more to do with a critical VM running on a compute node that hosted both vRouter and SR-IOV interfaces, where rebooting the node after a minor release upgrade was not an option (VM2 in the following picture):

vRouter SR-IOV Mode

source: https://tungstenfabric.github.io/website/Tungsten-Fabric-Architecture.html#vrouter-deployment-options

A workaround for the limitations of this specific use case comes a bit further down in the post, but first let’s explore the options that recent vRouter releases offer to navigate around this issue.

