A not-so-fun error occurred earlier today when my standalone Contrail/Tungsten Fabric Controller host went down. After bringing it back up, Cassandra DB was reporting the following errors:
Note: a standalone Controller is not a recommended design, given the nature of the components running on it and the way vRouters connect to controllers. This post discusses a PoC setup.
Running the command contrail-status displays all services on that node (the output will be different on vRouter nodes):
== Contrail control ==
control: initializing (Database:Cassandra connection down)
nodemgr: initializing

== Contrail config-database ==
nodemgr: initializing (Cassandra state detected DOWN. )

== Contrail database ==
nodemgr: initializing

== Contrail analytics ==
snmp-collector: initializing (Database:Cassandra[] connection down)
query-engine: initializing
alarm-gen: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
collector: initializing (Database:Cassandra, Database:contrail-01.ameen.lab:Global connection down)
topology: initializing (Database:Cassandra[] connection down)

== Contrail webui ==

== Contrail config ==
svc-monitor: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
device-manager: initializing (ApiServer:ApiServer[] connection down)
api: initializing (Database:Cassandra[] connection down)
schema: initializing (ApiServer:ApiServer[] connection down)
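With this many services listed, it can help to filter the status output down to the ones that are not healthy. The following is only a rough sketch, not part of Contrail itself: it assumes the output has been saved as text in the usual contrail-status layout (a "== Section ==" header followed by one "service: state (detail)" line per service) and that a healthy service reports the state "active". The function name failing_services and the "web: active" sample line are made up for illustration.

```python
import re

def failing_services(status_text):
    """Return (section, service, state, detail) for each service not 'active'.

    Assumes the contrail-status layout described above; hypothetical helper.
    """
    section = None
    results = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Section headers look like "== Contrail control =="
        header = re.fullmatch(r"==\s*(.+?)\s*==", line)
        if header:
            section = header.group(1)
            continue
        # Service lines look like "name: state (optional detail)"
        m = re.match(r"([\w-]+):\s*(\w+)\s*(?:\((.*)\))?\s*$", line)
        if m and m.group(2) != "active":
            detail = (m.group(3) or "").strip()
            results.append((section, m.group(1), m.group(2), detail))
    return results

# Sample input: the first three lines come from the output above,
# the "web: active" line is an assumed example of a healthy service.
sample = """\
== Contrail control ==
control: initializing (Database:Cassandra connection down)
nodemgr: initializing
== Contrail webui ==
web: active
"""

for row in failing_services(sample):
    print(row)
```

On the output in this post, every listed service would show up in the result, and most of the details point at the same Cassandra connection.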
Also, some services were reporting an Up status of less than 2 minutes, even though the controller node itself had been up for almost an hour:
Pod              Service    Original Name                State    Status
config-database  cassandra  contrail-external-cassandra  running  Up 11 seconds
database         cassandra  contrail-external-cassandra  running  Up About a minute
control          nodemgr    contrail-nodemgr             running  Up About a minute
config-database  nodemgr    contrail-nodemgr             running  Up 34 seconds
Checking on the Cassandra container revealed the issue: