A not-so-fun error occurred earlier today when my standalone Contrail/Tungsten Fabric Controller host went down. After I brought it back up, most services were stuck initializing and reporting that their Cassandra connection was down.
Note: a standalone Controller is not a recommended design, both because of the components that run on it and because of how the vRouter connects to the controllers. This post describes a PoC setup.
Running the contrail-status command on that node lists the state of every service (the output is different on vRouter nodes):
== Contrail control ==
control: initializing (Database:Cassandra connection down)
nodemgr: initializing
== Contrail config-database ==
nodemgr: initializing (Cassandra state detected DOWN. )
== Contrail database ==
nodemgr: initializing
== Contrail analytics ==
snmp-collector: initializing (Database:Cassandra[] connection down)
query-engine: initializing
alarm-gen: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
collector: initializing (Database:Cassandra, Database:contrail-01.ameen.lab:Global connection down)
topology: initializing (Database:Cassandra[] connection down)
== Contrail webui ==
== Contrail config ==
svc-monitor: initializing (Database:Cassandra[] connection down)
nodemgr: initializing
device-manager: initializing (ApiServer:ApiServer[] connection down)
api: initializing (Database:Cassandra[] connection down)
schema: initializing (ApiServer:ApiServer[] connection down)
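With this many services listed, it can help to reduce the output to just the ones that are not healthy. The following is a minimal Python sketch, not part of Contrail itself: it assumes the contrail-status text has been captured into a string, that section headers look like "== Contrail <pod> ==", and that a healthy service reports the state "active". The function name find_unhealthy is hypothetical.

```python
import re

# Assumed line shapes, based on the output shown above:
#   "== Contrail <pod> =="
#   "<service>: <state> (<optional reason>)"
SECTION_RE = re.compile(r"^==\s*Contrail\s+(?P<pod>.+?)\s*==$")
SERVICE_RE = re.compile(
    r"^(?P<name>[\w-]+):\s*(?P<state>\w+)\s*(?:\((?P<reason>.*)\))?\s*$"
)


def find_unhealthy(status_text, healthy_state="active"):
    """Return (pod, service, state, reason) for each service not in healthy_state."""
    pod = None
    problems = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        section = SECTION_RE.match(line)
        if section:
            pod = section.group("pod")
            continue
        service = SERVICE_RE.match(line)
        if service and service.group("state") != healthy_state:
            reason = (service.group("reason") or "").strip()
            problems.append((pod, service.group("name"), service.group("state"), reason))
    return problems


if __name__ == "__main__":
    sample = """\
== Contrail control ==
control: initializing (Database:Cassandra connection down)
nodemgr: initializing
== Contrail config-database ==
nodemgr: initializing (Cassandra state detected DOWN. )
"""
    for pod, name, state, reason in find_unhealthy(sample):
        print(f"{pod}/{name}: {state} {reason}")
```

Grouping the results by reason makes it easy to see that nearly everything here traces back to Cassandra being unavailable.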
Some containers also showed an uptime of less than two minutes even though the controller node itself had been up for almost an hour, which suggests they kept restarting:
Pod              Service    Original Name                State    Status
config-database  cassandra  contrail-external-cassandra  running  Up 11 seconds
database         cassandra  contrail-external-cassandra  running  Up About a minute
control          nodemgr    contrail-nodemgr             running  Up About a minute
config-database  nodemgr    contrail-nodemgr             running  Up 34 seconds
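The comparison between container uptime and host uptime can be automated. Below is a rough Python sketch under a few assumptions: the Status column uses Docker's human-readable form ("Up 11 seconds", "Up About a minute"), the host uptime is supplied by the caller, and anything younger than two minutes counts as recently restarted. The function names and the threshold are illustrative only.

```python
import re

UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 604800, "month": 2592000, "year": 31536000,
}

# Matches strings such as "Up 11 seconds", "Up About a minute", "Up 3 hours".
STATUS_RE = re.compile(
    r"^Up\s+(?:(?:About|Less than)\s+)?(?P<count>\d+|an?)\s+"
    r"(?P<unit>second|minute|hour|day|week|month|year)s?\b",
    re.IGNORECASE,
)


def approx_uptime_seconds(status):
    """Convert a human-readable 'Up ...' status to approximate seconds, or None."""
    match = STATUS_RE.match(status.strip())
    if match is None:
        return None
    count = match.group("count").lower()
    number = 1 if count in ("a", "an") else int(count)
    return number * UNIT_SECONDS[match.group("unit").lower()]


def recently_restarted(rows, host_uptime_seconds, max_age_seconds=120):
    """Return (pod, service, uptime) for containers much younger than the host."""
    flagged = []
    if host_uptime_seconds <= max_age_seconds:
        return flagged  # the host itself just booted, so nothing stands out
    for pod, service, status in rows:
        uptime = approx_uptime_seconds(status)
        if uptime is not None and uptime < max_age_seconds:
            flagged.append((pod, service, uptime))
    return flagged


if __name__ == "__main__":
    rows = [
        ("config-database", "cassandra", "Up 11 seconds"),
        ("database", "cassandra", "Up About a minute"),
        ("control", "nodemgr", "Up About a minute"),
        ("config-database", "nodemgr", "Up 34 seconds"),
    ]
    # 55 minutes stands in for "almost an hour"; the exact value is an assumption.
    for pod, service, uptime in recently_restarted(rows, host_uptime_seconds=55 * 60):
        print(f"{pod}/{service} has only been up for about {uptime}s")
```

A container that shows only a few seconds of uptime on every check is a good first candidate for a closer look at its logs.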
Checking on the Cassandra container revealed the issue.