RabbitMQ SOP
RabbitMQ is the message broker Fedora uses to allow applications to send each other (or themselves) messages.
Description
RabbitMQ is a message broker written in Erlang that offers a number of interfaces including AMQP 0.9.1, AMQP 1.0, STOMP, and MQTT. At this time only AMQP 0.9.1 is made available to clients.
Fedora uses the RabbitMQ packages provided by the Red Hat OpenStack repository because that repository ships a more up-to-date version.
The Cluster
RabbitMQ supports clustering a set of hosts into a single logical message broker. The Fedora cluster is composed of 3 nodes, rabbitmq01-03, in both staging and production. groups/rabbitmq.yml is the playbook that deploys the cluster.
Virtual Hosts
The cluster contains a number of virtual hosts. Each virtual host has its own set of resources - exchanges, bindings, queues - and users are given permissions by virtual host.
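As a minimal sketch of how the per-virtual-host resources and permissions can be inspected, the helper below wraps two standard rabbitmqctl subcommands. The function name show_vhost and the RABBITMQCTL override variable are assumptions introduced for illustration; they are not part of RabbitMQ or of the Fedora playbooks.

```shell
#!/bin/sh
# Sketch only: show_vhost and RABBITMQCTL are hypothetical names.
# RABBITMQCTL lets the binary be overridden (e.g. for a dry run).
RABBITMQCTL=${RABBITMQCTL:-rabbitmqctl}

# Print the user permissions and the exchanges of one virtual host.
show_vhost() {
    vhost=$1
    # Permissions are granted per virtual host.
    "$RABBITMQCTL" list_permissions -p "$vhost"
    # Exchanges belong to the virtual host; show name and type columns.
    "$RABBITMQCTL" list_exchanges -p "$vhost" name type
}

# Usage on a cluster node (commented out so sourcing has no side effects):
# rabbitmqctl list_vhosts
# show_vhost /pubsub
```

Setting RABBITMQCTL=echo before calling show_vhost prints the commands instead of running them, which may be useful for checking the arguments first.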
/pubsub
The /pubsub virtual host is the generic publish-subscribe virtual host used by most applications. Messages published via AMQP are sent to the "amq.topic" exchange. Messages being bridged from fedmsg into AMQP are sent via "zmq.topic".
/public_pubsub
This virtual host has the "amq.topic" and "zmq.topic" exchanges from /pubsub federated to it, and we allow anyone on the Internet to connect to this virtual host. For the moment it is on the same broker cluster, but if it is abused, it can be moved to a separate cluster.
Troubleshooting
RabbitMQ offers a CLI, rabbitmqctl, which you can use on any node in the cluster. It also offers a web interface for management and monitoring, but that is not currently configured.
Network Partition
In case of network partitions, the RabbitMQ cluster should handle it and recover on its own. If it has not recovered once the network situation is fixed, the partition can be diagnosed with rabbitmqctl cluster_status. The output should include the line {partitions,[]}, (an empty array).
If the array is not empty, the first nodes in the array can be restarted one by one, but make sure you give them plenty of time to sync messages after restart (this can be watched in the /var/log/rabbitmq/rabbit.log file).
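The partition check described in this section could be scripted roughly as follows. This is a sketch that assumes the Erlang-term output format quoted in this section, containing a {partitions,[...]} entry; other RabbitMQ versions may print cluster status differently, in which case the helper reports "unknown". The function name partition_state is hypothetical.

```shell
#!/bin/sh
# Sketch only: partition_state is a hypothetical helper name.
# Reads `rabbitmqctl cluster_status` output on stdin and prints one of:
#   clean        the partitions list is empty
#   partitioned  the partitions list has at least one entry
#   unknown      no partitions entry was found in the output
partition_state() {
    # Remove all whitespace so "{partitions, []}" and line wraps do not matter.
    cs_output=$(tr -d ' \t\r\n')
    case "$cs_output" in
        *'{partitions,[]}'*) echo clean ;;
        *'{partitions,['*)   echo partitioned ;;
        *)                   echo unknown ;;
    esac
}

# Usage on a cluster node (commented out so sourcing has no side effects):
# state=$(rabbitmqctl cluster_status | partition_state)
# [ "$state" = "clean" ] || echo "cluster state: $state"
```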
Federation Status
Federation is the process of copying messages from the internal /pubsub vhost to the external /public_pubsub vhost. During network partitions, it has been seen that the Federation relaying process does not come back up. The federation status can be checked with the command rabbitmqctl eval 'rabbit_federation_status:status().' on rabbitmq01. It should not return the empty array ([]) but something like:
[[{exchange,<<"amq.topic">>},
  {upstream_exchange,<<"amq.topic">>},
  {type,exchange},
  {vhost,<<"/public_pubsub">>},
  {upstream,<<"pubsub-to-public_pubsub">>},
  {id,<<"b40208be0a999cc93a78eb9e41531618f96d4cb2">>},
  {status,running},
  {local_connection,<<"<rabbit@rabbitmq01.phx2.fedoraproject.org.2.8709.481>">>},
  {uri,<<"amqps://rabbitmq01.phx2.fedoraproject.org/%2Fpubsub">>},
  {timestamp,{{2020,3,11},{16,45,18}}}],
 [{exchange,<<"zmq.topic">>},
  {upstream_exchange,<<"zmq.topic">>},
  {type,exchange},
  {vhost,<<"/public_pubsub">>},
  {upstream,<<"pubsub-to-public_pubsub">>},
  {id,<<"c1e7747425938349520c60dda5671b2758e210b8">>},
  {status,running},
  {local_connection,<<"<rabbit@rabbitmq01.phx2.fedoraproject.org.2.8718.481>">>},
  {uri,<<"amqps://rabbitmq01.phx2.fedoraproject.org/%2Fpubsub">>},
  {timestamp,{{2020,3,11},{16,45,17}}}]]
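A rough way to automate this check is to count the links reporting {status,running} in the command output. The sketch below assumes the Erlang-term format shown above and that two links are expected (one each for amq.topic and zmq.topic, as in the sample). The function name federation_running_links is hypothetical, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Sketch only: federation_running_links is a hypothetical helper name.
# Reads the output of
#   rabbitmqctl eval 'rabbit_federation_status:status().'
# on stdin and prints how many links report {status,running}.
federation_running_links() {
    # Strip whitespace, emit one line per match, then count the lines.
    tr -d ' \t\r\n' | grep -o '{status,running}' | wc -l | tr -d ' '
}

# Usage on rabbitmq01 (commented out so sourcing has no side effects):
# links=$(rabbitmqctl eval 'rabbit_federation_status:status().' \
#         | federation_running_links)
# [ "$links" -ge 2 ] || echo "federation links running: $links (expected 2)"
```

An empty array yields a count of 0, which corresponds to the failure case this section describes.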
If the empty array is returned, the following commands will restart the federation (again on rabbitmq01):
rabbitmqctl clear_policy -p /public_pubsub pubsub-to-public_pubsub
rabbitmqctl set_policy -p /public_pubsub --apply-to exchanges pubsub-to-public_pubsub "^(amq|zmq)\.topic$" '{"federation-upstream":"pubsub-to-public_pubsub"}'
Afterwards, the federation link status can be checked with the same command as before.