We noticed that some of our workers were dying because they could not connect to MySQL (from Python).
The message we received was: mysql.connector.errors.
It came and went, and we couldn’t attribute it to any issue in our code or on the server itself.
Our assumption is that at a high query rate, some packets or DNS resolution queries get lost.
We use RabbitMQ as an exchange and process around 150 messages per second, which might explain the rare but recurring failures to connect to RDS.
I realize that DNS responses have a TTL and are cached, but we were still getting random errors here and there.
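One rough way to check the DNS assumption is to resolve the database hostname many times in a row and count how many lookups fail. The sketch below is only an illustration: the function name count_dns_failures, the port 3306, and the example hostname are our own placeholders, and local caching may hide failures that a busy worker would actually hit.

```python
import socket


def count_dns_failures(hostname, attempts=100, resolver=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and return how many lookups failed.

    `resolver` is injectable so the function can be exercised without
    touching the network; by default it uses the system resolver.
    """
    failures = 0
    for _ in range(attempts):
        try:
            # 3306 is the default MySQL port (assumed here).
            resolver(hostname, 3306)
        except socket.gaierror:
            # Raised by getaddrinfo when name resolution fails.
            failures += 1
    return failures


# Example call (hypothetical hostname, replace with your own RDS endpoint):
# print(count_dns_failures("mydb.example.us-east-1.rds.amazonaws.com", 1000))
```

A non-zero count on a worker machine would support the idea that lookups, rather than MySQL itself, are the weak link.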
The solution was to define the RDS hostname in the hosts file, which removes the need to query DNS for it.
After adding the hostname and IP to the hosts file, we stopped seeing those random drops in our workers.
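For reference, a hosts file entry is just an IP address followed by the name it should map to. A minimal sketch for generating that line is shown here; the helper name hosts_entry and the example hostname are hypothetical, and the output still has to be appended to the hosts file (typically /etc/hosts on Linux) by hand or by your provisioning tool.

```python
import socket


def hosts_entry(hostname, resolver=socket.gethostbyname):
    """Return a hosts-file line ("<ip><TAB><hostname>") for `hostname`.

    `resolver` is injectable for testing; by default the name is
    resolved once through the system resolver.
    """
    ip = resolver(hostname)
    return f"{ip}\t{hostname}"


# Example call (hypothetical hostname, replace with your own RDS endpoint):
# print(hosts_entry("mydb.example.us-east-1.rds.amazonaws.com"))
```

Keep in mind that the IP behind an RDS endpoint can change (for example after a failover), so a pinned entry like this needs to be refreshed when that happens.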