Channel: Debian User Forums

intermittent DNS lookup failures in Debian 12 on AWS

Hello all. First post time. Apologies in advance if this question seems a bit vague, but I'm not sure where to look for evidence of this issue.

This is initially a request to see if anyone's seeing anything similar!

Circumstances are:
- We run workloads in AWS, historically on a weird mix of Amazon Linux, CentOS and Ubuntu.
- I want to gradually migrate to using Debian, using official Debian AMIs, as we retire servers and introduce new services.
- I have a few servers now running Debian 12 x86_64, I thought with no issues - but ...
- A service I've just migrated from an Ubuntu 16.04 box, a Python app, is having occasional issues which it claims are due to DNS lookup failures, when trying to establish a new PostgreSQL connection to the endpoint of an AWS RDS database.
- The app is Indico (see https://getindico.io/), so not one we write ourselves. It uses psycopg2 as the database client library.
- The errors look like this:

Code:

(psycopg2.OperationalError) could not translate host name "indico.cluster-example.eu-west-2.rds.amazonaws.com" to address: Name or service not known
- By and large, the app is working fine, so it is clearly usually able to talk to the database, but we get a few tens of these errors every day in the app's log files.
- We never saw this with the old Indico software on the Ubuntu server.
- This prompted me to check our test Debian 12 box, also running a test version of this Indico service - and lo and behold, it has also had occurrences of this DNS error - but hardly any, since the test service isn't really used.
- All our servers use the AWS VPC DNS resolver.
- As far as I can tell, we've not seen any DNS failures of any kind on any of our other servers, many of which talk to databases, only these two new Debian 12 boxes.

I have been discussing this over on the Indico forums, and the feeling is that it is not likely to be some new unknown issue in Indico or psycopg2 - I agree with this.

So, currently, I have a DNS service (AWS's VPC DNS resolver) which I expect to be totally reliable, being used to look up a name (the AWS RDS endpoint) which I would expect to always resolve, by an OS (Debian 12) and a DB client library (psycopg2) in which I would be amazed to find new, unknown bugs in the area of DNS lookups.

My Debian 12 servers have /etc/resolv.conf as a link to /run/systemd/resolve/resolv.conf, as expected, and it contains something like:

Code:

# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.211.32.2
search .
This looks normal. The boxes use DHCP, as per usual in AWS.
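
One way to gather evidence independent of Indico would be a small probe, left running on the affected box, that resolves the RDS endpoint in a loop and logs every failure with a timestamp. The following is only a minimal sketch under some assumptions: the function names `probe_once` and `run`, the one-second interval and the port 5432 default are all made up for illustration, and it relies on Python's `socket.getaddrinfo`, which as far as I understand goes through the same C library `getaddrinfo()` call that libpq (and therefore psycopg2) uses to translate host names.

```python
import socket
import time
from datetime import datetime, timezone


def probe_once(host, port=5432, resolver=socket.getaddrinfo):
    """Resolve host once and return (ok, detail, elapsed_seconds).

    `resolver` is injectable only so the function can be exercised
    without a network; by default it is the real socket.getaddrinfo.
    """
    start = time.monotonic()
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # "Name or service not known" is the text glibc gives for EAI_NONAME.
        return False, f"gaierror {exc.errno}: {exc.strerror}", time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return True, ",".join(addresses), time.monotonic() - start


def run(host, interval=1.0, count=None, resolver=socket.getaddrinfo):
    """Probe repeatedly and print one timestamped line per failure.

    Loops forever when count is None; otherwise stops after `count`
    attempts and returns the number of failures seen.
    """
    failures = 0
    attempts = 0
    while count is None or attempts < count:
        ok, detail, elapsed = probe_once(host, resolver=resolver)
        attempts += 1
        if not ok:
            failures += 1
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} FAIL {host} after {elapsed:.3f}s: {detail}", flush=True)
        time.sleep(interval)
    return failures
```

Calling `run("indico.cluster-example.eu-west-2.rds.amazonaws.com")` alongside the app would show whether lookups also fail outside Indico, and the timestamps and elapsed times could then be compared against the app's log and the systemd-resolved journal.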

Can anyone suggest anything I can do to attempt to diagnose this?

Is there any more information I could provide or check, which might help?

Many thanks if you read this far! 8) Andy

Statistics: Posted by andyholtmacc — 2024-04-05 13:10 — Replies 1 — Views 50


