The Docker Networking Trap: Why Your Containers Appear Healthy While Traffic Disappears
Container logs show success. Health checks pass. Yet traffic between services stops at the network boundary. This is how silent DNS and port-binding failures hide in plain sight—and how to find them before they crush production.
The Morning Everything Looked Fine But Wasn't
You deploy a multi-container service. The orchestration layer reports green. Logs on both sides show no errors. But one service cannot reach another. Traffic vanishes somewhere between containers.
No timeout. No explicit failure. Just silence.
This is not the crash you learn to debug in a tutorial. This is the invisible failure: the kind where every log line looks correct right up until the moment it doesn't. And by then you are in production, or you are discovering it in staging at 2am the night before launch.
The trap has three entry points: DNS resolution fails silently, port bindings advertise what they do not actually serve, or the network itself allows the connection attempt but drops packets without acknowledgment. Each one feels like a different problem. Each one is the same category of mistake: assuming connectivity where there is none because you did not verify the actual path.
Why Container Health Checks Lie to You
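Many health checks verify only that a service responds on localhost. They do not verify that other containers can reach it.
Here is the mechanism: a Docker HEALTHCHECK, or a Kubernetes exec probe, runs inside the container and talks to the service over the loopback interface, 127.0.0.1. If your application serves on that address, the check passes. The container is marked healthy, the pod is marked ready, and the deployment continues. Kubernetes httpGet and tcpSocket probes are different: the kubelet sends them to the pod IP, so they do catch a loopback-only bind. The trap is the check that runs from inside.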
But 127.0.0.1 is not reachable from another pod. It is the container's own internal loop. You have proven nothing about inter-container routing.
This is why you can see logs saying "listening on port 8080" and still have neighbouring containers unable to reach port 8080. The process is listening. The health check passes. The port is simply not bound to an interface that accepts traffic from outside the container.
The binding that is not what it appears to be
A common pattern:
CMD ["./app", "--listen", "127.0.0.1:8080"]
Or, more subtle:
ENV LISTEN_ADDR=localhost
# Application defaults to listening on $LISTEN_ADDR if not overridden
The application runs. It logs "listening". No error occurs. A health check hitting 127.0.0.1:8080 from the same container reports success.
Now send a request from another pod to the service's ClusterIP or container hostname, pointing to port 8080. It is usually refused outright; behind some proxies and load balancers it hangs until it times out. Either way, it goes nowhere.
The port is bound. The service exists. But it is bound to an address that is not routable from outside the container.
Verify this:
# From inside the container
docker exec <container_id> netstat -tlnp | grep 8080
# Look at the Local Address column
# 127.0.0.1:8080 = not reachable from outside
# 0.0.0.0:8080 = reachable from anywhere in the container's network
If you see 127.0.0.1, traffic from other containers cannot reach it. Fix this in your Dockerfile or startup script. Ensure the application listens on 0.0.0.0 or an explicit routable address, not localhost.
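The same reading of the Local Address column can be automated. The following is a minimal sketch, not a complete netstat parser: the function name classify_listen_address and its three labels are illustrative assumptions, and it expects a single host:port string as input.

```python
import ipaddress


def classify_listen_address(local_address: str) -> str:
    """Classify a 'host:port' value from the Local Address column."""
    # Split on the last colon so IPv6 forms such as ':::8080' keep their host part.
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")      # '[::1]:8080' -> '::1'
    host = host.split("%")[0]    # drop an interface suffix such as '%lo'
    if host in ("", "*"):
        return "all-interfaces"
    if host == "localhost":
        return "loopback-only"
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:        # 0.0.0.0 or ::
        return "all-interfaces"
    if ip.is_loopback:           # 127.0.0.0/8 or ::1
        return "loopback-only"
    return "specific-interface"


for address in ("127.0.0.1:8080", "0.0.0.0:8080", ":::8080"):
    print(address, "->", classify_listen_address(address))
```

Anything classified as loopback-only cannot be reached from another container. A specific-interface result is reachable only through that one address.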
DNS Resolution Fails Before the Connection Even Starts
This one is harder to spot because DNS happens silently, and partial failures are common.
Your service is named api-service. Another container tries to reach http://api-service:8080. The DNS lookup runs. The container's resolver checks its internal DNS (usually the cluster's CoreDNS or the container host's nameserver).
If that DNS server has no record for api-service, the lookup fails. The HTTP client does not crash; it gets a resolution error, which retry logic or a quiet log level often swallows. If the DNS server itself is unreachable, the lookup hangs until the resolver times out, before any connection is attempted. Either way, the request appears to vanish.
When DNS works sometimes but not always
A real scenario: two pods are in different namespaces. Pod A tries to reach http://service-b. The hostname is valid, but it needs the full FQDN: service-b.namespace-b.svc.cluster.local. Using just service-b works only if you are in the same namespace or if the client's search domain is configured to include other namespaces.
Check what the container's DNS actually sees:
# From inside a container
nslookup api-service
nslookup api-service.default.svc.cluster.local
# Compare results
If the first returns "not found" and the second returns an IP, you have a namespace isolation issue. The client is using a short name that does not resolve across namespace boundaries.
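To see why the short name misses, it helps to list the names a resolver actually tries. This is a rough sketch of standard search-list expansion. It assumes the usual Kubernetes defaults (a search list starting with the pod's own namespace, and ndots:5) and a hypothetical client namespace called namespace-a. The function name candidate_names is illustrative.

```python
from __future__ import annotations


def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the absolute names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # trailing dot: already absolute, search list is skipped
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    # Names with at least `ndots` dots are tried as-is first, then expanded.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Assumed search list for a pod running in the hypothetical namespace-a.
search = ["namespace-a.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in ("service-b", "service-b.namespace-b"):
    print(query)
    for candidate in candidate_names(query, search, ndots=5):
        print("   ", candidate)
```

None of the candidates for service-b is service-b.namespace-b.svc.cluster.local, so the lookup can only succeed if a service of that name exists in the client's own namespace. Adding the namespace to the query makes the second candidate the right one.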
The stub resolver that drops lookups
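In Docker Compose or on user-defined Docker networks, DNS is handled by Docker's embedded DNS server, which containers see at 127.0.0.11. It is usually reliable, but it can be bypassed. Containers on the default bridge network do not get name-based discovery at all.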
If a container's /etc/resolv.conf points to a non-existent or misconfigured nameserver (which happens when you manually set DNS options or use certain network drivers), lookups fail silently.
Verify:
# Inside the container, check nameserver addresses
cat /etc/resolv.conf
# Try to resolve a hostname
dig api-service
# or
nslookup api-service
If dig returns SERVFAIL or refuses the connection, or if nslookup times out, the nameserver is unreachable. Nothing will resolve. All outbound traffic to other containers by hostname will hang.
Fix this by ensuring the container's resolv.conf is correctly populated. In Docker Compose, this usually happens automatically. In Kubernetes, check the pod's DNS policy and dnsConfig fields.
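If you want this check in a startup script or CI job rather than run by hand, the file is simple enough to parse. The sketch below assumes the common resolv.conf layout (nameserver, search and options lines). The function name parse_resolv_conf and the sample text are illustrative, not taken from a real system.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameservers, search domains and options from resolv.conf text."""
    config = {"nameservers": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":   # blank lines and comment lines
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values      # a later search line replaces earlier ones
        elif keyword == "options":
            config["options"].extend(values)
    return config


# Illustrative content only; inside a container, read /etc/resolv.conf instead.
sample = """\
# example file
nameserver 127.0.0.11
options ndots:0
"""
config = parse_resolv_conf(sample)
if not config["nameservers"]:
    print("no nameserver configured: nothing will resolve")
else:
    print("nameservers:", config["nameservers"])
```

An empty nameserver list, or an address you do not recognise, is worth investigating before you look at anything else.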
Port Bindings That Look Right But Are Not Exposed
You bind a port inside the container. You declare it in the Dockerfile. You expose it in Docker Compose or Kubernetes. The service is still unreachable.
Port binding and port exposure are separate steps. Binding means the process listens on a port. Exposure means the container orchestrator makes that port available to other containers or the host.
The Dockerfile EXPOSE that does nothing
This line in a Dockerfile:
EXPOSE 8080
This tells humans and documentation tools that the service listens on 8080. It does not actually expose anything. It is metadata. The port is still only reachable if the container's runtime explicitly maps or publishes it.
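Relying on EXPOSE alone publishes nothing. What else you need depends on who has to reach the port.
In Docker Compose, containers on the same network can already reach each other on any port the process listens on. A ports mapping is only needed to publish the port to the host:
services:
  api:
    ports:
      - "8080:8080"
In Kubernetes, the containerPort declaration in the pod spec is likewise mostly informational. It documents the port and gives it a name that a Service can reference:
ports:
  - containerPort: 8080
    name: http
Like EXPOSE, this declaration opens nothing by itself, and omitting it blocks nothing. What makes the port reachable by service name is a Service whose selector matches the pod and whose targetPort matches the port the process listens on.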
Binding to the wrong interface inside the container
Even when you expose a port, if the application inside the container binds to 127.0.0.1, the port is bound but not accessible.
This is the most common case. The developer runs the app locally on localhost:8080, it works fine. They containerize it without changing the bind address. The container reports port 8080 is open, the orchestrator exposes it, but traffic still cannot reach it because the application is listening only to loopback.
Inside a container, loopback is isolated. Only processes in the same network namespace can reach 127.0.0.1: the container itself, or in Kubernetes the other containers in the same pod. Other containers, other pods, and the host cannot.
Force the application to bind to 0.0.0.0 or an explicit external IP. In many frameworks, this is a configuration option. In others, you must patch the code.
// Example: Node.js server (use one of these, not both)
app.listen(8080, '0.0.0.0'); // correct: reachable from outside the container
// app.listen(8080, 'localhost'); // wrong: only reachable from inside the container
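The same idea in Python, as a sketch rather than a drop-in server: take the bind address from configuration, default to all interfaces, and log the address the socket actually ended up on. LISTEN_ADDR is the variable from the earlier example; LISTEN_PORT and the function name open_listener are assumptions made for illustration.

```python
import os
import socket


def open_listener(env=os.environ) -> socket.socket:
    """Open a TCP listener on the configured address, defaulting to all interfaces."""
    host = env.get("LISTEN_ADDR", "0.0.0.0")     # not localhost
    port = int(env.get("LISTEN_PORT", "8080"))   # hypothetical variable name
    listener = socket.create_server((host, port))
    bound_host, bound_port = listener.getsockname()[:2]
    # Log what the socket is really bound to, not what the config asked for.
    print(f"listening on {bound_host}:{bound_port}")
    return listener


# Usage sketch: port 0 asks the OS for any free port, handy for a quick check.
listener = open_listener({"LISTEN_PORT": "0"})
listener.close()
```

Defaulting to 0.0.0.0 means a forgotten setting fails open rather than silently binding to loopback. Whether that is the right default for you depends on what else shares the network.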
The Silent Packet Drop
Traffic reaches the network. The connection attempt is sent. No response comes back. No error is logged.
This happens when a firewall rule, network policy, or routing rule silently drops packets without sending a reset or ICMP unreachable message.
In Kubernetes, this is usually a NetworkPolicy. In Docker networks, it is less common but can happen with misconfigured iptables rules.
The symptom: TCP handshakes hang and eventually time out. SYN packets are sent, SYN+ACK never returns. The client waits for a timeout (often 30 seconds or more).
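From the client side, a silent drop can be told apart from a refusal or a DNS failure with a short probe. This is a minimal sketch to run inside the client container; the function name probe and its result labels are illustrative choices.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        return "dns-failure"   # the name never resolved; no packet was sent
    except ConnectionRefusedError:
        return "refused"       # something answered with a reset
    except socket.timeout:
        return "timeout"       # no answer at all within the timeout
    except OSError as exc:
        return f"error: {exc}" # e.g. network unreachable


# Usage sketch: print(probe("api-service", 8080))
```

A timeout result matches the silent-drop symptom described above. A refused result means a host answered but nothing accepts connections on that address and port, which is typically what a loopback-only bind looks like from another container.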
Checking and tracing network policies
If you are using Kubernetes NetworkPolicies, verify that the policy allows traffic between the pods:
kubectl get networkpolicies -A
kubectl describe networkpolicy <policy-name> -n <namespace>
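NetworkPolicies are allow-lists: once any policy selects a pod for ingress, traffic that no rule allows is blocked. Depending on the network plugin, blocked packets are typically dropped silently rather than rejected. A policy that does not allow ingress on port 8080 therefore acts like a firewall: it does not crash, it does not error. It drops packets.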
To debug, temporarily remove or modify the policy, test connectivity, then restore it with corrected rules.
In Docker, check iptables if you are on the host (not in a container):
sudo iptables -L -n
But this is rare in normal Docker setups. More common is a misconfigured custom bridge network or overlay network with routing rules that drop traffic between specific subnets.
How to Actually Debug This Before It Hits Production
1. Test DNS resolution from the client container
Do not rely on service names working automatically. Test them:
# From inside the client container
docker exec <client_container> nslookup <service_name>
docker exec <client_container> dig <service_name>
If this fails, DNS is the problem. Fix the resolver configuration or use the full FQDN.
2. Verify the listening socket
Check what interface and port the server is actually bound to:
# From inside the server container
docker exec <server_container> netstat -tlnp
# or
docker exec <server_container> ss -tlnp
Look for the address. 0.0.0.0 or the container's internal IP is good. 127.0.0.1 means it is not reachable from other containers.
3. Test from the client container
Do not test from the host. Test from the client container:
docker exec <client_container> curl http://<service_name>:8080
docker exec <client_container> curl http://<service_ip>:8080
docker exec <client_container> nc -zv <service_name> 8080
The host may have different networking rules or DNS configuration, so a successful test from the host proves nothing about the path between containers.
4. Trace packet flow with tcpdump
If connectivity still does not work, capture traffic at the network interface level:
# From the host (requires privileges)
sudo tcpdump -i <interface> 'port 8080'
Or inside a container:
docker exec <container> tcpdump -i eth0 'port 8080'
Look for SYN packets that go unanswered (indicating dropped traffic) versus RST packets (indicating refused connections) versus nothing (indicating the packet never left the container).
If you see SYN but no SYN+ACK, the traffic is being dropped somewhere in the network path, not refused by the server.
5. Check container logs for the actual listen address
Many applications log the listen address when they start:
# Check server logs
docker logs <server_container>
Look for lines like "listening on" or "bind address". If the application logs "listening on 127.0.0.1:8080" or "listening on localhost", that is your problem. It is not listening on a routable address.
6. Test the port with a simple echo server
Replace your application temporarily with a minimal echo server to isolate whether the problem is the application or the networking:
# Temporary test image
FROM alpine
# Alpine's BusyBox nc already supports -e; -lk keeps listening after each client disconnects
CMD ["nc", "-lk", "-p", "8080", "-e", "cat"]
Or use a simple HTTP server:
FROM python:3.9-alpine
# Built-in file server: answers GET requests and binds to all interfaces
CMD ["python", "-m", "http.server", "8080", "--bind", "0.0.0.0"]
Deploy this, then test connectivity from the client. If the echo server works, your networking is fine. The problem is in your actual application's configuration.
A Real Pattern: The Multi-Tenant SaaS Pitfall
When you are building systems that orchestrate multiple customer environments (as in Shell ParkEasy), or services that dynamically spawn containers for isolated workloads, these DNS and binding issues compound.
Each tenant's container network is supposed to be isolated. But if you are not explicit about the listen address in each tenant's configuration, or if you are routing traffic through a gateway that does not know how to resolve tenant-specific service names, you end up with the same silent failure.
The logs look correct because each container is logging its own internal state. But the inter-tenant traffic never arrives. And you do not discover it until you try to coordinate across environments.
The fix is the same: verify DNS resolution, verify binding, test from the client side. But do it as a standard part of your deployment validation, not as a one-off debug session.
Pre-Deployment Checklist
Before any service goes to staging or production:
- From the client container, resolve the server's hostname and check that it returns the server's actual IP.
- From the client container, connect to the server on the exposed port. Use curl or nc from inside the container, not from your laptop.
- Check the server's netstat output. Verify the listening address is 0.0.0.0 or a routable address, not 127.0.0.1 or localhost.
- If using Kubernetes, check that a Service selects the pod and that its targetPort matches the port the process listens on. EXPOSE in the Dockerfile is not enough.
- If using NetworkPolicies, verify they allow ingress on the port you are using.
- If traffic still does not flow, capture packets with tcpdump and look for SYN packets without SYN+ACK responses.
These steps take five minutes. They catch the problem before the problem catches you.
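The first two checklist items are easy to script so they run on every deploy. The following is one possible sketch, meant to be run inside the client container: the function name preflight, the stage labels, and the target list are assumptions for illustration.

```python
import socket


def preflight(targets, timeout=3.0):
    """Check each (host, port): 'dns' = did not resolve, 'connect' = resolved
    but no address accepted a connection, 'ok' = connected."""
    results = {}
    for host, port in targets:
        try:
            infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror:
            results[(host, port)] = "dns"
            continue
        stage = "connect"
        for family, socktype, proto, _canonname, sockaddr in infos:
            try:
                with socket.socket(family, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                stage = "ok"
                break
            except OSError:
                continue   # try the next resolved address, if any
        results[(host, port)] = stage
    return results


# Usage sketch with a hypothetical target list:
# failures = {t: s for t, s in preflight([("api-service", 8080)]).items() if s != "ok"}
# raise SystemExit(1 if failures else 0)
```

Reporting the stage separately matters: a dns result sends you to the resolver configuration, while a connect result sends you to the bind address, the Service definition, or a network policy.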
The Lesson
Container orchestration abstracts away a lot of networking complexity. It is tempting to assume that if health checks pass and logs look good, everything is fine.
It is not. These systems are still built on the same networking primitives that have existed for decades. DNS still requires valid records. Port binding still requires listening on a routable address. Firewalls still drop packets.
The difference is that when these basic mechanisms fail in containers, the failure mode is silent. Your application does not crash. The logs are clean. The service just becomes invisible to everything outside itself.
This is not a bug in Docker or Kubernetes. This is a consequence of moving from single-machine debugging (where you can ping localhost, hit a port from the same process) to distributed systems (where the distance between services is real, and failures can be invisible).
Verify the connection. Do not assume it. Test from the client side, not from the host. Log what you are listening on, and verify it is what you think. That covers the large majority of these cases.