Health endpoint for container readiness

simonhir · July 9, 2025, 12:08pm

Hi,

currently used probes in the helm-chart on “:3000/” return a 200 even if the rails server isn’t fully started up. This leads to traffic routed to unready services, 504 errors and therefore breaks the complete advantages of high availability with Kubernetes. As far as i saw there is no alternative endpoint for this usecase.

The solution would be to provide a health endpoint which returns 200 only when the rails server is ready to accept connections.
Rails itself already has this feature included, but it seams to not be available through Zammad at the moment: Rails::HealthController

I think that would be a great improvement for the stability of Zammad.

Differentiation of Zammad Monitoring: Requires auth token and returns the health of the complete setup not a single container

mgruner · July 21, 2025, 7:55am

@simonhir thanks for the proposal! To clarify this first, I would like to understand better what the issue is. Currently, we have

    readinessProbe:
      httpGet:
        path: /
        port: 3000
      failureThreshold: 5
      timeoutSeconds: 5

This does perform a regular HTTP call to the Rails application, so I would assume that it has to have booted first before answering the request here. Is this not the case? What exactly happens when it goes wrong?

mgruner · July 22, 2025, 11:50am

@simonhir I just tried to reproduce this unsuccessfully.

To do this, I placed an artificial delay in one of our intializer files (sleep 10) to prolong the startup period. When starting the puma web server, it only starts listening on the port after the initialization period is completed. During that period, HTTP calls fail:

curl localhost:3000
curl: (7) Failed to connect to localhost port 3000 after 0 ms: Couldn't connect to server

Afterwards, the correct content is immediately served. Therefore I wonder if the described issue actually exists, since the readiness probe does perform HTTP calls.