I have a ReplicaSet with two pods in an on-premises Kubernetes cluster. Is it possible to configure an active/standby setup so that only one pod receives requests and the second one stays in standby mode, taking requests only if the first one is down? I can’t find anything on the web about how to set this up.

If we can do the active/passive pattern in Kubernetes, is it possible for the elected pod to know that it has been elected? In my project, the pod is an ASP.NET Web API.

For more context:

  • My two pods receive financial stock pricing requests and send back a stream of prices. The requirement is that price streaming should never stop when one of the pods stops or crashes.


  • My first pod receives a request for AAPL stock prices and sends back a stream of prices for AAPL.

  • My second pod receives a request for MSFT stock prices and sends back a stream of prices for MSFT.

  • At any point in time, each pod has its own list of stock requests for which it needs to push prices.

Solution 1:

If I have a ReplicaSet with one pod, it receives a request and streams back prices. But the issue is: if it crashes, Kubernetes will restart it, but the restart takes about 10 seconds. That means there are no prices for ~10 seconds, which is too long.

Solution 2:

If I have a ReplicaSet with two pods, each pod will receive different stock pricing requests, but if one of the pods crashes, then every price stream served by that pod goes down and has to wait for the pod to restart, which takes ~10 seconds. Same issue as before.

Solution 3:

If I have a ReplicaSet with two pods in an active/standby pattern, the active pod receives all requests and streams back prices. If it goes down or crashes, the standby pod becomes the active one and resumes streaming prices very quickly because it’s already started.

That’s why I’m trying to implement solution 3. What do you think? Would you do anything differently?

Possibly the existing liveness or readiness
probes would meet your needs.
Kubernetes removes a pod from a Service’s rotation
when it fails its readiness probe, and restarts a container
that fails its liveness probe.

An app-level solution is to deploy a replicaset of proxies,
plus N distinct services which each offer a single pod
running the current price server.

We assume that all price requests have roughly equal
elapsed time cost. That is, they predictably will
complete within some 98th-percentile target time.
(Or a “big” request for three prices completes
within triple that, something predictable.)

Each proxy is small and simple, so there’s almost no reason
for it to crash. The complexity is pushed out into the price servers.

When a proxy starts up it sends a synthetic price request
(say, for AAPL or for the most popular stock) in parallel
to each of the N servers, and records the latency,
timing out after 1 second or whatever you find reasonable.
Then it begins serving production queries.
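
As a rough illustration of that startup probe, here is a minimal Python sketch. The names `probe_all`, `PROBE_TIMEOUT`, and the idea of representing each server as a callable that takes a symbol are assumptions made for the example, not part of any real API; a server that errors out or misses the deadline is simply recorded at the timeout value.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PROBE_TIMEOUT = 1.0  # seconds; assumed value, use whatever you find reasonable


def probe_all(servers, symbol="AAPL", timeout=PROBE_TIMEOUT):
    """Send one synthetic request to every server in parallel.

    `servers` maps a server name to a hypothetical callable fetch(symbol).
    Returns a dict of name -> observed latency, capped at `timeout`.
    """
    def timed(fetch):
        start = time.monotonic()
        fetch(symbol)
        return time.monotonic() - start

    pool = ThreadPoolExecutor(max_workers=max(1, len(servers)))
    futures = {name: pool.submit(timed, fetch) for name, fetch in servers.items()}
    deadline = time.monotonic() + timeout
    latencies = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            latencies[name] = min(future.result(timeout=remaining), timeout)
        except Exception:
            # Timed out or the server raised: treat it as maximally slow.
            latencies[name] = timeout
    pool.shutdown(wait=False)  # do not block startup on slow probes
    return latencies
```

The recorded latencies would seed each server's initial health estimate before production traffic arrives.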

A proxy will associate a pending_queries queue with each server,
and also an elapsed_time measurement from the most recently
completed successful (or timed-out) query.
A global DELAY_THRESHOLD, of perhaps 1.0 seconds,
lets us distinguish between a “good” timely response and
“degraded” or timed-out response.
A global MAX_QUEUE_DEPTH limits the outstanding queries
for a given server.
If every server’s queue reaches that depth, then
the system has entered a failed state.
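
The per-server bookkeeping could look something like the sketch below. `ServerState`, `is_healthy`, `has_capacity`, and `system_failed` are names invented for this example, and the value 100 for MAX_QUEUE_DEPTH is an arbitrary assumption.

```python
from collections import deque
from dataclasses import dataclass, field

DELAY_THRESHOLD = 1.0   # seconds; separates "good" from "degraded"
MAX_QUEUE_DEPTH = 100   # assumed value; tune for your workload


@dataclass
class ServerState:
    name: str
    elapsed_time: float = 0.0  # latency of the most recently completed query
    pending_queries: deque = field(default_factory=deque)

    def is_healthy(self):
        # "Good" means the last response arrived under the threshold.
        return self.elapsed_time < DELAY_THRESHOLD

    def has_capacity(self):
        return len(self.pending_queries) < MAX_QUEUE_DEPTH


def system_failed(servers):
    """True when no server can accept another query."""
    return all(not s.has_capacity() for s in servers)
```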

Upon receiving a price request the proxy chooses a random_server().
To do this it rolls an N-sided die and verifies that the
chosen server is below MAX_QUEUE_DEPTH.
If not, it cycles through a fixed permutation to find
the next server which passes the test, or returns “fail”
to the client if there is no such server.
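
One possible shape for random_server() is sketched here. It assumes the proxy can see each server's queue depth as a plain list of integers, and it uses simple cyclic order as the fixed permutation; both are choices made for the example.

```python
import random

MAX_QUEUE_DEPTH = 100  # assumed value


def random_server(queue_depths, rng=random):
    """Pick a server index at random, skipping servers whose queue is full.

    `queue_depths[i]` is the number of pending queries for server i.
    Returns an index, or None when every server is at MAX_QUEUE_DEPTH
    (the caller would then return "fail" to the client).
    """
    n = len(queue_depths)
    if n == 0:
        return None
    start = rng.randrange(n)  # roll the N-sided die
    for offset in range(n):   # then walk the servers in cyclic order
        candidate = (start + offset) % n
        if queue_depths[candidate] < MAX_QUEUE_DEPTH:
            return candidate
    return None
```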

Now our proxy sends the request to the chosen server,
and checks whether its recent elapsed_time latency
was below DELAY_THRESHOLD, that is, checks for health.
If unhealthy, the proxy immediately chooses another
random_server() and sends to it as well, anticipating
likely failure (or delay) in response to the first request.
Continue doing this until we have sent to an apparently healthy server.
Adding small random jitter to each request timeout
will ensure that requests queued for a dead server
are spread out, minimizing the time to notice that it has recovered.
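
The fan-out rule and the jitter can be expressed as two small functions. This is only a sketch: `hedged_targets`, `jittered_timeout`, `BASE_TIMEOUT`, and the 10% jitter spread are all hypothetical, and the candidate list stands in for successive random_server() picks.

```python
import random

DELAY_THRESHOLD = 1.0  # seconds
BASE_TIMEOUT = 1.0     # seconds; assumed per-request timeout


def hedged_targets(candidates, elapsed_time):
    """Decide which servers receive a copy of the request.

    `candidates` lists server names in the order successive picks would
    yield them; `elapsed_time` maps a name to its most recent latency.
    We keep sending until one apparently healthy server is included.
    """
    targets = []
    for name in candidates:
        targets.append(name)
        if elapsed_time[name] < DELAY_THRESHOLD:
            break  # this one looks healthy, so stop hedging
    return targets


def jittered_timeout(base=BASE_TIMEOUT, spread=0.1, rng=random):
    """Stretch the timeout by a random 0 to `spread` fraction of `base`."""
    return base * (1.0 + rng.uniform(0.0, spread))
```

With latencies `{"a": 1.0, "b": 0.05}` and candidates `["a", "b"]`, the request goes to both servers, since "a" looks degraded and "b" is the first healthy pick.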

Upon receiving a valid pricing response
the proxy updates elapsed_time and maybe replies to the client.
If another thread previously forwarded a response from
a faster server, then we have nothing to say to the client.
Upon timeout of the final request (typically the only request), we resend to a new server.
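
A "first valid response wins" race might be sketched with asyncio as follows. The function name `first_valid_response` is made up for the example, and each in-flight request is assumed to be an awaitable; updating elapsed_time and resending after a timeout are left to the caller.

```python
import asyncio


async def first_valid_response(requests, timeout):
    """Wait for the first request that succeeds.

    `requests` is a list of awaitables, one per server the query was
    sent to. Returns the first successful result, or None if nothing
    valid arrives within `timeout` seconds (the caller would then
    resend to a new server).
    """
    tasks = [asyncio.ensure_future(r) for r in requests]
    pending = set(tasks)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # fastest valid reply wins
        return None
    finally:
        # Slower duplicates have nothing left to say to the client.
        for task in tasks:
            task.cancel()
```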

In this way we ensure that

  • No server, not even a dead server, will have a crazy number of pending requests.
  • Live servers will always have a recent elapsed_time estimate of their responsiveness.
  • Only a minimal amount of “busy work” is performed to probe servers for health — essentially just the queries sent to unhealthy servers.
  • If at least one server is healthy, clients will nearly always get a rapid response from it, except during the one-second timeouts surrounding the death of a server.
  • Servers that recover will soon be probed and marked healthy.

Suppose your costs are dominated by electrical power,
that is, by CPU cycles consumed (ignoring DRAM refresh).
Then in steady state we are optimally efficient.
We could have N servers that are seldom idle,
or 2 × N servers with more idle time,
and we’d consume essentially the same resources.