
Use stdio, not signals, to heartbeat from the daemons

Authored by epriestley <git@epriestley.com> on Feb 22 2015, 14:46.

Description

Use stdio, not signals, to heartbeat from the daemons

Summary:
Ref T7352. We currently run one overseer per daemon. I want to run one overseer for a group of daemons, to reduce the minimum memory footprint of an instance.

One barrier is how hang detection works: we detect daemon hangs by requiring them to send a periodic heartbeat. If a daemon doesn't heartbeat for a while, we assume it has hung and restart it.
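As a rough illustration of the deadline logic described above (not the actual overseer code), hang detection can be sketched as follows. The class name, the timeout parameter, and the injectable clock are all hypothetical and chosen only for this example.

```python
import time


class HeartbeatDeadline:
    """Tracks when a daemon must next heartbeat before it is presumed hung.

    Hypothetical helper for illustration; not part of the real overseer.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        # The first heartbeat is due one full timeout after startup.
        self.deadline = clock() + timeout_seconds

    def heartbeat(self):
        # Each heartbeat pushes the deadline out by one full timeout.
        self.deadline = self.clock() + self.timeout_seconds

    def is_hung(self):
        # No heartbeat arrived before the deadline: assume the daemon hung.
        return self.clock() > self.deadline
```

An overseer loop would call `heartbeat()` whenever a heartbeat arrives and restart the daemon once `is_hung()` returns true.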

Currently, this heartbeat is sent by having the daemons send SIGUSR1 to the overseer. When the overseer receives the signal, it extends the deadline for the next heartbeat.

However, the overseer can't tell where the signal came from. Right now it can only come from one place, but in a world where overseers run multiple daemons it could have come from any of the children.
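The attribution problem can be seen in a small sketch. This is a Python illustration, not the overseer's code: a Python signal handler receives only the signal number and the interrupted stack frame, so nothing in the handler identifies the sending process. The timeout value and the `state` dictionary are assumptions made for the example.

```python
import signal
import time

# Hypothetical timeout, chosen only for this example.
HEARTBEAT_TIMEOUT = 120.0

# A single shared deadline: enough while there is exactly one child.
state = {"deadline": time.monotonic() + HEARTBEAT_TIMEOUT}


def on_heartbeat_signal(signum, frame):
    # The handler learns which signal arrived and where execution was
    # interrupted, but not which process sent it. With several children,
    # there is no way to pick the right deadline to extend here.
    state["deadline"] = time.monotonic() + HEARTBEAT_TIMEOUT


def install_handler():
    # Unix only: SIGUSR1 is not defined on every platform.
    signal.signal(signal.SIGUSR1, on_heartbeat_signal)
```

With one daemon per overseer the single deadline is unambiguous; with several daemons, every child's signal would extend the same deadline.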

Instead of using signals, this turns the daemon's stdout (which we already consume) into a structured message pipeline, and sends the heartbeat over stdout.
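A minimal sketch of what such a message pipeline might look like, in Python for illustration. The wire format here (one JSON array of `[type, data]` per line) and the message type names are assumptions for the example, not necessarily the format this change uses.

```python
import json

# Hypothetical message types for this sketch.
MESSAGE_STDOUT = "stdout"
MESSAGE_HEARTBEAT = "heartbeat"


def encode_message(message_type, data=None):
    # JSON escapes embedded newlines, so each message occupies one line.
    return json.dumps([message_type, data]) + "\n"


def decode_message(line):
    message_type, data = json.loads(line)
    return message_type, data


def handle_line(line, on_heartbeat, on_output):
    # Overseer side: dispatch one line read from the daemon's stdout.
    message_type, data = decode_message(line)
    if message_type == MESSAGE_HEARTBEAT:
        on_heartbeat()
    elif message_type == MESSAGE_STDOUT:
        on_output(data)
```

In this sketch, ordinary daemon output is wrapped as a `stdout` message, and a heartbeat is just another message type on the same stream, so no separate signalling channel is needed.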

In a future diff, the overseer will be able to attribute heartbeats to the correct child process.
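One way that attribution could work, sketched in Python under stated assumptions: because each child has its own stdout pipe, the overseer knows which child a heartbeat came from and can keep one deadline per child. The class and method names are hypothetical, and the future diff may be structured quite differently.

```python
import time


class MultiDaemonOverseer:
    """Hypothetical overseer keeping a separate heartbeat deadline per child."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.deadlines = {}  # child id -> time the next heartbeat is due

    def add_child(self, child_id):
        self.deadlines[child_id] = self.clock() + self.timeout_seconds

    def heartbeat_from(self, child_id):
        # Called when a heartbeat is read from this child's own pipe,
        # so the source is known and only its deadline is extended.
        self.deadlines[child_id] = self.clock() + self.timeout_seconds

    def hung_children(self):
        # Children whose deadline has passed are candidates for restart.
        now = self.clock()
        return sorted(c for c, due in self.deadlines.items() if now > due)
```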

Test Plan:

  • Ran daemon in the raw, saw sensible output.
  • Made daemon use plain echo, saw output get wrapped.
  • Artificially set heartbeat deadline to 10 seconds, saw heartbeating daemons continue running and hung daemons restart.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T7352

Differential Revision: https://secure.phabricator.com/D11850

Details

Committed: epriestley <git@epriestley.com>, Feb 24 2015, 23:49
Pushed: aubort, Mar 17 2017, 12:03
Parents: rPHU77f0eda5b427: Add a utility class for getting system memory information
Branches: Unknown
Tags: Unknown

Event Timeline

epriestley <git@epriestley.com> committed rPHU55861bcbd6a5: Use stdio, not signals, to heartbeat from the daemons (authored by epriestley <git@epriestley.com>). Feb 24 2015, 23:49