
It seems the problems you're describing are management problems, not engineering problems. The people who designed, built and are responsible for maintaining and updating the system should be on call if something goes wrong.

I view runbooks in the same way I view "knowledge transfers". It's management's desperate hope that somehow a person leaving a project or company can convey their acquired knowledge of a system on some Confluence page or in a few meetings. It's a complete failure to recognize the essence of the work.

I don't think runbooks are necessarily bad if they are used to bootstrap new team members. But treating their existence as a green light for people without domain expertise in a system to administer it during "off hours" is misguided.



> The people who designed, built and are responsible for maintaining and updating the system should be on call if something goes wrong.

In my experience, the devs were somewhere on the US west coast, and the SRE teams were geographically distributed to cover the 24 hour period during local daytime (nobody likes to be paged in the middle of the night). As an SRE in Zürich, I got paged in what was the middle of the night for the Kirkland people, dealt with the emergency (using the playbook), root-caused it (with the assistance of the playbook), and filed bugs to be looked at by the dev team when they woke up.

The systems stayed up, everyone could sleep at night, working as intended.


> and the SRE teams were geographically distributed to cover the 24 hour period during local daytime

Management problem number 1. These people should not be responsible for the running system.

> nobody likes to be paged in the middle of the night

Excellent motivation for the people that should be responsible for the running system to build quality software.


Why would randomly waking your engineers up in the middle of the night be an excellent motivation strategy?


It's an incentive to not release stuff that breaks in the middle of the night.

On the flip side, that can lead to slower releases or more expensive solutions.



