
I have some paradoxical feelings about "blameless" retro culture that I'll try to sum up.

In general, I'm in favor of the approach. I don't think singling people out and bullying or shaming them for their mistakes ever works. I think most well-intentioned engineers will already beat themselves up plenty for making a serious mistake, and they don't need any encouragement to do so. I know I do.

On the other hand, there is a red line. At a place I worked, a DBA was let go after he repeatedly brought production down for 45 minutes to an hour at a time by running intensive queries of his own design for data-gathering, in some cases, after being explicitly told not to do that against the prod database. This was a person whose job description required him to have access to prod.

There were process problems, maybe - being allowed to run whatever queries you want on production under your own authority, sure - but his cavalier attitude towards a production environment was still unacceptable. Process can only help when people are well-intentioned and doing their best; if people are malicious or negligent or just not good at their jobs, adding more process to get around that only makes things worse.



I think there should be a difference between a postmortem process and a performance management process and just because the first is blameless doesn’t mean that the second can’t look back to find problems or negligence.

That said, even when there is obvious negligence, having the postmortem process look at the issue with blamelessness is important to build up tooling/changes that could prevent it from happening again. For example, maybe you could revoke individuals having direct access to the production database without multi-party authentication.
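As a rough illustration of that kind of guardrail, here is a minimal Python sketch of a two-person rule for ad-hoc production queries. Everything in it is hypothetical (`ProdQueryRequest`, `run_on_prod`, and the `execute` callback are made-up names, not any real tool's API). It only shows the shape of the idea: the machine refuses to run the query until someone other than the requester has signed off.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProdQueryRequest:
    """Hypothetical request to run an ad-hoc query against production."""
    requester: str
    sql: str
    approvers: set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        # The requester may not approve their own request.
        if user == self.requester:
            raise PermissionError("requester cannot self-approve")
        self.approvers.add(user)

    def is_authorized(self, required: int = 1) -> bool:
        # Authorized once enough distinct other people have signed off.
        return len(self.approvers) >= required


def run_on_prod(request: ProdQueryRequest, execute: Callable[[str], object]):
    """Run the query via `execute` only if the request has been approved."""
    if not request.is_authorized():
        raise PermissionError("needs approval from a second person")
    return execute(request.sql)
```

In use, a request from "alice" would raise `PermissionError` until, say, "bob" calls `approve`, at which point `run_on_prod` hands the SQL to whatever executor is passed in.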


>I think there should be a difference between a postmortem process and a performance management process and just because the first is blameless doesn’t mean that the second can’t look back to find problems or negligence.

That doesn't make sense. The moment that you look back at a postmortem for use in penalizing someone via performance management, the postmortem is no longer blameless.


You don’t look back at the postmortem, but if a manager says “you have repeatedly broken policy and, despite warnings, have logged into systems without permissions leading to incidents” I don’t think that’s a problem. It’s completely separate.

Additionally, if someone is going up for promotion and uses a number of launches in their packet that all resulted in regressions and didn’t have good rollback plans, I don’t think the committee needs to be blind to that fact.


> he repeatedly

Surely the first occurrence led to a post-mortem which documented and forbade the practices that became known to be dangerous for production.


Yes, that is presumably what "after being explicitly told not to do that against the prod database" refers to.


Action items from a post mortem should not be “do better, human” but “prevent the humans from making this mistake, machine”


And missing details like, did his job require data analysis? Was he involved in coming up with the resolution in the post mortem, or was it done by someone unrelated?


True, learning from your mistakes should be a given. In this case the person involved was completely ignorant though.


Maybe. Unclear if it was documented or just told verbally.


I don't know for sure but I believe there was a PIP and so on.


It seems like a read replica would have helped out in this instance.

I agree that if somebody keeps doing the same thing after being told not to, because it would bring down production, and it does bring down production, then they should be held accountable.


>> if people are malicious or negligent or just not good at their jobs, adding more process to get around that only makes things worse.

That's why there is a hiring and firing process.


> At a place I worked, a DBA was let go after he repeatedly brought production down for 45 minutes to an hour at a time by running intensive queries of his own design for data-gathering, in some cases, after being explicitly told not to do that against the prod database. This was a person whose job description required him to have access to prod.

Trying to have some sympathy: Was he given an alternative? Or was it a "stop doing that important thing -- I don't know how else to do it, figure it out" situation?


It wasn't particularly important and we had "offline" copies of most of the DB data for this sort of thing, just somewhat less up to date. I honestly don't know why he did this.


I think maybe it's an attempt to buzzwordify a culture of not holding honest mistakes against people, and pretend it's a discrete separable "thing we do" rather than a pervasively intertwingled aspect of "what we're like here".





