You do not need 100% automation. What you need is a systematic approach to handling problems, followed by a fix for the root cause.
Runbooks came from tech ops in broadcasting, power plant operations, etc., where there was a clear division between the operators who pushed buttons, ran cables, etc., and those who decided which buttons to push and which cables to run. Dumb hands + runbooks created "smart hands".
If your SRE team runs like that, it is not SRE.
Look at how an incident is handled:
1. Identify the issue
2. Implement a workaround to restore the service
3. Identify the root cause
4. Implement a fix for the root cause
5. Remove the workaround
Runbooks cover only steps 1 and 2.
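To make the gap concrete, here is a minimal, hypothetical Python sketch that models the five steps as an ordered enum and marks the two a runbook can script. The names `Step`, `RUNBOOK_STEPS` and `outstanding_steps` are illustrative only and do not come from any incident-management tool. It shows that running the runbook restores service but leaves three steps open.

```python
from enum import Enum


class Step(Enum):
    """The five incident-handling steps, in order (illustrative model)."""
    IDENTIFY_ISSUE = 1
    APPLY_WORKAROUND = 2
    IDENTIFY_ROOT_CAUSE = 3
    FIX_ROOT_CAUSE = 4
    REMOVE_WORKAROUND = 5


# Hypothetical: the steps a runbook can script (detect and restore service).
RUNBOOK_STEPS = {Step.IDENTIFY_ISSUE, Step.APPLY_WORKAROUND}


def outstanding_steps(completed: set) -> list:
    """Return the steps not yet completed, in definition order.

    The incident is only closed when this list is empty, i.e. after the
    root cause is fixed and the workaround has been removed.
    """
    return [step for step in Step if step not in completed]


if __name__ == "__main__":
    # After the runbook has run, service is restored but the incident is not done.
    remaining = outstanding_steps(RUNBOOK_STEPS)
    print([step.name for step in remaining])
    # ['IDENTIFY_ROOT_CAUSE', 'FIX_ROOT_CAUSE', 'REMOVE_WORKAROUND']
```

The steps left over after the runbook are exactly the ones that need engineering judgment rather than button pushing.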