There's so much more we can do around activation and skill creation. Looking at the eval results, there are even cases where the added context makes the agent worse.
The review eval tests the language, activation, etc. of skills. If you're using Tessl, you could quickly move it all into a skill and then run an eval on that. This checks whether the way you write the instructions is actually being understood by the agent.
No, the context can be human-created just as much as it could be LLM-generated. The suggestions are based on Anthropic's best practices: they help the agents activate and use the skills better, make the text clearer for the agent, etc.
The 11 skills used are here: https://github.com/mcollina/skills