It's not the same rule set though. The rule set they evaluated the AI on isn't one of the ones that it supports.
Edit: This is confusing for some people because there are essentially two rule sets with the same name. Tromp-Taylor rules as commonly implemented for actual play (including by KataGo) involve dead stone removal, whereas Tromp-Taylor rules as defined for computer science research don't. One might argue that the latter is the "real" Tromp-Taylor rules (whatever that means), but at that point you are obviously rules-lawyering with the engine authors rather than doing anything that could reasonably be considered adversarial policy research.
There are two strategies described in this paper: the cyclic adversary and the pass adversary. You are correct that the pass adversary is super dumb. It is essentially exploiting a loophole in a version of the rules that KataGo doesn't actually support. This is such a silly attack that IMO the paper would be a lot more compelling if they had just left it out.
That said, the cyclic adversary is a legitimate weakness in KataGo, and I found it quite impressive.
What is "cyclic" about the adversarial strategy, exactly? Is it depending on a superko rule? That might potentially be interesting, and explanatory. Positions where superko matters are extremely rare in human games, so it might be hard to seed training data. It probably wouldn't come up in self-play, either.
No, it isn't related to superko. It has to do with KataGo misidentifying the status of groups that are wrapped around an opposing group. I assume the name cyclic has to do with the fact that the groups look like circles. There are images in the paper, but it is a straightforward misread of the life and death status of groups that are unambiguously dead regardless of rule set.
> So instead of looking, like the author of these new options, for ways to make life for the bad guys harder we do nothing?
Random brute force attempts against SSH are already a 100% solved problem, so doing nothing beyond maintaining the status quo seems pretty reasonable IMO.
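To make "already solved" concrete, the standard mitigation is simply disabling password authentication entirely, at which point random credential guessing accomplishes nothing. A hypothetical sshd_config excerpt (not from this thread, just the usual settings):

```
# /etc/ssh/sshd_config — key-only auth; random password guessing becomes moot
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password
```

Tools like fail2ban can additionally rate-limit the noise, but with password auth off the brute-force attempts are harmless log spam.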
> I don't buy your argument nor all the variations on the same theme: "There's a minuscule risk of X, so we do absolutely nothing but say there's nothing to do, and we let bad guys roam free!".
Setting this up by default (as is being proposed) would definitely break a lot of existing use cases. The only risk that is minuscule here is the risk from not making this change.
I don't see any particular reason to applaud making software worse just because someone is "trying".
This is essentially the Copenhagen interpretation of ethics: if you interact with a problem, you become responsible for it. Improving someone's situation is unethical while doing nothing isn't, because improving their situation makes you responsible for it still not being good enough (according to the pundits who do nothing).
Can you be more specific about what particular anthropomorphizing you object to? The only place the author uses the word want is in describing the wants of humans.
That’s not how these kinds of trials work. The 13% figure comes from comparing the control and intervention groups, which were observed over the same period of time. The change in baseline mortality over time of the entire population isn’t relevant.
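The point is that a relative-risk figure is computed within the trial, between arms observed over the same window, so a drift in population-wide baseline mortality drops out. A toy calculation with invented numbers (not the actual trial data) shows the arithmetic:

```python
# Hypothetical trial counts, invented purely for illustration.
control_deaths, control_n = 100, 10_000
treated_deaths, treated_n = 87, 10_000

# Relative risk compares the two arms directly; any change in the
# background mortality rate over calendar time affects both arms
# equally and cancels out of this ratio.
rr = (treated_deaths / treated_n) / (control_deaths / control_n)
reduction = 1 - rr
print(f"relative risk reduction: {reduction:.0%}")  # → 13%
```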
The best probability estimate you can make is constrained by the information you have available. The new person showing up has less information than the existing contestant, so it makes sense that their best estimate would be less precise. Similarly, if someone with x-ray vision walked up in the middle of the game, they could pick the car 100% of the time, because they have access to more information than either of the existing contestants.
Your last paragraph isn't correct though. By switching you go from a 1/3 probability to a 2/3 probability. Based on the information the original contestant has, switching gets the car 2/3 of the time.
I don't see how a new contestant has less information, though? They know that one of the two doors contains the prize, which is all the previous contestant knows either.
The crucial bit of information that the new contestant doesn't have is that there was a door that was ineligible to be eliminated (the door chosen by the original contestant).
If the game had different rules, it would work like you are imagining. Specifically, if Monty randomly eliminated one of the two doors, meaning there was a chance for Monty to reveal the prize instead of a goat. If Monty has the chance to eliminate the prize before giving the contestant a chance to switch, then switching does not give you an advantage.
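Both claims are easy to check with a quick Monte Carlo sketch (hypothetical code, not from the thread): under the standard rules switching wins about 2/3 of the time, while under the random-Monty variant (discarding rounds where Monty accidentally reveals the car) switching wins only about half the time.

```python
import random

def play(switch, monty_knows, trials=100_000):
    """Simulate Monty Hall; return the fraction of completed rounds won."""
    wins = played = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        if monty_knows:
            # Standard rules: Monty opens a goat door that isn't the pick.
            opened = next(d for d in range(3) if d != pick and d != car)
        else:
            # Variant: Monty opens a random other door, possibly the car.
            opened = random.choice([d for d in range(3) if d != pick])
            if opened == car:
                continue  # car revealed; round is void
        played += 1
        final = next(d for d in range(3) if d not in (pick, opened)) if switch else pick
        wins += final == car
    return wins / played

print(play(switch=True, monty_knows=True))    # ~0.667 (standard rules)
print(play(switch=True, monty_knows=False))   # ~0.5   (random-Monty variant)
```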
But it didn't. Before, we knew that one of those two doors could contain either a prize or a goat. After, we know the same exact thing. No information was gained there.
> If a "crappy" tool costs $10k/mo for the team and doesn't require much or any devops time to setup and maintain, it's likely cheaper than the $0/mo opensource but requires part or full time management option.
This is a total fantasy. There is no reason to expect that the crappy enterprise tool that costs money will save time relative to the open source tool. In my experience, enterprise tools take more time on average and cost money. This line of reasoning (frequently pushed by dishonest salespeople) is seductive because it tricks you into ignoring the time cost of dealing with enterprise, not because it is correct.
I think you must be in some terrible giant corporate entity where everything sucks all the time.
I bet that the cost/benefit analysis I described is being done in every startup that Y Combinator funds. I bet they are all choosing product over staff, because their engineering staff is already their ~largest expense. Their engineers might even cost more than the rest of the company combined, including executives.
How many companies are choosing SaaS/PaaS/alltheS so their lean team of engineers can scale to millions, instead of hiring a whole team to manage AWS and Docker or whatever ops strategy they have?
You might think it's a total fantasy and in the places you work it may be, but I've been in countless meetings with countless executives where the Google Sheet is busted out featuring engineering cost and tool cost and where cost/benefit is aggressively decided.
Spoiler: it's almost always cheaper to use the tool.
> I've been in countless meetings with countless executives where the Google Sheet is busted out featuring engineering cost and tool cost and where cost/benefit is aggressively decided.
That is the problem though. It isn't the engineering cost vs. the tool cost. It is the engineering cost vs. the tool cost PLUS the engineering cost of dealing with the tool once you buy it. Everything you have said so far leads me to believe you are missing this aspect of the cost of buying the tool.
You are right that there is a time and a place for buying over DIY, but in order to make those decisions reliably you need to know how much effort is going to go into dealing with the tool once you buy it. This isn't something you can figure out using Google sheets, because you have to actually evaluate the tool and get a sense of how dangerous the foot guns are.
You're probably right about scaling though. That sounds like an area where the ROI of paying someone else to do it is pretty good.