I've been wondering when we'll see it unambiguously showing up in there. I suspect this time next year it'll be visible for sure, maybe Q4 of this year?
At least here in SF the ideal thing would be that any vehicle dropping off in the bike lane gets fined or ticketed. This includes Waymo, Uber, cabs, personal cars, whatever. In practice it's very rare to get a ticket for this, which is why customers expect it from both Waymo and Uber.
This is super cool and exactly what I've been looking for for personal projects I think. I wanna try it out, but the "agent" part could be more seamless. How does my coding agent know how to work this thing?
I'd suggest including a skill for this, or, if there's already one, linking to it on the blog!
Saying nothing about the actual performance of this model, it does strike me how .... minimal(?) this announcement is. Their safety section is like 2 paragraphs about bioweapons. Go look at the reports for OpenAI and Anthropic's model releases. It's like 50+ pages of tests, examples, reports, and benchmarks across a bunch of safety and welfare metrics.
If Meta wants to be seen as a cutting edge massive lab they need to come across as one instead of looking like a school project version of a frontier model.
Funny contrast with Anthropic. Ant does a "hero run," gets a model much more powerful than they expect. Meta does a hero run, gets a model much more mediocre than they expect. Read into this what you will, I guess?
That's it. It's just a rumor. A model, which I don't even know if it's this one specifically, fell short of expectations. This rumor came up around mid-March.
I wonder if this will actually be why the models move to "neuralese" or whatever non-language latent representation people work out. Interpretability disappears but efficiency potentially goes way up. Even without a performance increase that would be pretty huge.
I just can't find myself summoning the energy to be mad about markdown. It's good enough for like 99% of the things I use it for. Sometimes I get annoyed at specific extension support or whatever when I realize I shouldn't be using markdown for that task.
> The devices are either dangerous, or they're not
That's not actually how it works though, it's all risk and percentages. Nobody says "driving is either safe or it's not" or "delivering a baby is either safe or it's not".
Correct, but I agree with the parent that this is a dubious case to apply that reasoning.
To make it clearer, imagine another context: "It's dangerous for a passenger to have a gun on board. Therefore, we're strictly limiting passengers to only two guns."
Like, no. The relevant sad case is present with one gun just as with two.
Of course, what complicates it is that these power banks present a small but relevant risk of burning and killing everyone on board. So yeah, you might be below the risk threshold if everyone brought two, but not three. So it's not inherently a stupid idea, but requires a really precise risk calculation to justify that figure.
That's not really how risk is managed in aviation. ICAO will have made a list of all possible ways a power bank could create a hazard. Then for each failure mode, they'll come up with two numbers: probability, and severity. There's a formula to combine those two numbers into a single risk score. Any risks over the acceptable threshold (varies depending on the circumstances and I can't remember what it is for human-rated transport) must be mitigated.
A mitigation is anything that reduces the probability or the severity of a risk. There are different categories of mitigation, some of which are more robust than others. Once the risk score moves below the acceptable threshold, the risk is satisfactorily mitigated.
Example: Rapid depressurization. Without mitigation, the risk of rapid depressurization is unacceptably high. So we mitigate the probability by requiring sensitive inspections for metal fatigue, and we mitigate the severity by providing oxygen masks, a standard flight crew procedure for making an emergency descent, and regular training on that procedure. (Plus a bunch of other things I'm not thinking of off the top of my head.)
Assuming ICAO did their due diligence - and I don't have any reason to think they didn't - they would've assessed the probability and severity of all of the ways a consumer power bank might fail. That analysis is the rationale for both the number of power banks allowed on a flight and what you're allowed to do with them. And yes, they will have considered the probability of people not following the rules (which is the reason, btw, that airplane lavatories have enormous "no smoking" signs right above an ash tray).
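To make the probability/severity scoring above concrete, here's a rough sketch. Everything numeric in it is made up for illustration: the 1-5 scales, the multiply-them-together formula, the threshold of 6, and the example values are all assumptions, not ICAO's actual figures or method.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real acceptable level depends on the context.
ACCEPTABLE_THRESHOLD = 6


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int  # illustrative scale: 1 (extremely improbable) .. 5 (frequent)
    severity: int     # illustrative scale: 1 (negligible) .. 5 (catastrophic)

    def score(self) -> int:
        # Assumed formula: a simple product of the two numbers.
        return self.probability * self.severity

    def acceptable(self) -> bool:
        # Only risks over the threshold need mitigation.
        return self.score() <= ACCEPTABLE_THRESHOLD


def mitigate(risk: Risk, probability_reduction: int = 0,
             severity_reduction: int = 0) -> Risk:
    """Return a new Risk with probability and/or severity lowered (floor of 1)."""
    return Risk(
        risk.name,
        max(1, risk.probability - probability_reduction),
        max(1, risk.severity - severity_reduction),
    )


# Made-up numbers for one failure mode.
unmitigated = Risk("power bank thermal runaway in cabin", probability=3, severity=5)

# E.g. a per-passenger limit lowers probability; keeping the device within
# reach of crew with fire containment gear lowers severity.
mitigated = mitigate(unmitigated, probability_reduction=1, severity_reduction=2)

print(unmitigated.score(), unmitigated.acceptable())  # 15 False
print(mitigated.score(), mitigated.acceptable())      # 6 True
```

The point is just that a rule like "two power banks, not three" falls out of where the score lands relative to the threshold, rather than from a binary "dangerous or not" call.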
This is the first decent answer, which I appreciate. And while my comparison to a bomb may have been over the top, I don't think a comparison to shampoo is fair either. And in any case, I'm not so sure the limit on toiletries is all that sensible either.