I'm sure it varies, but personally I have a very prosaic reason why I'd still drive myself in most scenarios: if someone else is driving, I tend to get motion sickness.
I'm definitely not the most comfortable writing in public forums, so guilty as charged with throwing my comments through an LLM to make sure my point isn't being misconstrued.
I think Google basically _is_ the standards committees, at this point. Not in the sense of having majority control just by themselves, but in the sense of (1) the cartel being argued over here (browsers funded by Google) having that or close to it, and (2) Chrome being the main source of new features getting implemented, so that the job of the standards committees is mostly to play catch-up with Chrome.
> Aren’t coding copilots based on tokenizing programming language keywords and syntax?
No, they use the same tokenization as everyone else. There was one major change from early to modern LLM tokenization, made (as far as I can tell) for efficient tokenization of code: early tokenizers always made a space its own token (unless attached to an adjacent word.) Modern tokenizers can group many spaces together.
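To illustrate with a toy regex splitter (this is not any real tokenizer's merge table, just a sketch of the two behaviors): one version can only attach a single space to the word that follows it, while the other can swallow a whole run of spaces as one token.

```python
import re

def tokenize_early(text):
    # Early-style toy rule: a space either attaches to the next word
    # (" word") or stands alone, so indentation becomes a pile of
    # single-space tokens.
    return re.findall(r" \S+|\S+| ", text)

def tokenize_modern(text):
    # Modern-style toy rule: a run of spaces is a single token, so
    # 8 spaces of indentation cost 1 token instead of ~8.
    return re.findall(r" +|\S+", text)

line = "        return x + y"   # a typical indented line of code
print(tokenize_early(line))    # 7 lone spaces, then " return", " x", " +", " y"
print(tokenize_modern(line))   # one 8-space token, then words and lone spaces
```

On this one line the early-style splitter emits 11 tokens and the run-grouping one emits 8; multiply that across a deeply indented file and the savings are why the change mattered for code.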
Merging like that doesn't work -- it will tend to overestimate the number of distinct elements.
This is fairly easy to see if you consider a stream with some N distinct elements, with the same elements in both the first and second halves of the stream. Then, supposing that p is 0.5, the first instance will result in a set with about N/2 of the elements, and so will the second. But they won't be the same set; on average their overlap will be about N/4. So when you combine them, you will have about 3N/4 elements in the resulting set, but with p still 0.5, so you will estimate 3N/2 instead of N for the final answer.
I have a thought about how to fix this, but the error bounds end up very large, so I don't know that it's viable.
Robbery/burglary? SWATing? The possibilities are delightful and endless. The former is a major concern for people who are known to be rich; the latter for people who are infamous online (and has the [dis]advantage that it can be carried out by anybody, anywhere in the world, typically repeatedly, and usually with zero repercussions.)
Thanks for linking the graph, that's kind of wild. I agree with you that the lowest datapoint seems crazy. I can think of a few explanations.
- Random bad luck.
- As you say, failing to control for something -- although, if you then treat the lowest datapoint as being effectively the default risk, this would suggest support for radiation hormesis (that people who got a bit more than background radiation actually did better.)
- Some kind of data collection artifact. Perhaps the people with the absolute lowest dose, in a radiation-worker dataset, are selected for being ones who are not getting an accurate measurement (e.g. sloppy about wearing dose badges or something), and those people genuinely do have worse outcomes.