This is one of the most confusing claims I've seen in a long time. Grep and similar tools over files are the equivalent of an old-fashioned keyword search, whereas most RAG uses vector search. But everything else they claim about a file system just suggests that they don't know anything about databases.
I'm not familiar with how most out-of-the-box RAG systems categorize data, but with a database you can index content in literally any way you want. You could do it like a filesystem with a hierarchy, you could do it with tags, or with any other design you can dream up.
The search can be keyword, like grep, or vector, like RAG, or use the ranking algorithms that traditional text search uses (tf-idf, BM25), or a combination of them. You don't have to use just the top X ranked documents; you could, just like grep, evaluate all results past whatever matching threshold you have.
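To make that concrete, here is a minimal sketch of the same idea in plain Python: an unranked grep-style match, a hand-rolled BM25 ranking, and a "return everything above a threshold" query instead of a top-X cutoff. The function names, the sample documents, and the `k1`/`b` defaults are my own assumptions for illustration, not anything from the article or from a particular search engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def grep_like(pattern, docs):
    """Return indices of all docs containing a regex match, unranked."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, doc in enumerate(docs) if rx.search(doc)]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranked_above(query, docs, threshold=0.0):
    """Return (index, score) for every doc above the threshold, best first."""
    scored = [(i, s) for i, s in enumerate(bm25_scores(query, docs)) if s > threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample corpus.
DOCS = [
    "grep searches files line by line for a pattern",
    "vector search ranks documents by embedding similarity",
    "BM25 is a ranking function used by full text search engines",
]
```

Calling `grep_like(r"\bgrep\b", DOCS)` gives you the exhaustive, unranked behaviour, while `ranked_above("search", DOCS)` gives you every match with a score attached, so the caller decides where to cut off. Swapping the scoring function for a vector similarity would not change the shape of the interface.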
Search is an extremely rich field with a ton of very good established ways of doing things. Going back to grep and a file system is going back to ... I don't know, the 60s level of search tech?
Sorry, this still makes no sense. LLMs don't care about files. The way most coding systems work is that they simply provide the whole file to the LLM rather than a subset of it. That's just a choice in how you implemented your RAG search system and database. In this case the "record" is big: a file. No doubt that works for code, but it's nonsensical outside that.
E.g. for Wikipedia the logical unit would likely be an article. For a book, maybe it's a chapter, or maybe it's a paragraph. You need to design the system around your content and feed the LLM an appropriate, logically related set of data.
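A rough sketch of what "design around your content" can look like: the same text split into records at whichever logical unit you pick. The `split_units` helper, the `Chapter N` heading convention, and the sample text are all assumptions I made up for the example; real content needs its own splitting rules.

```python
import re


def split_units(text, unit="paragraph"):
    """Split text into records at the chosen logical unit."""
    if unit == "paragraph":
        # Paragraphs are separated by one or more blank lines.
        parts = re.split(r"\n\s*\n", text)
    elif unit == "chapter":
        # Assumed convention: each chapter starts with a "Chapter <number>" line.
        parts = re.split(r"(?m)^(?=Chapter \d+)", text)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample document.
BOOK = (
    "Chapter 1\nFirst paragraph.\n\n"
    "Second paragraph.\n\n"
    "Chapter 2\nThird paragraph."
)
```

Here `split_units(BOOK, "chapter")` yields two records and `split_units(BOOK, "paragraph")` yields three, so the record size becomes a property of the content rather than of the storage layer.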
Oh but they do. These CLI agents are trained and specifically tuned to work with the filesystem. It’s not about the content or how it’s actually stored, it’s about the familiar access patterns.
I can’t begin to tell you how many times I’ve seen a coding agent figure out it can get some data directly from the filesystem instead of a dedicated, optimized tool it was specifically instructed to use for this purpose.
You basically can’t stop these things from messing with files; it’s in their DNA. You block one shell command, they’ll find another. Either revoke shell access completely or play whack-a-mole. You cannot believe how badly they want to work with files.
Yeah, some of the uplift people are anecdotally seeing from “just using the filesystem” is, imo, on account of how difficult it is to take a principled approach to pre-chunking when implementing other approaches.
Yeah I’ve had a lot of success with agentic search against a database.
The way I think of it, the main characteristic of agentic search is just that the agent can execute many types of ad hoc queries.
It’s not about a file system
As I understood it early RAG systems were all about performing that search for the agent - that’s what makes that approach “non agentic”
But when I have a database that has both embeddings and full text and you can query against both of those things and I let the agent execute whatever types of queries it wants - that’s “agentic search” in my book
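Roughly what I mean, as a toy sketch: one SQLite table holding both the text and an embedding, and two query tools the agent is free to call in whatever order it likes. The table layout, the tool names, and the 2-dimensional "embeddings" are assumptions for illustration only; a real setup would use a proper embedding model and probably a full-text or vector index instead of `LIKE` and a Python loop.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_db(rows):
    """Build an in-memory table of (body, embedding) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")
    db.executemany(
        "INSERT INTO docs (body, embedding) VALUES (?, ?)",
        [(body, json.dumps(vec)) for body, vec in rows],
    )
    return db


def keyword_query(db, term):
    """Substring match on the text column, like a crude grep."""
    cur = db.execute(
        "SELECT id, body FROM docs WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return cur.fetchall()


def vector_query(db, query_vec, min_sim=0.5):
    """Return every row whose embedding is similar enough, best first."""
    hits = []
    for doc_id, body, emb in db.execute("SELECT id, body, embedding FROM docs"):
        sim = cosine(query_vec, json.loads(emb))
        if sim >= min_sim:
            hits.append((doc_id, body, sim))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# The "agentic" part: the agent picks which tool to call, and how often.
TOOLS = {"keyword": keyword_query, "vector": vector_query}


def run_tool(db, name, arg):
    """Dispatch a tool call chosen by the agent."""
    return TOOLS[name](db, arg)
```

The point of the dispatch table is that nothing decides the query type on the agent's behalf: it can run `run_tool(db, "keyword", "invoice")`, look at the result, and then follow up with a vector query if the keyword search came back empty.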
Absolutely, agentic search is much more robust to the specific implementation details of your search setup (data quality issues, too) than the early one-shot approaches were. Anyone watching Claude Code work can see this for themselves.
I didn't get into the details too much, but I kept thinking, why isn't he just having an agent discover things from various data sources? I've had much better success with that.
Also odd in that most filesystems implement directories and file names as...a database. You can use a filesystem as a database but you're not being as clever as you thought.
Note that the vast majority of science requires physical experiments. We are very, very far from automating that overall. There are some niche areas where people are working on robotics to automate particular types of experiments, but the idea of "all science being automated" is not something that will occur in our lifetimes.
Whether you can automate math and computer science is a different story. It's possible, but I don't believe we are remotely as close as 2028. LLMs have had some successes here, but they usually excel at optimization rather than breakthroughs.
There's a lot that would have to go right to get to "all science", but isn't robotics itself a field pretty amenable to automation? A server rack might have trouble building new hardware, but it seems not terribly hard to imagine an LLM-based model deploying new experimental algorithms to the hardware and extracting their performance from a camera feed.
With humanoid robots, a large chunk of what would otherwise be highly expensive to automate becomes possible. "ALL" science may not be automatable. But lots will be.
Absurd. The scientific apparatus is already automated. What are you going to do, have your humanoid robot do the pipetting when there is already a specialized machine that fills trays of 100 samples every 5 seconds? (Totally made up example.)
There might be a way to phrase the future as a tradeoff of capital expenditures; at least that argument would be worth reading about.
Most science is not automated like this in practice. You only see robotic pipetting and fluid handling when you're looking at something more like production or development or you have a truly ridiculous amount of variations to try that are otherwise extremely uniform.
To educate others reading this, it's far from "obvious" how to classify gender in sports. Checking if they have the right "parts" physically doesn't do it. Checking for hormone levels doesn't do it. Even checking for Y chromosomes doesn't do it.
In my opinion the way forward is to stop trying to find arbitrary ways to define gender, and just start making competition classes based on whatever factors are relevant to the event. E.g. a woman with high testosterone? She can compete with men or women in the same testosterone bracket. This would also let men with low T compete fairly rather than be excluded from the games.
It's also relevant at what point other genetic changes are "unfair." There are absolutely genetic traits that give people HUGE advantages in various competitions. Just like the gender-related properties, these are natural and yet result in unfair competitions.
The problem with your proposed 'fuzzy divisions' is that they're not compatible with the zeitgeist of 'seeing the best compete', and 'drug-free' sports, as there's no reason to disallow performance-enhancing-drugs if we're already splitting into divisions.
Actually, you bring up an excellent additional argument for the sort of bracketing I proposed. It also works for drugs!
There is a significant grey area with respect to "doping" too, in the sense that a performance-enhancing drug may show up as a larger-than-normal amount of a naturally occurring substance. So did the person dope, or is that their natural genetics? In my scheme, WHO CARES!
Beyond that, I suppose there is the usual argument against more serious and non-natural forms of doping that it is physically detrimental to the competitors and by allowing it you are encouraging or pressuring people to essentially harm themselves.
Still, competition classes could be helpful in some of the doping grey areas.
Thank you. But the Y test still seems sufficient. Every criterion will have false positives and negatives. With the Y test the false negative (you present as a woman but have a Y chromosome) is rare and the vast majority of cases are handled well. If you have this condition you must compete against men (given the Y chromosome test rule) or not compete. If you’re dying to be in the Olympics as a woman but have the Y chromosome, you’re just out of luck. Not everyone can be a concert pianist either. No rule makes things wonderful for 100% of humans. The Y test gets very close.
But that's a contradiction, no? We're saving women from other women and barring trans people also (ones we consider men) because of a perceived risk that I don't see evidence for (i.e. people choosing to compete as women on a malicious basis or with an 'innate advantage' that makes it dangerous - we've had a long time of running these sports without this sort of regulation, and it seems to be a political choice more than a reaction to evidence that women are being outcompeted by trans people). This is also assuming that having a y chromosome makes it fair for people with a y chromosome to compete against one another, but if you compare people's physiology these people who present as women often have low/no testosterone. Separating on the line of testosterone picks up a lot of female athletes (especially at the olympic level) that are not trans, and overall I just see this hurting women without evidence that it's actually a response to harm. In any case, trans people and gender non conforming women become the victims of this in the public sphere.
It just seems very misguided.
High-level sport consists entirely of outliers. That’s kind of the point of the Olympics. This newest rule is nothing more than a misogynist rule to turn the women’s division into the “no more than statistically average” division.
Almost every gold medal winner in the past games would not have been affected by this new rule, so that's a biiit hyperbolic. Those athletes are still far outside the normal performance of women (or men, for that matter).
If you pay attention, your source has an asterisk of “typically” and “usually”, as well as a distinction between phenotype and karyotype traits. While it is true that the majority of people with a Y chromosome are male, there are many people with Y chromosomes you’d call female because of their phenotype (which is what society primarily cares about), among other circumstances.
I specifically said sex. Gender is mostly undefined. If you say that gender is the societal presentation as male or female, but you can’t define male from female, then what are you defining? It’s the “trans women are women” contradiction.
For Swyer syndrome, a 2017 study estimated that the incidence is approximately 1 in 100,000 females. Fewer than 100 cases had been reported as of 2018.
For both of the genetic disorders, they would have to be beneficial, or at least not a disadvantage, for elite sport activity in order to be an issue for misclassification. For a sex-determination system, they could simply add an exception for Swyer syndrome and postpone the decision until such an individual presented themselves at an Olympic competition.
I am going to try to keep my response apolitical to try to avoid fanning a culture war. That Wiki is the exact reason we are in this situation because we are bringing up points for 1 in 20000 or 0.005% of the population. Any system designed around 0.005% edge cases is going to be so complex that it is functionally impossible to do in practice. That is why one side says the solution is "obvious" because we have a simple rule that covers 99.9% of cases and the other 0.1% is unfortunately effectively barred from high level competition. Note, high level competition already bars 99.9% of people. Even though the opposing side is correct in pointing out these edge cases, it does nothing to advance an actual solution.
There are statistically around 15 women AFAB with XY chromosomes in the NCAA by those numbers (assuming no correlation between Swyer syndrome and athletic performance).
There are currently around 10 openly transgender women in the NCAA.
Sure, it covers 99.9% of cases, but top elite athletes are the genetic exceptions, they are the genetic freaks. They are the top 0.0001%. You don't get to compete at the most elite levels without your body being exceptionally gifted and almost specifically shaped for the relevant sport, which inevitably means funky genetic traits and disorders, higher testosterone levels etc.
I mean the word freak in the most loving and caring way possible, mind you.
I am not sure what point you are trying to make. When it comes to the Olympics, it was decided a long time ago that having both men's and women's events was beneficial for societal progress, so that both sexes would be represented. This was at a time when sex=gender. Now we recognize the difference between sex and gender, but one side thinks the split of events was always based on gender, whereas it was almost surely based on sex. This ruling confirms that viewpoint.
Except I proposed a solution, which you ignored (I'm assuming here that I'm your "opposing side".)
Also, there are a significant number of these sorts of arguments in high-level sports, probably precisely because these "0.1%" cases are exactly the ones that result in exceptional ability relative to norms. It's also curious that there is such obsession about naturally occurring genetic outliers with respect to females or gender but absolute silence about naturally occurring genetic outliers among men unrelated to gender. And surprise surprise the top athletes often have such outlier genetics!
If you're drawing a distinction between natural genetic difference related to only gender and no other factors then sadly it's exactly a culture war, not a war based in science or fairness.
Because in a specific minority of the population it disagrees with the gender assigned at birth for obvious reasons. There are plenty of resources you could read on intersex instead of lol at something you don’t understand
The EU governance system is vastly different than the US, and not nearly as fragile. Even if AfD gets sway in one country, it doesn't mean that suddenly they can do anything they want like you saw in the last US election.
My understanding of the EU system is that it's far more proportional in representation, and a simple 51% isn't enough to have 100% control. Parties still need to work together and compromise.
> My understanding of the EU system is that it's far more proportional in representation, and a simple 51% isn't enough to have 100% control. Parties still need to work together and compromise.
We've already seen with Brexit that 100% control is not needed in a parliamentary system to destroy a country's livelihood. But my point was that AfD doesn't need something like "presidential control" of the EU, it would just need to start working with other far-right parties in the EU such as Hungary and France's RN to sow chaos from within. Is that very far-fetched? You can't tell me that most of Europe doesn't hold its collective breath at every French election, crossing their fingers that Le Pen's party doesn't win this go around.
Especially odd as that’s exactly what Kagi assistant already does. Maybe they’d just rather use their key than pay Kagi for LLM based search.
On that note, Kagi research is legit amazing. There have been times I’ve spent 30 minutes searching for something without success. As a last resort I asked Kagi research and it found what I could not. More than one option, even. Now I tend to use it almost more than normal search.
Yeah, I agree with other comments here that the traditional search offering has gotten a bit worse (I think because the whole web has gotten worse), but research surfaces great results. Forums, small blogs and websites with authoritative views on subjects I search for. Really great. Yes, it is as expensive as any other AI pro plan unfortunately, but worth it for me.
I'm extremely confused by these comments. Are we all using the same google? Just to make sure I wasn't crazy I just did a search on Google and 1/2 the page was a combination of google AI result and ads. Below that there were 2.5 links visible. One reddit result, and two blogspam.
The exact same search on Kagi ('best lllm for coding') nets reddit, hacker news, and some other forum results right at the top, followed by a long dense list of links to various sites (including some of the same blogspam of course), but overall the results are hugely more rich and varied, and also not at all the same.
How can you possibly say that a site that gives you 50% ads and a bunch of low quality links is remotely "only a little better" than a site that gives you zero ads and a huge number of better quality links?