
The recent announcement that review articles and position papers would be rejected already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/




> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right.

This has been a common practice in physics, especially the more theoretical branches, since the inception of arXiv. Senior researchers write a paper draft, send copies to some of their peers, get and incorporate feedback, and just submit to arXiv.


And this is really how it should be. Honestly, the only thing I want arXiv to do is become more like OpenReview: allow comments by peers and better linking to data and project pages.

It works for physics because physicists are very rigorous, so papers don't change very much. It also works for ML because everyone is moving so fast that it's closer to doing open research. Sloppier, but as long as the readers are other experts, it's generally fine.

I think research should really just be open. It helps everyone. The AI slop and mass publishing are exploiting our laziness: we evaluate people on quantity rather than quality. I'm not sure why people are so resistant to making this change. Yes, it's harder, but it has a lot of benefits. And at the end of the day it doesn't matter if a paper is generated if it's actually a quality paper (not just in how it reads, but in the actual research). Slop is slop and we shouldn't want slop regardless. But if we evaluate on quality and everything is open, it becomes much easier to figure out who is producing slop, and to spot collusion rings, plagiarism rings, and all that. A little extra work for a lot of benefits. But we seem to be willing to put in a lot of work to avoid doing more work.


You could imagine separating the "publishing" part, which really should just be open with minimal anti-spam etc, from the "this was reviewed by a trusted group of people so you should give it more consideration" part. You could do the second without it being attached to the publishing.

I think your phrasing was good. A lot of people conflate being published with being peer reviewed, and assume that "peer reviewed" means "correct".

When you think about publishing as what it actually is, researchers communicating with researchers, what I said makes much more sense. I do think formal review helps reduce slop, but anyone who has published anything is also very aware of how noisy the system is and how good works get rejected or delayed because they aren't "novel" enough.

Honestly, my ideal system is journals with low bars. We forget this prestige bullshit and the silliness of novelty (often it's novel to niche experts but not to others) and basically check that due diligence appears to have been done, that nothing is obviously wrong, that there's no obvious plagiarism, and then maybe a little back and forth to help communicate. But I think we've gotten too lost in this idea of needing to publish fast and that everything has to be important. Important to who? Tons of stuff is only considered important later; we've got a long track record of not being so great at that. But we have a long track record of at least some people working on what we later find out is important.


There's a lot of stuff with basic errors in peer reviewed journals. Things also can get rejected for anything from formatting to politics.

I like arXiv better. I get the paper, know it's probably not reviewed (like in many journals), and review it if I want to. I used to use CiteSeerX, too, to get tons of CompSci papers. Even better, OpenReview might have some good observations.


Actually, I don't agree that this is how it should, or can, work for everyone. Senior researchers produce good-quality research, and they have a network of high-quality peers built over decades. Both are necessary for them to reach out, ask for feedback, and get genuine, high-quality feedback.

Junior researchers typically don't have these. They also benefit more from anonymous feedback, which enables reviewers to bluntly identify wrong or close-to-wrong results. So I think open journals should continue to exist. They fill an essential role in the scientific ecosystem.


Mostly I'm fine with journals and conferences, but I think it's the prestige that has fucked everything over.

I want reviews of my papers! But I want reviews by people who care. I don't want reviews by people who don't want to review. I don't want reviews by people who think it's their job to reject or find flaws in the work. I want reviews by people who care. I want reviews by people who want to make my work better. I want reviews by people who understand all works are flawed and we can't tackle every one in every paper (the problem isn't solved, so there's always more!).

So low bars. Forget the prestige, citation count, novelty, and all the bullshit and just focus on the actual work and that the act of publishing is about communicating. Publishing is the main difference between private and public labs. Private labs do fine research, without all the formal review. It's just that nobody learns about it. They don't give back to the community.

So my ideal system still has reviewers, journals, and conferences but I think we'd get along just fine without them. I believe that if we can't recognize that then we can't use these other tools to make things better.

They aren't fundamental tools needed to make the process work, they're tools that can make the process work better. But I'm not convinced they're doing a good job of that right now.


>We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before

What are you referring to? Who is being appeased that shouldn't be? What are you worried about happening?


I came here to say something similar. As someone who works in a field that applies machine learning but is not purely focused on it, I interact with people who think that arXiv is the only relevant platform and that they don't need to submit their work to any journal, as well as people who still think that preprints don't count at all and that data isn't published until it's printed in an academic journal. It can feel like a clash of worlds.

I think both sides could learn from the other. In the case of ML, I understand the desire to move fast, and that an average time to publication of 250-300 days at some top-tier journals can feel like an unnecessary burden. But having been on both sides of peer review, there is value to the system, and it has made for better work.

Not doing any of it follows the same spirit as not benchmarking your approach against more than maybe one alternative, and that only as an afterthought. Or benchmaxxing without exploring the actual real-world consequences, time and cost trade-offs, etc.

Now, is academic publishing perfect? Of course not, very far from it. It desperately needs reform: to keep it economically accessible, to make it time-efficient for authors, editors, and peer reviewers alike, to prevent the "hot topic of the day" from dominating journals, and to make sure that peer review aligns with the needs of the community and actually improves the quality of the work, rather than letting "malicious peer review" sneak in citations or pet peeves.

Given the power that the ML field holds and the interesting experiments with open review, I would wish for the field to engage more with the scientific system at large and perhaps try to drive reforms and improve it, rather than completely abandoning it and treating a PDF hosting service as a journal (of course, preprints would still be desirable and are important, but they cannot carry the entire field alone).


Simply anticipating basic pushback from reviewers makes sure that you do a somewhat thorough job. Not 100% thorough, and the reviews are sometimes frivolous, lazy, and stupid. But just knowing that what you put out there has to pass the admittedly noisily gatekept gate of peer review improves papers overall, in my estimation. There is also a negative side, because people try to hide limitations and honest assessments, and cherry-pick and curate their tables more in anticipation of knee-jerk reviewers. But overall I think that without any peer review, author culture would become much more lax and bombastic, and would generally trend toward engagement bait and content optimized for social media attention.

The current balance, where people write a paper with reviewers in mind, upload it to arXiv before the review concludes, and keep it on arXiv even if rejected, is a nice one. People get to form their own opinion on it, but there is also enough self-imposed quality control, just from wanting it to pass peer review, that even a rejected paper is better than one written without anticipating peer review at all. And this works because people are somewhat incentivized to get peer-reviewed official publications too. But being rejected is not the end of the world either, because people can already read the work and build on it via arXiv.


I really am not sure about that: https://biologue.plos.org/wp-content/uploads/sites/7/2020/05...

The problem is that "optimizing for peer review" is not the same thing as optimizing for quality. E.g., I like to add a few tongue-in-cheek remarks to entertain the reader. But then I have to worry endlessly about anal-retentive reviewers who refuse to see the big picture.


Currently, a kind of rule of thumb is that a PhD student can graduate after approximately 3 papers published in a good peer-reviewed venue.

If peer review were to go away, this whole academic system would get into a crisis. It's dysfunctional and has many problems, but it's kinda load-bearing for the system to chug along.


Maybe their institution should evaluate whether their papers pass muster? It's the one conferring the degree.

No hard rule, no crisis.

Maybe we can go back to very opinionated “true” academia, where there are institutional gatekeepers but they mostly get it right on who to award (and not), vs. the current game of “whoever plays ball with funding sources the best = the best academic”, which is obviously bullshit.


You'll still need to convince the purseholders to pay you, and they'll want some objective metric to measure your output, and whatever metric they pick will be gamed.

The point of my comment was that in much earlier institutions of knowledge and excellence, the only transparent metric was whether or not they approved you.


That ossifies intellectual monocultures, though. (Or, heaven forbid, if someone has a financial conflict of interest in the private sphere...)

But this is already how the purse holders operate. A big group of experts gets together and votes on which grant proposals within a given category to fund.

I think it comes down to how the system is structured and how many players there are. The more difficult it is for a small cult to capture control of the funding (or access to instrumentation or awarding of degrees or whatever) for a given area the less likely you are to end up with a monoculture.

Assuming the majority of the funding continues to come from governments then you have a centralized point of leverage that can shape the system. So it should be possible to impose constraints that result in a system that actively prevents monocultures from developing.


The current solution doesn’t resist capture by capital either, and indeed we’re already left with all of the things claimed: the worst of both worlds, really.


You may have delivered value in peer review, but on the whole, peer review delivers negative value. https://www.experimental-history.com/p/the-rise-and-fall-of-...

The arXiv vs. journal debate looks a lot like the "should the work get done, or should the work get certified?" split you see all over "institutions", and if the certification does not actually catch frauds or errors, it isn't making the foundations stronger, which is usually the only justification for the latter side.


Can't say I agree with that position.

Responding largely to the linked article, you can't just ignore the massive increase in funding and associated output that occurred. Scaling almost any system up will be expected to result in creative new failure modes. It's easy to observe that a system isn't great and suppose that removing it would improve things but this very often isn't the case. Democracy is one such example.

There's also the publishing ecosystem that developed around the increased funding. It isn't clear to me why any blame (if it's even valid, see preceding paragraph) should be laid at the feet of the practice of peer reviewing publications rather than such an obviously dysfunctional institution.

Even if we accept that the way publications have been undergoing peer review is somehow the root of all evil (as opposed to the for-profit publication of taxpayer-funded work), there's more than one way to go about it! A glaringly obvious problem, mentioned in the linked article yet not meaningfully addressed as far as I saw, is that peer reviewers aren't paid. If this were a compensated task, presumably it would be performed much more rigorously. Building inspectors aren't volunteers, and they seem to do a good enough job.


What's the value of academic publishing over the arxiv model of freely publishing, free access, and a global, vigorous discussion across a wide range of platforms, with experts, researchers, amateurs, institutions, and the peanut gallery all having the opportunity to participate?

What possible value does a journal like Nature, for example, bring to the table by claiming a paper for themselves and charging people for it, given the alternative?

I don't see any value there. Maintaining an exclusive clique by using artificial scarcity while coasting on the dregs of reputation remaining to a once prestigious institution is what a lot of these journals are doing.

The world has changed. There's no need for that sort of pay to play gatekeeping, and in fact, the model does tremendous damage to academic and intellectual integrity. It allows people to get away with fraud and it makes the institutions motivated to hide and cover it up so as to not damage their own reputations by admitting anything slipped by them.

If you contrast the damage done by journals, with regards to suppressed research, gatekept access, money taken from researchers and readers alike, against the value they might plausibly provide, the answer is clear.

They're not needed anymore. The AI era, since 2017, has thoroughly demonstrated that journals are materially incapable of keeping up, that they're unable to meaningfully contribute to the field, and that their curation or other involvement has no effective practical value. The same is true for other fields, but everyone involved wants to keep their piece of the grift going as long as possible.

We don't need them, anymore. I suspect we never did.


The value is the ability to do science as a career without being independently wealthy.

Politicians, administrators, donors, and taxpayers don't want scientists deciding on their own how to spend the money. They want control over what gets funded. They want funding decisions with justifications they can understand. But they don't understand the science itself, so they need "objective" metrics to support the decisions. And because those metrics matter, people will inevitably game them.


I've noticed it's field dependent. Some fields don't really feel much need to publish in a real journal.

Others (at least in chemistry) will accept it, but it raises concern if a paper is only available as a preprint.


> arXiv fulfills its function better the less power it has as an institution

It is an interesting instance of the rule of least power, https://en.wikipedia.org/wiki/Rule_of_least_power.


The irony of the TBL quotes there is that the entire problem with the semantic web is the ontological tarpit that results from the excessive expressive power of a general triple store.

Well, I’d argue that many things in the semweb are not expressive enough and lead to the misunderstandings we have.

People think, for instance, that RDFS and OWL are meant to SHACL people into bad and over-engineered ontologies. The problem is that these standards add facts and don't subtract facts. At the risk of sounding like ChatGPT: it's a data transformation system, not a validation system.

That is, you’re supposed to use RDFS to say something like

  ?s :myTermForLength ?o -> ?s :yourTermForLength ?o .

The point of the namespace system is not to harass you; it is to be able to suck in data from unlimited sources and transform it. The trouble is that it can't do the simple math required to do that for real, like

  ?s :lengthInFeet ?o -> ?s :lengthInInches 12*?o .

Because if you were trying OWL-style reasoning over arithmetic, you would run into Kurt Gödel kinds of problems. Meanwhile you can't subtract facts that fail validation, and you can't subtract facts that you just don't need in the next round of processing. It would have made sense to promote SHACL first instead of OWL because, garbage in, garbage out: you are not going to reason successfully unless you have clean data… but what the hell do I know, I'm just an applications programmer who models business processes enough to automate them.
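
For what it's worth, the arithmetic rewrite is expressible in plain SPARQL 1.1, even though RDFS/OWL entailment can't do it. A minimal sketch, assuming the Python rdflib library; the ex: vocabulary and the beam1 resource are made up for illustration:

  # Sketch: a SPARQL 1.1 CONSTRUCT doing the unit-conversion rewrite.
  # The ex: vocabulary is hypothetical.
  from rdflib import Graph, Literal, Namespace

  EX = Namespace("http://example.org/")
  g = Graph()
  g.add((EX.beam1, EX.lengthInFeet, Literal(3)))

  derived = g.query("""
      PREFIX ex: <http://example.org/>
      CONSTRUCT { ?s ex:lengthInInches ?inches . }
      WHERE {
          ?s ex:lengthInFeet ?feet .
          BIND (?feet * 12 AS ?inches)  # plain arithmetic, no reasoner
      }
  """)
  for triple in derived:
      g.add(triple)  # fold the derived facts back into the graph

  print(g.serialize(format="turtle"))

That transform-then-merge loop is the "add facts" style the standards encourage; what's still missing, as noted above, is a standard way to retract the facts you no longer want.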

Similarly, the problem of ordered collections has never been dealt with properly in that world. PostgreSQL, N1QL, and other post-relational and document DB languages can write queries involving ordered collections easily. I can write rather unobvious queries by hand to handle a lot of cases (wrote a paper about it), but I can't cover all the cases, and I know back in the day I could write SPARQL queries much better than the average RDF postdoc or professor.

As for underengineering, Dublin Core came out when I worked at a research library, and it just doesn't come close in capability to MARC from 1970. Larry Masinter over at Adobe had to hack the standard to handle ordered collections because… the authors of a paper sure as hell care what order you write their names in. And it is all like that: RDF standards neglect basic requirements they need to be useful, and then all the complex/complicated stuff really stands out. If you could get the basics done, maybe people would use them, but they don't.


> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, ...

In my experience as a publishing scientist, this is partly because publishing with "reputable" journals is an increasingly onerous process, with exorbitant fees, enshittified UIs, and useless reviews. The alternative is to upload to arXiv and move on with your life.


That’s true. But that’s separate from the use of arXiv in ML and blockchain circles as a form of marketing: borrowing academic appearances.

Every field and every publisher has this issue though.

I've read papers in the chemical literature that were clearly thinly veiled case studies for whatever instrument or software the authors were selling. Hell, I've read papers that had interesting results, only to dig into the math and find something fundamentally wrong. The worst was an incorrect CFD equation that I traced through a telephone game of 4 papers only to find something to the effect of "We speculate adding $term may improve accuracy, but we have not extensively tested this"

Just because something passed peer review does not make it a good paper. It just means somebody* looked at it and didn't find any obvious problems.

If you are engaged in research, or in a position where you're using the scientific literature, it is vital that you read every paper with a critical lens. Contrary to popular belief, the literature isn't a stone tablet sent from God. It's messy and filled with contradictory ideas.

*Usually it's actually one of their grad students


That sounds more like an issue of certain fields having crappy standards because the people in those fields benefit from crappy standards than an issue with the site they happen to host papers on.

I don’t buy “some fields are just more honorable”. Everyone uses publishing for personal gain.

But yes it’s a people problem, not an arxiv problem.


> and with just enough moderation to not devolve into spam and chaos

arXiv has become a target for grifters in other domains like health and supplements. I’ve seen several small-scale health influencers ChatGPT up some “papers” and then upload them to arXiv, then cite arXiv as proof of their “published research”. It’s not fooling anyone who knows how research works, but it’s very convincing to an average person who thinks they’re doing the right thing by following sources that have done academic research.

I’ve been surprised at how bad and obviously grifty some of the documents I’ve seen on arXiv have become lately. Is there any moderation, or is it a free-for-all as long as you can get an invite?


This is great news for anyone building tools on top of arXiv data. The API (export.arxiv.org/api/) is one of the best free academic data sources: a structured Atom feed with full abstracts, authors, categories, and publication dates.

I've been using it as one of 9 data sources in a market research tool — arXiv papers are a strong leading indicator of where an industry is heading. Academic research today often becomes commercial products in 2-3 years.
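
For anyone curious, a minimal sketch of hitting that endpoint with nothing but the Python standard library; the category and result count here are just illustrative:

  # Fetch the five most recent cs.LG preprints from the arXiv API
  # and print title, date, and authors from the Atom feed.
  import urllib.request
  import xml.etree.ElementTree as ET

  ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed
  url = ("http://export.arxiv.org/api/query"
         "?search_query=cat:cs.LG&start=0&max_results=5"
         "&sortBy=submittedDate&sortOrder=descending")

  with urllib.request.urlopen(url) as resp:
      feed = ET.fromstring(resp.read())

  for entry in feed.findall(ATOM + "entry"):
      title = " ".join(entry.findtext(ATOM + "title").split())
      published = entry.findtext(ATOM + "published")
      authors = [a.findtext(ATOM + "name")
                 for a in entry.findall(ATOM + "author")]
      print(published, title, "|", ", ".join(authors))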


Review papers are interesting.

Bibliometrics reveal that they are highly cited. Internal data we had at arXiv 20 years ago showed they were highly read. Reading review papers is a big part of how you go from a civilian to an expert with a PhD.

On the other hand, they fall through the cracks of the normal methods of academic evaluation.

They create a lot of value for people but they are not likely to advance your career that much as an academic, certainly not in proportion to the value they create, or at least the value they used to create.

One of the most fun things I did on the way to a PhD was writing a literature review on giant magnetoresistance for the experimentalist on my thesis committee. I went from knowing hardly anything about the topic to writing a summary that taught him a lot he didn't know. Given any random topic in any field you could task me with writing a review paper and I could go out and do a literature search and write up a summary. An expert would probably get some details right that I'd get wrong, might have some insights I'd miss, but it's actually a great job for a beginner, it will teach you the field much more effectively than reading a review paper!

How you regulate review papers is pretty tricky. For original research, the criterion of "is it original research?" is an important limit; reviews have no such natural cap. There might already be 25 review papers on a topic, but maybe I think they all suck (they might) and I can write the 26th and explain it to people the way I wish it had been explained to me.

Now you might say that in the arXiv age there was no limit on pages, but LLMs really do problematize things because they are pretty good at summarization. Send one off on the mission to write a review paper and in some ways it will do better than I do, in other ways worse. Plenty of people have no taste or sense of quality, and they are going to miss the latter; hypothetically people could do better as a centaur, but I think usually they don't, because of that.

One could make the case that LLMs make review papers obsolete, since you can always ask one to write a review for you, or just have conversations about the literature with them. I know I could have spent a very long time studying the literature on Heart Rate Variability before making up my mind about which of the 20 or so metrics I want to build into my application. I did look at some review papers, and can highlight sentences that support my decisions, but I made those decisions based on a few weekends of experiments and talking to LLMs. The funny thing is that if you went to a conference, met the guy who wrote the review paper, and gave him the hard question of "I can only display one on my consumer-facing HRV app; which one do I show?", he would give you that clear answer that isn't in the review paper, and maybe the odds are 70-80% that it would be my answer.


I exited academia for industry 15 years ago, and since then I haven't had nearly as much time to read review papers as I would like. For that reason, my view may be a bit outdated, but one thing I remember finding incredibly useful about review papers is that they provided a venue for speculation.

In the typical "experimental report" sort of paper, the focus is narrowed to a knife's edge around the hypothesis, the methods, the results, and the analysis. Yes, there is the "Introduction" and a "Discussion", but increasingly I saw Introductions become a venue for citation bartering (I'll cite your paper in the intro to my next paper if you cite that paper in the intro to your next paper) and Discussions turn into a place to float your next grant proposal before formal scoring.

Review papers, on the other hand, were more open to speculation. I remember reading a number that were framed as "here's what has been reported, here's what that likely means...and here's where I think the field could push forward in meaningful ways". Since the veracity of a review is generally judged on how well it covers and summarizes what's already been reported, and since no one is getting their next grant from a review, there's more space for the author to bring in their own thoughts and opinions.

I agree that LLMs have largely removed the need for review papers as a reference for the current state of a field...but I'll miss the forward-looking speculation.

Science is staring down the barrel of a looming crisis that looks like an echo chamber of epic proportions, and the only way out is to figure out how to motivate reporting negative results and sharing speculative outsider thinking.


My feelings about that outsider thing are pretty mixed.

On one hand, I'm the person who implemented the endorsement system for arXiv. I also got a PhD in physics, did a postdoc in physics, then left the field. I can't say that I was mistreated, but I saw one of the stars of the field today crying every night when he was a postdoc, because he was so dedicated to his work and the job market was so brutal -- so I can say it really hurts when I see something that I think belittles that.

On the other hand, I am very much an interested outsider when it comes to biosignals, space ISRU, climate change, synthetic biology, and all sorts of things. With my startup and hackathon experience, it is routine for me to look at a lot of literature in a new field, cook it down, realize things are a lot simpler than they look, and build a demo that knocks the socks off the postdocs because... that's what I do.

But the Riemann Hypothesis, Collatz, dropping the names of anyone who wrote a popular book, I don't do that. What drives me nuts about crackpots is that they are all interested in the same things, whereas real scientists are interested in something different. [1] It was a big part of our thinking about arXiv -- crackpot submissions were a tiny fraction of submissions to arXiv, but they would have been half the submissions in certain fields like quantum gravity.

I've sat around campfires where hippies were passing a spliff around and talking about that kind of stuff and was really amused recently when we found out that Epstein did the thing with professors who would have known better -- I mean, I will use my seduction toolbox to get people like that to say more than they should but not to have the same conversation I could have at a music festival.

[1] e.g. I think Tolstoy got it backwards!


> crackpot submissions were a tiny fraction of submission to arXiv but they would have been half the submissions to certain fields like quantum gravity

Just some very outsider thought:

Could it be that this problem is rather self-inflicted by researchers and their marketing?

Physicists market all the time that resolving these questions about quantum gravity will give the answers to the deepest questions that have plagued philosophers over millennia. Well, such marketing attracts crackpots who believe that they have something to say about such topics.

Relatedly, to improve their chances of getting research funding, a lot of researchers do outreach to the general public to show the importance of the questions they work on. Of course this means that people from the general public who get interested in such questions will make their own attempts at a contribution, because, well, this researcher just told me how important it is to think about such questions. And of course such a person from the general public typically does not have the deep scientific knowledge for their contribution to meet high scientific standards.


> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

This just isn't true. arXiv is not a venue. There's no place that gives you credit for arXiv papers. No one cares if you cite an arXiv paper or some random website. The vast vast majority of papers that have any kind of attention or citations are published in another venue.


A Fields Medal was awarded based mainly on this paper, which was never published elsewhere: https://arxiv.org/abs/math/0211159

I think there is a misunderstanding here. Does arXiv count as a publication? Yes, pretty much anything that gives you a DOI does, for example Zenodo. Does it function as a reputable anything? No.

The paper you link to counts as a publication, but its reputation stands on its own, it has nothing to do with arXiv as a venue. Ideally, that's how it is for all papers, but it isn't, just by publishing in certain venues your paper automatically gets a certain amount of reputation depending on the venue.


> Ideally, that's how it is for all papers, but it isn't

We require a method of filtering such that a given researcher doesn't have to personally vet in excruciating detail every paper he comes across because there simply isn't enough time in the day for that.

Ideally such a system would individually for each paper provide a multi-dimensional score that was reputable. How can those be calculated in a manner such that they're reputable? Who knows; that exercise is left for the reader.

In practice "well it got published in Nature" makes for a pretty decent spam filter followed by metrics such as how many times it's been cited since publication, checking that the people citing it are independent authors who actually built directly on top of the work, and checking how many of such citing authors are from a different field.


Can't we do better than that?

PageRank was a decent solution for websites. Can't we treat citations as a graph, calculate per-author and per-paper trustworthiness scores, update when a paper gets retracted, and mix in a dash of HN-style community upvotes/downvotes and openly-viewable commentary and Q&A by a community of experts and nonexperts alike?
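
A minimal sketch of the graph part, assuming the networkx library; the papers and edges are made up. An edge A -> B means "A cites B", so PageRank mass flows toward heavily built-upon work, and a retraction can be handled by dropping the node and recomputing:

  # Toy citation graph: PageRank as a per-paper trustworthiness score.
  import networkx as nx

  G = nx.DiGraph()
  G.add_edges_from([
      ("paper_A", "paper_B"),  # paper_A cites paper_B
      ("paper_C", "paper_B"),
      ("paper_C", "paper_D"),
      ("paper_D", "paper_B"),
  ])

  scores = nx.pagerank(G, alpha=0.85)
  for paper, score in sorted(scores.items(), key=lambda kv: -kv[1]):
      print(f"{paper}: {score:.3f}")

  # Retraction: remove the paper and recompute the scores.
  G.remove_node("paper_B")
  scores = nx.pagerank(G, alpha=0.85)

The hard parts, of course, are disambiguating authors and resisting citation rings, which plain PageRank doesn't address.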


It was not awarded because that paper is on arXiv. That paper could have been printed and sent out by mail, or posted on 4chan, etc. It just so happens that it was on arXiv, which made no difference to anything.

My observation is that research, especially in AI, has left universities, which are now focusing their research to a lesser degree on STEM. It appears research is now done by companies like Meta, OpenAI, Anthropic, Tencent, and Alibaba, among many others.

Universities (outside a few) just have much weaker PR machines so you never hear what they do. Also their work is not user facing products so regular people, even tech power users won't see them.

I came across a good example of that a few years ago. Caltech had a page on their site listing Caltech startups.

There were quite a few of them: by number of startups per year per person, Caltech was actually generating startups at a higher rate than Stanford. But almost none of those Caltech startups were doing anything that would bring them to the public's attention, or even to the average HN reader's attention.

For example one I remember was a company developing improved ion thrusters for spacecraft. Another was doing something to automate processing samples in medical labs.

Also, almost none of them were the "undergraduates drop out to form a company" startups we often hear about, where the founders aren't really using much of what they learned at the school, and the school functioned more as a place that brought the founders together.

The Caltech startups were most often formed by professors and grad students, and sometimes undergraduates that were on their research team, and were formed to commercialize their research.

My guess is that this is how it is at a lot of universities.


Every university I've worked in has been dominated by this paradigm, has an office set up to support it, and a bunch of policies around what it means for your doctoral supervisor to also be your employer, etc.

Not sure about that. How would a university test scaling hypotheses in AI, for example? The level of funding required is just not there, as far as I know.

Universities are also not suited to test which race car is the fastest, but that does not obviate the need for academic research in mechanical engineering.

Perhaps, but the fastest race car is not potentially ushering in the end of human involvement in science, so you might consider these to merit considerably different levels of funding.

>ushering in the end of human involvement in science

Good riddance! But not relevant in the least.


Impact size is not relevant to funding allocation?

Your attempts to smuggle your conclusions into the conversation are becoming tiresome. Profiling a private company's computer program is not impactful research. The best-fit parameters AI people call scaling exponents are not properties like the proton lifetime or electron electric dipole moment. Rest assured, there remain scientists at universities producing important work on machine learning.

There are a million other research things to do besides running huge pretraining runs and hyperparameter grid searches on giant clusters. To see what, you can start by checking out the best-paper and similar awards at NeurIPS, CVPR, ICCV, ICLR, ICML, etc.

This issue of accessibility is widely acknowledged in the academic literature, but it doesn’t mean that only large companies are doing good research.

Personally, I think this resource mismatch can help drive creative choices of research problems that don't require massive resources. To misquote Feynman, there's plenty of room at the bottom.


That's a specific field at a very specific time. In general there is a difference between research and development, you're going to expect the early work to be done in academia but the work to turn that into a product is done by commercial organizations.

You get ahead as an academic computer scientist, for instance, by writing papers, not by writing software. Now, there really are brilliant software developers in academic CS, but most researchers write something that kinda works and give a conference talk about it -- and that's OK, because the work to make something you can give a talk about is probably 20% of the work it would take to make something you can put in front of customers.

Because of that there are certain things academic researchers really can't do.

As I see it, my experience in getting a PhD and my experience in startups are essentially the same: "how do you make doing things nobody has ever done before routine?" Talk to people in either culture and you see the PhD students are thinking about either working in academia or a very short list of big prestigious companies, and the people at startups are sure the PhDs are too pedantic about everything.

It took me a long time of looking at other people's side projects, which are usually "I want to learn programming language X" or "I want to rewrite something from Software Tools in Rust", to realize just how foreign that kind of creative thinking is to people. I've held for a long time that a side project is not worth doing unless: (1) I really need the product, or (2) I can show people something they've never seen before, or better yet both. These sound different, but if something doesn't satisfy (2) you can usually satisfy (1) off the shelf. It just amazes me how many type (2) things stay novel even after 20 years of waiting.



