Hacker News | modeless's comments

On the other hand it's a boon to those establishing new businesses. And a huge boon to employees. And a boon to the overall economy because it accelerates transfer of know-how out of older and more dysfunctional companies into newer and more nimble ones. This is what made Silicon Valley what it is, starting all the way back with the Traitorous Eight in 1957 and continuing today.

There are so many wannabe "New Silicon Valley" areas that are unwilling to copy the non-compete ban, and they subsequently fail to compete with the real Silicon Valley. It's a necessary ingredient in my opinion.


Once you have matched humans on a problem then further progress on that problem is not necessarily meaningful anymore, in terms of quantitative measurement of intelligence. ARC-AGI-3 is designed to compare AIs to humans, not to measure arbitrarily high levels of superhuman intelligence. For that you would want a different benchmark.

On the public set of 25 problems. These are intended for development and testing, not evaluation. There are 110 private problems for actual evaluation purposes, and the ARC-AGI-3 paper says "the public set is materially easier than the private set".

Benchmarks on public tests are too easy to game. The model owners can just incorporate the answers into the training data. Only the private problems actually matter.

In this case the code is public and you can see they are not cheating in that sense.

The harness seems extremely benchmark-specific, which gives them a huge advantage over what most models can use. This isn't a qualifying score for that reason.

Here is the ARC-AGI-3 specific harness by the way - lots of challenge information encoded inside: https://github.com/symbolica-ai/ARC-AGI-3-Agents/blob/symbol...


I agree it's not cheating in that restricted sense. But I'm not really convinced that it can't be cheating in a more general sense. You can try, say, 10^10 variations of the harness and select the one that performs best. If you then look at it, it probably won't look like cheating. But you have biased the estimator by selecting the harness according to the score.

Once the model has seen the questions and answers in the training stage, the questions are worthless. Only a test using previously unseen questions has merit.

They aren't training new models for this. This is an agent harness for Opus 4.6.

All traffic is monitored, all signal sources are eventually incorporated into the training set in one way or another. The person you're responding to is correct, even a single API call to any AI provider is sufficient to discount future results from the same provider.

ok! So if someone uses an existing, checkpointed, open source model then the answer is yes the results are valid and it doesn't matter that the tests are public.

Yes, assuming the checkpoint was before the announcement & public availability of the test set.

You live in a conspiracy world. Those AI providers don't update their models that fast. You can try asking them to solve ARC-AGI-3 without a harness yourself and watch them struggle, just as they did yesterday.

Which part is the conspiracy? Be as concrete as possible.

They are definitely cheating: they have crafted prompts[1] that explain the game rules rather than having the model explore and learn.

1. https://github.com/symbolica-ai/ARC-AGI-3-Agents/blob/symbol...


Where do you see that? I only skimmed the prompts, but I don't see any aspects of any of the games explained in there. There are a few hints which are legitimate prior knowledge about games in general, though some look too inflexible to me. Prior knowledge ("Core priors") is a critical requirement of the ARC series; read the reports.

The test doesn't prove you have AGI. It proves you don't have AGI. If your AI can't solve these problems that humans can solve, it can't be AGI.

Once the AIs solve this, there will be another ARC-AGI. And so on until we can't find any more problems that can be solved by humans and not AI. And that's when we'll know we have AGI.


AI X that can solve the tests contrasted with AI Y that cannot, with all else being equal, means X is closer to AGI than Y. There's no meaningful scale implicit to the tests, either.

Kinda crazy that Yudkowsky and all those rationalists and enthusiasts spent over a decade obsessing over this stuff, and we've had almost 80 years of elite academics pondering on it, and none of them could come up with a meaningful, operational theory of intelligence. The best we can do is "closer to AGI" as a measurement, and even then, it's not 100% certain, because a model might have some cheap tricks implicit to the architecture that don't actually map to a meaningful difference in capabilities.

Gotta love the field of AI.


Will there be a point in that series of ARC-AGI tests where AI can design the next test, or is designing the next test always going to be a problem that can be solved by humans and not AI?

I don't see why AI couldn't design tests. But they can only be validated by humans, as they are intended to be possible and ideally easy for humans to solve.

Yes, but I guess you see what I'm getting at. If designing the next ARC-AGI test is impossible for AI without a human in the loop, then AGI becomes unreachable by definition.

>It proves you don't have AGI.

It doesn't prove anything of the sort. ARC-AGI has always been nothing special in that regard, but this one really takes the cake. A "human baseline" that isn't really a baseline, and a scoring system so convoluted that a model could beat every game in reasonable time and still score well below 100. Really, what are we doing here?

That François had to do all this nonsense should tell you where we are right now.


Yeah I was wondering if some native Linux apps might want to use it, since it is clearly useful and hard to emulate.

Native Linux semaphores are enough. Linux has managed to be very performant without it. That feature seems way too over-engineered for little gain.

Valve built more games than Epic in the past 10 years. Epic essentially only released Robo Recall and Fortnite + extra content, plus a spinoff of Rocket League which was an acquisition. Valve released a couple of duds (Artifact, Dota Underlords) but also some good games: Half-Life: Alyx, Counter-Strike 2, and Deadlock. They also did "The Lab" and "Aperture Desk Job" which, while not full games, were quite good as demos for their hardware.

I'm sure any studio would trade their entire decade of portfolio to get where Fortnite is. Sony did in fact basically do that, to great failure (despite Helldivers 2 being very well received, it's no Fortnite).

> the key insight is that changes should be flagged as conflicting when they touch each other

Not really. Changes should be flagged as conflicting when they conflict semantically, not when they touch the same lines. A rename of a variable shouldn't conflict with a refactor that touches the same lines, and a change that renames a function should conflict with a change that uses the function's old name in a new place. I don't think I would bother switching to a new VCS that didn't provide some kind of semantic understanding like this.


Win32 isn't that hard actually.

To create a simple window, no it isn't. To create a rather complex application, then yes it is, compared with using a higher-level framework.

This article is complaining about the complexity of creating a simple window in Wayland, which is much easier in Win32. Wayland doesn't make creating "a rather complex application" any easier either. In both cases you would use a framework. Even more so in Wayland, which doesn't provide widgets or standard dialogs at all, while Win32 does.

Creating a simple window in Wayland isn't much harder than in Win32. You get a wl_surface, attach a wl_buffer to it, wrap it with xdg_toplevel, and handle some callbacks for resizing etc. There's some boilerplate that allows all this to be extensible in backwards-compatible ways, but nothing complex, really. The simple-touch example in the Weston repository is about 400 lines.

Some compositors' insistence on CSD can make it a bit more complex, since you get that for free in Win32, but on the sane ones you just add xdg-decoration and you're done.

Also, this is all apples-to-oranges anyway, as Win32 is a toolkit, while wayland-client is just a protocol (de)serializer.


> Creating a simple window in Wayland isn't much harder than in Win32. You get a wl_surface, attach a wl_buffer to it, wrap it with xdg_toplevel and handle some callbacks for resizing etc. There's some boilerplate that allows all this to be extensible in backwards-compatible ways, but nothing complex, really. simple-touch example in Weston repository has about 400 lines.

I believe the youth nowadays calls what you wrote "copium". Because creating a simple window in Win32 (a whole program, in fact) looks like this:

    #ifndef UNICODE
    #define UNICODE
    #endif

    #include <windows.h>

    LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam);

    int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, PWSTR pCmdLine, int nCmdShow)
    {
        const wchar_t CLASS_NAME[] = L"Sample Window Class";

        // Register the window class that ties windows to WindowProc
        WNDCLASS wc = { };
        wc.lpfnWndProc   = WindowProc;
        wc.hInstance     = hInstance;
        wc.lpszClassName = CLASS_NAME;
        RegisterClass(&wc);

        // Create a top-level window with default position and size
        HWND hwnd = CreateWindowEx(0, CLASS_NAME, L"Hello World! Program",
            WS_OVERLAPPEDWINDOW,
            CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
            NULL, NULL, hInstance, NULL);

        if (hwnd == NULL)
        {
            return 0;
        }

        ShowWindow(hwnd, nCmdShow);

        // Standard message loop: pump events until WM_QUIT
        MSG msg = { };
        while (GetMessage(&msg, NULL, 0, 0) > 0)
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }

        return 0;
    }

    LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
        switch (uMsg)
        {
        case WM_DESTROY:
            // Window destroyed: end the message loop
            PostQuitMessage(0);
            return 0;
        default:
            return DefWindowProc(hwnd, uMsg, wParam, lParam);
        }
    }
That's significantly less than 400 lines, and requires essentially just two function calls, RegisterClass and CreateWindowEx, the rest is the message loop and its callback.

Yes, this isn't much easier than doing it yourself with libwayland-client, even though it's a whole layer of abstraction higher (which is obviously why it's shorter, duh). There's more to type when you go lower level, but fundamentally it's still just "get me a window, here's my content, have some callbacks". Toolkits that provide similar (or even simpler) APIs on top of Wayland exist too.

Yeah, Photopea isn't exactly basic but it's great. If this became the Photopea equivalent for video that would be awesome.

If they actually fix start menu search in addition to giving back the left side taskbar, I'll be pretty happy. I very much doubt they will though.
