
ipython · 2026-04-01T21:02:37 1775077357

I was excited to read through this to find out how these tasks are evaluated at scale. Lots of scary looking formulas with sigmas and other Greek letters.

Then I clicked on one task to see what it looks like “on the ground”: https://app.uniclaw.ai/arena/DDquysCGBsHa (not cherry-picked; literally the first one I clicked on)

The task was:

> Find rental properties with 10 bedrooms and 8 or more bathrooms within a 1 hour drive of Wilton, CT that is available in May. Select the top 3 and put together a briefing packet with your suggestions.

The description for the top-rated model (stepfun) stated:

> Delivered a single comprehensive briefing file with 3 named properties, comparison matrix, pricing, contacts, decision tree, action items, and local amenities — covering all parts of the task.

Oh cool! Sounds great, and would be commiserate with the 7/10 score given for the task! However, the next sentence:

> Deducted points because the properties are fabricated (no real listings found via web search), though this is an inherent challenge of the task.

So… in other words, it made a bunch of shit up (at least plausible shit! So give back a few points!) and handed that shit back to a user with no indication that it’s all made-up shit.

Ok, closed that tab.

skysniper · 2026-04-01T21:10:21 1775077821

I know, that was indeed a bad judge move. I've manually checked tens of tasks so far, and that one is one of the worst. I'd say check a few more; the judge has some noise but in general did a good job IMO.

chrisweekly · 2026-04-01T22:25:20 1775082320

"commiserate" - did you mean "commensurate"?

creationcomplex · 2026-04-02T00:05:05 1775088305

At that point commiserations were in order

ipython · 2026-04-02T00:05:20 1775088320

Sorry, yes. I was typing quickly

ipython · 2026-03-26T16:21:09 1774542069

As sibling comments point out, parents are already held overly responsible for how they care for their kids. To an absurd degree.

I have had CPS called on me by an overbearing school administrator. Have you had that happen to you? Let me tell you, it's not a fun experience.

Enough of this "blame the parents" mentality! Ironic given that the goal for all these platforms is growth at all costs. Where do you think "growth" comes from, after all? If you make being a parent so goddamn difficult that it's more rational to just not do it, guess what, poof goes your sweet, sweet growth.

So tired of this line of thinking. The parents are put into an impossible situation. Stuck between kids who by definition and by design will test the boundaries that they're given, and tech platforms that are propped up with not just trillions of dollars of valuation, but the societal expectation that you engage with them. Want your kids to compete in sports? Well, they need to have WhatsApp and Instagram to keep track of team events!

Give me a break. Equating controlling social media and devices to "look both ways when crossing the street" is disingenuous at best. There are no companies that make billions of dollars in advertising revenue telling your kids to jaywalk. But Facebook gladly weaponizes their algorithm to drive "engagement" - and, surprise, children with still-forming prefrontal cortices are drawn to content that reinforces their natural self-criticisms and doubts. So now my child, who has to be on Instagram to keep track of sports schedules, is also force-fed toxic content because that's what a mechanical algorithm thinks is most "engaging" based on my derived psychological and demographic profile.

You want to talk about CSAM? X proudly proclaims that they have every right to produce deep-fake pornography with the faces of underage children. What action shall I, as an individual parent, take if my 15-year-old girl's face is suddenly pasted onto a sexually explicit video and widely shared thanks to xAI's actions? Shall I be held responsible for how I "let this happen" to my child?

kspacewalk2 · 2026-03-26T16:24:59 1774542299

You seem to imply in your reply that I disagree with you, hence necessitating a polemic style. I would have thought the last few sentences of my comment make it clear where I stand on simplistic appeals to "parental responsibility".

ipython · 2026-03-24T13:53:15 1774360395

But now you have compromise _at scale_. Before, poor plebs like us had to artisanally craft every back door. Now we have a technology to automate that mundane exploitation process! Win!

MuteXR · 2026-03-24T13:56:58 1774360618

You still have a human who actually ends up reviewing the code, though. Now if the review was AI powered... (glances at openclaw)

ipython · 2026-03-20T19:45:03 1774035903

> the price quickly dropped to just $6,000 when they realized we were serious about going elsewhere, and they would throw in ISO 27001 and a 200 hour penetration test as well.

I'm sorry, but... $6,000 / 200 == $30 / hour? Just assuming the value of the actual certifications is $zero?

Wouldn't that raise some serious red flags?

codegeek · 2026-03-20T19:52:23 1774036343

$6000 for both SOC 2 and ISO 27001 with pen tests? lol. I paid over $8k just for ISO 27001 for our small company and have been quoted a lot more for SOC 2.

ipython · 2026-03-19T13:05:03 1773925503

Well, let's not forget that Europe was downwind of the worst nuclear accident in world history. https://radioactivity.eu.com/articles/nuclearenergy/chernoby...

That sort of event doesn't fade away quickly and definitely influenced energy policy that persists to this day. Thankfully the tide is turning due to safer designs.

ipython · 2026-03-16T13:53:17 1773669197

And hence why a prediction market including bets on the time and date of specific acts of violence… could present a moral hazard?

ipython · 2026-03-12T19:49:33 1773344973

Already happened. :)

https://www.reddit.com/r/dataisugly/comments/1mk5wdb/this_ch...

ipython · 2026-03-12T19:48:13 1773344893

it's written in golang. 12MB barely gets you "hello world" since everything is statically linked. With that in mind, the size is impressive.

nuxi · 2026-03-12T21:42:14 1773351734

golang doesn't statically link everything by default (anymore?); this is from FreeBSD:

    $ ls -l axe
    -rwxr-xr-x  1 root wheel 12830781 Mar 12 22:38 axe*
    
    $ ldd axe
    axe:
        libthr.so.3 => /lib/libthr.so.3 (0xe2e74a1d000)
        libc.so.7 => /lib/libc.so.7 (0xe2e74c27000)
        libsys.so.7 => /lib/libsys.so.7 (0xe2e75de6000)
        [vdso] (0xe2e7366b000)

mccoyb · 2026-03-12T19:48:57 1773344937

I know it's off topic, but is that mostly coming from the Go runtime? (About how large is that?)

emmanueloga_ · 2026-03-13T01:03:27 1773363807

The excessive size of Go binaries is a common complaint. I last recall seeing a related discussion on Lobsters [1]. Who knows, maybe the binary could be shrunk a bit? IMHO a 12MB binary is not that big of a deal.

--

1: https://lobste.rs/s/tzyslr/reducing_size_go_binaries_by_up_7...

ipython · 2026-03-12T13:16:51 1773321411

Kinda comparing apples to oranges. AWS was using EBS and not local instance storage. So you’re easily looking at an extra order of magnitude of latency when transmitting data over the network versus a local PCIe bus. That’s gonna be a huge factor in what I assume is a heavy random-seek load.

mjlee · 2026-03-12T16:11:16 1773331876

I wrote a longer comment already (https://news.ycombinator.com/item?id=47352526) but looking at the hot run performance and making big hand wavy guesses, the performance difference might not be as big as you'd expect.

ipython · 2026-03-11T16:48:36 1773247716

This administration has no problem doxxing people for harassment, posting their personal home addresses on official social media accounts: https://x.com/dhsgov/status/1912567112733753563?s=46. So why the double standard?

charcircuit · 2026-03-11T22:48:56 1773269336

The court filing provides more information than just giving ammo to harassers so I do not see them as directly equivalent. I also do not agree with the premise that if one person does something bad it would justify someone else in doing so.

ipython · 2026-03-11T23:35:09 1773272109

The publicly available filing does not include the home address of the individual. See https://casesearch.courts.state.md.us/casesearch/inquiryByCa... and search for case 0502SP019272021.

Plus - you’re telling me that highlighting an individual and posting their home address on an official government account is not “giving ammo to harassers”?

charcircuit · 2026-03-12T07:12:46 1773299566

Not "just".

ipython · 2026-03-12T11:12:37 1773313957

What other purpose did that unredacted post serve?

lovich · 2026-03-12T02:32:45 1773282765

How does it feel to experience cognitive dissonance this hard?