Hacker News | mpeg's comments

If you look at the ranking breakdown, though, Kimi K2.6 has only participated in the last 5 challenges (Claude dominated before then), and if you only count those, it would be in first place.

It also has a DNF. So it has a high ceiling but also unfortunately a low floor. So using Kimi means accepting high variability of the output.

Personally, what I've found has made coding agents more and more useful over the last year is that they have gotten a higher and higher floor, not a higher and higher ceiling. They were already plenty smart a year ago; it was just that they failed so often and so spectacularly that it made them a liability. Now they have become much more reliable, which is the key thing that has transitioned them into being actually useful. For the most part I don't use them to work on really intellectually difficult tasks. I mostly use them to work on very boring and labor-intensive tasks. Most commercial software development work is just boring drudgery like this. Certainly the bulk of what I need them for is. I need them to just not crap their pants all the time while they're at it.

So I'm kinda wary seeing the poor reliability of Kimi.


If you look at the last 5 challenges (the ones Kimi was in), Claude and Kimi each have 1 DNF, while ChatGPT has 2.

I'm not sure this is enough data to form an opinion, but going by what we have, Kimi would be as reliable as Claude.


You learn the most random ways to abuse program features. One I still remember, because of how long it took to figure out, was an HTB box that (after a long exploitation path) used NTFS ADS (alternate data streams) to hide the flag in an alternate stream of a decoy file. And of course the normal way to extract the stream was disabled, so I had to do some black magic with other binaries to get it.
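For anyone who hasn't run into them: on NTFS, a file can carry extra named data streams alongside its main contents, addressed with the `file:stream` path syntax, and ordinary file APIs accept that syntax. This is a generic illustration, not what the box or its solution actually did; the helper names (`stream_path`, `write_stream`, `read_stream`) and the file/stream names are hypothetical. It only hides anything on an NTFS volume under Windows; on Linux the colon is just part of an ordinary filename. The usual ways to spot and read streams are `dir /r` and PowerShell's `Get-Item <file> -Stream *` / `Get-Content <file> -Stream <name>`, presumably the kind of "normal way" a box like that would lock down.

```python
import os

def stream_path(path, stream: str) -> str:
    """Build an NTFS alternate-data-stream path of the form 'file:stream'."""
    return f"{os.fspath(path)}:{stream}"

def write_stream(path, stream: str, data: bytes) -> None:
    # On NTFS this stores `data` in a named stream attached to `path`;
    # the file's main contents (and the size Explorer shows) are unchanged.
    with open(stream_path(path, stream), "wb") as f:
        f.write(data)

def read_stream(path, stream: str) -> bytes:
    # Reading uses the same 'file:stream' syntax as writing.
    with open(stream_path(path, stream), "rb") as f:
        return f.read()
```

For example, `write_stream("decoy.txt", "flag", b"...")` followed by `read_stream("decoy.txt", "flag")` round-trips the hidden data while `decoy.txt` itself looks untouched.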

I was surprised at the low bounty too, considering OpenAI's resources.

Last year I won a similar prompt injection challenge, run by a crypto startup against the latest Claude and GPT models (at the time), and the prize was considerably more money, from an org with maybe $5-10M in funding.

That and the restrictive NDA kinda tell me they're not looking for serious bounty hunters, who would want either a lot more money or the freedom to publish their work; it seems like a marketing stunt.


This one would be a fun challenge in a CTF, or maybe more appropriate for a puzzle hunt – most people would look at the disassembly rather than the actual bytes and completely miss the binary encoding.

Some disassembly listings will also include the actual bytes (there are multiple reasons why you will want this).
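Looking at the raw bytes directly is easy to script. A minimal hex-dump sketch (the `hexdump` helper is hypothetical, not from any tool mentioned here) that prints offset, bytes and printable ASCII, the same information the byte column in a disassembly listing exposes. For comparison, GNU `objdump -d` shows raw instruction bytes by default and hides them with `--no-show-raw-insn`.

```python
def hexdump(data: bytes, width: int = 16) -> list[str]:
    """Format `data` as offset / hex bytes / printable-ASCII lines."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' so patterns in the bytes stand out.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short final rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return lines

# x86-64 prologue/epilogue bytes: push rbp; mov rbp, rsp; ret
for line in hexdump(b"\x55\x48\x89\xe5\xc3"):
    print(line)
```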

It's generated. When you try it, you can see it's mostly a harness around Claude Opus 4.7 that helps the model create a good design plan. It also supports asking you questions as it goes along, letting you review and give feedback on mockups, etc., but ultimately, if you look at what it's generating as it works, it's just code.


"It's just code" is meaningless to me. Is the code it's generating mostly using well-known widgets with predefined knobs, or is every element completely custom, with the knobs created on the spot with slightly different naming and function every time?

I think I'd actually prefer the more boring "it composes well-known widgets", because then there's a chance I could use this to generate a presentation layer and integrate it, instead of getting new blobs of code I'd essentially need to reverse engineer or rewrite.


It depends on what you prompt it with... if you tell it to use React and shadcn, it will use that.


Which doesn’t work for a native iOS app, so it’s rendering HTML for something that would then have to be rewritten for iOS, I guess.


It's rolling out progressively; it works for me. It actually seems very polished, the examples are really good, and it lets you create your design system from your codebase.


I don't have a horse in this race, but this seems like the right approach to me. As a developer, I already inject custom scripts to add functionality/automation to SaaS products I use where APIs are unavailable or limited.

However, the thought of the non-technical users I work with doing that is scary: they have no idea whether the code the LLM writes is correct. Is it going to have a bug that causes a massive issue down the line?

I've seen fat-finger errors cause financial loss, but in those cases the user at least had a chance to realise their error and fix it. With something like this, how would you even know?


I'm glad to hear that it resonates! And yes, being able to provide a guardrailed environment has been the biggest problem that we're solving.

Now every customer can get the power of Claude Code and vibe coding in a safe and controlled environment.


your profile's blog link 404s



Right on. (Smart take on SaaS, too, BTW)


ty! any CEOs or product people at SaaS companies that come to mind to reach out to? I've been trying to get this in front of more people, so far they're always mind blown when they see how the product works


lots of admin credentials too, which have probably never been changed


admin passwords to dating sites, that's the stuff people get blackmailed with


How does someone's dating site password end up on Fiverr?


it's worse than you think – it's an admin password to the ~whole site~


How does an admin password to the whole site end up on Fiverr?


There are lots of passwords there (though one wonders whether they were rotated). Basically, the people doing the hiring are sending PDFs with their credentials to the contractors doing the job.


Oh my. I feel for the tech team at Fiverr. I'm sure it's nasty in there. Sending virtual hugs.


They have a dating site password! They can get real hugs.


Personally I have more sympathy for the people who were screwed over by the incompetence of at least some of that tech team


Meanwhile, I hope they get sent to prison for being so cavalier with other people's PII.


Motion is an excellent library, so I gave this a go on a prod site. Some feedback:

- I LOVE the concept: no clunky SaaS, you add the package, start it on your dev server, and it just works. It worked seamlessly with my Vite-based build.

- It needs a diff view that tells me what the agent is going to change when I publish my changes; right now it's a bit scary to use without one (not sure if it shows one once you try to publish changes, I didn't get that far in the process).

- I don't see the point of the "draw" feature. Maybe it's because I envision this kind of tool being used so that non-technical members of the team can make small design changes without dev support, not as a way to design from scratch, but maybe you have a use case for it.

- Integration with Tailwind CSS would be a killer feature. This particular project uses Tailwind, so all the styles in the style view show as the defaults, even though they're actually being applied via classes. You could detect Tailwind classes and either show them separately or resolve them and show what they do in the styles view; then, on publish, you'd tell the agent to edit using Tailwind classes.
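A rough illustration of the "resolve them and show what they do" idea, limited to padding/margin utilities. It assumes Tailwind's default spacing scale (one step = 0.25rem, so `p-4` is 1rem) and emits physical longhands (`padding-left`/`padding-right`) for `px-*`/`py-*`, matching v3-style output; newer versions may emit logical properties instead. Variants like `md:` or `hover:`, arbitrary values, negative margins and custom themes are ignored, and the function names are hypothetical.

```python
import re
from typing import Optional

SPACING_STEP_REM = 0.25  # default theme: p-1 = 0.25rem, p-4 = 1rem

PROPERTIES = {"p": "padding", "m": "margin"}
SIDES = {
    "": ("",),
    "x": ("-left", "-right"),
    "y": ("-top", "-bottom"),
    "t": ("-top",),
    "r": ("-right",),
    "b": ("-bottom",),
    "l": ("-left",),
}
# e.g. p-4, px-2, mt-0.5 (no variants, arbitrary values or negatives)
SPACING_CLASS = re.compile(r"^([pm])([xytrbl]?)-(\d+(?:\.5)?)$")

def resolve_spacing_class(cls: str) -> Optional[dict[str, str]]:
    """Map one spacing utility to the CSS declarations it would apply."""
    match = SPACING_CLASS.match(cls)
    if match is None:
        return None
    prop, side, steps = match.groups()
    rem = float(steps) * SPACING_STEP_REM
    value = "0px" if rem == 0 else f"{rem:g}rem"
    return {PROPERTIES[prop] + suffix: value for suffix in SIDES[side]}

def resolve_classes(class_attr: str) -> dict[str, dict[str, str]]:
    """Resolve every recognised spacing class in a class attribute."""
    resolved = {}
    for cls in class_attr.split():
        css = resolve_spacing_class(cls)
        if css is not None:
            resolved[cls] = css
    return resolved

print(resolve_classes("flex p-4 px-2"))
```

Unrecognised classes (like `flex`) are simply skipped, so a styles panel could show resolved spacing next to whatever it already displays.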

I agree with what others have said: a video, or even better a live demo, would be great. A demo would be extra work but would be super cool; as a stopgap, you could maybe have a StackBlitz demo.

The client-side injected JS -> MCP flow is brilliant, though. I might have to steal that idea for some projects I'm working on; I can imagine a lot of scenarios where it would make a great interface.


Thanks for your feedback!

I just pushed a video to the homepage. There was already a live demo, though; it was actually quite simple to implement (mostly gating a few things). There was a bit of a fear that an agent somewhere out there would still be listening, though...

I think a diff is an excellent idea. Perhaps with the ability to remove specific changes and switch before/after.
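One way a diff with per-change accept/reject could work, sketched with Python's standard `difflib`: `preview_changes` renders a unified diff of a file before and after the agent's edit, and `apply_selected` rebuilds the file keeping only the changed regions the user accepted (numbered in order of appearance). The function names and the CSS sample are hypothetical; a real tool would also have to map changes back to the right source files.

```python
import difflib

def preview_changes(path: str, before: str, after: str) -> str:
    """Unified diff of a file before and after the proposed edit."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )

def apply_selected(before: str, after: str, accepted: set[int]) -> str:
    """Keep only the changed regions whose index is in `accepted`."""
    old = before.splitlines(keepends=True)
    new = after.splitlines(keepends=True)
    result: list[str] = []
    change = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            result.extend(old[i1:i2])
            continue
        # Each non-equal opcode (replace/insert/delete) is one reviewable change.
        result.extend(new[j1:j2] if change in accepted else old[i1:i2])
        change += 1
    return "".join(result)

before = ".card {\n  padding: 8px;\n  color: black;\n}\n"
after = ".card {\n  padding: 16px;\n  color: black;\n}\n"
print(preview_changes("src/card.css", before, after))
```

Switching before/after then amounts to rendering `before` or the output of `apply_selected` with the current selection.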

In terms of Tailwind, I'm thinking about a token/strict mode that would detect Tailwind classes and CSS variables. It wouldn't expose these in the sense that you'd have to apply each one manually, but if you were, for instance, changing padding, it would snap between all your predefined tokens.
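The snapping behaviour described here boils down to a nearest-value lookup over a sorted token scale. A minimal sketch, assuming tokens are plain pixel values (the scale below is a made-up example, not Tailwind's or any real design system's) and that ties snap to the smaller token; `snap_to_token` is a hypothetical name.

```python
import bisect

def snap_to_token(value: float, tokens: list[float]) -> float:
    """Return the design token nearest to `value` (ties go to the smaller token)."""
    if not tokens:
        raise ValueError("need at least one token")
    scale = sorted(tokens)
    i = bisect.bisect_left(scale, value)
    if i == 0:
        return scale[0]          # below (or at) the smallest token
    if i == len(scale):
        return scale[-1]         # above the largest token
    lower, upper = scale[i - 1], scale[i]
    return lower if value - lower <= upper - value else upper

# Hypothetical spacing scale in px; dragging padding to 13px would snap to 12px.
SPACING_TOKENS = [0, 4, 8, 12, 16, 24, 32]
print(snap_to_token(13, SPACING_TOKENS))
```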

For the draw feature, I think I'm just heavily Framer-pilled, and it lets you predetermine a rough width and height within a stack. But perhaps there's space for a click-to-add too, with minimum dimensions.


Sorry, I'm blind! I completely missed the live demo. I think because it's in the top right corner, I instinctively ignored it.

Maybe you could have a "Try live" button that nudges you to it (it could open the sidebar with the page structure or something, to make it obvious you're in "edit mode") if other people struggle to find it.

Re: the diff view, yes, I think it's the kind of thing that would reassure users they can play around without breaking anything. Otherwise I'd be a bit scared of accidentally touching something that shouldn't be changed (especially as you might experiment a bit before you land on the right style to change).


Honestly, you'll struggle to find a cloud platform cheaper than Cloudflare.

The $5/mo plan gets you 10 million dynamic requests (static assets don't count toward this limit, so often a single pageview will be 1 dynamic request), and that applies across the whole Workers product for your account, with no extra charges for extra websites, domains, or anything else, like you'd see with most "WordPress hosting".
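That pricing is easy to turn into a back-of-the-envelope estimator. A minimal sketch: the $5 base and 10 million included requests come from the comment, while the overage rate of $0.30 per additional million requests is an assumption based on Cloudflare's published Workers pricing at the time of writing (check the current pricing page). It ignores CPU-time and other product charges, and `estimate_monthly_cost` is a hypothetical helper.

```python
BASE_PRICE_USD = 5.00
INCLUDED_REQUESTS = 10_000_000
# Assumption: overage rate per extra million requests; verify against current pricing.
OVERAGE_PER_MILLION_USD = 0.30

def estimate_monthly_cost(dynamic_requests: int,
                          overage_per_million: float = OVERAGE_PER_MILLION_USD) -> float:
    """Estimate a month's bill from dynamic (non-static-asset) requests only."""
    extra = max(0, dynamic_requests - INCLUDED_REQUESTS)
    return BASE_PRICE_USD + extra / 1_000_000 * overage_per_million

# If roughly one pageview ~ one dynamic request, 300k pageviews stays within the base plan.
print(estimate_monthly_cost(300_000))
print(estimate_monthly_cost(20_000_000))
```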

I run all my personal sites and client sites (one of them for a Fortune 500 company) on the $5/mo plan, and the only time I went over was when a client got hammered with malicious requests (and it was about $100).

Disclaimer: I have no relationship with Cloudflare; I'm just a happy customer.


I run a Rust web server on a €4 VPS from Hetzner that serves 300M (million) requests a day. That's way cheaper than doing it on _any_ "serverless" request-based platform, I think.
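For scale, 300M requests a day works out to about 3,472 requests per second on average and 9 billion requests in a 30-day month; real traffic is burstier, so peaks will be higher than the average. The arithmetic as a tiny sketch (helper names hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rps(requests_per_day: int) -> float:
    """Average request rate; peak rates will be higher for bursty traffic."""
    return requests_per_day / SECONDS_PER_DAY

def monthly_requests(requests_per_day: int, days: int = 30) -> int:
    return requests_per_day * days

daily = 300_000_000
print(f"{average_rps(daily):,.0f} req/s on average")  # 3,472 req/s on average
print(f"{monthly_requests(daily):,} requests per 30-day month")
```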


Yes, perhaps I should have specified that you can't get much cheaper among serverless platforms.

You can certainly run a VPS like that for less; you could probably even beat those raw request numbers with one of those €1-a-month VPSes from OVH or similar. The key difference is that with Cloudflare your site is globally distributed by default, and you get to buy into the whole ecosystem, if you want.


> The key difference is with cloudflare your site is globally distributed by default, and you get to buy into the whole ecosystem, if you want.

The real question nobody asks: do you even really need global distribution?


Most of the time: no.

But sometimes you do have clients on both sides of the Atlantic, and it's nice being able to cut their request times by a few hundred ms "for free". Personally, that's not the main reason I use Cloudflare, but it can be handy!


Interesting, I'd assumed the lowest Hetzner tier (4.50/mo, 2 CPUs, 4GB RAM) wouldn't hold up to that.

It must be very light, for so much traffic. Any more details?


It's a BitTorrent tracker

tracker.mywaifu.best:6969/announce

Running https://github.com/ckcr4lyf/kiryuu

(Disclaimer: I'm the author of kiryuu)

CPX11, so 2vCPU/2GB
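For the curious, here is roughly what a typical request to a tracker like this looks like. Assuming it's an HTTP tracker (the announce URL in the comment has no scheme, so that's a guess), each announce is a GET request whose query string carries the torrent's 20-byte `info_hash`, the client's 20-byte `peer_id`, its port and transfer counters, as described in BEP 3; `compact=1` asks for the compact peer list from BEP 23. A minimal sketch of building that request with the standard library; the helper name, placeholder host and values are hypothetical, and nothing is sent over the network.

```python
from typing import Optional
from urllib.parse import quote, urlencode

def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes, port: int,
                       uploaded: int = 0, downloaded: int = 0, left: int = 0,
                       event: Optional[str] = None) -> str:
    """Build an HTTP tracker announce URL (BEP 3 parameters, compact per BEP 23)."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be exactly 20 bytes")
    params = {
        "info_hash": info_hash,   # raw SHA-1 bytes, percent-encoded below
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "compact": 1,
    }
    if event is not None:
        params["event"] = event   # "started", "completed" or "stopped"
    # quote (not quote_plus) so raw bytes are %XX-escaped and 0x20 never becomes '+'
    return f"{announce}?{urlencode(params, quote_via=quote)}"

url = build_announce_url(
    "http://tracker.example:6969/announce",   # placeholder host
    info_hash=bytes(range(20)),               # dummy hash
    peer_id=b"-XX0001-" + b"0" * 12,          # Azureus-style client id
    port=6881,
    left=1024,
    event="started",
)
print(url)
```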


Care to share your multi-site strategy? I've been investigating ways to program billing alerts that notify me when the bill reaches xx%. There are a few ways, which makes it more palatable. It just boils my blood when these big platforms don't offer it as a feature. THAT IS JUST PURE UNADULTERATED EVIL, IT SHOULD BE CRIMINAL REALLY!!
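The threshold part of a billing alert is small once you have a spend figure. A minimal sketch, assuming some scheduled job already fetches the current month's spend from whatever usage or billing data your platform exposes (not shown, and platform-specific); `crossed_thresholds` is a hypothetical helper that returns each percentage threshold newly crossed, so you alert once per threshold rather than on every run.

```python
from typing import Iterable

def crossed_thresholds(spend: float, budget: float,
                       thresholds: Iterable[int] = (50, 80, 100),
                       already_alerted: Iterable[int] = ()) -> list[int]:
    """Percent-of-budget thresholds reached by `spend` that haven't been alerted yet."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = spend / budget * 100
    done = set(already_alerted)
    return [t for t in sorted(thresholds) if percent_used >= t and t not in done]

# e.g. $4.50 spent against a $5 budget, with the 50% alert already sent
print(crossed_thresholds(4.50, 5.00, already_alerted={50}))  # [80]
```

Persisting the returned thresholds (per billing period) and passing them back as `already_alerted` on the next run keeps the alerts from repeating.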

