Of course the devil is in the details. What you say and the skills needed make sense. Unfortunately, those are also the easiest aspects to dismiss, either under pressure because there is often little immediate payoff, or because they are simply the hard part.
My experience with LLMs in general is that, sadly, they're mostly good bullshitters. (Current Google Search is the epitome of worthlessness: the AI summary tries so hard to make things balanced that it just dreams up and exaggerates pros and cons for most queries.) In the same way, platforms like Perplexity are worthless; they seem utterly unable to assign the proper value to the sources they gather.
Of course that doesn't stop me from using LLMs where they're useful; it's nicer to give the architecture for a solution and let the LLM fill in the gaps than to code the entire thing by hand. And code completion in general is a beautiful thing (sadly not where much of the focus is these days; most of it is on getting the LLM to create complete solutions, while I would be delighted by even better code completion).
Still, all in all, the more I see LLMs used (or the more I see what I assume are well-meaning people copy/pasting LLM-generated responses in place of handwritten ones) across so much of the internet, resulting in a huge decline in factualness and reproducibility (in the sense that original sources get obscured) but an increase in nice full sentences and proper grammar, the more I'm inclined to believe that in the foreseeable future LLMs aren't a net positive.
(In a way it's also a perfect storm: over the last decade, education deprioritized teaching the skills that would matter most for dealing with AI, and started educating for the use of specific tools instead of general principles. The product of education became labourers for a specific job instead of higher-level abstract reasoning in a general area of expertise.)
Google's "AI Overviews" are one of the worst LLM-powered features on the market today; they're genuinely damaging the reputation of the whole industry.
It might actually be in Google's best interest to damage interest in LLMs by showing that crappy AI Mode stuff, because LLMs materially impact their business model.
What matters is the perception of LLMs among the general population, not in the eyes of techies.
This is what finetuning has been all about since Stable Diffusion 1.5, and especially SDXL. It's also something the StabilityAI base models excelled at in the open-weights category. (Midjourney has always been the champion, but it's proprietary.)
Sadly, with SAI going effectively bankrupt, things changed: their rushed 3.0 model was broken beyond repair, and the later 3.5 felt unfinished or something (the API version is remarkably better), with gens full of errors and artifacts even though the good ones looked great. It turned out to be hard to finetune as well.
In the meantime Flux got released, but that model can be fried (as in, one concept trained in) yet not truly finetuned (this Krea Flux is not based on the open-weights Flux). Add to that that, as models got bigger, training/finetuning now costs an arm and a leg, so here we are: a year after Flux was released, a good finetune is celebrated as the next new thing :)
> Model builders have been mostly focused on correctness, not aesthetics. Researchers have been overly focused on the extra fingers problem.
While that might be true for the foundational models, the author seems to be neglecting the tens of thousands of custom LoRAs that customize the look of an image.
> Users fight the “AI Look” with heavy prompting and even fine-tuning
IMHO it is significantly easier to fix an aesthetic issue than an adherence issue. You can take a poor-quality image, run it through ESRGAN upscalers, do img2img using it as a ControlNet input, run it through a different model, add LoRAs, etc.
I have done some nominal tests with Krea, but mostly around adherence. I'd be curious to know if they've reduced the omnipresent bokeh / shallow depth of field, given that it is Flux-based.
> Model builders have been mostly focused on correctness, not aesthetics. Researchers have been overly focused on the extra fingers problem.
> While that might be true for the foundational models
It's possibly true [0] of the models from the big general-purpose AI vendors (OpenAI, Google); it's definitely not true of MJ. If MJ has an aesthetic bias toward what the article describes as "the AI look", it is largely because that was a popular, actively sought-out and prompted-for look in early AI image gen, used to avoid the flatness bias of early models, and MJ leaned very hard into biasing toward whatever was aesthetically popular in that and other areas as it developed. Heck, lots of SD finetunes actively sought to reproduce MJ aesthetics for a while.
[0] But I doubt it, and I think they have been actively targeting aesthetics as well as correctness. The post even hints at part of how that reinforced the "AI look": the focus on aesthetics meant more reliance on the LAION-Aesthetics dataset to tune the models' understanding of what looked good, transferring that dataset's biases into models that were trying to focus on aesthetics.
Definitely. It's been a while since I used Midjourney, but I imagine that style (and sheer speed) are probably the last remaining use cases of MJ today.
Interesting notion. I notice the same with image models: less style, more blandness in the latest generation. Only MJ seems to have style as a feature.
I tried it with Rust; it's so bad it's simply not usable. It hallucinates methods, and even the syntax is wrong at some points (it especially can't get error types correct, or so it seems). GPT-4 doesn't handle Rust perfectly either, but the code it produces is good enough to only need some touch-ups; it can explain and fix wrong use of well-known libraries and even gets async code right. It's especially great for boilerplate; saves so much typing.
I was hoping OpenAI/GPT-4 would see some healthy competition, but Gemini doesn't seem to be it. Of course, the Rust language might be an edge case.
I'm more and more thinking that a government-issued digital identity (like https://privacybydesign.foundation/irma-en/) is the way forward: one that can be used to prove you're human (and, optionally, other details about you), that can't be traced back to an individual, but that can, again optionally, be used to create (multiple) online personas. I used to think of these things as dystopian, but fake content by fake persons is a bigger issue. Of course real persons could create personas for such a bot, but a (personal and/or community-based) blacklist mechanism keyed on the root account (the real human who created the persona) would go a long way.
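To make the idea concrete, here's a minimal sketch of how per-service personas could be derived so that they are stable within a service but unlinkable across services. All names here (`derive_persona`, the root secret) are my own illustration, not part of IRMA or any real eID scheme; real systems like IRMA use attribute-based credentials, which are cryptographically much stronger than this HMAC toy.

```python
import hashlib
import hmac

def derive_persona(root_secret: bytes, service: str, persona_ix: int = 0) -> str:
    """Derive a deterministic pseudonym for one service.

    The same (secret, service, index) always yields the same pseudonym, so a
    service-level blacklist has a stable identity to act on. Without the root
    secret, pseudonyms for different services cannot be linked to each other
    or to the real person.
    """
    msg = f"{service}:{persona_ix}".encode()
    return hmac.new(root_secret, msg, hashlib.sha256).hexdigest()

# Hypothetical: the secret is issued by the government and never leaves
# the user's identity wallet.
secret = b"issued-by-government-never-leaves-the-wallet"

forum = derive_persona(secret, "forum.example")
shop = derive_persona(secret, "shop.example")

assert forum == derive_persona(secret, "forum.example")  # stable per service
assert forum != shop                                     # unlinkable across services
assert forum != derive_persona(secret, "forum.example", 1)  # multiple personas
```

The blacklist part would work because the issuer (or the wallet, under a court order or ban) can re-derive every pseudonym from the root secret, so banning the root bans all of that person's personas, while services on their own still learn nothing about who is behind them.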
No worries, you're not alone. I can see it has decent (not great, imho) production values, but it's not for me. It's slow and uninteresting; I'm level 2/3 and facing unbeatable level 5 enemies (a hag in a swamp, some tiny island after a swamp, or some underground minotaurs). I know I should find entertainment in doing other things in the game, but I'm 10 hours in already, and progress is agonizingly slow. All I've got is that I have a bug in my eye that I want out, somehow all my companions have it as well, and there's a refugee camp that's being evicted by more druids. I have no idea what the overarching story is; I just go from one little set piece to the next (which feels rather artificial), but it's not very interesting and my characters feel as weak as when I started the game. I have fond memories of BG1 and BG2; I played them into the early morning because I wanted to know what came next. But now, I just don't care. I feel lost, like I'm making no progress at all.
That there seemingly is no great story shouldn't be that much of an issue for me; it doesn't always matter. But with the battles not being engaging, character progress nonexistent, and exploring neither exciting nor rewarding, there's just nothing that makes me want to come back. Now, I do have a great dislike for open-world games, and it seems BG3 has all the elements that make me dislike them: too much make-your-own-adventure. I don't recall the earlier Baldur's Gate games being this way.
I was getting stuck on some of these things too. I ended up having to explore more of the map to find stuff that was more manageable, as well as checking YouTube for some tips on how to play.
Sure, maybe they could have included a little more in-game education on how stuff works, but it's not too bad.
What has me most entertained about the game is that your choices and dialogue options have real impact on your path through the game, unlike most games, which feel like they're on rails.
There are quite a few hosted SDXL platforms (mage.space, leonardo.ai, novel.ai, tensor.art, invoke.ai, to name a few), and most consumers do not have the GPUs needed to run these models locally; only enthusiasts do.
It has always baffled me that Stability didn't offer a competitive UI platform for their models: Clipdrop is just bad quality and very bare-bones, and DreamStudio is pricey and still lacks most features. So this move to a new licensing strategy doesn't surprise me; it's actually somewhat comforting, as I expected them to just stop releasing further trained models (e.g. SDXL 1.1 and up) and only offer those on their own services (of course, that can still happen), because how else were they going to monetize consumers? (I know they offer, or planned to offer, custom trained/finetuned models to big corps, but that doesn't monetize consumers.)
However, like most Stability releases these days, it has a close-but-no-cigar feel. The recent LCM LoRAs might be a little slower, but they actually offer 1024² resolution, work with any existing LoRAs and finetunes (so they are usable for iterative development, unlike this Turbo model; since it's a different model, you can't iterate on it and then expect SDXL, with LoRAs, or to a lesser extent without, to generate a similar image), and support CFG scale (and therefore negative prompts / prompt weighting). I suppose there's some niche market where you need all the speed you can get, but unless there's a giant leap in (temporal) consistency, that will remain niche; I don't see the mentioned real-time 3D "skinning" or the video img-to-img (frame-to-frame) gimmicks taking off with the current quality and lack of flexibility. It's good research, and optimizations have lots of value, but it needs quality as well.
Their recent video model is quite bad as well, especially compared to Pika and Runway Gen-2, but as with the DALL-E 3 comparison, one can say those are closed source while Stability's offering is open.
Then we have the 3D model: closed source, and unfortunately worse than Luma's Genie.
The music model is nothing like Suno's Chirp (which might be multiple models, Bark and a music model, used together), and the less said about their LLM offerings the better.
Bottom line: Stability needs a killer model again. They started strong with Stable Diffusion 1.5, took a wrong turn with 2.0 (kind of recovered with 2.1, but the damage was done), and while SDXL isn't bad in a vacuum, neither was it the leap ahead that would put it in front of competition like Midjourney at the time, or DALL-E 3 a little later. Now even a relatively small model like PixArt-α, also open source, can offer similar quality to SDXL (with a lot of caveats; it has been trained on so few images that it simply lacks many concepts). More worrying, there's no hint of something better in Stability's pipeline. Maybe image gen is as good as Stability can get it, and they think they can make an impact by pivoting in another direction, or multiple directions, but currently it feels like a master-of-none situation.
Well, this just goes to show not all elderly people are the same. My relatives would love more remotes and hate more strange non-physical interfaces. Controlling lights with voice commands vs. a button (even one on a remote): the button wins 100% of the time.
The biggest hurdle seems to be discoverability. A physical remote makes sense and doesn't change: buttons have a single function, and context doesn't matter. Apps are different beasts; navigating up/down/forward/backward, and thus context (what did you do before, so that doing x now does y, but otherwise z), is just met with glazed eyes. Going back and forth makes no sense to them; why you sometimes need to go to a menu while other times there's a shortcut button makes no sense, especially when summoning the menu itself needs special navigation. To add insult to injury, every now and then apps get an overhaul, and suddenly navigation and buttons have changed or look different.
Now when I say apps, I do mean apps on phones/TVs. The Windows UI works for them because the basics are the same for all programs; whether using Word or Outlook, menu items have text like "send" and "save", and while it takes time, functionality is discoverable. But for TV apps the logic is "click up or down until the icon you want has a different hue, then press a button on the remote to do stuff, but only when the screen shows x, not when it shows z". It's too much functionality condensed into too little UI. I'm constantly baffled by the design choices for apps that are supposed to be used by everyone. I'm sure it looks nice to designers and devs, but have they ever shown their brand-new TV interface (these are the worst offenders) to an elderly person and given them simple tasks?
At least you got an explanation. I had my Vivid account blocked without one; the only interaction was via chatbot, and the only response I got was that they couldn't disclose the reason because of compliance issues. For extra fun, to try to get the funds in the account back I had to go through an online form, which didn't have the fields I was told to fill in, and again I was met with no response. Eventually I had to go through EU dispute settlement to get a human response and an email link to an app for digital identity validation, and at last I got my funds transferred to my main bank. Why the account was blocked remains a mystery; I only used it for online payments and never even contested, refunded or failed one.
Anyway, it's utterly bizarre to me that banks can get a license to run their business with what seems to be only a marketing and IT team, with virtually no recourse for the customer. It was a WTF moment for me, and I hope to never end up in such a dystopian situation with something I really rely on, be it banking or (utility) services. (And I got a slightly better understanding of how it must feel for the victims of the Dutch childcare benefits scandal (https://www.politico.eu/article/dutch-scandal-serves-as-a-wa...).)
I had a financial company attempt to withhold my money, so I skipped the chatbot and went straight to the CFPB. They were suddenly able to ACH the funds back to the account they'd originally taken them from. Just goes to show how arbitrary and capricious all these companies are.
This is why, as useful as they are, I also loathe LLMs: so much textual content is endless drivel, either fully created by AI or "helpfully" rewritten. Of course it's not new, but what is new (to me) is users using LLMs in discussion threads to troll or to make their point, or receiving (real, work-related) emails polished by ChatGPT; madness.