
Can we talk about how jarring the announcement video is?

AI-generated voiceover, likely an AI-generated script ("You see, this model isn't just generating images, it's thinking!"). From the looks of it, only the editing has some human touch to it?

It does this Apple-style announcement which everyone is doing, but because it was made with AI, at least for me, it falls right into the uncanny valley.


I mean, your prompt is basically this skit: https://www.youtube.com/watch?v=BKorP55Aqvg ("The Expert" 7 red lines: all strictly perpendicular, some with green ink some with transparent ink)

I couldn't imagine the image you were describing. I've listed some of the red lines with green ink I've noticed in your prompt:

Macro Close Up - Sharp throughout

Focus on tiny gear - But also on tweezers, old watchmakers hand, water drop?

Work on the mechanism of the watch (on the back of the watch) - but show the curved glass of the watch face which is on the front

This is the biggest. Even if the mechanism is accessible from the front, you'd have to remove the glass to get to it. It just doesn't make sense and that reflects in the images you get generated. There's all the elements, but they will never make sense because the prompt doesn't make sense.


The last point (reflection by front glass versus mechanism access, so no front glass) is the only issue I see with it. Other than that I can easily visualize an image that satisfies the prompt. I think the general idea is a good one because it's satisfiable while having multiple competing requirements that impose geometric constraints on the scene without providing an immediate solution to said constraints, as well as requiring multiple independent features (caustics, reflections, fluid dynamics, refraction, directional lighting) that are quite complicated to get right.

To illustrate that there aren't any contradictions (other than the final bit about the reflection in the glass): consider a macro shot showing partial hands, partial tweezers, and pocket watch internals. That much is certainly doable. Now imagine the partial left hand holding a half-submerged pocket watch, fingertips of the right hand holding the front half of tweezers that are clasping a tiny gear, positioned above the work piece with the drop of water falling directly below. Capture the watchmaker's perspective. I could sketch that, so an image model capable of 3D reasoning should have no trouble.

It's precisely the sort of scene you'd use to test a raytracer. One thing I can immediately think to add is nested dielectrics. Perhaps small transparent glass beads sitting at the bottom of the dish of water with the edge of the pocket watch resting on them, make the dish transparent glass, and place the camera level with the top of the dish facing forward?

https://blog.yiningkarlli.com/2019/05/nested-dielectrics.htm...

A second thing I can think to add is a flame. Perhaps place a tealight candle on the far side of the dish, the flame visible through (and distorted by) the water and glass beads?
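For a sanity check on the refraction involved in that setup, Snell's law is enough: what matters is the relative IOR at each boundary, which is exactly the bookkeeping nested-dielectric tracking gets right (glass beads seen through water bend light less than glass seen through air). A rough sketch; IOR values are the usual textbook numbers, purely illustrative:

```python
import math

# Snell's law: n1*sin(theta1) = n2*sin(theta2). Returns the refracted
# angle in degrees, or None on total internal reflection.
def refract_angle(theta_in_deg, n_from, n_to):
    s = n_from / n_to * math.sin(math.radians(theta_in_deg))
    return math.degrees(math.asin(s)) if abs(s) <= 1 else None

print(refract_angle(30, 1.0, 1.5))   # air -> glass: ~19.5 deg
print(refract_angle(30, 1.33, 1.5))  # water -> glass: ~26.3 deg, much gentler
```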


Without the last point with the watch glass it is also easier to imagine for me. Still, you'd have to be selective.

Do you want it to actually look like macro photography (neither of the generated images do)? Then you can't have it sharp throughout, and you won't be able to show the (sharp) watchmaker's face in a reflection because it would be on a different focal plane.

Dropping the macro requirement, you can show a lot more. You can show that the watchmaker is actually old, you can show the reflection, etc.

Something has to give in the prompt, on multiple of the requirements. The generated images are dropping the macro requirement and inventing some interesting hinged watch-glass contraptions to make sense of it.


Yeah, fair enough. I figure "macro" sees sufficiently loose use that a model should be able to make sense of it but to get the prompt into perfect shape that ought to be replaced with something like "a closeup showing X, Y, Z in perfect focus". Still the only real problem I see is the aforementioned contradiction regarding the front glass. Short of that single detail an artist could easily satisfy the description as written to well within reason.

Yeah I dunno bud, I have a degree in film and three Emmy awards for technical production (an expert), I could shoot that prompt (unlike the so-called "expert" in the skit). A Canon EF 100mm Macro USM at f/32 should be able to produce that, focus doesn't need to imply aperture, and a quick Google search shows me there are loads of front-gear pocket watches available. Also it produced something very clearly not shot with a 100mm anyway, as the telephoto compression is wrong.

Far be it from me to add to a comment by an expert, coming from someone who has only whipped out his macro lens for ring shots at weddings and - about 2 hours ago - a picture of our latest newborn. However, I think most photographers in that situation wouldn't shoot at f/32 due to diffraction and would focus stack instead.

Of course, a text to image model shouldn’t really need to worry about that sort of thing.


Yeah I dunno bud, I've watched a few watch repair videos on YouTube and have seen macro photography that other people did.

Sure there are pocket watches where the movement is visible from the front (you'd still likely service them from the back, but alas). Even if you'd do service from the front where the glass is, you'd still have to remove it to drop in a gear.

Anyway, I think that we aren't really talking about the same thing. I'm nitpicking your prompt while you constructed it to mostly see the performance of the model in novel situations and difficult lighting and refraction environments. And that's fair.

How satisfied are you with the generated image results? What would you do different when shooting this proposed scene yourself?


Reasonable people can disagree - I think you made some good points, I've been sitting for the last 20 minutes wondering where the DoF at 32 on a 100 runs out, maybe you're right I'm not 100% sure.

The prompt I did mostly to see how it does with the gears and the tweezers, and the perspective of the gears (do they... I don't know the opposite word of distort - straighten? - but do they seem like they're actually round, could they work?). I think those are really hard things for AI. The glass distortion, reflections, the DoF etc. were just to see how it approached that, and like the other comment below said, I tried to pick something that wasn't likely to be in the training data, so it reasoned about it more.

Nano was able to spit it out consistently, Images 2 really struggles, and has yet to complete one I was satisfied with, whereas with nano it nails it almost every time, the 2 images I showed originally are the first shot of the prompt with the models. (here are the 3 other gens from Images2: https://drive.google.com/drive/folders/1s8gik_x0B-xDZO6rOqoz...)

How would I shoot it? I wouldn't, fixing a watch in water is a dumb idea. ;)


And are doing exceptional things (like building their own cleanroom in a shed)

There's a difference between insider trading and gambling no?

Is there? In gambling the house is the insider. They already know if you're going to win the next spin or not.

Just a thought: This data engineering can only really occur in sciences with a significant "moat".

Expensive tools, expensive test setups, live, gene-altered animals, etc.

In fields such as deep learning or other more digital fields (my field is using a lot of freely available satellite data) replication is often cheaper and actual application of research outcomes is a lot more common.


I used to think that but....

I've reviewed for a few "replication tracks" at ML Conferences and there are a surprising number of reports where people are simply unable to replicate published results. The reasons are all over the map: sometimes the original authors' code just needs to be fixed (new libraries, different environments), but other results simply don't seem to hold up.


Yes, immediately thought the same. CSV alone is a footgun and a half on any computer which doesn't have "." as the decimal separator.

Let alone column sorting and joining of data.
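To make the decimal-separator footgun concrete: the same bytes parse differently depending on which locale wrote the file. A naive sketch (the `parse_de` fix-up is deliberately crude):

```python
# A value written on a German-locale machine as "1,5" is not a valid
# float to a parser expecting ".".
def parse_en(value: str) -> float:
    return float(value)  # expects "1.5"

def parse_de(value: str) -> float:
    return float(value.replace(",", "."))  # naive fix-up for "1,5"

print(parse_de("1,5"))
try:
    parse_en("1,5")
except ValueError as e:
    print("plain float() refuses it:", e)
```

And of course the crude `replace` breaks the moment the same file also uses "," as a thousands separator.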


Even years after moving away from more raw data work, way too much of my brain is still dedicated to "ways of dealing with CSV from random places".

I can already hear people who like CSV coming in now, so to get some of my bottled-up anger about CSV out and to forestall the responses I've seen before:

* It's not standardised

* Yes I know you found an RFC from long after many generators and parsers were written. It's not a standard, is regularly not followed, doesn't specify allowing UTF-8 (lmao, in 2005 no less) or other character sets as just files. I have learned about many new character sets from submitted data from real users. I have had to split up files written in multiple different character sets because users concatenated files.

* "You can edit it in a text editor" which feels like a monkeys-paw wish "I want to edit the file easily" "Granted - your users can now edit the files easily". Users editing the files in text editors results in broken CSV files because your text editor isn't checking it's standards compliant or typed correctly, and couldn't even if it wanted to.

* Errors are not even detectable in many cases.

* Parsers are often either strict and so fail to deal with real world cases or deal with real world cases but let through broken files.

* Literally no types. Nice date field you have there, shame if someone were to add a mixture of different dd/mm/yy and mm/dd/yy into it.

* You can blame excel for being excel, but at some point if that csv file leaves an automated data handling system and a user can do something to it, it's getting loaded into excel and rewritten out. Say goodbye to prefixed 0s, a variety of gene names, dates and more in a fully unrecoverable fashion.

* "ah just use tabs" no your users will put tabs in. "That's why I use pipes" yes pipes too. I have written code to use actual data separators and actual record separators that exist in ASCII and still users found some way of adding those in mid word in some arbitrary data. The only three places I've ever seen these characters are 1. lists of ascii characters where I found them, 2. my code, 3. this users data. It must have been crafted deliberately to break things.
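For the curious, those dedicated ASCII control characters are 0x1E (record separator) and 0x1F (unit separator). A minimal writer/reader pair using them looks like this - and, as described above, it still breaks the moment a user smuggles those bytes into a field:

```python
# ASCII reserved control characters for exactly this job.
RS, US = "\x1e", "\x1f"  # record separator, unit separator

def dump(rows):
    return RS.join(US.join(fields) for fields in rows)

def load(blob):
    return [record.split(US) for record in blob.split(RS)]

rows = [["id", "note"], ["1", "hello, world"]]  # commas are harmless here
assert load(dump(rows)) == rows
```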

This, Excel, and other things are enormous issues. The fact that there are any manual steps along the path introduces so many places for errors. People writing things down then entering them into Excel/whatever. Moving data between files. You ran some analysis and got graphs - are those the ones in the paper? Are they based on the same datasets? You later updated something - are all the downstream things updated?

This occurs in all kinds of papers, I've seen clear and obvious issues over datasets covering many billions of spending, in aggregate trillions. I can only assume the same is true in many other fields as well as those processes exist there too.

There is so much scope to improve things, and yet so much of this work is done by people who don't know what the options are and are often working late hours in personal time to sort it, so it's rarely addressed. My wife was still working on papers years after leaving a research position she was no longer being paid for, because the whole research -> publication process is so slow. What time is there then for learning and designing a better way of tracking and recording data, and teaching all the other people how to update and generate stats? I built things which helped, but there's only so much of the workflow I could manage.


While I appreciate a good rant just as much as the next person, most of these points have nothing to do with CSV. They are a general problem with underspecifying data, which is exactly what happens when you move data between systems.

The amount of hours I have wasted on unifying character sets across single database tables is horrifying to even think about. And the months it took to make sense of an important national dataset that supposedly many people use across several types of businesses was staggering. The fact that that XML came with a DTD was apparently not a hindrance to doing unspeakable horrors with both attributes and CDATA constructs.

Sure, you can specify MM/DD/YY in a table, but if people put DD/MM/YY in there, what are you going to do about it? And that's exactly what happens in the real world when people move data across systems. That's why mojibake is still a thing in 2026.


I disagree, they are absolutely related to CSV in that these are all problems CSV has. Other formats can have these problems, but CSV is almost uniquely bad because these issues compound and it has a lot of them.

> They are a general problem with underspecifying data,

Which CSV provides essentially no tools to solve, unlike many other formats.

Also, several of these problems are not even about underspecified data but the format itself - you can have totally fine data which gets utterly fucked to the point of not parsing as a csv file by minor changes.

It's not even a fully specified format! Someone adds a comma in a field and then one of the following happens:

* Something generating the csv doesn't add quotes

* Something reading the csv doesn't understand quotes

And the classic

* Something sorted the file
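The quoting failure mode in miniature - a proper CSV writer quotes the field, but a naive `split(",")` consumer (or a generator that never adds quotes) silently changes the column count:

```python
import csv
import io

row = ["Smith, John", "42"]  # one field contains a comma
buf = io.StringIO()
csv.writer(buf).writerow(row)
text = buf.getvalue().strip()  # '"Smith, John",42'

good = next(csv.reader([text]))  # real parser: 2 fields, comma intact
bad = text.split(",")            # naive split: 3 "fields", data mangled
print(good)
print(bad)
```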

> Sure, you can specify MM/DD/YY in a table, but it people put DD/MM/YY in there, what are you going to do about it?

If you've got something with actual date types you can have interfaces show actual calendars, and for many formats you will at least get an error if it's defined as DD/MM/YY and someone puts in 01/13/26. CSV however gives you no ability to do this - all data is just strings. And string defined dates with no restrictions are why I have had to deal with mixtures of 01/13/26 and 13/01/26, meaning everything goes just fine until you try and parse it. Or, like some of my personal favourites, "Winter 2019".
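That's the payoff of declared types in two lines: with a stated date format, the bogus month is caught at the boundary, while as a bare CSV string it sails through until analysis time.

```python
from datetime import datetime

FMT = "%d/%m/%y"  # declared format: day/month/two-digit year

print(datetime.strptime("13/01/26", FMT).date())  # parses fine

try:
    datetime.strptime("01/13/26", FMT)  # month 13 does not exist
except ValueError:
    print("rejected at parse time, not three steps downstream")
```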

CSV is not one format, lacks verification of any useful kind, is almost uniquely easy for users to completely fuck up, and the lack of types means that programs do their own type inference which adds to things getting messed up.


You're blaming a lot of normal ETL problems on DSVs.

Like, specifying date as a type for a field in JSON isn't going to ensure that people format it correctly and uniformly. You still have parsing issues, except now you're duplicating the ignored schema for every data point. The benefit you get for all of that overhead is more useful for network issues than ensuring a file is well formed before sending it. The people who send garbage will be more likely to send garbage when the format isn't tabular.

There are types and there is a spec WHEN YOU DEFINE IT.

You define a spec. You deal with garbage that doesn't match the spec. You adjust your tools if the garbage-sending account is big. You warn or fire them if they're small. You shit-talk the garbage senders after hours to blow off steam. That's what ETL is.

DSVs aren't the problem. Or maybe they are for you because you're unable to address problems in your process, so you need a heavy unreadable format that enforces things that could be handled elsewhere.


I would kind of disagree.

We are talking here in the context of scientific datasets. Of course ETL plays a part here. However here it is really more the interplay of Excel with CSV which is often outputted by scientific instruments or scientific assistants.

You get your raw sensor data as a CSV, just want to take a look in Excel, and it understandably mangles the data in an attempt to infer column types - because of course it does, it's CSV! Then you mistakenly hit save and boom, all your data on disk is now an unrecoverable mangled mess.

Of course this is also the fault of not having good clean data practices, but with CSV and Excel it is just so, so easy to hold it wrong, simply because there is no right.

> so you need a heavy unreadable format

I prefer human unreadable if it means I get machine readable without any guesswork.


That's Excel's type inference causing problems. Not an issue with CSV or any other type of DSV.

It is possible to import a CSV into Excel without type conversion. I just tested it two different ways.

While possible, it's not Excel's default way of doing things. Not always obvious or easy. Not enough people who use Excel really know how to use it.

Regardless, Excel mangling files via type inference is an Excel problem. It's not the fault of the file formats Excel reads in.


The file format being ambiguous and underspecified enough to mangle is, though.

No, it's Excel trying to be too clever. It does the same thing with manual input if you don't proactively change the field type.

You can import a DSV into Excel without mangling datatypes in a few different ways. Probably the best way is using Power Query.

A DSV generally does have a schema. It's just not in the file format itself. Just because it isn't self-describing doesn't mean it isn't described. It just means the schema is communicated outside of the data interchange.


If you get an .xls which doesn't have very esoteric functions, I expect it to open about the same way in any Excel program and any other office suite.

With CSV I do not have that expectation. I know that for some random user-submitted CSVs, I will have to fiddle. Even if that means finding the one row in thousand rows which has some null value placeholder, messing up the whole automatic inference.
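The "one null placeholder poisons the column" problem follows directly from how type inference usually works: a column is typed by its least parseable value. A toy inference pass, just to illustrate the mechanism:

```python
# Infer a column type by trying progressively looser casts; one value
# that fails a cast demotes the whole column.
def infer(column):
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in column:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"

print(infer(["1", "2", "3"]))    # int
print(infer(["1", "2", "N/A"]))  # string -- one placeholder flips it
```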


You're just saying when there's no filetype transfer, you don't have to deal with issues related to filetype transfer.

It's both of their faults. CSV is not blameless here - Excel is doing something broadly that users expect, have dates as dates and numbers as numbers. Not everything as strings. If CSV had types then Excel would not have to guess what they are.

It does have types if you define them in the schema. Not every format needs to be self-describing. It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.
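A sketch of what "schema communicated outside the data interchange" looks like in practice - the field names and converters live in code, not in the file. The column names here are made up for illustration:

```python
import csv
import io
from datetime import datetime

# Out-of-band schema: shared once, applied to every row.
SCHEMA = {
    "id": int,
    "price": float,
    "day": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}

def load(text):
    rows = csv.DictReader(io.StringIO(text))
    return [{k: SCHEMA[k](v) for k, v in row.items()} for row in rows]

data = load("id,price,day\n1,9.5,2026-01-13\n")
print(data[0]["price"] + 0.5)  # a real float, not a string
```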

It's completely Excel's fault for pushing their type-inference and making it difficult for users to define or supply their own.

Power Query does a better job handling it, but you should be able to just supply a schema on import, like you can with Polars or DuckDb.

It's another example of MS babying their userbase too much. Like how VBA is single threaded only because threads are hard. They're making their product less usable and making it harder for their users to learn how stuff works.


CSV doesn’t have a schema; it has a barely adhered-to, post-hoc “not a specification”, and everything is strings.

That you can solve some of these problems by using something alongside the CSV file is nowhere near as helpful, and the need to do so is itself a clear problem with CSV. There is no universally followed schema, for a start, so now we’re at unique solutions all over the place.

> It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.

You cannot be suggesting that csv files are efficient surely, they’re atrociously inefficient. Having the same format and a tied in schema would solve a lot and add barely anything as overhead. If you want efficiency, do not use csv.

Asking users to manually load in the right schema every time they open a file is asking for trouble. Why wouldn’t you combine them?

> It's completely Excel's fault for pushing their type-inference and making it difficult for users to define or supply their own.

It’s not entirely Excel’s fault that CSV doesn’t have types. They didn’t invent and promote a new standard, but then why would you? There are better formats out there. I’m sure they would argue that Excel files are a better format, for a start.

And people did make better formats. That’s why I think csv should be consigned to the bin of history.


> "You can edit it in a text editor" which feels like a monkeys-paw wish

Yes :) Although I will note that some editors are good enough to maintain the structure as the user edits. Consider Emacs with `csv-mode`, for example. Of course most users don’t have Emacs so they’ll just end up using notepad (or worse, Word).


This is an excellent rant, thanks for sharing. I didn’t have to work with CSVs as much as you, but from what experience I had, I share your sentiment.

Completely agree. If CSVs stay read only and are not user-submitted but computer generated they can be okay at best.

Anything else? Nope nope nope!


Interesting thought. PCBWay and JLCPCB sponsoring channels to show use-cases of their capability, thus growing their market.

Would be similar to the distributor/producer of a food item sponsoring channels to use their ingredient in recipes.

Makes a lot of sense.


Hey, just wanted to say thanks for your comment. I'm usually very apprehensive about nuclear (still kind of am), but I think one of my main points against it was that "base load" (i.e. inflexible generation) would be bad in countries with high intermittent generation.

But I guess the whole calculation of 100% renewables is overprovision+storage. This wouldn't change with nuclear in the mix; nuclear would just generate all the time at whatever price it can get, bringing the point of overprovision for renewables closer.

Then in countries at more extreme latitudes, the calculation of whether nuclear is worth it just becomes how cheap and viable the (long- and short-term) storage part will get over the lifetime of a new nuclear reactor.

If storage gets so cheap that a nuclear reactor would be consistently in the red, even in the depths of winter, then it wouldn't make sense to build one today.

But I haven't done any calculations on that yet. For example, for the Netherlands or Germany, which still rely heavily on gas but have a large share of solar+wind: how expensive could nuclear be and still make sense to build a new reactor? And under which scenarios of storage-price development would it potentially cease to make sense?


For countries that can reliably get to 99% hydro, save for some exceptional droughts, "build nuclear" is about the worst advice you can give them.

Building dams is not without environmental costs especially in water stressed regions. Grand Ethiopian Renaissance Dam has long been a source of tension between Ethiopia, Sudan and Egypt.

https://www.dw.com/en/gerd-grand-ethiopian-renaissance-dam-s...

The Motuo Hydropower Station will overtake the Three Gorges Dam as the world's largest. The project has attracted criticism for its potential impact on millions of Indians and Bangladeshis living downriver, as well as the surrounding environment and local Tibetans.

https://www.bbc.com/news/articles/c4gk1251w14o


"a source of tension" is a understatement. might have have caused a war, and still could.

No need for absolute pitch. All you need is relative pitch. You play a note, compare to the note you heard, maybe even play them at the same time. And then change the note till you find the right one.


How do you do that with chords? I know everyone who isn't completely tone deaf can do that with one single note. But when it comes to chords, unless you already know some music theory, isn't there an infinite number of combinations you have to try before you find the correct one?


Well, the guitar has a finite number of strings and each string is partitioned into a finite number of frets. It's definitely not more than, say, 30^6 ~ 729 million.

That said, common chords are A, B, C, D, E, F, G (and their sharps and flats), combined with either major or minor mode. Hence "C, G, F, Am, Em" is an example of what someone could play. Now, of course, if it doesn't sound exactly like a G, perhaps it's a G7? After some practice, you can even hear, by the sound of the strings, exactly which chord it is. Em, G, and D are particularly simple to recognize.
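The "which chord is it" question for those common cases is small enough to brute-force in a few lines. A toy classifier that decides major vs minor from the stacked thirds - enough for open-position pop chords, though real voicings and inversions need more work:

```python
# Pitch classes for natural note names (C = 0, semitones).
NOTE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def triad_quality(root, third, fifth):
    a = (NOTE[third] - NOTE[root]) % 12   # lower third in semitones
    b = (NOTE[fifth] - NOTE[third]) % 12  # upper third in semitones
    return {(4, 3): "major", (3, 4): "minor"}.get((a, b), "other")

print(triad_quality("C", "E", "G"))  # major
print(triad_quality("A", "C", "E"))  # minor
```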


Interval training will help https://www.musictheory.net/exercises/ear-interval

Each interval has a unique "flavor" and once you can hear them you should be able to hear multiple intervals at the same time, which effectively identifies the chord. (Admittedly for complex jazz chords it can get very difficult and you probably need more powerful tools, I can't say.)
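The "flavor" idea translates directly to code: name each interval by its size in semitones, then read a chord as the set of intervals above the root. A minimal sketch:

```python
# Interval names by semitone distance within one octave.
INTERVAL = {0: "unison", 1: "minor 2nd", 2: "major 2nd", 3: "minor 3rd",
            4: "major 3rd", 5: "perfect 4th", 6: "tritone",
            7: "perfect 5th", 8: "minor 6th", 9: "major 6th",
            10: "minor 7th", 11: "major 7th"}

def spell(root, notes):
    """Name the intervals of `notes` (pitch classes 0-11) above `root`."""
    return [INTERVAL[(n - root) % 12] for n in notes]

# Cmaj7 above C (= 0): E, G, B
print(spell(0, [4, 7, 11]))  # ['major 3rd', 'perfect 5th', 'major 7th']
```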


> infinite number of combinations you have to try before you find the correct one?

Kinda, but on guitar most pop songs are major/minor, possibly with sevenths. I think this post is aimed at someone who can read tab but isn't "good" (whatever that means), so they should have an understanding of basic chord shapes.

The post does imply that this only really works if you can comfortably read tab, which is probably 6 months to 2 years of part-time work.


Theory will make it a million times easier. Figure out the key and changes and you'll have likely chords and if you can do substitutions you'll have some alternatives.

Even if they're not exactly what was played, you'll be able to get to a working version with the right idea.

In any case, theory and experience will narrow the field down a great deal so you're not just stabbing at things in the dark.


You don't have to get it right. If you know the basic guitar chords in the open positions, you can sort of play along to the vast majority of popular songs. As your hearing, knowledge of the neck, and maybe music theory improves you will start to recognise more things.

The point is not a perfect outcome. The point is the effort.


It wouldn't hurt to know how to do the 'cowboy chords' and then the 'barre chords' before (or in parallel to) doing the transcriptions. Anyway, you should start with easy songs that mostly just include those until that seems easy.


In some genres there are an infinite number. Most of the music regular people listen to is diatonic though and uses either power chords or triads, and then there are not that many options.

