Hacker News | new | past | comments | ask | show | jobs | submit | streamer45's comments

Rad! The Hugging Face link gives a 404 on my side, though.


Oh damn! Thanks for catching that -- going to ping the HF folks to see what they can do to fix the collection link.

In the meantime here's the individual links to the models:

https://huggingface.co/Linum-AI/linum-v2-720p

https://huggingface.co/Linum-AI/linum-v2-360p


Looks like 20GB VRAM isn't enough for the 360p demo :( need to bump my specs :sweat_smile:


Should be fixed now! Thanks again for the heads up


All good, cheers!


Per the RAM comment, you may be able to get it running locally with two tweaks:

https://github.com/Linum-AI/linum-v2/blob/298b1bb9186b5b9ff6...

1) Free up the T5 as soon as the text is encoded, so you reclaim GPU RAM

2) Manual layer offloading: move layers off the GPU once they're done being used, to free up space for the remaining layers + activations


Any idea on the minimum VRAM footprint with those tweaks? 20GB seems high for a 2B model. I guess the T5 encoder is responsible for that.


The T5 encoder is ~5B parameters, so back of the envelope that's ~10GB of VRAM (it's in bfloat16). So 360p should take ~15GB of VRAM (+/- a few GB based on the duration of the video generated).
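That back-of-the-envelope can be checked in a couple of lines (parameter counts are the ones stated here; the rest is plain arithmetic):

```python
GIB = 2**30  # bytes per GiB

def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights; bfloat16 stores 2 bytes per parameter."""
    return n_params * bytes_per_param / GIB

t5_gib = weights_gib(5e9)   # ~9.3 GiB for the 5B text encoder
dit_gib = weights_gib(2e9)  # ~3.7 GiB for the 2B video model
print(round(t5_gib, 1), round(dit_gib, 1))
```

Activations for the video context come on top of those two numbers, which is where the rest of the 20GB goes.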

We can update the code over the next day or two to provide the option to delete the text encoder once the text embeddings are computed (to save on RAM), and then report back the GB consumed for 360p and 720p, 2-5 second videos on GitHub so there are more accurate numbers.

Beyond the 10 GB from the T5, there's just a lot of VRAM taken up by the context window of 720p video (even though the model itself is 2B parameters).


The 5B text encoder feels disproportionate for a 2B video model. If the text portion dominates your VRAM usage, it really hurts the inference economics.

Have you tried quantizing the T5? In my experience you can usually run these encoders in 8-bit or even 4-bit with negligible quality loss. Dropping that memory footprint would make this much more viable for consumer hardware.
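The rough savings are easy to tabulate from bits per parameter alone (real quantized checkpoints carry a little extra overhead for scales and zero-points, so treat these as lower bounds):

```python
def t5_mem_gib(n_params: float = 5e9, bits: int = 16) -> float:
    """Approximate weight memory for a 5B-param encoder at a given bit width."""
    return n_params * bits / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{t5_mem_gib(bits=bits):.1f} GiB")
```

Going from bfloat16 to 8-bit halves the encoder's footprint, and 4-bit quarters it.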


That all being said, you can just delete the T5 from memory after encoding the text to save on memory.

The 2B parameters will take up 4 GB of memory, but activations will be a lot more given the size of the context window for video.

A 720p, 5-second video is roughly 100K tokens of context.
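That figure is plausible under typical latent-video assumptions (all hypothetical here: an 8x spatial VAE, 2x2 patches, 24 fps, 4x temporal compression):

```python
h, w = 720, 1280
tokens_per_frame = (h // 8 // 2) * (w // 8 // 2)  # 45 * 80 = 3600 tokens/frame
latent_frames = 5 * 24 // 4                        # 30 latent frames for 5 s
total_tokens = tokens_per_frame * latent_frames
print(total_tokens)  # 108000, in the ~100K ballpark
```

With attention cost growing quadratically in that token count, it's clear why activations, not the 2B weights, dominate VRAM at 720p.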


Great idea! We haven’t tried it but def interested to see if that works as well.

When we started down this path, T5 was the standard (back in 2024).

It likely won’t be the text encoder for subsequent models, given its size (per your point) and age.


Nice! Built a similar system in the past using a servo-controlled Traxxas buggy with an LTE hat, which let us do open-space driving. Latency (over the internet) was still a challenge, and finding cameras and lenses that performed well across varying lighting conditions turned out to be a bit of a pain, but it was pretty fun stuff.


Been using raylib for years to power generative digital paintings on embedded systems (RPI and the like). I have been really impressed with its performance and accessible API. Plus it's a very active and welcoming open source project, kudos to the maintainer.


Do you have any examples of code and/or art you can share?

I’ve always been fascinated by generative art.


After cleaning up my bookmarks, I narrowed down my "Creative Coding" folder to these.

"Generative Design: Visualize, Program, and Create with JavaScript in p5.js":

http://www.generative-gestaltung.de/2/

Articles by Tyler Hobbs, especially the one on "Flow Fields":

https://tylerxhobbs.com/essays/2020/flow-fields

Articles by Sighack, especially the one on "Watercolor Techniques":

https://sighack.com/post/generative-watercolor-in-processing

"Steve's Makerspace" on Youtube:

https://youtube.com/@StevesMakerspace


I don't know how known it is but Jared Tarbell has an excellent gallery: http://www.complexification.net/gallery/

EDIT: nevermind, bio says he co-founded Etsy... he's probably well known


I've been using raylib for years now to implement digital signage art, and it's been a pleasure to work with, especially thanks to its excellent multi-platform support (I've used it on many Raspberry Pis). Really well-thought-out, intuitive API; kudos to the author.


Amazing to think they were actually spawning 150k headless browsers to simulate the traffic. That sounds like throwing money at the problem, and it probably worked (for a while, anyway).

Having built a load-test tool as well, I can say that making it realistic enough, and keeping it that way, is possibly the hardest challenge. The maintenance cost is high, especially in a feature-focused environment.


> Having built a load-test tool as well.

Which tool? Curious.

To your other points,

> That sounds like throwing money at the problem and it probably worked (for a while anyway).

> Maintenance cost is high, especially in a features focused environment.

Isn't it really just choosing which way to throw money at the problem? Hardware costs vs. person-hours to maintain a thin-client version?


> Which tool. Curious.

We built something similar at Mattermost, which (funnily enough) is a comparable application.

https://github.com/mattermost/mattermost-load-test-ng

https://mattermost.com/blog/improving-performance-through-lo...

> Isn't it really just choosing which way to throw money at the problem? Hardware costs vs. person-hours to maintain a thin-client version?

That's fair, although the second option has (in my opinion) a better return on investment given by the knowledge and experience gain.


The new tool seems like an early version as well, with pretty basic functionality.

In the example where it is supposed to be "viewing a message, marking the message as read, and finally calling reactions.add"... it doesn't really do those things in a real chain. They just have a 5-second delay after "view a message", then run "mark message as read", then a 60-second delay, then call reactions.add. I'm not sure that mimics real end-user behavior terribly well.
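The delay chain described above amounts to a tiny declarative scenario plus an interpreter; a hypothetical sketch (not the actual tool's format) makes the shape obvious:

```python
import time

# Hypothetical scenario format: each step names a client action and the
# pause before the next step, mirroring the 5 s / 60 s delays above.
SCENARIO = [
    {"action": "view_message", "wait_s": 5},
    {"action": "mark_message_read", "wait_s": 60},
    {"action": "reactions.add", "wait_s": 0},
]

def run(scenario, do_action, sleep=time.sleep):
    """Execute each step in order; `do_action` is whatever client call
    performs the named API action, and `sleep` is injectable for tests."""
    for step in scenario:
        do_action(step["action"])
        if step["wait_s"]:
            sleep(step["wait_s"])
```

Injecting a fake `sleep` lets you verify the action sequence without real waits, which is exactly the kind of test-maintenance concern raised in this thread.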

It seems like they could have used JMeter rather than making a home-grown WebSockets test client. Perhaps there's some requirement where existing tools don't work well.


For those who have yet to read the article: this story is about stopping the money-throwing and switching to a more scalable (cheaper) solution.

It's kind of interesting to see them choose a rather "declarative" (i.e., JSON-centric) approach instead of adopting a small language like Lua for scenario-based scripting.

Maybe the declarative approach is better suited for auto-generation from the user stats data, as they described? After all, there are often fewer people who like writing stress tests than people writing the features that should be stress-tested.


This is great and overdue. Hopefully all major browsers will add some support for open source/royalty free codecs.

Emscripten/WebAssembly actually worked rather well for audio (Opus is just awesome), but when it comes to video it's just unfeasible, especially if you are looking at doing low-latency streaming. That said, I cannot fail to mention the incredible effort by ogv.js [1] to make a/v decoding possible almost anywhere.

Looking forward to experimenting with this new API.

[1] https://github.com/brion/ogv.js/


At Mattermost we went for the do-it-yourself option and wrote a custom tool for the job [1]. After a lot of research into all the existing open-source frameworks, we couldn't really find anything that would fit our use case. We are quite happy with the result, although, as the OP mentioned, there's a significant maintenance cost attached: as new features get implemented and more API calls are added, you need to go back and make sure the logic defining user behaviour stays in sync with the real world. If I were to do it all over again, I'd probably give k6 [2] a chance, but I am still convinced a tailored solution was the best choice.

[1] https://mattermost.com/blog/improving-performance-through-lo...

[2] https://k6.io/


Nice! Are all user inputs mixed into the same stream?


Yes, they are. Technically, though, it's not mixed: all arriving audio data is dumped into a single buffer. Mixing would now be a great way to get rid of the stream's choppiness.

Now that so many users are playing simultaneously, I'm a bit annoyed that some leave the website open and stream silence.

Wondering how I could give each user a fair slot to perform now...

Anyways, the stream has become a great source of entropy now :D
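A mixer along those lines can start out as simple averaging of aligned samples from each client's buffer; a naive sketch over integer PCM samples (real code would also handle sample rates, resampling, and clipping):

```python
def mix(buffers):
    """Average aligned PCM samples across client buffers, so several
    simultaneous players don't just overwrite each other in one buffer."""
    if not buffers:
        return []
    length = max(len(b) for b in buffers)
    out = []
    for i in range(length):
        samples = [b[i] for b in buffers if i < len(b)]
        out.append(sum(samples) // len(samples))
    return out
```

Averaging (rather than plain summation) keeps the output in range regardless of how many users play at once, at the cost of making each individual quieter.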


So, is this like a giant party line?

https://en.m.wikipedia.org/wiki/Party_line_(telephony)


Oh I see. Server-side mixing would probably be a good option to help it scale better. Cool stuff anyway :)


I now hacked together a really bad mixer. Not sure how it will work live. Let's see.

I usually don't do prod testing, but hey, I'm on holiday :D


It seems appropriate to also mention https://mattermost.com/ as a valid Open Source alternative to Slack.


We use this at work alongside GitLab.

Works perfectly.

'Bot' scripts can also be added to do things like tell a channel when a repo has been pushed etc. Very handy.


Amazing! Why does it look so much like computer graphics? I wouldn't expect to see it like that with my own eyes.


Lack of ambient light and atmospheric attenuation. Significantly more direct light vs indirect light.

If you fly at 35,000 ft, the horizon is at 221.3 miles, and most of that line of sight passes through dense air. If you look directly downwards from the ISS, there is less than ten miles of thick atmosphere between the camera and the target.

If you do ray tracing from a single light source, with few objects and without effects that simulate an atmosphere, you simulate how the scene looks in a vacuum.
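The horizon distance from cruising altitude can be sanity-checked with the geometric formula d = √(2Rh + h²); the exact mileage shifts a few percent with the assumed Earth radius and with atmospheric refraction:

```python
import math

R_MI = 3958.8          # mean Earth radius in statute miles
h_mi = 35000 / 5280    # 35,000 ft converted to miles

# Geometric (no-refraction) distance to the horizon
d = math.sqrt(2 * R_MI * h_mi + h_mi**2)
print(round(d))  # ~229 statute miles
```

Either way, the point stands: from an airliner almost the entire sight line runs through thick atmosphere, while the view straight down from orbit does not.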


I suspect another contributing factor is that the setting here is more like what you usually see in computer graphics than in real life: very few moving parts.

In real life there are insects and birds moving around, wind blowing all sorts of things (leaves, blades of grass, trash, etc.), individual strands of hair, and so on. All things we can't really reproduce with graphics.

Here there is just a sphere with a surface texture and some volumetric effects.


1) longer exposure, "averaging" neighboring pixels

2) noise reduction making it look "plasticky"

3) attempts to increase dynamic range with filters that favor certain color hues

That would be my list of possible explanations as a photographer.

