Not plain-text I hope. Was this extracted directly from providers? Given that people reuse passwords, it should in theory be easy to construct a larger DB. But this number is mind-boggling.
When you call a CUDA method, it is launched asynchronously. That is, the function queues it up for execution on the GPU and returns.
So if you need to wait for an op to finish, you need to `synchronize` as shown above.
It's `get_current_stream` because the queue mentioned above is actually called a stream in CUDA.
If you want to run many independent ops concurrently, you can use several streams.
Benchmarking is one use case for synchronize. Another would be if you, say, run two independent ops in different streams and need to combine their results.
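As a rough illustration (a toy CPU model, not actual CUDA: this `Stream` is a hypothetical stand-in built on a worker thread), the launch-then-synchronize pattern looks like this:

```python
import queue
import threading

class Stream:
    """Toy model of a CUDA stream: ops are queued and run in order in the background."""
    def __init__(self):
        self._q = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            fn = self._q.get()
            fn()
            self._q.task_done()

    def launch(self, fn):
        # Returns immediately; fn executes later on the worker (the "device").
        self._q.put(fn)

    def synchronize(self):
        # Block until every queued op has finished.
        self._q.join()

results = []
s = Stream()
s.launch(lambda: results.append(1))  # returns right away
s.launch(lambda: results.append(2))  # also just queued
s.synchronize()                      # now wait for both to finish
# results == [1, 2]
```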
Btw, if you work with PyTorch: when ops are run on the GPU, they are launched in the background. If you want to benchmark torch models on the GPU, it also provides a sync API (`torch.cuda.synchronize`).
I’ve always thought it was weird that GPU stuff in Python doesn’t use asyncio, and mostly assumed it was because Python-on-GPU predates asyncio. I was hoping a new lib like this might right that wrong, but it doesn’t. Maybe for interop reasons?
Do other languages surface the asynchronous nature of GPUs in language-level async, avoiding silly stuff like synchronize?
The reason is that the usage is completely different from coroutine based async. With GPUs you want to queue _as many async operations as possible_ and only then synchronize. That is, you would have a program like this (pseudocode):
b = foo(a)
c = bar(b)
d = baz(c)
synchronize()
With coroutines/async await, something like this
b = await foo(a)
c = await bar(b)
d = await baz(c)
would synchronize after every step, which is much less efficient.
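To make that concrete, here's an asyncio sketch (with `asyncio.sleep` standing in for the hypothetical ops `foo`/`bar`/`baz`): every `await` is a synchronization point where the host blocks, instead of one `synchronize()` at the end.

```python
import asyncio
import time

async def step(x):
    await asyncio.sleep(0.1)  # stand-in for a GPU op taking 0.1 s
    return x + 1

async def main():
    start = time.perf_counter()
    b = await step(0)  # host blocks here until the op finishes
    c = await step(b)  # ...and here
    d = await step(c)  # ...and here
    elapsed = time.perf_counter() - start
    return d, elapsed

d, elapsed = asyncio.run(main())
# d == 3; total time is ~0.3 s because each await waits for its step
```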
It really depends on whether you're dealing with an async stream or a single async result as the input to the next function. If a is an access token needed to access resource b, you cannot fetch a and b at the same time. You have to serialize your operations.
Well, you can and should create multiple coroutines/tasks and then gather them. If you replace CUDA with network calls, it’s exactly the same problem. Nothing to do with asyncio.
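For independent operations, that looks like this (a small sketch with `asyncio.sleep` standing in for a network call): `asyncio.gather` overlaps the calls instead of running them back to back.

```python
import asyncio
import time

async def fetch(x):
    await asyncio.sleep(0.1)  # stand-in for an independent network call
    return x * 2

async def main():
    start = time.perf_counter()
    # Three independent calls launched together: gather overlaps them.
    results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# results == [2, 4, 6]; total time is ~0.1 s, not 0.3 s
```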
No, that's a different scenario. In the one I gave there's explicitly a dependency between requests. If you use gather, the network requests are executed in parallel. If you have dependencies, they're sequential by nature, because later ones depend on the values of earlier ones.
The 'trick' for CUDA is that you declare all of this using buffers as inputs/outputs rather than values, and that ordering is automatically enforced through CUDA's stream mechanism. Marrying that with the coroutine mechanism just doesn't really make sense.
Might have to look at specific lib implementations, but I'd guess that most GPU calls from Python actually happen in C++ land, and internally a lib might use synchronize calls where needed.
I'd imagine that Meta is trying to avoid an AI / LLM monopoly from happening as the primary goal.
They suffered when Apple, as the gateway to their service for iOS users, decided to shut off their access to user data.
And AI is clearly going to be widely used to aid content generation, if not to do it entirely. They also tried jumping on the chatbot hype with M or something; that didn't pan out so well.
By opening up AI, they would enable tools that allow users to pump out more content easily and spend more time in app.
Getting the goodwill of the dev community is a bonus.
A brilliant move from Meta.
Hang in there! And to add to what others have said, I found it extremely helpful to stay away from sad music, romantic shows, etc. until you feel strong again. And keep your body moving!
Glad to see Revolution OS in the list! Helped me get through times in grad school when I'd get stuck on something working on my thesis. Grab a bowl of ramen and put this movie on!
It checks for a specific return code, which is useful for testing a specific type of failure. Since in shell all non-zero return codes are considered failures, we need a way to know that the process failed for the expected reason and not because of some other bug. So if you're testing for failure with error code 2, the function will "pass" the test only if the process actually fails with error code 2.
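The same idea in a minimal Python sketch (a hypothetical `fails_with` helper, not the actual shell function being discussed): a test passes only when the process fails with exactly the expected code.

```python
import subprocess
import sys

def fails_with(cmd, expected_code):
    """Return True only if cmd exits with exactly the expected non-zero code."""
    result = subprocess.run(cmd)
    return result.returncode == expected_code

# Exiting with 2 is a failure, but the failure we expect, so the test "passes".
assert fails_with([sys.executable, "-c", "raise SystemExit(2)"], 2)
# Exiting with 1 is also a failure, but for the wrong reason, so it does not.
assert not fails_with([sys.executable, "-c", "raise SystemExit(1)"], 2)
```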