Hacker News | kajecounterhack's comments

+10000 that Azure is a steaming pile of shit. Like what's this -- `azcopy` broken at head, and the working one doesn't guarantee correctness after a copy (99.6% copied successfully! good luck figuring out what went wrong!). Compare that to migrating data with GCS or S3 -- they provide first-class tools that do it right quickly (aws-cli, gsutil).

Want a VM? You'll also need this network security group, network interface, network manager, ip, virtual network... and maybe it'll be connected to the internet so you can SSH in? Compare to GCP or EC2 -- you just pick an instance and start it. You can SSH in directly, or even do it in the browser.

Billing is also a nightmare: if you're running a startup, AWS and Google make it relatively easy to see how many credits you have left. The Azure dashboard makes you navigate a maze, and the button that says "Azure Credits" is _invisible_ for 30s until ostensibly some backend system finds your credits, then it magically shows up. Most people don't wait around and just assume there's no button.

And if you click it, maybe you will happen to be in the correct billing profile, maybe not! Don't get confused: billing profile and billing scope are different concepts too! And in your invoice, costs just magically get deducted, until they don't. No mention of any credits. Credits are inaccessible through the API (Claude tried everything).

VMs, bucket storage, and copying data are the _simplest_ parts of the stack. Why would anyone bother trying to use other services if they can't get these right?

They literally give startups 2x the credits of GCP and 20x the credits of AWS, and nobody wants to use them.


AzCopy is especially bad; the team that looks after it is made up entirely of junior developers who obstinately refuse to listen to feedback.

Its documentation title is "Copy or move data to Azure Storage by using AzCopy v10" but it can’t actually do trivial operations like “move” because the devs are too scared to write code that deletes files: https://github.com/Azure/azure-storage-azcopy/issues/1650#is...

I recommend switching to "rclone" instead to avoid the frustration. Unlike azcopy, it won't fill your entire system disk with unnecessary log files; that default behaviour is a significant source of production server outages where I work.


It has utility though: unlike the dollars in your mattress, it can't be printed into oblivion by your central bank. It is relatively portable, and people have flocked to it as a store of value especially during periods of socioeconomic instability when assets are going down and gov't spending is going up. It's tradeable for fiat in any country, so it allows you to bring value along if you relocate.

Its price reflects that utility and like any modern asset, a lot of speculation. You can speculate on whether it's more or less useful given current events -- nothing wrong with speculating that it is only going to be increasingly useful.


You're right that it has utility, but being fungible doesn't imply that it is automatically an investment.

Speculation is not the same as investment, and it is still completely non-productive.


Agree it doesn't generate wealth. It's explicitly a store of wealth.

Investment is a weird term because most people would consider keeping cash or cash equivalents (gold) to be investments, even if they don't generate wealth. Cash is also an opinion, in terms of the market.


An investment creates a return


Roger, sometimes positive, sometimes negative.


They are used in thin-film solar panel development. Not sure anyone has cracked the big problem with them, which is durability.


I tried mapping back to closest token embeddings. Here's what I got:

    global_step = 1377; phase = continuous; lr = 5.00e-03; average_loss = 0.609497
    current tokens: ' Superman' '$MESS' '.");' '(sentence' '");' '.titleLabel' ' Republican' '?-'

    global_step = 1956; phase = continuous; lr = 5.00e-03; average_loss = 0.589661
    current tokens: ' Superman' 'marginLeft' 'iers' '.sensor' '";' '_one' '677' '».'

    global_step = 2468; phase = continuous; lr = 5.00e-03; average_loss = 0.027065
    current tokens: ' cited' '*>(' ' narrative' '_toggle' 'founder' '(V' '(len' ' pione'

    global_step = 4871; phase = continuous; lr = 5.00e-03; average_loss = 0.022909
    current tokens: ' bgcolor' '*>(' ' nomin' 'ust' ' She' 'NW' '(len' ' pione'
"Republican?" was kind of interesting! But most of the strings were unintelligible.

This was for classifying sentiment on yelp review polarity.
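For anyone curious, the mapping itself is simple: for each learned soft-prompt vector, find the vocabulary embedding with the highest cosine similarity. A minimal numpy sketch with made-up shapes (this is not my actual training code):

```python
import numpy as np

# Hypothetical shapes: a vocab of 8 tokens with 4-dim embeddings,
# and 2 learned soft-prompt vectors to decode back to tokens.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(8, 4))   # (vocab_size, d_model)
soft_prompt = rng.normal(size=(2, 4))        # (num_virtual_tokens, d_model)

def nearest_tokens(soft_prompt, embedding_matrix):
    # Cosine similarity between each soft-prompt vector and every token embedding.
    a = soft_prompt / np.linalg.norm(soft_prompt, axis=1, keepdims=True)
    b = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1, keepdims=True)
    sims = a @ b.T                            # (num_virtual_tokens, vocab_size)
    return sims.argmax(axis=1)                # index of the closest token per vector

ids = nearest_tokens(soft_prompt, embedding_matrix)
```

You'd then detokenize `ids` with the model's tokenizer to get strings like the ones above.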


During prompt embedding optimization, the embeddings are allowed to take on any vector in embedding space; instead, one could use a continuous penalty for superposed tokens:

Consider one of the embedding vectors in the input tensor: nothing guarantees it's exactly on, or even close to, a specific token. The probabilities with respect to each token therefore form a distribution; ideally that distribution should be one-hot (lowest entropy), and in the worst case all probabilities are equal (highest entropy). So just add a loss term penalizing the entropy of the quasi-tokens, to push them toward actual token values.
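A minimal numpy sketch of that penalty term. Treating negative squared distance to each vocabulary embedding as logits is just one plausible choice for turning a quasi-token into a distribution over tokens; the names and temperature are assumptions for illustration:

```python
import numpy as np

def entropy_penalty(soft_prompt, embedding_matrix, temperature=1.0):
    # Squared distance from each quasi-token to every vocab embedding.
    d2 = ((soft_prompt[:, None, :] - embedding_matrix[None, :, :]) ** 2).sum(-1)
    # Softmax over negative distances -> a distribution over tokens per quasi-token.
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Mean entropy: near 0 when each quasi-token sits on a real token (one-hot),
    # maximal (log vocab_size) when it is equidistant from everything.
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return ent.mean()
```

Adding `lambda_ * entropy_penalty(...)` to the task loss would then pull the quasi-tokens toward actual token embeddings during optimization.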


Do the nearest tokens have a similar classification score?


I'm similarly puzzled by "uncured bacon" which afaik still uses naturally occurring nitrites. How they're allowed to call it uncured when it's clearly still cured is beyond me.


A lot of people use them together (Cursor as the IDE and Claude Code in the terminal inside the IDE).

In terms of performance, their agents differ. The base models their agents use are the same, but things like how they look at your codebase, how they decide to farm tasks out to lesser models, and how they connect to tools all differ.


Cool stuff! Is the goal of this project personal learning, inference performance, or something else?

Would be nice to see how inference speed stacks up against say llama.cpp


Thanks! To be honest, it started purely as a learning project. I was really inspired when llama.cpp first came out and tried to build something similar in pure C++ (https://github.com/nirw4nna/YAMI), mostly for fun and to practice low-level coding. The idea for DSC came when I realized how hard it was to port new models to that C++ engine, especially since I don't have a deep ML background. I wanted something that felt more like PyTorch, where I could experiment with new architectures easily.

As for llama.cpp, it's definitely faster! They have hand-optimized kernels for a whole bunch of architectures, models and data types. DSC is more of a general-purpose toolkit. I'm excited to work on performance later on, but for now, I'm focused on getting the API and core features right.


If someone wanted to learn the same thing, what material would you suggest is a good place to start?


You just need a foundation of C/C++. If you already have that, then just start programming; it's way better than reading books/guides/blogs (at least until you're stuck!). Also, you can read the source code of other similar projects on GitHub and get ideas from them; that's what I did at the beginning.


Both use cuBLAS under the hood, so I think prefill performance is similar (of course, this framework is early and doesn't seem to have FP16/BF16 support for GEMM yet). Hand-rolled GEMV is faster for token generation, hence llama.cpp is better there.
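To make the prefill-vs-decode distinction concrete: with a whole prompt in flight the per-layer matmul is a GEMM, but once you generate one token at a time the same op degenerates to a GEMV, which is memory-bound and is where hand-rolled kernels beat generic GEMM paths. A toy numpy sketch with hypothetical sizes:

```python
import numpy as np

d = 64                                       # hypothetical model dimension
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))                  # a layer's weight matrix

prefill_acts = rng.normal(size=(16, d))      # 16 prompt tokens at once
decode_act = prefill_acts[-1:]               # a single new token during generation

gemm_out = prefill_acts @ W   # (16, d) GEMM: compute-bound, cuBLAS shines
gemv_out = decode_act @ W     # (1, d) GEMV: memory-bound, custom kernels win
```

The decode step does the same math as the last row of the prefill, just with no batch dimension to amortize the weight reads over.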


Unrelated: my man, I loved your C vision library back in the day.


I know what you mean, but Tiger Beetles are still insects: https://en.wikipedia.org/wiki/Tiger_beetle


I thought about that, but it's not the comparison of the second word, it's the strength of the first. Read this list:

* Tiger Shark
* Tiger Beetle
* Tiger Data
* Tiger Games
* Tiger Woods
* Tiger Attack
* Tiger Snake
* Wild Tiger

Only one stands out as not like the others. Tiger is too strong a word. The second word disappears.


Now they just need to rebrand as Tiger Direct!


OK, but also note there's not a "both sides" to everything. Some stuff can just suck.


I'm sure it might. It's just that any news item from this particular site has been negative. (Even the one from 2019 that the sibling links.)


The sibling link was the turning point -- articles about Tesla before the link were generally positive, articles after and including the link were negative.


You are correct (and it even looks like the 2019 one wasn't very negative).


Tesla could have more camera data in sum (that's not even clear: transmitting and storing data from all the cars on the road is no easy task, and L4 companies typically physically remove drives and use appliances to suck data off them), but Waymo has more camera data per car (29 cameras) and higher-fidelity data overall (including lidar, radar, and microphone data). Tesla can't magically enhance data it didn't collect.

This is a crippling disadvantage. Consider what it takes to evaluate a single software release for a robotaxi.

If you have a simulator, you can take long tail distribution events and just resimulate your software to see if there are regressions against those events. (Waymo, Zoox)

If you don't, or your simulator has too much error, you have to deploy your software in cars in "ghost mode" and hope that sufficient miles see rare and scary situations recur. You then need to find those specific situations and check if your software did a good job (vs just getting lucky). But what if you need to A/B test a change? What if you need to A/B test 100 changes made by different engineers? How do you ensure you're testing the right thing? (Tesla)
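The regression check I mean is conceptually just this (all names hypothetical; real systems compare rich per-scenario metrics, not one scalar score):

```python
# Replay a library of logged long-tail events through two candidate software
# versions and flag any scenario the new release handles worse.
def evaluate_release(scenarios, baseline_driver, candidate_driver, score):
    regressions = []
    for scenario in scenarios:
        old = score(baseline_driver, scenario)  # how the shipped version did
        new = score(candidate_driver, scenario) # how the candidate does
        if new < old:                           # candidate regressed on this event
            regressions.append((scenario, old, new))
    return regressions
```

Without a trustworthy simulator you can't run this loop at all; you're stuck waiting for the rare event to recur on the road.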

And if you have a simulator that _sucks_ because it doesn't have a physics-grounded understanding of distances (i.e. it's based on distance estimates from cameras), then you can easily trick yourself into thinking your software is doing the right thing, right up until you start killing people.

Another way to look at it is: most driving data is actually very low in signal. You want all the hard driving miles, and in high resolution, so that you can basically generate the world's best unit testing suite for the software driver. You can just throw the rest of the driving data away -- and you must, because nobody has that much storage and unit economics still matter.

This is to say nothing of the fact that differences between hardware matter too. Tesla has a bunch of car models out there, and software working well on one model may not actually work well on another.

