+10000 that Azure is a steaming pile of shit. Like what's this: `azcopy` is broken at head, and the working version doesn't guarantee correctness after a copy ("99.6% copied successfully!" Good luck figuring out what went wrong.) Compare that to migrating data with GCS or S3: they provide first-class tools (gsutil, aws-cli) that do it right, quickly.
Want a VM? You'll also need a network security group, a network interface, a network manager, an IP, a virtual network... and maybe then it'll be connected to the internet so you can SSH in? Compare to GCP or EC2: you just pick an instance and start it. You can SSH in directly, or even do it in the browser.
Billing is also a nightmare: if you're running a startup, AWS and Google make it relatively easy to see how many credits you have left. The Azure dashboard makes you navigate a maze, and the button that says "Azure Credits" is _invisible_ for 30s, presumably until some backend system finds your credits, at which point it magically shows up. Most people don't wait around and just assume there's no button.
And if you click it, maybe you'll happen to be in the correct billing profile, maybe not! Don't get confused: billing profile and billing scope are different concepts, too! And on your invoice, costs just magically get deducted, until they don't, with no mention of any credits. The credits are inaccessible through the API (Claude tried everything).
VMs, bucket storage, and copying data are the _simplest_ parts of the stack. Why would anyone bother trying to use other services if they can't get these right?
They literally give startups 2x the credits of GCP and 20x the credits of AWS, and nobody wants to use them.
AzCopy is especially bad; the team that looks after it is made up entirely of junior developers who obstinately refuse to listen to feedback.
Its documentation is titled "Copy or move data to Azure Storage by using AzCopy v10", but it can't actually do trivial operations like "move" because the devs are too scared to write code that deletes files: https://github.com/Azure/azure-storage-azcopy/issues/1650#is...
I recommend switching to rclone instead to avoid the frustration. Unlike azcopy, it won't fill your entire system disk with unnecessary log files; that default behaviour is a significant source of production server outages where I work.
It has utility though: unlike the dollars in your mattress, it can't be printed into oblivion by your central bank. It is relatively portable, and people have flocked to it as a store of value especially during periods of socioeconomic instability when assets are going down and gov't spending is going up. It's tradeable for fiat in any country, so it allows you to bring value along if you relocate.
Its price reflects that utility and like any modern asset, a lot of speculation. You can speculate on whether it's more or less useful given current events -- nothing wrong with speculating that it is only going to be increasingly useful.
Agree it doesn't generate wealth. It's explicitly a store of wealth.
Investment is a weird term, because most people would consider holding cash or cash equivalents (gold) to be investments, even though they don't generate wealth. Holding cash is itself an opinion about the market, too.
During prompt embedding optimization, the embeddings are allowed to take on any vector in embedding space; instead, one could use a continuous penalty for superposed tokens:
Consider one of the embedding vectors in the input tensor: nothing guarantees it's exactly on, or even close to, a specific token. The similarities to each token therefore form a distribution. Ideally that distribution should be one-hot (lowest entropy); in the worst case all tokens are equally probable (highest entropy). So just add a loss term penalizing the entropy over the quasitokens, to push them toward actual token values.
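A minimal NumPy sketch of that penalty (my own illustration, not anyone's actual implementation; `prompt_embs`, `token_table`, and `tau` are assumed names): compute each quasitoken's softmax distribution over the real token embeddings, then take its entropy. Adding the scaled mean entropy to the task loss pushes each soft-prompt vector toward a one-hot match with a real token.

```python
import numpy as np

def entropy_penalty(prompt_embs, token_table, tau=1.0):
    """Mean entropy of each soft-prompt vector's distribution over real tokens.

    prompt_embs: (n_prompt, d) free embedding vectors being optimized
    token_table: (vocab, d)    the model's token embedding matrix
    tau:         softmax temperature
    """
    # Similarity logits between each quasitoken and every real token.
    logits = prompt_embs @ token_table.T / tau           # (n_prompt, vocab)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                    # row-wise softmax
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)           # per-position entropy
    return ent.mean()

# Hypothetical usage inside the optimization loop:
#   total_loss = task_loss + lam * entropy_penalty(E, W)
```

A vector sitting exactly on a token yields near-zero entropy; a vector equidistant from everything yields the maximum, log(vocab).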
I'm similarly puzzled by "uncured bacon" which afaik still uses naturally occurring nitrites. How they're allowed to call it uncured when it's clearly still cured is beyond me.
A lot of people use them together (cursor for IDE and claude code in the terminal inside the IDE).
In terms of performance, their agents differ. The base models their agents use are the same, but, for example, how they look at your codebase, how they decide to farm tasks out to lesser models, and how they connect to tools all differ.
Thanks!
To be honest, it started purely as a learning project. I was really inspired when llama.cpp first came out and tried to build something similar in pure C++ (https://github.com/nirw4nna/YAMI), mostly for fun and to practice low-level coding.
The idea for DSC came when I realized how hard it was to port new models to that C++ engine, especially since I don't have a deep ML background. I wanted something that felt more like PyTorch, where I could experiment with new architectures easily.
As for llama.cpp, it's definitely faster! They have hand-optimized kernels for a whole bunch of architectures, models, and data types. DSC is more of a general-purpose toolkit. I'm excited to work on performance later on, but for now, I'm focused on getting the API and core features right.
You just need a foundation of C/C++. If you already have that then just start programming, it's way better than reading books/guides/blogs (at least until you're stuck!). Also, you can read the source code of other similar projects on GitHub and get ideas from them, this is what I did at the beginning.
Both use cuBLAS under the hood, so I think prefill performance is similar (of course, this framework is still early and doesn't seem to have FP16/BF16 support for GEMM). Hand-rolled GEMV is faster for token generation, hence llama.cpp is better there.
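For context, a toy NumPy sketch of why the two phases stress different kernels (shapes are made up for illustration): prefill processes the whole prompt as one matrix-matrix product (GEMM), which is compute-bound, while decode produces one token per step as a matrix-vector product (GEMV), which is memory-bound because the full weight matrix is re-read every step. That is where hand-tuned GEMV kernels pay off.

```python
import numpy as np

d_model, vocab, seq = 64, 128, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, vocab))   # stand-in for a weight matrix

# Prefill: all prompt positions at once -> matrix-matrix product (GEMM).
# Many arithmetic ops per byte of W read, so it's compute-bound.
X = rng.standard_normal((seq, d_model))
prefill_out = X @ W                          # (seq, vocab)

# Decode: one new token per step -> matrix-vector product (GEMV).
# Each step re-reads all of W to produce a single output row,
# so throughput is limited by memory bandwidth, not FLOPs.
x = rng.standard_normal(d_model)
decode_out = x @ W                           # (vocab,)
```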
The sibling link was the turning point -- articles about Tesla before the link were generally positive, articles after and including the link were negative.
Tesla could have more camera data in sum (and even that's not clear: transmitting and storing data from all the cars on the road is no easy task, and L4 companies typically physically remove drives and use appliances to suck the data off them), but Waymo has more camera data per car (29 cameras) and higher-fidelity data overall (including lidar, radar, and microphone data). Tesla can't magically enhance data it didn't collect.
This is a crippling disadvantage. Consider what it takes to evaluate a single software release for a robotaxi.
If you have a simulator, you can take long tail distribution events and just resimulate your software to see if there are regressions against those events. (Waymo, Zoox)
If you don't, or your simulator has too much error, you have to deploy your software in cars in "ghost mode" and hope that sufficient miles see rare and scary situations recur. You then need to find those specific situations and check if your software did a good job (vs just getting lucky). But what if you need to A/B test a change? What if you need to A/B test 100 changes made by different engineers? How do you ensure you're testing the right thing? (Tesla)
And if you have a simulator that _sucks_ because it lacks a physics-grounded understanding of distances (i.e., it's built on distance estimates from cameras), then you can easily trick yourself into thinking your software is doing the right thing, right up until you start killing people.
Another way to look at it is: most driving data is actually very low in signal. You want all the hard driving miles, and in high resolution, so that you can basically generate the world's best unit testing suite for the software driver. You can just throw the rest of the driving data away -- and you must, because nobody has that much storage and unit economics still matter.
This is to say nothing of the fact that differences between hardware matter too. Tesla has a bunch of car models out there, and software working well on one model may not actually work well on another.