Hacker News | _flux's comments

I don't know how large the cache is, but Gemini guessed that the quantized cache size for Gemini 2.5 Pro / Claude 4 with 1M context size could be 78 gigabytes. ChatGPT guessed even bigger numbers. If someone is able to deliver a more precise estimate, you're welcome to :-).

So it would probably be quite a long transfer to perform in these cases, and probably not very feasible to implement at scale.
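For what it's worth, the arithmetic behind such an estimate is simple. Here's a back-of-envelope sketch; every parameter (layer count, KV heads, head dimension, bytes per element) is a made-up guess, since the real architectures aren't public:

```rust
// Back-of-envelope KV-cache size for a hypothetical transformer.
// All parameters below are guesses, not real model specs.
fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, bytes_per_elem: u64, tokens: u64) -> u64 {
    // K and V each store layers * kv_heads * head_dim elements per token.
    2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
}

fn main() {
    // E.g. 60 layers, 8 grouped KV heads, head_dim 128, fp8 (1 byte), 1M tokens.
    let bytes = kv_cache_bytes(60, 8, 128, 1, 1_000_000);
    println!("{:.1} GB", bytes as f64 / 1e9); // prints "122.9 GB"
}
```

So with plausible-looking guesses you land in the same rough ballpark of 100 GB per 1M-token session.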


Wouldn't it help if the system did compaction before the eviction happens? The problem is that Claude probably doesn't want to automatically compact all sessions that have been left idle for an hour (and have very likely been abandoned already); that would introduce even more additional cost.

Maybe the UI could do that for sessions that the user hasn't left yet, when the deadline comes near.


What we would call O(n^2) in your message-history rewriting would be the case where you have an empty database and need to populate it with a certain message history. The individual operations take 1, 2, 3, .., n steps, so (1/2)*n^2 steps in total, i.e. O(n^2).

This is the operation that is, at the logical level, done for each message in an LLM chat: the complete context/history is sent in to be processed. If you wish to process only the additions, you must preserve the processed state on the server side (in the KV cache). KV caches can be very large, e.g. tens of gigabytes.
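The arithmetic-series cost above can be sketched as a toy model, counting one unit of work per history item reprocessed:

```rust
// If message i forces reprocessing all i items so far, the total work
// for n messages is 1 + 2 + ... + n = n*(n+1)/2, i.e. O(n^2).
fn total_reprocessing_cost(n: u64) -> u64 {
    (1..=n).sum()
}

fn main() {
    assert_eq!(total_reprocessing_cost(10), 55); // 10 * 11 / 2
    // Doubling n roughly quadruples the total work:
    let (a, b) = (total_reprocessing_cost(1000), total_reprocessing_cost(2000));
    println!("n=1000: {a} steps, n=2000: {b} steps");
}
```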


The term "unityped" is used as well, and at the typing level it also makes sense: you have one type, call it Object; you store a type tag alongside each value object; and then at runtime every operation on a value checks whether its type object provides the operation the code is trying to apply (or maybe each value object directly knows the operations it supports). I think I prefer this term.
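A minimal sketch of that view in Rust (all names here are made up): one static type carrying a runtime tag, with every operation dispatching on the tag, which is roughly what a dynamic language's runtime does on each call.

```rust
// One static type ("Value") that carries a runtime tag; operations
// check the tag first, mirroring a dynamic language's runtime checks.
#[derive(Debug, Clone)]
enum Value {
    Int(i64),
    Str(String),
}

// "add" is only defined for some tags; others are a runtime type error.
fn add(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        (Value::Str(x), Value::Str(y)) => Ok(Value::Str(format!("{x}{y}"))),
        _ => Err("type error: unsupported operands for +".to_string()),
    }
}

fn main() {
    assert!(matches!(add(&Value::Int(1), &Value::Int(2)), Ok(Value::Int(3))));
    assert!(add(&Value::Int(1), &Value::Str("x".into())).is_err());
}
```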

"syntactic type" is a weird term to me, though. Is that in common use?


"Unityped" is informal, and inaccurate in a type theory context. The description you gave refers to the runtime/semantic domain, not to types in the type theory sense.

I used "syntactic type" to underscore that formally, typing is a syntactic system that assigns types to terms, where terms are syntactic expressions.

Because of that, it's usually redundant to include "syntactic". You'll typically see it used when it's being contrasted to some other less standard approach to typing, e.g.: https://blog.sigplan.org/2019/10/17/what-type-soundness-theo...


I believe __restrict and __builtin_prefetch/__builtin_assume are compiler extensions, not part of the C++ language proper, and different compilers implement them (or don't) differently.

The Rust compiler actually has similar things, but they're not available in stable builds. I suppose there are some issues of principle behind not including them in stable. E.g.: https://doc.rust-lang.org/std/intrinsics/fn.prefetch_read_da...

Maybe some time in the future good, acceptable abstractions will be conceived for them. Then again, just using nightly builds for HPC is perhaps not that far out.


Rust already has __restrict; it is spelled &mut and is one of the most fundamental parts of the language. The key difference, of course, is that it's checked by the compiler, so is useful for correctness and not just performance. Also, for a long time it wasn't used for optimization, because the corresponding LLVM feature (noalias) was full of miscompilation bugs, because not that much attention was being paid to it, because hardly anyone actually uses restrict in C or __restrict in C++. But those days are finally over.

__builtin_assume is available on stable (though of course it's unsafe): https://doc.rust-lang.org/std/hint/fn.assert_unchecked.html

There's an open issue to stabilize the prefetch APIs: https://github.com/rust-lang/rust/issues/146941

As is usually the case when a minor standard-library feature remains unstable, the primary reason is that nobody has found the problem urgent enough to put in the required work to stabilize it. (There's an argument that this process is currently too inefficient, but that's a separate issue.) In the meantime, there are third-party libraries available that use inline assembly to offer this functionality, though this means they only support a couple of the most popular architectures.
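To illustrate the stable assume-style hint: a small sketch (the function and its invariant are made up for the example) where the caller promises a fact the optimizer can't prove on its own.

```rust
use std::hint;

// `assert_unchecked` feeds the optimizer a fact; if the promise is ever
// false, the behavior is undefined, hence the unsafe block.
fn div_rounded_up(x: u32, divisor: u32) -> u32 {
    // Caller-enforced invariant: divisor is never zero, so the compiler
    // can drop the division-by-zero check.
    unsafe { hint::assert_unchecked(divisor != 0) };
    (x + divisor - 1) / divisor
}

fn main() {
    assert_eq!(div_rounded_up(10, 4), 3);
    assert_eq!(div_rounded_up(8, 4), 2);
}
```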


btw. Fortran is implicitly behaving as "restrict" by default, which makes sense together with its intuitive "intent" system for function/subroutine arguments. This is one of the biggest reasons why it's still so popular in HPC - scientists can pretty much just write down their equations, follow a few simple rules (e.g. on storage order) and out comes fairly performant machine code. Doing the same (a 'naive' first implementation) in C or C++ usually leads to something severely degraded compared to the theoretical limits of a given algorithm on given hardware.

Oh, I actually made an editing mistake there; I meant to say that Rust also has restrict by default, by virtue of all references being unique xor read-only.

As I understand it, the Fortran compiler just expects your code to respect the "restrictness", it doesn't enforce it.
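A small sketch of what "restrict by default" buys in Rust: with a `&mut` and a `&` in scope simultaneously, the compiler knows they cannot alias, so it can keep loads cached across writes. Unlike Fortran's convention or C's restrict, violating the rule is a compile error rather than silent misbehavior. (The function here is invented for illustration.)

```rust
// Because `dst: &mut i32` and `src: &i32` cannot alias, the compiler may
// keep `*src` in a register across the writes to `*dst` -- the same
// optimization C's `restrict` unlocks, but checked instead of promised.
fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src; // *src is known not to have changed via dst
}

fn main() {
    let (mut x, y) = (1, 10);
    add_twice(&mut x, &y);
    assert_eq!(x, 21);
    // add_twice(&mut x, &x); // would not compile: aliasing &mut and &
}
```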


So that's where the intent system comes in (an argument can be in/out/inout), as well as the built-in array sizes, because it allows you to express what you want and then the compiler will enforce it. In Fortran you kind of have to work hard to invade the memory of one array from another, as they are allocated as distinct memory regions with their own space from the beginning. Pointer arithmetic is almost never necessary. Because there is built-in support for multidimensional arrays and array lengths, arrays are internally built as flat memory regions anyway, the same way you'd lay out C arrays for good performance (i.e. cache locality), but addressed with simple indices. This then makes it unnecessary to treat memory as aliased by default.

Honestly, I still don't get why people have built up all these complex numerics frameworks in C and C++. Just use Fortran - it's built for exactly this use case, and scientists will still be able to read your code without a CS degree. In fact, they'll probably be the ones writing it in the first place.


There are good reasons to use Fortran, some having to do with the language and many to do with legacy codes. These have to be balanced with the good reasons to avoid using Fortran for new development, which also have to do with the language and its compilers.

To me it just boils down to using the right tool for each job. I definitely wouldn't use Fortran for anything heavily using strings. Another weakness is the lack of metaprogramming support. But for numerical code to be run on specific hardware, including GPUs, it's pretty close to perfect, especially since NVIDIA has invested in it.

I’m glad you like it.

restrict is in C99. I’m not sure why standard C++ never adopted it, but I can guess: it can be hard to reason about two restrict’d pointers in C, and it probably becomes impossible when it interacts with other C++ features.

The rest are compiler extensions, but if you’re in the space you quickly learn that portability is valued far less than program optimization. Most of the point of your large calculations is the actual results themselves, not the code that got you there. The code needs to be correct and reproducible, but HPC folks (and grant funding agencies) don’t care if your Linux/amd64 program will run, unported, on Windows or on arm64. Or whether you’ve spent time making your kernels work with both rocm and cuda.


What does it do with a single probe, though? You need two to actually probe anything, right?

So I'm wondering how the second-probe problem is dealt with. I've considered something similar but with a small weight attached to a pogo pin, so the CNC arm could just move it around, though that would not be very easy to make completely reliable, as there may be components on the board.


Your common oscilloscope is ground-referenced. You attach the typically black alligator clip coming off the probe to your test circuit's ground and then read the voltage at a point with the test lead. A decent differential probe like you might be thinking of usually costs about as much as a decent hobbyist oscilloscope.

Ah well that does simplify things significantly, I suppose it's probably still somewhat useful.

But I'd expect a big part of the nets not to be connected to ground? I mean, in my hobby designs a majority of them are, but let's say you generously use decoupling capacitors; then that might not be the case?


Decoupling capacitors don't remove the ground reference, they just allow high-frequency signals a faster path to ground.

Typically, you need dedicated circuitry (and usually inductive coupling) to provide full isolation, but if the circuit is using this layout then you can still choose to ground the normally-isolated side for probing.


"the ground" ≠ "ground". It's unfortunate naming IMO, but "ground" is just the 0V reference point. A normal oscilloscope's ground probes are mains Earth referenced, that is they're connected to the "ground" pin in the outlet which in turn connects to one or more conductive rods buried in the Earth. Decoupling capacitors don't negate the ground reference, since they're connected between a power rail & circuit ground. So the power rail's voltage is still referenced to ground.

A truly "floating" circuit would be something battery powered, or galvanically isolated from mains Earth (e.g. by an isolation transformer).


Will it help? AI authors will then just buy those subscriptions, and in the big picture it won't cost that much.

They could write to the RFID how much filament "it has left".

In principle it could use e.g. `gdb` and single-step until it gets the secret. Or it could know ahead of time where the app stores the credentials.

We could use suid binaries (e.g. sudo) to prevent that, but currently I don't think we can. Almost anyone would agree that using a separate process, to which the agent environment provides a connection, is a better solution.
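A toy sketch of the separate-process idea; here the "broker" is just a thread for brevity, and the socket path, request format, and key are all made up. The point is that the secret lives only on the broker side, and the agent side only ever sees a connection and derived results.

```rust
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

// Builds the broker's reply; the key itself never leaves the broker,
// only derived results do.
fn make_reply(req: &str, api_key: &str) -> String {
    format!("ok: handled '{}' with key len {}", req.trim(), api_key.len())
}

fn main() -> std::io::Result<()> {
    let path = "/tmp/secret-broker.sock"; // made-up path
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path)?;

    // Broker side (a thread here, a separate OS user/process in real
    // life): holds the secret and performs the privileged call.
    let broker = thread::spawn(move || {
        let api_key = "hunter2"; // never crosses the socket
        let (mut conn, _) = listener.accept().unwrap();
        let mut req = String::new();
        conn.read_to_string(&mut req).unwrap();
        conn.write_all(make_reply(&req, api_key).as_bytes()).unwrap();
    });

    // Agent side: can only send requests and read results.
    let mut conn = UnixStream::connect(path)?;
    conn.write_all(b"fetch /v1/thing")?;
    conn.shutdown(std::net::Shutdown::Write)?;
    let mut reply = String::new();
    conn.read_to_string(&mut reply)?;
    println!("{reply}");
    broker.join().unwrap();
    Ok(())
}
```

Even if the agent runs arbitrary code in its own sandbox, there is no process it can attach a debugger to that holds the key.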


Separate process as a separate OS user, and/or namespace.


I mean, a share-nothing system is definitely a good starting point, but then it becomes impossible to use tools (no shared filesystem, no networking), so everything needs to happen over connections the agent provides.

MCP looks like it would fit that purpose, even if it were just an MCP providing access to a shell. Actually, I think a shell MCP would be nice, because currently every agent environment has its own way of managing permissions to the shell. With MCP, one could at least bring the same shell permissions to every agent environment.

Though in practice I just use the shell, and hardly any MCP at all; shell commands are much easier to combine, i.e. the agent can write and run a Python program that invokes any shell command. In the "MCP shell" scenario that whole interaction would be handled by the one MCP; it wouldn't allow combining MCPs with each other.

Maybe an "agent shell" is what we need.


That is fine, but you give up any pretence of security - your agent can inspect your tool's process, environment variables etc - so can presumably leak API keys and other secrets.

Other comments have claimed that tools are/can be made "just as secure" - they can, but as the saying goes: "Security is not a convenience".


I've found Gemini useful in extracting timestamps for particular spots in videos. Presumably it works with transcriptions, given how fast it is.

The three answers it found were:

- Avoiding lock-in to them: http://www.youtube.com/watch?v=KKbgulTp3FE&t=1914

- Competitive advantage: http://www.youtube.com/watch?v=KKbgulTp3FE&t=1852

- Perceived Lack of Use Case: http://www.youtube.com/watch?v=KKbgulTp3FE&t=1971

Those points do actually exist in the video, I checked. If there are more, I don't know about them, as I haven't yet watched the rest of the video.

