More

trees101 · 2026-04-15T00:19:56 1776212396

From my reading, the official docs don’t support the strong claim that frontier LLMs are explicitly RL-trained to “be lazy” or conserve tokens as claimed in this thread. What they do document is adaptive / hidden reasoning compute: OpenAI says reasoning models allocate internal reasoning tokens and reasoning.effort controls how many are used (https://developers.openai.com/api/docs/guides/reasoning), and Anthropic says adaptive thinking decides whether/how much to use extended thinking based on request complexity, with effort as soft guidance and max_tokens as the hard cap (https://docs.anthropic.com/en/docs/build-with-claude/adaptiv... hinking). So prompt wording may change how the same budget is spent, but it can’t exceed the hard token cap.

Also, the “encouragement helps” anecdote seems real in the AlphaEvolve workflow, but I can't see that forpublic models. Gómez-Serrano says this in Quanta (https://www.quantamagazine.org/the-ai-revolution-in-math-has... rived-20260413/), and the released AlphaEvolve notebooks really do contain prompts like “Good luck, I believe in you...” (https://github.com/google-deepmind/alphaevolve_repository_of... oblems, e.g. https://github.com/google-deepmind/alphaevolve_repository_of... blems/blob/main/experiments/finite_field_kakeya_problem/finite_f ield_kakeya.ipynb). But those prompts also bundled strong structural hints (“find a general solution”, “better constructions are possible”), so from my reading the evidence is: prompt phrasing matters, especially in an internal search stack, but not “pep talks are a universal reasoning hack.”

dataviz1000 · 2026-04-15T03:42:17 1776224537

> Anthropic says adaptive thinking decides whether/how much to use extended thinking based on request complexity, with effort as soft guidance and max_tokens as the hard cap

Nothing I said contradicts this.

Here is the first attempt of what I'm testing. [0] Haiku can get the correct answer to `floor( (1234567 * 8901234) / 12345 )` or

``` Math.floor( (Math.floor(Math.random() * 9000000 + 1000000) * Math.floor(Math.random() * 9000000 + 1000000)) / Math.floor(Math.random() * 9000000 + 1000000) ) ```

Given this Haiku will give a correct answer 77.8% of the time. Add one digit or remove a digit, it is very highly predictable also.

That is the WHOLE point. The models are predictable!

Given that prompt Sonnet at 37-digit × 37-digit (~10³⁷) never quits a predictable percentage of the time!

And, Opus at 80-digit × 80-digit simply quits after 9 seconds and 333 tokens!

This is the amazing thing people are not discussing. The models are very predictable.

The AI companies are not posting this information because it shows how unreliable the models are, however, I think there is great virtue that the models are consistently unreliable.

[0] https://github.com/adam-s/agent-tuning/blob/main/application...

trees101 · 2026-04-15T04:06:09 1776225969

looks like you've done some thorough testing. Have you found that prompting reliably reduces premature quitting? And have you found that reducing premature quitting results in more accuracy?

dataviz1000 · 2026-04-15T04:55:25 1776228925

Because these are probabilistic machines, they solve the same problem at a predictable rate. Even with different variables, the success rate stays consistent.

I only noticed the premature quitting issue recently and haven't tested it much yet. It's getting expensive to run Sonnet on hard multiplication problems. I let it run to 200k tokens and it still grinds without quitting.

But Opus has a different problem. Ask it to solve a Rubik's Cube and it will run for hours and never solve it. So there are definitely prompts that make it run forever. But if you tell it to break down multiplication using algorithms, it behaves differently. It can take really complicated calculus problems and break them into simpler ones. I can't stump it that way.

Here's the interesting thing. Even when Opus solves modular expressions by breaking them down like calculus, it still fails at a predictable rate. There's a constant failure rate no matter what you do at any level of complexity.

Models have a baseline failure rate that prompting can't change. You can change how they fail -- token burn or quitting early -- but the underlying limit stays the same.

trees101 · 2026-01-28T00:38:30 1769560710

The P≠NP conjecture in CS says checking a solution is easier than finding one. Verifying a Sudoku is fast; solving it from scratch is hard. But Brandolini's Law says the opposite: refuting bullshit costs way more than producing it.

Not actually contradictory. Verification is cheap when there's a spec to check against. 'Valid Sudoku?' is mechanical. But 'good paper?' has no spec. That's judgment, not verification.

degamad · 2026-01-28T05:24:46 1769577886

> The P≠NP conjecture in CS says checking a solution is easier than finding one...

... for NP-hard problems.

It says nothing about the difficulty of finding or checking solutions of polynomial ("P") or exponential ("EXPTIME") problems.

bwfan123 · 2026-01-28T02:15:00 1769566500

producing BS can be equated to generating statements without caring for their truth value. Generating them is easy. Refuting them requires one to find a proof or a contradiction which is a lot of work, and is equal to "solving" the statement. As an analogy, refuting BS is like solving satisfiability, whereas generating BS is like generating propositions.

rspijker · 2026-01-28T08:19:19 1769588359

It's not contradictory because solving and producing bullshit are very different things. Generating less than 81 random numbers between 1 and 9 is probably also cheaper than verifying correctness of a sudoku.

trees101 · 2026-01-22T00:14:47 1769040887

Skill issue. I'm far more interactive when reading with LLMs. I try things out instead of passively reading. I fact check actively. I ask dumb questions that I'd be embarrassed to ask otherwise.

There's a famous satirical study that "proved" parachutes don't work by having people jump from grounded planes. This study proves AI rots your brain by measuring people using it the dumbest way possible.

trees101 · 2026-01-14T22:13:10 1768428790

why would you do that rather than just revoking the key directly in the anthropic console?

mingus88 · 2026-01-14T22:15:00 1768428900

It’s the key used by the attackers in the payload I think. So you publish it and a scanner will revoke it

trees101 · 2026-01-14T22:21:40 1768429300

oh I see, you're force-revoking someone else's key

rswail · 2026-01-15T07:35:48 1768462548

Which is an interesting DOS attack if you can find someone's key.

OJFord · 2026-01-15T09:10:29 1768468229

The interesting thing is that (if you're an attacker) your choice of attack is DoS when you have... anything available to you.

freakynit · 2026-01-15T02:11:32 1768443092

Does this mean a program can be written to generate all possible api keys and upload to github thereby revoke everyone's access?

kylecazar · 2026-01-15T02:18:02 1768443482

They are designed to be long enough that it's entirely impractical to do this. All possible is a massive number.

freakynit · 2026-01-15T02:48:46 1768445326

That's true tho... possible, but impractical.

antonvs · 2026-01-15T06:11:39 1768457499

Not possible given the amount of matter in the solar system and the amount of time before the Sun dies.

cortesoft · 2026-01-15T03:07:47 1768446467

Only possible if you are unconstrained by time and storage.

eru · 2026-01-15T04:20:40 1768450840

Not only you, but GitHub too, since you need to upload.

Storage is actually not much of a problem (on your end): you can just generate them on the fly.

trees101 · 2025-05-08T22:58:08 1746745088

great tips if you want more context, aider has /copy-context that copies the files that you have added to the context, your chat (I think). you can then paste into a subscription chat app where you're not paying per token

trees101 · 2025-05-08T22:52:57 1746744777

https://github.com/hotovo/aider-desk is a gui, takes 5 mins to install, has MCP support (try context7). Definitely worth a look and is an "easy" way in to aider.

trees101 · 2025-05-05T23:48:56 1746488936

its a pity that it only works with Claude out of the box. There is a way to proxy it to other models: https://github.com/1rgs/claude-code-proxy I've found it works with Gemini. But would be better if it just allowed switching.

trees101 · 2025-05-01T02:05:37 1746065137

how do we access this?

Rebelgecko · 2025-05-01T03:02:39 1746068559

https://support.google.com/youtube/answer/14110396

If you're not already a premium subscriber you may want to stick with other tools. I didn't mean to unintentionally advertise YouTube Premium:)

trees101 · 2025-04-28T08:17:59 1745828279

edit your `.config/tmuxai/config.yaml`

to add these lines:

``` openrouter: api_key: "dummy_key" model: gemma3:4b base_url: http://localhost:11434/v1 ```

trees101 · 2025-04-28T09:14:38 1745831678

I tried it and it didn't work too well. I suspect the prompts were optimized for Gemini, not local Gemma.

TBH I found the whole thing quite flaky even when using Gemini. I don't think I'll keep using it, although the concept was promising.

alvinunreal · 2025-04-28T09:37:20 1745833040

It's still early release, I know it should still be improved but my hope is community could help.

trees101 · on April 20, 2025

Not sure how accurate my stats are. I used ollama with the --verbose flag. Using a 4090 and all default settings, I get 40TPS for Gemma 29B model

`ollama run gemma3:27b --verbose` gives me 42.5 TPS +-0.3TPS

`ollama run gemma3:27b-it-qat --verbose` gives me 41.5 TPS +-0.3TPS

Strange results; the full model gives me slightly more TPS.

orangecat · on April 20, 2025

ollama's `gemma3:27b` is also 4-bit quantized, you need `27b-it-q8_0` for 8 bit or `27b-it-fp16` for FP16. See https://ollama.com/library/gemma3/tags.