Hacker News | domh's comments

I have an M4 Max with 48GB RAM. Anyone have any tips for good local models? Context length? Using the model recommended in the blog post (qwen3.5:35b-a3b-coding-nvfp4) with Ollama 0.19.0 and it can take anywhere between 6-25 seconds for a response (after lots of thinking) from me asking "Hello world". Is this the best that's currently achievable with my hardware or is there something that can be configured to get better results?

> it can take anywhere between 6-25 seconds for a response (after lots of thinking) from me asking "Hello world".

Qwen thinking likes to second-guess itself a LOT when faced with simple/vague prompts like that. (I'll answer it this way. Generating output. Wait, I'll answer it that way. Generating output. Wait, I'll answer it this way... lather, rinse, repeat.) I suppose this is their version of "super smart fancy thinking mode". Try something more complex instead.


Indeed. Qwen doesn’t just second guess itself, it third and fourth guesses itself.

Solid Terry Pratchett reference right there.

OK thanks! That's helpful. I ignorantly assumed simpler prompt == faster first response.

I did not know that NVFP4 was handled at the silicon level... until I dug deeper here - https://vectree.io/c/llm-quantization-from-weights-to-bits-g...
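For a feel for what the format actually is: NVFP4 packs each weight into a 4-bit float (E2M1: one sign, two exponent, one mantissa bit), so a weight can only land on a handful of values. The toy quantizer below rounds to the nearest E2M1 value; the per-block FP8 scaling that real NVFP4 adds on top is omitted, and the function names are mine, not from any library.

```python
# E2M1 magnitudes representable in a 4-bit float: two subnormals (0, 0.5)
# and six normals (1..6). Sign is a separate bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_e2m1(x: float) -> float:
    """Round x to the nearest E2M1-representable value (toy sketch,
    ignoring NVFP4's shared per-block scale factors)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # magnitudes beyond the grid saturate at 6
    return sign * min(E2M1_GRID, key=lambda g: abs(g - mag))

print(quantize_e2m1(2.7))    # rounds up to 3.0
print(quantize_e2m1(-0.7))   # rounds to -0.5
print(quantize_e2m1(100.0))  # saturates at 6.0
```

The coarseness of that grid is why the shared scale factor per block matters so much: it is what maps each block's dynamic range onto those few representable values.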

I still don't think I understand it. I saw those nvfp4 models pop up by chance yesterday and tried them on my Linux PC with a 5060TI 16gb. Ollama refused to pull them, saying they were macOS only.

I assumed it was a metadata bug and posted an issue, but apparently nvfp4 doesn't necessarily mean nvidia-fp4.

https://github.com/ollama/ollama/issues/15149


They are nvidia-fp4 weights, but CUDA support isn't _quite_ ready yet; we've got that cooking.

I made my M2 Max generate a biryani recipe for me last night with 64gb ram and the baseline qwen3.5:35b model. I used the newest ollama with MLX.

https://gist.github.com/kylehotchkiss/8f28e6c75f22a56e8d2d31...

Under 3 minutes to get all that. The thinking is amusing and my laptop got quite warm, but for a 35b model on nearly 4-year-old hardware, I see the light. This is the future.


The 35b-a3b-coding-nvfp4 model has the recommended hyperparameters set for coding, not chatting. If you want to use it to chat, you can pull the `35b-a3b-nvfp4` model (it doesn't need to re-download the weights, so it will pull quickly), which has a presence penalty enabled that stops it from thinking so much. You can also try `/set nothink` in the CLI, which will turn off thinking entirely.
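The same no-think toggle is also available over the local API for thinking-capable models, via the `think` field on Ollama's `/api/generate` endpoint (check your Ollama version supports it). The sketch below just builds and inspects the request body; nothing is sent, and the model name is the one from this thread.

```python
import json

def build_generate_request(model: str, prompt: str, think: bool = False) -> str:
    """Build the JSON body for a POST to http://localhost:11434/api/generate.
    Setting think=False is the API-side equivalent of `/set nothink`."""
    payload = {
        "model": model,
        "prompt": prompt,
        "think": think,   # False suppresses the thinking phase
        "stream": False,
    }
    return json.dumps(payload)

body = build_generate_request("qwen3.5:35b-a3b-nvfp4", "Hello world")
print(body)
```

You would then POST that body with curl or `urllib.request` to the local Ollama server.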

> it can take anywhere between 6-25 seconds for a response (after lots of thinking) from me asking "Hello world".

That's not a surprising result given the pretty ambiguous query, hence all the thinking. Asking "write a simple hello world program in python3" results in a much faster response for me (m4 base w/ 24gb, using qwen3.6:9b).


Well, two things. First, “hi” isn’t a good prompt for these thinking models. They’ll have an identity crisis trying to answer it. Stupid, but it’s how it is. Stick to real questions.

Second, for the best performance on a Mac you want to use an MLX model.


Thanks! I assumed simpler == faster, but my ignorance is showing itself.

I am using the model they recommended in the blog post - which I assumed was using MLX?


Avoid reasoning models in any situation where you have low tokens/second
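The arithmetic behind that advice: all the thinking tokens are generated *before* the first answer token, so time-to-first-useful-token scales with thinking length divided by throughput. The numbers below are illustrative assumptions, not benchmarks.

```python
def seconds_to_first_answer(thinking_tokens: int, tokens_per_second: float) -> float:
    """Rough time before any answer text appears: the whole thinking
    trace has to be generated first."""
    return thinking_tokens / tokens_per_second

# A model that "thinks" for 1500 tokens:
print(seconds_to_first_answer(1500, 60.0))  # 25.0 s at a decent local speed
print(seconds_to_first_answer(1500, 10.0))  # 150.0 s when throughput is low
```

A non-reasoning model at the same low throughput starts answering almost immediately, which is why it can feel dramatically faster even if its total token rate is identical.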

When MLX comes out you will see a huge difference. I moved to LM Studio for now, since it already supports MLX.

I keep emailing my (Labour) MP about this, I suggest you do the same! I get the standard "protecting the children" response. I am not voting Labour again if this madness is still in place (or worse!) at the next GE.


MPs are pretty bad at dealing with anything that doesn't come from the party or the newspapers. I'm donating to the Open Rights Group to care about this on my behalf.

(my MP is SNP, so I benefit from not being in the two party trap)


Hey! The demo didn't work in Firefox. It said something about setting up the database, then the tab crashed.


It doesn't work in Incognito mode. Did you try it without incognito?


Why?



Oh actually, sorry, I lied. I recently switched to Vanadium as my default browser, which is the modified Chromium build that ships with GrapheneOS. Apologies!


You can use tailscale services to do this now:

https://tailscale.com/docs/features/tailscale-services

Then you can access stuff on your tailnet by going to http://service instead of http://ip:port

It works well! The only thing missing now is TLS.


This would be perfect with TLS. The docs don't make this clear...

> tailscale serve --service=svc:web-server --https=443 127.0.0.1:8080

> http://web-server.<tailnet-name>.ts.net:443/

> |-- proxy http://127.0.0.1:8080

> When you use the tailscale serve command with the HTTPS protocol, Tailscale automatically provisions a TLS certificate for your unique tailnet DNS name.

So is the certificate not valid? The 'Limitations' section doesn't mention anything about TLS either:

https://tailscale.com/docs/features/tailscale-services#limit...


I think maybe TLS would work if you were to go to https://service.yourts.net domain, but I've not tried that.


It works, I’m using tailscale services with https


Thanks for clarifying :) I'll try it out this weekend.


NatWest and Monzo work fine on my Pixel 9a running GrapheneOS. Community maintained list of supported banking apps here:

https://privsec.dev/posts/android/banking-applications-compa...

Google Wallet is not supported at all.


Curve works and you can set that up as a replacement for Google Pay.


with avbroot ?


I didn't have to do any resigning or repacking of apks. It just worked when installed from the Play Store.


The UK is no longer in the EU; it is still in Europe and very much European.


Here's a community maintained list of apps and whether or not they work:

https://privsec.dev/posts/android/banking-applications-compa...

This is linked to from the Banking Apps section on GrapheneOS docs: https://grapheneos.org/usage#banking-apps

Sample size of 1: my UK banking apps all work fine.


Yeah spot on. I think this is the only thing that's been announced so far: https://www.androidauthority.com/graphene-os-major-android-o...


This is similar to Deno Sandbox[1], which was announced a couple of weeks back. Apparently something similar is also done with fly.io's tokenizer[2][3].

[1]: https://deno.com/blog/introducing-deno-sandbox

[2]: https://news.ycombinator.com/item?id=46874959

[3]: https://github.com/superfly/tokenizer


My friend made this site to try and surface the best place to buy music: https://streamtoshelf.com/

He also made a section of the site that allows you to log in via Spotify; it aggregates your listening history and tells you how much it would cost to buy all of your most-listened-to albums. Annoyingly, Spotify seems to restrict the OAuth app creation process, so users have to be invited by email to access that.
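The aggregation that feature describes can be sketched in a few lines: count plays per album from a listening history, then price the most-listened albums. Field names, the flat per-album price, and the function name below are all my assumptions for illustration, not streamtoshelf's actual implementation.

```python
from collections import Counter

def cost_of_top_albums(history, top_n=3, price_per_album=25.0):
    """Return the top_n most-played albums and a rough cost to buy them,
    assuming a flat price per album."""
    plays = Counter(track["album"] for track in history)
    top = [album for album, _ in plays.most_common(top_n)]
    return top, len(top) * price_per_album

history = [
    {"track": "a1", "album": "OK Computer"},
    {"track": "a2", "album": "OK Computer"},
    {"track": "b1", "album": "Blue"},
    {"track": "c1", "album": "Kind of Blue"},
    {"track": "b2", "album": "Blue"},
]

albums, cost = cost_of_top_albums(history, top_n=2)
print(albums, cost)  # ['OK Computer', 'Blue'] 50.0
```

The real feature would pull `history` from Spotify's listening-history endpoints after the OAuth dance, which is the part the app-creation restriction gets in the way of.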

