Memory bandwidth is quite low for the price. $3,000 for 128GB @ 273GB/s.
You can buy a Mac Studio with an M4 Max for $3,500. 128GB unified memory @ 546 GB/s.
You're also getting a much faster CPU and more usable daily computer.
I suppose if you're a CUDA developer, this thing is probably better, though I doubt you'd be training anything worthwhile on a machine this weak. Nvidia advertises the DGX Spark as mimicking very large DGX clusters, so the environment is the same. But in terms of hardware specs, it's very disappointing for $3,000.
DGX Station is another beast. It's Blackwell Ultra with 288GB HBM3e and 496GB LPDDR5X. I'm guessing $150k - $200k.
I love MacBooks and Mac machines, but they aren't cut out for specialised work.
I have an M4 Max with 128 GB RAM. If I play Civ VI (a game released in 2016) on it for a few hours without limiting the FPS, it will heat up until it turns itself off.
It's not cut out for sustained heavy loads like gaming or crypto mining. It's cut out for bursty heavy loads like an Xcode compile for a few minutes, then back to editing text.
My gaming machine, which has poorer performance (in FPS terms) and is equipped with an AMD 3700 and a 2080 Super, can play Civ VI indefinitely without breaking a sweat.
Couple of things:
This must be new to the M4? I've played Civ VI for hours and hours on an M1 Max without issue.
If someone is looking at this Nvidia box, they're likely fine with a desktop footprint, in which case they'd be looking at the Mac Studio which should not have any thermal issues whatsoever. I'm guessing you're on a laptop?
If they are insistent on a laptop format, you can alleviate overheating issues pretty easily with some thermal pads and running the laptop on a cooling base when you're running heavy operations:
That's pretty surprising and breaks my mental model of how these chips perform. Do you think that's because they don't have the raw FLOPs, something inefficient in the Apple/Metal rendering pipeline, something about Civ VI and how it was converted (i.e. x86 or DirectX emulation), or something else entirely?
I've had no trouble with a (base) M4 mini for regular dev work, though I compile remotely and haven't played any games on it.
FWIW, I also just started seeing comments today from other devs on my team about their M4 minis frequently thermally shutting down during compile runs in Android Studio.
Did you change the fan settings? Default is for fans to be quiet. If you go to energy management in settings you can choose “more power” or something instead of automatic. Your Mac will be louder but throttle less.
I’m curious if Civilization VII has the same issue. Baldur’s Gate 3 was released around the same timeframe as Civ 6 and has some pathological behavior on higher-core-count machines.
Only looking at memory bandwidth as a measure of performance gives you an incomplete picture. You also need to look at how much of that bandwidth your processor (CPU, GPU, NPU, etc.) can actually consume, because it can be far less than the memory modules are capable of.
You can also get an Epyc 9115 for $800, a motherboard for $640, and twelve 16 GiB DDR5-6400 DIMMs for $1,400. That gives you 614.4 GB/s for around $2,800. You may also want to add a small GPU to do prompt processing (inference on a CPU is memory-bandwidth bound; prompt processing is compute bound).
I was going by the number of memory channels the CPU spec says it supports (12). But apparently I was wrong, as that gets bottlenecked by the number of CCDs on the chip. In which case you would need a much higher-end Epyc processor, and then there are other limits. So much for napkin math.
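The napkin math above can be sketched like this. To be clear, this only gives the theoretical peak of the memory modules; as the follow-up notes, the achievable figure is further capped by the CPU fabric (e.g. CCD count on Epyc), so treat it as an upper bound:

```python
# Napkin math: theoretical peak DDR5 bandwidth.
# Each channel moves 8 bytes per transfer (64-bit bus), at the rated
# mega-transfers per second. Real-world throughput will be lower.

def ddr5_peak_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Peak GB/s = channels * transfers/s * bytes per transfer."""
    return channels * mt_per_s * bus_bytes / 1000  # MT/s * bytes -> GB/s

print(ddr5_peak_gb_s(12, 6400))  # 614.4, matching the figure above
```

The same formula explains the Mac Studio number: Apple's wide unified-memory bus is equivalent to many channels running in parallel.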
The M series CPUs have very good memory bandwidth and capacity, which lets them load the billions of weights of a large LLM quickly.
Because the bottleneck to producing a single token is typically the time taken to get the weights into the FPU, Macs perform very well at producing additional tokens.
Producing the first token means processing the entire prompt first. With the prompt, you don't need to process one token before moving on to the next, because they are all given to you at once. That means loading the weights into the FPU only once for the entire prompt, rather than once for every token. So the bottleneck isn't the time to get the weights to the FPU; it's the time taken to process the tokens.
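A rough sketch of the decode-side argument (my illustrative numbers, not from the thread): since every weight must stream into the FPU once per generated token, generation speed is bounded by memory bandwidth divided by the bytes of weights read per token:

```python
# Bandwidth-bound decode: upper bound on tokens/s when generating one
# token at a time. Assumed example figures, for illustration only.

def decode_tokens_per_s(bandwidth_gb_s: float, params_b: float,
                        bytes_per_param: float) -> float:
    weight_gb = params_b * bytes_per_param  # GB of weights read per token
    return bandwidth_gb_s / weight_gb

# e.g. a 70B-parameter model at 4-bit (~0.5 bytes/param) on a
# 546 GB/s unified-memory machine:
print(round(decode_tokens_per_s(546, 70, 0.5), 1))  # 15.6 tokens/s, best case
```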
Macs have comparatively low compute performance (M4 Max runs at about 1/4 the FP16 speed of the small nvidia box in this article, which itself is roughly 1/4 the speed of a 5090 GPU).
Next token is mostly bandwidth bound, prefill/ingest can process tokens in parallel and starts becoming more compute heavy. Next token(s) with speculative decode/draft model also becomes compute heavy since it processes several in parallel and only rolls back on mispredict.
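The prefill-vs-decode contrast above can be put in numbers. This is a sketch under assumed figures (the TFLOPS and bandwidth values are placeholders I chose for illustration, not measured specs); a dense transformer does roughly 2 * params FLOPs per token, and prefill processes prompt tokens in parallel, so it hits the compute ceiling rather than the bandwidth ceiling:

```python
# Roofline-style sketch: prefill is compute bound, decode is bandwidth bound.
# All hardware numbers below are assumed placeholders for illustration.

def prefill_tokens_per_s(fp16_tflops: float, params_b: float) -> float:
    # ~2 FLOPs per parameter per token for a dense transformer forward pass
    return fp16_tflops * 1e12 / (2 * params_b * 1e9)

def decode_cap_tokens_per_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    # every weight byte streams through once per generated token
    return bandwidth_gb_s / weight_gb

# Assume ~34 TFLOPS FP16 and 546 GB/s; 70B params quantized to ~35 GB:
print(round(prefill_tokens_per_s(34, 70)))      # ~243 prompt tokens/s
print(round(decode_cap_tokens_per_s(546, 35)))  # ~16 generated tokens/s
```

This is why a box with 4x the FLOPs but similar bandwidth mostly speeds up prompt ingestion (and speculative decode), not plain next-token generation.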