thomashabets2 · 2026-05-20T16:23:51 1779294231

The point of this post, though, is that even something as simple as "give me this string as an integer" doesn't have an answer that doesn't come with "are you OK with this best effort parse under these edge cases? Oh and we use this number as error, so you can't parse that".

Like… edge cases? It's parsing a number! We're not talking about I/O on hard vs soft intr NFS mounts, here. There's a right answer.

strlen(), on valid null terminated strings, doesn't come with caveats like "oh we can't measure strings of length 99".

But sure, C is Turing complete. It is possible to solve any problem a Turing machine can solve.

> understand the target platform and the target compiler’s behavior.

This is neither. This is purely the language.

prerok · 2026-05-20T18:25:04 1779301504

Somewhat true, but C is pretty close to translating directly to machine code, even if most compilers now do so many complex things that the assembly can be pretty far off. My point is that if you have a type int in your program, it's specifically tied to the byte size of an integer on the target platform. While it can be 8, 16, 32, 64 bits, it's defined based on what the target platform supports efficiently.

So, when you say, "it's purely the language", I have to disagree. The language means different things on different platforms but it's still defined exactly on the target platform. And it's efficient on that platform.

Nowadays, we prefer correct vs. efficient, which I do agree with, of course. But, I also understand why C is like it is. It is possible to claim it's a problem of the language but I would argue that it is not. C gives us barebones and working with it we have to know this. If that's not needed then sure, other languages will be easier to work with.

thomashabets2 · 2026-05-21T07:28:05 1779348485

> C is pretty close to translating directly to machine code

The C standard defines only its abstract machine, not actual hardware.

> The language means different things on different platforms but it's still defined exactly on the target platform

It's implemented to support a target platform, so that programs behave as if they ran on the abstract machine.

It'd be nice if we could move more stuff from UB to implementation defined.

Do keep in mind that target platform can change, in this regard. E.g. IIRC OpenBSD doesn't guarantee the ABI backward compatibility that Linux does, and can change things like size of int if they want, between versions.

> I also understand why C is like it is

Yup. It can be true that I understand why, and still understand that it's 2026.

thomashabets2 · 2026-05-20T16:11:46 1779293506

Only literally. 7.24.1 in the C programming language spec has these poor parsers.

rbanffy · 2026-05-20T16:51:41 1779295901

Is their misbehavior part of the spec as well? If not, we can always add the correct behavior to the spec and let anyone who implemented a broken version deal with fixing every program compiled using it.

thomashabets2 · 2026-05-20T17:15:54 1779297354

Fair enough.

For strtoul and friends, maybe? 7.24.1 is pretty dense, but the key parts are "the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign […] If the correct value is outside the range of representable values […] ULONG_MAX […] is returned".

So the "expected form" allows a minus sign, but then it's clearly "outside the range of representable values" for strtoul to try parsing a negative value. So maybe it should return ULONG_MAX on those.

So arguably a minus sign present could already be treated as an error, and still be standard compliant. Unless I'm misreading.
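A minimal sketch of what a caller has to wrap around strtoul() today to get the strict behavior discussed here. The name `parse_ulong_strict` is hypothetical, and rejecting a leading `+` or whitespace is a choice made for this sketch, not something the standard requires.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of any standard library.
 * Returns true and stores the value only if `s` consists entirely of
 * base-10 digits and the number fits in unsigned long. */
bool parse_ulong_strict(const char *s, unsigned long *out)
{
    /* strtoul() skips leading whitespace and accepts a sign,
     * so insist that the very first character is a digit. */
    if (s == NULL || !isdigit((unsigned char)s[0]))
        return false;

    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);

    if (errno == ERANGE)   /* too large for unsigned long */
        return false;
    if (*end != '\0')      /* trailing non-numeric data */
        return false;

    *out = v;
    return true;
}
```

Because the error is reported through the return value, ULONG_MAX itself stays a perfectly parseable number.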

rbanffy · 2026-05-20T19:03:14 1779303794

Passing a negative value to a function that is specifically for converting strings into unsigned numbers is pretty much an error. In the case of functions that return an unsigned number, at least, negative return values can represent errors.

It’s more fun when the result can be signed though. Maybe strcmp with the representation of the LONG_MAX, and if it doesn’t match, call strtol and watch for a LONG_MAX indicating an error.

C is a bit messy. Would be nicer to return a struct with a possible error and the desired value, Golang style.
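A rough sketch of that Go-style shape in C, assuming a hypothetical `struct long_result` and `parse_long()`; neither exists in the standard library. It still leans on strtol(), so leading whitespace is accepted, and `s` must not be NULL.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result type: the error travels next to the value,
 * so LONG_MAX and LONG_MIN stay ordinary, parseable numbers. */
struct long_result {
    long value;
    int  err;   /* 0 on success, otherwise an errno-style code */
};

struct long_result parse_long(const char *s)
{
    struct long_result r = { 0, 0 };
    char *end;

    errno = 0;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        r.err = EINVAL;          /* no digits, or trailing junk */
    else if (errno == ERANGE)
        r.err = ERANGE;          /* out of range for long */
    else
        r.value = v;
    return r;
}
```

The caller checks `r.err` first and only then looks at `r.value`, so no string comparison against the representation of LONG_MAX is needed.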

thomashabets2 · 2026-05-20T21:55:38 1779314138

If that's an error then so is passing in a non number.

So catch 22. You can only check for valid numbers if the number is valid?

rbanffy · 2026-05-21T09:38:57 1779356337

If you pass a negative number as a string to a function that converts strings representing positive numbers into positive long integers, then yes, it should return an error status instead of the wrong result.

thomashabets2 · 2026-05-22T17:33:26 1779471206

Ah, I misunderstood what you meant by "is an error". Agree.

imtringued · 2026-05-21T06:50:40 1779346240

That's the C way, yes.

thomashabets2 · 2026-05-21T07:20:40 1779348040

No. This is more like if strcmp compared null-terminated strings, but could only compare strings that are in fact equal.

thomashabets2 · 2026-05-20T16:03:47 1779293027

While snprintf() is better than sprintf(), I find that it's easy for people to not check if the return value is bigger than the provided size. Sure, it prevents a buffer overflow, but there could still be a string truncation problem.

Similar to how strlcpy() is not a slam dunk fix to the strcpy() problem.
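For illustration, a small sketch of the check that tends to get forgotten. `format_pair` is a made-up helper; the only real API used is snprintf(), which returns the length the full output would have had, or a negative value on error.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: formats "name=value" into buf.
 * Returns true only if the whole string fit. */
bool format_pair(char *buf, size_t size, const char *name, int value)
{
    int n = snprintf(buf, size, "%s=%d", name, value);

    if (n < 0)                 /* encoding or output error */
        return false;
    if ((size_t)n >= size)     /* output was truncated */
        return false;
    return true;
}
```

In the truncated case the buffer is still null terminated, which is exactly why the problem is easy to miss.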

alexfoo · 2026-05-20T16:31:35 1779294695

That's partly the point.

If someone uses sprintf() you have to go faffing around to check whether they've thought about the destination buffer size. The size of the structure may be buried far away through several layers of other APIs/etc.

Using snprintf() doesn't solve this in any way, but checking whether the new use of snprintf() checks the return value is relatively simple. Again, there's still no guarantee that there aren't other problems with snprintf() but, in our experience, we found that once people were forced to use it over sprintf() and had things checked in PR reviews we found that the number of instances of misuse dropped dramatically.

It wasn't the switch of functions that reduced the number of problems we saw, but the outright banning of the known footgun `sprintf()` and the careful auditing and replacement of it with `snprintf()` that served as a whole load of reference copies for how to use it. We spread the work of replacing `sprintf()` around the team so that everyone got to do some of the switches and everyone got to review the changes. And we found a whole load of possible problems (most of which were very unlikely to ever lead to a crash or corruption.)

The same would apply if you picked any other known footgun and did similar refactoring/rewrites/auditing/etc.

Anyway, I haven't done C commercially/professionally for about 5 years now. I do miss it though.

thomashabets2 · 2026-05-20T15:59:00 1779292740

For unsigned that could work, but signed overflow is UB.
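Assuming the suggestion was to do the arithmetic first and detect the overflow afterwards, here is a sketch of the difference. Both function names are hypothetical. The signed version has to test before adding, the unsigned one can test after, because unsigned wraparound is defined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b without ever evaluating a
 * signed expression that overflows. Returns false on overflow. */
bool checked_add_int(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b)
        return false;          /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;          /* would go below INT_MIN */
    *sum = a + b;
    return true;
}

/* Unsigned arithmetic wraps, so checking after the fact is fine. */
bool checked_add_uint(unsigned a, unsigned b, unsigned *sum)
{
    *sum = a + b;              /* wraps modulo UINT_MAX + 1 */
    return *sum >= a;          /* a wrapped result is smaller than a */
}
```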

thomashabets2 · 2026-05-20T15:54:57 1779292497

> they don't seem to like that unsigned parsing will accept negative numbers and then automatically wrap them to their unsigned equivalents, nor do they like that C number parsing often bails with best effort on non-numeric trailing data rather than flagging it an error, nor do they like that ULONG_MAX is used as a sentinel value by sscanf.

That's right. I don't like asking it to parse the number contained inside a string, and getting a different number as a result.

That's just simply not the right answer.

> I'm not sure what they mean by "output raw" vs "output"

I can see how that's very unclear. Changed now to "Readable".

thomashabets2 · 2026-05-20T15:49:13 1779292153

Yup. Sorry about that.

thomashabets2 · 2026-05-20T13:38:10 1779284290

Are you talking about creating a pointer (more than one item) past an array, or dereferencing that pointer? Both are currently UB.

For the former, I kinda get it. It may need to be there for cases like with segmented address space where p+10 could actually be a value less than p, for the eventually generated assembly. Maybe it should be fine to create such a pointer, but have it be "indeterminate value" or whatever, if you try to compare that pointer to anything? I don't know enough about compiler internals to say one way or the other.

Dereferencing, though, can only be UB. There may not be a "value" behind that address. There may be a motor that's been I/O mapped, or a self destruct button.

lelanthran · 2026-05-20T15:11:14 1779289874

I'm not saying that the result of the dereference be known, I'm saying that the instructions to do the dereference be always emitted.

Right now, if a dereference results in UB, the compiler may omit it entirely.

thomashabets2 · 2026-05-20T16:49:03 1779295743

I think I would defer to someone more of a language lawyer than me, but I'm not sure what you're describing can be expressed in the C abstract machine. If a pointer is invalid, not pointing to an object, then I'm not sure it means anything to "read from there".

I know what you mean, but I'm just not sure you're describing something that fits what C "is". We program C to the abstract machine specified in the standard (5.1.2), and the compiler's job is to translate that into something with identical behavior on particular hardware. Piercing the layers down to actual hardware or assembly isn't really done.

Even "volatile" just says (basically) "touching this object has side effects". It implies no double-loading, speculative store, etc, but doesn't say "don't emit assembly instructions to load this unless the program logic path takes the route where the C program does load it".

The standard is not using ancient language when it refers to "objects with static storage duration" instead of "heap" or ".data segment". It is the true class of objects in the abstract machine.

charleslmunger · 2026-05-20T15:41:51 1779291711

Wouldn't that make a compiler that emitted bounds checks violate the standard, since it would not be emitting the actual memory operations if you deref out of bounds?

marcosdumay · 2026-05-20T16:55:59 1779296159

No, because it's UB so there is no standard.

charleslmunger · 2026-05-21T07:47:05 1779349625

Isn't the proposal from the parent comment to define the behavior?

thomashabets2 · 2026-05-20T10:56:21 1779274581

Author here.

> It barely scratches the surface.

I agree. The point of the post is not to enumerate and explain the implications of all 283 uses of the word "undefined" in the standard. Nor enumerate all the things that are undefined by omission.

The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.

And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.

The (one!) exploitable flaw found by Mythos in OpenBSD was an impressive endorsement of the OpenBSD developers, and yet as the post says, I pointed it at the simplest of their code and found a heap of UB.

Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.

FTA:

> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C and C++ code has UB.

wahern · 2026-05-20T17:30:45 1779298245

> Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.

I presume you're referring to this code:

  pid = waitpid(pid, &status, 0);
  if (WIFEXITED(status))
    rval = WEXITSTATUS(status);
  else
    rval = -1;

The only signal handler find installs is for SIGINFO, and it uses the SA_RESTART flag, so EINTR can be ruled out. The pid argument is definitely valid as you can't reach the above if it wasn't, and there's no other way for the child process to be reaped[1], so no ECHILD.

A check should probably be added in case the situation changes in the future, triggering spooky action at a distance, or were that code to be copy+pasted somewhere where the invariants didn't hold. But I think the current code in its current context is, strictly speaking, correct as-is.

[1] OpenBSD lacks the kernel features for such surprises that might theoretically be possible on Linux.
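A sketch of what the added check could look like. This is not the OpenBSD code; `wait_exit_status` is a hypothetical helper that mirrors the snippet above and only reads `status` once waitpid() has reported success.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns the child's exit status, or -1 if
 * waitpid() failed or the child did not exit normally. */
int wait_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) == -1)
        return -1;             /* status was never written */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                 /* killed by a signal, etc. */
}
```

With the early return, the function stays correct even if it is later copied somewhere the EINTR and ECHILD invariants no longer hold.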

thomashabets2 · 2026-05-20T21:48:46 1779313726

Indeed. That's why I didn't deem it worth reporting.

But in my code, I would have fixed for the reasons you mention. Sprinkle enough of these around, and some low percentage will in the future have its assumption invalidated.

BobbyTables2 · 2026-05-21T02:20:42 1779330042

Couldn’t waitpid return EINTR if the (parent) process were stopped and then continued?

EINTR scares the crap out of me because nobody expects it!

wahern · 2026-05-21T09:13:52 1779354832

No. You only get EINTR when a signal handler fires and you didn't use the SA_RESTART flag with sigaction. If you don't install any signal handlers, or you use SA_RESTART on all handlers, or you've blocked/masked all signals (or at least the ones with handlers), you won't get EINTR.

When writing library code, it's important to consider EINTR because you can't know about signal dispositions. Though, the common practice of looping on EINTR kind of defeats the purpose.
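For reference, a minimal sketch of installing a handler with SA_RESTART. The helper and handler names are made up, and the signal number is left to the caller since SIGINFO is not available on every platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;            /* only async-signal-safe work here */
}

/* Hypothetical helper: installs on_signal for `signo` with
 * SA_RESTART. Returns 0 on success, -1 on failure. */
int install_restarting_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```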

muvlon · 2026-05-20T11:07:30 1779275250

Fair enough!

> And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.

And I 100% agree. UB is way overused by these standards for how dangerous it is, and as a consequence using C (and C++) for anything nontrivial amounts to navigating a minefield.

webstrand · 2026-05-20T16:02:43 1779292963

I think as compilers got smarter, UB changed somewhat in meaning. Originally the compilers didn't perform such complex analysis, and while invoking UB could break your program, it would still do something reasonable.

marcosdumay · 2026-05-20T16:42:02 1779295322

Yes, but compilers got smart enough for it to be a problem around 30 years ago, and we are still arguing about what to do.

pjmlp · 2026-05-20T17:34:37 1779298477

You can see the reasoning here: basically, when all those C compiler benchmarks started, vendors moved from what Fran Allen described to anything goes, to win SPEC something benchmarks.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming

saagarjha · 2026-05-20T11:16:22 1779275782

What should the behavior above be defined to do?

tsukikage · 2026-05-20T12:31:40 1779280300

“Implementation defined behaviour”: compiler author chooses, and documents the choice.

A lot of UB should be implementation defined behaviour instead; this would much better match programmers’ intuitions as they reason about their code - you can even see it in the comments of this post: it’s always things like “this hardware supports / doesn’t support unaligned accesses”, it’s never nasal demons.

tardedmeme · 2026-05-20T15:02:54 1779289374

I told someone at a conference that UB actually means "implementation-defined, no documentation required". He started to refute me and then stopped.

VorpalWay · 2026-05-20T17:05:29 1779296729

That isn't true; for UB the compiler is allowed to assume the UB can never happen. For example, if you dereference a pointer and only afterwards check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (never mind that you might be on a microcontroller where NULL is a valid address).

The fallout of this is quite large! If behaviour is implementation defined, the compiler has to stick to one consistent behaviour. No such need for UB: you can get different behaviour by changing unrelated code, by changing between debug and release, or just because of what garbage happened to be on the stack.

Since the compiler is allowed to assume the UB doesn't happen, it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".

UB is often surprising: I have seen unaligned loads crash on x86 due to it being UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it was clearly aligned), the CPU was no longer happy with it.
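The NULL check example can be sketched like this. Both function names are invented for illustration, and whether a given compiler actually deletes the check depends on version and optimisation level.

```c
#include <stddef.h>

/* Risky order: *p is read before the NULL test. Because reading
 * through a null pointer is UB, a compiler may assume p != NULL
 * afterwards and delete the check. */
int first_or_default_risky(const int *p, int fallback)
{
    int v = *p;
    if (p == NULL)             /* may be optimised away */
        return fallback;
    return v;
}

/* Safe order: test first, dereference only on the non-NULL path. */
int first_or_default(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```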

thomashabets2 · 2026-05-20T21:52:56 1779313976

I think parent commenter made a joke. UB can be seen as "implementation defines this to reformat your hard drive. No we don't document it".

That is, the compiler de facto defines what happens when you compile UB code.

So you're not wrong, but I think you missed the sarcastic spin of parent.

Maxatar · 2026-05-21T04:38:48 1779338328

>That is, the compiler de facto defines what happens when you compile UB code.

That is not what undefined behavior is though, that is unspecified behavior.

The entire point of undefined behavior is to cover the cases where the compiler can't define the semantics of your program either because doing so is genuinely not possible, or is incredibly onerous to deduce, or would require introducing runtime checks whose performance cost is at odds with C and C++'s predominant use cases.

thomashabets2 · 2026-05-21T07:31:06 1779348666

Sorry, by "de facto defines" I meant that it factually does something, even if that "something" is "segfault the compiler at build time".

That "de facto" did some heavy lifting.

marcosdumay · 2026-05-20T16:43:24 1779295404

Except that UB doesn't mean that. UB means "the developer must never write this".

munch117 · 2026-05-20T17:25:18 1779297918

Both are wrong. It means "this standard does not constrain the behaviour of code that does this".

It's entirely legal for implementations to have predictable behaviour, documented or not, for code that is undefined by the standard. In their quest for maxxing benchmark performance they generally choose not to, but there's really nothing in any standard that stops you from making an implementation that prioritises safety.

tardedmeme · 2026-05-20T17:51:43 1779299503

Every implementation so far has predictable behavior in all cases. Sometimes the rules for predicting it are very obscure. But it's all fully defined within the compiler's binary code. And none of them link to nasal portals.

AlotOfReading · 2026-05-20T18:25:20 1779301520

How do you propose to predict the behavior of a true race condition with only the binary, faithfully translated by the compiler?

Moreover, this is at best an incredibly pedantic point, not something that changes how programmers need to approach UB. You can't review the source code of a compiler that hasn't been written yet.

munch117 · 2026-05-21T06:38:33 1779345513

I didn't suggest that implementations should entirely eliminate every form of UB. There is plenty of middle ground. For example, you could easily limit the consequences of integer overflow by specifying or partially specifying overflow behaviour, with very little runtime cost.

I'm not suggesting you change how you write code, but with a better implementation the code that you do write - that lives in the real world where mistakes are made - might work better. How is that being pedantic?

An interesting case where compiler writers did something like that is casting via union members, but I'm running out of time, so we can talk about that another day.

tardedmeme · 2026-05-21T07:14:45 1779347685

It's fully defined by your CPU's silicon masks and your compiler's binary code that one of several things will happen.

tsukikage · 2026-05-21T07:53:34 1779350014

Turns out that when you're implementing network applications, the set of things that could happen also depends on what the script kiddie on the other side of the globe feels like this morning.

Some would prefer less excitement than this.

C code should be more predictable and easier to reason about than using a macro assembler. To the extent it is not, the language has failed.

PaulRobinson · 2026-05-23T08:20:14 1779524414

Given that it sits at the heart of the network stack, kernel and device drivers for every major operating system, is in many, many embedded devices in the world around us, and is responsible for keeping a decent chunk of the global economy moving, that’s quite a failure case.

Perhaps some professional programmers know how to write secure software in a language with undefined behaviour. Maybe we should think about that more rather than just writing off an obviously huge success as a failure?

Filligree · 2026-05-20T12:00:06 1779278406

Print x twice. Not all “side effects” care about order.

Better yet, define an order for parameter evaluation.

HelloNurse · 2026-05-21T12:08:57 1779365337

There is an easy way to take control: read the volatile variable only once.

  volatile int x = 5;
  ...
  int y=x;
  printf("%d in hex is 0x%x.\n", y, y);

poppadom1982 · 2026-05-20T12:43:58 1779281038

You're missing the point. Volatile forces two loads of a value that may have changed in the middle. So the value of "x" may depend on the time/order of load.

AnimalMuppet · 2026-05-20T13:17:59 1779283079

Which is, if I understand correctly, the entire point of volatile. Don't use it if you don't want that behavior.

And in fact, in the example given, if there is something (another thread or whatever) that can change the value of x, then you don't know what either number will be. Well, in that circumstance, without volatile, it may print the same number both times, but you still don't know what the number will be (unless the read gets optimized away entirely).

chuckadams · 2026-05-20T13:42:43 1779284563

If that behavior is the entire point, then I think the bigger point is that the spec should reflect that and not call it undefined.

voakbasda · 2026-05-20T14:00:51 1779285651

I suspect that many undefined behaviors reflect the inability of the standard committee to come to a consensus on the nuances involved. “Punt to the implementers” is a way to allow every tool vendor to select their own expected behavior in those cases.

chowells · 2026-05-20T15:14:42 1779290082

You seem to be operating under the assumption "undefined behavior" means "the compiler authors can decide what to do." That's not what it means. It means "any program that causes this behavior to be triggered is not a valid C program, the programmer knows this and did not submit an invalid program, and the programmer explicitly prevented this from happening elsewhere in ways automated analysis cannot detect. Proceed with compilation knowing this branch is impossible."

The spelling for compiler authors getting to choose a behavior is "implementation defined", as the other comment mentions.

tardedmeme · 2026-05-21T07:15:40 1779347740

It means the C standard does not specify what the program does. Other documents may still specify what the program does. And the program definitely still does something, whether specified or not.

chowells · 2026-05-21T16:45:33 1779381933

> And the program definitely still does something, whether specified or not.

No. It most definitely does not mean this. Go read the series this is part of: https://blog.llvm.org/2011/05/what-every-c-programmer-should...

It is absolutely critical that people programming in C understand what real compilers in the real world do.

tardedmeme · 2026-05-21T23:07:16 1779404836

Are you saying the programs in the blog post don't do things? I think they do things. The whole blog post is talking about which things they do.

MarkusQ · 2026-05-20T14:36:07 1779287767

Then it should be "implementation defined" rather than "undefined".

hmry · 2026-05-20T13:11:32 1779282692

Why is that missing the point? Loading it twice, possibly with different values, is the intended behavior. It's only undefined because the C spec doesn't specify the order of the loads (unlike most other languages which have a perfectly well-defined order for side effects in a single expression).

rowanG077 · 2026-05-20T14:48:28 1779288508

What you are describing is implementation defined behavior. Using that is perfectly safe and reasonable. Undefined means this program is malformed.

hmry · 2026-05-20T17:24:09 1779297849

No I'm just repeating what the original comment said, which is that it's explicitly UB:

"5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other."

If function arguments were sequenced with respect to each other, it wouldn't be a problem.

But actually, maybe the original comment is wrong. Presumably "indeterminately sequenced" and "unsequenced" mean different things, although I don't have a copy of the standard at hand to check.

echoangle · 2026-05-20T11:57:22 1779278242

Couldn’t you just define that function arguments are evaluated left to right?

Or just throw an error.

jfoks · 2026-05-20T15:50:23 1779292223

Why? Just for this edge case? It could be faster and/or allow smaller code size to allow this to be undefined.

Undefined is also different from "depends on the compiler", because which behavior is chosen can even depend on the circumstances, whatever code appears before and/or after it.

That said, UB in code, such as this example of ordering of reads of volatile parameters being undefined, does not automatically mean that code that uses it is bad. It may very well be that the function being called doesn't misbehave either way.

echoangle · 2026-05-20T17:32:55 1779298375

That’s the point of the whole article. It’s not worth the speed gain to have a language that nobody can safely use because you can’t really prevent UB when you write it.

> It may very well be that the function being called doesn't misbehave either way.

The function being good or bad has nothing to do with the UB. The UB occurs before the function is called.

saagarjha · 2026-05-20T12:18:55 1779279535

I meant reading the uninitialized variable

muvlon · 2026-05-20T14:06:29 1779285989

There is no uninitialized variable, I explicitly initialized it to 5.

And yes indeed, C could do what Rust does and define the order of evaluation for function arguments.

If the argument expressions are indeed side-effect-less, the compiler can always make use of the "as-if" rule and legally reorder the computation anyway, for example to alleviate register pressure.

lll-o-lll · 2026-05-20T11:37:15 1779277035

saagarjha · 2026-05-20T11:52:43 1779277963

I have good news about what UB allows

JadeNB · 2026-05-20T12:42:45 1779280965

What is that?

FabHK · 2026-05-20T15:27:29 1779290849

A fictitious assembly instruction (and pretty good TV series).

https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(computing...

SAI_Peregrinus · 2026-05-20T17:27:58 1779298078

Halt and Catch Fire

jeffffff · 2026-05-20T11:40:46 1779277246

Compilation error

saagarjha · 2026-05-20T11:53:28 1779278008

It’s hard to detect all UB at compile time

Demiurge · 2026-05-20T12:17:17 1779279437

It’s harder depending on the language, which is clearly the point.

nextaccountic · 2026-05-21T02:52:31 1779331951

> Or at least, no human since the invention of C in 1972 has.

No human without proper tools maybe, but what about seL4? It goes beyond proving the code is UB-free and actually formally verifies the code works as intended. And the code is written in C. (the proofs of course aren't)

The proof is interesting because it goes beyond just proving the C code is correct. For some platforms, they compile the code with an ordinary compiler, and verify that the machine code does what the C code is supposed to do. (that's because just writing correct C code doesn't help you if you trigger a compiler bug)

This works even if the compiler (in this case, GCC) isn't verified - they verify a specific output of the compiler, not that the compiler always generates machine code correctly.

lelanthran · 2026-05-20T13:14:11 1779282851

> The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.

What are you talking about? UB was coined only in the first C standard, in 1989. Prior to that there was no "If you do this, anything can happen". It was "If you do this, that will happen".

thomashabets2 · 2026-05-21T06:52:32 1779346352

> UB was coined only in the first C standard, in 1989

Pre 1989, when C did not have a standard, was the behavior unspecified or undefined? That is, of course, a trick question. Because in this context the very definitions of the words come from the standard itself.

Before a language gets a specification, is the de facto specification the five words "you know what I mean"?

The very definition of "UB" in C later became "[…] this document imposes no requirements". Is that not the same thing as "there is no specification (yet)"?

It sounds very zen, but "a non existing specification imposes no requirements".

But I don't think it's meaningful to argue the semantic difference before the (in-context) existence of the words "undefined" vs "unspecified".

> Prior to that there was no "If you do this, anything can happen".

Of course it was. You relied on "common sense".

> It was "If you do this, that will happen".

Haha, of course it wasn't. Before a specification there is neither a definition of "this" nor "that".

Unless you mean ye olde "the compiler implementation is the specification". In which case we'll get dragged into "what even is a language" and "what is the sound of one hand clapping?".

Or, alternatively, it's as true then as it is today. If you go by "GCC x.y.z on platform Z kernel Y, (etc…) is the specification" then there is no UB.

professoretc · 2026-05-20T14:12:15 1779286335

More like, "if you do this, what happens depends on your particular combination of hardware, operating system, and compiler. Don't ask us."

nickez · 2026-05-20T15:01:14 1779289274

No, that would be implementation defined.

professoretc · 2026-05-20T17:42:15 1779298935

The post I was replying to said,

> UB was coined only in the first C standard, in 1989. Prior to that there was no "If you do this, anything can happen".

I.e., the context is, before UB existed as a concept, how would these things be categorized. And I was trying to offer the correction that, before UB existed, it wasn't "all behavior is defined" but rather many behaviors depend on your particular local environment. While that may technically be implementation defined, the current standard requires that implementation defined be documented, and UB-like edge cases were most definitely not documented anywhere consistently in the old days!

tekne · 2026-05-20T17:12:29 1779297149

No, that's actually UB. The important bit here is "compiler defined" -- UB means the compiler is allowed to assume it never happens while compiling.

Consider, for example, an implementation defined function f() -- which can also diverge/crash horribly, etc.

If I write

    if p {
      print("p is true")
    } else {
      g()
    }

    if p {
      f()
    }

Then either we:

- print "p is true" and execute f(), or
- run g() and skip f()

This is true regardless of if f immediately crashes the computer, nasal demons, whatever -- that's implementation defined.

UB means f may never happen.

And that means the compiler may optimize this to just:

    g()

Notice the difference here: the print never happens, and g always happens!

You can see why this is concerning when you write code like

    if dry_run {
      print("would run rm -rf /")
    } else {
      run("rm -rf /")
    }

    if dry_run {
      // oops: some_debug_string is NULL and will segfault!
      print(some_debug_string);
    }

tyg13 · 2026-05-20T18:05:44 1779300344

I see what you're going for, but I don't see how your example is UB. If `p` is a pointer, and, after your `if (p)` check, `p` is dereferenced unconditionally, then yes, your check for `p == NULL` could be removed, and the code under the `if` would be removed as well. But the example you've constructed is not UB.

Quekid5 · 2026-05-21T07:06:54 1779347214

You misunderstood their example, I think.

It doesn't matter what 'p' is in their example. The point is: if 'f' is undefined behavior (rather than just impl-defined), then the optimizer concludes that the "if p { f() }" can never happen... which means that we're allowed to assume that 'if p { ... } else { ... }' (in the first part of the example) will always take the else branch. The compiler will optimize accordingly and just always call g() unconditionally.

saghm · 2026-05-20T18:43:27 1779302607

> if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL

It's fair to blame the programmer for the choice of programming in a language like this, if it was in fact their choice. As you've so eloquently put, choosing those languages is essentially equivalent to choosing UB, so starting a new project with one of them is 100% blameworthy when the UB is inevitably found.

thomashabets2 · 2026-05-21T07:29:20 1779348560

Not all projects are green field. But sure, new modules can be written in other languages. And C is, as cross-language barriers go, fairly easy to interface with.

thomashabets2 · 2026-05-20T09:52:46 1779270766

Author here.

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

C23 6.3.2.3p7.
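One common way to stay clear of that paragraph is to never create the misaligned pointer at all and copy the bytes instead. A minimal sketch, with `load_u32` as a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t from any byte address.
 * Casting `bytes` to uint32_t* would be UB if it is misaligned;
 * memcpy() has no alignment requirement, and compilers typically
 * turn it into a single load where the hardware allows. */
uint32_t load_u32(const unsigned char *bytes)
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);   /* value is in host byte order */
    return v;
}
```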

thomashabets2 · 2026-05-20T09:43:51 1779270231

That is a typo, that I think I introduced when I went back to clarify that it applies to C++ too.

Will fix it.