Hacker News | rep_lodsb's comments

For the code generator, it produced this annotated disassembly:

    2100 push ax            ;--- EmitByte: write one byte to code output ---
    2101 mov di, [code_ptr] ;DI → current position in output buffer
    2104 stosb              ;Write AL to output, advance DI
    2105 mov [code_ptr], di ;Update code pointer
    2108 pop ax             ;Restore AX
    2109 ret                ;Every compiled instruction flows through this 6-instruction emitter
    2110 mov al, 0E8h       ;--- EmitCall: generate CALL instruction ---
    2112 call EmitByte      ;Emit opcode byte E8h (near CALL)
    2115 sub bx, [code_ptr] ;Calculate relative offset
    2118 sub bx, 2          ;Adjust for instruction length
    211A xchg ax, bx        ;AX = relative offset
    211B call EmitWord      ;Emit 16-bit relative displacement
    211E ret                ;Generated: E8 lo hi — a complete CALL instruction
Obviously, there has to be a lot more to even a simple-minded x86 code generator than just a generic "emit opcode byte" and "emit call" routine. In general, what A"I" produced here is not a full disassembly but a collection of short snippets, potentially not even including the really interesting ones. But is it even correct?

EmitByte here is unnecessarily pushing/popping AX, which isn't modified by the few instructions in between at all. No competent assembly language programmer would do this. So maybe against all expectations, Turbo Pascal is just really badly coded? No, it's of course a hallucination: those instructions don't appear in the binary at all!

That the hex addresses are wrong can already be seen from the instruction "mov di,[code_ptr]" here apparently being only three bytes long (2101h to 2104h). In reality it encodes as 8B 3E plus a two-byte displacement, i.e. four bytes. And it's easy to confirm that this code isn't present at the addresses shown.

So maybe it's somewhere else? x86 disassembly can be complicated because the opcodes are variable length, and particularly in old programs like this the code and data are often not cleanly separated. Claude apparently ran it through NDISASM, which doesn't even attempt to handle that task.

But searching for e.g. the byte sequence B0 E8 ('mov al,0xe8') is enough to confirm that this code snippet isn't to be found anywhere.

There is a lot more suspicious code, including some that couldn't possibly work (like the "ret 1" in the system call dispatcher, which would misalign the stack).

Conclusion: it's slop


Thanks for this, I've added that to my write-up of the project here: https://simonwillison.net/2026/Mar/20/turbo-pascal/#hallucin...

> Because it's amusing to loop this kind of criticism through a model

Maybe it could become a general pattern, to have an agent whose task is just to deny the output validity. GANs are a very successful technique, perhaps it could work for language models too.


>Protip: your functions should be padded with instructions that'll trap if you miss a return.

Galaxy brained protip: instead of a trap, use return instructions as padding, that way it will just work correctly!

Some compilers insert trap instructions when aligning the start of functions, mainly because the empty space has to be filled with something, and it's better to use a trapping instruction in case this unreachable code is ever jumped to for some reason. But if you have to add the padding manually, it doesn't really help: it's just as easy to forget as the return itself.


That only works for unsigned integers.


Signed 64-bit is the worst case. When I tried enabling overflow checking, the overhead on RISC-V and Arm was comparable: https://news.ycombinator.com/item?id=46588159#46668916


Refer to the spec for the official idioms to handle every case.


Yes, you can detect signed overflow that way, but it's a lot more instructions so it won't be used in practice.

The designers of RISC-V included the bare minimum needed to compile C, everything else was deemed irrelevant.


>but it's a lot more instructions so it won't be used in practice.

It will be used where overflow actually needs to be handled, i.e. in the places where, on other architectures, an exception handler would actually deal with it. Which is seldom the case.

More instructions doesn't mean slower, either. Superscalar machines have a hard time keeping themselves busy, and this is an easily parallelizable task.

>The designers of RISC-V included the bare minimum needed to compile C, everything else was deemed irrelevant.

Refer to "Computer Architecture: A Quantitative Approach" by John L. Hennessy and David A. Patterson for the actual methodology followed.


Secure boot can be disabled even on modern PCs.


It has nothing to do with being unable to run 16-bit code, that's a myth.

https://man7.org/linux/man-pages/man2/modify_ldt.2.html

Set seg_32bit=0 and you can create 16-bit code and data segments. Still works on 64 bit. What's missing is V86 mode, which emulates the real mode segmentation model.


That can be trapped for sure.


You're confusing several things here. The only x86 processor that didn't allow returning to real mode was the 16-bit 80286 - on all later ones it's as simple as clearing bit 0 of CR0 (and also disabling paging if that was enabled).

Nothing more privileged than ring 0 is required for that.

"V86" is what allowed real mode to be virtualized under a 32-bit OS. This is no longer available in 64-bit mode, but the CPU still includes it (as well as newer virtualization features which could be used to do the same thing).


You can write to CR0 from a DOS COM program while in V86 mode??? :o Wouldn't that cause a GPF / segfault / EMM386 crash?


The scenario was about the first fusion (hydrogen) bomb test causing a runaway "ignition" of the atmosphere. It was never considered likely, but they still did the math to make certain it couldn't happen.


Why is that surprising? The trap into kernel mode alone would already take more cycles than dedicated hardware needs for the full page table walk.


Since we're talking about defining our own processor, that means we need to define one with cheaper traps.

Expanding on what I wrote above about "bits of hardware acceleration", maybe adding a few primitives to the instruction set that make page table walking easier would help.

And with a trusted compiler architecture you don't need to keep the ISA stable between iterations, since it's assumed that all code gets compiled at the last minute for the current ISA.

Lots of fun things to experiment with.


Taking this to an extreme, the whole idea of a TLB sounds like hardware protection too?

As a thought experiment, imagine an extremely simple ISA and memory interface where you would do address translation or even cache management in software if you needed it... the different cache tiers could just be different NUMA zones that you manage yourself.

You might end up with something that looks more like a GPU or super-ultra-hyper-threading to get throughput masking the latency of software-defined memory addressing and caching?


In TempleOS, everything runs in ring 0, but that's not the same as doing protection in software (which would require disallowing any native code not produced by some trusted translator). It simply means there's no protection at all.


Very fitting if that was intended to be protection by faith.


That's because CS in real/V86 mode is actually a writable data segment. Most protection checks work exactly the same in any mode, but the "is this a code segment?" check is only done when CS is loaded in protected mode, and not on any subsequent code fetch.

Using a non-standard mechanism of loading CS (LOADALL or RSM), it's possible to have a writable CS in protected mode too, at least on these older processors.

There's actually a slight difference in the access rights byte that gets loaded into the hidden part of a segment register (aka "descriptor cache") between real and protected mode. I first noticed this on the 80286, and it looks to be the same on the 386:

- In protected mode, the byte always matches that from the GDT/LDT entry: bit 4 (code/data segment vs. system) must be set, the segment load instruction won't allow otherwise, bit 0 (accessed) is set automatically (and written back to memory).

- In real and V86 mode, both of these bits are clear. So in V86 mode the value is 0xE2 instead of the "correct" 0xF3 for a ring 3 data segment, and similarly in real mode it's 0x82 (ring 0).

The hardware seems to simply ignore these bits, but they still exist in the register, unlike other "useless" bits. For example, LDT only has bit 7 (present), and GDT/IDT/TSS have no access rights byte at all - they're always assumed to be present, and the access rights byte reads as 0xFF. At least on the 286 that was the case, I've read that on the Pentium you can even mark GDT as not-present, and then get a triple fault on any access to it.

Keeping these bits, and having them different between modes might have been an intentional choice, making it possible to determine (by ICE monitor software) in what mode a segment got loaded. Maybe even the two other possible combinations (where bit4 != bit0) have some use to mark a "special" segment type that is never set by hardware?

