Hacker News | 0d0a's comments

Exactly, it misses out on explaining how the fixed Huffman table is interpreted to decode symbol and distance codes, or how dynamic tables are derived from the input itself. Sure, it's the hardest part, but it's also the most interesting to visualize. As another commenter pointed out, we are just left with mysterious bit sequences for these codes.

It would be cool if we could supply our own Huffman table and see how that affects the stream itself. We might want to put our text right there! https://github.com/nevesnunes/deflate-frolicking?tab=readme-...
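For reference, the fixed table isn't stored in the stream at all: RFC 1951 §3.2.6 lists fixed code lengths (8 bits for literals 0-143, 9 bits for 144-255, 7 bits for length symbols 256-279, 8 bits for 280-287), and the actual bit patterns follow from the canonical construction in §3.2.2. A minimal sketch of that construction:

```python
# Canonical Huffman code assignment from code lengths, per RFC 1951 §3.2.2.
def canonical_codes(lengths):
    """Map symbol -> (code, bit_length) for every symbol with a non-zero length."""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for l in lengths:
        if l:
            bl_count[l] += 1
    # Smallest code for each bit length.
    code = 0
    next_code = [0] * (max_len + 1)
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Assign codes in symbol order within each length.
    codes = {}
    for sym, l in enumerate(lengths):
        if l:
            codes[sym] = (next_code[l], l)
            next_code[l] += 1
    return codes

# Fixed literal/length code lengths from RFC 1951 §3.2.6.
FIXED_LITERAL_LENGTHS = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8
```

Swapping in your own length table here is exactly the kind of experiment the linked repo plays with.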


I think this is something that makes a decent teaching aid but doesn't work well for the uninitiated.

You need someone to spell out exactly what each of the sections is and what it is doing.


Even if it doesn't use block-based compression, as long as there isn't a huge range of corrupted bytes, corruption offsets are usually identifiable: you will quickly end up with invalid length-distance pairs and similar errors. Although errors might be reported a few bytes after the actual corruption.

I was motivated some years ago to try recovering from these errors [1] when I was handling a DEFLATE compressed JSON file, where there seemed to be a single corrupted byte every dozen or so bytes in the stream. It looked like something you could recover from. If you output decompressed bytes as the stream was parsed, you could clearly see a prefix of the original JSON being recovered up to the first corruption.

In that case the decompressed payload was plaintext, but even with a binary format, something like kaitai-struct might give you an invalid offset to work from.

For these localized corruptions, it's possible to just bruteforce one or two bytes along this range and reliably fix the DEFLATE stream. That's not really doable once we are talking about a sequence of four or more corrupted bytes.

[1]: https://github.com/nevesnunes/deflate-frolicking
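A minimal sketch of that bruteforce for the single-byte case, assuming a raw DEFLATE stream and a suspect offset already narrowed down from the decode errors (the helper is my own illustration, not code from [1]):

```python
import zlib

def bruteforce_single_byte(stream: bytes, offset: int):
    """Try all 256 values for one suspect byte; return every candidate
    value that yields a decodable stream. Illustrative helper only."""
    buf = bytearray(stream)
    results = []
    for candidate in range(256):
        buf[offset] = candidate
        try:
            # wbits=-15: raw DEFLATE stream, no zlib header or checksum.
            results.append((candidate, zlib.decompress(bytes(buf), wbits=-15)))
        except zlib.error:
            continue
    return results
```

More than one candidate can decode successfully, so in practice you still rank the outputs by plausibility (e.g. does the JSON prefix continue sensibly).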


Can you give some examples? When batching data, you benefit from picking something like io_uring. But for two-way communication, you still need to notify either side when data is ready (presumably you don't want to consume CPU just polling), and it isn't clear to me how those options handle that synchronization faster than pipes.


The main thing io_uring gives you is avoiding multiple syscalls.

With a pipe you can’t really avoid that. With a shared memory queue/ring buffer you can write to the memory without any syscalls.

But you need to build the synchronisation yourself (e.g., using semaphores). You don't necessarily need to poll.
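To make that concrete, here's a toy single-producer/single-consumer ring with two semaphores, so neither side busy-polls: the consumer sleeps until data is ready, the producer sleeps when the ring is full. Threads are used for brevity; a real cross-process version would back the buffer with shared memory and process-shared semaphores, but the wake-up logic is the same. All names are illustrative:

```python
import struct
import threading

class SpscRing:
    """Toy bounded ring: fixed-size slots, one semaphore per direction.
    put() blocks when full, get() sleeps when empty -- no busy polling."""

    def __init__(self, slots=8, slot_size=64):
        self.slots, self.slot_size = slots, slot_size
        self.buf = bytearray(slots * slot_size)
        self.free = threading.Semaphore(slots)  # producer waits on this
        self.items = threading.Semaphore(0)     # consumer waits on this
        self.head = self.tail = 0               # single producer/consumer only

    def put(self, msg: bytes):
        assert len(msg) <= self.slot_size - 2
        self.free.acquire()                     # wait for a free slot
        off = (self.head % self.slots) * self.slot_size
        self.buf[off:off + 2] = struct.pack("<H", len(msg))
        self.buf[off + 2:off + 2 + len(msg)] = msg
        self.head += 1
        self.items.release()                    # wake the consumer

    def get(self) -> bytes:
        self.items.acquire()                    # sleep until data is ready
        off = (self.tail % self.slots) * self.slot_size
        (n,) = struct.unpack("<H", bytes(self.buf[off:off + 2]))
        msg = bytes(self.buf[off + 2:off + 2 + n])
        self.tail += 1
        self.free.release()                     # hand the slot back
        return msg
```

Note the writes themselves touch only memory; a syscall happens only inside a semaphore wait, which is exactly the contrast with a pipe, where every transfer is a syscall.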


Nice, I'll give it a closer look. My only concern so far is memory hooking (still needed for hardware registers), which on the Java side was handled by FilteredMemoryState [1]. In memstate.cc it looks like just the simpler MemoryState is implemented [2], and there's no equivalent to MemoryAccessFilter. But it might not be that complicated to add...

[1]: https://github.com/NationalSecurityAgency/ghidra/blob/4561e8...

[2]: https://github.com/NationalSecurityAgency/ghidra/blob/4561e8...


I think it's closer to the 2A03. Unless I missed something, there isn't any support implemented for binary-coded decimal mode.


Thanks, but I think I'm going to disappoint you: the demo is using pre-recorded manual inputs, which are then replayed when emulating in Ghidra. The only logic involved is checking when we are at the right instruction to then send the input. I mentioned it briefly in the README but maybe I wasn't very clear, sorry!


From what I've seen, it's usually read at the vblank interrupt.

The input recording has entries in the format "<instruction_number> <buttons_bitmask>". If I press a button and it's read from the hardware register after, let's say, 0x1000 instructions have been stepped, it is stored as "0x1000 0x80", and in the Ghidra emulator script I only need to count up to 0x1000 instructions before sending that memory write to the other emulator. While the real timings are vastly different, the input will be read after roughly the same number of vblank calls. I say "roughly" because I did find a discrepancy in the expected call where it should be read, but it isn't yet clear if that's a logic bug on my side; I'll have to look into it again eventually.
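For illustration, parsing and matching that format might look like this (hypothetical helpers, not the actual repo scripts):

```python
def parse_recording(text: str):
    """Parse lines of '<instruction_number> <buttons_bitmask>', both hex."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        instr, mask = line.split()
        entries.append((int(instr, 16), int(mask, 16)))
    return entries

def due_inputs(entries, instructions_stepped: int):
    """Bitmasks to inject once the step counter reaches their instruction count."""
    return [mask for count, mask in entries if count == instructions_stepped]
```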


Thanks - that’s along the lines of what I was expecting.


Nice to see another CTF enjoyer :) I've always thought about using Ghidra for vm challenges, but I'm still not sure if it fits the typical timeframe. Although I never used it, something like binja seems better suited to quick and dirty scripting.

About custom pcodeops, yeah I was really tempted to use them for TLCS-900. For example, instruction `daa` adjusts the execution result of an add or subtract as binary-coded decimal, and the pcode for that is just inglorious (but I'm sure there's worse out there): https://github.com/nevesnunes/ghidra-tlcs900h/blob/5ff4eb851...

Pretty amusing how a single instruction takes more than a dozen lines in the decompilation: https://gist.github.com/nevesnunes/7417e8bec2cddfcaf8d7653c9...
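For context, the decimal adjust after an addition boils down to a couple of nibble fix-ups; a Z80-style sketch of just the addition case (the TLCS-900 `daa` also has to cover the subtract case, which is part of why the pcode balloons):

```python
def daa_after_add(a: int, half_carry: bool, carry: bool):
    """Adjust an 8-bit binary sum so each nibble is a valid BCD digit.
    Addition case only; returns (adjusted_value, new_carry)."""
    adjust = 0
    if (a & 0x0F) > 9 or half_carry:
        adjust |= 0x06  # low nibble overflowed past 9
    if a > 0x99 or carry:
        adjust |= 0x60  # high nibble overflowed past 9
        carry = True
    return (a + adjust) & 0xFF, carry
```

Three lines of branching here, but spelled out as pcode with explicit flag temporaries it sprawls quickly.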


https://nevesnunes.github.io/blog/

Reverse engineering, debugging, and some silly contraptions.


I'm also writing a processor module, and reading this is encouraging me to eventually write about it once it's finished.

Getting off the ground wasn't the hardest part so far. You can just pick the skeleton module that already comes with Ghidra, then look up some existing simpler modules like the one for the z80 to figure out how instructions are put together. You also have the script `DebugSleighInstructionParse` to check how bits are being decoded, which is very useful when you screw up some instruction definitions.

Unfortunately, you bump into a lot of jargon-heavy error messages. The first time you hear about "Interior ellipsis in pattern", you sure have no idea what that's about. Now repeat that experience for several messages.

Then the hardest challenge is how to even test the module outside of some quick disassemblies. There's `pcodetest` but the setup is cumbersome and it seems more about validating instruction decoding rather than semantics. I might just write my own validation using pcode emulation and compare the register state against another emulator's instruction trace...
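Something like this is what I have in mind, assuming both emulators can dump per-instruction register state in some common map form (the trace format here is entirely hypothetical):

```python
def diff_traces(reference, candidate, registers):
    """Walk two per-instruction register traces (lists of dicts) in lockstep;
    return (index, register, expected, got) at the first divergence, else None."""
    for i, (ref, got) in enumerate(zip(reference, candidate)):
        for reg in registers:
            if ref.get(reg) != got.get(reg):
                return i, reg, ref.get(reg), got.get(reg)
    return None
```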


Pcodetest is more about validating the implementation of the instruction; sure, it has to decode, but the benefit is mostly a base-level set of logic that can be emulated. And I'm definitely not a fan of the setup to get it going (it's also only helpful if you have a semi-recent C compiler).


Oh nice, it wasn't clear from the test suite if that was the case, I'll give it a closer look.

Judging from the Python scripts, it seems to expect a whole binutils toolchain (so not just a compiler but also objdump, readelf...), and that would be a blocker for me.


The compiler (gcc) and maybe the assembler (as) are used. I think the other binutils executables are unused but still built into its logic. Due to its age and being removed from gcc, I was unable to cleanly set up pcodetest for the 80960 (I had to hack it all together and scripted the Java portion to work with that hack), but it was super useful for improving tricore (pcodetest hadn't been released when I submitted the original PR) and writing risc-v.

