THE humble distributive law, in its simplest form states that... this leads to a large family of fast algorithms, including Viterbi's algorithm and the fast Fourier transform (FFT).
Two extremely influential papers appeared back to back in the IEEE Transactions on Information Theory. This is one of them.
Interesting. Of course many computations can be expressed as a graph, but in the case of the bipartite graph we perform belief propagation on to decode LDPC codes, where is the optimization from the distributive property? The parity-check matrix would typically be constructed so that there are few subexpressions to factor out, to maximize the error-correcting properties.
I agree both FFT and belief propagation can be expressed as message passing algorithms.
It shows up in pushing parentheses in and pulling common terms out of an expression that is a sum (over all possible assignments) of products of terms.
Doing the summation the naive way is exponential in the number of variables. The goal is to do this efficiently by exploiting the distributive property and any symmetry, much like in the FFT case.
This can be done efficiently, for example, when the graph is a tree. (Even if it isn't, one can pretend that it is. Surprisingly, that often works very well, but that's a different topic entirely.)
There might be an additional optimization in having the writer also cache its write index on the same cache line as its cached copy of the read index. This way the writer would only do writes to the write-index cache line. The hardware might be able to optimize this. I wonder how it interacts with UMWAIT on the latest cores.
Reading its own values is never a problem: the writeIndex cache line will switch from E (or M) to S (or O), but it is never evicted, so reads from it never miss. For frequent writes the read will be fulfilled by the store buffer anyway.
I think it will be invalidated due to an RFO when the reader reads the write index. Only when multiple readers read the same cache line without any intervening write will the RFO heuristic be disabled.
This works much better on my POWER9 system. I have two CPUs in my machine with the same core count but different cache configurations according to hwloc and this showed that they really are different.
This is definitely an option you want to be using when using ASan or LSan. You may also want to consider additionally using -momit-leaf-frame-pointer to skip frame pointers only in leaf functions while keeping frame pointers for non-leaf functions. This can make small leaf functions significantly shorter, limiting some of the negative impact of using -fno-omit-frame-pointer alone.
Sometimes even -fno-omit-frame-pointer won't help, for example if the stack is being unwound through a system library that was built without frame pointers. In that case you can switch to the slow unwinder: set the environment variable `ASAN_OPTIONS=fast_unwind_on_malloc=0` when running your program. But note that this will make most programs run significantly slower, so you probably want to use it only when you really need it and not as the default setting for all runs.
You're incorrect: garbage collection would be the biggest problem for that use case. You have to be really careful even with your C/C++ code, warming up the branch predictor between packets, etc. See my article for some pitfalls: https://rigtorp.se/virtual-memory/
Taking a lock only needs to be an acquire operation plus a compiler barrier with respect to other lock operations. Using seq_cst or acq_rel semantics is stronger than needed. From my reading and discussions with people from WG21, the current argument for why taking a lock only requires acquire semantics is that a compiler optimization that transforms a non-deadlocking program into a potentially deadlocking program is not allowed. There's an interesting Twitter thread where we discuss this, but I can't find it anymore :(.
That is an amazing thread. The fact that C++ apparently allows optimizing
#include <stdio.h>

int stop = 1;

void maybeStop() {
    if (stop)
        for (;;);
}

int main() {
    printf("hello, ");
    maybeStop();
    printf("world\n");
}
into
int main() {
    printf("hello, world\n");
}
(as Clang does today) does not inspire confidence about disallowing moving the loop in the other example. If the compiler is allowed to assume that this loop terminates, why not the lock loop?
Maybe there is a reason, but none of this inspires confidence.
The standard says that a thread must eventually terminate, perform an atomic operation, or perform I/O. So the while(lock.exchange(true)); loop is different.
Also keep in mind that C++11 specifies std::mutex::lock() to have acquire semantics and unlock() to have release semantics on the lock object. For std::mutex to actually work, the reordering of m1.unlock(); m2.lock(); into m2.lock(); m1.unlock(); must be disallowed. But since m1 and m2 are separate objects, m1.unlock() has no happens-before relationship with m2.lock(). This seems to be a problem in the C++11 memory model. The argument I have heard from some WG21 people is that there is no problem, since transforming a well-formed terminating program into a non-terminating program is not allowed. I can't find the wording in the C++ standard that asserts this. But oh well, it works right now on gcc/llvm/msvc.
I don't remember the exact wording, but the standard explicitly makes an exception to the always-terminates assumption for loops that access atomic variables or have side effects (i.e. volatile accesses or I/O).