Hacker News | rigtorp's comments

Better to use the Gmail API to incrementally backup your mail: https://github.com/rigtorp/gmbackup


I have a tool that saves each mail as a single file using the Gmail API: https://github.com/rigtorp/gmbackup


How is belief propagation used for decoding LDPC codes related to FFT?


At the core both derive their optimization from the distributive property. If the expression graph has symmetry, you get more optimization out of it.

https://www.cs.ubc.ca/~murphyk/Teaching/Papers/GDL.pdf

Check out the first paragraph

    THE humble distributive law, in its simplest form states
    that... this leads to a large family of fast algorithms,
    including Viterbi’s algorithm and the fast Fourier
    transform (FFT).
Two extremely influential papers appeared back to back in IEEE Transactions on Information Theory. This is one of them.

The other is

https://vision.unipv.it/IA2/Factor graphs and the sum-product algorithm.pdf

Both are absolute gems of papers. The editor made sure that both appear in the same volume.


Interesting. Of course many computations can be expressed as a graph. But in the case of the bipartite graph we perform belief propagation on to decode LDPC codes, where is the optimization from the distributive property? The parity matrix would typically be constructed so that there are few subexpressions to factor out, in order to maximize the error correcting properties.

I agree both FFT and belief propagation can be expressed as message passing algorithms.


It shows up in pushing in the parentheses and pulling common terms out of the expression, which is a sum (over all possible assignments) of products of terms.

Doing the summation the naive way is exponential in the number of variables. The goal is to do this efficiently by exploiting the distributive property and any symmetry, much like in the FFT case.

This can be done efficiently, for example, when the graph is a tree. (Even if it isn't, one can pretend that it is. Surprisingly, that often works very well, but that's a different topic entirely.)

Read the paper; it's not difficult to follow.


The second link is broken on HN because it contains a space. Here's a clickable version: https://vision.unipv.it/IA2/Factor%20graphs%20and%20the%20su...


There’s a whole subfield called generalized distributive law https://en.wikipedia.org/wiki/Generalized_distributive_law


It might also be better for performance since the two cores can RFO the buffer pointer cache lines from each other.


Neither producer nor consumer ever writes into that line, though, so it should stay in S state forever.


There might be an additional optimization in having the writer also cache its write index on the same cache line as its cached read index. This way the writer would only write to the write index cache line. The hardware might be able to optimize this. I wonder how it interacts with UMWAIT on the latest cores.


Reading its own values is never a problem: the writeIndex cache line will switch from E (or M) to S (or O), but it is never evicted, so reads from it never miss. For frequent writes the read will be satisfied by the store buffer anyway.


I think it will be invalidated due to an RFO when the reader reads the write index. Only when multiple readers read the same cache line without any intervening write will the RFO heuristic be disabled.


I have something similar but in C++: https://github.com/rigtorp/c2clat


I went to your homepage. Your gif "Programming in C++" made me really laugh, thanks for that! 8-)

https://rigtorp.se/


Hahaha, thank you for pointing that out!

I can offer the following in return: https://i.pinimg.com/736x/cc/aa/30/ccaa3008f98375c26a221c85d...


This works much better on my POWER9 system. I have two CPUs in my machine with the same core count but different cache configurations according to hwloc and this showed that they really are different.


You need to add a Makefile there. It's just two lines.



You might need to add -fno-omit-frame-pointer to help ASAN unwind the stack.


This is definitely an option you want to be using when using ASan or LSan. You may also want to consider additionally using -momit-leaf-frame-pointer to skip frame pointers only in leaf functions while keeping frame pointers for non-leaf functions. This can make small leaf functions significantly shorter, limiting some of the negative impact of using -fno-omit-frame-pointer alone.

Sometimes even -fno-omit-frame-pointer won't help, like when the stack is being unwound through a system library that was built without frame pointers. In that case you can switch to the slow unwinder: set the environment variable `ASAN_OPTIONS=fast_unwind_on_malloc=0` when running your program. But note that this will make most programs run significantly slower, so you probably want to use it only when you really need it and not as the default setting for all runs.


You're incorrect: garbage collection would be the biggest problem for that use case. You have to be really careful even with C/C++ code, warming up the branch predictor between packets etc.; see my article for some pitfalls: https://rigtorp.se/virtual-memory/


There's even more discussion on the lock memory ordering on Stackoverflow: https://stackoverflow.com/questions/61299704/how-c-standard-...

Taking a lock only needs to be an acquire operation and a compiler barrier for other lock operations. Using seq_cst or acq_rel semantics is stronger than needed. From my reading and from discussions with people from WG21, the current argument for why taking a lock only requires acquire semantics is that a compiler optimization that transforms a non-deadlocking program into a potentially deadlocking program is not allowed. There's an interesting Twitter thread where we discuss this that I can't find anymore :(.


That is an amazing thread. The fact that C++ apparently allows optimizing

    #include <stdio.h>
    
    int stop = 1;
    
    void maybeStop() {
        if(stop)
            for(;;);
    }
    
    int main() {
        printf("hello, ");
        maybeStop();
        printf("world\n");
    }
into

    int main() {
        printf("hello, world\n");
    }
(as Clang does today) does not inspire confidence about disallowing moving the loop in the other example. If the compiler is allowed to assume that this loop terminates, why not the lock loop?

Maybe there is a reason, but none of this inspires confidence.


The standard says that a thread must eventually terminate, access a volatile object, perform an atomic or synchronization operation, or do I/O. So the while(lock.exchange(true)); loop is different.

Also keep in mind that C++11 specifies std::mutex::lock() to have acquire semantics and unlock() to have release semantics on the lock object. For std::mutex to actually work, the reordering of m1.unlock(); m2.lock(); into m2.lock(); m1.unlock(); must be disallowed. But since m1 and m2 are separate objects, m1.unlock() has no happens-before relationship with m2.lock(). This seems to be a hole in the C++11 memory model. The argument I have heard from some WG21 people is that there is no problem, since transforming a well-formed terminating program into a non-terminating program is not allowed. I can't find the wording in the C++ standard that asserts this. But oh well, it works right now on gcc/llvm/msvc.


I don't remember the exact wording, but the standard explicitly makes an exception to the always-terminating assumption for loops that access atomic variables or have side effects (i.e. volatile accesses or I/O).


Yes that's right!

