Hacker News
Computation graphs and graph computation (2020) (breandan.net)
60 points by mcovalt on Nov 2, 2021 | 7 comments


This is the post I needed to refine (or perhaps exacerbate) some cloudy layman thinking, thank you (or perhaps apologies in advance).

> Matrices are problematic for some reasons. Primarily, by treating a graph as a matrix, we impose an ordering over all vertices which is often arbitrary.

This jumped out, along with a question: is the terseness people value in notations a feature of anything else we study at all, and does terseness create its own artifacts and blind spots? The idea of graphs and matrices being dynamical systems and state machines appears to be what you get when you remove time/ordinality from them, where they become things with a field of states. It implies there are cases where this ordinality artifact in adjacency matrices provides information, and cases where the ordinality of a matrix representation is noise or a constraint.

Intuitively, since the dimension of the adjacency matrix that expresses a graph must be at least as great as the degree of the highest-degree node in the graph (via the pigeonhole principle, which I recently learned from another HN comment), it seems like for a given graph you cannot determine the dimensions of the adjacency matrix that expresses the graph without evaluating all graph nodes for their maximum degree.

Which frames this conclusion from the post:

> First is theoretical: we must show that binary matrix multiplication is Turing-equivalent. Second, we must show a proof-of-concept via binary recompilation. Third, we must develop a robust toolchain for compiling and introspecting a wide variety of graph programs.

Does this imply that determining the Turing-equivalency between graphs and matrices is NP-complete or -hard?


One of the best links I ever saw on HN.

That guy has a TON of cool projects on GitHub; I suspect he'll have a very influential career if he sticks to his guns and continues doing research on his interests.

Autograd especially is a fascinating project to read through, if you like Kotlin.


I'm definitely going to be reading this blog post, and probably others by this person. He sounds amazing and his work sounds amazing.

So I'm not the sort of person to ignore someone's technical work because of their style or attitude. But, breandan, do you realize that you sound just incredibly big-headed? This sort of stuff would be appropriate when you're interviewing, but not in public.

> This year, I predicted the pandemic weeks before the lockdown, exited the market, and turned down a job at Google ... I’m making some big bets and some will be wrong, but I see the very same spark of genius in them.

> A lot of people I looked up to at the time laughed at me. I’ll bet they aren’t laughing anymore.

> The information economy and confirmation bias takes were all dead right. ... Don’t say I warned you, go out and fix our broken systems.

> In 2017, I witnessed the birth of differentiable programming which I stole from Chris Olah and turned into a master’s thesis ... Had a lot of trouble convincing people that classical programs could be made differentiable, but look at the proceedings of any machine learning conference today and you’ll find dozens of papers on differentiable sorting and rendering and simulation. Don’t thank me, thank Chris and the Theano guys.

> In 2018, I correctly predicted Microsoft would acquire GitHub to mine code.


> many algorithms can be expressed as matrix multiplication

I wonder if this can be applied to something like the 3n+1 problem, or Conway's Game of Life?


It's a grid and has a famous APL 1-liner, and there are GPU-accelerated GoL implementations, so one assumes so.
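For what it's worth, a Life step really can be written as matrix multiplication. A small NumPy sketch (function and variable names are mine, not from the post): with T = S + I + Sᵀ, where S is the one-step shift matrix, the product T·G·Tᵀ sums every cell's 3x3 neighborhood, and subtracting G leaves the eight-neighbor count.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step expressed as matrix multiplication.

    T = S + I + S^T (S = one-step shift matrix), so T @ grid @ T.T
    sums each cell's 3x3 neighborhood; subtracting grid leaves the
    eight-neighbor count. Boundaries are treated as dead cells.
    """
    n = grid.shape[0]
    T = np.eye(n, k=-1) + np.eye(n) + np.eye(n, k=1)
    neighbors = T @ grid @ T.T - grid
    # birth on exactly 3 neighbors, survival on 2 or 3
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)
```

A blinker oscillates as expected under this rule, though it does far more arithmetic per step than a direct per-cell update would.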


GPUs don't always perform matrix multiplications.

GPUs are just massive SIMD machines, which happen to be pretty good at matrix-multiplication. But systolic arrays (also known as "Tensor Cores" these days) are even better at the job, which is why Google put so much money into TPUs, and NVidia has created special "Tensor Cores" (really, systolic arrays) in their GPUs.

The number of applications that use systolic array matrix multiplications is rather slim: pretty much just the deep learning fellows. Otherwise, those matrix-multiplication routines in the GPU are largely executed using plain-old SIMD programming.

--------

If I were to write a GPU Game of Life simulator, I'd just do it the same way as before: counting up each cell one-by-one and creating a state transition from old state to new state, synchronizing the states / parallel calculation to ensure that there are no race conditions.

--------

To rewrite the Game of Life as a matrix-multiplication would be grossly inefficient. I actually argue the opposite needs to happen.

That is: we need more matrix multiplications written as if they were computer programs. (See my other post in this topic: https://news.ycombinator.com/item?id=29083898). It is clear that matrix-multiplications and graphs are related, but we process graphs with say... CPU-branch predictors (aka: following the instruction-pointer graph across the computer program graph).

You __COULD__ implement that as matrix multiplications. But that'd be a gross waste of energy. Instead, more matrix-multiplications (especially sparse matrix multiplications) should be written to take advantage of branch-predictors and other such features of modern CPUs.
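A sketch of what that looks like (plain Python, CSR layout, names mine): the inner loop's length varies per row with the sparsity pattern, which is exactly the branchy, data-dependent control flow that CPU branch predictors handle well.

```python
def csr_matvec(data, indices, indptr, x):
    """y = A @ x with A stored in CSR (compressed sparse row) form.

    Only the stored nonzeros are touched, so the inner loop's length
    is data-dependent -- branchy code a CPU runs well -- versus the
    fixed dense n^2 schedule a GPU matmul kernel would execute.
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y
```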

------------

There's absolutely an equivalence between matrix multiplies and graphs. But this post is drawing precisely the opposite conclusion from reality, IMO.

Matrix multiplications are incredibly dense and power-hungry. There are faster, cheaper, more efficient algorithms for performing these computations _IF_ you know the matrices are in a certain format (ex: tri-diagonal, sparse, or other such setups). Detecting these special forms (especially very common special forms) and optimizing on those is the goal for faster computers these days.
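For instance (a sketch, names mine), once a matrix is known to be tridiagonal, the same product that costs O(n^2) dense multiply-adds needs only the three stored diagonals and O(n) work:

```python
def tridiag_matvec(lower, diag, upper, x):
    """y = A @ x for a tridiagonal A given as its three diagonals:
    O(n) time and storage, versus O(n^2) for the dense product."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += lower[i - 1] * x[i - 1]   # subdiagonal term
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]       # superdiagonal term
    return y
```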


Missing my favorite graphs: the binary decision diagram / Branching program. BDDs / Branching programs consist of nodes called "ITE Nodes", meaning "If / Then / Else". Roughly ITE(BoolVariable, Then-Address, Else-Address).

If BoolVariable is true, "jump" to Then-Address. If BoolVariable is false, jump to Else-Address.

Programs start on address 2. Address 0 represents false, Address 1 represents true.

Let's start with the easiest:

    0: false
    1: true
    2: ITE(0, 0, 1) 
This program above represents "if(Variable[0]){ return false; } else { return true; }". Can't get any easier than this!! Okay, let's make things a wee bit harder.

    0: false
    1: true
    2: ITE(0, 3, 4)
    3: ITE(1, 0, 1)
    4: ITE(1, 1, 0)
This program represents Var0 XOR Var1.
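To make that concrete, here's a minimal Python sketch of an interpreter for these listings (the encoding and names are mine: a dict mapping each address to a (var, then_addr, else_addr) triple, with terminals 0 and 1).

```python
def eval_bdd(program, assignment):
    """Run an ITE program: follow then/else addresses until we land
    on terminal 0 (false) or 1 (true)."""
    addr = 2                          # programs start at address 2
    while addr > 1:
        var, then_addr, else_addr = program[addr]
        addr = then_addr if assignment[var] else else_addr
    return bool(addr)

# the two-variable XOR function as an ITE program:
# node 3 (Var0 true) is NOT Var1, node 4 (Var0 false) is Var1
xor = {2: (0, 3, 4), 3: (1, 0, 1), 4: (1, 1, 0)}
```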

At these small sizes, it doesn't seem like branching programs / BDDs are very useful. However, it turns out that BDDs can, in practice, represent 100+ variable graphs efficiently (!!!). As such, your CAD tools use BDDs in practice to prove whether or not boolean functions (such as "addition" or "multiplication") are in fact correct.

IMO, Binary Decision Diagrams sit at the edge of NP-completeness vs efficiency. Clearly, BDDs can represent 3-SAT problems. But in practice, most algorithms (finding a solution, for example) run very efficiently.

----------

For example: the ORBDD (Ordered, Reduced, BDD) has a property such that no node is redundant, and all variables are ordered. (That is, you visit "variable 0" first, then "variable 1", then "variable 2"). If you satisfy these properties, then you can "solve" the 3SAT / Satisfiability problem with the simple logic:

1. Start at the start of the program (aka: address 2).

2. Does the ITE() branch to a non-zero (aka: non-false) address? Then your BDD is satisfiable. Period.

This is because under an ORBDD, if the BDD is unsatisfiable, it'd be represented as:

    0: false
    1: true
    2: ITE(0, 0, 0)
Because ORBDDs are "reduced" (ie: no redundant nodes). As such, solving 3SAT with ORBDDs is, in fact, an O(1) operation.
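As a sketch (names mine, same address encoding as the listings above): since the reduced form of "always false" is the single root node with both branches at 0, the whole check is one lookup.

```python
def satisfiable(program):
    """O(1) satisfiability check on a reduced, ordered BDD: only the
    constant-false function has a root whose branches are both 0."""
    _, then_addr, else_addr = program[2]   # programs start at address 2
    return (then_addr, else_addr) != (0, 0)
```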

--------

It turns out that the #P-complete "counting solutions" is solvable in O(n) time and O(n) space in BDDs (!!!). I can step you through the XOR program for instance. I'll "reorder" the program so that we can just count solutions from top-to-bottom (0, 1, 4, 3, 2).

    0: false <-- 0 solutions
    1: true <-- 1 solution
    4: ITE(1, 1, 0) <--- Then-branch has 1 solution, but Else-branch has 0 solutions. 1 + 0 == 1 solution
    3: ITE(1, 0, 1) <--- Then-branch has 0 solutions, but Else-branch has 1 solution. 0 + 1 == 1 solution
    2: ITE(0, 3, 4) <--- Then-branch has 1 solution + else-branch has 1 solution == 2 total solutions
You'll need a topological bucket / radix sort (O(n) sorting algorithm) to "visit" the ITE-statements in the correct order. But if done so, you can "obviously" count the solutions in just O(n) time!!

Because all ITE statements have their variable number associated with them in the 1st position (ITE(5, X, Y) should be visited before ITE(2, X, Y)), it's a simple matter of an O(n) bucket or radix sort on the variable number. (No need for a graph-based topological sort.)

----------

If you "skip" a variable, there's a multiplier of 2^(number of skipped variables). For example, lets say we have "Var0 XOR Var4", (5 total variables, but Var1/Var2/Var3 are ignored).

    0: false <-- 0 solutions
    1: true <-- 1 solution
    4: ITE(4, 1, 0) <-- 1 solution
    3: ITE(4, 0, 1) <-- 1 solution
    2: ITE(0, 3, 4) <-- 2^(#Skipped Variables) * (1+1) solutions == 2^3 * 2 == 16 solutions total
We get 16 total solutions. (ex: 0_000_1, 0_001_1, ... 0_111_1, then 1_000_0, 1_001_0, ... 1_111_0).
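The whole counting pass fits in a few lines of Python (a sketch, names mine, same address encoding as the listings above). Giving the terminals a variable number one past the last real variable makes the 2^(skipped variables) multiplier fall out of a single formula per node.

```python
def count_solutions(program, nvars):
    """#SAT on an ordered BDD, bottom-up in one pass.

    program maps address -> (var, then_addr, else_addr); addresses 0/1
    are the false/true terminals. Terminals get var == nvars, so a
    branch that skips k variables picks up a 2^k multiplier.
    """
    var = {0: nvars, 1: nvars}
    count = {0: 0, 1: 1}
    # visit higher-numbered variables first (the bucket/radix sort
    # described above would make this ordering step O(n))
    for addr, (v, t, e) in sorted(program.items(), key=lambda kv: -kv[1][0]):
        var[addr] = v
        count[addr] = (2 ** (var[t] - v - 1) * count[t]
                       + 2 ** (var[e] - v - 1) * count[e])
    return 2 ** var[2] * count[2]     # programs start at address 2
```

Run on the two-variable XOR program it reports 2 solutions; on the Var0-XOR-Var4 program over 5 variables it reports 16, matching the hand count above.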

I'm still using functions that are easily verifiable by hand. But you can imagine that when these BDDs get into Megabytes or Gigabytes territory, you'd much rather use BDDs for this problem rather than the truth table.

------

Alas: we don't get "something for nothing". It turns out that _making an ORBDD in the first place_ is an NP-complete problem in and of itself (!!!). In fact, even finding a good ordering (which variable should be "variable 0"??) is itself an NP-complete problem.

So no shortcuts to solving NP-complete stuff at all. Nonetheless, if someone needs to "easily" solve the #P class of problems (ie: counting versions of NP-complete problems), that is: "find the total number of solutions to the 3SAT problem", or "find the best solution to the 3SAT problem", for optimization and/or counting problems... BDDs are a very good data structure for that.

Traditionally, ORBDDs are used for proving that two truth tables are identical, without storing the 2^128 truth-table entries needed per output bit of a 64-bit + 64-bit boolean function. The BDD is often several orders of magnitude smaller than the truth table. (Not in all cases, just in most cases we come across in making chips.)

In practice: you start with simple ORBDDs (such as AND, OR, XOR, SUM, MULTIPLY), and you combine ORBDDs with the "synthesis" algorithm... which creates new ORBDDs that combine the previous ones. The output is provably ordered-and-reduced still.

That is to say: you can "maintain" ORBDDs rather easily in practice. In theory, each synthesis step could blow up the BDD exponentially... but in practice, there are many circuits / algorithms where the ORBDD remains small and usable.
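A minimal sketch of that synthesis step (a memoized "apply"; class and method names are mine, using the same address convention as the listings above): combine two ORBDDs node-by-node, splitting on the smaller variable, and hash every result triple so the output stays reduced.

```python
INF = float("inf")

class BDD:
    """Shared node store: addresses 0/1 are the false/true terminals;
    address i >= 2 maps to self.nodes[i - 2] == (var, then, else)."""

    def __init__(self):
        self.nodes = []
        self.unique = {}              # triple -> addr: keeps BDDs reduced

    def var(self, addr):
        return INF if addr < 2 else self.nodes[addr - 2][0]

    def node(self, var, then_addr, else_addr):
        if then_addr == else_addr:    # redundant test: drop the node
            return then_addr
        key = (var, then_addr, else_addr)
        if key not in self.unique:
            self.unique[key] = len(self.nodes) + 2
            self.nodes.append(key)
        return self.unique[key]

    def apply(self, op, u, v, memo=None):
        """Synthesize op(f, g) from the ORBDDs rooted at u and v."""
        memo = {} if memo is None else memo
        if u < 2 and v < 2:
            return int(op(bool(u), bool(v)))
        if (u, v) not in memo:
            m = min(self.var(u), self.var(v))
            ut, ue = self.nodes[u - 2][1:] if self.var(u) == m else (u, u)
            vt, ve = self.nodes[v - 2][1:] if self.var(v) == m else (v, v)
            memo[u, v] = self.node(m,
                                   self.apply(op, ut, vt, memo),
                                   self.apply(op, ue, ve, memo))
        return memo[u, v]
```

Building the single-variable ORBDDs for Var0 and Var1 and applying XOR reproduces the two-variable XOR function; each synthesis step can blow up in theory, but the unique table keeps the result reduced.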



