>> I believe the trick with CPU math kernels is exploiting instruction level parallelism with fewer memory references
It's the collection of tricks to minimize all sorts of cache misses (L1, L2, TLB, page miss, etc.), improve register reuse, leverage SIMD instructions, transpose one of the matrices if it provides better spatial locality, etc.
The trick is indeed to somehow imagine how the CPU works with the Lx caches and keep as much info in them as possible. So it's not only about exploiting fancy instructions, but also thinking in engineering terms. Most software written in higher-level languages cannot use L1/L2 effectively, which is why algorithms with otherwise similar (asymptotic) complexity end up consistently slower.
It's quite unfavorable on modern hardware. A Sapphire Rapids core can do two separate 32-wide half-precision FMAs (vfmadd132ph, [1]) per clock, which is 128 FLOPs/cycle. It is not possible to achieve that kind of throughput with an 8-bit LUT and accumulation; even just a shuffle with vpshufb is too slow.
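The 128 FLOPs/cycle figure can be sanity-checked from the register width and port count (a minimal arithmetic sketch; the port and width numbers are the ones stated above):

```python
# Sanity check of the 128 FLOPs/cycle figure for AVX-512 FP16 FMA
# (vfmadd132ph) on a Sapphire Rapids core, per the comment above.
fma_ports = 2               # two independent FMA pipes per core
register_bits = 512         # zmm register width
lanes = register_bits // 16 # 32 half-precision values per register
ops_per_fma = 2             # one multiply + one add per lane

flops_per_cycle = fma_ports * lanes * ops_per_fma
print(flops_per_cycle)  # 128
```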
That's absolutely wild. If vpshufb alone were enough, the throughput would be the same in terms of values, because there are twice as many values per register and you get to retire half as many instructions, but combining the two inputs and applying a 256-entry LUT takes a bunch more instructions :(
Fair point. It might help if the system is DRAM bandwidth limited, so reducing the data size helps even though individual operations take multiple instructions. But that is not the situation with today's hardware.
I don't like that question because it asks for recollection of a name, as opposed to taking the theorem "when X is true, then Y is true" and changing the question into the form "when X is true, ____???".
Worst case I've seen of this was when I was in 9th grade and our geometry teacher required us to memorize the chapter and section names of theorems in the book when proving. For example, in our proofs about triangles, we had to write "theorem 12.5" or else we wouldn't get credit on the test, and here 12.5 was the chapter and section number in the particular textbook, which is an utterly useless piece of info.
Of course, the name Brauer is not nearly as useless as a chapter number, but still, being familiar with math history probably shouldn't be a hard requirement for being a professional mathematician.
I think his generals are easier than Tao's. I wonder how many "average" PhD candidates worldwide can answer the questions in Tao's generals satisfactorily without difficulties. Many of them just seem highly research-oriented.
I think the mathematical concept that you are looking for is that of the dual space. Essentially if you have a vector space V, you can construct a dual space V* where the elements of the dual space are functions taking elements of V to the underlying field F, and under certain conditions these spaces are isomorphic (the same) - so there is a 1:1 correspondence between elements of the vector space and the functions in the dual space.
One way to view this formula is to use the fact that the Beta distribution is a conjugate prior for the binomial distribution.
Essentially if you have a Beta(a, b) prior then your prior mean is a/(a+b) and after observing n samples from a Bernoulli distribution that are all positive, your posterior is Beta(a+n, b) with posterior mean (a+n)/(a+n+b). So in your example you effectively have a Beta(0, x) prior and x (“suspicious”/“gullible”) is directly interpreted as the strength of your prior!
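The conjugate update described above is simple enough to write out directly (a minimal sketch in plain Python; the function name is mine):

```python
def beta_posterior_mean(a, b, n_pos, n_neg=0):
    """Posterior mean of a Beta(a, b) prior after observing
    n_pos positive and n_neg negative Bernoulli samples."""
    return (a + n_pos) / (a + n_pos + b + n_neg)

# Beta(a, b) prior, n all-positive observations -> (a + n) / (a + n + b)
print(beta_posterior_mean(1, 1, 8))   # uniform prior, 8 positives: 9/10
# The "skeptical" Beta(0, x) prior from above gives n / (n + x):
x = 5
print(beta_posterior_mean(0, x, 10))  # 10/15
```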
Yeah, that's a lot of jargon associated with Bayesian statistics, but at its root the idea is simple: how to merge the information you have before observing some data (a.k.a. the prior) with the new information you just observed, to obtain updated information (a.k.a. the posterior) that includes both what you believed initially and the new evidence.
The probability machinery (Bayes' rule) is a principled way to do this, and in the case of count data (number of positive reviews for the cafe) works out to a simple fraction n/(n+x).
Define:
x = parameter of how skeptical you are in general about the quality of cafes (large x = very skeptical),
m = number of positive reviews for the cafe,
p = (m+1) / (m+1+x) your belief (expressed as a probability) that the cafe is good after hearing m positive reviews about it.
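Plugging numbers into the definitions above shows how x acts as the strength of your skepticism (a tiny illustration; the values are arbitrary):

```python
def belief(m, x):
    # p = (m + 1) / (m + 1 + x): belief the cafe is good after
    # m positive reviews, with skepticism parameter x
    return (m + 1) / (m + 1 + x)

# A skeptic (x = 10) needs many more reviews to be convinced
# than a gullible reader (x = 1):
for m in (0, 3, 30):
    print(m, belief(m, 1), belief(m, 10))
```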
Learning about the binomial and the beta distribution would help you see where the formula comes from. People really like Bayesian machinery, because it has a logical/consistent feel: i.e. rather than coming up with some formula out of thin air, you derive the formula based on general rules about reasoning under uncertainty + updating beliefs.
> Can this way to view the formula be expressed without the terms
You're asking "Can this way of viewing the formula in terms of Bayesian probability be expressed without any of the machinery of Bayesian probability?".
Also, in case anyone is interested, the uninformative Jeffreys prior for this in Bayesian statistics (meaning it does not assume anything and is invariant to certain transformations of the inputs) is Beta(0.5, 0.5). Thus the initial guess is 0.5, and it evolves from there from the data.
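Under the Jeffreys prior Beta(0.5, 0.5) the same posterior-mean formula applies; a minimal sketch (function name is mine) of how the estimate starts at 0.5 and then follows the data:

```python
def jeffreys_estimate(n_pos, n_total):
    # Posterior mean under the Jeffreys prior Beta(0.5, 0.5):
    # (0.5 + n_pos) / (0.5 + n_pos + 0.5 + (n_total - n_pos))
    return (0.5 + n_pos) / (1.0 + n_total)

print(jeffreys_estimate(0, 0))  # 0.5 — the initial guess before any data
print(jeffreys_estimate(4, 5))  # 0.75 — pulled toward the observed 4/5
```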
No, I was not. While Oliver certainly killed more, the reformation and crackdown that Thomas led claimed enough victims that I'm comfortable calling it mass murder.
- https://gist.github.com/nadavrot/5b35d44e8ba3dd718e595e40184...
might be of interest