Hacker News

A good compressor also needs to compress data from experiments using randomization. Causal data is also data.

I don't really know what more needs to be said there.



There is no such thing as "causal data". A causal model is an interpretation of data.

E.g., to say "increasingly energetic motion of molecules leads to increasingly hot water" is an interpretation of a very wide class of equations.

It posits the existence of molecules (a scientific discovery), water, energy, motion, heat, etc., and it provides a means of creating equations and measures tied to each of these terms.

Science is the production of those interpretations. There is no bare "data" which tells you how reality is.

Science isn't "magic trick engineering", it's Explanation. "Compressing tables of data" is something they do in the pseudosciences -- as you've seen, none of it is reproducible: "IQ" is just a compression of survey quizzes. Do you really think it exists?

Do you think you can just compress survey results and claim to have an explanatory model of the most complex system in the entire universe? (a person, society, and their joint interaction) etc.

ML is a temple to pseudoscience, permitted only because the situations it's used in are engineered and low-risk. The whole thing is a dumb trick. You cannot build models of the world from associations in data: that is called superstition.


You flip a coin to randomize choice of a treatment and record the results. The coin-flips plus results form a stream of binary data that can then be compressed well or poorly. A compressor which has built a correct causal model of the effects, whatever those are, will compress better than one which cannot, and which can only blindly predict pre-treatment results (or worse, predict conditional on the correlations which were just broken by the coin-flip, thereby actually wasting bits to fix its especially erroneous predictions). This is in line with the compression paradigm.
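A minimal sketch of this point, with made-up effect sizes and an idealized coder that spends −log2 p bits per outcome (the code length an arithmetic coder would approach): a model that captures the true treatment effect encodes the outcomes in fewer bits than one that ignores the treatment, which in turn beats one that predicts the effect in the wrong direction.

```python
import math
import random

random.seed(0)
n = 10_000

# Hypothetical randomized experiment: a fair coin assigns treatment,
# and (by assumption here) treatment raises P(outcome=1) from 0.3 to 0.7.
data = []
for _ in range(n):
    t = random.random() < 0.5        # randomized treatment assignment
    p = 0.7 if t else 0.3            # assumed true causal effect
    y = random.random() < p
    data.append((t, y))

def code_length(predict):
    """Total bits an ideal coder spends on the outcome stream, given a
    model that supplies P(y=1 | treatment)."""
    bits = 0.0
    for t, y in data:
        p = predict(t)
        bits += -math.log2(p if y else 1 - p)
    return bits

causal   = code_length(lambda t: 0.7 if t else 0.3)  # correct causal model
marginal = code_length(lambda t: 0.5)                # ignores the treatment
backward = code_length(lambda t: 0.3 if t else 0.7)  # effect in wrong direction

# causal < marginal < backward: the correct causal model gives the
# shortest code, the miscalibrated one actually wastes bits.
```

The numbers (0.3, 0.7, the coin bias) are illustrative assumptions, not anything from the thread; the point is only that code length ranks the models the way the comment describes.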

Where do you disagree? Do you think that causal models are completely useless for shortening predictions? Or do you think causality just doesn't exist?

> "Compressing tables of data" is something they do in the pseudosciences -- as you've seen, none of it is reproducible: "IQ" is just a compression of survey quizzes. Do you really think it exists?

That's not even close to correct about IQ. You can measure it from lots of things which are not 'survey quizzes'; fMRIs, for example.


fMRIs are also pseudoscience. Again, just associative models of blood flow.

Causal models do not compress measurement data. There are an infinite number of ways of measuring any phenomenon (consider all possible devices which measure temperature) -- in this sense, they non-uniquely compress "all possible data about the phenomenon across all possible measurement systems". I.e., even with "all possible data" there are an infinite number of lossless models of it.

(But we would not even want a lossless model, since "all possible data" includes all measurement systems which have their own dynamics).

When we have an explanatory model of heat (as the kinetics of molecules), we have a textbook of explanations which we use (via reasoning, imagination, etc.) to write down whole families of causal models. So when creating a new device we can determine what its behaviour will be.

This has nothing to do with a compression of measurement data.

We do not, never have, and never could determine the causal structure of reality using compression of measurements. Measurement devices are physical systems whose properties are causally determined by target devices -- how they are determined is not "in the data". Absent knowing this, i.e., science, the data is just a description of the measurement system -- not the target.



