Worth pointing out that calculating p-values on a wide set of metrics and selecting for those under $threshold (called p-hacking) is not statistically sound - who cares, we are not an academic journal, but a pill of knowledge.
The idea is, since any single test has a ~1/20 chance of producing p < 0.05 by chance alone, testing many metrics means you are bound to get false positives. In academia it's definitely not something you'd do, but I think here it's fine.
@OP have you considered calculating Cohen's effect size? p only tells us that, given the magnitude of the differences and the number of samples, we are "pretty sure" the difference is real. Cohen's `d` tells us how big the difference is on a "standard" scale.
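For concreteness, here's a minimal sketch of Cohen's `d` (the pooled-standard-deviation version), assuming you have per-group means, SDs, and sample sizes; the numbers below are invented for illustration:

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    # Pooled standard deviation across both groups.
    pooled = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                       / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled

# Hypothetical per-comment em-dash rates for two account cohorts.
d = cohens_d(0.8, 0.2, 1.0, 1.0, 500, 500)  # d ≈ 0.6, a "medium" effect
```

Rule of thumb: d ≈ 0.2 is small, 0.5 medium, 0.8 large, which is what makes it a useful companion to a bare p-value.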
Yes, if OP had done a full vocabulary comparison and kept only the sub-threshold words, it would be p-hacking. I'm not sure that's the case here, though? Given that OP started with the em-dash (per the post), and probably didn't do repeated sampling, "em-dash usage is a marker" is a pretty fair standalone hypothesis.
Your comment about p < 0.05 feels out of place to me. The p-values here are << 0.05. Like waaaaay lower.
Perhaps Fisher's exact is more appropriate, on the per-word basis?
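In practice you'd reach for `scipy.stats.fisher_exact`, but for the curious, here's a stdlib-only sketch of the two-sided test on a hypothetical 2x2 word-frequency table (counts are made up):

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

# Hypothetical counts: comments containing a marker word vs. not,
# split by account cohort.
p = fisher_exact_p(40, 960, 10, 990)  # strongly skewed table, tiny p
```

The per-word framing fits Fisher's exact well because word counts in a comment sample are exactly the kind of small-count contingency data the test was built for.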
A Bonferroni correction would be suitable. I usually see it used in genome-wide association studies (GWAS) that check to see if a trait or phenotype is influenced by any single nucleotide polymorphisms (SNPs) in a genome. So it's doing multiple testing on a scale of ~1 million.
> One of the simplest approaches to correct for multiple testing is the Bonferroni correction. The Bonferroni correction adjusts the alpha value from α = 0.05 to α = (0.05/k) where k is the number of statistical tests conducted. For a typical GWAS using 500,000 SNPs, statistical significance of a SNP association would be set at 1e-7. This correction is the most conservative, as it assumes that each association test of the 500,000 is independent of all other tests – an assumption that is generally untrue due to linkage disequilibrium among GWAS markers.
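The quoted rule is just α/k. A toy sketch at word-frequency scale rather than GWAS scale (the p-values are invented):

```python
def bonferroni_reject(p_values, alpha=0.05):
    # A hypothesis survives only if its p-value beats alpha / k,
    # where k is the total number of tests run.
    k = len(p_values)
    return [p < alpha / k for p in p_values]

# Three word-frequency tests instead of 500,000 SNPs;
# the adjusted threshold is 0.05 / 3 ≈ 0.0167.
survivors = bonferroni_reject([1e-4, 0.03, 0.2])
```

At 500,000 tests the threshold works out to 0.05 / 500,000 = 1e-7, matching the quote.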
I think these term-frequency comparisons are probably a pretty blunt tool, since some of the best-known AI indicators aren't single words but turns of phrase and sentence structure.
IMO a more interesting experiment would be to show comments to people who haven't seen these conclusions, have them assess whether they suspect the comments of being bot- or AI-authored, and then correlate that with account age.