What's wild to me is that people worry about writing style fingerprinting while casually uploading their literal DNA to consumer genomics companies. 23andMe went bankrupt and suddenly 15 million people's most identifying data imaginable is an asset in a fire sale.
Your writing style can theoretically be masked with an LLM. Your genome can't. And it doesn't just identify you -- it identifies your relatives, your disease risks, your ancestry, things you might not even know about yourself yet. The deanonymization vector here is permanent and irrevocable in a way that no amount of OPSEC can fix after the fact.
The semantic approach in this paper (interests, clues, behavioral patterns) is scary enough. Now imagine combining that with leaked genetic data. You don't even need to match writing styles when you can match someone's 23andMe profile to their health subreddit posts about conditions they're genetically predisposed to.
Your writing style can theoretically be masked with an LLM. Your genome can't. And it doesn't just identify you -- it identifies your relatives, your disease risks, your ancestry, things you might not even know about yourself yet. The deanonymization vector here is permanent and irrevocable in a way that no amount of OPSEC can fix after the fact.
The semantic approach in this paper (interests, clues, behavioral patterns) is scary enough. Now imagine combining that with leaked genetic data. You don't even need to match writing styles when you can match someone's 23andMe profile to their health subreddit posts about conditions they're genetically predisposed to.