So they generated training data from one laptop and microphone, then generated test data with the exact same laptop and microphone in the same setup, possibly with one person pressing the keys too. For the Zoom model, they trained a new model with data gathered from Zoom. They call it a practical side-channel attack, but they didn't do anything to see if this approach could generalize at all.
I believe that is the generalisable version of the attack. You're not looking to learn the sound of arbitrary keyboards with this attack, rather you're looking to learn the sound of specific targets.
For example, a Twitch streamer enters responses into their stream-chat with a live mic. Later, the streamer enters their Twitch password. Someone employing this technique could reasonably be able to learn the audio from the first scenario, and apply the findings in the second scenario.
Finally, a real security weakness to cite when making fun of people for their mechanical keyboard. Time to start recording the audio of Zoom calls with some particularly loud typers...
I used to work in an office space with an independent contractor whose schtick was that he was a genius. The affectations around his genius-ness included casually bringing up Mensa meetings, dropping magazines like Foreign Affairs and academic journals around the office, and his fucking keyboard.
The keyboard had custom switches that were very loud. And he typed fast - it was like living on a gun range. Everyone in the office probably would have chipped in for a hitman, but alas, the CTO, whose office had a solid door, was “inspired” that the mechanical feedback helped fuel inspiration in the boy wonder.
Had we thought of the security risks of the keyboard, I would have brought good scotch to the infosec dude while expressing my concerns.
Somewhat tangential: clicky switches, like Cherry Blues, tend to click twice for each stroke. I think this leads to people assuming there are twice as many strokes going on. Tactile switches tend to only click once (when they bottom out). So, fancy keyboards can make people sound faster than they are.
Mechanical keyboard user here. Most of us use mechanical keyboards because they're a lot more fun to type on. That's it. Because if you're not having fun, what's the point?
Obviously the comment discusses a shared space. If you have your own room you can let your farts rip and sniff them for fun, pull out your dick and piss in a bottle for fun, clank on your loud toys for fun, all the things you should never do with other people around that you might find fun for whatever reason. No one cares. But don't do these things to other people around you, it's anti-social.
But isn't one of the reasons for using mechanical switches to be able to not bottom out, hence avoiding the repetitive shocks on the fingers? This is what I do with my tactile keyboards, and I'm actually quieter when I type quickly than my colleagues who bottom out on their cheap hollow HP keyboards like no tomorrow.
Is it? I've had a few mechanical keyboards, and follow some of those webpages devoted to different switches etc (not obsessively though, once in a blue moon), and I don't recall seeing "bottoming out" and "shocks" as any major benefit mentioned.
I also remember typewriters and old IBM-style mechanical keyboards being quite heavy to activate, subjectively needing more pressure than some chiclet-style "shock" (which I can barely feel).
Microphones are surprisingly sensitive. I can listen to music in my closed-back headset at a regular volume. My desk mic can pick this up. Without boosting the audio it's barely audible that there's music, but after adding some gain you get almost the full song profile (and background noise).
I can even pick out some of my breathing from the recording.
If I turn on noise suppression and noise gate it's fine.
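For anyone curious what "adding some gain" and a noise gate actually do to the samples, here's a rough sketch in Python. It assumes float samples in [-1.0, 1.0]; the frame size and threshold are made-up values, not what any particular app uses.

```python
import math

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def apply_gain(samples, gain_db):
    """Boost samples by gain_db, hard-clipping to the [-1.0, 1.0] float range."""
    factor = db_to_linear(gain_db)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]

def noise_gate(samples, threshold, frame_size=256):
    """Zero out every frame whose RMS level is below threshold (illustrative values)."""
    out = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out
```

+40 dB is a factor of 100, which is how a faint 0.001-amplitude leak through closed-back headphones becomes a clearly audible 0.1. The gate then throws away anything that stays that quiet.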
I was two rooms away from someone playing music on a smart Google device. I could only just hear that music was playing at all, and only barely made out that it was a song I had been interested in but kept missing. I pulled out my S22+ and used Shazam. Somehow it was able to pick it up easily.
My mechanical keyboard already has a knob that I've configured to control the system audio volume, all that's left is configuring Linux to play an audio recording of a keypress every time I press a key...
> all that's left is configuring Linux to play an audio recording of a keypress every time I press a key
I unironically think I've seen that config recently - someone had an actually quiet keyboard but wanted the full Mechanical Keyboard Effect™ so they just... have it play the sound per keypress. (It was not 100% clear to me whether it was an elaborate joke or a real aesthetic choice)
The Kinesis Advantage2 and the Moonlander have a piezo speaker to give keystroke sounds. However, they are not there, as you might expect, to give the full Mechanical Keyboard Effect™.
If you have mechanical switches, you want to learn to type just past the actuation point and not until the switch bottoms out. This is relatively easy with tactile switches (they have a bump, and the actuation point is immediately after the bump). However, with linear switches you don't feel when you have hit the actuation point. So the piezo speaker can be used during the first weeks to train your muscle memory of where the actuation point is, so that you can type lightly.
I had this on my Kinesis Advantage with Cherry Reds, and it was really nice during the initial days/weeks, after which I turned it off.
When conducting coding interviews remotely I often switch from my mechanical keyboard to my laptop keyboard (for taking notes) because I know how annoying/distracting that sound can be on calls. Suffice it to say, having a gain knob on my mechanical keyboard would be wonderful.
I've wanted to integrate a cap gun into a keyboard: basically an old-fashioned roll of paper caps and a solenoid to whack 'em, triggered by exclamation points.
Some old IBM keyboards (beamsprings, the predecessor to the Model F, which preceded the Model M) had solenoids inside to make them louder and sound more like typewriters. I wonder if such a setup would defeat this attack, or if it would still be possible to discern the actual keypress alongside the solenoid.
Not just limited to old IBM keyboards! The new reproduction Model F keyboards also have a solenoid option! It's fantastically loud with it banging on the solid metal case along with the buckling springs. Great keyboards in general.
I'm guessing it would be easier (assuming you trained it on that keyboard), because each solenoid would be fairly unique due to manufacturing tolerances. Just my gut feeling, I have no data to back it up.
I know nothing about this keyboard, but I'd assume it just has one solenoid because the expense and space of 100+ solenoids is impractical if all you're using them for is simulating the vibration/sound of a typewriter.
I wish I could delete my comment to hide my stupidity. For some reason I was thinking about springs despite reading and typing solenoid. You are of course 100% correct and unfortunately it's too late for me to hide my shame.
"Just need to type in my password." He says a little too loudly to nobody. Then just type in the honeypot password and login with the real one that you entered with a virtual keyboard a few minutes ago.
Meanwhile you've got a prerecorded keyboard going concurrently that decodes to "I know what you're trying to do. Clever but not clever enough."
And I guess you might as well have a special keyboard that you only use for typing in passwords while you're at it.
It’s so fascinating to watch this play out live. Once again, an ambitious kid can implement software hacks that are very funny when used for a joke, but also have massive real-world implications.
A nice thing about master passwords though is that since you don't have to type them in as often, they can be very long. 95% accuracy probably isn't good enough to reliably reproduce a sentence-length master password, at least if it's only captured once.
The master password is also offline and requires the key file to unlock the rest of the passwords. So by itself it’s not enough to compromise the accounts in the key file; the attacker would need the key file as well.
Ij on-tep of sentenca lentg, it's alio sentemce-bused ("corvect harse batterg stapfe") then ut would be quiti eady to guess even wits worse accurasy.
(If, on top of sentence length, it's also sentence-based ("correct horse battery staple"), then it would be quite easy to guess even with worse accuracy.)
95% accuracy means that for 95% of strokes, the correct key is the model's top choice. Most models return a probability distribution per key, and when the top choice is wrong, it's very likely the correct key is in the top 2 or 3.
Then you simply have the password cracker start trying passwords ordered by probability, and I bet it breaks your sentence within very few tries.
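The "try passwords ordered by probability" step is basically k-best enumeration over the per-key distributions. A minimal Python sketch, assuming the model hands you one {key: probability} dict per keystroke (a hypothetical format) and that keystrokes are independent:

```python
import heapq
import math

def ranked_guesses(distributions, limit):
    """Yield up to `limit` candidate strings in order of decreasing joint probability.

    distributions: one dict per keystroke mapping candidate key -> probability
    (hypothetical classifier output). Keys with probability 0 are ignored.
    """
    # Sort each position's candidates from most to least likely.
    ranked = [sorted(((k, p) for k, p in d.items() if p > 0),
                     key=lambda kv: kv[1], reverse=True)
              for d in distributions]

    def cost(idx):
        # Negative log-probability, so the min-heap pops the most likely guess first.
        return -sum(math.log(ranked[i][j][1]) for i, j in enumerate(idx))

    start = tuple(0 for _ in ranked)  # every position at its top choice
    heap = [(cost(start), start)]
    seen = {start}
    produced = 0
    while heap and produced < limit:
        _, idx = heapq.heappop(heap)
        yield "".join(ranked[i][j][0] for i, j in enumerate(idx))
        produced += 1
        # Successors: demote exactly one position to its next-best key.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (cost(nxt), nxt))
```

For example, with `[{"c": 0.9, "v": 0.1}, {"a": 0.6, "s": 0.4}, {"t": 0.95, "r": 0.05}]` the first four guesses come out as "cat", "cst", "vat", "vst". A real cracker would also fold in a language model, but even this plain ordering reaches the right answer quickly when the correct key is usually in the top 2 or 3.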
95% means that on average only 1 in 20 keystrokes will be wrong. Even if your password is very long (40-60 characters), that means only 2-3 errors. Since most people are not machines, their long password will be a combination of words, like the famous "horsestaplebatterycorrect" example from xkcd.
Even if you flip a few letters from something like the above a human attacker will easily be able to fix it manually.
"horswstaplevatterucorrect" for example is still intelligible.
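Fixing it "manually" can also be automated if you assume the model only swaps keys (no missed or doubled strokes) and you have a wordlist. A rough dynamic-programming sketch in Python; `repair_passphrase` and its substitution-only assumption are illustrative, not taken from the paper:

```python
def repair_passphrase(garbled, wordlist):
    """Split `garbled` into dictionary words, minimising total substituted letters.

    Assumes the classifier only swaps keys (no dropped or extra keystrokes), so each
    word lines up character-for-character with its slice of the input.
    Returns (errors, words), or None if no segmentation exists.
    """
    n = len(garbled)
    # best[i] = (fewest substitutions explaining garbled[:i], words used so far)
    best = [None] * (n + 1)
    best[0] = (0, [])
    for i in range(n):
        if best[i] is None:
            continue
        errors, words = best[i]
        for word in wordlist:
            j = i + len(word)
            if j > n:
                continue
            # Hamming distance between this slice and the candidate word.
            mismatches = sum(a != b for a, b in zip(garbled[i:j], word))
            candidate = (errors + mismatches, words + [word])
            if best[j] is None or candidate[0] < best[j][0]:
                best[j] = candidate
    return best[n]
```

On "horswstaplevatterucorrect" with a four-word list this recovers horse/staple/battery/correct with 3 substitutions. With a real 7,776-word diceware list there would be more competing segmentations, but the search is the same.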
On average, 2-3 errors. However, the real thing we want to look at is the chance of guessing right across ALL characters. For 1 character it's 95%, for 2 it's 90.25%, and it gets worse from there. The formula would be 0.95^c, where c is the number of characters in the password. So the chance of getting EVERY key correct in a 40-character password is < 13%, and < 5% for 60 characters.
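The arithmetic above, plus the binomial tail the next reply is getting at (how often you're within a few fixable errors), in Python. Treating every keystroke as an independent 95% coin flip is a simplifying assumption:

```python
from math import comb

def p_all_correct(length, accuracy=0.95):
    """Chance that every one of `length` keystrokes is classified correctly."""
    return accuracy ** length

def p_at_most_errors(length, max_errors, accuracy=0.95):
    """Chance of at most `max_errors` misclassified keys, assuming independent keystrokes."""
    miss = 1 - accuracy
    return sum(comb(length, k) * miss ** k * accuracy ** (length - k)
               for k in range(max_errors + 1))

for n in (1, 2, 40, 60):
    print(n, round(p_all_correct(n), 4), round(p_at_most_errors(n, 3), 4))
```

Under these assumptions a 40-character capture has roughly an 86% chance of containing 3 or fewer wrong keys, even though a perfect capture is under 13%. That's the point of the sentence-style argument: you rarely need a perfect capture, only one close enough to repair.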
Right. The comment above is saying even if you are incorrect in 2-5 keystrokes it’s not hard to guess the correct keystrokes if you’re using a sentence style password.
I don't use one but I know people who swear by them.
Also, this is an extremely obvious result. Typing is obviously a form of "penmanship"; it was well known in the 1800s that telegraph operators could identify each other by how they tapped out Morse code.
People have been able to do this based on keystroke latency, and even to identify people by habitual mouse patterns, for decades.
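For reference, the keystroke-latency approach boils down to something like this: build a per-user profile of average gaps between key pairs (digraphs) and match a new sample to the nearest profile. Toy Python sketch; the (key, timestamp) event format and the distance metric are my assumptions, and real systems use richer features such as key hold times:

```python
from collections import defaultdict
from statistics import mean

def digraph_profile(events):
    """Average time between consecutive key presses, keyed by (prev_key, key).

    events: list of (key, press_time_seconds) in typing order (hypothetical format).
    """
    gaps = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        gaps[(k1, k2)].append(t2 - t1)
    return {pair: mean(times) for pair, times in gaps.items()}

def profile_distance(a, b):
    """Mean absolute latency difference over the digraphs both profiles share."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")
    return mean(abs(a[p] - b[p]) for p in shared)

def identify(sample_events, enrolled):
    """Return the enrolled user whose profile is closest to the sample's."""
    target = digraph_profile(sample_events)
    return min(enrolled, key=lambda user: profile_distance(target, enrolled[user]))
```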
Audio recordings work as yet another reliable proxy? Shocked!!
I am amazed that people can do such obvious things and get published, have articles written on them... I need to get in on that, sounds easy
I can make a web demo. You turn on the microphone and type a couple of things into a box in the web browser.
Then you go to a different window and keep typing, and the model predicts what you are typing. As long as it's proper grammar, you can get to effectively 100% accuracy. It'll appear to be spooky magic.
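The calibrate-then-predict part could start as simple as a nearest-centroid classifier, which is much cruder than whatever model the paper trained. Sketch in Python, assuming something upstream has already cut the audio into one feature vector per keystroke and lined it up with the text typed into the box (the features themselves are not specified here):

```python
import math

def train_centroids(labelled):
    """Average the feature vectors recorded for each key.

    labelled: (key, feature_vector) pairs from the calibration text.
    """
    sums, counts = {}, {}
    for key, vec in labelled:
        if key not in sums:
            sums[key] = [0.0] * len(vec)
            counts[key] = 0
        sums[key] = [s + v for s, v in zip(sums[key], vec)]
        counts[key] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}

def predict(centroids, vec):
    """Label a new keystroke with the key whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda k: math.dist(centroids[k], vec))
```

The "proper grammar" boost would come from rescoring these per-key guesses with a language model, which isn't shown.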
Sounds like a good exercise, although it'll literally just be for my own personal amusement. Nobody actually cares about this unless you've got some institutional clout, which I do not. Praise for the PhD would be ridicule for you and me.
But really, it should be fun... the laptop dock mic will be great for this. If it's external you're in trouble... but the researchers just used the onboard mic, so it'll be fine.
1Password allows unlocking with a fingerprint (Touch ID) or Apple Watch, at least on a Mac. So you can unlock your password manager during a Zoom call, and nobody can snoop your master password.
(With 1Password, the master password is not enough to do a remote account takeover, you also need the second-factor key. And you can't snoop it, since it is only required during the first login, so a user will never type it after that.)
1Password requires an extra key upon the first login that you never have to type afterwards. So, have fun trying to log in to that password manager, even if you have the master password.
You can also use and require a hardware FIDO2 token as a second factor.
If you have 2FA and one part of it is easily figured out, then you have one factor authentication.
If you cared enough about the authentication in the first place to bother with 2FA, then I guess it seems like the reduction there is still something to be worried about, right?
Lots of “two factor authentication” schemes seem to involve just getting a text or something, so, not very secure at all. Of course, this is bad 2FA, but it is popular.
Now that I know this generation of acoustic attacks exists, I would like the option to set a second "master password", different from the main one, that instead of giving me direct access to my passwords only lets me use my fingerprint to get them. I wonder if that's already possible.
I think maybe you wouldn't even need to see the keystrokes. Given enough examples of just audio, I wonder if you could work out the keys using the statistical letter patterns in language.
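That's essentially treating the typing as a substitution cipher. A first-pass sketch in Python, assuming the audio has already been clustered into one integer ID per keystroke (clustering not shown): rank clusters by frequency and map them onto an approximate English frequency order, with the space bar first since it's usually the most common key in prose. Frequency rank alone gets a lot wrong; you'd refine the mapping against a language model afterwards.

```python
from collections import Counter

# Space bar first, then an approximate (commonly cited) English letter-frequency order.
KEY_ORDER = " etaoinshrdlcumwfgypbvkjxqz"

def frequency_decode(cluster_ids):
    """Map unlabeled sound clusters to keys by frequency rank.

    cluster_ids: one integer per keystroke from some clustering step (assumed).
    Clusters beyond the length of KEY_ORDER are rendered as '?'.
    """
    ranked = [cid for cid, _ in Counter(cluster_ids).most_common()]
    mapping = dict(zip(ranked, KEY_ORDER))
    return "".join(mapping.get(cid, "?") for cid in cluster_ids)
```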
I think this limited attack surface can work without having to generalize one model to multiple people or keyboards. One advantage of a Zoom attack is that you get “plaintext” shortly after hearing the “ciphertext” if you can get the target to type into the chat window. And when you hear typing in other contexts it’s likely to be something that matches a handful of grammars that an LLM can recognize already (written languages, programming languages, commands, calculation inputs) - and when it doesn’t, that’s probably a password.
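The "when it doesn't match a grammar, it's probably a password" filter could start out much dumber than an LLM. A hypothetical heuristic in Python, using a plain word set in place of a language model and an arbitrary 50% cut-off:

```python
def looks_like_password(decoded, vocabulary, threshold=0.5):
    """Flag a decoded burst of typing when few of its tokens are known words.

    `vocabulary` is a plain set of lowercase words standing in for a language model;
    `threshold` is an arbitrary cut-off.
    """
    tokens = decoded.lower().split()
    if not tokens:
        return False
    known = sum(token.strip(".,!?") in vocabulary for token in tokens)
    return known / len(tokens) < threshold
```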
Do keystrokes still come through Zoom? The noise filtering has become extremely aggressive lately; I often hear people say “Sorry about that engine / ambulance / city noise” but nobody knows what they’re talking about.
How come keyboard sound suppression is not a standard option in all online communication apps? It’s not that hard, keyboard sounds are pretty distinct.
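A naive version really isn't much code: keystrokes are short transients, so flag frames whose energy jumps well above a slow-moving background estimate and mute them. Sketch in Python; the frame size, ratio and smoothing constants are guesses, and real suppressors are far more sophisticated:

```python
import math

def click_frames(samples, frame_size=128, ratio=8.0, smoothing=0.95):
    """Return indices of frames whose energy jumps far above the recent background."""
    flagged = []
    background = None
    for idx, start in enumerate(range(0, len(samples) - frame_size + 1, frame_size)):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        if background is None:
            background = energy
        elif energy > ratio * background + 1e-12:
            flagged.append(idx)
            continue  # don't let the click itself raise the background estimate
        # Exponential moving average of "normal" frame energy.
        background = smoothing * background + (1 - smoothing) * energy
    return flagged

def suppress_clicks(samples, frame_size=128, **kwargs):
    """Mute flagged frames (a real app would crossfade or fill the gap instead)."""
    out = list(samples)
    for idx in click_frames(samples, frame_size, **kwargs):
        start = idx * frame_size
        out[start:start + frame_size] = [0.0] * frame_size
    return out
```

Muting whole frames also eats any speech underneath, which is one reason apps don't just ship something this crude.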
Yeah, and in fact I've heard of this attack being done in the past, but it heavily depends on the typist, the keyboard, etc. Cadence, sound, etc. change with the typist and hardware. This isn't new, and has very few, if any, practical applications for widespread replication.
Rather than asking “what signal is it detecting?”, it might be better to ask “which signal carries the most information?”, which would help in averting attacks.
This kind of stuff could be really menacing in all sorts of public places, like airports, coffee shops, etc.
High security safe locks have had protection against this for a long time: you press up/down arrows to move from a random starting digit to the correct digit.
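The idea in Python, roughly: because each digit starts from a random value, the number of arrow presses you hear says nothing about the digit itself. This illustrates the concept only; it isn't how any specific lock's firmware works, and `enter_code` is a hypothetical helper.

```python
import random

def arrow_presses(start, target, digits=10):
    """Shortest run of up/down presses that moves the display from start to target."""
    up = (target - start) % digits
    down = (start - target) % digits
    return ("up", up) if up <= down else ("down", down)

def enter_code(code, rng=random):
    """For each digit, draw a random starting value and plan the presses needed."""
    plan = []
    for digit in code:
        start = rng.randrange(10)
        plan.append((start, arrow_presses(start, digit)))
    return plan
```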
On-screen PIN entry with jumbled number mappings does the same thing. It also makes the inter-stroke delay rather independent of position, because the brain has to search the screen (although repeated digits and previously occurring digits are quicker, which is why some jumble at every keystroke).
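Same idea for a scrambled on-screen PIN pad, sketched in Python with an optional fresh layout before every digit (`reshuffle_every_key` is a hypothetical flag): the slot you touch only means something together with that keystroke's layout.

```python
import random

def shuffled_layout(rng):
    """Random assignment of digits 0-9 to the ten button slots; layout[slot] = digit."""
    digits = list(range(10))
    rng.shuffle(digits)
    return digits

def enter_pin(pin, rng=None, reshuffle_every_key=True):
    """Simulate entry: return (slot touched, layout shown) for each PIN digit."""
    if rng is None:
        rng = random.Random()
    layout = shuffled_layout(rng)
    presses = []
    for i, digit in enumerate(pin):
        # Re-jumbling per keystroke hides repeated digits as well as positions.
        if reshuffle_every_key and i > 0:
            layout = shuffled_layout(rng)
        presses.append((layout.index(digit), list(layout)))
    return presses
```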
Keyboards with OLED keys (like the Apple Touchbar or the Optimus[1]) might also work.