In the age of digitalization, cases of cybercrime are no longer uncommon. The cybersecurity field is constantly plagued by various threats, including phishing attacks, malware infections, and password theft, to name a few.
However, with rapid advances in artificial intelligence, a new, worrying threat has emerged.
Artificial intelligence is now able to recognize typing noises and decipher the typed text.
This form of attack, known as an acoustic side-channel attack, represents a new but significant threat to the digital realm. Experts express concern that video conferencing platforms such as Zoom, combined with the routine use of built-in microphones, dramatically increase these risks.
Prominent researchers such as Joshua Harrison, Ehsan Toreini and Maryam Mehrnezhad emphasize that the threat of acoustic side-channel attacks is greater than ever before, driven by recent advances in deep learning and the increase in online activity via personal devices.
How does acoustic snooping work?
The mechanism underlying the acoustic attack involves the complicated mapping of typed letters to the acoustic emissions produced by keystrokes.
Every time a key is pressed, it emits a distinctive acoustic signature, a complex interplay of factors such as the activation of the key switch, the depression of the keycap itself, the duration of the resulting sound, and even the intervals between successive keystrokes.
All of this means that each key on a keyboard produces a unique sound profile when struck. The differences between these profiles are imperceptible to the human ear, yet they contain valuable information about which keys were pressed.
Remarkably, advanced artificial intelligence systems are able to decipher this symphony of typing sounds.
By analyzing audio recordings that capture these seemingly mundane acoustic signals, an AI system can reconstruct the exact letters, symbols and numbers that were typed.
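The pipeline described above — per-key acoustic signatures, learned profiles, and matching of new recordings against them — can be illustrated with a minimal, self-contained sketch. This is not the researchers' actual system (real attacks use deep neural networks trained on much richer spectrogram features); the function names and the crude spectral-band features here are illustrative assumptions only.

```python
import numpy as np

def keystroke_features(audio, n_bins=32):
    """Reduce a short keystroke recording to a coarse spectral profile.

    A real attack would use far richer features (e.g. mel spectrograms fed
    to a deep network); binned FFT magnitudes keep the core idea visible.
    """
    spectrum = np.abs(np.fft.rfft(audio))
    bands = np.array_split(spectrum, n_bins)
    feats = np.array([band.mean() for band in bands])
    return feats / (np.linalg.norm(feats) + 1e-12)  # scale-invariant

def train_profiles(labelled_clips):
    """Average the feature vectors of labelled keystroke clips per key."""
    return {key: np.mean([keystroke_features(c) for c in clips], axis=0)
            for key, clips in labelled_clips.items()}

def classify(clip, profiles):
    """Assign an unlabelled keystroke to the nearest key profile."""
    f = keystroke_features(clip)
    return min(profiles, key=lambda k: np.linalg.norm(profiles[k] - f))
```

With synthetic "keystrokes" (damped tones at different dominant frequencies standing in for real key sounds), the nearest-profile classifier recovers which key produced a new clip — the same principle the deep-learning attack applies at far higher fidelity.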
A worrying aspect of this technology is that any device equipped with a microphone can fall victim to this form of covert surveillance – be it a laptop, smartphone or other device with audio input capabilities.
The AI system, studied by researchers from universities in London, Durham and Surrey, demonstrated impressive performance. When tested on a MacBook Pro keyboard, it identified the exact keys that were pressed solely by analyzing the corresponding audio recordings, achieving an accuracy of 93-95%.
Investigating the threats posed by acoustic attacks
The emergence of acoustic attacks, enabled by artificial intelligence’s ability to decipher keystrokes from typing sounds, has become a concerning attack vector.
The widespread use of video conferencing apps such as Zoom, MS Teams and Skype during the surge in remote working has increased the risk of audio eavesdropping. In the midst of the pandemic, Zoom’s user base jumped from 10 million in 2019 to over 300 million in 2020.
Modern device microphones have the potential to effortlessly capture typing sounds and relay them to AI monitoring tools. This significantly increases the threat of acoustic attacks and allows the theft of sensitive data such as passwords, financial data, credit card information and confidential messages.
Often, users unknowingly share sensitive information while having audio or video calls, without realizing that an ever-present microphone is transmitting distinct keystroke sounds.
Attackers can exploit this by combing through these audio streams with speech-separation algorithms, isolating voices from the typing sounds. Ironically, noise reduction technologies that improve call quality by eliminating background noise also make the acoustic key signals cleaner for an eavesdropper. The range of risks associated with acoustic espionage is expanding, affecting both individuals and entire corporate networks.
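The first step in extracting typing sounds from a call recording is locating the short, sharp keystroke transients in the audio stream. The following sketch shows one simple way this could be done, via short-term energy thresholding; it is an illustrative assumption, not a description of any real eavesdropping tool, and real attacks would follow segmentation with filtering and classification.

```python
import numpy as np

def find_keystroke_onsets(audio, sr=16000, frame_ms=10, thresh=4.0):
    """Locate transient bursts (candidate keystrokes) in an audio stream.

    Frames whose short-term energy exceeds `thresh` times the median
    frame energy are flagged; runs of consecutive flagged frames are
    merged into a single onset time (in seconds).
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    flagged = energy > thresh * np.median(energy)
    onsets, prev = [], False
    for i, hit in enumerate(flagged):
        if hit and not prev:
            onsets.append(i * frame_len / sr)
        prev = hit
    return onsets
```

Because keystrokes are loud relative to room tone, even this crude detector reliably picks out click-like bursts from a quiet background, which is precisely why aggressive noise suppression in conferencing software works in the attacker's favor.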
Some areas where these attacks can be used:
- Sensitive financial and confidential information is vulnerable to compromise, including but not limited to social security numbers, credit card details, bank account information and passwords. Such breaches can contribute to identity theft and various forms of financial fraud.
- Cybercriminals are able to acoustically intercept the master keys of password managers and thus decrypt entire identity databases within an organization.
- Corporate espionage attacks secretly intercept business plans, confidential company information and financial details.
- Advertising technology companies have the ability to secretly collect extensive personal and behavioral data by secretly monitoring users’ typing patterns.
- All personal information is at high risk, including intimate details that could be exposed or traded. This poses a threat to professionals such as politicians, journalists and lawyers and can cause embarrassment and reputational damage.
How can you protect yourself?
User vigilance and caution are the most important factors in protecting one’s privacy.
When using video chat applications, it is essential to manage microphone access carefully, employ anomaly detection and encryption, and consider physical microphone blockers to reduce opportunities for acoustic eavesdropping.
Below are some of the safety measures that should be followed.
- Use highly random and strong passwords.
- Enter passwords silently.
- Enable two-factor or multi-factor authentication.
- Use audio masking devices.
- Do not enter sensitive information during a video call.
- Prefer keyboards that are equipped with acoustic shielding.
- Use touch-sensitive keyboards.
- Add ambient noise while typing.
Do we really need to worry?
While the risk is ever-present, our concern may not be as great as it initially sounds.
First, the effectiveness of such methods depends heavily on collecting an appropriate and unique set of sample data – a criterion that varies with different systems such as smartphones, tablets, MacBooks or external keyboards.
Another influential aspect is the individual typing style – from strong keystrokes to gentle touches. These differences in sound dynamics can pose a challenge to the precision of sound capture devices. As a result, creating an accurate data set that reflects your unique typing patterns and keyboard is a complex undertaking.
Additionally, interpreting the sequence of sounds and distinguishing between items such as passwords and email addresses presents a major challenge. Some AI models may also have difficulty recognizing modifier keys such as “Shift” or “Ctrl”.
The bottom line
The emergence of acoustic side-channel attacks represents a potentially transformative shift in the cyber threat space. These acoustic eavesdropping techniques challenge the fundamentals of privacy and confidentiality in data entry.
While users feel safe when visual access is limited, interpreting keystroke sounds can compromise supposedly safe environments.
Machine learning algorithms trained on audio snippets of typing show amazing abilities at extracting textual information.
With various research showing over 90% accuracy in isolating keystrokes from audio alone, implementing innovative and sophisticated defense strategies is necessary to address this threat.