Human’s Perceptions of Music
December 3, 2018
There seem to be a few common misconceptions about human hearing, and particularly about the term “psychoacoustics”, that could use some clarification.
It’s not unusual for people to think of the ear as the main component in human hearing with “psychoacoustics” focusing on listening biases and/or the idea that one can remove parts of a musical signal with no ill effect. Both of these ideas or assumptions oversimplify the process of perception, which requires at least two organs in the body, the ear (or ears) and the brain, not really the “mind” as Sigmund Freud might have imagined it.
The Brain as an Organ (Science, not Conjecture)
The ear is really just an organ, analogous to a microphone that converts sound pressure waves into electrical signals. These electrical signals, or neural impulses, travel into at least three different physical parts of the brain, an organ as well, in progressive stages, each of which processes the signal(s) in more complex ways based on evolutionary biology:
At the beginning of the human hearing system, the brain responds primarily to the loudness of a sound as an evolutionary warning system for potential danger. That’s why alarm clocks work. In fact, hearing is the only sense that remains fully active while we sleep, acting as a safeguard so that we can detect and react to an approaching menace while our other senses more or less shut down.
After passing through the part of the brain most sensitive to loudness, the electrical signals travel to the part of the brain that separates sound into component parts, including relative loudness, frequency, and matters related to time or phase. It does this primarily to recognize the source of the sound, including its distance and location, and to distinguish between noise and speech. This part of the brain can process an incomplete signal and still recognize and respond to speech, so this is really where the part of psychoacoustics that deals with irrelevant (more commonly described as “less important”) parts of the signal applies.
In the third or final stage, the electrical signals travel into the part of the brain in which individual neurons respond to or “get excited by” specific frequencies and their relationship to each other in the amplitude, frequency, and time domains to allow for the perception of music, whether it’s a bird whistling or an orchestra playing. In this context, we can and do recognize missing or errant parts of the analog waveform in all three domains.
The Limits of Human Hearing
Human hearing is not strictly limited to sound from 20-20,000 cycles per second or Hertz (abbreviated Hz), named after the German scientist Heinrich Hertz. In fact, human hearing has enormously wide dynamic range measured in decibels (abbreviated dB) and named indirectly in honor of the Scottish-born scientist Alexander Graham Bell.
Because human hearing has such great dynamic range, decibels are defined on a logarithmic scale. In the general case, each increase of 10 decibels represents a tenfold increase in sound power (intensity), so 20 decibels indicates an increase by a factor of 100, 30 decibels indicates an increase by a factor of 1000, and so on. Sound pressure follows a different rule: a tenfold increase in pressure corresponds to 20 decibels. Perceived loudness grows more slowly still; as a common rule of thumb, it roughly doubles with each 10 decibels.
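The decibel arithmetic can be sketched in a few lines. This is a minimal illustration using the standard definitions (a power ratio corresponds to 10·log10 of the ratio, a pressure ratio to 20·log10); the function names are made up for this example.

```python
import math

def db_to_power_ratio(db):
    """Ratio of sound power (intensity) for a level difference in dB."""
    return 10 ** (db / 10)

def db_to_pressure_ratio(db):
    """Ratio of sound pressure (amplitude) for a level difference in dB."""
    return 10 ** (db / 20)

def power_ratio_to_db(ratio):
    """Level difference in dB for a given ratio of sound power."""
    return 10 * math.log10(ratio)

# Print the power and pressure ratios for a few level differences.
for level in (10, 20, 30):
    print(level, "dB:",
          "power x", db_to_power_ratio(level),
          "pressure x", round(db_to_pressure_ratio(level), 2))
```

Note that the same 20 dB step is a factor of 100 in power but only a factor of 10 in pressure, which is why it matters which quantity is being described.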
Although most people, based on statistical averages, hear best from 20-20,000 Hz, we can hear sound below 20Hz and significantly above 20kHz, just at a reduced volume as measured in decibels. The exact amplitude below which one simply cannot hear a sound varies, but the sound of an ambulance siren in the distance at night, which you can often hear clearly, will measure anywhere from 30-60dB below “normal”. So you can’t really say that we can’t “hear” a 40kHz sound just because we perceive it at, say, 20-40dB below average compared to the 20Hz-20kHz range.
All sound resonates upwards from its fundamental frequency at progressively lower amplitudes known as overtones, and the recognition of these overtones plays a very important part in human hearing particularly with respect to the perception of music. We often recognize bass not from the reproduction of the fundamental frequency itself, but from the perception of overtones, however much reduced in amplitude, sometimes reaching far above 20kHz (that’s why super tweeters are good, not because they make sound brighter in the sense of creating exaggerated treble).
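As a rough illustration of how overtones stack up above a fundamental, the sketch below lists the harmonics of a tone as integer multiples of its fundamental frequency. This is an idealization (real instruments deviate from exact integer multiples), and the function name and example values are made up for this example.

```python
def harmonics(fundamental_hz, upper_limit_hz):
    """Return the harmonic frequencies (integer multiples of the
    fundamental) that fall at or below upper_limit_hz."""
    result = []
    n = 1
    while n * fundamental_hz <= upper_limit_hz:
        result.append(n * fundamental_hz)
        n += 1
    return result

# Hypothetical example: a 55 Hz bass note, listing the first few overtones.
bass_note = harmonics(55, 20000)
print(bass_note[:6])
print("harmonics at or below 20 kHz:", len(bass_note))
print("harmonics at or below 15 kHz:", len(harmonics(55, 15000)))
```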
If you use computer software (like the iZotope RX series) that can analyze a track of music and present it as a single visual image, with time going from left to right, frequency going from bottom to top, and amplitude represented by different colors or changes in brightness, you will see that every signal in an MP3 file that falls below a certain amplitude gets abruptly chopped off at 15kHz. Important overtones that contribute to the perception of bass get truncated, and that is why MP3s sound bright, not because any low bass itself has been removed.
This valid psychoacoustic phenomenon carries more weight in the reproduction of music than the observation that removing frequency information saves bandwidth while maintaining the ability to recognize speech alone.
Spectral visualizers (like the iZotope RX series) more often than not use FFTs (fast Fourier transforms, which convert a signal from the time domain to the frequency domain) and typically display only the magnitude of each component frequency, so the image does not show the relative shift in time or phase of one or more component frequencies in the overall waveform. The overall waveform consists of many component frequencies, just as white light contains red, orange, yellow, green, blue, indigo, and violet light combined, all at different frequencies in the electromagnetic spectrum.
Because the electrical signals representing sound travel through the brain in a specific, temporal order, and because different neurons respond to different frequencies of sound in the final stage where we recognize music, any relative delay in the arrival of different frequencies artificially caused by low-pass filters in DSP will change the perception of sound, particularly music, and create an anomaly that our brain interprets as “not quite right”.
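To illustrate why a magnitude-only display hides timing differences, here is a minimal sketch using a naive discrete Fourier transform written out by hand (not the code of any particular visualizer). The Fourier transform itself returns complex values that carry both magnitude and phase; it is the usual magnitude plot that discards the phase. The two-tone test signal and its bin numbers (2 and 8) are arbitrary choices for this example.

```python
import cmath
import math

N = 64  # number of samples in the toy signal

def dft(samples):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]

def two_tone(phase_shift):
    """A low tone (bin 2) plus a quieter high tone (bin 8) whose
    phase is shifted by phase_shift radians."""
    return [math.sin(2 * math.pi * 2 * t / N)
            + 0.5 * math.sin(2 * math.pi * 8 * t / N + phase_shift)
            for t in range(N)]

a = dft(two_tone(0.0))
b = dft(two_tone(math.pi / 2))  # high tone shifted by a quarter cycle

# The magnitude spectra match bin for bin...
same_magnitude = all(math.isclose(abs(x), abs(y), abs_tol=1e-9)
                     for x, y in zip(a, b))
# ...but the phase of the high tone's bin differs.
phase_difference = cmath.phase(b[8]) - cmath.phase(a[8])

print("same magnitude spectrum:", same_magnitude)
print("phase difference at bin 8 (radians):", round(phase_difference, 4))
```

Two signals that would look identical in a magnitude plot can therefore still differ in how their component frequencies line up in time.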
Taken to an extreme, these relative shifts in the time alignment of different frequencies might lead us to not recognize something as music at all. This is not the same thing as harmony versus dissonance, which is more a matter of the mathematical relationship between different individual notes in a chord or mode, all still time-aligned to one another.
A Practical Example
Put another way, suppose you have a four-channel source of sound with signals laid out as follows:
Track One: Left High Frequencies
Track Two: Right High Frequencies
Track Three: Left Low Frequencies
Track Four: Right Low Frequencies
Now, suppose that you have two discs, A and B, that only have two channels each:
Disc A
Track One: Left High Frequencies
Track Two: Right High Frequencies
Disc B
Track One: Left Low Frequencies
Track Two: Right Low Frequencies
You can still play full-range stereo sound if both discs remain perfectly synchronized. However, even if both discs run at precisely the same speed, if Disc A runs even a fraction of a second ahead of or behind Disc B, you will hear the difference even though you might not consciously recognize it. That third part of your brain will perceive the lag and, in all likelihood, something that you can’t quite put your finger on will sound “wrong”.
That’s the same thing as, e.g., having a low-pass filter delay the propagation of some frequencies relative to the others, what I call a “time smear.”
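The two-disc thought experiment can be sketched numerically. In this toy example (all values are arbitrary assumptions, and the circular delay stands in for a disc that starts slightly late), the high band is delayed by two samples relative to the low band. Each band keeps exactly the same level, yet the combined waveform changes shape.

```python
import math

N = 48  # samples in one period of the low tone

low = [math.sin(2 * math.pi * 1 * t / N) for t in range(N)]
high = [0.5 * math.sin(2 * math.pi * 6 * t / N) for t in range(N)]

def delay(signal, samples):
    """Circularly delay a periodic signal by a whole number of samples."""
    samples %= len(signal)
    if samples == 0:
        return list(signal)
    return signal[-samples:] + signal[:-samples]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# "Disc A" (high band) in sync with, then lagging behind, "Disc B" (low band).
aligned = [l + h for l, h in zip(low, high)]
smeared = [l + h for l, h in zip(low, delay(high, 2))]

# The level of the high band is unchanged by the delay...
print("high band RMS:", round(rms(high), 4), round(rms(delay(high, 2)), 4))
# ...but the summed waveform is no longer the same, sample for sample.
max_difference = max(abs(x - y) for x, y in zip(aligned, smeared))
print("largest sample difference:", round(max_difference, 4))
```

Whether and how such a difference is audible is the perceptual question the text argues about; the sketch only shows that the waveform itself changes while the per-band levels do not.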
To summarize, sound is not an air-pressure wave that travels into the ear then into a “black box” with all sorts of unknown, pseudo-cognitive biases and associations that often get mislabeled as “psychoacoustics”. That “black box”, the human brain, has specific physical areas that respond to sound in different ways as the electrical or neural signals pass through each stage in order, each stage processing the signal in more sophisticated ways, based on evolutionary biology, until we arrive at the recognition of music, a form of authentication.
“[…] the truth will set you free”–John 8:32, from the Christian Bible (and probably Michael Fremer).
More to follow…
“…everything under the sun is in tune but the sun is eclipsed by the moon”–Pink Floyd, from “Eclipse”