1. Two ears - two loudspeakers?

 

Many stereo enthusiasts devoutly believe that if every component in the transmission chain is perfect, the audio reproduction will be perfect too. If two solid-state microphones capture the sound field at the listener's position in the recording room, the loudspeakers should deliver a perfect reproduction. After all, we have only two ears. However, that view leaves important physical relationships out of consideration. All we can hope for in traditional audio is tonal accuracy; true spatial reproduction will never happen. The spatial information is already lost during the recording process.



1.1 The spatiality of recording

In the recording room, the direct wave front and its reflections reach us from all possible directions. In the horizontal (azimuth) plane, interaural time differences (ITD) occur between the listener's ears, depending on the direction from which the wave front or reflection arrives. That is an important cue, yet only one part of the spatial information. ITDs are evaluated very accurately between approximately 100 Hz and 3.6 kHz. Below that range the phase differences are too small; above it the result becomes ambiguous, because more than one wavelength fits between our ears. Further ambiguity arises between front and rear: the same ITDs occur whether the source is in front of or behind the listener. Moreover, ITDs are entirely independent of the elevation of the source, so run-time detection does not work in the vertical plane. For this reason we need additional cues. We use the interaural level differences (ILD), which arise from wave diffraction at the head and shoulders in the upper frequency range, as well as from resonance effects in the outer ear. The individual shape of the pinna causes resonances that change with the elevation angle. Our individual listening experience, shaped by the form of our head and pinna, allows excellent determination of the source position in all three spatial dimensions, as long as the Head Related Transfer Functions deliver signals that correspond to the ITD cues.
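The front/rear ambiguity of the ITD can be illustrated with a small calculation. The following is only a sketch, assuming the common rigid-sphere (Woodworth) approximation for a distant source; the head radius, the speed of sound and the function name are assumptions made for this example and do not come from the text above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_woodworth(azimuth_deg):
    """Approximate ITD in seconds for a distant source in the horizontal plane.

    0 degrees is straight ahead, 90 degrees is fully to one side,
    180 degrees is directly behind the listener.
    """
    theta = math.radians(azimuth_deg)
    # Fold rear angles onto the front hemisphere: on a symmetric sphere
    # a source at 150 degrees produces the same path difference as one at 30.
    lateral = math.asin(math.sin(theta))
    return HEAD_RADIUS / SPEED_OF_SOUND * (lateral + math.sin(lateral))


front = itd_woodworth(30.0)   # source in front, 30 degrees to the side
rear = itd_woodworth(150.0)   # mirrored source behind the listener
print(f"front: {front * 1000:.2f} ms, rear: {rear * 1000:.2f} ms")
```

Under these assumptions both positions yield about 0.26 ms, so the time cue alone cannot tell them apart. The model also has no elevation parameter at all, which mirrors the point that run-time detection carries no height information.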

Our recording technology does not work nearly as well. Two microphones at the listener's position receive the correct time differences, but the correct amplitude response is irrecoverably lost during recording. Omnidirectional microphones work largely independently of the source direction; cardioids lose amplitude in the upper frequency range off axis. In either case their directivity cannot develop the notches and peaks in the frequency response that the human auditory system uses, depending on the angle of incidence, for accurate spatial detection of the sound source. Later, during playback, we spend a lot of money on the last 2 dB of linearity in the frequency response, but during recording we tolerate amplitude differences of 20 dB and more compared with the signals at the human eardrum.

Without correct Interaural Level Differences (ILD), however, the recording is unavoidably reduced to the horizontal plane. Moreover, important information about the source direction is lost, because the strongly selective effect of the Head Related Transfer Function is never included in the transmission chain. That matters less for the position of the source itself: in the frontal range ILD values are small, at least in the horizontal plane. Yet the listener is also hit by the first sound reflections from above, from behind, and from all other possible directions. Without the correct notches and peaks in the frequency response, the reproduction generates wrong cues. Its amplitude cues sometimes directly contradict the time cues. Listening fatigue is only one of the results, because the spatial distribution of the first strong reflections is one of the most important factors in audio perception. Subjectively perceived loudness, speech intelligibility and the estimation of source distance are strongly disturbed by wrong cues. The attempt to produce a spatial impression with late reverberation remains unconvincing. Such late reflections provide important cues about the fine structure and reflective behaviour of the surfaces in the recording room, yet they reach the listener from all directions. Thus we are hardly able to assign any definite direction to the reverberation tail.

Studio productions are normally not recorded with microphone pairs; each signal is given its position in space according to the producer's intention during the mixdown. Yet the problems are the same. All source signals are distributed between the speakers by pan pots. The spatial impression is supposed to be generated by adding artificial reverb. Only sometimes does such a process produce proper early reflections that lead to satisfying spatial perception. Even then, those reflections cannot be assigned to the nearly infinite number of source positions that arise in the recording room, but only to a few loudspeaker positions.

 

1.2 The perception of the phantom acoustic source

All conventional audio procedures, including the surround formats, produce phantom acoustic sources, which are perceived between individual loudspeakers. Those sound sources are therefore not real, but are built up in the brain by the psychoacoustic combination of both ear signals. Unfortunately, such sound sources do not behave like real sound sources. We cannot bring our ears close to them, and unlike a real sound source the phantom source moves with the listener's position. In fact, we are not hearing one sound source, but two.

To be perceived in the same direction, the two loudspeakers of a phantom source must produce much larger differences between their signals than a real sound source produces between the ears. For example: at a 30 degree azimuth angle, a genuine acoustic source causes an interaural time difference of 0.3 milliseconds and an interaural level difference of, for instance, 5 dB. Radiated from two loudspeakers, such values generate only approximately 10 degrees of azimuth for the perceived phantom acoustic source. For 30 degrees we require a time difference of 1.5 milliseconds and a level difference of 18 dB.

The reason for that loss of spatial impression is crosstalk between the ears. The signal of the left speaker does not reach the left ear alone. With a detour around the head, the wave fronts also reach the right ear, and vice versa. That raises the Interaural Cross Correlation Coefficient (IACC), one of the most important values for our spatial impression. We sense a sound event as spatial if the difference between the signals at our ears is as large as possible. If both signals are utterly different, the IACC is zero: there is no correlation between the signals. If both signals are the same, for example with mono headphone playback, the IACC is one. A sound source in a free field reaches the greatest possible difference, an IACC of about 0.3, when the wave fronts arrive from an azimuth angle of circa 55 degrees. In acoustically famous venues, many of the first reflections come from that range. If the direct sound source is located on the other side, such reflections create what is known as “acoustic attraction”, which produces goosebumps the moment the horns enter in the Brahms concert.
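How crosstalk raises the IACC can be sketched numerically. The example below assumes the usual definition of the IACC as the maximum of the normalised cross-correlation of the two ear signals within a lag of plus or minus 1 ms. The crosstalk model is a deliberately crude toy: it adds an attenuated copy of the opposite loudspeaker signal without delay or head filtering, and the gain value is hypothetical.

```python
import math
import random


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation within +/- max_lag_ms.

    Both signals must have the same length.
    """
    max_lag = int(sample_rate * max_lag_ms / 1000.0)
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        # Only sum over indices where both left[i] and right[i + lag] exist.
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        best = max(best, abs(acc) / norm)
    return best


def with_crosstalk(left_speaker, right_speaker, gain):
    """Toy ear signals: each ear also receives the other speaker, scaled by gain."""
    pairs = list(zip(left_speaker, right_speaker))
    ear_left = [l + gain * r for l, r in pairs]
    ear_right = [r + gain * l for l, r in pairs]
    return ear_left, ear_right


rng = random.Random(1)
speaker_l = [rng.gauss(0.0, 1.0) for _ in range(2000)]
speaker_r = [rng.gauss(0.0, 1.0) for _ in range(2000)]

print("no crosstalk:  ", round(iacc(speaker_l, speaker_r, 48000), 2))
ears = with_crosstalk(speaker_l, speaker_r, 0.5)  # hypothetical crosstalk gain
print("with crosstalk:", round(iacc(ears[0], ears[1], 48000), 2))
```

For two independent loudspeaker signals of equal power, the correlation at zero lag in this toy model tends toward 2g / (1 + g²) for a crosstalk gain g, so even completely uncorrelated channels arrive at the ears strongly correlated once each ear also hears the opposite speaker.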

Conventional loudspeaker reproduction cannot create such an experience, because the right box is only approx. 30 degrees off the median axis. The narrower angle causes less decorrelation between the ear signals, that is, a higher IACC, and consequently a less attractive sound. Furthermore, in the concert hall the central ceiling reflections hardly improve the spatial impression. Some or all of such central reflections are sometimes even counterproductive. Well-trained architects know that and try to direct such wave fronts sideways. Because of the crosstalk between the ears, our stereo loudspeakers cannot reach an IACC below approximately 0.6, so the concert hall experience remains out of reach. All the experiments that attempt to produce sound sources outside the loudspeaker base by phase inversion and other tricks are futile and amateurish. During playback, only the playback room itself can produce sound sources in that range, through its own reflections. Even so, such reflections are disturbing in most cases, because the sound detours are very different from those in the recording room.


Without the correct spatial distribution of the first reflections, important cues for estimating the distance of the direct sound source are missing. The origin of a phantom source is always on the line between the boxes, neither in front of nor behind it. That becomes clear if we look at different listener positions. For example, if the violin in the concert hall is placed exactly in front of the timpani, we hear both instruments from the same direction. If we now move to the right wall of the concert hall, the violin will be perceived clearly to the left of the timpani. With phantom source reproduction at home, however, both instruments remain at the same position, independent of the listener's position. Both sound sources share the same origin on the line between the speakers, not behind it and never in front of it.

I have to admit, though, that in the real world we cannot estimate the distance of a sound source directly either. In hearing, we use the signal differences between the two receptors to determine direction and not, as with the eyes, to estimate distance. For hearing, the most important indication of distance is loudness: loud sources are normally nearby. However, phantom acoustic sources cannot be brought closer to the listener than the omnidirectionally radiating loudspeaker boxes themselves. They cannot provide a better ratio of direct sound to diffuse field than the loudspeaker itself achieves in the playback room, so realistic proximity effects are not attainable. That restriction is widely underrated: nearby sources or reflection sources are supremely important for the spatial impression as well as for the emotional impact of perception.


1.3 Two rooms in one record

The acoustics of the recording room, the concert hall, are stored in the recording. Nevertheless, during reproduction the playback room superimposes its own reflections on the recording. In the main audio range, average loudspeakers radiate almost omnidirectionally. That causes strong playback room reflections. The level of those unwanted signals depends on the reverberation time of the playback room and on the directivity of the loudspeakers. In normal living rooms, the diffuse sound exceeds the direct wave in level at less than one meter from the loudspeakers. Most of what we perceive of the sound event is caused by the playback room!
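The distance at which the diffuse sound equals the direct sound is commonly called the critical distance. A rough estimate is possible with the textbook approximation d_c ≈ 0.057 · sqrt(Q · V / RT60), which follows from Sabine's reverberation formula. The following is a sketch only; the room volume, the reverberation time and the directivity factors are assumed example values and not measurements from this article.

```python
import math


def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Distance in meters at which direct and diffuse sound levels are equal.

    directivity is the loudspeaker's directivity factor Q
    (1.0 means omnidirectional radiation).
    """
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


# Assumed living room: about 4 m x 6 m x 2.5 m = 60 cubic meters, RT60 of 0.5 s.
omni = critical_distance(60.0, 0.5)           # omnidirectional source
directed = critical_distance(60.0, 0.5, 4.0)  # hypothetical Q of 4

print(f"omnidirectional: {omni:.2f} m, more directed: {directed:.2f} m")
```

With these assumed values the omnidirectional case gives roughly 0.6 m, which agrees with the statement that the diffuse field dominates at less than one meter. Quadrupling the directivity factor only doubles that distance.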


The problem here is less the reverberation caused by the playback room. We can tolerate additional reverberation as long as its amount remains below the reverberation level of the recording environment. Much more disturbing are the early reflections of the playback room. Because playback rooms are mostly much smaller than concert halls, the detours are incomparably shorter. The nearby surfaces cause strong reflections whose level is often hardly below that of the direct wave. Superposition of both signals causes comb filter effects with notches of up to 20 dB in the frequency response. Such comb filter effects arise in the recording room by the same principle. However, the frequencies of the notches and peaks depend on the time difference between the two signal components. Because our perception relies significantly on learnt stimulus patterns, we associate the resulting frequency response with the corresponding room impression. The notches and peaks caused by the playback room therefore act as misleading cues. We can reduce their impact by damping the playback room or by using loudspeakers with more directional radiation. But in that way the advantage of improved tonal accuracy is consumed by the loss of spatial impression. Sometimes, if the playback room is not too different from the recording room, its reflections create a very authentic sound field with a superb spatial impression. On the other hand, if the playback environment is completely damped, the sound is boring and less attractive, and the reduction to the horizontal plane becomes very disturbing.
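The comb filter caused by a single reflection can be worked out directly. The sketch below assumes the simplest possible model: the direct wave plus one reflection that arrives with a fixed delay and a frequency-independent gain. The delay of 1 ms (a detour of roughly 34 cm) and the reflection gain of 0.9 are assumed example values.

```python
import math


def comb_notches(delay_ms, count=3):
    """First notch frequencies in Hz: the reflection arrives in antiphase there."""
    tau = delay_ms / 1000.0
    return [(2 * k + 1) / (2.0 * tau) for k in range(count)]


def comb_magnitude_db(freq_hz, delay_ms, reflection_gain):
    """Level of direct wave plus one delayed reflection, relative to direct alone."""
    phase = 2.0 * math.pi * freq_hz * delay_ms / 1000.0
    real = 1.0 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(real, imag))


delay_ms = 1.0  # assumed detour of about 0.34 m at 343 m/s
gain = 0.9      # assumed reflection only slightly weaker than the direct wave

for f in comb_notches(delay_ms):
    print(f"notch at {f:.0f} Hz: {comb_magnitude_db(f, delay_ms, gain):.1f} dB")
print(f"peak at 1000 Hz: {comb_magnitude_db(1000.0, delay_ms, gain):.1f} dB")
```

In this model a reflection at 0.9 times the direct amplitude produces notches 20 dB deep, at 500 Hz, 1500 Hz and 2500 Hz for the assumed delay. A different delay shifts every notch and peak, which is why the pattern produced by the playback room does not match the one stored in the recording.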

In that connection, the experiments of Acoustic Research in the late 80s in Carnegie Hall are very interesting. Good speaker boxes were placed on the stage where the artists would normally be. Each speaker was fed with the dry recorded audio of its artist. For the audience, it was hardly possible to tell reproduction from the genuine event. Obviously, the spatial distribution of the sound reflections is the most important requirement for true spatial audio, at least as important as the last 2 dB of linearity in the frequency response. Improved loudspeaker directivity or damping of the playback room avoids misleading cues, but cannot create the correct reflections, and those are an essential condition for true spatial audio. The phantom acoustic sources of the conventional procedures are not able to restore correct spatial behavior. Simply increasing the number of separate transmission channels cannot be the solution to the fundamental problems described. Only new approaches like Ambisonics or Wave Field Synthesis would have the ability to overcome those problems. The virtual acoustic sources of Wave Field Synthesis provide, in principle, the possibility of physically restoring the genuine sound field. WFS holophony would produce a virtual copy of the genuine sound field in all three dimensions. It is not based on phantom acoustic sources but on virtual acoustic sources, which are described on the next page.

 


 

 

last update 2010-08-11