Acoustics: IBM Research Report: Reports and documents on the acoustical evidence in the assassination of John F. Kennedy

II. CROSS-CORRELATION OF POWER SPECTRA
Spectral feature matching is a subjective process. A more objective experiment is to compare the two spectra by calculating a measure of their similarity. The cross-correlation coefficient ρ between two functions X and Y is defined as

ρ = [ Σ X · Y ] / [ ( Σ X² ) · ( Σ Y² ) ]^½

where the summation is carried out over a suitably placed window on the spectra X(t,f) and Y(t,f). If the relative timing of the two sources is not known, one can compute ρ as a function of the time displacement between the two spectra by sliding one spectrum over the other.
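The definition of ρ and the sliding procedure can be illustrated with a minimal sketch. It assumes each spectrum is stored as a list of frames, each frame being a list of power values per frequency bin; the function names `corr_coeff` and `sliding_corr` and the toy data are hypothetical and are not taken from the report.

```python
import math

def corr_coeff(x, y):
    """rho = sum(X*Y) / sqrt(sum(X^2) * sum(Y^2)) for two equal-length sequences."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0

def sliding_corr(short, long):
    """Return rho for every frame offset tau of `short` inside `long`.

    Both arguments are lists of frames; each frame is a list of bin values.
    """
    n = len(short)
    flat_short = [v for frame in short for v in frame]
    rhos = []
    for tau in range(len(long) - n + 1):
        window = [v for frame in long[tau:tau + n] for v in frame]
        rhos.append(corr_coeff(flat_short, window))
    return rhos

# Toy data (illustrative only): the short segment is an exact copy of
# frames 3..4 of the long segment, so rho peaks at tau = 3.
long_seg = [[1, 0, 2], [0, 3, 1], [2, 2, 0], [5, 1, 0], [0, 4, 4], [1, 1, 1]]
short_seg = long_seg[3:5]
rhos = sliding_corr(short_seg, long_seg)
best_tau = max(range(len(rhos)), key=lambda t: rhos[t])
```

With real data, each offset step would correspond to one frame of time displacement.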
Because of phase distortion introduced by various communication/recording processes, we did not look for phase correlation between the two spectra. Therefore, the calculation of correlation was limited to spectral intensities (squared magnitudes of the Fourier transform). For the purpose of power spectral computation, the signal was sampled at 20 kHz; length-400 blocks (equivalent to 20 ms) were formed for Fourier transformation; before transformation the sequences were multiplied by a window and then transformed using length-400 FFT. This resulted in power spectra with 50 Hz resolution. The effect of the window is to broaden pure tones to about 2-3 frequency bins (100-150 Hz). In the time domain, these blocks were 50% overlapped, thus a new FFT was computed every 10 ms.
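The framing described above can be sketched as follows. The sampling rate, block length, and overlap are from the text; the Hann window is an assumption (the report does not name its window), and a direct DFT from the standard library stands in for the FFT so that the sketch has no dependencies. The names `power_spectrum` and `spectrogram` are hypothetical.

```python
import cmath
import math

FS = 20_000    # sampling rate in Hz (from the text)
N = 400        # block length: 400 samples = 20 ms
HOP = N // 2   # 50% overlap: a new block every 10 ms

# Hann window: an assumption, the report does not say which window was used.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def power_spectrum(block):
    """Squared DFT magnitude of one windowed block, for bins 0..N/2."""
    xw = [s * w for s, w in zip(block, WINDOW)]
    spec = []
    for k in range(N // 2 + 1):
        # Direct DFT used in place of an FFT; same result, slower.
        acc = sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spec.append(abs(acc) ** 2)
    return spec

def spectrogram(signal):
    """Power spectra of 50%-overlapped blocks: S[t][f]."""
    return [power_spectrum(signal[i:i + N])
            for i in range(0, len(signal) - N + 1, HOP)]

# Illustrative input: 40 ms of a 1 kHz tone gives three overlapped frames.
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(800)]
S = spectrogram(tone)
peak_bin = max(range(len(S[1])), key=lambda k: S[1][k])
peak_hz = peak_bin * FS / N
```

Each bin spans FS / N = 50 Hz, which matches the resolution stated in the text.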
Let S₁(t,f) and S₂(t,f) represent power spectra of the two channels, where t is a multiple of 10 ms and f is a multiple of 50 Hz. Although the FFT computes power spectra up to 10 kHz (the Nyquist frequency), there is very little energy in Channel-I spectra beyond 3 kHz and in Channel-II spectra beyond 4 kHz.
Assuming that the two spectra contain a common audio signal, they should correlate well if they are properly aligned and corrected for the difference in the recording speeds of the two devices. On the other hand, the two spectra would not correlate well if they are not properly aligned in time or if the speed difference is not properly corrected. It is sufficient to correct only one of the recordings for the speed difference; without loss of generality, we apply the correction to the Channel-II spectra and introduce another parameter W, the spectral warp factor, which is the ratio of the two recording speeds. The spectral warping essentially stretches one of the axes (time or frequency) by W and compresses the other axis by W (W may be less than or greater than 1). ρ(τ,W) is computed as a function of W and the time difference τ between the two spectra. For a particular value of W and τ, ρ(τ,W) will be maximum, indicating correct speed compensation and time alignment.
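One way to picture the warp is as a resampling of the spectrogram S[t][f]. The sketch below stretches the frequency axis by W and compresses the time axis by W using linear interpolation. The direction convention, the interpolation method, and the names `lerp_at` and `warp_spectrogram` are all assumptions made for illustration; the report does not describe how its warping was implemented.

```python
def lerp_at(seq, pos):
    """Linearly interpolate seq at fractional index pos; zero outside its range."""
    last = len(seq) - 1
    if pos < 0 or pos > last:
        return 0.0
    i = int(pos)
    if i == last:
        return float(seq[i])
    frac = pos - i
    return (1 - frac) * seq[i] + frac * seq[i + 1]

def warp_spectrogram(S, W):
    """Stretch frequency by W and compress time by W (assumed convention).

    S is a list of frames, each a list of power values per frequency bin.
    """
    n_t, n_f = len(S), len(S[0])
    # Frequency axis: output bin k reads the input at bin k / W.
    freq_warped = [[lerp_at(frame, k / W) for k in range(n_f)] for frame in S]
    # Time axis: output frame t reads the input at frame t * W.
    columns = list(zip(*freq_warped))
    n_out = int((n_t - 1) / W) + 1
    out = []
    for t in range(n_out):
        pos = min(t * W, n_t - 1)  # guard against floating-point overshoot
        out.append([lerp_at(col, pos) for col in columns])
    return out

# Illustrative check: a spectral line at bin 1 moves to bin 2 when W = 2,
# and three input frames shrink to two output frames.
S_in = [[0, 1, 0, 0, 0]] * 3
S_out = warp_spectrogram(S_in, 2.0)
```

A search for the best warp would apply such a function for each candidate W, slide the result in τ, and keep the W whose peak ρ is largest.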
When the Channel-II recording (on its Gray Audograph disc) was played back on the Audograph, the reproducing needle frequently slipped, resulting in "repeats." The reproductions of Channel II first used by the Committee had several repeats, as did those used by previous investigators. To avoid this problem, Committee members played the Gray Audograph disc on an audio turntable, which gave a repeat-free recording but resulted in another problem because the Gray Audograph was recorded at a constant linear speed. Since the audio turntable operated at a constant angular speed, the resulting variable warp had to be compensated during power spectra computation. This variable warp (speed linearly varying with time due to turntable playback) is combined with the fixed warp W (to compensate for the speed difference in the original recordings) in computing the Channel-II power spectra, which are on the same time and frequency scale as Channel I. From now on we shall not refer specifically to the variable warp introduced by the playback mechanism, as it is predetermined and fully compensated during processing, but we shall search for the fixed warp factor W which maximizes the correlation peak.
On examination of the two spectra, we noticed that Channel-I spectra fell off faster with frequency than Channel-II spectra, indicating that the frequency response of the transfer function from Channel II to Channel I dropped with frequency. To compensate for this drop in Channel-I spectral level, high frequencies of Channel-I spectra were boosted at a rate of 6 dB per 1000 Hz. We also noted large Channel-I energy in the low-frequency region, presumably due to the motorcycle engine noise. Since this is not expected to be correlated to Channel II, we did not consider power spectra below 600 Hz for the purpose of correlation. Similarly, since there is very little Channel-I energy beyond 3500 Hz, higher frequencies were neglected for the purpose of correlation. Another characteristic noted was that while Channel-II energy remained fairly constant from frame to frame (10 ms), there was a wide fluctuation in Channel-I energy (as it later turned out in our work, this was due to AGC action in the Channel-I receiver, discussed in a later section of this report). To compensate for this variation, we normalized Channel-I energy to remain constant from frame to frame. To summarize, for the purpose of computing correlations, only frequency bins in the range 600-3500 Hz were considered, and high frequencies of the Channel-I spectra were boosted at a rate of 6 dB per 1000 Hz and then normalized to a constant energy in the band of interest.
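The Channel-I conditioning summarized above can be sketched per frame as follows. The band limits, bin width, and boost rate are from the text; treating the band edges as inclusive, applying the boost as a dB gain on power, and normalizing each frame to unit summed power are assumptions, and `preprocess_frame` is a hypothetical name.

```python
BIN_HZ = 50                # frequency resolution of the power spectra
LO_HZ, HI_HZ = 600, 3500   # band retained for correlation
BOOST_DB_PER_KHZ = 6.0     # high-frequency boost applied to Channel I

def preprocess_frame(power):
    """Band-limit, boost, and energy-normalize one Channel-I power-spectrum frame.

    `power` holds one value per 50 Hz bin, starting at 0 Hz.
    """
    lo, hi = LO_HZ // BIN_HZ, HI_HZ // BIN_HZ   # bins 12..70, edges inclusive (assumed)
    boosted = []
    for k in range(lo, hi + 1):
        f_khz = k * BIN_HZ / 1000
        gain = 10 ** (BOOST_DB_PER_KHZ * f_khz / 10)   # dB gain applied to power
        boosted.append(power[k] * gain)
    total = sum(boosted)
    # Unit summed power per frame (assumed form of "constant energy").
    return [v / total for v in boosted] if total > 0 else boosted

# Illustrative input: a flat spectrum of 201 bins (0 Hz to 10 kHz).
frame = preprocess_frame([1.0] * 201)
ratio_per_khz = frame[20] / frame[0]   # bins 1000 Hz apart: 1600 Hz vs 600 Hz
```

Because each frame is normalized afterwards, the absolute reference level of the boost does not matter; only its slope does.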
Correlation as a Function of Time
The two phrases ("Hold-Everything" and "Stemmons") on Channel I were correlated against the corresponding segments on Channel II. To do this, the Channel-II power spectra were first compensated by the warp factor which gave the largest correlation peak, and then a 2.5-second segment of Channel-I spectra was slid against a corresponding 10-second segment of Channel-II spectra. This gave 750 correlation coefficients (different values of τ spanning a period of 7.5 seconds with a spacing of 10 ms), which are plotted in Figs. 1 and 2.
If the two spectra were identical, the peak value of ρ would be 1, which is the maximum possible value of the correlation coefficient. But because of the extraneous noise and sounds present in Channel I, this value will never be achieved. Since both the spectra are positive-valued functions, their correlation will always be positive, and this explains the background level present in these figures. Superposed on this background are a narrow large peak and several broad minor peaks. For both the phrases, the main peaks are similar and narrow (about 70-80 ms wide) and stand out clearly against the background. For the "Hold-Everything" phrase, the absolute value of the peak is somewhat low because during this phrase Channel I is very noisy. During the "Stemmons" phrase, Channel I is less noisy and this is reflected in a higher correlation peak. But for both the phrases, the background level is about the same and the shape and width of the central peak are similar. This clearly proves that the two channels have a common audio signal.
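The positive background can be illustrated with synthetic data. In the sketch below, two independent sequences of non-negative values still give a ρ well above zero, because ρ as defined here does not subtract the means. The uniform distribution is purely illustrative and is not a model of the recordings; for it, ρ should come out close to (1/4)/(1/3) = 0.75 when the sequences are long.

```python
import math
import random

def corr_coeff(x, y):
    """rho = sum(X*Y) / sqrt(sum(X^2) * sum(Y^2)), with no mean removal."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 10_000
x = [rng.random() for _ in range(n)]   # independent, non-negative
y = [rng.random() for _ in range(n)]   # independent, non-negative
rho_background = corr_coeff(x, y)
```

A genuine common signal shows up as a peak rising above this kind of floor, not as a departure from zero.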
Correlation as a Function of Warp
We also computed correlation for different values of warp (W). For each value of warp, ρ(τ,W) was computed and its peak compared with the peak for the optimum value of warp (W_opt). Table 1 summarizes these results.

                    ρ(W)/ρ(W_opt)
W/W_opt    "Hold-Everything"    "Stemmons"
0.96       0.669                0.452
0.97       0.738                0.616
0.98       0.789                0.758
0.99       0.913                0.922
1.00       1.000                1.000
1.01       0.922                0.896
1.02       0.851                0.777
1.03       0.772                0.615
1.04       0.648                0.462

Table 1. Cross-correlation Coefficient vs. Warp W.
From this table we note that the correlation peak decreases sharply as we move away from the optimum warp. This also strengthens our conclusion that these large correlation peaks were not obtained by chance. This clearly establishes that there is an imprint of Channel II on Channel I. But the spectral matching is not perfect; in particular for the crucial "Hold-Everything" phrase (the so-called shots are supposed to be present on Channel I during this phrase), there are several instances of poor match. Also, the wide variation in Channel-I signal level is not explained.
At this stage of the Committee investigation, BRSW put forward another hypothesis to explain the cross-talk. They suggested that the cross-talk may have been picked up accidentally during a rerecording of the two channels. Their hypothesis was that at some time after the assassination, the Channel-I recording was being rerecorded acoustically, while Channel II was being played across the room. Although it is not a very tenable suggestion and does not explain most of the detailed findings, the possibility that Channel II cross-talk was superimposed later (perhaps by design!) was investigated.

