We present an approach that estimates missing values in the time-frequency domain of audio signals.
By Jinyu Han, Gautham J. Mysore and Bryan Pardo
Work presented at LVA/ICA 2012
Tel-Aviv, Israel
March 12-15, 2012.
Non-negative spectrogram factorization refers to a class of methods including non-negative matrix factorization and probabilistic latent component analysis (PLCA), which are used to factorize spectrograms. In this discussion, we will use the specific case of PLCA. However, the ideas generalize to most such methods.
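As a rough illustration of what such a factorization computes, here is a minimal sketch of PLCA fitted with EM on a magnitude spectrogram. It is not the code used in this work. It assumes the asymmetric form of the model, where each frame t is modeled as P_t(f) = sum_z P(f|z) P_t(z). The function names `plca` and `reconstruct`, the parameter names, and the random test matrix are all hypothetical and chosen only for this example.

```python
import numpy as np


def plca(V, n_components, n_iter=100, seed=0):
    """Fit an asymmetric PLCA model to a non-negative matrix V (freq x time).

    Returns W, whose columns approximate P(f|z), and H, whose columns
    approximate P_t(z). This is an illustrative sketch, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization; every column is normalized to sum to 1.
    W = rng.random((n_freq, n_components))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_components, n_time))
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # Ratio of the observed data to the current model W @ H.
        R = V / (W @ H + eps)
        # E-step and M-step folded together: weight the data by the
        # posterior P(z|f,t), then sum over time (for W) or frequency (for H).
        W_new = W * (R @ H.T)
        H_new = H * (W.T @ R)
        # Renormalize so the columns remain probability distributions.
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)
    return W, H


def reconstruct(V, W, H):
    """Scale the model distributions back to the energy of each frame of V."""
    return (W @ H) * V.sum(axis=0, keepdims=True)


if __name__ == "__main__":
    # Stand-in for a magnitude spectrogram: 64 frequency bins, 40 frames.
    V = np.random.default_rng(1).random((64, 40)) + 0.1
    W, H = plca(V, n_components=5)
    V_hat = reconstruct(V, W, H)
    print(W.shape, H.shape, V_hat.shape)
```

In this sketch the spectral components in W carry no information about how they evolve over time; each frame's mixture weights in H are estimated independently of the neighboring frames.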
One of the problems with PLCA is that:
In the following examples, we automatically fill in the missing regions of the time-frequency domain using our proposed method and compare the results with a method based on Probabilistic Latent Component Analysis. Here are two examples with large regions of the spectrogram missing.
In this example, PLCA produces a reconstruction with a lot of high-frequency noise, while the reconstruction by the proposed method is much cleaner. Although the signal reconstructed by the proposed method sounds less full in the high-frequency range, we still find it more perceptually pleasing than a reconstruction with extra noise added in the high frequencies.
In this example, PLCA gets the temporal dynamics of the audio wrong. It is easy to hear that too much energy is assigned to the percussion sound (the bright vertical strip in the spectrogram) in the music. In contrast, the signal reconstructed by our method has better temporal dynamics.
This work was supported in part by NSF grant number IIS-0643752.
This project is a collaboration between the Interactive Audio Lab at Northwestern University and Adobe Advanced Technology Labs.