MOS prediction
The Output Caption With Target column shows the caption generated when the prompt names the target sound class.
The MOSRA MOS prediction column shows the MOS predicted by the MOSRA [1] model. This model makes its predictions from the audio alone and does not have access to the signal class.
| Input Audio | Input Prompt | Target Sound Ground Truth | Output Caption With Target | MOSRA [1] MOS prediction |
|---|---|---|---|---|
| | Paying attention to the music assess the audio quality | The audio quality is 4.6 | The audio quality is 2.9 | 1.1 |
| | Paying attention to the music assess the audio quality | The audio quality is 4.4 | The audio quality is 4.1 | 1.1 |
| | Paying attention to the music assess the audio quality | The audio quality is 5.0 | The audio quality is 4.3 | 1.2 |
| | Paying attention to the speech assess the audio quality | The audio quality is 3.8 | The audio quality is 4.8 | 4.8 |
| | Paying attention to the speech assess the audio quality | The audio quality is 4.2 | The audio quality is 2.8 | 2.1 |
| | Paying attention to the speech assess the audio quality | The audio quality is 2.1 | The audio quality is 2.8 | 3.0 |
SNR prediction
The Output Caption With Target column shows the caption generated when the prompt names the target sound class.
The MOSRA SNR prediction column shows the SNR predicted by the MOSRA [1] model. This model makes its predictions from the audio alone and does not have access to the signal class.
| Input Audio | Input Prompt | Target Sound Ground Truth | Output Caption With Target | MOSRA [1] SNR prediction |
|---|---|---|---|---|
| | Paying attention to the dog estimate the SNR | The SNR is 12.1 | The SNR is 9.8 | -16.9 |
| | Paying attention to the chainsaw estimate the SNR | The SNR is 11.8 | The SNR is 17.6 | -15.5 |
| | Paying attention to the keyboard_typing estimate the SNR | The SNR is 7.2 | The SNR is 7.2 | -23.9 |
| | Paying attention to the crickets estimate the SNR | The SNR is -19.5 | The SNR is -17.6 | -19.6 |
| | Paying attention to the drinking_sipping estimate the SNR | The SNR is -8.8 | The SNR is -4.9 | -23.9 |
| | Paying attention to the sneezing estimate the SNR | The SNR is 19.2 | The SNR is 19.9 | -20.4 |
| | Paying attention to the frog estimate the SNR | The SNR is 5.4 | The SNR is 7.4 | 6.5 |
| | Paying attention to the sea_waves estimate the SNR | The SNR is -8.7 | The SNR is -10.8 | -20.5 |
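
For readers who want to compare the columns numerically, the following is a minimal sketch of one way to do so. It extracts the number from each caption and computes the mean absolute error against the ground truth for the eight SNR examples listed on this page. The helper names `parse_value` and `mean_absolute_error` are hypothetical and are not part of the authors' code or of MOSRA, and eight examples are far too few to be read as an evaluation.

```python
import re

def parse_value(caption: str) -> float:
    """Extract the first (possibly negative) number from a caption string."""
    match = re.search(r"-?\d+(?:\.\d+)?", caption)
    if match is None:
        raise ValueError(f"no numeric value found in caption: {caption!r}")
    return float(match.group())

def mean_absolute_error(truth, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)

# (ground-truth caption, output caption with target, MOSRA SNR prediction),
# copied from the SNR examples shown on this page.
rows = [
    ("The SNR is 12.1", "The SNR is 9.8", -16.9),
    ("The SNR is 11.8", "The SNR is 17.6", -15.5),
    ("The SNR is 7.2", "The SNR is 7.2", -23.9),
    ("The SNR is -19.5", "The SNR is -17.6", -19.6),
    ("The SNR is -8.8", "The SNR is -4.9", -23.9),
    ("The SNR is 19.2", "The SNR is 19.9", -20.4),
    ("The SNR is 5.4", "The SNR is 7.4", 6.5),
    ("The SNR is -8.7", "The SNR is -10.8", -20.5),
]

truth = [parse_value(gt) for gt, _, _ in rows]
caption_pred = [parse_value(out) for _, out, _ in rows]
mosra_pred = [mosra for _, _, mosra in rows]

caption_mae = mean_absolute_error(truth, caption_pred)
mosra_mae = mean_absolute_error(truth, mosra_pred)

print(f"Caption with target, mean absolute error: {caption_mae:.2f}")
print(f"MOSRA, mean absolute error: {mosra_mae:.2f}")
```

The same two helpers apply unchanged to the MOS examples, since those captions follow the same "The audio quality is X" pattern.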