Performance of Four Subjective Video Quality Assessment Protocols and Impact of Different Rating Preprocessing and Analysis Methods

Abstract

Standardization bodies recommend various protocols to conduct subjective quality assessment (QA) of imaging systems. While many studies have compared these QA protocols, few have assessed the impact of different approaches for preprocessing and analyzing quality ratings. Furthermore, the effect of large versus small quality differences on the discrimination performance of protocols has not been extensively studied in video QA. This study examines these issues for four QA protocols. H.264-compressed medical videos and denoised natural scene videos were evaluated by expert and naive subjects. Scores were collected with four QA protocols: forced choice (FC); two ratio-scaled paired comparison methods, preference (Pref) and dissimilarity (Dissim); and single stimulus (SS). The scores were analyzed using combinations of different rating preprocessing approaches, generating a total of 14 methods. Two performance metrics, probability and effect size, quantified the ability of the methods to discriminate between quality levels. The Pref and Dissim methods analyzed with classical multidimensional scaling and the SS method with Z-score transformation consistently outperformed the other methods. The type of preprocessing introduced large differences in the performance of individual protocols. Grouping stimulus pairs by small and large quality differences introduced significant differences in the performance rankings of the methods, with Pref and Dissim being most sensitive to small quality differences. For future work, we suggest further validation of the FC method, given its simplicity and ease of use, and continued investigation into more robust raw-score transformation and statistical analysis methods for both SS and FC ratings.
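As an illustration of the Z-score transformation mentioned for SS ratings, the following is a minimal sketch, not the study's implementation. It assumes per-subject standardization (each subject's raw scores rescaled to zero mean and unit sample standard deviation), which is one common variant; the abstract does not state which variant was used. The function names and the sample ratings are hypothetical.

```python
from statistics import mean, stdev


def zscore_per_subject(ratings):
    """Standardize each subject's raw ratings to zero mean and unit sample std.

    ratings: dict mapping subject id -> list of raw scores, one per stimulus.
    Assumption: per-subject standardization; the study's exact variant is
    not specified in the abstract.
    """
    z = {}
    for subject, scores in ratings.items():
        mu = mean(scores)
        sigma = stdev(scores)  # sample standard deviation (n - 1 denominator)
        if sigma == 0:
            # Identical scores for every stimulus carry no ranking information.
            z[subject] = [0.0 for _ in scores]
        else:
            z[subject] = [(s - mu) / sigma for s in scores]
    return z


def mean_z_per_stimulus(z):
    """Average the Z-scores across subjects for each stimulus index."""
    return [mean(column) for column in zip(*z.values())]


# Hypothetical raw ratings: two subjects who use the rating scale differently.
raw = {"s1": [1, 2, 3], "s2": [60, 80, 100]}
z = zscore_per_subject(raw)
stimulus_scores = mean_z_per_stimulus(z)
```

In this toy example both subjects rank the three stimuli identically but on different scales, so their standardized scores coincide; removing such differences in scale use is the purpose of the transformation.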
