Publication | Closed Access
Mutual Information Maximization for Effective Lip Reading
Citations: 71
References: 22
Year: 2020
Unknown Venue
Lip reading has received increasing research interest in recent years due to the rapid development of deep learning and its wide range of potential applications. Good performance on the lip reading task depends heavily on how effectively the representation captures lip movement information while resisting noise caused by changes in pose, lighting conditions, the speaker's appearance, speaking speed, and so on. To this end, we propose to introduce mutual information constraints at both the local feature level and the global sequence level to strengthen their relation to the speech content. On the one hand, we impose a local mutual information maximization (LMIM) constraint that requires the features generated at each time step to carry a strong relation to the speech content, improving the model's ability to discover fine-grained lip movements and fine-grained differences between words with similar pronunciation, such as “spend” and “spending”. On the other hand, we introduce a mutual information maximization constraint at the global sequence level (GMIM), so that the model pays more attention to discriminating key frames related to the speech content and less to the various kinds of noise that appear during speaking. By combining these two advantages, the proposed method is expected to be both discriminative and robust for effective lip reading. To verify the method, we evaluate on two large-scale benchmarks whose videos are collected from several TV shows and cover a wide range of speaking conditions. We perform a detailed analysis and comparison on several aspects, including a comparison of LMIM and GMIM with the baseline and a visualization of the learned representation. The results not only demonstrate the effectiveness of the proposed method but also set a new state of the art on both benchmarks.
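The abstract does not say how the mutual information is estimated, so the following is only a minimal sketch of one common choice: a Jensen-Shannon-style surrogate computed from critic scores on matched and mismatched (feature, word label) pairs. The estimator, the function names `jsd_mi_objective` and `local_objective`, and the idea of averaging over time steps for the local constraint are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jsd_mi_objective(pos_scores, neg_scores):
    """Jensen-Shannon-style mutual information surrogate (to be maximized).

    pos_scores: critic scores for matched (feature, word label) pairs.
    neg_scores: critic scores for mismatched pairs.
    Both are hypothetical inputs; a real model would produce them
    with a small scoring network.
    """
    pos_term = sum(-softplus(-s) for s in pos_scores) / len(pos_scores)
    neg_term = sum(softplus(s) for s in neg_scores) / len(neg_scores)
    return pos_term - neg_term


def local_objective(per_step_pos, per_step_neg):
    # Assumed form of a local constraint: apply the surrogate at every
    # time step and average, so each step's feature is tied to the label.
    steps = list(zip(per_step_pos, per_step_neg))
    return sum(jsd_mi_objective(p, n) for p, n in steps) / len(steps)


if __name__ == "__main__":
    # An uninformative critic (all scores 0) versus a confident one.
    print(jsd_mi_objective([0.0], [0.0]))
    print(jsd_mi_objective([10.0], [-10.0]))
```

Under this assumption, a global constraint would apply the same surrogate once to a score computed from the whole-sequence representation, while the local one averages it over per-step features. The surrogate is never positive and approaches zero as the critic separates matched from mismatched pairs.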