Abstract
Video summarization (VS) refers to the extraction of key clips containing important information from a long video to compose a short summary video. Video summaries are derived by capturing a variable range of temporal dependencies between video frames. A large body of work on VS has been proposed in recent years, but how to effectively select key frames remains a challenging issue. To this end, this paper presents a novel U-shaped non-local network for evaluating the probability of each frame of the original video being selected for the summary. We exploit a reinforcement learning framework to enable unsupervised summarization of videos. Frames with high probability scores are included in the generated summary. Furthermore, a reward function is defined that encourages the network to select more representative and diverse video frames. Experiments conducted on two benchmark datasets under the canonical, augmented, and transfer settings demonstrate that the proposed approach outperforms state-of-the-art unsupervised methods.
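For concreteness, the sketch below shows one common way a diversity-and-representativeness reward of this kind can be computed from per-frame features and the indices of the selected frames. It follows the formulation typically used in unsupervised reinforcement-learning-based summarization; the function names, feature shapes, and exact terms are illustrative assumptions, not the authors' precise definition.

```python
import torch

def diversity_reward(features: torch.Tensor, selected: torch.Tensor) -> torch.Tensor:
    """Mean pairwise dissimilarity (1 - cosine similarity) among the selected frames."""
    x = features[selected]                              # (k, D) selected frame features
    x = x / (x.norm(dim=1, keepdim=True) + 1e-8)        # L2-normalize for cosine similarity
    k = x.size(0)
    if k <= 1:
        return torch.tensor(0.0)
    sim = x @ x.t()                                     # (k, k) pairwise cosine similarities
    mask = ~torch.eye(k, dtype=torch.bool)              # ignore self-similarity on the diagonal
    return (1.0 - sim[mask]).mean()

def representativeness_reward(features: torch.Tensor, selected: torch.Tensor) -> torch.Tensor:
    """High when every frame lies close to at least one selected frame."""
    dists = torch.cdist(features, features[selected])   # (T, k) distances to selected frames
    return torch.exp(-dists.min(dim=1).values.mean())

# Toy usage: 10 frames with 4-dim features, frames 2 and 7 selected for the summary.
feats = torch.randn(10, 4)
sel = torch.tensor([2, 7])
reward = diversity_reward(feats, sel) + representativeness_reward(feats, sel)
```

In a reinforcement learning setup of this kind, the combined reward is fed back to the frame-selection network so that summaries covering the video content with non-redundant frames receive higher returns.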
This work is licensed under a Creative Commons Attribution 4.0 International License.