Vox Populi: generating video documentaries from semantically annotated media repositories

Abstract

The context of this research is one or more online video repositories containing several hours of documentary footage and users possibly interested only in particular topics of that material. In such a setting it is not possible to craft a single version containing all possible topics the user might like to see, unless including all the material, which is clearly not feasible. The main motivation for this research is, therefore, to enable an alternative authoring process for film makers to make all their material dynamically available to users, without having to edit a static final cut that would select out possible informative footage.

We propose a methodology to automatically organize video material in an edited video sequence with a rhetorical structure. This is enabled by defining an annotation schema for the material and a generation process with the following two requirements:
• the data repository used by the generation process could be extended by simply adding annotated material to it
• the final resulting structure of the video generation process would seem familiar to a video-literate user.

The first requirement was satisfied by developing an annotation schema that explicitly identifies rhetorical elements in the video material, and a generation process that assembles longer sequences of video by manipulating the annotations in a bottom-up fashion.

The second requirement was satisfied by modeling the generation process according to documentary making and general film theory techniques, in particular making the role of rhetoric in video documentaries explicit.

A specific case study was carried out using material for video documentaries. These used an interview structure, where people are asked to make statements about subjective matters.
This category is characterized by rich information encoded in the audio track and by the controversy of the different opinions expressed in the interviews.

The approach was tested by implementing a system called Vox Populi that realizes a user-driven generation of rhetoric-based video sequences. Using the annotation schema, Vox Populi can be used to generate the story space and to allow the user to select and browse such a space. The user can specify the topic but also the characters of the rhetorical dialogue and the rhetoric form of the presentation.

Presenting controversial topics can introduce some bias: Vox Populi tries to control this by modeling some rhetoric and film theory editing techniques that influence the bias and by allowing the user to select the point of view she wants the generated sequence to have.

Summary

Overview

We present a model to automatically generate documentaries and an implementation of it. We focus on matter-of-opinion documentaries based on interviews. Our model has the following characteristics, which are lacking in previous automatic generation approaches:
• it allows the viewer to select the subject and the point of view of the documentary;
• it allows the documentarist to add material to the repository without having to specify how this material should be presented (data-driven approach);
• it generates documentaries according to presentation forms used by documentarists.

This thesis answers the following research questions:

RESEARCH QUESTION 1 (DOCUMENTARY FORM) What characteristics of the presentation forms used by documentaries on matter-of-opinion issues must be modeled?

RESEARCH QUESTION 2 (ANNOTATION SCHEMA) What information should be captured in an annotation schema for an automatic video generation approach where:
• the viewer can specify the subject and the point of view,
• the documentarist can collect material to be used for documentaries, without having to specify how this material should be presented to the viewer,
• the material is presented according to presentation forms used by documentarists?

RESEARCH QUESTION 3 (GENERATION PROCESS) How must a generation process be defined for an automatic video generation approach where:
• the viewer can specify the subject and the point of view,
• the documentarist can collect material to be used for documentaries, without having to specify how this material should be presented to the viewer,
• the material is presented according to presentation forms used by documentarists?

RESEARCH QUESTION 4 (AUTHOR SUPPORT) How must a generation process be defined so that it can give the documentarist an indication of the quality of the documentaries it can generate?

Chapter 2

To determine what needs to be modeled, we analyze the domain of documentaries and the process of documentary making. This analysis leads to the definition of HIGH-LEVEL REQUIREMENTS, which specify the presentation forms a documentary generation model can use, and how to edit video material into a correct (according to traditional film making) sequence. These high-level requirements provide an answer to Research Question Documentary Form [1].

In more detail, these high-level requirements restate the first two bullet points, while the third one is further specified using an analysis of the domain. The requirements point out the presentation forms that can be used in documentaries, namely the narrative form (where the presentation of information is organized into stories), the categorical form (where the presentation of information is organized into categories) and the rhetorical form (where the presentation of information is organized according to points of view, positions and arguments). We consider two levels in a story: the level of the scene, called the micro-level, and the overall structure, called the macro-level. The narrative and categorical forms can be used at the macro-level, while the rhetorical form must be used at the micro-level.
The rhetorical form is particularly relevant for our domain, namely matter-of-opinion documentaries. This form is composed of points of view (propagandist and binary communicator), which communicate positions (e.g. "war in Afghanistan - For"), which in turn are expressed by arguments. Arguments are based on logos, pathos and ethos techniques. The high-level requirements also specify that the model should implement a montage technique often used in documentaries to present interviews. This technique, called vox populi, consists of showing in a rapid sequence how interviewees answer related questions. To avoid misquoting an interviewee, the generation model is required to encode context information for the statements made during interviews. For the editing part, the analysis of the documentary-making process requires the generation model to include continuity editing rules as used in traditional film making.

Chapter 3

Having defined what aspects of the domain need to be modeled, we examine how related work has solved similar problems, and determine which existing technical solutions are feasible given the high-level requirements we set. This analysis leads to the definition of LOW-LEVEL REQUIREMENTS. These requirements are divided into two groups. The first group specifies what data structure can represent video material for the purpose of documentary generation. The second group determines the characteristics of a process that is capable of generating documentaries according to the high-level requirements.

In more detail, the first group of requirements, concerning the annotations, specifies that the video material should be segmented into discrete units called clips. The description of the clips should capture connotative as well as denotative aspects of the video material, using property-based annotations and a controlled vocabulary. Arguments contained in interviews and based on logos should be encoded by an argument model, the model of Toulmin.
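
The first group of low-level requirements can be illustrated with a minimal sketch of a clip record. This is not the thesis's actual schema: the class and field names, the example vocabulary, and the validation rule are assumptions made for illustration. Only the ideas of discrete clips, property-based annotations drawn from a controlled vocabulary, and the six parts of Toulmin's argument model come from the text and from Toulmin's model itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToulminRole(Enum):
    """The six parts of an argument in Toulmin's model."""
    CLAIM = "claim"
    DATA = "data"
    WARRANT = "warrant"
    BACKING = "backing"
    QUALIFIER = "qualifier"
    REBUTTAL = "rebuttal"


# Hypothetical controlled vocabulary: property name -> allowed values.
VOCABULARY = {
    "subject": {"war in Afghanistan"},
    "attitude": {"for", "against"},
}


@dataclass
class Clip:
    """A discrete unit of video with property-based annotations (assumed layout)."""
    clip_id: str
    start_s: float
    end_s: float
    role: ToulminRole
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject empty or negative-length clips.
        if self.end_s <= self.start_s:
            raise ValueError("clip must have positive duration")
        # Every property value must come from the controlled vocabulary.
        for name, value in self.properties.items():
            if value not in VOCABULARY.get(name, set()):
                raise ValueError(f"{name}={value!r} is not in the vocabulary")


clip = Clip("c1", 12.0, 19.5, ToulminRole.CLAIM,
            {"subject": "war in Afghanistan", "attitude": "for"})
```

Checking values against the vocabulary when a record is created is one way to keep annotations comparable across clips, which any later matching step would rely on.
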
Arguments based on pathos and ethos should be evaluated using a cognitive model, the OCC model. In addition to the OCC model, film theory provides another method to evaluate pathos, based on the cinematic properties of the clip, namely gaze direction and framing distance. The second group of requirements specifies that the generation process should dynamically create, using the annotations, a data structure (the Semantic Graph) that provides information about the argumentation relations (SUPPORTS and CONTRADICTS) among media items in the repository. Furthermore, based on argumentation theory, the requirements define a means of composing arguments from single statements, such as rebuttals and undercutters, and specify that the categorical form should be used as the presentation form at the macro-level.

Chapter 4

Guided by the high-level requirements and the first group of low-level requirements, we examine the content of video to determine the characteristics of the information we need to model. Based on this analysis, we specify an annotation schema capable of encoding the rhetorical form and the categorical form, and the cinematic properties of video to support automatic editing. The definition of this annotation schema provides an answer to Research Question Annotation Schema [2].

In more detail, two components of the rhetorical form are modeled, namely arguments and positions. Arguments based on logos are encoded by modeling verbal information contained in the auditory and visual channel. The arguments are modeled using three-part sentence-like descriptions of what an interviewee says, called statements, a thesaurus for the controlled vocabulary of terms used in the statements, and the model of Toulmin for the role each statement plays in an argument. Arguments based on pathos are modeled using non-verbal information contained in the visual channel, by modeling two cinematic properties of the clip: framing distance and gaze direction.
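
To make the link between three-part statements and the Semantic Graph more concrete, here is a small sketch of how SUPPORTS and CONTRADICTS relations might be derived from annotated statements. The summary does not say how the relations are computed, so everything below is an assumption: the field names (`subject`, `modifier`, `predicate`), the hypothetical `OPPOSITES` thesaurus table, and the matching rule, under which identical statements support each other and statements that differ in exactly one opposite term contradict each other.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Statement:
    """Three-part description of what an interviewee says (assumed field names)."""
    clip_id: str
    subject: str
    modifier: str
    predicate: str

    def terms(self):
        return (self.subject, self.modifier, self.predicate)


# Hypothetical thesaurus: unordered pairs of terms with opposite meaning.
OPPOSITES = {frozenset({"useful", "useless"}), frozenset({"is", "is not"})}


def relation(a, b):
    """Return 'SUPPORTS', 'CONTRADICTS', or None for two statements."""
    differing = [(x, y) for x, y in zip(a.terms(), b.terms()) if x != y]
    if not differing:
        return "SUPPORTS"  # assumed rule: same statement made in another clip
    if len(differing) == 1 and frozenset(differing[0]) in OPPOSITES:
        return "CONTRADICTS"  # assumed rule: one term replaced by its opposite
    return None


def build_semantic_graph(statements):
    """Map each related pair of clip ids to its relation label."""
    graph = {}
    for a, b in combinations(statements, 2):
        label = relation(a, b)
        if label is not None:
            graph[(a.clip_id, b.clip_id)] = label
    return graph


statements = [
    Statement("c1", "war", "is", "useful"),
    Statement("c2", "war", "is", "useless"),
    Statement("c3", "war", "is", "useful"),
]
graph = build_semantic_graph(statements)
```

Because the graph is rebuilt from the annotations alone, adding a newly annotated clip to `statements` is enough to connect it to the existing material, which matches the data-driven requirement described in the overview.
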
Ethos is modeled based on the OCC model, by using verbal and non-verbal information to determine social categories an interviewee belongs to, such as gender, race, education level, and a user profile that values how important these categories are for the viewer. Positions are modeled as a subject and the interviewee’s attitude with respect to that subject, e.g. "war in Afghanistan - For". Further we define the categories to support the categorical for
