Publication | Closed Access
LayoutTransformer: Scene Layout Generation with Conceptual and Spatial Diversity
Citations: 37
References: 21
Year: 2021
Unknown Venue
Engineering, Machine Learning, Text Inputs, Social Sciences, Natural Language Processing, Multimodal LLM, Image Analysis, Visual Grounding, Data Science, Pattern Recognition, Layout Generation, Visual Question Answering, Machine Vision, Design, Vision Language Model, Deep Learning, Computer Vision, Architectural Design, Spatial Diversity, Scene Interpretation, Scene Understanding, Scene Modeling, Spatial Information
When translating text inputs into layouts or images, existing works typically require explicit descriptions of each object in a scene, including its spatial information or associated relationships. To better exploit the text input, so that implicit objects or relationships can be properly inferred during layout generation, we propose the LayoutTransformer Network (LT-Net). Given a scene-graph input, LT-Net encodes the semantic features in a way that exploits their co-occurrences and implicit relationships. This allows one to manipulate conceptually diverse yet plausible layout outputs. Moreover, the decoder of LT-Net translates the encoded contextual features into bounding boxes while preserving self-supervised relation consistency. By fitting the bounding-box distributions to Gaussian mixture models, LT-Net can additionally produce spatially diverse layouts. We conduct extensive experiments on the MS-COCO and Visual Genome datasets and confirm the effectiveness and plausibility of LT-Net over recent layout generation models. Code will be released at LayoutTransformer.
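To illustrate how modelling bounding boxes with Gaussian mixture models can yield spatially diverse layouts, the following is a minimal sketch and not the authors' implementation. It assumes each box is a normalized (x, y, w, h) vector and that a decoder has already predicted mixture weights, means, and standard deviations for a diagonal-covariance mixture. The function names `sample_box_from_gmm` and `gmm_nll` and all parameter values are hypothetical.

```python
import numpy as np


def sample_box_from_gmm(weights, means, stds, rng):
    """Draw one (x, y, w, h) box from a diagonal-covariance Gaussian mixture.

    weights: (K,) mixture weights summing to 1 (assumed decoder output)
    means:   (K, 4) component means
    stds:    (K, 4) component standard deviations (positive)
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Pick a mixture component, then sample from its Gaussian.
    k = rng.choice(len(weights), p=weights)
    box = rng.normal(means[k], stds[k])
    # Keep coordinates inside the normalized image frame.
    return np.clip(box, 0.0, 1.0)


def gmm_nll(box, weights, means, stds):
    """Negative log-likelihood of a box under the mixture (a typical training loss)."""
    box = np.asarray(box, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Per-component log density of a diagonal Gaussian, summed over the 4 box dimensions.
    log_comp = -0.5 * np.sum(
        ((box - means) / stds) ** 2 + 2.0 * np.log(stds) + np.log(2.0 * np.pi),
        axis=1,
    )
    log_mix = np.log(weights) + log_comp
    # Log-sum-exp over components for numerical stability.
    m = log_mix.max()
    return -(m + np.log(np.sum(np.exp(log_mix - m))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical two-component mixture: object placed on the left or on the right.
    weights = [0.5, 0.5]
    means = [[0.25, 0.5, 0.2, 0.3], [0.75, 0.5, 0.2, 0.3]]
    stds = [[0.05, 0.05, 0.02, 0.02], [0.05, 0.05, 0.02, 0.02]]
    for _ in range(3):
        print(sample_box_from_gmm(weights, means, stds, rng))
```

Sampling repeatedly from the same mixture parameters gives different but plausible boxes for the same object, which is the sense in which a mixture output can provide spatial diversity; the negative log-likelihood is one common way such a distribution could be fitted to ground-truth boxes.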