Looking behavior allows humans to understand and interact with an enormous amount of visual information, a capacity that remains challenging to replicate in AI systems. A core element of this work is predicting scan-paths from a combination of image information and past looking behavior. The success of scan-path prediction relies heavily on whether the image information provides a sufficiently rich representation. In this paper, we show that changing representations dramatically simplifies and improves predictions of looking behavior. We introduce a representation of looking behavior centered on interest-regions in images, defined by natural and collective looking behavior. These regions (called interest-based regions) can be used to partition images for semantic labeling and to provide a basis for a shared representation across observers. Without any additional label or image information, we achieve highly accurate sequence prediction using this interest-based image representation.
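The idea of recoding fixations as a sequence over interest-based regions, then predicting the next region from past looking behavior, can be sketched as follows. This is a minimal illustration, not the paper's method: the region boxes, region names, and the bigram (first-order Markov) predictor are all assumptions introduced here for concreteness.

```python
from collections import Counter, defaultdict

# Hypothetical interest-based regions: each region is an axis-aligned
# box (x0, y0, x1, y1), standing in for clusters of pooled fixations.
REGIONS = {
    "face":   (40, 20, 120, 100),
    "text":   (130, 60, 220, 90),
    "object": (30, 120, 90, 180),
}

def region_of(fixation):
    """Map an (x, y) fixation to the first interest region containing it."""
    x, y = fixation
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return "background"

def fit_bigram(scanpaths):
    """Count region-to-region transitions across observers' scan-paths."""
    counts = defaultdict(Counter)
    for path in scanpaths:
        seq = [region_of(f) for f in path]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, fixation):
    """Predict the most frequent successor of the current fixation's region."""
    cur = region_of(fixation)
    if not counts[cur]:
        return None
    return counts[cur].most_common(1)[0][0]

# Toy data: two observers who tend to look from the face to the text region.
paths = [
    [(60, 50), (150, 70), (50, 150)],
    [(80, 40), (200, 80)],
]
model = fit_bigram(paths)
print(predict_next(model, (70, 60)))  # current region "face" -> prints "text"
```

The key point the sketch makes concrete is the representational shift: once fixations are recoded as region identifiers shared across observers, sequence prediction needs no further label or pixel information.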
Citation:
Communications in Information and Systems 23, p.245-262