Details
Paper ID - 77
Medium

Categories

  • Image Captioning
  • Deep Learning

Abstract - Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
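The image–sentence score described in the abstract compares a meaning estimate derived from the image with one derived from the sentence. A minimal sketch of that idea, assuming each meaning estimate is reduced to a toy confidence vector over a shared vocabulary (the vocabulary, vectors, and `link_score` helper are illustrative, not the paper's actual representation or distance):

```python
import math

# Hypothetical shared meaning vocabulary; the paper uses richer
# <object, action, scene> meaning estimates (this is a simplification).
VOCAB = ["dog", "run", "park", "cat", "sit", "room"]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link_score(image_meaning, sentence_meaning):
    """Score linking an image to a sentence: similarity of the two
    meaning estimates (a sketch, not the paper's exact scoring)."""
    return cosine(image_meaning, sentence_meaning)

# Toy meaning estimates (made-up numbers): an image of a dog running
# in a park, scored against two candidate sentences.
img = [0.9, 0.8, 0.7, 0.1, 0.0, 0.0]
sent_good = [0.8, 0.9, 0.6, 0.0, 0.1, 0.0]  # "a dog runs in the park"
sent_bad = [0.1, 0.0, 0.0, 0.9, 0.8, 0.7]   # "a cat sits in the room"

assert link_score(img, sent_good) > link_score(img, sent_bad)
```

Because the score is symmetric over the two meaning estimates, the same function supports both directions mentioned in the abstract: ranking sentences for a given image, or ranking images for a given sentence.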

Paper - https://www.cs.cmu.edu/~afarhadi/papers/sentence.pdf

Dataset - https://vision.cs.uiuc.edu/pascal-sentences/