Visual Understanding with Fine-Grained Language and Complementary Cues
Srishti Yadav (Ph.D. Student)
Visual data is informative, but it can also be confusing, whether intentionally or unintentionally. Images that are visually similar, confusing, or manipulative to the human eye can benefit from pattern identification and associated descriptions at a fine-grained level. Understanding which tokens, words, phrases, or sentences best evoke the meaning, intention, and motivation of an image captured in real life can have wide applications. Our research will attempt to understand how objects and complementary cues, such as the motivation or feelings behind descriptions (as seen in the real world, e.g. in news articles and transcribed video interviews), can be used to find images that best match fine-grained descriptions. These language-based heuristics, we contend, will not always result in an unequivocal interpretation of images, but will at least explain at what point and why interpretations differ.
Primary Host: Serge Belongie (University of Copenhagen & Cornell University)
Exchange Host: Ekaterina Shutova (University of Amsterdam)
PhD Duration: 01 November 2022 - 31 October 2025
Exchange Duration: 01 January 2025 - 30 June 2025 - Ongoing