Melika Ayoughi

Email  /  CV  /  LinkedIn  /  Twitter  /  GitHub

I am a PhD student at the University of Amsterdam, where I work on extracting knowledge graphs from videos. I am jointly supervised by Pascal Mettes from the VIS Lab and Paul Groth from the INDE Lab.

I completed a master's in Artificial Intelligence at the University of Amsterdam. I wrote my thesis at TomTom on object detection under high class imbalance. I also did an internship at Dexter Energy Services, where I worked on weather nowcasting using satellite images.


I'm interested in computer vision, specifically multimodal data such as video, audio, text, and graphs. I would like to improve the high-level understanding of videos, for example through object-centric learning and relationship detection, and to make retrieval easier by storing the extracted information in a knowledge graph. I'm currently working on a project to learn better instance-level representations using prior knowledge.


Self-Contained Entity Discovery from Captioned Videos

Melika Ayoughi, Pascal Mettes, Paul Groth
under review at TOMM, 2022
paper / arXiv / code / slides

This paper introduces the task of visual named entity discovery in videos without the need for task-specific supervision or task-specific external knowledge sources. Assigning specific names to entities (e.g. faces, scenes, or objects) in video frames is a long-standing challenge. Commonly, this problem is addressed as a supervised learning objective by manually annotating faces with entity labels. To bypass the annotation burden of this setup, several works have investigated the problem by utilizing external knowledge sources such as movie databases. While effective, such approaches do not work when task-specific knowledge sources are not provided, and they can only be applied to movies and TV series. In this work, we take the problem a step further and propose to discover entities solely from the videos themselves and their corresponding captions or subtitles. We introduce a three-stage method where we (i) create bipartite entity-name graphs from frame-caption pairs, (ii) find visual entity agreements, and (iii) refine the entity assignment through entity-level prototype construction. To tackle this new problem, we introduce two new benchmarks, SC-Friends and SC-BBT, based on the Friends and The Big Bang Theory TV series. Experiments on the benchmarks demonstrate that our approach can discover which named entity belongs to which face or scene, with accuracy close to that of a supervised oracle, using only the multimodal information present in videos. Additionally, our qualitative examples highlight the open challenges of self-contained discovery of arbitrary visual entities for future work.
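To give a feel for the three stages, here is a deliberately simplified toy sketch. It is not the method from the paper: it assumes faces have already been grouped by some upstream clusterer into hypothetical cluster IDs with embedding vectors, it replaces visual entity agreement with a plain co-occurrence majority vote, and it uses mean embeddings as prototypes with nearest-prototype relabeling. All function names, the data format, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical input format: each frame is (detections, caption_names), where a
# detection is (cluster_id, embedding). Assumes every cluster shares at least
# one frame with a non-empty caption, so every cluster receives a name.


def build_entity_name_graph(frames):
    """Stage (i), simplified: weighted bipartite graph whose edge weight counts
    how often a visual cluster and a caption name appear in the same frame."""
    graph = defaultdict(Counter)
    for detections, names in frames:
        for cluster in {c for c, _ in detections}:
            for name in set(names):
                graph[cluster][name] += 1
    return graph


def agree_on_names(graph):
    """Stage (ii), simplified: each cluster takes its most frequent name."""
    return {cluster: edges.most_common(1)[0][0] for cluster, edges in graph.items()}


def refine_with_prototypes(frames, cluster_names):
    """Stage (iii), simplified: average the embeddings assigned to each name into
    a prototype, then relabel every detection by its nearest prototype."""
    sums, counts = {}, Counter()
    for detections, _ in frames:
        for cluster, emb in detections:
            name = cluster_names[cluster]
            prev = sums.get(name, [0.0] * len(emb))
            sums[name] = [s + e for s, e in zip(prev, emb)]
            counts[name] += 1
    prototypes = {n: [s / counts[n] for s in vec] for n, vec in sums.items()}
    # Euclidean nearest prototype, one label per detection in frame order.
    return [
        min(prototypes, key=lambda n: math.dist(emb, prototypes[n]))
        for detections, _ in frames
        for _, emb in detections
    ]


# Toy usage: the last "c1" detection was mis-clustered and sits near "c2".
frames = [
    ([("c1", (0.0, 0.0)), ("c2", (5.0, 5.0))], ["Ross", "Rachel"]),
    ([("c1", (0.1, 0.0))], ["Ross"]),
    ([("c2", (5.0, 4.9))], ["Rachel"]),
    ([("c1", (4.9, 5.1))], ["Ross"]),
]
cluster_names = agree_on_names(build_entity_name_graph(frames))
refined = refine_with_prototypes(frames, cluster_names)
print(refined)  # ['Ross', 'Rachel', 'Ross', 'Rachel', 'Rachel']
```

In this toy example the majority vote labels cluster "c1" as Ross, and the prototype step then moves the outlying "c1" detection to Rachel because it lies much closer to Rachel's mean embedding, which is the kind of correction a refinement stage is meant to provide.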

Extracurricular
October 2020-present: Member of the organization team of the inclusive AI program
Recent Activity
January 2023: Our paper "Self-Contained Entity Discovery from Captioned Videos" was accepted at the TOMM journal
November 2022: Presented our work at the WiML workshop at NeurIPS 2022
July 2022: Presented our work at the Vision and Sports Summer School
May 2022: Presented our work at the Netherlands Conference on Computer Vision
November 2022: Master Thesis Supervision
April 2022: Bachelor Thesis Supervision
November 2022: Teaching Assistant for Applied Machine Learning at UvA
November 2021: Teaching Assistant for Applied Machine Learning at UvA
November 2020: Teaching Assistant for Applied Machine Learning at UvA
September 2019: Teaching Assistant for Machine Learning 1 at UvA