Vibhav Vineet

Principal Researcher
Microsoft Research
Redmond, WA
Email: firstname[dot]lastname[at]microsoft[dot]com

Research

My research interests are in computer vision, machine learning, and human-AI interaction. My ongoing research focuses on developing models that use multi-modal data to improve AI systems' ability to perceive and reason about the real-world environments of human users. This should ultimately help AI systems interact and collaborate with humans seamlessly and accomplish real-world tasks efficiently.
Topics of interest. 1) Multi-modal robustness analysis and reasoning. 2) Video understanding and generation. 3) Synthetic data for computer vision and AI models. 4) Embodied AI.

If you are interested in research collaborations or a research internship at MSR Redmond, please contact me.

Recent and Selected Publications

Phi-4-reasoning technical report, ArXiv 2025
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead, ArXiv 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding, CVPR 2025
Unearthing Skill-Level Insights For Understanding Trade-Offs Of Foundation Models, ICLR 2025
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models, ICLR 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation, ArXiv 2025
Controllable Text-to-Image Generation with GPT-4, ArXiv 2023
3db: A framework for debugging computer vision models, NeurIPS 2022
Neural-Sim: Learning to Generate Training Data with NeRF, ECCV 2022
Benchmarking Spatial Reasoning Abilities of Text-to-Image Generative Models, ArXiv 2022
DALL-E for Detection: Language-driven Context Image Synthesis for Object Detection, ArXiv 2022
Inferring Articulated Rigid Body Dynamics from RGBD Video, IROS 2022
AutoSimulate: (Quickly) Learning Synthetic Data Generation, ECCV 2020
Playing for data: Ground truth from computer games, ECCV 2016
Feature space optimization for semantic video segmentation, CVPR 2016
Semanticpaint: Interactive 3d labeling and learning at your fingertips, ACM TOG 2015
The semantic paintbrush: Interactive 3d mapping and recognition in large outdoor spaces, CHI 2015

Videos

Adapted from This page | Last updated: 04/13/2025
