Vibhav Vineet
Principal Researcher
Microsoft Research
Redmond, WA
Email: firstname[dot]lastname[at]microsoft[dot]com
Google Scholar, DBLP
Research
My research interests are in computer vision, machine learning, and human-AI interaction. My ongoing research focuses on developing models that use multi-modal data to improve AI systems' ability to perceive and reason about the real-world environments of human users. This work will ultimately help AI systems interact and collaborate seamlessly with humans and accomplish real-world tasks efficiently.
Topics of interest. 1) Multi-modal robustness analysis and reasoning. 2) Video understanding and generation. 3) Synthetic data for computer vision and AI models. 4) Embodied AI.
If you are interested in research collaborations or a research internship at MSR Redmond, please contact me.
Recent and Selected Publications
- Phi-4-reasoning technical report, ArXiv 2025
- Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead, ArXiv 2025
- HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding, CVPR 2025
- Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models, ICLR 2025
- DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models, ICLR 2025
- MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation, ArXiv 2025
- Controllable Text-to-Image Generation with GPT-4, ArXiv 2023
- 3db: A framework for debugging computer vision models, NeurIPS 2022
- Neural-Sim: Learning to Generate Training Data with NeRF, ECCV 2022
- Benchmarking Spatial Reasoning Abilities of Text-to-Image Generative Models, ArXiv 2022
- DALL-E for Detection: Language-driven Context Image Synthesis for Object Detection, ArXiv 2022
- Inferring Articulated Rigid Body Dynamics from RGBD Video, IROS 2022
- AutoSimulate: (Quickly) Learning Synthetic Data Generation, ECCV 2020
- Playing for data: Ground truth from computer games, ECCV 2016
- Feature space optimization for semantic video segmentation, CVPR 2016
- SemanticPaint: Interactive 3D labeling and learning at your fingertips, ACM TOG 2015
- The semantic paintbrush: Interactive 3D mapping and recognition in large outdoor spaces, CHI 2015
Videos
Last updated: 04/13/2025