Vibhav Vineet
Principal Researcher
Microsoft Research
Redmond, WA
Email: firstname[dot]lastname[at]microsoft[dot]com
Google Scholar, DBLP
Research
My research interests are in computer vision, machine learning, and human-AI interaction. My ongoing research focuses on developing models that use multi-modal data to improve AI systems' ability to perceive and reason about the real-world environments of human users. This should ultimately help AI systems interact and collaborate with humans seamlessly and accomplish real-world tasks efficiently.
Topics of interest. 1) Multi-modal robustness analysis and reasoning. 2) Video understanding and generation. 3) Synthetic data for computer vision and AI models. 4) Embodied AI.
If you are interested in research collaborations or a research internship at MSR Redmond, please contact me.
Recent and Selected Publications
-
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Vidhisha Balachandran, Jingya Chen, Lingjiao Chen, Shivam Garg, Neel Joshi, Yash Lara, John Langford, Besmira Nushi, Vibhav Vineet, Yue Wu, Safoora Yousefi.
[Paper]
-
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat.
Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[Paper]
-
Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models
Mazda Moayeri, Vidhisha Balachandran, Varun Chandrasekaran, Safoora Yousefi, Thomas Fel, Soheil Feizi, Besmira Nushi, Neel Joshi, Vibhav Vineet.
The Thirteenth International Conference on Learning Representations (ICLR), 2025.
[Paper]
-
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge.
The Thirteenth International Conference on Learning Representations (ICLR), 2025.
[Paper]
-
Controllable Text-to-Image Generation with GPT-4
Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang.
arXiv, 2023.
[Paper]
-
3db: A framework for debugging computer vision models
Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry.
Advances in Neural Information Processing Systems (NeurIPS), 2022.
[Paper]
-
Neural-Sim: Learning to Generate Training Data with NeRF
Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet.
European Conference on Computer Vision (ECCV), 2022.
[Paper]
-
Benchmarking spatial relationships in text-to-image generation
Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang.
arXiv, 2023.
[Paper]
-
DALL-E for detection: Language-driven context image synthesis for object detection
Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet.
arXiv, 2022.
[Paper]
-
Learning Articulated Rigid Body Dynamics Simulations From Video
Eric Heiden, Ziang Liu, Vibhav Vineet, Erwin Coumans, Gaurav Sukhatme.
ICLR 2022 Workshop on the Elements of Reasoning: Objects, Structure and Causality, 2022.
[Paper]
-
AutoSimulate: (Quickly) Learning Synthetic Data Generation
Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip HS Torr, Vibhav Vineet.
European Conference on Computer Vision (ECCV), 2020.
[Paper]
-
Playing for data: Ground truth from computer games
Stephan R Richter, Vibhav Vineet, Stefan Roth, Vladlen Koltun.
European Conference on Computer Vision (ECCV), 2016.
[Paper]
-
Feature space optimization for semantic video segmentation
Abhijit Kundu, Vibhav Vineet, Vladlen Koltun.
Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[Paper]
-
SemanticPaint: Interactive 3d labeling and learning at your fingertips
Julien Valentin, Vibhav Vineet, Ming-Ming Cheng, David Kim, Jamie Shotton, Pushmeet Kohli, Matthias Nießner, Antonio Criminisi, Shahram Izadi, Philip Torr.
ACM Transactions on Graphics (TOG), 2015.
[Paper]
-
The semantic paintbrush: Interactive 3d mapping and recognition in large outdoor spaces
Ondrej Miksik, Vibhav Vineet, Morten Lidegaard, Ram Prasaath, Matthias Nießner, Stuart Golodetz, Stephen L Hicks, Patrick Pérez, Shahram Izadi, Philip HS Torr.
ACM Conference on Human Factors in Computing Systems (CHI), 2015.
[Paper]
Last updated: 04/13/2025