Zhiding's Homepage

Zhiding Yu  (禹之鼎)

Principal Research Scientist & Research Lead
Learning & Perception Research Group, NVIDIA Research
2788 San Tomas Expressway, Santa Clara, CA 95051

Email: zhidingy AT nvidia.com
[GitHub]   [Google Scholar]   [Twitter]   [LinkedIn]

About Me

I am a principal research scientist and research lead at the Learning and Perception Research Group, NVIDIA Research. Before joining NVIDIA in 2018, I obtained my Ph.D. in ECE from Carnegie Mellon University in 2017 and my M.Phil. in ECE from The Hong Kong University of Science and Technology in 2012. I received my bachelor's degree from the Union Class of Electrical Engineering (FENG Bingquan Pilot Class) at South China University of Technology in 2008.

I am interested in building general autonomy and intelligence across both virtual and physical domains. My recent focus is on Vision Transformers, LLMs, multimodal LLMs, and vision-language-action (VLA) models, with applications spanning open-world understanding, reasoning, AV/robot perception and planning, and agentic systems. I have led or contributed to numerous flagship research efforts and products at NVIDIA, including SegFormer (Most Influential NeurIPS Paper, Demo), VoxFormer, FB-BEV/FB-OCC (winner of the CVPR23 3D Occupancy Prediction Challenge, video), Hydra-MDP (winner of the CVPR24 End-to-End Driving Challenge, video), the Eagle VLM project, Nemotron, Llama-Nemotron-VL, and GR00T N1/GR00T N1.5 (NVIDIA's foundation models for humanoid robots). I also participated in designing NVIDIA's next-generation end-to-end autonomous driving system. My work is characterized by state-of-the-art performance, scalable architectures, and data-centric strategies toward real-world generalization.

Please refer to my Google Scholar profile for my latest publications.

Honors and Awards

  • Winner, CVPR24 Challenge on End-to-End Driving at Scale

  • 2nd Place, CVPR24 Challenge on Driving with Language

  • Winner, CVPR23 Challenge on 3D Occupancy Prediction

  • Winner, ECCV22 Robust Vision Challenge (RVC) on Semantic Segmentation

  • Winner, CVPR18 Autonomous Driving Challenge (WAD) on Domain Adaptation

  • 2nd Place, ICMI15 EmotiW Challenge on Static Facial Expression Recognition

  • Best Paper Award, BMVC 2020

  • Best Paper Award, WACV 2015

  • Best Student Paper Award, ISCSLP 2014

Work Experience

NVIDIA (Santa Clara, CA)
Principal Research Scientist & Research Lead
I conduct research on multimodal learning and intelligent data strategies. I lead the Eagle VLM project, which develops a family of frontier vision-language models with public training and data recipes and state-of-the-art performance that matches or outperforms existing top-tier VLMs. Our work has laid the core VLM foundation and data strategy behind several flagship NVIDIA products and projects, including Llama-Nemotron-VL, NeMo Retriever Multimodal Embedding, GR00T N1, and GR00T N1.5.


2018.01 - Present

Mitsubishi Electric Research Laboratories (Cambridge, MA)
Research Intern, Computer Vision Group
Proposed a state-of-the-art deep learning framework for semantic edge detection.

2016.07 - 2016.10

Microsoft Research (Redmond, WA)
Research Intern, Multimedia, Interaction, and Communication (MIC) Group
Worked on deep-learning-based facial expression recognition; the work was integrated into Azure Cognitive Services (Media Coverage).

2015.06 - 2015.08

Adobe Research (San Jose, CA)
Research Intern, Computer Vision Group
Worked on voice-based photo editing.

2013.06 - 2013.08

Education

Carnegie Mellon University

Ph.D. in Electrical and Computer Engineering

2012 - 2017

The Hong Kong University of Science and Technology

M.Phil. in Electronic and Computer Engineering

2009 - 2012

South China University of Technology

B.Eng. in Information Engineering (Talented Student Program)

2005 - 2008


Thank you for visiting this site!
