About

I’m a Senior Applied Scientist at Apple, where I work on Apple Intelligence, Visual Intelligence, and contribute to the Apple Foundation Model (AFM). My work centers on multimodal foundation models and their deployment in real-world applications, with expertise in 3D and multimodal scene understanding. I build advanced AI systems that integrate vision, language, and spatial reasoning.

Previously, I addressed the challenge of limited labeled 3D data by developing weakly and self-supervised learning pipelines for images, point clouds, and geometric meshes, resulting in several publications. Before joining Apple, I was a Senior Research Scientist at Fujitsu Research of Europe.

I hold a Ph.D. in 3D computer vision from UCL, London, supervised by Prof. Jan Boehm in close collaboration with Prof. Tobias Ritschel. In summer 2021, I interned at Adobe in the Creative Intelligence Lab, working on geometrically-driven single-image relighting under Dr. Julien Philip.

Download CV

 

Publications

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch

ICCV 2025

Building on the Cubify Anything CA-1M dataset, we generate question-answer pairs for VQA using an automated pipeline. We show that reformatting high-quality 3D data in this way achieves state-of-the-art results on many 3D spatial reasoning benchmarks.

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan

CVPR 2025

We scale 3D object detection to every object in indoor scenes. Our work demonstrates that as we scale to smaller objects, 3D inductive priors become less valuable and a fully transformer-based architecture outperforms state-of-the-art 3D networks.

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir

NeurIPS 2024

A scalable, open-sourced framework for training any-to-any multimodal foundation models across tens of modalities and tasks.

OutCast: Outdoor Single Image Relighting with Cast Shadows

David Griffiths, Tobias Ritschel, Julien Philip

Eurographics 2022

We address the problem of outdoor single-image relighting. Our work shows that monocular depth estimators can provide sufficient geometry when combined with our novel 3D shadow map prediction module.

Curiosity-driven 3D Object Detection without Labels

David Griffiths, Jan Boehm, Tobias Ritschel

International Conference on 3D Vision (3DV) 2021

A novel method for self-supervised monocular 3D object detection. This is achieved through differentiable rendering and a GAN-like critic loss.

Semantic Segmentation of Terrestrial LIDAR Data Using Co-Registered RGB Data

Erick Sanchez, David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021

A pipeline demonstrating that Terrestrial Laser Scanning (TLS) 3D data can be automatically labeled using off-the-shelf 2D semantic segmentation networks. With only a simple projection from a panoramic image, strong results can be generated with no additional training.
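The core idea of transferring 2D labels to 3D points can be sketched as below. This is an illustrative assumption of the projection step, not the paper's exact implementation: it assumes an equirectangular panorama co-registered with the scanner frame, and the function and array names are invented for the example.

```python
import numpy as np

def label_points_from_panorama(points, seg_map):
    """Transfer per-pixel class labels from a 2D semantic segmentation of a
    co-registered equirectangular panorama onto TLS points.

    points: (N, 3) XYZ coordinates in the scanner frame.
    seg_map: (H, W) integer class ids from an off-the-shelf 2D network.
    Returns: (N,) per-point class labels.
    """
    h, w = seg_map.shape
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    # Spherical angles of each point as seen from the scanner origin.
    azimuth = np.arctan2(y, x)                                   # [-pi, pi]
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
    # Map spherical angles to equirectangular pixel coordinates.
    u = ((azimuth + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    v = ((np.pi / 2 - elevation) / np.pi * (h - 1)).astype(int)
    return seg_map[v, u]
```

Because the lookup is a pure projection, no 3D training data is needed: the quality of the point labels is bounded only by the 2D network and the co-registration accuracy.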

Finding Your (3D) Center: 3D Object Detection Using a Learned Loss

David Griffiths, Jan Boehm, Tobias Ritschel

European Conference on Computer Vision (ECCV) 2020

We present a novel weakly-supervised approach for 3D object detection. Our method can be trained with up to 95% less labeled data while still benefiting from unlabeled data.

SynthCity: A Large Scale Synthetic Point Cloud

David Griffiths, Jan Boehm

arXiv preprint 2019

We release a synthetic Mobile Laser Scanning (MLS) point cloud named SynthCity. Every point carries a per-class and per-instance label, along with colour, return intensity, an end-of-line indicator, and time.

Weighted point cloud augmentation for neural network training data class-imbalance

David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2019

A key issue when training deep neural networks on outdoor point clouds is severe class imbalance: a typical street scene contains orders of magnitude more ground points than street furniture. We develop a novel weighted augmentation scheme to reduce this class imbalance.
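The weighting idea can be sketched as follows. This is a minimal illustration under assumptions: the inverse-frequency weighting and the function names are mine, not the paper's exact formulation, and the augmentation itself (jitter, rotation, etc.) is left out.

```python
import numpy as np

def augmentation_weights(labels):
    """Per-point sampling weight, inversely proportional to class frequency,
    so rare classes (e.g. street furniture) are augmented far more often
    than dominant ones (e.g. ground)."""
    classes, counts = np.unique(labels, return_counts=True)
    inv_freq = 1.0 / counts
    class_weight = inv_freq / inv_freq.sum()     # normalise over classes
    lookup = dict(zip(classes, class_weight))
    return np.array([lookup[c] for c in labels])

def weighted_oversample(points, labels, n_samples, rng=None):
    """Draw an augmentation batch biased toward under-represented classes."""
    rng = rng or np.random.default_rng(0)
    w = augmentation_weights(labels)
    idx = rng.choice(len(points), size=n_samples, p=w / w.sum())
    # In a real pipeline the selected points would be perturbed here
    # (jitter, rotation, scaling); we simply return the selection.
    return points[idx], labels[idx]
```

With this weighting, a class holding 10% of the points receives roughly the same share of the augmentation budget as a class holding 90%, counteracting the imbalance seen by the network during training.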

 

Patents

Computer-implemented method, data processing apparatus and computer program for object detection

David Griffiths

JP, US (US20230298335A1) • Fujitsu Ltd • Filed: January 25, 2023

This patent presents a novel computer-implemented method for training object detection models without requiring manual annotations. The approach leverages embedding techniques and unsupervised learning to automatically identify and classify objects in images, significantly reducing the time and cost associated with traditional supervised training methods.

Directional Editing of Digital Images

David Griffiths, Julien Philip

US (US11972512B2) • Adobe Inc. • Filed: January 25, 2022

This patent describes methods for directional editing of digital images using machine learning techniques. The system enables users to make targeted modifications to specific regions or aspects of images while preserving overall image quality and coherence through advanced neural network architectures.