About

I’m a Senior Applied Scientist at Apple, where I work on Apple Intelligence and Visual Intelligence and contribute to the Apple Foundation Model (AFM). My work centers on multimodal foundation models and their deployment in real-world applications, with expertise in 3D and multimodal scene understanding. I build advanced AI systems that integrate vision, language, and spatial reasoning.

Previously, I addressed the challenge of limited labeled 3D data by developing weakly and self-supervised learning pipelines for images, point clouds, and geometric meshes, resulting in several publications. Before joining Apple, I was a Senior Research Scientist at Fujitsu Research of Europe.

I hold a Ph.D. in 3D computer vision from UCL, London, supervised by Prof. Jan Boehm and in close collaboration with Prof. Tobias Ritschel. In summer 2021, I interned at Adobe in the Creative Intelligence Lab, working on geometrically driven single-image relighting under Dr. Julien Philip.

Publications

VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization

Andrei Atanov, Jesse Allardice, Roman Bachmann, Oğuzhan Fatih Kar, Devon Hjelm, David Griffiths, Peter Fu, Afshin Dehghan, Amir Zamir

ICML 2026

A video tokenizer that represents videos as variable-length coarse-to-fine token sequences, enabling more efficient generative modeling. VideoFlexTok achieves comparable generation quality with a 5x smaller model and generates 10-second videos using 8x fewer tokens than 3D grid tokenizers.

SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan

CVPR 2026

We conduct a comprehensive study of visual structural output capabilities for MLLMs with SO-Bench, covering four visual domains with over 6.5K JSON schemas and 1.8K curated image-schema pairs, revealing persistent gaps in schema-compliant outputs.

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch

ICCV 2025

Building on the Cubify Anything CA-1M dataset, we generate visual question-answer (VQA) pairs using an automated pipeline. We show that reformatting high-quality 3D data in this way allows us to achieve SoTA results on many 3D spatial reasoning benchmarks.

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan

CVPR 2025

We scale 3D object detection to every object in indoor scenes. Our work demonstrates that as we scale to smaller objects, 3D inductive priors become less valuable and a fully transformer-based architecture outperforms SoTA 3D networks.

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir

NeurIPS 2024

A scalable, open-sourced framework for training any-to-any multimodal foundation models across tens of modalities and tasks.

OutCast: Outdoor Single Image Relighting with Cast Shadows

David Griffiths, Tobias Ritschel, Julien Philip

Eurographics 2022

We address the problem of single-image relighting. Our work shows that monocular depth estimators can provide sufficient geometry when combined with our novel 3D shadow map prediction module.

Curiosity-driven 3D Object Detection without Labels

David Griffiths, Jan Boehm, Tobias Ritschel

International Conference on 3D Vision (3DV) 2021

A novel method for self-supervised monocular 3D object detection, achieved through differentiable rendering and a GAN-like critic loss.

Semantic Segmentation of Terrestrial LIDAR Data Using Co-Registered RGB Data

Erick Sanchez, David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021

A pipeline demonstrating that Terrestrial Laser Scanning (TLS) 3D data can be automatically labelled using off-the-shelf 2D semantic segmentation networks. With only a simple projection of a panoramic image, strong results are achieved without any additional training.

Finding Your (3D) Center: 3D Object Detection using a Learned Loss

David Griffiths, Jan Boehm, Tobias Ritschel

European Conference on Computer Vision (ECCV) 2020

We present a novel weakly supervised approach for 3D object detection. Our method can be trained with up to 95% less labeled data and still benefits from unlabeled data.

Improving Public Data for Building Segmentation from Convolutional Neural Networks for Fused Airborne Lidar and image data using active contours

David Griffiths, Jan Boehm

ISPRS Journal of Photogrammetry and Remote Sensing 2019

Manually labelling buildings for segmentation is a time-consuming task. We show that readily available GIS mapping data can be used as training data. We develop a novel pipeline which uses Active Contours to refine coarse polygons into fine per-pixel label maps.

SynthCity: A Large Scale Synthetic Point Cloud

David Griffiths, Jan Boehm

arXiv preprint 2019

We release a synthetic Mobile Laser Scanning (MLS) point cloud named SynthCity. Every point carries a per-class and per-instance label, along with colour, return intensity, an end-of-line indicator and a timestamp.

A Review on Deep Learning Techniques for 3D Sensed Data Classification

David Griffiths, Jan Boehm

Remote Sensing 2019

A comprehensive review paper on deep learning for 3D sensed data classification.

Weighted point cloud augmentation for neural network training data class-imbalance

David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2019

A key issue when training deep neural networks on outdoor point clouds is the inevitable large class imbalance. For example, a typical street scene will contain orders of magnitude more ground points than street furniture points. We develop a novel weighted augmentation scheme to reduce this class imbalance.

Comparison of pre- and self-calibrated camera calibration models for UAS-derived nadir imagery for a SfM application

David Griffiths, Helene Burningham

Progress in Physical Geography: Earth and Environment 2018

Linear topologies can be challenging terrains for SfM pipelines. A key source of error is intrinsic camera distortion. We demonstrate that, through effective camera pre-calibration, these distortions can be significantly reduced.

Rapid object detection systems, utilising deep learning and unmanned aerial systems (UAS) for civil engineering applications

David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2018

An experimental assessment of the ability to train a deep CNN-based object detector (RetinaNet / Faster R-CNN) on a small quantity of training data, specifically in the context of repetitive features (railway track).

Patents

Computer-implemented method, data processing apparatus and computer program for object detection

David Griffiths

JP, US US20230298335A1 • Fujitsu Ltd • Filed: January 25, 2023

This patent presents a novel computer-implemented method for training object detection models without requiring manual annotations. The approach leverages embedding techniques and unsupervised learning to automatically identify and classify objects in images, significantly reducing the time and cost associated with traditional supervised training methods.

Directional Editing of Digital Images

David Griffiths, Julien Philip

US US11972512B2 • Adobe Inc. • Filed: January 25, 2022

This patent describes methods for directional editing of digital images using machine learning techniques. The system enables users to make targeted modifications to specific regions or aspects of images while preserving overall image quality and coherence through advanced neural network architectures.