Publications

SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan

CVPR 2026

We conduct a comprehensive study of visual structural output capabilities for MLLMs with SO-Bench, covering four visual domains with over 6.5K JSON schemas and 1.8K curated image-schema pairs, revealing persistent gaps in schema-compliant outputs.

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch

ICCV 2025

Building on the Cubify Anything CA-1M dataset, we generate VQA question-answer pairs using an automated pipeline. We show that reformatting high-quality 3D data in this way allows us to achieve state-of-the-art results on many 3D spatial reasoning benchmarks.

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan

CVPR 2025

We scale 3D object detection to every object in indoor scenes. Our work demonstrates that as we scale to smaller objects, 3D inductive priors become less valuable and a fully transformer-based architecture outperforms state-of-the-art 3D networks.

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir

NeurIPS 2024

A scalable, open-source framework for training any-to-any multimodal foundation models across tens of modalities and tasks.

OutCast: Outdoor Single Image Relighting with Cast Shadows

David Griffiths, Tobias Ritschel, Julien Philip

Eurographics 2022

We address the problem of single image relighting. Our work shows that monocular depth estimators can provide sufficient geometry when combined with our novel 3D shadow map prediction module.

Curiosity-driven 3D Object Detection without Labels

David Griffiths, Jan Boehm, Tobias Ritschel

International Conference on 3D Vision (3DV) 2021

A novel method for self-supervised monocular 3D object detection. This is achieved through differentiable rendering and a GAN-like critic loss.

Semantic Segmentation of Terrestrial LIDAR Data Using Co-Registered RGB Data

Erick Sanchez, David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021

A pipeline which demonstrates that Terrestrial Laser Scanning (TLS) 3D data can be automatically labelled using off-the-shelf 2D semantic segmentation networks. With only a simple projection of a panoramic image, strong results can be generated with no additional training.
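
The projection step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an equirectangular panorama centred on the scanner, a z-up frame with azimuth 0 along +x mapping to the centre column, and a per-pixel label image already produced by a 2D segmentation network. The function names `project_to_panorama` and `label_points` are hypothetical.

```python
import math


def project_to_panorama(point, width, height):
    """Map a 3D point (x, y, z) in the scanner frame to (col, row) in an
    equirectangular panorama of size width x height.

    Assumed convention (not from the paper): z is up, azimuth 0 lies along
    +x and maps to the centre column, and row 0 is elevation +90 degrees.
    """
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the scanner origin")
    azimuth = math.atan2(y, x)                          # in [-pi, pi]
    elevation = math.asin(max(-1.0, min(1.0, z / r)))   # in [-pi/2, pi/2]
    u = (azimuth / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - elevation / math.pi) * height
    # Clamp so that the extreme angles still land inside the image.
    col = min(int(u), width - 1)
    row = min(int(v), height - 1)
    return col, row


def label_points(points, label_image):
    """Give each 3D point the class of the panorama pixel it projects onto.

    label_image is a list of rows, each a list of per-pixel class labels.
    """
    height = len(label_image)
    width = len(label_image[0])
    labels = []
    for p in points:
        col, row = project_to_panorama(p, width, height)
        labels.append(label_image[row][col])
    return labels
```

Because the panorama and the scan are assumed to share the same origin, no depth test is needed in this sketch; a real pipeline would also have to handle registration offsets between camera and scanner.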

Finding Your (3D) Center: 3D Object Detection using a Learned Loss

David Griffiths, Jan Boehm, Tobias Ritschel

European Conference on Computer Vision (ECCV) 2020

We present a novel weakly-supervised approach for 3D object detection. Our method can be trained on up to 95% less labeled data and still benefits from unlabeled data.

Improving Public Data for Building Segmentation from Convolutional Neural Networks for Fused Airborne Lidar and image data using active contours

David Griffiths, Jan Boehm

ISPRS Journal of Photogrammetry and Remote Sensing 2019

Manually labelling buildings for segmentation is a time-consuming task. We show that readily available GIS mapping data can be used as training data. We develop a novel pipeline which uses Active Contours to refine coarse polygons into fine per-pixel label maps.

SynthCity: A Large Scale Synthetic Point Cloud

David Griffiths, Jan Boehm

arXiv preprint 2019

We release a synthetic Mobile Laser Scanning (MLS) point cloud named SynthCity. Every point has a per-class and per-instance classification, along with colour, return intensity, end-of-line indicator and time.

A Review on Deep Learning Techniques for 3D Sensed Data Classification

David Griffiths, Jan Boehm

Remote Sensing 2019

A comprehensive review paper on deep learning for 3D sensed data classification.

Weighted point cloud augmentation for neural network training data class-imbalance

David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2019

A key issue when training deep neural networks for outdoor point clouds is the inevitable large data imbalance. For example, a typical street scene will contain orders of magnitude more ground points than street furniture. We develop a novel solution that applies a weighted augmentation to reduce the class imbalance.
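
As a rough illustration of the idea, and not the method from the paper, the sketch below weights the amount of augmentation by class rarity: each class receives enough jittered copies of its own points to match the most frequent class. The function names, the Gaussian jitter, and the default `sigma` are all assumptions made for this example.

```python
import random
from collections import Counter


def augmentation_counts(labels):
    """Extra samples needed per class to match the most frequent class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def augment_rare_classes(points, labels, sigma=0.01, seed=0):
    """Append jittered copies of points from under-represented classes.

    sigma is the standard deviation of the Gaussian jitter, in the units of
    the point coordinates (hypothetical default). Returns new lists and
    leaves the inputs unchanged.
    """
    rng = random.Random(seed)
    by_class = {}
    for p, cls in zip(points, labels):
        by_class.setdefault(cls, []).append(p)

    new_points = list(points)
    new_labels = list(labels)
    for cls, extra in augmentation_counts(labels).items():
        for _ in range(extra):
            x, y, z = rng.choice(by_class[cls])
            new_points.append((
                x + rng.gauss(0.0, sigma),
                y + rng.gauss(0.0, sigma),
                z + rng.gauss(0.0, sigma),
            ))
            new_labels.append(cls)
    return new_points, new_labels
```

For a scene with five ground points and one bench point, the bench class would receive four synthetic copies and the ground class none, giving a balanced set of ten points.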

Comparison of pre- and self-calibrated camera calibration models for UAS-derived nadir imagery for a SfM application

David Griffiths, Helene Burningham

Progress in Physical Geography: Earth and Environment 2018

Linear topologies can be challenging terrains for SfM pipelines. A key source of error is intrinsic camera distortion. We demonstrate that effective camera pre-calibration can significantly reduce these distortions.

Rapid object detection systems, utilising deep learning and unmanned aerial systems (UAS) for civil engineering applications

David Griffiths, Jan Boehm

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2018

An experimental assessment of the ability to train a deep CNN-based object detector (RetinaNet / Faster R-CNN) on a small quantity of training data, specifically in the context of repetitive features (railway track).