VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
arXiv 2026
A video tokenizer that represents videos as variable-length coarse-to-fine token sequences, enabling more efficient generative modeling. VideoFlexTok achieves comparable generation quality with a 5x smaller model and generates 10-second videos using 8x fewer tokens than 3D grid tokenizers.