PixNerd: Pixel Neural Field Diffusion

05/08/2025

Daily Paper Cast


🤗 Upvotes: 33 | cs.CV

Authors:
Shuai Wang, Ziteng Gao, Chenhui Zhu, Weilin Huang, Limin Wang

Title:
PixNerd: Pixel Neural Field Diffusion

Arxiv:
http://arxiv.org/abs/2507.23268v2

Abstract:
The current success of diffusion transformers heavily depends on the compressed latent space shaped by the pre-trained variational autoencoder (VAE). However, this two-stage training paradigm inevitably introduces accumulated errors and decoding artifacts. To address these problems, researchers have returned to pixel space at the cost of complicated cascade pipelines and increased token complexity. In contrast to these efforts, we propose to model the patch-wise decoding with a neural field and present a single-scale, single-stage, efficient, end-to-end solution, coined pixel neural field diffusion (PixNerd). Thanks to the efficient neural field representation in PixNerd, we directly achieved 2.15 FID on ImageNet $256\times256$ and 2.84 FID on ImageNet $512\times512$ without any complex cascade pipeline or VAE. We also extend our PixNerd framework to text-to-image applications. Our PixNerd-XXL/16 achieved a competitive 0.73 overall score on the GenEval benchmark and 80.9 overall score on the DPG benchmark.
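
To make "patch-wise decoding with a neural field" more concrete, here is a minimal NumPy sketch of the general idea: each patch token is mapped to the weights of a tiny MLP, and that MLP is evaluated at every pixel coordinate inside the patch. This is an illustration under stated assumptions, not the authors' architecture. The function names, the two-layer MLP, the layer sizes, and the random `w_hyper` projection are all hypothetical, and reading the "/16" in PixNerd-XXL/16 as a patch size of 16 is an assumption based on common naming conventions.

```python
import numpy as np

def patch_coords(patch_size):
    """Normalized (y, x) coordinates in [-1, 1] for every pixel of a square patch."""
    axis = np.linspace(-1.0, 1.0, patch_size)
    yy, xx = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=-1)  # (P*P, 2)

def decode_patches(tokens, w_hyper, patch_size, hidden=32, channels=3):
    """Decode each token into a pixel patch with a token-specific coordinate MLP.

    tokens:  (N, D) array, one feature vector per patch (hypothetical transformer output).
    w_hyper: (D, 2*hidden + hidden*channels) linear map that emits per-patch MLP weights
             (a hypothetical stand-in for a learned projection).
    """
    coords = patch_coords(patch_size)                        # (P*P, 2)
    params = tokens @ w_hyper                                # (N, 2*hidden + hidden*channels)
    w1 = params[:, : 2 * hidden].reshape(-1, 2, hidden)      # first layer:  (N, 2, hidden)
    w2 = params[:, 2 * hidden :].reshape(-1, hidden, channels)  # second layer: (N, hidden, C)
    # Evaluate every patch's MLP at every pixel coordinate (ReLU hidden layer).
    h = np.maximum(np.einsum("pc,nch->nph", coords, w1), 0.0)   # (N, P*P, hidden)
    out = np.einsum("nph,nhk->npk", h, w2)                      # (N, P*P, C)
    return out.reshape(-1, patch_size, patch_size, channels)

def num_tokens(image_size, patch_size):
    """Number of non-overlapping square patches covering a square image."""
    return (image_size // patch_size) ** 2

rng = np.random.default_rng(0)
patch_size, dim, hidden = 16, 64, 32        # assumed sizes, for illustration only
n = num_tokens(256, patch_size)
tokens = rng.standard_normal((n, dim))
w_hyper = rng.standard_normal((dim, 2 * hidden + hidden * 3)) * 0.02
patches = decode_patches(tokens, w_hyper, patch_size, hidden=hidden)
print(n, patches.shape)
```

Under the assumed patch size of 16, a 256×256 image is covered by 256 tokens and a 512×512 image by 1024, so the token count stays modest while the per-pixel detail comes from evaluating the small per-patch network at each coordinate.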

More episodes of "Daily Paper Cast"


Qwen-Image Technical Report

SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

CellForge: Agentic Design of Virtual Cell Models

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

PixNerd: Pixel Neural Field Diffusion

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Phi-Ground Tech Report: Advancing Perception in GUI Grounding