Identity Preservation and Content Control in Generative Image Customization
Generative image customization has emerged as a transformative paradigm for context-aware image editing, enabling the seamless integration of reference objects into novel scenes while preserving their identity. It nonetheless faces critical challenges in realism, identity preservation, and artifact-free synthesis. Traditional composition-based methods require labor-intensive manual annotation and multi-stage processing for harmonization, geometry correction, and shadow generation. Recent advances in diffusion models enable self-supervised frameworks that address these tasks holistically, yet they still struggle to preserve fine-grained object identity and to suppress localized generative artifacts (e.g., distorted logos or textures) that undermine fidelity. This thesis bridges these gaps through three interconnected contributions that advance the scalability, fidelity, and controllability of customized image editing.
First, we propose a novel unified framework (ObjectStitch) that leverages conditional diffusion models to automate image compositing. A new training scheme and data augmentation strategy allow the task to be learned without manual labeling. Our approach holistically transforms viewpoint, geometry, color, and shadows while preserving the input object's characteristics via a novel content adapter. To further strengthen identity preservation across diverse contexts, we design a two-stage learning framework (IMPRINT) that decouples identity preservation from compositing: the first stage performs dense-representation learning, extracting view-invariant object embeddings from the reference, while the second, harmonization stage seamlessly integrates the object into the background. In addition, a shape-guidance mechanism enables user-directed layout control.
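To make the conditioning idea concrete, the following is a minimal sketch, not the actual ObjectStitch/IMPRINT implementation, of a hypothetical content adapter: learnable query tokens cross-attend over features of the reference object from a frozen image encoder and distill them into a fixed-length sequence of identity-preserving conditioning tokens for a diffusion denoiser. All module names, dimensions, and the token count are illustrative assumptions.

```python
# Hypothetical content-adapter sketch (illustrative; not the thesis code).
import torch
import torch.nn as nn


class ContentAdapter(nn.Module):
    """Distills per-patch reference features into a compact set of
    conditioning tokens for a diffusion model's cross-attention."""

    def __init__(self, feat_dim=1024, cond_dim=768, num_tokens=8):
        super().__init__()
        # Learnable queries attend over the reference features and
        # summarize them into a fixed-length conditioning sequence.
        self.queries = nn.Parameter(torch.randn(num_tokens, cond_dim))
        self.proj_in = nn.Linear(feat_dim, cond_dim)
        self.attn = nn.MultiheadAttention(cond_dim, num_heads=8, batch_first=True)
        self.proj_out = nn.Linear(cond_dim, cond_dim)

    def forward(self, ref_feats):               # ref_feats: (B, N_patches, feat_dim)
        kv = self.proj_in(ref_feats)            # (B, N, cond_dim)
        q = self.queries.unsqueeze(0).expand(ref_feats.size(0), -1, -1)
        tokens, _ = self.attn(q, kv, kv)        # queries cross-attend over reference
        return self.proj_out(tokens)            # (B, num_tokens, cond_dim)


if __name__ == "__main__":
    adapter = ContentAdapter()
    ref_feats = torch.randn(2, 257, 1024)       # e.g., ViT patch features of the object
    cond = adapter(ref_feats)
    print(cond.shape)                           # torch.Size([2, 8, 768])
    # In a two-stage setup, these identity tokens could be learned first and
    # then held fixed while a harmonization stage is trained to composite the
    # object into the background.
```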
To address generative artifacts, which are pervasive in image synthesis, we present a reference-guided artifact refinement model (Refine-by-Align). Its two-stage framework, alignment followed by refinement, extracts regional features from reference images to repair artifacts in composited outputs. This model-agnostic solution enhances identity details and generalizes across customization, virtual try-on, and view synthesis tasks.
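The sketch below illustrates the general alignment-then-refinement idea under simplifying assumptions; it is not the released Refine-by-Align code. A masked artifact region in the composite is matched to its corresponding reference location via cosine similarity of patch features from a shared encoder, and the refinement stage is left as a stub where a diffusion-based inpainter would regenerate the masked region conditioned on the aligned reference crop. Function names and feature shapes are hypothetical.

```python
# Hypothetical alignment-step sketch (illustrative; not the thesis code).
import torch
import torch.nn.functional as F


def align_reference_region(comp_feats, ref_feats, artifact_mask):
    """comp_feats, ref_feats: (C, H, W) patch features from a shared encoder;
    artifact_mask: (H, W) boolean mask of the artifact in the composite.
    Returns the (row, col) of the best-matching reference location."""
    # Average composite features inside the artifact region -> (C,)
    query = comp_feats[:, artifact_mask].mean(dim=1)
    # Cosine similarity between the query and every reference location.
    ref_flat = F.normalize(ref_feats.flatten(1), dim=0)        # (C, H*W)
    sim = F.normalize(query, dim=0) @ ref_flat                 # (H*W,)
    idx = sim.argmax().item()
    w = ref_feats.shape[-1]
    return divmod(idx, w)


def refine(composite, reference_crop, mask):
    """Placeholder for the refinement stage: a diffusion inpainter would
    regenerate the masked region conditioned on the aligned reference crop."""
    raise NotImplementedError


if __name__ == "__main__":
    comp_feats = torch.randn(256, 32, 32)
    ref_feats = torch.randn(256, 32, 32)
    mask = torch.zeros(32, 32, dtype=torch.bool)
    mask[10:14, 10:14] = True
    print(align_reference_region(comp_feats, ref_feats, mask))  # e.g., (row, col)
```

In practice the correspondence would be denser than a single best-matching location, but the sketch captures the core design choice: the artifact is repaired from features aligned to the reference rather than hallucinated from scratch.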
Together, the contributions form a cohesive pipeline: a self-supervised backbone for compositing, a decoupled framework for identity preservation, and a universal refiner for artifact correction. Extensive experiments and user studies validate our methods’ superiority in realism and faithfulness, establishing new benchmarks for personalized image editing.
Funding
III: Medium: Collaborative Research: Deep Generative Modeling for Urban and Archaeological Recovery
Directorate for Computer & Information Science & Engineering
Elements: Data: U-Cube: A Cyberinfrastructure for Unified and Ubiquitous Urban Canopy Parameterization
Directorate for Computer & Information Science & Engineering
Degree Type
- Doctor of Philosophy
Department
- Computer Science
Campus location
- West Lafayette