<p dir="ltr">Medical imaging research is hampered by the scarcity of labeled data for rare pathologies and underserved populations, leading conventional generators to produce biased or anatomically implausible synthetic samples. We introduce IJEPA-Diffusion-GAN, a hybrid framework that marries the semantic grounding of the Image-based Joint-Embedding Predictive Architecture (IJEPA) with the sample-quality advantages of diffusion models and the sampling speed of Generative Adversarial Networks. IJEPA embeddings, obtained from unlabeled MRI and chest X-ray corpora, condition every stage of training, while a curriculum-based noise schedule first solidifies global anatomy and later refines fine details. A noise-level-aware discriminator stabilizes learning and, by fusing demographic metadata into the latent space, encourages more balanced synthesis across age, sex, and ethnicity.</p>
<p dir="ltr">Training in an IJEPA-derived latent space cuts memory requirements by 48% relative to pixel-space diffusion, making experimentation feasible on commodity GPUs. We evaluate the system on two public medical-image benchmarks using Fréchet Inception Distance (FID), Inception Score (IS), Kernel Inception Distance (KID), and precision/recall. Although the hybrid approach narrows the FID gap to state-of-the-art diffusion models for structurally simple classes, IS and precision reveal outstanding challenges in rendering highly heterogeneous lesions. These results highlight where semantic conditioning helps and where it falls short, guiding future architectural refinements. By embedding anatomy-aware semantics into hybrid generative dynamics, IJEPA-Diffusion-GAN takes a step toward resource-efficient, bias-aware medical image synthesis that can ultimately broaden data access without compromising patient privacy.</p>
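<p dir="ltr">To make the curriculum-based noise schedule concrete, the following is a minimal sketch (not the authors' released code) of one way to bias the sampled diffusion timestep from high-noise steps (global anatomy) early in training toward low-noise steps (fine detail) later on; the names <code>sample_timesteps</code> and <code>progress</code> are hypothetical.</p>
<pre>
# Minimal sketch, assuming a standard discrete-timestep diffusion setup.
import torch

def sample_timesteps(batch_size: int, num_steps: int, progress: float) -> torch.Tensor:
    """progress in [0, 1]: 0 favors large t (coarse structure),
    1 favors small t (fine detail)."""
    t = torch.arange(num_steps, dtype=torch.float32) / num_steps
    # Gaussian weighting whose center slides from t ~ 1 down to t ~ 0 as training progresses.
    weights = torch.exp(-((t - (1.0 - progress)) ** 2) / 0.1)
    probs = weights / weights.sum()
    return torch.multinomial(probs, batch_size, replacement=True)

# Early in training (progress=0.1), most sampled timesteps are large (high noise).
print(sample_timesteps(batch_size=8, num_steps=1000, progress=0.1))
</pre>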
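<p dir="ltr">The noise-level-aware discriminator with demographic conditioning could look roughly like the sketch below; this is our illustrative assumption rather than the paper's architecture, with hypothetical layer sizes and input names.</p>
<pre>
# Minimal sketch: a latent-space discriminator conditioned on the diffusion
# noise level and a demographic metadata vector, fused additively before the
# real/fake head.
import torch
import torch.nn as nn

class NoiseAwareDiscriminator(nn.Module):
    def __init__(self, latent_dim: int, meta_dim: int, hidden: int = 256):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
        )
        self.noise_embed = nn.Linear(1, hidden)   # scalar noise level
        self.meta_embed = nn.Linear(meta_dim, hidden)  # age/sex/ethnicity encoding
        self.head = nn.Linear(hidden, 1)

    def forward(self, z, noise_level, metadata):
        h = self.encode(z)
        h = h + self.noise_embed(noise_level.unsqueeze(-1)) + self.meta_embed(metadata)
        return self.head(h)  # real/fake logit per sample

# Example usage with hypothetical dimensions.
disc = NoiseAwareDiscriminator(latent_dim=768, meta_dim=3)
logits = disc(torch.randn(4, 768), torch.rand(4), torch.randn(4, 3))
</pre>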
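<p dir="ltr">For the reported metrics, a minimal evaluation sketch using the <code>torchmetrics</code> library is shown below; this is a tooling assumption on our part (the abstract does not specify the metric implementation), with random tensors standing in for real and generated images.</p>
<pre>
# Requires torchmetrics[image] (pulls in torch-fidelity for FID/KID).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

real = torch.randint(0, 256, (100, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (100, 3, 299, 299), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)

fid.update(real, real=True)
fid.update(fake, real=False)
kid.update(real, real=True)
kid.update(fake, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item(), "+/-", kid_std.item())
</pre>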