On the Latent Representations of Image Generative Models: A Technical, Artistic, Theoretical, and Art Historical Perspective

Ludovica Schaerf

The latent space in image generative models (IGMs) like DALL-E, Midjourney, and Stable Diffusion is an abstract, multidimensional space containing compressed feature values that encode the “recipe” for generating visual content. Studying this space offers insights into how cultural frameworks from training data are integrated through a machinic lens.
We hypothesize that the latent space possesses a dual nature: as a multidimensional space of potentiality and as a multidimensional archive of culture. Viewed as an abstract geometric space, it statistically fits cultural artifacts into a continuous, fluid environment – akin to Deleuze’s plane of immanence – where each point represents a virtuality that can be actualized into an image. Conversely, when considered as an ensemble of neurons – a code – it becomes a constructed repository of potential and actual cultural artifacts, memories, and knowledge, governed by specific rules of visual similarity: an archive.
While architectures like VAEs and GANs inherently form such latent spaces, newer models like diffusion probabilistic models (DPMs) lack a singular, compressed latent space. We studied DPMs to understand how iterative denoising affects the semantic content of their latent spaces, with the aim of determining how the latent space of these models is best defined and how complex it should be.
To exploit the cultural knowledge embedded in latent spaces, we adopted and extended the computer vision technique of disentanglement. This method uses images generated from points in the latent space to find traversal directions along which a specific factor of interest changes. The traversal direction then serves as a vectorial representation of that factor within the model. We leveraged disentanglement to interpret the models’ representations of colors in textile patterns. Our observations indicate that GANs represent colors similarly to human perception, while DPMs do not. We further explored how color representations differ with the domain and period of the training data, for instance between models trained on Indiennes textiles and models trained on Bauhaus designs, and analyzed these discrepancies.
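As an illustration of the general idea of a traversal direction, the following is a minimal sketch and not the method used in this work. It assumes a hypothetical toy generator (`toy_generator`, a random linear map followed by a sigmoid) in place of a trained GAN or DPM, a hand-made "redness" score as the factor of interest, and a simple difference-of-means estimate of the direction (`find_direction`); all of these names and choices are assumptions made for the example.

```python
import numpy as np


def toy_generator(z, W):
    """Hypothetical stand-in for a trained generator: latent codes -> RGB in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ W)))


def redness_score(colors):
    """Assumed factor of interest: red channel minus the mean of green and blue."""
    return colors[:, 0] - colors[:, 1:].mean(axis=1)


def find_direction(latents, scores):
    """Unit vector from the mean low-scoring latent code to the mean high-scoring one."""
    median = np.median(scores)
    high = latents[scores > median].mean(axis=0)
    low = latents[scores <= median].mean(axis=0)
    direction = high - low
    return direction / np.linalg.norm(direction)


def traverse(z, direction, steps):
    """Move a single latent code along the direction by each step size."""
    return np.stack([z + a * direction for a in steps])


rng = np.random.default_rng(0)
latent_dim = 16
W = rng.normal(size=(latent_dim, 3))            # toy generator weights
latents = rng.normal(size=(2000, latent_dim))   # sampled latent points

# Score the generated outputs, then estimate the direction from the latent codes.
scores = redness_score(toy_generator(latents, W))
red_direction = find_direction(latents, scores)

# Walk one latent point along the estimated direction and score the results.
z0 = rng.normal(size=latent_dim)
path = traverse(z0, red_direction, np.linspace(-3.0, 3.0, 7))
path_redness = redness_score(toy_generator(path, W))
print(np.round(path_redness, 3))
```

In this toy setting the score should rise as the latent point moves along `red_direction`. With a real model, the generator would be the trained network and the score would come from the generated images themselves, for example from measured color statistics.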
Through a series of artistic installations, we reflected on the aesthetic nature of the latent space in conjunction with disentangled representations. In an installation inspired by Calvino’s Invisible Cities, we demonstrated the correspondence between Kublai Khan’s atlas of potential cities and a latent space projected along axes of memory, desire, signs, and more. Similarly, we constructed a model with a single latent space and two decoders – one trained on Western art and the other on artworks from Bogotá’s Museum of Modern Art (MAMBO). This approach showcased how a pseudo-geographic exploration of the latent space, coupled with parallel decoded images, can reveal the Western art bias inherent in these models.
I also authored a paper for the School of X that introduces the latent space, its dual nature, and its theoretical role; it will serve as the basis for the introduction to my doctoral thesis. In addition, I published a paper at the VISART workshop at ECCV on using disentanglement to study the representations of colors in textiles, covering technical progress in the field and reflections on computational creativity.