🧙 Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision

Hasso Plattner Institute
ICML 2025

*Indicates Equal Contribution

Icons: Sun; Emoji; Spotlight; Water; Cancel; Eye; Map Pin; Christmas Ornament; Play; Heart; Mail; Bag; Settings; Cloud; Watch; Microphone; Exclamation Mark

Fonts: y in regular font; o in regular font; u in regular font; capital l in normal font; 0 in bolditalic font; capital o in italic font; capital k in regular font; a in bold font; w in regular font; e in regular font; s in regular font; 0 in cond regular font; m in italic font; e in semi - condensed extrabold font

Examples of icons and fonts generated with Grimoire without any shape context. The caption under each image is the conditioning text.
For the fonts, we used a lower-resolution positional quantization than for the icons.
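The caption mentions a coarser positional quantization for fonts than for icons. As a rough illustration, assuming a uniform square grid (not necessarily the scheme used in Grimoire), a continuous shape position can be mapped to a grid cell and flattened into a single position token. Fewer bins per axis means fewer distinct position tokens and coarser placement. The function names, the canvas size of 128, and the bin counts are hypothetical.

```python
from typing import Tuple


def quantize_position(x: float, y: float, canvas_size: float, bins: int) -> Tuple[int, int]:
    """Map a continuous (x, y) point on a square canvas to (column, row) grid cells."""

    def to_bin(v: float) -> int:
        # Clamp to the canvas so out-of-range points land in the border cells.
        v = min(max(v, 0.0), canvas_size)
        # A point exactly on the far edge would give index == bins; fold it into the last cell.
        return min(int(v / canvas_size * bins), bins - 1)

    return to_bin(x), to_bin(y)


def position_token(x: float, y: float, canvas_size: float, bins: int) -> int:
    """Flatten a grid cell into one token id in the range [0, bins * bins)."""
    col, row = quantize_position(x, y, canvas_size, bins)
    return row * bins + col


# Hypothetical settings: a finer grid for icons, a coarser one for fonts.
ICON_BINS, FONT_BINS = 32, 16
print(quantize_position(100.0, 37.5, 128.0, ICON_BINS))  # (25, 9)
print(quantize_position(100.0, 37.5, 128.0, FONT_BINS))  # (12, 4)
```

On the coarser 16-bin grid, the same point collapses into cell (12, 4), i.e. position token 76, so nearby placements that were distinct on the finer grid can share a token.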

Abstract

Scalable Vector Graphics (SVG) is a popular format on the web and in the design industry. However, despite the great strides made in generative modeling, SVG has remained underexplored due to the discrete and complex nature of such data. We introduce GRIMOIRE, a text-guided SVG generative model that consists of two modules: a Visual Shape Quantizer (VSQ) learns to map raster images onto a discrete codebook by reconstructing them as vector shapes, and an Auto-Regressive Transformer (ART) models the joint probability distribution over shape tokens, positions, and textual descriptions, allowing us to generate vector graphics from natural language. Unlike existing models that require direct supervision from SVG data, GRIMOIRE learns shape image patches using only raster image supervision, which opens up vector generative modeling to significantly more data. We demonstrate the effectiveness of our method by fitting GRIMOIRE for closed filled shapes on MNIST and for outline strokes on icon and font data, surpassing previous image-supervised methods in generative quality and the vector-supervised approach in flexibility.
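The abstract describes the VSQ as mapping raster images onto a discrete codebook. A minimal sketch of the lookup step, assuming a generic nearest-neighbour vector quantizer: each continuous per-patch latent is replaced by the index of the closest codebook vector under squared Euclidean distance. The function names, the toy 2-D codebook, and the use of plain Python lists are illustrative assumptions; the VSQ's actual encoder, codebook size, and quantization scheme may differ.

```python
from typing import List, Sequence


def nearest_code(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to z (squared L2 distance)."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        if len(code) != len(z):
            raise ValueError("codebook vectors must match the latent dimension")
        dist = sum((a - b) ** 2 for a, b in zip(z, code))
        if dist < best_dist:  # strict '<' keeps the lowest index on ties
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(latents: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each per-patch latent with the id of its nearest codebook entry."""
    return [nearest_code(z, codebook) for z in latents]


# Hypothetical toy codebook with three 2-D entries.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
print(quantize([[0.9, 1.2], [-0.8, 1.7], [0.1, -0.2]], toy_codebook))  # [1, 2, 0]
```

The resulting integer ids play the role of the discrete shape tokens that the ART then models alongside positions and text.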

Training of our VSQ module

Reconstruction quality

Ground Truth samples.

Reconstructed samples from our VSQ module.

Our ART module

Generation and Completion

Grimoire enables both generation from text and completion of partially drawn objects. For completion, one or more shapes drawn at given positions on a canvas are encoded with the pre-trained VSQ module to obtain the closest codes learned during training. This conditioning code sequence, together with the original positions, is then provided to the auto-regressive model jointly with the text description. The rest of the decoding pipeline remains the same. An overview of the two approaches is illustrated above. We report a series of qualitative completions to show how the network's predictions change or align as the number of conditioning shapes increases. Moreover, Grimoire could easily be extended to also perform fill-in-the-middle tasks.
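Conditioning on partially drawn shapes can be sketched as building a token prefix that the model then continues. The vocabulary layout below (text ids, then shape codes, then position cells, then `BOS`/`EOS` markers), its sizes, the shape-then-position ordering, and the `next_token` callable standing in for the ART are all hypothetical; they illustrate joint conditioning on text, codes, and positions, not Grimoire's actual tokenization. Passing an empty shape list reduces to plain text-conditioned generation, so the same decoding loop serves both cases.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical vocabulary layout: text ids first, then shape codes, then
# position cells, then special markers. Sizes are illustrative only.
TEXT_VOCAB = 1000
NUM_CODES = 512
NUM_POSITIONS = 256
SHAPE_OFFSET = TEXT_VOCAB
POS_OFFSET = SHAPE_OFFSET + NUM_CODES
BOS = POS_OFFSET + NUM_POSITIONS  # marks the start of the shape sequence
EOS = BOS + 1                     # marks the end of the drawing


def build_prefix(text_ids: Sequence[int], shapes: Sequence[Tuple[int, int]]) -> List[int]:
    """Text tokens, then BOS, then one (shape code, position) pair per conditioning shape."""
    seq = list(text_ids) + [BOS]
    for code, pos in shapes:
        if not (0 <= code < NUM_CODES and 0 <= pos < NUM_POSITIONS):
            raise ValueError("shape code or position id out of range")
        seq += [SHAPE_OFFSET + code, POS_OFFSET + pos]
    return seq


def complete(prefix: List[int], next_token: Callable[[List[int]], int], max_new: int) -> List[int]:
    """Greedy autoregressive continuation; next_token stands in for the transformer."""
    seq = list(prefix)
    for _ in range(max_new):
        tok = next_token(seq)
        seq.append(tok)
        if tok == EOS:  # stop once the model closes the drawing
            break
    return seq
```

Here the VSQ codes from the partial drawing only occupy the prefix; everything after it is produced by the same loop used for unconditioned generation.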

BibTeX


        @inproceedings{CiprianoFeuerpfeilDeMelo2025VectorGrimoire,
          title={Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision},
          author={Marco Cipriano and Moritz Feuerpfeil and Gerard de Melo},
          booktitle={Proceedings of ICML 2025},
          year={2025},
        }