
[Figure: gallery of icon and font samples, labeled in order. Icons: Sun, Sun, Emoji, Spotlight, Water, Cancel, Eye, Map Pin, Christmas Ornament, Play, Heart, Bag, Settings, Cloud, Watch, Microphone. Glyphs: "y in regular font", "o in regular font", "u in regular font", "capital l in normal font", "0 in bolditalic font", "capital o in italic font", "capital k in regular font", "a in bold font", "w in regular font", "e in regular font", "s in regular font", "0 in cond regular font", "m in italic font", "e in semi-condensed extrabold font", followed by "Exclamation Mark (Icon)".]
Scalable Vector Graphics (SVG) is a popular format on the web and in the design industry. However, despite the great strides made in generative modeling, SVG has remained underexplored due to the discrete and complex nature of such data. We introduce GRIMOIRE, a text-guided SVG generative model comprising two modules: a Visual Shape Quantizer (VSQ), which learns to map raster images onto a discrete codebook by reconstructing them as vector shapes, and an Auto-Regressive Transformer (ART), which models the joint probability distribution over shape tokens, positions, and textual descriptions, allowing us to generate vector graphics from natural language. Unlike existing models that require direct supervision from SVG data, GRIMOIRE learns shape image patches using only raster image supervision, which opens up vector generative modeling to significantly more data. We demonstrate the effectiveness of our method by fitting GRIMOIRE for closed filled shapes on MNIST and for outline strokes on icon and font data, surpassing previous image-supervised methods in generative quality and the vector-supervised approach in flexibility.
Grimoire supports both generation from text and completion of partially drawn objects. In the latter case, one or more shapes drawn at given positions on a canvas are encoded with the pre-trained VSQ module to obtain the closest codes learned during training. This conditioning code sequence, together with the original positions, is then passed to the auto-regressive model along with the text description. The rest of the decoding pipeline remains the same. An overview of the two approaches is illustrated above. We report a series of qualitative completions to show how the network's predictions change or align as the number of conditioning shapes increases. Moreover, Grimoire could easily be extended to also perform fill-in-the-middle tasks.
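The completion flow described above can be sketched in a few lines of plain Python. This is only an illustrative toy, not the released implementation: the function names (`nearest_code`, `build_prefix`, `complete`), the Euclidean nearest-code lookup, the `(kind, value)` token layout with interleaved shape and position tokens, and the stand-in `toy_next_token` model are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, object]  # hypothetical (kind, value) token layout


def nearest_code(embedding: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to `embedding` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(embedding, codebook[i]))


def build_prefix(text_tokens, shape_embeddings, positions, codebook) -> List[Token]:
    """Text tokens, then one (shape code, position) pair per already-drawn shape."""
    prefix: List[Token] = [("text", t) for t in text_tokens]
    for emb, pos in zip(shape_embeddings, positions):
        prefix.append(("shape", nearest_code(emb, codebook)))
        prefix.append(("pos", pos))
    return prefix


def complete(prefix: List[Token], next_token: Callable[[List[Token]], Token],
             max_new_tokens: int, eos: Token = ("eos", 0)) -> List[Token]:
    """Greedy autoregressive loop: append predicted tokens until EOS or the budget."""
    seq = list(prefix)
    for _ in range(max_new_tokens):
        tok = next_token(seq)
        if tok == eos:
            break
        seq.append(tok)
    return seq


def toy_next_token(seq: List[Token]) -> Token:
    """Stand-in for the trained transformer: adds one more shape, then stops."""
    n_shapes = sum(1 for kind, _ in seq if kind == "shape")
    if n_shapes < 2:
        return ("shape", 2)
    if seq[-1][0] == "shape":
        return ("pos", 5)
    return ("eos", 0)


# Toy 2-D codebook; a real codebook would be learned by the quantizer.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
prefix = build_prefix(["sun"], [(0.9, 0.1)], [3], codebook)
completed = complete(prefix, toy_next_token, max_new_tokens=10)
```

Here the single drawn shape is snapped to its nearest code and placed, with its position, after the text tokens; the decoding loop is then the same as for generation from text alone, only starting from a longer prefix.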
[Figure: icon samples, labeled in order: Phone, Dice, Eye, Mask, Check, User, Photo, Glasses, Arrow, Lock, Conversation, Alarm, Glass, Tooth, Shield, Bottle, Tape, Search, Sun, User, Document, Bell, Cube, Smile, Thermometer, Glass, Lock, Headphones, Calendar, Arrow, Sea, Anchor, Folder, User, Apple, Sun, Anchor, Mouse, Star, Plus, Share, Mountain, Fingers, Church, Male, Hourglass, Avatar, Crown.]
@inproceedings{CiprianoFeuerpfeilDeMelo2025VectorGrimoire,
title={Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision},
author={Marco Cipriano and Moritz Feuerpfeil and Gerard de Melo},
booktitle={Proceedings of ICML 2025},
year={2025},
}