As part of my dissertation thesis, I've surveyed the applications of generative AI in type design. The aim of the survey is to understand how this new technology might influence the font production workflow and the way users apply fonts.
Within the scope of the search, covering projects between the years 2010 and 2024, I've found 80 projects [20] and identified four main AI tasks that can be leveraged in type design. In this article, I present the survey methodology, descriptions of the applications and methods, as well as references to the particular projects.
The application of AI in font technology can sound exciting and scary at the same time: a doom machine that threatens to replace type designers. But let's leave the horror scenarios to the science-fiction genre and look into the current reality of AI application in type design.
Methodology
The methodology of the survey consists of five steps:
Explorative search
Reference search
Selection
Classification
Reasoning
Explorative search: For finding prospective projects, four main resources were used.
To get search results, the following keyword strings were used:
deep learning vector graphics
deep learning font generation
vector font generation
Reference search: Later, the lists of references mentioned in the articles were used as an additional resource for finding prospective articles.
Selection: For simplification, I first surveyed articles focusing on vector graphics and omitted articles that used bitmap image representations.
Classification: To create the four categories, I drew on the prevalent taxonomy of generative AI and on terms repeatedly mentioned in the reviewed articles. For instance, latent space interpolation and few-shot generation are common terms used in generative AI to describe AI tasks.
Reasoning: The reasoning about the application of generative methods in production was deduced from the tasks the models perform at inference.
Interpolation
Every type designer who has stumbled upon a variable font 1 project has faced a frequent issue: some parameters of the variable font masters 2 don't match, which prevents the calculation of the required instances in the design space 3. Unlike variable font interpolation, AI interpolation methods don't necessarily face this issue and don't even require the masters to match in all properties.
Variable font interpolation
In type design, interpolation is used to populate intermediate font styles between two or more font style masters. The key components of interpolation are:
Axes that represent a specific design variation, e.g. weight, width, or slant
Masters that represent extreme design along one or more axes, e.g. lightest and heaviest on the weight axis
Instances that represent a specific style, populated by calculation from the masters at a given position on the axes.
The interpolation process calculates the intermediate position Pi of a glyph outline's control point P with coordinates (x, y) at a given value v along an axis. The linear interpolation 4 formula commonly used in vector graphics is Pi(v) = Pmin + v ⋅ (Pmax − Pmin) 5, where:
Pmin is the position of the control point in the master at the minimum value of the axis.
Pmax is the position of the control point in the master at the maximum value of the axis.
v is the normalised value along the axis, ranging from 0 to 1.
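To make the formula concrete, here is a minimal Python sketch of the calculation, assuming two point-compatible masters (the same number of control points in the same order). The Point alias, the lerp_glyph function, and the sample coordinates are illustrative, not taken from any specific font tool.

```python
# A minimal sketch of variable-font-style interpolation, assuming two
# point-compatible masters. Names and coordinates are illustrative.

from typing import List, Tuple

Point = Tuple[float, float]  # a control point as (x, y)

def lerp_glyph(master_min: List[Point], master_max: List[Point], v: float) -> List[Point]:
    """Compute Pi(v) = Pmin + v * (Pmax - Pmin) for every control point."""
    if len(master_min) != len(master_max):
        raise ValueError("Masters must have matching control point counts.")
    return [
        (x0 + v * (x1 - x0), y0 + v * (y1 - y0))
        for (x0, y0), (x1, y1) in zip(master_min, master_max)
    ]

# Example: a stem drawn in a Light master and a Black master,
# instanced at v = 0.5 (a hypothetical Regular weight).
light = [(100.0, 0.0), (140.0, 0.0), (140.0, 700.0), (100.0, 700.0)]
black = [(80.0, 0.0), (220.0, 0.0), (220.0, 700.0), (80.0, 700.0)]
print(lerp_glyph(light, black, 0.5))
```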
Font interpolation with AI
Unlike variable font interpolation, the AI model doesn’t calculate the position of control points between two font masters directly but leverages the representation of fonts in a so-called latent space. The objects within a latent space (in our case, the fonts) are encoded as multi-dimensional vectors that can be interpolated. In other words, the latent space is like a mind that envisions what an interpolated shape between given fonts might look like.
For latent space font interpolation, the same linear interpolation formula Pi(v) = Pmin + v ⋅ (Pmax − Pmin) is usually used. However, the common notation of the formula in machine learning is fip = (1 − λ) ⋅ f(a) + λ ⋅ f(b) 6, where:
fip represents the interpolated latent vector
f(a) and f(b) represent the two fonts
λ represents a parameter ranging from 0 to 1 that controls the interpolation level
Even though latent space interpolation allows the interpolation between shapes with various amounts of control points, the quality of results depends heavily on topological similarities. In other words, interpolating between radically different shapes (e.g. transitional antiqua and English roundhand script) will likely give birth to some truly horrific creatures of typographic nightmares.
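The machine learning notation translates to code just as directly. Below is a minimal sketch of latent space interpolation, assuming the latent vectors come from a trained encoder such as the autoencoders discussed in the related works; the vector size and the random stand-in vectors are illustrative.

```python
# A minimal sketch of latent space interpolation. The latent vectors here
# are random stand-ins for the output of a trained encoder, not a real API.

import numpy as np

def interpolate_latents(f_a: np.ndarray, f_b: np.ndarray, lam: float) -> np.ndarray:
    """fip = (1 - lambda) * f(a) + lambda * f(b)"""
    return (1.0 - lam) * f_a + lam * f_b

# Pretend latent vectors of two fonts (in practice produced by an encoder).
rng = np.random.default_rng(0)
f_a = rng.normal(size=256)  # latent vector of font A
f_b = rng.normal(size=256)  # latent vector of font B

# Sample five intermediate positions along the interpolation path; each
# result would then be passed to a decoder to synthesise the outlines.
for lam in np.linspace(0.0, 1.0, 5):
    f_ip = interpolate_latents(f_a, f_b, lam)
    print(lam, f_ip[:3])  # decoder(f_ip) would generate the glyphs
```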
Related works
Font-MF [3] employs a Gaussian process latent variable model (GP-LVM) [13] to represent a manifold [9:157] of 46 fonts from the Google Fonts library [1]. The manifold (latent space) works here like a map of the fonts, and the interpolation is performed by navigating through this map 7. Hence, interpolation is not performed between two fonts through latent space but directly within the latent space by traversing it.
The Font-MF [3] method has the potential for type foundries to exploit their existing fonts as a tool to explore new ideas. The interface of the manifold map is arbitrary and can be replaced by the commonly known sliders representing the axes of their fonts. The limitation of the Font-MF system lies in its requirement for the same topological structure of the fonts. That means horrific typographic creatures are not allowed.
DeepSVG [5] was among the first to apply deep learning-based methods to vector graphics. DeepSVG brought latent space operations to interpolate between two vector graphics in order to provide vector animations between two SVG images. The system leverages the autoencoder [9:499] technique, which involves two components. First, the encoder works like the memory of the system, holding a learned representation of all the fonts it has seen. The encoder is tasked with producing two latent vectors that represent its recollection of the two given fonts. The two vectors are linearly interpolated, and the values are forwarded to the second network, the decoder. In the end, the decoder generates a new font: the interpolated instance of the master fonts. The latent vectors f(a) and f(b) represent hierarchical SVG command structures. Since DeepSVG comes with a robust animation system, it can be used not only for the generation of ideas but also to visualise those ideas as animations between given fonts.
DeepVecFont [29], inspired by the success of DeepSVG [5], uses a similar technique. Additionally, the DeepVecFont latent space memorises both the control points (the sequential representation) and raster images of the glyphs (the raster image representation) 8. The interpolated latent space position is fed to decoders and later polished by a neural differentiable rasteriser that refines the initially synthesised glyphs to match the target style more closely. The latent vectors f(a) and f(b) represent font styles and structural characteristics. Unlike DeepSVG [5], which generates SVGs, DeepVecFont is capable of generating complete glyphs, which makes it more suitable for prototyping new font ideas.
SVGformer [4] uses a transformer-based model with geometric self-attention to capture both semantic and geometric relationships in SVG data. The latent vectors f(a) and f(b) capture continuous geometric and semantic features of SVGs.
VecFontSDF [31], instead of encoding curves to the latent space directly, uses raster images to calculate the signed distance function (SDF) of parabolic curves for each glyph. Then, the parabolic curves are converted into Bézier curves to create the final vector glyphs. Although the system generates appealing shapes, the loss of the original Bézier drawings is costly, making the system unsuitable for font production.
DeepVecFont-v2 [30], DualVector [16], and VecFusion [27] are descendants of the dual-modality idea of DeepVecFont [29]. The image features and sequence features are likewise combined into a unified latent space representation. After the interpolation, the interpolated feature is fed to both the image and sequence decoders and refined. Since all three systems build on the type design tool FontForge, implementing their interpolation techniques into the type design process is more straightforward than with the previous systems.
Interpolation Summary
Font interpolation with an AI model can be an excellent facility for a type designer. Such a tool helps to come up with new fonts out of already existing designs. It opens new ways of generating ideas for fonts: designs that loosely resemble the old fonts yet differ markedly from their forerunners. The most promising design can then be selected by a type designer for further development. This method can be beneficial especially in commissioned projects, where client briefs are closely related to existing typeface styles.
Font Completion
Extending a font family to other languages is a common task for a big brand localising its visual communication, and one line of machine learning research aims to help with it. Font completion, also known as few-shot font generation or font style transfer, aims to complete a whole font alphabet using only a few reference glyphs.
Font completion methods
Font completion methods typically use encoder-decoder architectures with additional refinement stages and auxiliary modules. A minimal sketch of this pattern follows the phase descriptions below.
During the training phase:
Encoders are trained to extract glyph shapes and styles from data and encode them into latent representations. These representations capture the essential style and structural information of the glyphs.
Decoders are trained to generate the whole font by identifying the style from the latent representations derived from reference glyphs. They use these representations to produce glyphs that maintain the style and structure of the reference set.
During the inference phase:
Encoders extract style features from reference glyphs and encode them into latent representations. These latent representations encapsulate the style information needed for generating new glyphs.
Decoders aggregate the latent representations from the encoders and use them to generate the entire font set in the style of the reference glyphs.
Specific details and exceptions are explained in the related works section.
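Here is the promised sketch of the encoder-decoder pattern, assuming raster glyph inputs for brevity. The StyleEncoder and GlyphDecoder modules, the layer sizes, and the averaging of reference latents are illustrative assumptions, not a specific published architecture.

```python
# A minimal sketch of few-shot font completion: a style encoder aggregates
# latents from a few reference glyphs, and a decoder generates the full set.
# Architecture details are illustrative; real systems add refinement stages.

import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Encodes a 64x64 glyph image into a style latent vector."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class GlyphDecoder(nn.Module):
    """Generates a glyph image from a style latent plus a character embedding."""
    def __init__(self, latent_dim: int = 128, num_chars: int = 26):
        super().__init__()
        self.char_embedding = nn.Embedding(num_chars, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, style, char_ids):
        cond = torch.cat([style, self.char_embedding(char_ids)], dim=1)
        return self.net(cond)

# Inference: extract style from a few reference glyphs, then generate the rest.
encoder, decoder = StyleEncoder(), GlyphDecoder()
references = torch.rand(3, 1, 64, 64)                   # the designer's first 3 glyphs
style = encoder(references).mean(dim=0, keepdim=True)   # aggregate reference styles
target_chars = torch.arange(26)                         # generate the full A-Z set
glyphs = decoder(style.expand(26, -1), target_chars)
print(glyphs.shape)  # torch.Size([26, 1, 64, 64])
```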
Related works
In our table, works related to the font completion task have been found since 2010, starting with morphable template models [25] that complete fonts from users' skeleton sketches. It is also important to mention the repository aggregating few-shot font generation projects created by Song Park [22]. [11] proposed an end-to-end font style transfer system that generates a large-scale Chinese font with only a small number of training samples. One of the few-shot approaches trained a stacked conditional GAN model to generate a set of multi-content glyph images following a consistent style from a few input samples. [6] focused on compositional scripts and proposed a font generation framework that employed memory components and global-context awareness in the generator to take advantage of the compositionality. [2] proposed a framework that introduced deep metric learning to style encoders. [14] proposed a GAN based on a translator that can generate fonts by observing only a few samples from other languages. [21] proposed a model using the initial, middle, and final components of Hangul. [23] employed an attention system of so-called local experts to extract multiple style features not explicitly conditioned on component labels. [24] introduced a deep implicit representation of fonts, which is different from any of the other projects, whose data representations remained raster; however, the generation still ended with raster images. [26] proposed an approach that learns fine-grained local styles from references together with the spatial correspondence between the content and reference glyphs. [18] proposed a model that simultaneously learns to separate font styles in an embedding space, where distances directly correspond to a measure of font similarity, and to translate input images into a given observed or unobserved font style. [15] proposed a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder conditioned jointly on the glyph image and the corresponding stroke labels. [10] proposed a cross-lingual font generator based on AGIS-Net [8].
The above-mentioned attempts remain in the raster domain, which is not considered suitable for font industry applications 9.
FontCycle [32] leverages a graph-based representation to extract style features from raster images and graph transformer networks to generate complete fonts from a few image samples. Although the graph-based representation seems to be a prospective approach to encoding the spatial information of glyph shapes, the absence of the original Bézier curve drawings is a limitation that makes the system lag behind the other approaches.
DeepVecFont [29] works with a dual-modality representation that efficiently exploits advances in the raster image domain alongside the vector outlines suitable for type design. These features make DeepVecFont a hot candidate for type designers, helping them complete their fonts from a few initially provided glyphs.
VecFontSDF [31] calculates quadratic Bézier curves from raster image input by exploiting the signed distance function. Similarly to FontCycle [32], the drawback of VecFontSDF [31] for industry implementation is that it lacks real vector drawing inputs.
DeepVecFont-v2 [30], as a descendant of the capable DeepVecFont [29], addressed the limitation of sequentially encoding a variable number of drawing commands by introducing transformers instead of the original LSTM. The system is capable of exceptional results, which makes it a good candidate for the completion of complex fonts.
DualVector [16], similar to DeepVecFont [29] and DeepVecFont-v2 [30], leverages both raster and sequential encoding. However, it takes solely raster images of glyphs as input, which is again unfortunate, as the original drawings aren't used.
VecFusion [27] leverages recent advances in diffusion models to encode raster images and control point fields. However, the loss of vector drawings as input again holds it back from use in the type design industry.
Font Completion Summary
Few-shot font generation, or font completion, is considered one of the most promising areas in the professional type design industry. Type designers can benefit significantly from font completion by envisioning the whole font while drawing only the first few letters, predicting the final result from an initial sketch.
The technique is incredibly tempting to CJK type designers. Since CJK fonts contain thousands of characters, finishing a font in Simplified Chinese alone, comprising some 5,000 characters, is such an enormous task that it engages huge teams to complete just one style. Perhaps this huge workload explains the many projects in CJK countries aimed at pushing the boundaries of this machine learning undertaking further ahead.
Multimodal Generation
Multimodal or cross-modal generation represents a domain that aims to generate outputs in a different modality than its inputs: for instance, text-to-image, text-to-speech, or image-to-text. In the font domain, it could be text-to-font or image-to-font.
The framework proposed by [7], based on GANs, enables the generated fonts to reflect human emotional information from scanned faces. This approach is considered image-to-font, as the input is scanned images and the output is generated fonts.
[12] aimed to understand correlations between impressions and font styles through a shared latent space in which a font and its impressions are embedded nearby. This approach is considered text-to-font, as the input is an impression defined in text and the output is a generated font.
[17] aimed to generate fonts with specific impressions using a font dataset with impression labels. Similar to the previous work, it is considered text-to-font, as the input is an impression defined in text and the output is a generated font.
[28] analysed the impressions given by fonts by training a Transformer network; they didn't generate fonts. This approach is considered font-to-text, as the input is a font and the output is text describing the impression.
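To illustrate the shared latent space idea behind these impression-based works (e.g. [12]), here is a minimal sketch in which a font encoder and an impression encoder project into the same space; all layer sizes, the vocabulary, and the example word index are illustrative assumptions.

```python
# A minimal sketch of a shared latent space for fonts and impressions:
# two encoders project into the same space so a font and its impression
# words land nearby. All sizes and indices are illustrative, untrained.

import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 64

font_encoder = nn.Sequential(          # maps a 64x64 glyph image to the shared space
    nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, latent_dim)
)
impression_encoder = nn.Embedding(1000, latent_dim)  # maps impression word ids

font_image = torch.rand(1, 1, 64, 64)
impression_id = torch.tensor([42])     # e.g. a hypothetical index of "elegant"

z_font = F.normalize(font_encoder(font_image), dim=-1)
z_word = F.normalize(impression_encoder(impression_id), dim=-1)

# Training would pull matching pairs together (high similarity); at inference,
# a decoder (not shown) could then generate fonts from an embedded impression.
print(F.cosine_similarity(z_font, z_word))
```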
Conclusions
The survey reveals earnest attempts to integrate AI into type design.
Latent space interpolation has the potential to help designers discover styles between existing fonts. Font completion can be used to envision final fonts from early drawings or help finalise large scripts like CJK fonts. Multimodal generation, on the other hand, can be a new way of using fonts by their end users, such as graphic designers, brand designers, or editorial designers.
Yet, AI in type design is nascent and needs further development. Its future influence on business is promising: the automation of tedium, production at faster rates, and innovative designs may become business norms.
The fears that doomsday accounts have created about AI are groundless. The goal is not to replace type designers but to improve their workflow.
It follows that designers need not fear the redundancy brought about by AI but rather view it as their trusted sidekick in the creative arsenal.
References
[2] Haruka Aoki, Koki Tsubota, Hikaru Ikuta, and Kiyoharu Aizawa. 2021. DML: Few-Shot Font Generation with Deep Metric Learning. In 2020 25th International Conference on Pattern Recognition (ICPR), January 2021. 8539–8546. https://doi.org/10.1109/ICPR48806.2021.9412254
[3] Neill D. F. Campbell and Jan Kautz. 2014. Learning a manifold of fonts. ACM Trans. Graph. 33, 4 (July 2014), 91:1–91:11. https://doi.org/10.1145/2601097.2601212
[5] Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. 2020. DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation. https://doi.org/10.48550/arXiv.2007.11301
[6] Junbum Cha, Sanghyuk Chun, Gayoung Lee, Bado Lee, Seonghyeon Kim, and Hwalsuk Lee. 2020. DM-Font: Few-Shot Compositional Font Generation with Dual Memory. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), 2020. Springer International Publishing, Cham, 735–751. https://doi.org/10.1007/978-3-030-58529-7_43
[7] Lu Chen, Feifei Lee, Hanqing Chen, Wei Yao, Jiawei Cai, and Qiu Chen. 2020. Automatic Chinese Font Generation System Reflecting Emotions Based on Generative Adversarial Network. Applied Sciences 10, 17 (January 2020), 5976. https://doi.org/10.3390/app10175976
[11] Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2017. DCFont: An end-to-end deep Chinese font generation system. In SIGGRAPH Asia 2017 Technical Briefs (SA '17), November 27, 2017. Association for Computing Machinery, New York, NY, USA, 1–4. https://doi.org/10.1145/3145749.3149440
[12] Jihun Kang, Daichi Haraguchi, Seiya Matsuda, Akisato Kimura, and Seiichi Uchida. 2022. Shared Latent Space of Font Shapes and Their Noisy Impressions. In MultiMedia Modeling (Lecture Notes in Computer Science), 2022. Springer International Publishing, Cham, 146–157. https://doi.org/10.1007/978-3-030-98355-0_13
[16] Ying-Tian Liu, Zhifei Zhang, Yuan-Chen Guo, Matthew Fisher, Zhaowen Wang, and Song-Hai Zhang. 2023. DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation. Retrieved May 20, 2023 from http://arxiv.org/abs/2305.10462
[18] Ammar Ul Hassan Muhammad and Jaeyoung Choi. 2022. FontNet: Closing the gap to font designer performance in font synthesis. https://doi.org/10.48550/arXiv.2205.06512
[21] Jangkyoung Park, Ammar Ul Hassan Muhammad, and Jaeyoung Choi. 2021. CKFont: Few-Shot Korean Font Generation based on Hangul Composability. KIPS Transactions on Software and Data Engineering 10, 11 (November 2021), 473–482. https://doi.org/10.3745/KTSDE.2021.10.11.473
[23] Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, and Hyunjung Shim. 2021. MX-Font: Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), October 2021. IEEE, Montreal, QC, Canada, 13880–13889. https://doi.org/10.1109/ICCV48922.2021.01364
[25] Rapee Suveeranont and Takeo Igarashi. 2010. Example-Based Automatic Font Generation. In Smart Graphics (Lecture Notes in Computer Science), 2010. Springer, Berlin, Heidelberg, 127–138. https://doi.org/10.1007/978-3-642-13544-6_12
[26] Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong, Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding, and Jingdong Wang. 2022. FS-Font: Few-Shot Font Generation by Learning Fine-Grained Local Styles. https://doi.org/10.48550/arXiv.2205.09965
[27] Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, and Evangelos Kalogerakis. 2023. VecFusion: Vector Font Generation with Diffusion. Retrieved December 21, 2023 from http://arxiv.org/abs/2312.10540
[30] Yuqing Wang, Yizhi Wang, Longhui Yu, Yuesheng Zhu, and Zhouhui Lian. 2023. DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality. https://doi.org/10.48550/arXiv.2303.14585
[31] Zeqing Xia, Bojun Xiong, and Zhouhui Lian. 2023. VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions. https://doi.org/10.48550/arXiv.2303.12675
[32] Ye Yuan, Wuyang Chen, Zhaowen Wang, Matthew Fisher, Zhifei Zhang, Zhangyang Wang, and Hailin Jin. 2021. Font Completion and Manipulation by Cycling Between Multi-Modality Representations. https://doi.org/10.48550/arXiv.2108.12965
End-notes
1. Variable fonts are an evolution of the OpenType font specification that enables many variations of a typeface to be incorporated into a single file, rather than having a separate font file for every width, weight, or style.
2. Variable font masters represent extreme design along one or more axes, e.g. lightest and heaviest on the weight axis.
3. Design space is a set of axes that represent a specific design variation, e.g. weight, width, or slant.
4. Linear interpolation is the simplest interpolation method, connecting two known values with a straight line; other common methods include cubic and spline interpolation.
5. There are various notations of linear interpolation formulas. In this paper, two variants are presented: the variable font interpolation formula commonly used in vector graphics and the latent space interpolation formula commonly used in a mathematical context. Both variants represent the same values: 1. the interpolated value; 2. two extremes on an interpolated axis; 3. a parameter ranging from 0 to 1.
6. See note 5.
7. On the Font-MF project website, there is an interactive demo allowing users to traverse the manifold [19]. URL: http://vecg.cs.ucl.ac.uk/Projects/projects_fonts/projects_fonts.html
8. The sequential representation encodes a glyph as an ordered sequence of drawing commands with control point coordinates, while the raster image representation encodes the glyph as a grid of pixels.
9. Fonts are the domain of vector graphics. Therefore, no raster image generation approach is applicable to font production. Unfortunately, the vast majority of researchers ignore this fact.
Acknowledgement
Special thanks to Matúš, aka Mat, that guy behind the LTTR/INK algorithm, for the technical review of the article.