As part of my dissertation thesis, I've surveyed the applications of generative AI in type design. The aim of the survey is to understand how this new technology might influence the font production workflow and the way users apply fonts.
Within the scope of the search, covering projects between the years 2010 and 2024, I've found 80 projects [20] and identified four main AI tasks that can be leveraged in type design. In this article, I present the survey methodology, descriptions of the applications and methods, as well as references to the particular projects.
The application of AI in font technology can sound exciting and scary at the same time: a doom machine that threatens to replace type designers. But let's leave the horror scenarios to the science-fiction genre and look into the current reality of AI application in type design.
Methodology
The methodology of the survey consists of five steps:
Explorative search
Reference search
Selection
Classification
Reasoning
Explorative search: For finding prospective projects, four main resources were used.
To get search results, the following keyword strings were used:
deep learning vector graphics
deep learning font generation
vector font generation
Reference search: Later, the lists of references mentioned in the articles were used as an additional resource for finding prospective articles.
Selection: For simplification, I first surveyed articles focusing on vector graphics and omitted articles that used bitmap image representations.
Classification: To create the four categories, I drew on the prevalent taxonomy of generative AI and on terms repeatedly mentioned in the reviewed articles. For instance, latent space interpolation and few-shot generation are common terms used in generative AI to describe AI tasks.
Reasoning: The reasoning about the application of generative methods in production was deduced from the tasks the models perform at inference.
Interpolation
Every type designer who has stumbled upon a variable font 1 project has faced a frequent issue: some parameters of the variable font masters 2 don't match, which prevents the calculation of the required instances in the design space 3. Unlike variable font interpolation, AI interpolation methods don't necessarily face this issue and don't even require the masters to match in all properties.
Variable font interpolation
In type design, interpolation is used to populate intermediate font styles between two or more font style masters. The key components of interpolation are:
Axes that represent a specific design variation, e.g. weight, width, or slant
Masters that represent extreme design along one or more axes, e.g. lightest and heaviest on the weight axis
Instances that represent a specific style, populated by calculation from the masters at a given position on the axes.
The interpolation process calculates the intermediate position Pi of a glyph outline's control point P with coordinates (x, y) at a given value v along an axis. The linear interpolation 4 formula commonly used in vector graphics is Pi(v) = Pmin + v ⋅ (Pmax − Pmin) 5, where:
Pmin is the position of the control point in the master at the minimum value of the axis.
Pmax is the position of the control point in the master at the maximum value of the axis.
v is the normalised value along the axis, ranging from 0 to 1.
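To make the formula concrete, here is a minimal Python sketch of the calculation, assuming two point-compatible masters (the same number of control points in the same order). The Point alias, the lerp_glyph function, and the sample coordinates are illustrative, not taken from any specific font tool.

```python
# A minimal sketch of variable-font-style interpolation, assuming two
# point-compatible masters. Names and coordinates are illustrative.

from typing import List, Tuple

Point = Tuple[float, float]  # a control point as (x, y)

def lerp_glyph(master_min: List[Point], master_max: List[Point], v: float) -> List[Point]:
    """Compute Pi(v) = Pmin + v * (Pmax - Pmin) for every control point."""
    if len(master_min) != len(master_max):
        raise ValueError("Masters must have matching control point counts.")
    return [
        (x0 + v * (x1 - x0), y0 + v * (y1 - y0))
        for (x0, y0), (x1, y1) in zip(master_min, master_max)
    ]

# Example: a stem drawn in a Light master and a Black master,
# instanced at v = 0.5 (a hypothetical Regular weight).
light = [(100.0, 0.0), (140.0, 0.0), (140.0, 700.0), (100.0, 700.0)]
black = [(80.0, 0.0), (220.0, 0.0), (220.0, 700.0), (80.0, 700.0)]
print(lerp_glyph(light, black, 0.5))
```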
Font interpolation with AI
Unlike variable font interpolation, the AI model doesn’t calculate the position of control points between two font masters directly but leverages the representation of fonts in a so-called latent space. The objects within a latent space (in our case, the fonts) are encoded as multi-dimensional vectors that can be interpolated. In other words, the latent space is like a mind that envisions what an interpolated shape between given fonts might look like.
For latent space font interpolation, the same linear interpolation formula Pi(v) = Pmin + v ⋅ (Pmax − Pmin) is usually used. However, the common notation of the formula in machine learning is fip = (1 − λ) ⋅ f(a) + λ ⋅ f(b) 6, where:
fip represents the interpolated latent vector
f(a) and f(b) represent the two fonts
λ represents a parameter ranging from 0 to 1 that controls the interpolation level
Even though latent space interpolation allows the interpolation between shapes with various amounts of control points, the quality of results depends heavily on topological similarities. In other words, interpolating between radically different shapes (e.g. transitional antiqua and English roundhand script) will likely give birth to some truly horrific creatures of typographic nightmares.
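The machine learning notation translates to code just as directly. Below is a minimal sketch of latent space interpolation, assuming the latent vectors come from a trained encoder such as the autoencoders discussed in the related works; the vector size and the random stand-in vectors are illustrative.

```python
# A minimal sketch of latent space interpolation. The latent vectors here
# are random stand-ins for the output of a trained encoder, not a real API.

import numpy as np

def interpolate_latents(f_a: np.ndarray, f_b: np.ndarray, lam: float) -> np.ndarray:
    """fip = (1 - lambda) * f(a) + lambda * f(b)"""
    return (1.0 - lam) * f_a + lam * f_b

# Pretend latent vectors of two fonts (in practice produced by an encoder).
rng = np.random.default_rng(0)
f_a = rng.normal(size=256)  # latent vector of font A
f_b = rng.normal(size=256)  # latent vector of font B

# Sample five intermediate positions along the interpolation path; each
# result would then be passed to a decoder to synthesise the outlines.
for lam in np.linspace(0.0, 1.0, 5):
    f_ip = interpolate_latents(f_a, f_b, lam)
    print(lam, f_ip[:3])  # decoder(f_ip) would generate the glyphs
```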
Related works
Font-MF [3] employs a Gaussian process latent variable model (GP-LVM) [13] to represent a manifold [9:157] of 46 fonts from the Google Fonts library [1]. The manifold (latent space) works here like a map of the fonts, and the interpolation is performed by navigating through this map 7. Hence, interpolation is not performed between two fonts through latent space but directly within the latent space by traversing it.
The Font-MF [3] method has the potential for type foundries to exploit their existing fonts as a tool to explore new ideas. The interface of the manifold map is arbitrary and can be replaced by the commonly known sliders representing the axes of their fonts. The limitation of the Font-MF system lies in its requirement for the same topological structure of the fonts. That means horrific typographic creatures are not allowed.
DeepSVG [5] was among the first to apply deep learning-based methods to vector graphics. DeepSVG brought latent space operations to interpolate between two vector graphics in order to provide vector animations between two SVG images. The system leverages the autoencoder [9:499] technique, which involves two components. First, the encoder works like the memory of the system, holding a learned representation of all the fonts it has seen. The encoder is tasked with producing two latent vectors that represent its recollection of the two given fonts. The two vectors are linearly interpolated, and the values are forwarded to the second network, the decoder. In the end, the decoder generates a new font: the interpolated instance of the master fonts. The latent vectors f(a) and f(b) represent hierarchical SVG command structures. Since DeepSVG comes with a robust animation system, it can be used not only for the generation of ideas but also to visualise those ideas as animations between given fonts.
DeepVecFont [29], inspired by the success of DeepSVG [5], uses a similar technique. Additionally, the DeepVecFont latent space memorises both the control points (the sequential representation) and raster images of the glyphs (the raster image representation) 8. The interpolated latent space position is fed to decoders and later polished by a neural differentiable rasteriser that refines the initially synthesised glyphs to match the target style more closely. The latent vectors f(a) and f(b) represent font styles and structural characteristics. Unlike DeepSVG [5], which generates SVGs, DeepVecFont is capable of generating complete glyphs, which makes it more suitable for prototyping new font ideas.
SVGformer [4] uses a transformer-based model with geometric self-attention to capture both semantic and geometric relationships in SVG data. The latent vectors f(a) and f(b) capture continuous geometric and semantic features of SVGs.
VecFontSDF [31], instead of encoding curves to the latent space directly, uses raster images to calculate the signed distance function (SDF) of parabolic curves for each glyph. Then, the parabolic curves are converted into Bézier curves to create the final vector glyphs. Although the system generates appealing shapes, the loss of the original Bézier drawings is costly, making the system unsuitable for font production.
DeepVecFont-v2 [30], DualVector [16], and VecFusion [27] are descendants of the dual-modality idea of DeepVecFont [29]. The image features and sequence features are likewise combined into a unified latent space representation. After the interpolation, the interpolated feature is fed to both the image and sequence decoders and refined. Since all three systems build on the type design tool FontForge, implementing their interpolation techniques into the type design process is more straightforward than with the previous systems.
Interpolation Summary
Font interpolation with an AI model can be an excellent facility for a type designer. Such a tool helps to come up with new fonts out of already existing designs. It opens new ways of generating ideas for fonts: designs that loosely resemble the old fonts yet differ markedly from their forerunners. The most promising design can then be selected by a type designer for further development. This method can be beneficial especially in commissioned projects, where client briefs are closely related to existing typeface styles.
Font Completion
Extending a font family to other languages is a common task for a big brand localising its visual communication, and one line of machine learning research aims to help with it. Font completion, also known as few-shot font generation or font style transfer, aims to complete a whole font alphabet using only a few reference glyphs.
Font completion methods
Font completion methods typically use encoder-decoder architectures with additional refinement stages and auxiliary modules. A minimal sketch of this pattern follows the phase descriptions below.
During the training phase:
Encoders are trained to extract glyph shapes and styles from data and encode them into latent representations. These representations capture the essential style and structural information of the glyphs.
Decoders are trained to generate the whole font by identifying the style from the latent representations derived from reference glyphs. They use these representations to produce glyphs that maintain the style and structure of the reference set.
During the inference phase:
Encoders extract style features from reference glyphs and encode them into latent representations. These latent representations encapsulate the style information needed for generating new glyphs.
Decoders aggregate the latent representations from the encoders and use them to generate the entire font set in the style of the reference glyphs.
Specific details and exceptions are explained in the related works section.
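Here is the promised sketch of the encoder-decoder pattern, assuming raster glyph inputs for brevity. The StyleEncoder and GlyphDecoder modules, the layer sizes, and the averaging of reference latents are illustrative assumptions, not a specific published architecture.

```python
# A minimal sketch of few-shot font completion: a style encoder aggregates
# latents from a few reference glyphs, and a decoder generates the full set.
# Architecture details are illustrative; real systems add refinement stages.

import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Encodes a 64x64 glyph image into a style latent vector."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class GlyphDecoder(nn.Module):
    """Generates a glyph image from a style latent plus a character embedding."""
    def __init__(self, latent_dim: int = 128, num_chars: int = 26):
        super().__init__()
        self.char_embedding = nn.Embedding(num_chars, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, style, char_ids):
        cond = torch.cat([style, self.char_embedding(char_ids)], dim=1)
        return self.net(cond)

# Inference: extract style from a few reference glyphs, then generate the rest.
encoder, decoder = StyleEncoder(), GlyphDecoder()
references = torch.rand(3, 1, 64, 64)                   # the designer's first 3 glyphs
style = encoder(references).mean(dim=0, keepdim=True)   # aggregate reference styles
target_chars = torch.arange(26)                         # generate the full A-Z set
glyphs = decoder(style.expand(26, -1), target_chars)
print(glyphs.shape)  # torch.Size([26, 1, 64, 64])
```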
Related works
In our table, works related to the font completion task have been found since 2010, starting with morphable template models [25] that complete fonts from users' skeleton sketches. It is also important to mention the repository aggregating few-shot font generation projects created by Song Park [22]. [11] proposed an end-to-end font style transfer system that generates a large-scale Chinese font with only a small number of training samples. One of the few-shot approaches trained a stacked conditional GAN model to generate a set of multi-content glyph images following a consistent style from a few input samples. [6] focused on compositional scripts and proposed a font generation framework that employed memory components and global-context awareness in the generator to take advantage of the compositionality. [2] proposed a framework that introduced deep metric learning to style encoders. [14] proposed a GAN based on a translator that can generate fonts by observing only a few samples from other languages. [21] proposed a model using the initial, middle, and final components of Hangul. [23] employed an attention system of so-called local experts to extract multiple style features not explicitly conditioned on component labels. [24] introduced a deep implicit representation of fonts, which is different from any of the other projects, whose data representations remained raster; however, the generation still ended with raster images. [26] proposed an approach that learns fine-grained local styles from references together with the spatial correspondence between the content and reference glyphs. [18] proposed a model that simultaneously learns to separate font styles in an embedding space, where distances directly correspond to a measure of font similarity, and to translate input images into a given observed or unobserved font style. [15] proposed a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder conditioned jointly on the glyph image and the corresponding stroke labels. [10] proposed a cross-lingual font generator based on AGIS-Net [8].
The above-mentioned attempts remain in the raster domain, which is not considered suitable for font industry applications 9.
FontCycle [32] leverages a graph-based representation to extract style features from raster images and graph transformer networks to generate complete fonts from a few image samples. Although the graph-based representation seems to be a prospective approach to encoding the spatial information of glyph shapes, the absence of the original Bézier curve drawings is a limitation that makes the system lag behind the other approaches.
DeepVecFont [29] works with a dual-modality representation that efficiently exploits advances in the raster image domain alongside the vector outlines suitable for type design. These features make DeepVecFont a hot candidate for type designers, helping them complete their fonts from a few initially provided glyphs.
VecFontSDF [31] calculates quadratic Bézier curves from raster image input by exploiting the signed distance function. Similarly to FontCycle [32], the drawback of VecFontSDF [31] for industry implementation is that it lacks real vector drawing inputs.
DeepVecFont-v2 [30], as a descendant of the capable DeepVecFont [29], addressed the limitation of sequentially encoding a variable number of drawing commands by introducing transformers instead of the original LSTM. The system is capable of exceptional results, which makes it a good candidate for the completion of complex fonts.
DualVector [16], similar to DeepVecFont [29] and DeepVecFont-v2 [30], leverages both raster and sequential encoding. However, it takes solely raster images of glyphs as input, which is again unfortunate, as the original drawings aren't used.
VecFusion [27] leverages recent advances in diffusion models to encode raster images and control point fields. However, the loss of vector drawings as input again holds it back from use in the type design industry.
Font Completion Summary
Few-shot font generation, or font completion, is considered one of the most promising areas in the professional type design industry. Type designers can benefit significantly from font completion by envisioning the whole font while drawing only the first few letters, predicting the final result from an initial sketch.
The technique is incredibly tempting to CJK type designers. Since CJK fonts contain thousands of characters, finishing a font in Simplified Chinese alone, comprising some 5,000 characters, is such an enormous task that it engages huge teams to complete just one style. Perhaps this huge workload explains the many projects in CJK countries aimed at pushing the boundaries of this machine learning undertaking further ahead.
Multimodal Generation
Multimodal or cross-modal generation represents a domain that aims to generate outputs in a different modality than its inputs: for instance, text-to-image, text-to-speech, or image-to-text. In the font domain, it could be text-to-font or image-to-font.
The framework proposed by [7], based on GANs, enables the generated fonts to reflect human emotional information from scanned faces. This approach is considered image-to-font, as the input is scanned images and the output is generated fonts.
[12] aimed to understand correlations between impressions and font styles through a shared latent space in which a font and its impressions are embedded nearby. This approach is considered text-to-font, as the input is an impression defined in text and the output is a generated font.
[17] aimed to generate fonts with specific impressions using a font dataset with impression labels. Similar to the previous work, it is considered text-to-font, as the input is an impression defined in text and the output is a generated font.
[28] analysed the impressions given by fonts by training a Transformer network; they didn't generate fonts. This approach is considered font-to-text, as the input is a font and the output is text describing the impression.
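To illustrate the shared latent space idea behind these impression-based works (e.g. [12]), here is a minimal sketch in which a font encoder and an impression encoder project into the same space; all layer sizes, the vocabulary, and the example word index are illustrative assumptions.

```python
# A minimal sketch of a shared latent space for fonts and impressions:
# two encoders project into the same space so a font and its impression
# words land nearby. All sizes and indices are illustrative, untrained.

import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 64

font_encoder = nn.Sequential(          # maps a 64x64 glyph image to the shared space
    nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, latent_dim)
)
impression_encoder = nn.Embedding(1000, latent_dim)  # maps impression word ids

font_image = torch.rand(1, 1, 64, 64)
impression_id = torch.tensor([42])     # e.g. a hypothetical index of "elegant"

z_font = F.normalize(font_encoder(font_image), dim=-1)
z_word = F.normalize(impression_encoder(impression_id), dim=-1)

# Training would pull matching pairs together (high similarity); at inference,
# a decoder (not shown) could then generate fonts from an embedded impression.
print(F.cosine_similarity(z_font, z_word))
```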
Conclusions
The survey reveals earnest attempts to integrate AI into type design.
Latent space interpolation has the potential to help designers discover styles between existing fonts. Font completion can be used to envision final fonts from early drawings or help finalise large scripts like CJK fonts. Multimodal generation, on the other hand, can be a new way of using fonts by their end users, such as graphic designers, brand designers, or editorial designers.
Yet, AI in type design is nascent and needs further development. Its future influence on business is promising: the automation of tedium, production at faster rates, and innovative designs may become business norms.
The fears that doomsday accounts have created about AI are groundless. The goal is not to replace type designers but to improve their workflow.
It follows that designers need not fear the redundancy brought about by AI but rather view it as their trusted sidekick in the creative arsenal.
References
[2] Haruka Aoki, Koki Tsubota, Hikaru Ikuta, and Kiyoharu Aizawa. 2021. DML: Few-Shot Font Generation with Deep Metric Learning. In 2020 25th International Conference on Pattern Recognition (ICPR), January 2021. 8539–8546. https://doi.org/10.1109/ICPR48806.2021.9412254
[3] Neill D. F. Campbell and Jan Kautz. 2014. Learning a manifold of fonts. ACM Trans. Graph. 33, 4 (July 2014), 91:1–91:11. https://doi.org/10.1145/2601097.2601212
[5] Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. 2020. DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation. https://doi.org/10.48550/arXiv.2007.11301
[6] Junbum Cha, Sanghyuk Chun, Gayoung Lee, Bado Lee, Seonghyeon Kim, and Hwalsuk Lee. 2020. DM-Font: Few-Shot Compositional Font Generation with Dual Memory. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), 2020. Springer International Publishing, Cham, 735–751. https://doi.org/10.1007/978-3-030-58529-7_43
[7] Lu Chen, Feifei Lee, Hanqing Chen, Wei Yao, Jiawei Cai, and Qiu Chen. 2020. Automatic Chinese Font Generation System Reflecting Emotions Based on Generative Adversarial Network. Applied Sciences 10, 17 (January 2020), 5976. https://doi.org/10.3390/app10175976
[11] Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2017. DCFont: An end-to-end deep Chinese font generation system. In SIGGRAPH Asia 2017 Technical Briefs (SA '17), November 27, 2017. Association for Computing Machinery, New York, NY, USA, 1–4. https://doi.org/10.1145/3145749.3149440
[12] Jihun Kang, Daichi Haraguchi, Seiya Matsuda, Akisato Kimura, and Seiichi Uchida. 2022. Shared Latent Space of Font Shapes and Their Noisy Impressions. In MultiMedia Modeling (Lecture Notes in Computer Science), 2022. Springer International Publishing, Cham, 146–157. https://doi.org/10.1007/978-3-030-98355-0_13
[16] Ying-Tian Liu, Zhifei Zhang, Yuan-Chen Guo, Matthew Fisher, Zhaowen Wang, and Song-Hai Zhang. 2023. DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation. Retrieved May 20, 2023 from http://arxiv.org/abs/2305.10462
[18] Ammar Ul Hassan Muhammad and Jaeyoung Choi. 2022. FontNet: Closing the gap to font designer performance in font synthesis. https://doi.org/10.48550/arXiv.2205.06512
[21] Jangkyoung Park, Ammar Ul Hassan Muhammad, and Jaeyoung Choi. 2021. CKFont: Few-Shot Korean Font Generation based on Hangul Composability. KIPS Transactions on Software and Data Engineering 10, 11 (November 2021), 473–482. https://doi.org/10.3745/KTSDE.2021.10.11.473
[23] Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, and Hyunjung Shim. 2021. MX-Font: Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), October 2021. IEEE, Montreal, QC, Canada, 13880–13889. https://doi.org/10.1109/ICCV48922.2021.01364
[25] Rapee Suveeranont and Takeo Igarashi. 2010. Example-Based Automatic Font Generation. In Smart Graphics (Lecture Notes in Computer Science), 2010. Springer, Berlin, Heidelberg, 127–138. https://doi.org/10.1007/978-3-642-13544-6_12
[26] Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong, Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding, and Jingdong Wang. 2022. FS-Font: Few-Shot Font Generation by Learning Fine-Grained Local Styles. https://doi.org/10.48550/arXiv.2205.09965
[27] Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, and Evangelos Kalogerakis. 2023. VecFusion: Vector Font Generation with Diffusion. Retrieved December 21, 2023 from http://arxiv.org/abs/2312.10540
[30] Yuqing Wang, Yizhi Wang, Longhui Yu, Yuesheng Zhu, and Zhouhui Lian. 2023. DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality. https://doi.org/10.48550/arXiv.2303.14585
[31] Zeqing Xia, Bojun Xiong, and Zhouhui Lian. 2023. VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions. https://doi.org/10.48550/arXiv.2303.12675
[32] Ye Yuan, Wuyang Chen, Zhaowen Wang, Matthew Fisher, Zhifei Zhang, Zhangyang Wang, and Hailin Jin. 2021. Font Completion and Manipulation by Cycling Between Multi-Modality Representations. https://doi.org/10.48550/arXiv.2108.12965
End-notes
1. Variable fonts are an evolution of the OpenType font specification that enables many variations of a typeface to be incorporated into a single file, rather than having a separate font file for every width, weight, or style.
2. Variable font masters represent extreme design along one or more axes, e.g. lightest and heaviest on the weight axis.
3. Design space is a set of axes that represent a specific design variation, e.g. weight, width, or slant.
4. Linear interpolation is the simplest interpolation method, connecting two known values with a straight line; other common methods include cubic and spline interpolation.
5. There are various notations of linear interpolation formulas. In this paper, two variants are presented: the variable font interpolation formula commonly used in vector graphics and the latent space interpolation formula commonly used in a mathematical context. Both variants represent the same values: 1. the interpolated value; 2. two extremes on an interpolated axis; 3. a parameter ranging from 0 to 1.
6. See note 5.
7. On the Font-MF project website, there is an interactive demo allowing users to traverse the manifold [19]. URL: http://vecg.cs.ucl.ac.uk/Projects/projects_fonts/projects_fonts.html
8. The sequential representation encodes a glyph as an ordered sequence of drawing commands with control point coordinates, while the raster image representation encodes the glyph as a grid of pixels.
9. Fonts are the domain of vector graphics. Therefore, no raster image generation approach is applicable to font production. Unfortunately, the vast majority of researchers ignore this fact.
Acknowledgement
Special thanks to Matúš, aka Mat, that guy behind the LTTR/INK algorithm, for the technical review of the article.