Can We Transform Text into Scientific Vector Graphics? This AI Paper Introduces AutomaTikZ and Explains the Power of TikZ

On Oct 13, 2023

Recent advances in text-to-image generation make it possible to create detailed graphics from simple natural language descriptions. Results from models like Stable Diffusion and DALL-E frequently resemble real photographs or human-made artwork. However, these models produce raster images, often at low resolutions, which are poorly suited to scientific figures. Scientific figures are essential to research because they help researchers explain complicated concepts or communicate important findings. They demand a high level of geometric precision and text that stays legible even at small font sizes, and raster graphics fall short on both counts. As a result, many academic conferences encourage vector graphics, which decompose information into geometric shapes, allow text search, and usually have smaller file sizes.

The field of automated vector graphics generation is also growing, but the available approaches have drawbacks of their own. They mostly generate low-level path elements in the Scalable Vector Graphics (SVG) format, and either fail to preserve precise geometric relationships or produce outputs of low complexity, such as single icons or font characters. To address these limitations, researchers from Bielefeld University, the University of Hamburg, and the University of Mannheim investigate visual languages, which abstract from lower-level vector graphics formats by offering high-level constructs that can be compiled to them.

Language models have shown that learning these languages and using them for simple tasks is possible, but it remains unclear to what extent they can produce scientific figures. In this work, the researchers focus on the graphics language TikZ because of its expressiveness and emphasis on science, which allows complex figures to be produced with just a few commands. They want to know whether language models can capture the subtleties of TikZ and automatically generate scientific figures from image captions, analogous to text-to-image generation. This could not only increase productivity and promote inclusivity (helping academics less familiar with programming-like languages, such as social scientists), but also improve teaching by producing tailored TikZ examples. The demand is visible on the TeX Stack Exchange, where TikZ is the most commonly discussed topic and accounts for about 10% of the questions asked.
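
To give a sense of how compact TikZ can be, here is a minimal sketch in Python that assembles a small, compilable TikZ figure as a string. The helper name `make_tikz_document` and the example figure are illustrative assumptions and are not taken from the paper; the TikZ commands themselves (`\draw`, `node`, `plot`, and the `standalone` document class) are standard.

```python
# Illustrative sketch: wrap a few TikZ commands in a standalone LaTeX document.
# The helper name and the example figure are assumptions for demonstration only.

def make_tikz_document(body: str) -> str:
    """Return a complete LaTeX source that renders `body` as a TikZ picture."""
    return "\n".join([
        r"\documentclass[tikz]{standalone}",
        r"\begin{document}",
        r"\begin{tikzpicture}",
        body.strip(),
        r"\end{tikzpicture}",
        r"\end{document}",
    ])

# Three commands: two labeled axes and a parabola plotted from a formula.
FIGURE_BODY = r"""
\draw[->] (-2.2, 0) -- (2.2, 0) node[right] {$x$};
\draw[->] (0, -0.2) -- (0, 4.4) node[above] {$y$};
\draw[thick, domain=-2:2, samples=50] plot (\x, {\x*\x}) node[right] {$y = x^2$};
"""

if __name__ == "__main__":
    print(make_tikz_document(FIGURE_BODY))
```

Saving the printed output to a `.tex` file and compiling it with `pdflatex` should yield a vector PDF, so the curve and its labels stay sharp at any zoom level.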

Their main contributions are:

(i) As part of their AutomaTikZ project, they developed DaTikZ, which has over 120k paired TikZ drawings and captions and is the first large-scale TikZ dataset.

(ii) They fine-tune the large language model (LLM) LLaMA on DaTikZ and compare its performance with that of general-purpose LLMs, notably GPT-4 and Claude 2. Automatic and human evaluation finds that scientific figures produced by the fine-tuned LLaMA are more similar to human-created figures.

(iii) They further develop CLiMA, a variant of LLaMA augmented with multimodal CLIP embeddings. This enhancement lets CLiMA interpret input captions visually, which improves text-image alignment. It also makes it possible to use images as additional inputs, which improves performance even further.

(iv) They also show that all models generate novel outputs and exhibit few memorization problems. GPT-4 and Claude 2, however, often produce simpler outputs than LLaMA and CLiMA, sometimes yielding degenerate solutions that maximize text-image similarity by visibly copying the input caption into the output image.
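
The caption-copying failure mode mentioned in (iv) can be made concrete with a small check. The following Python sketch estimates how much of a caption reappears verbatim inside the text of TikZ nodes. It is a simplified heuristic written for illustration: the function names, the regular expression, and the word-overlap measure are all assumptions, and this is not the metric used in the paper.

```python
import re

# Captures the text in braces after a node, e.g. \node[...] at (0,0) {some text};
# Assumption: node text contains no nested braces; real TikZ can be more complex.
NODE_TEXT = re.compile(r"\bnode\b[^{};]*\{([^{}]*)\}")

def words(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def caption_copy_ratio(caption: str, tikz_code: str) -> float:
    """Fraction of caption words that also occur in node text (0.0 to 1.0)."""
    caption_words = words(caption)
    if not caption_words:
        return 0.0
    node_words = set()
    for node_text in NODE_TEXT.findall(tikz_code):
        node_words.update(words(node_text))
    copied = sum(1 for w in caption_words if w in node_words)
    return copied / len(caption_words)

if __name__ == "__main__":
    caption = "A red circle"
    # Degenerate output: the caption is simply printed into the picture.
    print(caption_copy_ratio(caption, r"\node at (0,0) {A red circle};"))  # 1.0
    # Genuine drawing: the caption is rendered as geometry, not as text.
    print(caption_copy_ratio(caption, r"\draw[red] (0,0) circle (1cm);"))  # 0.0
```

A high ratio does not prove that a figure is degenerate, since legitimate figures also contain labels, but it flags outputs where text-image similarity may be inflated by printed caption text rather than by the drawing itself.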

Check out the Paper. All credit for this research goes to the researchers on this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.