New Prompt Engineering Research Proposes PEZ (Prompts Made Easy): A Gradient-Based Optimizer for Text That Uses Continuous Embeddings to Reliably Optimize Hard Prompts

Prompt engineering is the process of crafting instructions to guide generative models. It is the key to unlocking large models’ power for image generation and language tasks. Today, prompt engineering methods fall broadly into two categories.

  1. Hard prompts: hand-crafted sequences of interpretable tokens used to elicit desired model behavior. Many good hard prompts have been discovered by trial and error or by sheer intuition. 
  2. Soft prompts: continuous-valued embeddings that are neither interpretable nor transferable. Gradient-based optimizers and large datasets are used to tune them into high-performing prompts for specialized tasks (a code sketch contrasting the two follows this list).
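To make the distinction concrete, here is a minimal PyTorch sketch; the checkpoint name, prompt text, and soft-prompt length are illustrative, not from the paper. A hard prompt is a sequence of real vocabulary tokens, while a soft prompt is a free continuous tensor of the same shape.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# "gpt2" is just an illustrative public checkpoint.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
emb = model.get_input_embeddings()  # the model's vocabulary embedding table

# Hard prompt: interpretable tokens that map back to readable text.
hard_ids = tok("a photo of a corgi on a skateboard", return_tensors="pt").input_ids
hard_embeds = emb(hard_ids)        # shape (1, seq_len, dim)
print(tok.decode(hard_ids[0]))     # always recoverable as text

# Soft prompt: free continuous vectors trained by gradient descent.
# They sit between vocabulary embeddings, so they cannot be decoded
# back into tokens, and they cannot be moved to a model whose
# embedding space has a different dimension or geometry.
soft_prompt = torch.nn.Parameter(torch.randn(1, 8, emb.embedding_dim))
```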

[Figure: given a source image (left), a discrete text prompt is discovered using CLIP and used to prompt Stable Diffusion, generating new images (right).]

Advantages of Hard Prompts

  • Hard prompts can be mixed and mutated to perform a variety of tasks, whereas soft prompts are highly specialized. 
  • Hard prompts discovered with one model can be deployed on another. Soft prompts are not portable in this way because embedding dimensions and representation spaces differ across models. 
  • Hard prompts are the only option when a model is available solely through an API.

Hard Prompts Made Easy

Models like ChatGPT and Stable Diffusion are driven by prompts, but hand-written text prompts often perform poorly. Soft prompts can exploit a model’s full capabilities, but their main drawback is that humans cannot read or understand them.


Researchers at the University of Maryland and New York University have designed PEZ, a prompt optimizer for discovering good hard prompts. PEZ works with any language model or vision-language model. With minimal tuning, it can find interpretable prompts that describe an image’s content and that generate new images with the same content. 
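For image tasks, the quantity being optimized is CLIP image-text similarity: a prompt is good if its text embedding lies close to the target image’s embedding. Below is a hedged sketch of that objective using the Hugging Face CLIP API; the checkpoint, file name, and candidate prompt are illustrative.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# A public CLIP checkpoint (the paper's exact model may differ).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("target.jpg")  # illustrative file name
candidate = "a watercolor painting of a lighthouse at dusk"

inputs = processor(text=[candidate], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Cosine similarity between image and text embeddings: the score a
# prompt optimizer like PEZ tries to maximize.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print((img * txt).sum().item())
```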

Different discovered prompts can also be combined to generate new images that merge content from multiple source images.

The main advantage of PEZ is that the extracted prompt can be edited to create new images. For example, we can capture the style of a set of human drawings, discover a prompt that reproduces it, and then change or add a word such as “tiger” or “Paris” to generate new images in the same style.

Optimization of Hard Prompts

One way to obtain a hard prompt is to optimize a soft prompt and then project it onto the nearest token embeddings. However, this projection can hurt performance even when the soft prompt lies close to a real token embedding. Alternatively, the soft prompt can be projected at every iteration of gradient descent, but with a small learning rate the projection may keep snapping back to the same hard prompt, so optimization stalls.
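The projection step itself is a nearest-neighbor search in embedding space. A minimal sketch follows; cosine similarity is one reasonable metric, though the paper’s exact choice may differ.

```python
import torch
import torch.nn.functional as F

def project_to_tokens(soft_prompt, embedding_table):
    """Map each soft-prompt vector to its nearest vocabulary embedding.

    soft_prompt:     (num_tokens, dim) continuous prompt
    embedding_table: (vocab_size, dim) model token embeddings
    Returns the discrete token ids and their embeddings.
    """
    sims = F.normalize(soft_prompt, dim=-1) @ \
           F.normalize(embedding_table, dim=-1).T
    token_ids = sims.argmax(dim=-1)
    return token_ids, embedding_table[token_ids]
```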

PEZ resolves this by bridging soft and hard prompts: it maintains a continuous soft prompt but updates it with gradients computed at the nearest hard prompt. After optimization, projecting onto the nearest token embeddings yields an effective, interpretable hard prompt. This procedure is more reliable and requires far less engineering and tuning than previous hard-prompt tuning methods.

[Figure: the PEZ optimization algorithm from the paper.]
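In code, the update rule can be sketched roughly as below. This is a minimal reading of the method, not the authors’ implementation: `project_to_tokens` is the projection sketched earlier, while `embedding_table`, `tokenizer`, `clip_loss`, `target_image`, and all hyperparameters are assumed for illustration.

```python
import torch

# Assumed given: embedding_table (vocab_size, dim), tokenizer, and a
# clip_loss(prompt_embeds, target_image) objective as sketched earlier.
num_tokens, num_steps = 8, 1000
init_ids = torch.randint(0, embedding_table.shape[0], (num_tokens,))
soft_prompt = embedding_table[init_ids].clone().detach().requires_grad_(True)
optimizer = torch.optim.AdamW([soft_prompt], lr=0.1)

for _ in range(num_steps):
    # 1. Project the soft prompt onto the nearest hard prompt.
    _, hard_embeds = project_to_tokens(soft_prompt.detach(), embedding_table)
    hard_embeds = hard_embeds.detach().requires_grad_(True)

    # 2. Evaluate the loss at the *projected* (hard) prompt.
    loss = clip_loss(hard_embeds, target_image)
    loss.backward()

    # 3. Apply the gradient taken at the hard prompt to the soft prompt.
    soft_prompt.grad = hard_embeds.grad
    optimizer.step()

# A final projection yields the interpretable hard prompt.
final_ids, _ = project_to_tokens(soft_prompt.detach(), embedding_table)
print(tokenizer.decode(final_ids))
```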


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.





