AI relies on tokens to understand the user's prompt and translate it into a result. This input pattern is universal, and it adheres closely to how we communicate naturally as humans. We choose our words carefully because they hold meaning. Think about how different ways of saying "I exited" change the tone and meaning:
- I departed
- I slinked away
- I escaped
- I drifted
- I bowed out
Words carry meaning in their associations too. For example, if I prompt an image generator to show me a bear in a city, it will tend to place a brown bear in a generic Western city. If I specify that I am asking for a panda bear, it will place the bear in China. Why? I didn't ask for a location change. This is an example of the bias inherent in our words. Panda bears have a cultural association with China, so the AI pulls that connection into its definition of "city" when placing the bear in context.
Token layering is a technique that combines tokens intentionally to refine the AI's understanding of your prompt and the direction of its response. Some tools, such as image generators, initially leaned into this pattern quite literally, instructing users to list and weight tokens as their primary prompt. This was so popular that sharing compelling Midjourney tokens became a cottage industry.
However, while token lists are highly effective for creative brainstorming or for injecting specific styles and tones through in-painting, the technique can be difficult for new users to grasp. Even then, it's hard for tokens alone to fully describe your intent.
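The list-and-weight pattern can be sketched as a simple prompt builder. The `::` weight suffix below is loosely modeled on Midjourney-style multi-prompt weighting; the exact syntax varies by tool, and the function name and token values here are illustrative assumptions.

```python
# A minimal sketch of the list-and-weight token pattern.
# The "::" weight suffix is loosely modeled on Midjourney-style
# multi-prompts; exact syntax varies by tool.

def build_weighted_prompt(tokens: dict[str, float]) -> str:
    """Join tokens into a single prompt, attaching explicit weights."""
    parts = []
    for token, weight in tokens.items():
        # Omit the weight suffix when it is the default (1.0).
        parts.append(token if weight == 1.0 else f"{token}::{weight:g}")
    return " ".join(parts)

prompt = build_weighted_prompt({
    "panda bear": 2.0,   # emphasized subject
    "city street": 1.0,  # default weight
    "watercolor": 0.5,   # de-emphasized style
})
# e.g. "panda bear::2 city street watercolor::0.5"
```

The weakness the paragraph above points at is visible even in this toy: the user must already know which tokens exist and how weights behave before they can express intent at all.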
Web interfaces appear to be converging on an evolved direction that blends an open-ended prompt with layered tokens. Take Adobe Firefly, for example, which asks you to write what you are looking for while a palette on the side of the screen lets you choose from stylistic, structural, and referential tokens. This pattern also appears in Udio and other music generators.
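One way to picture the blended pattern is as a function that merges the open-ended prompt with whatever tokens the user has picked from the palette. This is a hypothetical sketch; the category names and joining scheme are my assumptions, not Adobe Firefly's actual behavior or API.

```python
# Hypothetical sketch of the blended pattern: an open-ended prompt
# plus tokens picked from a side palette. Category names and the
# joining scheme are illustrative, not any product's real API.

def compose_prompt(freeform: str, palette: dict[str, list[str]]) -> str:
    """Append selected palette tokens to the user's free-form prompt."""
    token_clauses = [
        f"{category}: {', '.join(tokens)}"
        for category, tokens in palette.items()
        if tokens  # skip categories with nothing selected
    ]
    return "; ".join([freeform.strip(), *token_clauses])

prompt = compose_prompt(
    "a panda bear crossing a busy street",
    {
        "style": ["watercolor", "soft light"],
        "structure": ["wide shot"],
        "reference": [],  # nothing selected in this category
    },
)
```

The design benefit is that the free-form field carries intent while the palette carries constraints, so neither input has to do the other's job.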
Tokens can also be collected as a follow-up to the initial prompt. Jasper and Perplexity are two notable examples of products that auto-generate follow-up questions after the prompt is submitted, which themselves serve as inputs to capture additional tokens. The effect is a system that progressively tries to better understand the user's intent without it feeling like a burden up front.
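The follow-up-question loop can be sketched as two steps: generate questions, then fold the answers back into the prompt as labeled tokens. The question list is hard-coded here for illustration; in a real system it would be generated by a model from the prompt itself, and all names are assumptions.

```python
# Sketch of progressive token capture via follow-up questions.
# The questions are hard-coded for illustration; real products
# generate them from the submitted prompt.

def follow_up_questions(prompt: str) -> list[str]:
    # Placeholder: a real implementation would call a model here.
    return [
        "What tone should the result have?",
        "Who is the intended audience?",
    ]

def refine(prompt: str, answers: dict[str, str]) -> str:
    """Fold follow-up answers back into the prompt as labeled tokens."""
    extras = [f"{label}: {value}" for label, value in answers.items() if value]
    return prompt if not extras else f"{prompt} ({'; '.join(extras)})"

refined = refine(
    "write a poem about a panda",
    {"tone": "playful", "audience": "children"},
)
```

Skipped answers simply drop out of the refined prompt, which is what lets the pattern feel optional rather than like a mandatory form.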
Finally, tokens can be introduced as a follow-up action to help the user instruct the AI on how to modify its result. In this case, suggestions can come from a set list (as with Grammarly or Notion) or be generated automatically from the existing context, as the follow-up pattern above does. This method of collecting tokens and additional parameters is another example of how progressive disclosure can decrease the up-front lift on the user.
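The set-list variant amounts to a fixed menu of modification tokens, each mapped to a transformation of the existing result. The action names and transforms below are illustrative assumptions, not Grammarly's or Notion's actual feature set.

```python
# Sketch of follow-up modification tokens drawn from a set list.
# Action names and their transforms are illustrative assumptions.

ACTIONS = {
    "shorten": lambda text: " ".join(text.split()[:10]),
    "title_case": lambda text: text.title(),
}

def apply_action(text: str, action: str) -> str:
    """Apply a suggested modification only if it is on the set list."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ACTIONS[action](text)

revised = apply_action("a playful poem about a panda", "title_case")
```

Constraining follow-ups to a known list is itself progressive disclosure: the user never sees a blank prompt box, only a handful of safe, predictable refinements.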