The use of personal data by web platforms has always been a slippery slope. To justify their ad-supported business model to users, Big Tech's marketing departments manufactured the ostensible benefit of "personalization" in return for the capture, re-use, and re-sale of personal data. The idea is that they are going to show you ads and targeted posts no matter what, but at least those interruptions might be more in line with your interests.

This already feels invasive when the system is only capturing your clicks: which groups you join, which keywords are most likely to compel your engagement, and so on. It's an extraction of user intent.

What happens, though, when the system is capturing more than your clicks? When you are talking to an AI, or working with it to create content, where is the line drawn between what belongs to you, what belongs to the platform, and what should happen to everything in the middle?

While the nuances and principles of this market are still being ironed out, many AI companies have implemented the Data Retention pattern, leading to its near ubiquity.

ChatGPT's settings show the common pattern

The pattern is generally the same wherever it appears: located in the user or company settings, a pithy statement about the need to improve the company's models for the benefit of all is paired with an on/off toggle.
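To make the pattern concrete, here is a minimal sketch of how such a toggle might be wired up, assuming a React/TypeScript codebase. The endpoint, field names, component name, and copy are illustrative assumptions, not any particular vendor's API.

```tsx
import { useState } from "react";

// Illustrative only: the endpoint and field name are assumptions,
// not any specific product's API.
async function saveTrainingPreference(optedIn: boolean): Promise<void> {
  await fetch("/api/settings/model-training", {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ allowTrainingOnMyData: optedIn }),
  });
}

export function ModelTrainingToggle({ initialValue }: { initialValue: boolean }) {
  const [optedIn, setOptedIn] = useState(initialValue);

  const handleChange = async (next: boolean) => {
    setOptedIn(next); // optimistic update, then persist
    await saveTrainingPreference(next);
  };

  return (
    <label>
      <input
        type="checkbox"
        role="switch"
        checked={optedIn}
        onChange={(e) => handleChange(e.target.checked)}
      />
      {/* The "pithy statement" that typically accompanies the toggle */}
      <span>
        Improve the model for everyone: allow your conversations to be used
        for training. <a href="/privacy/model-training">Learn more</a>
      </span>
    </label>
  );
}
```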

While the interface is largely consistent, the principles of interaction vary.

Opt-in vs. opt-out

Most commonly, this setting defaults to "on." This is the case for ChatGPT, Substack, GitHub, and many other big players in the space.

Notably, when Figma announced their AI features at their Config conference in 2024, they took the approach of defaulting the option to "off." This could signal a departure from the norm towards customer-first defaults.

Figma has published their approach to AI, including data storage and model training. They link to this explainer from the settings pane

Paid vs. unpaid

In many cases, the option to opt out of sharing your data or your company's data is only available on premium plans. Free users may never even see the setting, and won't be aware of the tradeoff.

This makes sense commercially, as the servers to run AI are expensive, so free users pay in the form of their data. As legislation related to AI becomes more concrete, we should expect to see pressure on this approach.
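When the setting is gated by plan, the logic often boils down to a small mapping from plan tier to whether the toggle is shown at all and what the default is. The sketch below is a hedged illustration; the plan names and defaults are assumptions, not any specific company's policy.

```ts
// Illustrative sketch: plan tiers and defaults are assumptions.
type Plan = "free" | "pro" | "enterprise";

interface TrainingPreference {
  canEdit: boolean;      // is the toggle shown/enabled at all?
  defaultValue: boolean; // what applies if the user never touches it
}

function trainingPreferenceFor(plan: Plan): TrainingPreference {
  switch (plan) {
    case "free":
      // Free users never see the setting and are opted in by default.
      return { canEdit: false, defaultValue: true };
    case "pro":
      // Paid users see the toggle, still defaulting to "on".
      return { canEdit: true, defaultValue: true };
    case "enterprise":
      // Enterprise accounts are often opted out unless an admin opts in.
      return { canEdit: true, defaultValue: false };
  }
}
```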

Enterprise vs. consumer

As enterprise-grade accounts roll out across AI platforms, settings like this are being placed within the admin settings rather than in individual settings. Figma offers a good example of this approach. The logic is that company admins shouldn't have to rely on the discretion of individual employees to enforce their standard security policies.
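One common way to implement this is to let an organization-level policy override whatever individual members choose, falling back to the member's own setting only when the admin hasn't set one. A minimal sketch, with assumed names:

```ts
// Illustrative sketch of admin-enforced policy; names are assumptions.
interface OrgPolicy {
  // When set by an admin, this overrides individual member choices.
  allowModelTraining?: boolean;
}

interface MemberSetting {
  allowModelTraining: boolean;
}

// The effective value is the org policy if one exists,
// otherwise the member's own choice.
function effectiveTrainingSetting(
  org: OrgPolicy,
  member: MemberSetting
): boolean {
  return org.allowModelTraining ?? member.allowModelTraining;
}
```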

Full privacy

Some companies take this a step further and fully guarantee a private experience. Limitless.ai doesn't include this setting because it doesn't train models with user data, and it explicitly calls out in its privacy policy that it does not allow any third-party partners to use its users' data to train models either.

If you take this approach, you might want to include these details in your settings panel as well, so that users don't mistake the absence of the setting for a lack of privacy, when the opposite is true.

Details and variations

  • Generally a toggle found in the settings panel
  • There is no standard for whether this option defaults to on or off
  • Consider adding additional context so users understand the implication of their choice
  • May not be available to free users
  • Can be managed at the account or the individual level for business and enterprise plans

Considerations

Positives

User control over their data

When this option is present, it gives users discretion over how and when to share their data. Ultimately, if the company is able to earn their trust, users can choose to contribute to the common model.

Better foundational models

The fact that someone is building for or using a foundational model suggests they find some value in it. Research suggests that AI companies will soon run out of human-generated content to improve their models. Perhaps the models don't need to keep growing at their current pace, but they will need a refresh of newer events and content to stay relevant. If users find the models helpful, they may want to contribute back.

Potential risks

Unclear rules

We are all still a bit shell-shocked from the last decade of tech companies extracting billions from our personal data. It's no wonder consumers are hesitant. AI companies have a duty to be clear about how they use and store training data, and about what happens if they goof up.

Use when:
AI companies need more content to train future models, so they give users the option to opt in (or out) of data sharing.

Examples

ChatGPT has changed their framing to allow users to store a personal memory without being required to share their data to train the model
Claude follows a similar convention to ChatGPT
Claude also informs you before submitting feedback on a conversation that your data may be shared
Figma opts accounts out of AI sharing by default and provides context for how they would use your data if you opted in.
Notion opts you out by default but prompts you to opt in from within the interface