Your Data; Their AI Models

Over on the Mem Slack forum, I recently pressed Mem CEO Dennis Xu to ease my fears that my Mem notes might be exposed to OpenAI and its data-harvesting approach to training its models. Note-taking, after all, often captures sensitive information. As he points out, the OpenAI ToS for its API recently changed for the better, effective March 1, 2023. This is a very important change.

Mem uses ONLY the OpenAI API with users’ personal note content. While Mem uses that content to create the overall intelligence we experience in Mem itself, OpenAI does not harvest it for future model training. Mem has essentially provided a privacy insulation layer by default.
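The distinction is worth making concrete. Here is a minimal sketch (not Mem’s actual code; the prompt framing and function name are illustrative) of the API pattern: note content reaches OpenAI only through the API endpoint, which, under the ToS effective March 1, 2023, is not used for model training by default.

```python
import json
import urllib.request

# Real OpenAI chat completions endpoint; data sent here falls under the
# API terms (not used for training by default, per the March 1, 2023 ToS).
OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"

def build_api_request(note_text: str, api_key: str) -> urllib.request.Request:
    """Wrap a private note in an API request. The endpoint and model name
    are real; the system prompt is purely illustrative."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "Summarize the user's note."},
            {"role": "user", "content": note_text},
        ],
    }
    return urllib.request.Request(
        OPENAI_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Pasting that same note into chat.openai.com (the consumer interface)
# instead would fall under the consumer terms, where content may be
# retained and used for training.
```

The privacy boundary, in other words, is not the content itself but the channel it travels through.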

In contrast, if you copy and paste notes or other content into ChatGPT or the other consumer-facing interfaces offered by OpenAI, your content will be harvested and used in model training. If you use apps from other providers that also use the OpenAI APIs, and those providers have chosen to opt-IN to OpenAI data sharing, your content will likewise be harvested and used by OpenAI.

While the ToS for OpenAI’s APIs has recently changed, any data that traversed the OpenAI APIs before March 1, 2023 may already have been used for training and is likely to be discoverable in future GPT models.

Be careful who you choose to act as the conduit to OpenAI.

Imagine if the information-adhesive providers (such as Zapier or Make) cut a deal to share all data flowing through their integration recipes, receiving GPT access for free or at a discount in exchange for surrendering your data. I have a hunch these market leaders are not doing this, and their ToSs probably reflect a deep regard for privacy. But it’s a nuance that really matters in the age of AI and large language models.

Can we be so certain that some of the glue-factory competitors will not find sharing your data with LLM providers an ideal way to gain a competitive edge in the emerging AI space?

Now, more than ever, the ToSs of your integrated services matter.

Great points and good food for thought.

It sounds like using AI could lead to data leaks. I wonder how long it will take for corporate espionage and cyber security professionals to educate / exploit normal humans.

If data is flowing into AI models (as the service has stated it will), it’s not really a “leak”. But I suspect many users are unintentionally broadcasting their data in a manner that will capture and possibly expose it someday.

Most people think: I ask a question, it gives me some information. But if you feed your own data into a prompt to engineer smarter GPT outcomes, you are simply handing OpenAI your information.

Good example - chat data prep.