
ChatGPT Health is worse than you think

If you've ever been curious about how OpenAI trains their models, you've likely checked their privacy policy. Right up front, they have this tidbit:

For information about how we collect and use training information to develop our language models that power ChatGPT and other Services, and your choices with respect to that information, please see this help center article.

Following that link brings you to a help center article titled How ChatGPT and our foundation models are developed. Again, at the beginning, we get a short overview:

OpenAI’s foundation models, including the models that power ChatGPT, are developed using three primary sources of information: (1) information that is publicly available on the internet, (2) information that we partner with third parties to access, and (3) information that our users, human trainers, and researchers provide or generate.

No shocker there - of course they use your conversations to train their models by default. This help center article also links to another one called How your data is used to improve model performance, where they explicitly mention that they use user data to train their models:

ChatGPT, for instance, improves by further training on the conversations people have with it, unless you opt out.

However, something else is odd. The privacy policy and the general model training article only talk about how their models, in general, are trained. The foundation model article, on the other hand, specifically discusses foundation models. In fact, the entire article seems very intentional about scoping its details to how foundation models are trained.

What the hell is a foundation model?

At a high level, a foundation model is what you get from the initial, large-scale training of an AI model. That base model can then be tuned and adapted for different, specialized tasks.
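
To make that relationship concrete, here's a minimal sketch using Hugging Face's transformers library, with GPT-2 standing in for a foundation model. OpenAI's actual training pipeline is not public, so treat this as purely illustrative:

    # Purely illustrative: GPT-2 stands in for a foundation model,
    # since OpenAI's real training pipeline is not public.
    from transformers import AutoModelForCausalLM

    # The "foundation" model: the product of one big pretraining run
    # over a broad corpus.
    foundation = AutoModelForCausalLM.from_pretrained("gpt2")

    # A specialized, derivative model starts from those same
    # pretrained weights...
    specialized = AutoModelForCausalLM.from_pretrained("gpt2")

    # ...and is then fine-tuned on narrower, task-specific data,
    # e.g. with transformers' Trainer:
    # trainer = Trainer(model=specialized, train_dataset=task_data)
    # trainer.train()

The detail that matters for the rest of this post: data fed in at the fine-tuning stage shapes the derivative model without ever entering the foundation model's training run.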

OpenAI has a list of their models, each categorized. Some of their specialized models include GPT-5 mini, GPT-5 nano, and o3-deep-research. The mini and nano flavors are tuned to be small and efficient, while the deep research models are tuned for more in-depth reasoning and research.

Crucially though, there is a missing category here: foundation models. Go ahead and click through their models and look for the word "foundation". You won't find it.

There are plenty of people online (and LLMs you can ask) who will happily define a foundation model for you and label models like GPT-4, GPT-5, and others as examples of foundation models. However, OpenAI has no such official definition or label, despite referencing foundation models in their marketing material and help center articles when discussing model training.

So far, this probably feels like pointless semantics. You should just opt out of letting them use your data to train any of their models - foundational or otherwise - and move on. Does it really matter whether your data trains a foundation model or a derivative one?

Well, yes. I certainly think so at least.

Enter ChatGPT Health

On January 7, 2026, OpenAI announced ChatGPT Health.

We’re introducing ChatGPT Health, a dedicated experience that securely brings your health information and ChatGPT’s intelligence together, to help you feel more informed, prepared, and confident navigating your health.

They are encouraging users to "bring your medical records and the apps you use to track your health and wellness into Health". They are providing integrations with popular health tracking apps like Apple Health and MyFitnessPal, as well as with medical records themselves, citing "lab results, visit summaries, and clinical history" as examples.

People were already using LLMs as doctors and therapists, but OpenAI going out of their way to embrace this use case changes things. Medical information, especially, needs to be held to a high standard of scrutiny.

A quick note: please, do not use AI for medical or mental health advice. Medical professionals have already had a hard enough time with people going to Google for their wacko medical information, and these sycophantic LLMs are only making it worse. Seek help from a real doctor or therapist instead.

Foundation models do matter

Buried in the middle of the ChatGPT Health announcement is this line:

Conversations in Health are not used to train our foundation models.

Okay. But what about non-foundation models? We already saw OpenAI talk about training on user data without the "foundation" qualifier in their opt-out article earlier, so they clearly know how to write "models" plainly when they mean it.

At the same time as ChatGPT Health, OpenAI also announced OpenAI for Healthcare, a product suite that includes ChatGPT for Healthcare. In this announcement, they have a similar bit about training data:

Content shared with ChatGPT for Healthcare is not used to train models.

This time, though, OpenAI does not specify foundation models - just "models". These two announcements were written and released at the same time, likely by the same team. Wordplay like this must be intentional.

The more specific phrase "foundation models" in the Health announcement certainly seems to leave a quiet loophole for OpenAI to use consumers' health data to train their models. As long as the models they train on that health data are not "foundation models", they would not technically be lying.

On the other hand, the more general, all-encompassing term "models" used in the Healthcare announcement cuts off any notion of using patient data to train their models - foundational or otherwise.

Back at square one

So, where does that leave us?

It would appear that OpenAI is trying to quietly train their non-foundation AI models on the data of unsuspecting users - including health data. By not publicly stating which of their models they consider foundational - or what their definition of "foundational" even is - and by selectively including or omitting the "foundation" qualifier in their documentation, they leave themselves a comfortable grey area.

So far, they seem to have been pretty successful at getting this to fly under the radar. The few people I have seen discussing it seem either confused by the sly change in terminology or outright apathetic (which, at this point, is totally fair).

The problem is that, for better or for worse, people are using ChatGPT and other AI tools a lot. A significant portion of those users are not opting out of OpenAI using their data for model training - and I'd bet that many of them don't even know it's happening. And now that OpenAI is making a foray into healthcare, dishonest behavior like this will have a bigger impact than ever before.

The only actions I can recommend are to opt out of training, avoid using ChatGPT for medical advice, and spread the word to your friends and family. People cannot make informed choices without all the relevant information available to them, and OpenAI is doing their damnedest to hide it.

#ai #news #privacy #research