This week, billionaire and X owner Elon Musk claimed the pool of human-generated data used to train artificial intelligence (AI) models such as ChatGPT has been exhausted.
Musk didn’t offer evidence for this claim. But other influential tech leaders have made similar statements in recent months, and earlier research suggested human-generated data would run out within two to eight years.
That’s largely because humans can’t produce new data such as text, video and images fast enough to keep up with the rapid and enormous demands of AI models. When genuine data does run out, it will present a problem for developers and users of AI alike.
It will force tech companies to depend more heavily on AI-generated data, known as “synthetic data”. And this, in turn, may make the AI systems that hundreds of millions of people currently rely on less accurate and reliable, and therefore less useful.
But this isn’t an inevitable outcome. In fact, if used and managed carefully, synthetic data could improve AI models.
The problems with real data
Tech companies depend on data, whether real or synthetic, to build, train and refine generative AI models such as ChatGPT. The quality of this data is crucial: poor data leads to poor outputs, in the same way that low-quality ingredients make for a low-quality meal.