OpenAI, makers of the AI large-language model chatbot ChatGPT, announced on Thursday a deal with social media platform Reddit to give ChatGPT direct access to content from Reddit.
OpenAI, makers of the AI large-language model chatbot ChatGPT, announced on Thursday a deal with Reddit to give ChatGPT direct access to content from the social media platform.
While this might be a boon for AI research, it will be a disaster for privacy and the rights of users on Reddit and other sites that have been used for AI training. Large-language models vacuum massive amounts of data as part of their training, and Reddit users could have their every word added to ChatGPT and potentially exposed to the world. More guardrails are needed ensure training models respect our privacy.
If we want AI tools to continue to grow in sophistication, then we have to accept that the models have to be trained on a large volume of high-quality data. Instead of a "wild-west" situation where designers scrape online data without authorization, a recent turn towards licensing deals will ensure that training data is managed in a conscientious way that is fair to all parties involved.