Spectrum Labs, the leading provider of text analytics AI whose tools scale content moderation for games, apps, and online platforms, today announced the world’s first AI content moderation solution that detects and prevents harmful and toxic behavior produced through generative AI. With the advent of generative AI tools such as ChatGPT, Dall-E, Bard, and Stable Diffusion, automatic content creation can now be used to quickly and easily produce racist imagery, hate speech, radicalization, spam, fraud, grooming, and harassment on a massive scale, with little time investment by bad actors aiming to abuse the new technology.
To address this issue, Spectrum Labs has developed a unique generative AI content moderation tool that helps platforms automatically protect their communities from this highly scalable adversarial content.
“Platforms were already struggling to sift through the mountains of user-generated content produced online every day to identify and remove hateful, illegal, and predatory content, and generative AI has made that job even harder,” said Justin Davis, CEO of Spectrum Labs. “Fortunately, our existing contextual AI content moderation tools can be adapted to handle this new flow of content because they are designed to recognize intent, not just match a list of keywords or specific phrases, which generative AI can easily avoid.”
Because generative AI is designed to create plausible variations of human speech, traditional keyword-based moderation tools cannot tell that content is hateful if it never uses specific racist words or phrases. (For example, a children’s story about why one race is superior to another, written without racial slurs.) Similarly, other existing contextual models can detect sexual, threatening, or toxic content but cannot recognize positive behaviors such as encouragement, affirmation, and rapport, so they would redact generative AI responses on sensitive topics even when the content is helpful, supportive, and reassuring. (For example, when a user who has suffered sexual abuse asks for help finding psychological support resources.)
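The blind spot described above can be illustrated with a minimal sketch. This is not Spectrum Labs’ actual system; the blocklist entries and function name are hypothetical stand-ins, showing only why exact-term matching passes harmful text that avoids banned vocabulary:

```python
# Toy illustration (not a real moderation product) of the keyword-filter
# blind spot: text can carry harmful intent without any banned term.

BANNED_TERMS = {"slur1", "slur2"}  # hypothetical stand-ins for a real blocklist


def keyword_flag(text: str) -> bool:
    """Flag text only if it contains an exact banned term."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not BANNED_TERMS.isdisjoint(words)


# A message can frame one group as inferior using only neutral vocabulary,
# so the exact-match filter lets it through unflagged.
implicit = "Once upon a time, children learned why one group deserved less."
explicit = "That was a slur1 thing to say."

print(keyword_flag(implicit))  # False: harmful framing, no banned words
print(keyword_flag(explicit))  # True: exact term match
```

A contextual model, by contrast, would score the first message on the intent of the whole passage rather than on vocabulary alone.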
Even with image-based generative AIs like Dall-E, automatically detecting and redacting toxic human-written prompts can prevent the creation of libraries of new AI-generated image and video content that is hateful, threatening, or radicalizing, while preserving the real-time latency that makes the generative AI user experience feel so magical.
Future applications of this real-time, multi-layered AI moderation for generative AI could include copyright infringement detection, bias detection in AI-generated content to filter out biased and problematic training data sources, and better analysis of the kinds of content people want to create and how it is used. For now, however, the company is focused on quickly deploying a basic set of tools to protect users and platforms from a potential tidal wave of toxic content.
“At Spectrum Labs, our mission is to make the internet a safer place for everyone. We know that trust and safety workers are the unsung heroes in this fight, and we’re honored to support them in making the digital world safer, post by post,” Davis added.