OpenAI has denied allegations that it cloned actress Scarlett Johansson’s voice for Sky, one of the voices of its AI-powered chatbot. Nevertheless, the firm opted to remove the voice under significant pressure, largely from Internet users who insisted on its likeness to the actress’s voice.
On Monday, OpenAI clarified that Sky was never meant to imitate Johansson, emphasising that “AI voices should not deliberately mimic a celebrity’s distinctive voice.” Johansson claimed that the chatbot’s voice was copied from her own, notably from her role in the movie Her, in which she voices an AI assistant.
The company insisted that it had hired a different actress for the role, and had drawn on samples from multiple voice actors, months before OpenAI CEO Sam Altman contacted Johansson. In a statement, Altman said the Sky voice “is not Scarlett Johansson’s, and it was never intended to resemble hers.”
He also said that the company cast the actor for the Sky voice before approaching Johansson, adding, “We are sorry to Ms. Johansson that we didn’t communicate better.” The firm explained that the voices were “carefully selected through an extensive process spanning five months involving professional voice actors, talent agencies, casting directors, and industry advisors.”
This account is supported by documents OpenAI shared, along with audio recordings of the anonymous actress, whose voice sounds identical to Sky’s. Altman did contact Johansson to ask whether she would be interested in voicing an AI chatbot, but she declined for personal reasons.
Eventually, the company announced on social media that Sky would be put on pause while the firm worked to clarify how the voice came to be chosen. In describing its process for selecting the final voices, OpenAI also broached the capabilities and limits of the technology, along with its potential risks.
Women, technology and their audience
Nevertheless, the incident sparked debates about AI and human privacy, as well as the use of women as “sources of comfort” in technological settings. On the latter point, Johansson recounted Altman’s offer: “He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI.”
A similar scenario played out with the image of Lena Forsen (known as “Lenna”), taken from a 1972 issue of Playboy magazine. From 1973 onward, the image came to be widely used as a standard test image in digital image processing.
Although the editor-in-chief of IEEE Transactions on Image Processing noted that the texture and rendering of the image made it a suitable test image, he also acknowledged that “the Lena image is a picture of an attractive woman. It is not surprising that the (mostly male) image processing research community gravitated toward an image that they found attractive.”
Consequently, the image became a subject of controversy, much like the widespread use of women’s voices for chatbots, virtual assistants, and text-to-speech outputs. Netizens also questioned Sky’s apparently “flirty” manner in conversation. OpenAI said in its blog post that charisma and approachability were among the characteristics it looked for in creating a voice that “feels timeless.”
However, much as the Lenna image was labelled “suggestive”, called out for its adverse impact on female students entering STEM fields, and eventually dropped from use, Sky has also been put on pause, its return uncertain for now.
The debate also concerned the use of copyrighted works by AI. Authors, actors and other professionals have expressed their dissatisfaction over AI using their work, physical features and personalities to create its own content, with neither consent nor pay. For now, lawsuits over such use put pressure on companies to seek consent before AI creates such works, though the pervasiveness of AI makes this only a temporary solution.