
On a weekday afternoon in the Gates Computer Science Building, graduate students are hunched over screens showing Van Gogh’s Starry Night, but they aren’t appreciating the brushwork. They’re watching an algorithm explain how it feels. The picture loads. A line of text appears: “Wow. This painting’s blue and white hues give me the impression that I’m looking at a dream.” Someone laughs. Someone else writes it down. This is what the race to build emotionally intelligent machines looks like, and it’s stranger than the headlines suggest.
The ArtEmis project, which originated at Stanford’s Institute for Human-Centered AI, has done something earlier emotion-recognition systems never managed. It doesn’t simply classify a face from a webcam feed as “happy” or “angry,” the kind of crude work that has long put emotion AI at odds with civil rights organizations. Trained on more than 81,000 paintings and roughly 440,000 written human responses, ArtEmis learned to pair an emotion with an explanation grounded in what it sees. Show it Dalí’s melting clocks and it may answer “Fear. There appears to be a dead animal on the ground.” That isn’t categorization. It’s something closer to interpretation, though whether you’d call it empathy is debatable.
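For the technically curious, the paragraph above implies a particular shape of training data: each painting paired with both a discrete emotion label and a free-text justification. The sketch below illustrates that pairing in Python; the field names, identifier format, and emotion list are assumptions made for illustration, not the project’s published schema.

```python
# A minimal sketch of the kind of annotation record described above:
# a painting paired with an emotion label and a human-written explanation.
# All names here are illustrative assumptions, not ArtEmis's actual schema.
from dataclasses import dataclass

# Broad affect categories of the sort the article alludes to (assumed set).
EMOTIONS = [
    "amusement", "awe", "contentment", "excitement",
    "anger", "disgust", "fear", "sadness", "something else",
]

@dataclass
class AffectiveAnnotation:
    painting_id: str   # e.g. a WikiArt-style identifier (hypothetical format)
    emotion: str       # one of EMOTIONS
    explanation: str   # the annotator's justification, grounded in the image

# One Starry Night record, paraphrasing the output quoted earlier.
example = AffectiveAnnotation(
    painting_id="van-gogh_the-starry-night-1889",
    emotion="awe",
    explanation="The blue and white hues give me the impression of a dream.",
)

# Hundreds of thousands of records like this teach a captioning model to
# emit the label and the sentence together, so its output reads
# "Fear. There appears to be a dead animal on the ground."
# rather than a bare class probability.
print(f"{example.emotion.capitalize()}. {example.explanation}")
```

The second field is the point: the explanation forces any model trained on this data to justify its label, which is what separates the output quoted above from a confidence score.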
The researchers here see the work as remedial. Emotion AI has a damaged reputation, and with good reason. The market has been flooded with products that promise to read job applicants, inmates, and students through cameras, issuing confidence scores that reduce human emotion to a few cartoon categories. Critics have dubbed it “emojification.”
Some have put it more bluntly: “junk science with a racial bias problem.” Stanford’s approach feels different, more humanities seminar than surveillance, but the commercial appeal is hard to ignore. Marketers want this. Chatbot companies want this. Whoever masters emotional fluency first stands to make a great deal of money.
Fei-Fei Li, co-director of HAI, has argued for years that AI should enhance human ability rather than replace it. That philosophy underpins nearly everything the lab produces. HAI economist Erik Brynjolfsson has written about what he calls the “Turing Trap”: the notion that we’ve spent too much time worrying about whether machines can imitate us and not enough time considering what they might do alongside us.
Watch ArtEmis describe a painting and you can see the appeal of that argument. The interesting question is not whether the machine truly feels anything; it almost certainly doesn’t. The interesting question is what happens when people work with a system that can perform emotional expression at scale.
It’s hard not to wonder where this goes. In five years, a museum curator might use a descendant of ArtEmis to help draft exhibition labels. A grief counselor might lean on a chatbot trained on this kind of data. A toy company will likely use it less respectfully. As usual in Silicon Valley, the technology is moving faster than the conversation around it. The Stanford researchers know this. They talk about ethics constantly, sometimes wearily, the way people do when they’ve been making the same argument for years and aren’t sure they’re winning.
Walk around the campus and you’re struck by how unassuming the work looks up close. No robots pointing at canvases. Just researchers, screens, coffee cups, and a quiet conviction that we should be ready for machines to understand us, one day, better than we expect.
