Marvel Characters Similarity Map
Arranged Based On Shared Comic Book Appearances
By Oliver Gladfelter | Sept 30, 2020
I went to a small college. With a campus of 2,200 students, interpersonal relationships and interactions often came with many layers: you’d sit in front of your RA in at least two classes; your TA was also the DJ at approximately 30% of the parties you attended; your lab partner probably ended up dating your roommate. In other words, “How do you know so-and-so?” often had a complicated and convoluted answer.
I was recently reminded of these multifaceted relationships after coming across the fictional background story of The Vision, a Marvel character. As it turns out, Vision was created by Ultron, but ultimately turned against and defeated his creator. The two androids later reconnected as they both developed human emotions and worked together to “explore the world of feeling and who they are” (sounds like college to me).
As if they all attended a small college together, no two comic book characters are just teammates or just enemies. Instead, we have a pair of androids who are father and son, enemies, and codependent, and that's just one example. With 80 years of publication history, Marvel’s writers have had plenty of time to turn enemies into allies, allies into lovers, lovers into enemies, and so on many times over.
Because relationships are just as important to Marvel’s stories as any action scene or world-threatening crisis, I set out to create a visual representation of a few thousand of the series’ most iconic connections. I’m calling the final product a “proximity map,” in which the closer any two characters are to each other in the graph, the more often they appear in the same comic books. So Captain America and Iron Man, for example, are practically on top of each other in the graph, no doubt thanks to their 1,370 joint appearances throughout the comic books. Meanwhile, characters belonging solely to The Avengers and the X-Men are further apart from one another, since they’ve (mostly) stuck to their own storylines. Take a look for yourself:
Data & Methodology
All data are pulled from the Marvel Database. For each of Marvel’s 29,136 characters, I scraped how many comic books they’ve appeared in. Because continuing this analysis with over 29,000 data points would have overwhelmed the final product and destroyed my humble laptop, I opted to remove anyone who hasn’t appeared in at least 60 comic books. This left me with a sample of 756 characters.
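The filtering step can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name, the dictionary layout, and the two "Hypothetical" entries are assumptions, while the 60-appearance threshold and the two real appearance counts come from this article.

```python
# Threshold described in the article: keep characters with at least 60 appearances.
MIN_APPEARANCES = 60


def filter_characters(appearance_counts, minimum=MIN_APPEARANCES):
    """Return only the characters that appear in at least `minimum` comic books."""
    return {
        name: count
        for name, count in appearance_counts.items()
        if count >= minimum
    }


# Toy input: the first two counts are quoted in the article, the last two are made up.
counts = {
    "Peter Parker": 4311,
    "Steve Rogers": 3581,
    "Hypothetical Character A": 59,
    "Hypothetical Character B": 60,
}

kept = filter_characters(counts)
print(sorted(kept))
```

Because the rule is "at least 60," a character with exactly 60 appearances stays in the sample and one with 59 is dropped.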
For each character, I then collected a list of the comic books they’ve appeared in. Comparing these lists between every possible combination of any two characters (285,390 pairs, to be exact) allowed me to compute how many comic books the pair have appeared in together. For example, Peter Parker and Steve Rogers have shown up in 4,311 and 3,581 comic books, respectively, and have overlapped in 765 of those comic books.
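The pairwise comparison boils down to a set intersection for every pair of characters. Here is a minimal sketch under stated assumptions: the character names, issue labels, and the `joint_appearances` helper are all hypothetical, and the real project may have stored and compared its lists differently.

```python
from itertools import combinations


def joint_appearances(appearances):
    """Map each unordered pair of characters to the number of comics they share.

    `appearances` maps a character name to the set of comic books it appears in.
    """
    return {
        (a, b): len(appearances[a] & appearances[b])
        for a, b in combinations(sorted(appearances), 2)
    }


# Made-up toy data standing in for the scraped appearance lists.
appearances = {
    "Character A": {"Issue 1", "Issue 2", "Issue 3"},
    "Character B": {"Issue 2", "Issue 3", "Issue 4"},
    "Character C": {"Issue 5"},
}

shared = joint_appearances(appearances)
for pair, count in shared.items():
    print(pair, count)
```

Storing each character's comics as a set keeps every comparison cheap, which matters when the number of pairs grows with the square of the number of characters.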
The resulting data set of appearance counts for all possible character pairs can be considered high-dimensional. It’s easy to visualize how many joint appearances Spider-Man has with each of the other 755 characters, but much more difficult to visualize how many joint appearances all 756 characters have with each other. To do so would require a graph with 756 axes, which you don’t want to see and I don’t want to code (read: cannot code).
To compress our high-dimensional data for a flat, 2-dimensional visualization, we leverage a t-distributed stochastic neighbor embedding (t-SNE) algorithm. This model is capable of considering the appearance counts of every pair and computing the most fitting coordinates (x, y) for each character. It also aims to keep characters that were ‘close’ to each other in the multi-dimensional space near each other in the two-dimensional space. The relationship won’t be perfectly linear - characters with the highest joint comic book appearances may not necessarily be the closest characters in the graph, as they may be ‘pulled away’ from one another through large amounts of joint appearances with others. But for the most part, distance in the graph reflects the extent of similarity in terms of the holistic, multi-dimensional view. This is why The Avengers mostly cluster together, the X-Men mostly cluster together, and so on.
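One plausible way to set this up, sketched below, is to treat each character's row of joint-appearance counts as its coordinates in the high-dimensional space and pass those rows to a t-distributed stochastic neighbor embedding (t-SNE) implementation. The helper names, the toy data, and the choice of scikit-learn's TSNE with these particular settings are assumptions made for illustration; the article does not say which implementation or parameters were used.

```python
def joint_count_matrix(appearances):
    """Build a square matrix of shared-comic counts, one row per character.

    Entry [i][j] is the number of comics characters i and j share; the
    diagonal therefore holds each character's own appearance count.
    """
    names = sorted(appearances)
    matrix = [
        [len(appearances[a] & appearances[b]) for b in names]
        for a in names
    ]
    return names, matrix


def embed_2d(matrix, random_state=0):
    """Project the rows of `matrix` to (x, y) coordinates with t-SNE.

    Assumes numpy and scikit-learn are installed; they are imported here
    so the counting helper above works without them.
    """
    import numpy as np
    from sklearn.manifold import TSNE

    features = np.asarray(matrix, dtype=float)
    # scikit-learn requires perplexity to be smaller than the number of rows.
    perplexity = min(30.0, len(features) - 1)
    model = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=random_state,
    )
    return model.fit_transform(features)


# Made-up toy data; the real input would be the 756 filtered characters.
appearances = {
    "Character A": {"Issue 1", "Issue 2", "Issue 3"},
    "Character B": {"Issue 2", "Issue 3", "Issue 4"},
    "Character C": {"Issue 5", "Issue 6"},
    "Character D": {"Issue 5", "Issue 6", "Issue 7"},
}

names, matrix = joint_count_matrix(appearances)
```

Calling `embed_2d(matrix)` returns one (x, y) row per entry in `names`. t-SNE is stochastic, so the exact coordinates depend on the random seed and the perplexity setting, and only the relative distances between characters are meaningful.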
Code and data for this project are available on GitHub.