Marvel Characters Similarity Map
Arranged Based On Shared Comic Book Appearances
By Oliver Gladfelter | Sept 30, 2020
I was recently skimming the backstory of The Vision, a fictional Marvel character. Vision was created by Ultron, but ultimately turned against and defeated his creator. The two androids later reconnected and even had something of a team-up.
The multifaceted relationship between Vision and Ultron seems typical of comic book rosters: any two characters are rarely just teammates or just enemies. In 80 years of publication history, Marvel's writers have turned enemies into allies, allies into lovers, and lovers into enemies many times over.
Because these relationships drive Marvel's stories just as much as any action scene, I set out to create a visual representation of thousands of connections. I'm calling the final product a "proximity map": the closer any two characters are to each other, the more often they appear in the same comic books. For example, Captain America and Iron Man overlap each other after sharing 1,370 joint appearances over the decades. Meanwhile, characters belonging solely to The Avengers and characters belonging solely to the X-Men sit farther apart from one another, since they’ve (mostly) stuck to their own storylines.
Data & Methodology
All data are pulled from the Marvel Database. For each of Marvel’s 29,136 characters, I scraped how many comic books they’ve appeared in. Because continuing this analysis with over 29,000 data points would have overwhelmed the final product and destroyed my humble laptop, I opted to remove anyone who hasn’t appeared in at least 60 comic books. This left me with a sample of 756 characters.
For each character, I then collected a list of the comic books they’ve appeared in. Comparing these lists between every possible combination of any two characters (756 × 755 / 2 = 285,390 pairs, to be exact) allowed me to compute how many comic books each pair has appeared in together. For example, Peter Parker and Steve Rogers have shown up in 4,311 and 3,581 comic books, respectively, and have overlapped in 765 of those comic books.
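The pairwise comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `appearances` dictionary, the issue labels in it, and the `joint_appearances` helper are hypothetical stand-ins, not the project's actual data or code.

```python
from itertools import combinations

# Hypothetical toy data: each character maps to the set of comic issues
# they appear in. The real lists hold thousands of issues per character.
appearances = {
    "Peter Parker": {"ASM 1", "ASM 2", "Avengers 1", "Civil War 1"},
    "Steve Rogers": {"Avengers 1", "Civil War 1", "Cap 1"},
    "Tony Stark": {"Avengers 1", "Civil War 1", "Iron Man 1"},
}


def joint_appearances(appearances):
    """Count shared issues for every unordered pair of distinct characters."""
    counts = {}
    for a, b in combinations(sorted(appearances), 2):
        # Set intersection keeps only the issues both characters appear in.
        counts[(a, b)] = len(appearances[a] & appearances[b])
    return counts


pair_counts = joint_appearances(appearances)
for (a, b), n in pair_counts.items():
    print(f"{a} & {b}: {n} shared issues")
```

Storing each character's issues as a set makes every comparison a single intersection, which keeps the full sweep over all pairs manageable on a laptop.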
The resulting data set of appearance counts for all possible character pairs can be considered high dimensional. It’s easy to visualize how many joint appearances Spider-Man has with each of the other 755 characters, but much more difficult to visualize how many joint appearances all 756 characters have with each other. To do so would require a graph with 756 axes, which you don’t want to see and I don’t want to code (read: cannot code).
To compress this high-dimensional data into a flat, two-dimensional visualization, I used t-distributed stochastic neighbor embedding (t-SNE). This algorithm considers the appearance counts of every pair and computes fitting (x, y) coordinates for each character, so that characters that were ‘close’ to each other in the multi-dimensional space also tend to end up near each other in the two-dimensional space. The relationship won’t be perfectly linear: the characters with the most joint comic book appearances may not be the closest characters in the graph, as they may be ‘pulled away’ from one another by large numbers of joint appearances with others. But for the most part, distance in the graph reflects similarity in the holistic, multi-dimensional view. This is why The Avengers mostly cluster together, the X-Men mostly cluster together, and so on.
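As a rough sketch of this step, the snippet below runs t-SNE on a tiny joint-appearance matrix using scikit-learn's `TSNE` class. It assumes NumPy and scikit-learn are installed. The post does not say which implementation, preprocessing, or parameters were used, so the library choice, the `perplexity` value, and the six-character `counts` matrix here are all illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical toy data: joint-appearance counts for six characters.
# Row i holds character i's counts with every other character, so each
# character is a point in a 6-dimensional space (756 in the real data).
names = ["Iron Man", "Captain America", "Thor", "Cyclops", "Wolverine", "Storm"]
counts = np.array([
    [0, 90, 70, 5, 8, 4],
    [90, 0, 75, 6, 9, 5],
    [70, 75, 0, 3, 4, 2],
    [5, 6, 3, 0, 80, 85],
    [8, 9, 4, 80, 0, 78],
    [4, 5, 2, 85, 78, 0],
], dtype=float)

# perplexity must be smaller than the number of rows; 2.0 is an arbitrary
# choice for this toy example, not a value taken from the project.
tsne = TSNE(n_components=2, perplexity=2.0, init="random", random_state=0)
coords = tsne.fit_transform(counts)  # shape: (number of characters, 2)

for name, (x, y) in zip(names, coords):
    print(f"{name:16s} x={x:8.2f} y={y:8.2f}")
```

Because t-SNE is stochastic, fixing `random_state` makes a run repeatable, but the exact coordinates carry no meaning on their own; only the relative distances between characters do.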
Code and data for this project are available on GitHub.