To encourage political participation and engagement among my friends, I often create drinking games for the primary debates. The rules I come up with have been based on my own perceptions of each candidate’s go-to slogans, but I wanted to try a more scientific approach for the February debate: can data science and text analysis create the ultimate drinking game?
Since June, there have been seven debates, giving us hours of candidate dialogue. I scraped debate transcripts to capture every single word the candidates have uttered on stage - all 178,692 of them. I then used tf-idf (term frequency-inverse document frequency) to highlight three-word phrases said often by one candidate but rarely by others. tf-idf is a convenient measure here because it devalues generic phrases used often by everyone, like “when I’m president…” or “we need to…”, making the final results more distinctive for each candidate.
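For readers curious about the mechanics, here is a minimal sketch of the idea in Python. It is not the code behind this project: the function names, the toy transcripts, and the plain unsmoothed idf formula (the log of the number of speakers divided by the number of speakers who use the phrase) are all assumptions made for illustration, and real libraries often use smoothed variants.

```python
import math
import re
from collections import Counter


def trigrams(text):
    """Lowercase the text and return its overlapping three-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def tfidf_by_speaker(speeches):
    """Map each speaker to {phrase: tf-idf score} (hypothetical helper)."""
    counts = {name: Counter(trigrams(text)) for name, text in speeches.items()}
    n_speakers = len(counts)
    # Document frequency: how many speakers use each phrase at least once.
    doc_freq = Counter(phrase for c in counts.values() for phrase in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            # tf = share of this speaker's trigrams; idf = log(N / df).
            phrase: (n / total) * math.log(n_speakers / doc_freq[phrase])
            for phrase, n in c.items()
        }
    return scores


def top_phrases(scores, name, k=3):
    """Return the k highest-scoring phrases, ties broken alphabetically."""
    ranked = sorted(scores[name].items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:k]]


# Invented toy transcripts, not real debate quotes.
speeches = {
    "Candidate A": "when i'm president we will act. let me be clear. let me be clear.",
    "Candidate B": "when i'm president we need a plan. here is my plan. here is my plan.",
}

scores = tfidf_by_speaker(speeches)
print(top_phrases(scores, "Candidate A", k=2))
print(scores["Candidate A"]["when i'm president"])
```

In this toy input both speakers say "when i'm president", so its idf is log(2/2) = 0 and it drops out of the ranking, while a phrase only one speaker repeats rises to the top. That is the behavior that makes tf-idf a good fit for finding catchphrases.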
Below are the results for the candidates who have qualified for the next debate. Hover over any phrase to read how the candidate uses it in context.
⛑️ Drink responsibly! ⛑️
As a bonus, here are the sayings of some candidates we’ve seen debate at least thrice previously, but who won’t be on stage this month. Don’t forget to hover over any phrase to read how the candidate uses it in context.
These Democratic candidates aren’t the only ones currently campaigning. Donald Trump has been on a tour of campaign rallies, stopping in Des Moines, Iowa, just last week. I scraped transcripts of his speeches from 15 of these rallies (going as far back as seven months ago) to see what phrases we’ll be hearing during the 2020 election. In total, I scraped 160,443 words from his rally speeches.
Remember, I’m defining a catchphrase as something a candidate says frequently in relation to other candidates. So these aren’t necessarily the three-word combinations Trump uses the most - they’re the phrases he says significantly more often than any of the Democratic candidates.
I’ve also surfaced Trump’s dialogue from the 2015-16 primary debates (and compared it to other Republicans who were in the same debates) to show how his campaign rhetoric already differs across two campaigns.
One final note: this entire project was an analysis of language and rhetoric, which is why it includes quotes found in debate and rally transcripts. Transcripts record everything candidates say verbatim, regardless of factual accuracy. As a consequence, it’s possible that some of the quotes are misleading or inaccurate. If you’re unsure about any of the quotes, I recommend looking up the topic on a credible news source.