What makes a song great? Part 3
Visualizing the greatest songs of all time (and learning why Nigel Tufnel might be wrong)
Intro and setup
In two previous articles, we scraped Rolling Stone’s list of the “500 Greatest Songs of All Time” and retrieved audio features for each of them from Spotify’s API (with the help of the fantastic spotipy library). Now that we have all the data, we can start doing some analysis. For example, here are some questions that sprang to my mind when I first conceived of running this analysis:
- Is the “greatness” of a song somehow linked to its popularity?
- Which artists are featured most often?
- How do the songs’ features evolve over time?
- Are there any characteristics that are more common among the “greatest songs of all time”?
- …
In what follows, I will answer these questions and more with the help of data analysis and visualization.
Note that you can download the dataset from Kaggle and that the source code for this article is available as a notebook in this GitHub repo.
Let’s get started by importing some libraries and loading the data in pandas.
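A minimal sketch of the setup: the inline rows here are just a stand-in for the real Kaggle CSV, which you would load with pd.read_csv pointed at the actual file path.

```python
import io
import pandas as pd

# In practice: df = pd.read_csv("<path to the Kaggle CSV>")
# A couple of fake rows stand in for the scraped dataset here.
raw = io.StringIO(
    "Title,Artist,duration_ms,popularity\n"
    "Respect,Aretha Franklin,147600,72\n"
    "Imagine,John Lennon,183000,80\n"
)
df = pd.read_csv(raw)
```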
Let’s take a look at our dataframe.

There’s a bunch of columns we won’t need: let’s get rid of them.
df = df.drop(['Unnamed: 0', 'Spotify id', 'id', 'uri', 'track_href', 'analysis_url', 'type'], axis=1)
I also want to convert the duration_ms column from milliseconds to minutes for readability.
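The conversion is a one-liner (60,000 ms per minute); the toy frame below stands in for the real data.

```python
import pandas as pd

# Toy stand-in for the real dataframe
df = pd.DataFrame({"duration_ms": [147600, 183000]})

# 60,000 milliseconds per minute; drop the original column afterwards
df["duration_min"] = df["duration_ms"] / 60_000
df = df.drop("duration_ms", axis=1)
```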
Who are the most voted artists?
Let’s plot the top 10 artists by the number of mentions:
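In code, this boils down to a value_counts on the Artist column; a sketch on toy rows:

```python
import pandas as pd

# Toy stand-in for the 500-song dataframe
df = pd.DataFrame({"Artist": ["The Beatles"] * 3 + ["Bob Dylan"] * 2 + ["Prince"]})

# Count mentions per artist and keep the ten most frequent;
# top.plot(kind="barh") would then draw the bar chart.
top = df["Artist"].value_counts().head(10)
```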

No huge surprises here. Let’s move on.
Is popularity correlated to ranking?
One might wonder whether the ‘best’ songs, i.e., the ones ranked higher by Rolling Stone, are also the most popular; vice versa, it is legitimate to ask whether a song’s (relatively) low ranking makes it less popular. Let’s check by adding a Ranking column to our dataframe and correlating it with the popularity column.
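A sketch of the idea, assuming the rows are already ordered from rank 1 downward (the toy popularity values are made up):

```python
import pandas as pd

# Toy stand-in: rows already sorted from greatest (rank 1) down
df = pd.DataFrame({"popularity": [80, 75, 90, 60, 85]})

# Row position gives the ranking
df["Ranking"] = df.index + 1

# Pearson correlation between ranking and popularity
r = df["Ranking"].corr(df["popularity"])
```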

While most songs seem to be reasonably popular (as evidenced by the skewness of the distribution), ranking is not correlated with popularity. Let’s inspect the 10 most popular songs.

Now let’s inspect the highest ranked songs:
df.sort_values(by="Ranking")[synthetic_display].head(10).style.hide_index()

These results are not intuitive (at least to me). It might be worth looking into how the popularity score is calculated by Spotify before drawing any conclusions. Spotify's API documentation states that:
The popularity of the track. The value will be between 0 and 100, with 100 being the most popular. The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
Clearly, the popularity results are skewed towards more recent songs. It is also possible this has to do with duplicate tracks.
How are the audio features distributed?
As I mentioned earlier, we can inspect the audio features for each of the songs. Let’s plot them and see if we can gain any insights.
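A quick way to do this is pandas’ built-in hist, sketched here on a couple of toy columns standing in for the real features:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import pandas as pd

# Toy stand-in for the audio-feature columns
df = pd.DataFrame({"danceability": [0.55, 0.70, 0.62, 0.81],
                   "energy": [0.80, 0.45, 0.90, 0.60]})

# One histogram per numeric column on a single figure
axes = df.hist(figsize=(10, 4), bins=10)
```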

Let’s also take a closer look at the continuous audio features.

A few quick takeaways:
- danceability seems to be evenly distributed
- C is the most prevalent key
- major is by far the most prevalent mode
- songs recorded in the studio are prevalent (low liveness)
- 4/4 dominates time signatures
- duration mostly falls between 2 and 5 minutes
- there is some tendency towards ‘happy’ (valence) and ‘energetic’ (energy) songs
- both speechiness and acousticness skew towards low values
Let’s also check how these features correlate with each other and with ranking.
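The computation itself is just DataFrame.corr; a toy sketch (seaborn’s heatmap is a handy way to display the resulting matrix):

```python
import pandas as pd

# Toy stand-in for ranking plus two audio features
df = pd.DataFrame({"Ranking": [1, 2, 3, 4, 5],
                   "energy": [0.9, 0.7, 0.8, 0.5, 0.6],
                   "acousticness": [0.1, 0.3, 0.2, 0.6, 0.4]})

# Pairwise Pearson correlations
corr = df.corr()
# import seaborn as sns; sns.heatmap(corr, annot=True)  # renders it as a heatmap
```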

Perhaps unsurprisingly, no single feature seems to be a significant predictor of rank: acousticness and instrumentalness show only a modest negative correlation.
One thing that should bother us is that mode and key have been decoupled. Let's create a categorical class for the actual key instead, and then take a closer look at the same data by visualizing it with boxplots.
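One way to do this, using Spotify’s encoding (key is a pitch class with 0 = C; mode is 1 for major, 0 for minor):

```python
import pandas as pd

# Pitch-class names in Spotify's key encoding (0 = C, 1 = C#/Db, ...)
PITCHES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
           "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

# Toy stand-in: a song in D minor and one in A major
df = pd.DataFrame({"key": [2, 9], "mode": [0, 1]})

# Combine key and mode into one readable key signature
df["key_signature"] = (df["key"].map(lambda k: PITCHES[k])
                       + df["mode"].map({1: " major", 0: " minor"}))
```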

By the way: is D minor the saddest of all keys? Or: Why Nigel Tufnel is wrong.
The great musician and composer Nigel Tufnel once quipped that
D Minor is the saddest of all keys.
We can plot key signatures against valence to see whether he was right.

There aren’t actually that many sad songs in D minor, at least among the “500 greatest of all time”; if anything, it seems that E minor and A minor are sadder! (Remember: tracks with low valence sound more negative; minor mode is 0.)
(If you are interested in the topic, a lot of people have been commenting on it: I find this post by Ethan Hein and this video by Adam Neely illuminating.)
How does the dataset evolve with time?
I thought it would be interesting to see how some of the features in our dataset evolve with time. With the help of a couple of helper functions, I added a decade column to the dataframe.
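The derivation is simple integer arithmetic; note that the Year column name here is an assumption about the scraped data.

```python
import pandas as pd

# Toy stand-in; "Year" is an assumed column name for the release year
df = pd.DataFrame({"Year": [1965, 1971, 1999, 2003]})

# Integer-divide by 10 and multiply back to get the decade
df["decade"] = (df["Year"] // 10) * 10
```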

Let’s see how the songs are distributed across decades.

So over half of the songs were released in the 60s or the 70s… Some will say this is because music was better back then; I am inclined to think it says something about the median age of the Rolling Stone critics. Probably both things contribute.
We can now explore how any feature evolves decade after decade. Let’s check, for example, what happens to the duration and loudness features.
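Per-decade averages fall out of a groupby; a toy sketch:

```python
import pandas as pd

# Toy stand-in with a decade column and the two features of interest
df = pd.DataFrame({"decade": [1960, 1960, 1970, 1970],
                   "duration_min": [2.5, 3.0, 4.0, 5.0],
                   "loudness": [-12.0, -10.0, -9.0, -8.0]})

# Mean duration and loudness per decade; plotting these series
# gives the decade-by-decade trends
by_decade = df.groupby("decade")[["duration_min", "loudness"]].mean()
```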

It is of little surprise that average loudness has increased over time, and that there is less variance along this dimension as time goes by. Note also how much the average loudness jumps in the current decade (although the sample size is small). As for duration, songs were far more varied in the 70s, but there are two huge outliers in the 60s. So let’s check them out.
df.sort_values(by="duration_min", ascending=False)[:2]

I was curious to see which songs had the highest and lowest scores for each category.
HIGHEST SCORES
Feature            Title                          Artist
---------------------------------------------------------------------------------------
energy             When Doves Cry                 Prince and the Revolution
key                212                            Azealia Banks
loudness           Crazy                          Gnarls Barkley
mode               Baby Love                      The Supremes
speechiness        Flava in Ya Ear (Remix)        Craig Mack feat. Notorious B.I.G
acousticness       Teenage Riot                   Sonic Youth
instrumentalness   Green Onions                   Booker T. and the MGs
liveness           Tyrone                         Erykah Badu
valence            Pressure Drop                  Toots and the Maytals
tempo              Doll Parts                     Hole
time_signature     Stronger                       Kanye West
duration_min       Pt. 1-Acknowledgement          John Coltrane
LOWEST SCORES
Feature            Title                          Artist
---------------------------------------------------------------------------------------
energy             Crazy                          Patsy Cline
key                You're So Vain                 Carly Simon
loudness           Crazy                          Patsy Cline
mode               Stronger                       Kanye West
speechiness        Waterloo Sunset                The Kinks
acousticness       Summer Babe (Winter Version)   Pavement
instrumentalness   Stronger                       Kanye West
liveness           B.O.B.                         Outkast
valence            Lose Yourself                  Eminem
tempo              Without You                    Harry Nilsson
time_signature     Solsbury Hill                  Peter Gabriel
duration_min       It Takes Two                   Rob Base and DJ E-Z Rock
I also wondered whether there was a way to find the most representative songs in terms of some audio feature, e.g., within a given decade. This post on Stack Overflow pointed me in the right direction. The idea is to have a function that sorts a dataframe by the values closest to the mean in a given column. For example, if you want the song that’s closest to the mean in terms of duration, you would call sort_by_closest_to_mean(df, 'duration_min').iloc[0].
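A minimal version of that helper, sketched on toy data:

```python
import pandas as pd

def sort_by_closest_to_mean(df, col):
    """Sort rows by absolute distance from the mean of `col`,
    so row 0 is the most 'representative' song for that feature."""
    dist = (df[col] - df[col].mean()).abs()
    return df.loc[dist.sort_values().index]

# Toy stand-in: mean duration is ~4.17, so "B" is closest
df = pd.DataFrame({"Title": ["A", "B", "C"],
                   "duration_min": [2.0, 4.0, 6.5]})
closest = sort_by_closest_to_mean(df, "duration_min").iloc[0]
```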
Mean for dataset: 4.26
duration_min 4.26
Title I Like It
Artist Cardi B, J Balvin, and Bad Bunny
Name: 116, dtype: object
Linear Regression? Really?
All right, this is just for fun. So please do not take it too seriously. But what happens if one were to run a linear regression on the dataset? Well, here it goes.
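Here is a dependency-free sketch of such a fit using numpy’s least squares; the toy random numbers stand in for the real feature columns, which you would standardize first so the coefficients are comparable across features.

```python
import numpy as np

# Toy stand-in for the (standardized) feature matrix and the ranking
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))                 # three standardized features
y = rng.normal(size=20)                      # ranking stand-in

# Prepend an intercept column and solve the least-squares problem
X1 = np.column_stack([np.ones(len(y)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
```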
-----------------------------------
feature              coef
-----------------------------------
energy                7.92
key                   4.90
loudness             -1.32
mode                 -2.76
speechiness          -6.29
acousticness        -12.67
instrumentalness     -9.30
liveness              6.35
valence               3.69
tempo                -1.44
time_signature       -1.86
-----------------------------------