What makes a song great? Part 3


Intro and setup

In two previous articles, we scraped Rolling Stone’s list of the “500 Greatest Songs of All Time” and retrieved audio features for each of them from Spotify’s API (with the help of the fantastic spotipy library). Now that we have all the data, we can start doing some analysis. For example, here are some questions that sprang to mind when I first conceived of this analysis:

  • Is the “greatness” of a song somehow linked to its popularity?
  • Which artists are featured most often?
  • How do the songs’ features evolve over time?
  • Are there any characteristics that are more common among the “greatest songs of all time”?

In what follows, I will answer these questions and more with the help of data analysis and visualization.

Note that you can download the dataset from Kaggle and that the source code for this article is available as a notebook in this GitHub repo.

Let’s get started by importing some libraries and loading the data in pandas.
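A minimal sketch of this step; the real dataset comes from the CSV produced in Part 2 (the Kaggle link above), so the two-row sample and column names here are purely illustrative:

```python
import io
import pandas as pd

# Toy two-row CSV standing in for the real scraped file; the actual
# dataset has many more columns (all of Spotify's audio features).
csv_data = io.StringIO(
    "Title,Artist,duration_ms,energy\n"
    "Respect,Aretha Franklin,147600,0.56\n"
    "Imagine,John Lennon,183000,0.26\n"
)
df = pd.read_csv(csv_data)
print(df.head())
```

With the real file you would call `pd.read_csv` on the downloaded path instead of the in-memory buffer.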

Let’s take a look at our dataframe.


There’s a bunch of columns we won’t need: let’s get rid of them.

df = df.drop(['Unnamed: 0', 'Spotify id', 'id', 'uri', 'track_href', 'analysis_url', 'type'], axis=1)

I also want to convert the duration_ms column from millisecond to minutes for readability.
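The conversion is a single division, since one minute is 60,000 milliseconds (toy values here for illustration):

```python
import pandas as pd

df = pd.DataFrame({"duration_ms": [147600, 183000]})
# 1 minute = 60,000 ms
df["duration_min"] = df["duration_ms"] / 60_000
df = df.drop("duration_ms", axis=1)
print(df)
```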

Which artists are mentioned most often?

Let’s plot the top 10 artists by the number of mentions:
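A sketch of the counting step on toy data; `value_counts` does the heavy lifting, and a horizontal bar plot of the result gives the chart below:

```python
import pandas as pd

# Illustrative sample; the real dataframe has one row per ranked song.
df = pd.DataFrame({"Artist": ["The Beatles", "The Beatles", "Bob Dylan",
                              "The Beatles", "Bob Dylan", "Prince"]})
top = df["Artist"].value_counts().head(10)
# top.plot.barh() would draw the bar chart shown below
print(top)
```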

Top artists by number of mentions

No huge surprises here. Let’s move on.

Is popularity correlated to ranking?

One might wonder whether the ‘best’ songs (i.e., the ones ranked higher by Rolling Stone) are also the most popular; conversely, it is legitimate to ask whether a song’s (relatively) low ranking makes it less popular. Let’s check by adding a Ranking column to our dataframe and correlating it to the popularity column.
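A sketch of the idea on toy numbers: since the dataframe is already in list order, the rank is just the row position, and `Series.corr` gives the (Pearson) correlation:

```python
import pandas as pd

# Illustrative popularity scores (Spotify's 0-100 scale)
df = pd.DataFrame({"popularity": [75, 60, 82, 55, 70]})
df["Ranking"] = df.index + 1  # list order equals Rolling Stone rank
corr = df["Ranking"].corr(df["popularity"])
print(corr)
```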

Ranking vs popularity

While most songs seem to be reasonably popular (as evidenced by the skewness of the distribution), ranking is not correlated with popularity. Let’s inspect the 10 most popular songs.


Now let’s inspect the highest ranked songs:


These results are not intuitive (at least to me). It might be worth looking into how the popularity score is calculated by Spotify before drawing any conclusions. Spotify's API documentation states that:

The popularity of the track. The value will be between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.

Clearly, the popularity results are skewed towards more recent songs. It is also possible this has to do with duplicate tracks.

How are the audio features distributed?

As I mentioned earlier, we can inspect the audio features for each of the songs. Let’s plot them and see if we can gain any insights.
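On toy values, the summary statistics behind these plots can be pulled with `describe`, while `df.hist()` (via matplotlib) draws one histogram per numeric column:

```python
import pandas as pd

# Illustrative sample of two of the audio features
df = pd.DataFrame({"danceability": [0.5, 0.7, 0.6],
                   "energy": [0.8, 0.4, 0.6]})
# df.hist(figsize=(12, 8)) would draw the histograms shown below
summary = df.describe()
print(summary)
```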

Distribution of features

Let’s also take a closer look at the continuous audio features.

Distribution of features

A few quick takeaways:

  • danceability seems to be evenly distributed
  • C is the most common key
  • major is by far the most common mode
  • studio recordings are prevalent (low liveness)
  • 4/4 dominates time signatures
  • duration mostly falls between 2 and 5 minutes
  • there is some tendency towards ‘happy’ (valence) and ‘energetic’ (energy) songs
  • both speechiness and acousticness skew low

Let’s also check how these features correlate with themselves and with ranking.
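The matrix itself is one call to `DataFrame.corr`; a heatmap (e.g. `seaborn.heatmap(corr, annot=True)`) renders it as in the figure below. Toy values for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Ranking": [1, 2, 3, 4],
    "energy":  [0.7, 0.5, 0.9, 0.6],
    "valence": [0.6, 0.4, 0.8, 0.5],
})
# Pairwise Pearson correlations between all numeric columns
corr = df.corr()
print(corr)
```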

Correlation matrix

Perhaps unsurprisingly, no single feature seems to be a significant predictor of rank: acousticness and instrumentalness show only a modest correlation, as noted.

One thing that should bother us is that mode and key have been decoupled. Let's create a categorical class for the actual key instead, and then take a closer look at the same data by visualizing it with boxplots.
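Spotify encodes key as pitch class (0 = C, 1 = C♯/D♭, …) and mode as 1 = major, 0 = minor; combining them gives a readable categorical label. A sketch:

```python
import pandas as pd

# Standard pitch-class order used by Spotify's key field
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

df = pd.DataFrame({"key": [2, 9, 0], "mode": [0, 1, 1]})
df["key_name"] = (df["key"].map(lambda k: PITCHES[k])
                  + df["mode"].map({1: " major", 0: " minor"}))
print(df)
```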

Distribution of keys

By the way: is D minor the saddest of all keys? Or: Why Nigel Tufnel is wrong.

The great musician and composer Nigel Tufnel once quipped that

D Minor is the saddest of all keys.

We can plot key signatures against valence to see whether he was right.
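A sketch of the comparison on toy data: group valence by the combined key label and look at the medians (a seaborn boxplot of the same grouping gives the figure below):

```python
import pandas as pd

# Illustrative values; key_name combines pitch class and mode as above
df = pd.DataFrame({
    "key_name": ["D minor", "D minor", "E minor", "A major"],
    "valence":  [0.4, 0.5, 0.2, 0.8],
})
by_key = df.groupby("key_name")["valence"].median().sort_values()
# seaborn.boxplot(x="key_name", y="valence", data=df) would draw the plot
print(by_key)
```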

Plot of keys vs valence

There aren’t actually that many sad songs in D minor, at least among the “500 greatest songs of all time”; if anything, it seems that E minor and A minor are sadder! (Remember: tracks with low valence sound more negative; minor mode is 0.)

(If you are interested in the topic, a lot of people have been commenting on it: I find this post by Ethan Hein and this video by Adam Neely illuminating.)

How does the dataset evolve with time?

I thought it would be interesting to see how some of the features in our dataset evolve with time. With the help of a couple of helper functions, I added a decade column to the dataframe.
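A minimal version of such a helper, assuming the dataframe has a release-year column (the column name here is illustrative):

```python
import pandas as pd

def to_decade(year: int) -> str:
    """Map a year to its decade label, e.g. 1967 -> '1960s'."""
    return f"{(year // 10) * 10}s"

df = pd.DataFrame({"Year": [1967, 1971, 1984, 2003]})
df["decade"] = df["Year"].map(to_decade)
print(df)
```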


Let’s see how the songs are distributed across decades.

Songs per decade

So over half of the songs were released in the 60s or the 70s… Some will say this is because music was simply better back then; I am inclined to think it says something about the median age of Rolling Stone’s critics. Probably both contribute.

We can now explore how any feature evolves decade after decade. Let’s check, for example, what happens to the duration and loudness features.
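The per-decade summaries behind these plots are one `groupby`/`agg` away (toy numbers for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "decade":       ["1960s", "1960s", "1970s", "1970s"],
    "duration_min": [2.5, 11.0, 4.0, 6.0],
    "loudness":     [-12.0, -10.0, -9.0, -8.0],
})
# Mean and spread of each feature per decade
stats = df.groupby("decade")[["duration_min", "loudness"]].agg(["mean", "std"])
print(stats)
```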

Duration and loudness by decade

It is hardly surprising that average loudness has increased over time, nor that its variance shrinks as the decades go by. Note also how much the average loudness jumps in the current decade (although the sample size is small). Durations, for their part, were far more varied in the 70s, but there are two huge outliers in the 60s. So let’s check them out.

df.sort_values(by="duration_min", ascending=False)[:2]

I was curious to see which songs had the highest and lowest scores for each category.

HIGHEST SCORES

Feature           Title                          Artist
energy            When Doves Cry                 Prince and the Revolution
key               212                            Azealia Banks
loudness          Crazy                          Gnarls Barkley
mode              Baby Love                      The Supremes
speechiness       Flava in Ya Ear (Remix)        Craig Mack feat. The Notorious B.I.G.
acousticness      Teenage Riot                   Sonic Youth
instrumentalness  Green Onions                   Booker T. and the MGs
liveness          Tyrone                         Erykah Badu
valence           Pressure Drop                  Toots and the Maytals
tempo             Doll Parts                     Hole
time_signature    Stronger                       Kanye West
duration_min      Pt. 1-Acknowledgement          John Coltrane

LOWEST SCORES

Feature           Title                          Artist
energy            Crazy                          Patsy Cline
key               You're So Vain                 Carly Simon
loudness          Crazy                          Patsy Cline
mode              Stronger                       Kanye West
speechiness       Waterloo Sunset                The Kinks
acousticness      Summer Babe (Winter Version)   Pavement
instrumentalness  Stronger                       Kanye West
liveness          B.O.B.                         Outkast
valence           Lose Yourself                  Eminem
tempo             Without You                    Harry Nilsson
time_signature    Solsbury Hill                  Peter Gabriel
duration_min      It Takes Two                   Rob Base and DJ E-Z Rock
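Tables like the ones above can be generated by looking up the row with the extreme value for each feature via `idxmax`/`idxmin` (toy data here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Title":   ["A", "B", "C"],
    "energy":  [0.9, 0.2, 0.5],
    "valence": [0.1, 0.8, 0.5],
})
features = ["energy", "valence"]
# Title of the song with the max / min value of each feature
highest = {f: df.loc[df[f].idxmax(), "Title"] for f in features}
lowest = {f: df.loc[df[f].idxmin(), "Title"] for f in features}
print(highest, lowest)
```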

I also wondered whether there was a way to find the most representative songs in terms of some audio feature, e.g., within a given decade. This post on Stack Overflow pointed me in the right direction. The idea is to have a function that sorts a dataframe by the values closest to the mean in a given column. For example, if you want the song that’s closest to the mean in terms of duration, you would call sort_by_closest_to_mean(df, 'duration')[0].
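One possible implementation of such a function (the helper name follows the text; the toy rows are illustrative):

```python
import pandas as pd

def sort_by_closest_to_mean(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Sort rows by absolute distance of `col` from its mean."""
    return df.iloc[(df[col] - df[col].mean()).abs().argsort()]

df = pd.DataFrame({"Title": ["A", "B", "C"],
                   "duration_min": [2.0, 4.0, 6.0]})
# First row after sorting is the most 'average' song
closest = sort_by_closest_to_mean(df, "duration_min").iloc[0]
print(closest)
```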

Mean for dataset: 	4.26
duration_min 4.26
Title I Like It
Artist Cardi B, J Balvin, and Bad Bunny
Name: 116, dtype: object

Linear Regression? Really?

All right, this is just for fun, so please do not take it too seriously. But what happens if one runs a linear regression on the dataset? Well, here it goes.
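A minimal sketch of the mechanics, using plain least squares via NumPy on random stand-in data (the article may well have used scikit-learn or statsmodels; the point is only the shape of the computation):

```python
import numpy as np
import pandas as pd

# Random stand-in data: 50 songs, two features, ranks in 1..500
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "energy":  rng.random(50),
    "valence": rng.random(50),
})
df["Ranking"] = rng.integers(1, 501, 50)

# Ordinary least squares: Ranking ~ intercept + energy + valence
X = np.column_stack([np.ones(len(df)), df["energy"], df["valence"]])
coefs, *_ = np.linalg.lstsq(X, df["Ranking"].to_numpy(), rcond=None)
print(coefs)
```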

feature coef
energy 7.92
key 4.90
loudness -1.32
mode -2.76
speechiness -6.29
acousticness -12.67
instrumentalness -9.30
liveness 6.35
valence 3.69
tempo -1.44
time_signature -1.86




Bernardino Sassoli
Intellectual omnivore. Philosopher by training, management consultant by trade.