
Hacktivists from Anna’s Archive have downloaded approximately 86 million songs from Spotify in a massive preservation effort. Together, those songs account for 99.6% of all streams on the platform. The archival project offers unprecedented insights into Spotify’s vast music library and listening patterns.
The Scope of the Archive
The archivists have also scraped metadata for nearly the entire Spotify library. The archive spans about 300 terabytes, and the metadata covers 256 million tracks by 15.43 million artists across 58.6 million albums. According to their blog post, this constitutes the “largest publicly available music metadata database” to date and represents a significant step toward building a comprehensive “preservation archive” for music.
Key Findings from the Data
The metadata analysis revealed several fascinating insights about music consumption and representation on Spotify:
- The top three most popular songs on Spotify have more streams than the bottom 20 to 100 million songs combined, highlighting extreme inequality in listening patterns.
- Electronic dance music artists make up nearly a quarter of all musicians on the platform, followed by rock, world/traditional, Latin, rap, pop, and classical genres.
- C Major is the most common musical key (9.3% of songs), while D# or Eb minor is the least common (1.3%).
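The first finding above, that a handful of top songs outweigh millions of rarely played ones, is the kind of comparison anyone with per-track stream counts could check. The sketch below is only an illustration on made-up numbers: the function name `bottom_tracks_outweighed` and the toy stream counts are hypothetical, and nothing here reflects how the archivists actually ran their analysis.

```python
from typing import Sequence


def bottom_tracks_outweighed(stream_counts: Sequence[int], top_k: int = 3) -> int:
    """Count how many of the least-streamed tracks, taken together,
    still have fewer streams than the top_k most-streamed tracks.

    Hypothetical helper for illustration; not the archivists' method.
    """
    ordered = sorted(stream_counts)  # ascending: least streamed first
    if top_k <= 0 or top_k > len(ordered):
        raise ValueError("top_k must be between 1 and the number of tracks")

    top_total = sum(ordered[-top_k:])  # combined streams of the top_k tracks
    rest = ordered[:-top_k]            # every other track, least streamed first

    running = 0
    count = 0
    for streams in rest:
        # Stop once adding the next track would match or exceed the top total.
        if running + streams >= top_total:
            break
        running += streams
        count += 1
    return count


# Toy data (assumed): three hits and a long tail of rarely played tracks.
toy_streams = [1000, 900, 800, 5, 4, 3, 2, 1, 1]
outweighed = bottom_tracks_outweighed(toy_streams, top_k=3)
print(f"The top 3 tracks outweigh the bottom {outweighed} tracks combined")
```

On real data the answer depends heavily on how tracks with very few plays are counted, which may be one reason the figure is given as a range rather than a single number.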
Preservation Philosophy
While the hacktivists acknowledge that popular music is generally well-archived, they argue that current preservation efforts focus too heavily on commercially successful songs and high-quality file formats. Their approach aims to be more inclusive, capturing experimental and less mainstream music that might otherwise be lost.
AI Content Considerations
The archivists deliberately excluded songs with fewer than 1,000 listens; including them would have expanded the dataset to over 700 terabytes. This decision was made partly to filter out AI-generated content, which human artists have complained is crowding them out of the platform. The team noted that the streaming numbers would likely be higher if only human-created songs were considered.
Limitations and Future Potential
Despite its impressive scope, the archive represents only a portion of the world’s music. However, as the hacktivists stated, “it’s a great start” for music preservation efforts that could eventually be more comprehensive.
This project not only preserves music but also provides valuable data insights that were previously unavailable to the public, offering a unique perspective on the digital music ecosystem and listening habits worldwide.

