Visualizing Song Structure with Self-Similarity Matrices

May 2025
Tags: analysis, SSM, matrices, matrix, autocorrelation, music, audio, lyrics, python

I’ve been experimenting with Self-Similarity Matrices (SSMs) to explore structural patterns in music. An SSM is a square matrix in which entry (i, j) holds the similarity between segments (frames) i and j of a song, usually computed from spectral features such as mel-frequency content. Repeated passages appear as stripes parallel to the main diagonal, and homogeneous sections appear as blocks. By overlaying this with a matrix derived from lyrics or spoken features (e.g., repeated words or syllables), it's possible to get a multi-layered view of how songs are structured, both musically and linguistically.
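A minimal sketch of both kinds of matrix, assuming the audio features are already available as a NumPy array of shape (n_features, n_frames), for example mel or chroma frames from a library such as librosa, and that the lyrics are a plain list of words. Cosine similarity for audio and exact word matching for lyrics are illustrative choices, not necessarily what the scripts behind this post use; the function names are hypothetical.

```python
import numpy as np


def self_similarity(features: np.ndarray) -> np.ndarray:
    """Cosine SSM of a (n_features, n_frames) feature matrix.

    Entry (i, j) is the cosine similarity between frame i and frame j.
    """
    # Normalize each frame (column) to unit length; guard against silent frames.
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    normalized = features / np.maximum(norms, 1e-12)
    # S = X^T X compares every frame with every other frame.
    return normalized.T @ normalized


def lyric_self_similarity(words: list[str]) -> np.ndarray:
    """Binary SSM over a word sequence: 1 where two positions hold the same word."""
    # Lowercase and strip surrounding punctuation so "La" and "la," count as the same word.
    tokens = np.array([w.lower().strip(".,!?;:\"'") for w in words])
    return (tokens[:, None] == tokens[None, :]).astype(float)
```

Because both functions return square matrices, they can be plotted side by side, or overlaid once they share a time axis.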

The method follows the approach described in a tutorial from the FMP Notebooks (Fundamentals of Music Processing), adapted with some basic audio and text processing.
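The FMP Notebooks discuss enhancing an SSM before interpreting it, for example by smoothing features over time and thresholding the matrix. The sketch below shows simplified versions of those two steps, assuming a plain moving average and a quantile-based cutoff. The window `length`, `step`, and `keep` fraction are illustrative defaults rather than values from this post, and these functions are hypothetical helpers, not the FMP library's own API.

```python
import numpy as np


def smooth_features(features: np.ndarray, length: int = 9, step: int = 1) -> np.ndarray:
    """Moving-average each feature row over `length` frames, then keep every `step`-th frame."""
    # Clamp the window so np.convolve(mode="same") keeps the original number of frames.
    length = max(1, min(length, features.shape[1]))
    kernel = np.ones(length) / length
    # Zero padding at the edges slightly dampens the first and last frames.
    smoothed = np.array([np.convolve(row, kernel, mode="same") for row in features])
    return smoothed[:, ::step]


def threshold_relative(ssm: np.ndarray, keep: float = 0.2) -> np.ndarray:
    """Keep the top `keep` fraction of SSM values and set everything else to 0."""
    cutoff = np.quantile(ssm, 1.0 - keep)
    return np.where(ssm >= cutoff, ssm, 0.0)
```

Smoothing before computing the SSM tends to suppress frame-level jitter so that stripes and blocks stand out; thresholding afterwards removes weak background similarity.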

Case 1: Heaven or Las Vegas – Cocteau Twins

In the first test, I applied the technique to Cocteau Twins’ Heaven or Las Vegas. The result suggests a form with a short intro, followed by multiple sections (A, B, C), each with differing internal density. The last two thirds of the matrix show a wide, diffuse block, possibly indicating layered instrumentation or effects processing. Lines parallel to the main diagonal suggest recurring lyric-like elements aligned with the vocal phrasing.
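Block boundaries such as the transitions between A, B, and C can be located less subjectively with a checkerboard-kernel novelty curve. This is a standard technique (Foote-style novelty detection, also covered in the FMP Notebooks) swapped in here for illustration; it is not necessarily what produced the reading above. The kernel half-width `half`, in frames, is an assumed parameter.

```python
import numpy as np


def checkerboard_kernel(half: int) -> np.ndarray:
    """(2*half) x (2*half) kernel: +1 on the two within-section quadrants, -1 on the cross quadrants."""
    sign = np.concatenate([-np.ones(half), np.ones(half)])
    return np.outer(sign, sign)


def novelty_curve(ssm: np.ndarray, half: int = 8) -> np.ndarray:
    """Slide the kernel along the main diagonal; peaks mark likely block boundaries."""
    n = ssm.shape[0]
    kernel = checkerboard_kernel(half)
    # Zero-pad so the kernel can be centred on every frame, including the edges.
    padded = np.pad(ssm, half, mode="constant")
    novelty = np.zeros(n)
    for i in range(n):
        # Covers original frames i - half .. i + half - 1; frame i starts the "after" half.
        window = padded[i:i + 2 * half, i:i + 2 * half]
        novelty[i] = np.sum(window * kernel)
    return novelty
```

Inside a homogeneous block the positive and negative quadrants cancel out, while at a boundary the within-section quadrants are bright and the cross quadrants dark, so the sum peaks.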

Case 2: Toma la Ruta – Soda Stereo

A second experiment was run with Toma la Ruta from Soda Stereo’s Dynamo (1992), an album often noted for its experimental textures. The resulting SSM showed similar traits: extended sections with overlapping or fuzzy repetition and long-range dependencies, again hinting at atmospheric layering. This offers a useful point of comparison with the Cocteau Twins result, since both tracks belong to a stylistic space that favors evolving texture over rigid form.
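Long-range repetition can be put into numbers by averaging the SSM along each diagonal, a time-lag view of the matrix closely related to autocorrelation. The sketch below is one way to do this, assuming a square SSM indexed by frames. `long_range_share` is a hypothetical summary metric introduced only for illustration; it was not computed for the tracks discussed here.

```python
import numpy as np


def lag_profile(ssm: np.ndarray) -> np.ndarray:
    """Mean similarity along each diagonal on or above the main one.

    profile[k] averages S[i, i + k]; a high value at lag k means material tends
    to recur k frames later (a stripe parallel to the main diagonal).
    """
    n = ssm.shape[0]
    return np.array([np.diagonal(ssm, offset=k).mean() for k in range(n)])


def long_range_share(ssm: np.ndarray, min_lag: int) -> float:
    """Hypothetical metric: share of off-diagonal similarity found at lags >= min_lag (min_lag >= 1)."""
    n = ssm.shape[0]
    # Multiply each lag's mean by its diagonal length to recover the raw sum per lag.
    sums = lag_profile(ssm) * (n - np.arange(n))
    total = sums[1:].sum()
    return float(sums[min_lag:].sum() / total) if total > 0 else 0.0
```

A track built from slowly evolving, recurring textures should keep a larger share at long lags than one built from short, self-contained blocks.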

Case 3: Diablo – Rosalía

To check whether the results were specific to "dream pop" or "shoegaze"-like music, I tested Diablo, a more recent track by Rosalía rooted in modern reggaeton but with distinctly "avant-garde" elements. The result was noticeably different: shorter structural segments, less long-range repetition, and more discrete, blocky transitions. This difference suggests that the method is sensitive to structural and stylistic features rather than producing the same generic picture for every track.
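A claim such as "shorter structural segments" can be checked by picking peaks from a novelty curve computed from the SSM and measuring the gaps between them. The sketch uses simple local-maximum peak picking with an assumed height threshold `min_height`, and assumes `frame_seconds` (the hop duration of the features) is known. No measured values for these tracks are implied.

```python
import numpy as np


def pick_boundaries(novelty: np.ndarray, min_height: float) -> list[int]:
    """Indices of local maxima in a novelty curve that reach at least min_height."""
    peaks = []
    for i in range(1, len(novelty) - 1):
        if novelty[i] >= min_height and novelty[i - 1] < novelty[i] >= novelty[i + 1]:
            peaks.append(i)
    return peaks


def segment_lengths(boundaries: list[int], n_frames: int, frame_seconds: float) -> np.ndarray:
    """Durations in seconds of the segments between track start, the boundaries, and track end."""
    edges = np.array([0, *boundaries, n_frames])
    return np.diff(edges) * frame_seconds
```

Comparing the median segment length across tracks would give one number per song to set beside the visual impression.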

This tool aims to complement traditional music theory and listening analysis with a visual summary of repetition and development in a track. In these early tests, it is already showing useful contrasts between genres and production styles.

To-Do: It would be nice to somehow match the lyrics analysis with the structural blocks in the audio SSM.
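A possible first step toward that matching is to place the lyrics on the same time grid as the audio SSM. The sketch assumes word-level timestamps are available (for example from a forced-alignment tool, which this post does not cover) in a hypothetical `(word, start_s, end_s)` format, and that `frame_seconds` matches the audio feature hop.

```python
import numpy as np


def lyric_frame_ssm(word_times: list[tuple[str, float, float]], n_frames: int,
                    frame_seconds: float) -> np.ndarray:
    """Frame-level lyric SSM on the same grid as an audio SSM.

    Entry (i, j) is 1 when frames i and j are both covered by the same word, else 0.
    Frames with no word (instrumental passages) get all-zero rows and columns.
    """
    labels = [""] * n_frames
    for word, start, end in word_times:
        first = int(start / frame_seconds)
        last = min(int(np.ceil(end / frame_seconds)), n_frames)
        for f in range(first, last):
            labels[f] = word.lower()
    arr = np.array(labels)
    sung = arr != ""
    same = arr[:, None] == arr[None, :]
    return (same & sung[:, None] & sung[None, :]).astype(float)
```

With both matrices on one grid, an elementwise product with the audio SSM keeps only the audio repetitions that coincide with repeated words, which is one simple way to see whether lyrical and musical structure line up.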

Do you work with music, code, or both? Feel free to share or fork this technique. The scripts and method are simple to adapt and will be shared in a repository as soon as I fix a memory leak.

Cocteau Twins - Heaven or Las Vegas (1990)