Google Dubtechno Now

Gamelan Hutan: Some Ideas About Conceptual Dance Music

3rd of August, 2021



“balinese gamelan is literally just dnb and i will defend that position lol”
cd_r0ms

A few years ago I had a great idea for a music project that I was never able to follow through on, and since I am now describing the idea here (instead of just actually doing it) I am committing to the impossibility of its realisation. Inspired by this experiment in synthesising human speech by YouTuber SomethingUnreal, I would set up two basic neural networks and feed each a different training data set. One would be fed 1990s jungle tapes, perhaps Jungle Massive 4, and the other would be fed gamelan tapes (Balinese of course, perhaps gamelan beleganjur), like Kreasi Beleganjur Yadnya Swara. I would then combine the training sets and feed them into a third neural network. After two short tracks consisting of the outputs of the first two neural networks, the meat of the record would be the unedited output of this final network, this Gamelan Hutan.

Listening to the machine synthesis of the human voice presented here gives one some impression of the sort of music that might have been produced by the systems I describe. In the experiment, the neural network is able to approximate the timbre of the human voice, but what it produces is for all intents and purposes a mad gibbering—made all the more disturbing by its occasional success in producing disembodied lexemes. I imagine the jungle and gamelan that might appear: familiar timbres and brief arrangements of whole percussive elements which are arranged in the end in a mutant catastrophe that succeeds in dispensing with human notions of coherent horizontal composition. Indeed, a sort of political statement emerges: When distinct and specific musical traditions are interpreted by outsiders—be they early 20th century musicologists first discovering the exotic classical courts of Bali or 2010s American teenagers discovering jungle via RateYourMusic chart algorithms—those outsiders are quick to construct an internal model of the music that accounts for the superficial and immediate impression of timbre, and slow to internalise the horizontal compositional conventions that most properly inform the character of the music. This political reading might be arrived at independently of my experiment by listening to the music of, for instance, Virtual Self, a producer who makes music that sounds like trance at first blush but has more in common, compositionally, with progressive house, trip hop and future bass.

“I'm less happy with the results this time around than in my last RNN+voice video, because I've experimented much less with my own voice than I have with higher-pitched voices from various games and haven't found the ideal combination of settings yet. Also, learning from a low-pitched voice is not as easy as with a high-pitched voice, for reasons explained in the first part of the video (basically, the most fundamental patterns are longer with a low-pitched voice).”
—SomethingUnreal

Of course, it was beyond my meagre abilities to make these neural networks in the first place. I have to content myself only with imagining the music that they might have produced. Even this is problematic though, since the limits of my technical knowledge also necessarily limit my capacity to imagine the music that I might possibly have been able to wring from these systems. For instance: the way the training data would have to be fed into the network in the first place would be as plain text. Digital audio data can be converted into plain text trivially, but even very short snippets of audio produce monumental amounts of text. For example, a single second of white noise recorded at lossless audio quality and converted into text produces almost ninety thousand characters. A full hour of jungle at that quality would be well over 300 million characters, which is completely unmanageable as a data set without a supercomputer. For the training data to be manageable, audio would need to be 8-bit, with a very low sample rate, which of course entails that the output is also of this quality. The music is necessarily “lo-bit”.
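A rough sketch of that arithmetic, for anyone who wants to check it. It rests on assumptions I haven't stated anywhere above: that “lossless” means 16-bit, 44.1 kHz, mono PCM, and that each byte of raw audio becomes exactly one character of text. The 8 kHz figure for the “lo-bit” version is likewise just a guess at what “a very low sample rate” might mean, and `chars_per_second` is a made-up helper, not part of any real tool.

```python
def chars_per_second(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Characters per second of audio, assuming one character per byte of raw PCM."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels


SECONDS_PER_HOUR = 3_600

# Assumed "lossless" format: 16-bit, 44.1 kHz, mono.
lossless_per_second = chars_per_second(44_100, 16)
lossless_per_hour = lossless_per_second * SECONDS_PER_HOUR

# Assumed "lo-bit" format: 8-bit at a hypothetical 8 kHz sample rate, mono.
lo_bit_per_second = chars_per_second(8_000, 8)
lo_bit_per_hour = lo_bit_per_second * SECONDS_PER_HOUR

print(f"lossless: {lossless_per_second:,} chars/s, {lossless_per_hour:,} chars/hour")
print(f"lo-bit:   {lo_bit_per_second:,} chars/s, {lo_bit_per_hour:,} chars/hour")
```

Under those assumptions the lossless figure comes to 88,200 characters a second and roughly 317 million an hour, which squares with the numbers quoted. Stereo would double both, and the guessed lo-bit settings cut the hour down to under 29 million characters.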

This is sort of poetic, given the history of messy jungle mixtapes and the history of gamelan performance dissemination in the west via field recording and low quality cassette tape—but it’s also poetic in light of the development of new irony. Years ago, when I told my sister about my idea, she told me that she thought it might sound cool. She told one of her friends about it, a friend who apparently mixes jungle, and he was vehemently against the idea. His read on the situation was that my concept was a ridiculous bourgeois over-intellectualisation of a deeply working class and Afro-Caribbean musical style: my concept was musical gentrification. She told me about this reaction, for some reason, and I can’t say I blame the guy. It’s an astute observation, and I’m willing to suspend my annoyance at the famously conservative gatekeeping of junglists in order to recognise that yes, appropriation of black music by white conceptual artists for their post-avant-dreamfunk nonsense is indeed a legitimate grievance.

The new irony I refer to is the irony of vaporwave, and of HexD. HexD is the newest conceptual pool of boiling hot vomit dreamed up by the likes of white Portland residents who are bored of playing dull-as-dishwater post-hardcore and have decided to ruin hip hop and dance music. I am of course being flippant here: although the SoundCloud trap stuff is all (predictably) fucking terrible, I actually like sienna sleep. The first of his “Vyvanse Trance” series of mixes, a collection of freeform hardcore DJ mixes in which oldschool hard trance tunes are sped up, bitcrushed, and EQ’d to sound like low quality bootlegs, is very good. But the mix is called “dtoU+f64yzylpqk+xYtxpcXRCt/P56qbut9fNbUrA”, which is just such wank. This is the new irony of internet mystique: in a gentrified panopticon of an internet in which there are no dark corners left, in which everything is available to us in high definition, instantly, sienna sleep has chosen to make his mixes sound like shit.

I used to say to people: “try and ignore the poncy over-intellectualising of trance here, because of course the concept of ‘nostalgia for the old internet’ is trite, but the music is good”. Having heard the tracks he’s actually playing, this just isn’t true though. The mix does sound like shit (in my opinion), and the concept is trite (this one is just objective fact), but it wouldn’t necessarily sound better without all the effects, without the concept. The wonderful roundness, or bounciness, of the kick drums probably wouldn’t be possible without all the smoke and mirrors. The hysterical cackle of the hi hats would just be the regular metronomic ticking of a clicking sample without the bitcrush. Both of these wonderful textures—so important to the music we do have—are not present in the original “clean” mixes of these tracks. The mythical clean copy, underneath, doesn’t exist. We have to extrapolate from what we have, and filling it out in our heads produces an ideal music that we can’t access.

In the speech synthesis experiment, SomethingUnreal describes his dismay at the results. He says that “the most fundamental patterns [of language] are longer with a low-pitched voice”, which is of course because the lower pitch necessitates longer wavelengths of sound. Jungle is famous for its basslines, and the bass gongs of gamelan that announce the completion of its various cyclical motifs are of paramount importance. How this technical limitation in producing coherent “lexemes” in the bass register might scupper my musical concept, I can’t know. Where I envisaged a garbled stuttering arrangement of disembodied textures that resemble jungle and gamelan, those textures may just resemble nothing much at all. In that case, I think I would unfortunately have to agree that my sister’s friend was right. The whole thing would have to be shitcanned.