
Music Curation Tooling

This is a complex workflow! If it is ever fully documented, this is the beginning!

The workflow spans local archiving and internet scraping of master files and meta data, indexed storage, space optimization, classification cataloging, parameter remastering, presentation composition, sequencing, and the intermediate steps between them. The tooling is not only complex but ambiguous at times. For example, ‘meta data’ may refer to the recording, composer, artist, or performance venue, or to dates of composition, performance, release, etc. It may refer to data embedded into the file codec, or to data extrapolated from a collection of recordings.

To avoid unintended consequences, the tooling validates extensively: many programs and functions test their inputs and operate only within expected parameters. While validation errors may seem endless on a new setup, validation is seamless in normal operation and very effective at detecting deviations from the expected setup.
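As a rough illustration of this validate-then-operate style, the sketch below refuses to continue when an expected directory is missing. The function name require_dirs and the message format are assumptions made for this example; they are not taken from the tooling.

```shell
# Hypothetical guard (not part of the tooling): report the first missing
# directory on stderr and fail, so the caller can stop before doing any work.
require_dirs() {
    local d
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            printf 'validation: missing directory: %s\n' "$d" >&2
            return 1
        fi
    done
}

# Usage inside another function:
#   require_dirs ./loss ./orig || return 1
```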

Generally, the tooling downloads master recordings, normalizes file naming, remasters to mp3, manages sequence within collections, manages overview of collections (artists, duration, liner notes, etc), enables rapid refactoring of presentations, and exports composed sets of mp3 playlist files.

Terminology

File Naming Format

The workflow outputs mp3 files named to identify their sequence in a set, the artist or composer, the specific opus or catalog number, the composition, the key, and performance details. The sequence, composer, and composition are expected; the other details are optional. Following this information in the filename is a unique id and any transcoding parameters used. Typically this is enough to uniquely identify, or search for, the original recording in a collection, download the original master file, extract the particular excerpt, and transcode a new mp3 with the same parameters.

Output mp3 files are moved to a ./loss directory when they are complete. Normally they are then moved immediately to a playlist directory, so ./loss tends to be a repository for sub-optimal transcodings which did not pass the listening test, or a backup for when playlist files are updated with new parameters.

Intermediate wav, flac, meas, and tmp files are created and stored in @/tmp. These files use only the id and transcoding parameters in their names. Files in process are stored with a trailing tilde (~), which is removed when that process is done. Any file with the correct naming scheme in @/tmp is assumed to be viable cache data and will be used to save processing time. The wav and flac files should be purged as needed to save space.
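The trailing-tilde convention can be sketched as follows. The names write_atomic and cached are hypothetical, and deleting the partial file on failure is an assumption of this sketch; the text above only says that the tilde is removed when the process is done.

```shell
# Hypothetical helper (not part of the tooling): run a command that writes to
# stdout, store its output under a trailing-tilde name, and rename the file to
# its final name only if the command succeeded.
write_atomic() {
    local target=$1
    shift
    if "$@" > "${target}~"; then
        mv -- "${target}~" "$target"
    else
        rm -f -- "${target}~"   # assumption: discard partial output on failure
        return 1
    fi
}

# Hypothetical cache wrapper: a file that already has its final name is
# treated as viable cache data and the command is skipped.
cached() {
    [ -f "$1" ] || write_atomic "$@"
}
```

Because a file only receives its final name after the command succeeds, anything in the cache directory without a tilde can be trusted as complete.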

Meta data, including media thumbnail art, json comments, and descriptions, is stored in @/meta. A directory called orig stores the original master files, which are hard linked to their equivalent files in @.

Additional section meta data is stored in ‘comma files’. These files are named by a base 32 character followed by a comma, /^[0123456789abcdefghjkmnpqrstuvxyz]*,/ and store section information about the mp3s beginning with the same character. The section info is terminated by a blank line; the remainder of the file is periodically refreshed with a calculated program duration and per-artist overview.
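A minimal sketch of how a comma file could be recognized and its section info read. The function names are hypothetical, and the value of b32re is assumed to be the alphabet from the pattern above.

```shell
# Alphabet copied from the pattern in the text (assumed value of b32re).
b32re='0123456789abcdefghjkmnpqrstuvxyz'

# Hypothetical check: succeed if the file name (without its directory)
# matches the comma-file pattern.
is_comma_file() {
    printf '%s\n' "${1##*/}" | grep -Eq "^[${b32re}]*,"
}

# Hypothetical reader: print the section info, i.e. every line before the
# first blank line. The refreshed overview after the blank line is ignored.
section_info() {
    awk '/^$/ { exit } { print }' "$1"
}
```

Note that the pattern as written uses `*`, so it also accepts a name that starts with a comma or with several base 32 characters.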

Artist or recording meta data may be stored as /^[${b32re}]00,.*\.txt/, e.g. p00,set-name.txt, which is simply the major base sequence character prepended to the yaml file created by _youtube_json2txt, which is invoked from the _youtube download process. That is, simply prepend the section in which the txt metadata file belongs, within the collection, to its name, and it will be sorted and processed appropriately by numlist and the other tools.
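The renaming step described above could look like the sketch below. The function tag_section is hypothetical; only the resulting name, such as p00,set-name.txt, comes from the text.

```shell
# Hypothetical helper (not part of the tooling): rename a txt metadata file so
# that it sorts into a section.
#   $1 = major base sequence character, $2 = path to the txt file
tag_section() {
    local dir base
    dir=$(dirname -- "$2")
    base=$(basename -- "$2")
    mv -- "$2" "${dir}/${1}00,${base}"
}

# Usage:
#   tag_section p set-name.txt    # renames it to p00,set-name.txt
```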

All of these directories and files are enclosed in a project directory of the form 6432-name, describing the time of creation and the name of the collection. Time is represented by the four most significant characters of unix time in hex. In this scheme the number increments about every 18 hours (65536 seconds). Append four zeros and convert to decimal for an approximation of regular unix time.
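The prefix arithmetic can be checked with a short sketch. The function names are hypothetical, and the sketch assumes unix time is eight hex digits long, which holds for present-day timestamps.

```shell
# Hypothetical helpers (not part of the tooling).

# Print the four most significant hex characters of a unix timestamp.
# Defaults to the current time when no argument is given.
epoch_prefix() {
    printf '%x' "${1:-$(date +%s)}" | cut -c1-4
}

# Recover the approximate unix time from a prefix by appending four hex zeros
# and converting to decimal.
prefix_to_epoch() {
    printf '%d\n' "0x${1}0000"
}

# Usage:
#   epoch_prefix 1680998400    # prints 6432
#   prefix_to_epoch 6432       # prints 1680998400
```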

A script directory encloses release.sh, exclude.sh, and a directory called ./class. Within the class directory are regex lists of various classifications.

Transcoding Parameters

Environment

These variables should be set and exported.

Software

Workflow

The workflow is highly optimized for rapid iterative curation activities. All of these commands are extended shell functions.

Export