“ C R E A T I O N : Music & AI ”
[ t h e s i s , excerpt : body ]
Sect One
Introduction to AI & Value Impact on the Creative Community
As a new software engineer entering the space of Artificial Intelligence (AI), I found that knowing where to start mirrors a principle of life: listen to those who have more experience and success in the space and direction you intend to go.
So, amongst the research, here is an excerpt from Dale, a computer science research fellow at Carnegie Mellon, on a beginner-level understanding of the mechanics of AI:
“ … To make a long story very short (and probably overly simple), researchers in Artificial Intelligence (AI) started by writing programs to solve difficult problems that had no direct algorithmic solution. Early AI programs used a variety of techniques, especially sophisticated forms of search. It became apparent that humans solve many problems using prior knowledge to guide their search, so AI researchers began to study knowledge acquisition and representation. “Expert Systems” emerged from this and found many applications. As researchers continued to tackle hard problems, some began to look for ways that computers could acquire their own knowledge, thus “learning” about problems and their solutions. In many cases, machine learning systems have been found to surpass systems based on manually coded knowledge…”
With this in mind, the next step became understanding the systems that hold large language models, and the supervised and unsupervised training that refines the accuracy of these models. Yet a wall approaches: the general consensus of the AI technology community is that the inner functioning of AI is a black box. Even a co-founder of OpenAI, one of the leading artificial intelligence companies on our planet, admits, quite in awe and unashamedly, “there’s really no way to know … if [the software] is thinking that … or if [the software] just saw that a lot of times in the training set …”. So, with this black box in mind, and with a future of technologies to understand and safeguard as a usability- and accessibility-minded UX Engineer (i.e. a product designer with software engineering capabilities), the ideation continues into the research proposal you are reading now.
Introduction to the Industry & Value Impact on Creative Engineers & Artists
In the history of the music industry, referred to interchangeably as the “Industry” throughout this publication, there has been a myriad of examples of artists who provide the feeling, features and art, and therefore the assets, for the industry to capitalize upon. For example, artists faced exploitation via advances that left them penniless after the books were balanced, and some still had to stay within contracts for longer than they understood in order to keep paying off the debts incurred via these partnerships. With families left in wreckage, family fortunes in depletion, and health (spiritual, emotional, mental, physical) in disarray, should we just look away and move forward?
Perhaps; forgiving is a stress-free way of living and should be an absolute decision to heal. Yet what if we can do both: heal & rectify?
Say there was a way (i.e. an artificial intelligence model) to aid artists in recouping lost payouts and revenue from the many times their art and creations have been used via sampling (“sample(s)” and “sampling(s)” shall be used interchangeably as well), and thereby to enable and showcase more creations (especially audio, such as streams, etc.).
Sect Two
Research : Artificial Intelligence , Large Language Models and Sonic Transcriptions
So, what are the engineering, structure, and technology stack(s) of AI? The ideations and design? Well, the research into Voice AI is of particular interest.
Technology
Within the field of transcriptions, some models have been described as ones that “compute… probabilities of symbolic sequences, and so is trained to give a high probability to such sequences from … [the] training dataset”.
In simpler terms, there is an ability to look at sound in another form (i.e. a symbolic sequence) and run many comparisons to see how closely the sonic you are asking about relates to the sonic(s) within the training set(s). The more comparisons that can be tracked within the AI model, and then seen and verified by the artist(s) who created the sonic(s), the more the accuracy of the model(s) is refined.
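To make the idea of comparing symbolic sequences concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any cited model: the note-name transcriptions, the track names and the `similarity` helper are all hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever comparison a trained model would actually perform.

```python
from difflib import SequenceMatcher


def similarity(query, reference):
    """Return a score from 0 to 1 for how closely two symbolic sequences match."""
    return SequenceMatcher(None, query, reference, autojunk=False).ratio()


# Hypothetical note-name transcriptions; real ones would come from a
# transcription model rather than being typed by hand.
training_set = {
    "track_a": ["C4", "E4", "G4", "C5", "G4", "E4"],
    "track_b": ["A3", "C4", "E4", "A4", "E4", "C4"],
}
query = ["C4", "E4", "G4", "C5", "G4", "D4"]

# Compare the query against every sequence in the (tiny) training set.
scores = {name: similarity(query, seq) for name, seq in training_set.items()}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

Here the query shares its first five symbols with `track_a`, so that track scores highest. An artist verifying or rejecting such a match is the kind of feedback that could refine the model over time.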
In the space of sonic transcriptions, the artificial intelligence technologies of most relevance are large language models. This principle is explained below, in Nvidia’s words.
“Large language models largely represent a class of deep learning architectures called transformer networks. A transformer model is a neural network that learns context and meaning by tracking relationships in sequential data, like the words in this sentence.
A transformer is made up of multiple transformer blocks, also known as layers. For example, a transformer has self-attention layers, feed-forward layers, and normalization layers, all working together to decipher input to predict streams of output at inference. The layers can be stacked to make deeper transformers and powerful language models. Transformers were first introduced by Google in the 2017 paper “Attention Is All You Need.”
There are two key innovations that make transformers particularly adept for large language models: positional encodings and self-attention.
Positional encoding embeds the order of which the input occurs within a given sequence. Essentially, instead of feeding words within a sentence sequentially into the neural network, thanks to positional encoding, the words can be fed in non-sequentially.
Self-attention assigns a weight to each part of the input data while processing it. This weight signifies the importance of that input in context to the rest of the input. In other words, models no longer have to dedicate the same attention to all inputs and can focus on the parts of the input that actually matter. This representation of what parts of the input the neural network needs to pay attention to is learnt over time as the model sifts and analyzes mountains of data.
These two techniques in conjunction allow for analyzing the subtle ways and contexts in which distinct elements influence and relate to each other over long distances, non-sequentially.
The ability to process data non-sequentially enables the decomposition of the complex problem into multiple, smaller, simultaneous computations. Naturally, GPUs are well suited to solve these types of problems in parallel, allowing for large-scale processing of large-scale unlabelled datasets and enormous transformer networks.” – Nvidia
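The two techniques named in the quotation above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the positional encoding is the sinusoidal form from “Attention Is All You Need”, the self-attention omits the learned projection matrices that real transformers apply (queries, keys and values are simply the inputs themselves), and the three four-dimensional embeddings are made-up values.

```python
import math


def positional_encoding(position, dim):
    """Sinusoidal encoding of one position: sine on even indices, cosine on odd."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs.

    Assumption: learned projection matrices are left out to keep the sketch short.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this element against every element, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is a weighted blend of all the input vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
        )
    return outputs, all_weights


# Hypothetical embeddings for a three-element input sequence.
embeddings = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
inputs = [
    [e + p for e, p in zip(vec, positional_encoding(pos, 4))]
    for pos, vec in enumerate(embeddings)
]
outputs, weights = self_attention(inputs)
```

Because the position is added into each vector before attention runs, the first and third elements (which share the same embedding) are still told apart, and every row of `weights` shows how much one element attends to each of the others.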
Let’s get into the analysis of the problem definition , ideation and prototyping stages …
Analysis :
Information retrieval of data training set , music of AfroDiasporan community
and
Transcript comparisons ( for sampling identification of artist within sonic )
problem definition
Can voice AI compare sonic transcriptions and, via information retrieval, distinguish the original sonic (and its creating artist) within a sample, thereby strengthening artists’ ability to receive restitution for the oversights felt in their experience(s) with the music industry?
So, how does information retrieval work again? This pattern recognition and output feature is explained beautifully here:
Machine learning : Nvidia .
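As a complement, the retrieval step in the problem definition can be sketched as a small inverted index over symbolic sequences. This is one common information retrieval pattern offered as an assumption, not the approach of the cited material: the catalog, the track names, and the `ngrams`, `build_index` and `retrieve` helpers are all hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(sequence, n=3):
    """Return every run of n consecutive symbols in the sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def build_index(catalog, n=3):
    """Map each n-gram to the set of track ids that contain it."""
    index = defaultdict(set)
    for track_id, sequence in catalog.items():
        for gram in ngrams(sequence, n):
            index[gram].add(track_id)
    return index


def retrieve(sample, index, n=3):
    """Rank tracks by how many of the sample's distinct n-grams they share."""
    votes = Counter()
    for gram in set(ngrams(sample, n)):
        for track_id in index.get(gram, ()):
            votes[track_id] += 1
    return votes.most_common()


# Hypothetical catalog of transcribed originals.
catalog = {
    "original_song": ["C4", "E4", "G4", "C5", "B4", "G4", "E4", "C4"],
    "other_song": ["D4", "F4", "A4", "D5", "C5", "A4", "F4", "D4"],
}
index = build_index(catalog)

# A short excerpt lifted from the middle of "original_song".
sample = ["G4", "C5", "B4", "G4"]
ranking = retrieve(sample, index)
print(ranking)
```

The design choice here is that a sample does not need to match a whole track: it only needs to share short runs of symbols with it, which is closer to how a sampled fragment sits inside a longer original.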
Now let’s keep in mind the safety checks; creating and innovating technology without maintaining awareness of the safety checks is like flying into the sun. There are better ways.
So, in being aware of the community that is interacting with these artificial intelligence technologies, some concerns become apparent, such as:
“when [the artist] haven’t consented to [this ai tech] … how are we gonna navigate this … should this be … people who consent … or any name in a prompt … they should consent …” – Chris Anderson, TED
“… the question of where the line should be … and how people say this is too much … [should be] sorted out … before … via … copyright law … fair use …” – Sam Altman, OpenAI
“definitely a change … I have empathy towards those who [say] … I like how things were before … there should be … in principle … a way … to calculate some sort of revenue share. no? … I think it’ll be cool to figure out a new model … in the name of this … artist … and there’s a new business model …” – Sam Altman, OpenAI
“… Even if an engineer has good intentions, their uncritical treatment of data and their resulting systems might become instrumental in producing great social and economic harm …” – Bob Sturm, Maria Iglesias, Oded Ben-Tal, Marius Miron, Emilia Gómez.
ideation + prototyping
So let’s look at a wireframe, a visual to aid in the comprehension of the engineering task at hand: voice AI, the use of language models to identify artist contributions to samplings currently within the database.
To keep such concerns in mind, these ideations arose:
artists consenting to audio files within database ,
the metadata of an audio file may pass through a boolean check to verify whether the file belongs to an artist who has already given a statement of consent to involvement with this AI software; specifically, the names of consenting artists may be held within a list for searching/scanning;
a model to calculate revenue for artists ,
the audio files that the audience uploads for scanning (i.e. via the upload feature) will pass as input through method(s) that retrieve such information (i.e. names of vocalists, names of instrumentalists, stream duration, etc.). The successfully retrieved information may then be used to notify consenting artists, for whom tracking and knowledge of these statistics translate to revenue via streaming, etc.
engineering oversight ( esp . uncritical treatment of data ) that may lead to social harm ,
via experience in professional and educational spaces, engineering oversights due to biases are best met with the diversification of the engineering team, as well as attention to well-being. The engineers who supply code and supervised training to these intelligence models are just as human and prone to mistakes as the audience who interact with such models for advancement (esp. social, professional, creative).
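The first two ideations above (the boolean consent check and the revenue model) can be sketched as follows. Everything here is a hypothetical illustration: the `AudioMetadata` fields, the consent list, the per-stream rate, and especially the even split among credited artists are assumptions, since the actual revenue-share model is still an open question in this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioMetadata:
    """Hypothetical metadata retrieved from an uploaded audio file."""
    title: str
    artists: tuple  # credited vocalists, instrumentalists, etc.
    stream_count: int


# Hypothetical list of artists who have given a statement of consent.
CONSENTING_ARTISTS = {"artist a", "artist b"}


def has_consent(metadata, consenting=CONSENTING_ARTISTS):
    """Boolean check: True only if every credited artist is on the consent list."""
    return bool(metadata.artists) and all(
        name.strip().lower() in consenting for name in metadata.artists
    )


def revenue_shares(metadata, rate_per_stream, consenting=CONSENTING_ARTISTS):
    """Split stream revenue evenly among credited artists (an assumed model).

    Returns an empty dict when consent is missing, so that no statistics
    are processed for artists who have not opted in.
    """
    if not has_consent(metadata, consenting):
        return {}
    total = metadata.stream_count * rate_per_stream
    share = total / len(metadata.artists)
    return {name: share for name in metadata.artists}


track = AudioMetadata("Sampled Track", ("Artist A", "Artist B"), 1000)
blocked = AudioMetadata("Other Track", ("Artist A", "Artist Z"), 1000)

# The rate below is an arbitrary placeholder, not a real streaming rate.
print(has_consent(track), revenue_shares(track, rate_per_stream=0.004))
print(has_consent(blocked), revenue_shares(blocked, rate_per_stream=0.004))
```

Running the consent check before any calculation keeps the safety concern first: a file with even one non-consenting artist yields no output at all.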
In acknowledging the affinities (i.e. safety of the audience-technology relationship, a feature to ascertain revenue opportunities for artists, successful information retrieval of data) within the feedback seen via research (please see “Quoted Material” for more), this user story arises, among many.
“ As a user ,
I would like to upload an audio file ,
So that I can see information about the samplings within the audio file, such as vocal artists, instrumentalists, and engineers.”
Onto wireframes !
Home Screen
Upon entering the app, the feature to upload a sonic for analysis is highlighted, along with a waveform motion graphic for visual engagement while downloading. ‘AAA’ or ‘Audio Analysis for Accreditation’, TAP.
Log Screen
Upon navigating to this screen via the nav bar, the user can access a list of the songs (and related metadata) that were previously analyzed. ‘AAA’ or ‘Audio Analysis for Accreditation’, TAP.
Profile Screen
Upon navigating to this screen via the nav bar, the user can access personal account settings (biography, privacy, etc.). ‘AAA’ or ‘Audio Analysis for Accreditation’, TAP.
Sect Three
testing + engineer handoff & …
CONCLUSION
Is there any right that the designer of such an AI should retain? What would be the impact of maintaining the status quo, or of passing new rights in the market of human created and/or AI generated works? As our survey of copyright law indicates, technological developments challenge established norms. The distant future sketched above suggests that there is a need for fundamental re-thinking in this area; and, as with technology and its advancements, all shall continue.
Of course, the inevitability of such a sonic-sampling identifying AI should be taken with a grain of salt. Human creativity can surprise in its ability to incorporate new technologies … feel free to share where you draw the line of possibility .