Smaller, Slower, Sloppier1
by Xuan Ye

My research-creation with and in artificial intelligence2 started in 2016. While working full-time as a web developer for user experience, I initiated an art project exploring digital poetics within algorithmic narration. This venture led me down a labyrinthine path of experimentation with artist-as-hacker methods, resulting in text-based artworks, including Twitter bots, interactive websites, and more.

In the work I, It's, The (2017 - 2018), I started a process utilizing the next-word prediction feature in smartphones to generate sentences and email correspondences addressed to myself. This process evolved into an ongoing dialogue between myself and the consumer-grade artificial intelligence system powering the text prediction function. The initial three words suggested by the keyboard interface, "I, It's, The," served as a dialogue starter. With each interaction after the dialogue starter, the interface continued to propose new sets of three words without depleting its capacity. At each step of a sentence, I would select one word out of the three. Because I could not know which words would be predicted next, each selection was almost arbitrary. After approximately 50 attempts spanning a few months, the dialogue starter "I, It's, The" shifted to something fresh.
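
The three-word suggestion loop described above can be approximated in a few lines of Python. This is only a minimal sketch: the model running inside a phone keyboard is not documented here, so a simple count of which word follows which (a bigram table) stands in for it. The typing history, the function names, and the random choice that imitates an almost arbitrary selection are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import random


def build_bigrams(text):
    """Count which word follows which in a typing history."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def suggest(following, last_word, starters=("I", "It's", "The"), k=3):
    """Return up to k candidate next words, most frequent first."""
    if last_word is None or last_word not in following:
        # Nothing typed yet (or an unseen word): fall back to the starters.
        return list(starters)[:k]
    return [word for word, _ in following[last_word].most_common(k)]


def write_sentence(following, length=8, seed=0):
    """Pick one of the suggested words at every step, without looking ahead."""
    rng = random.Random(seed)
    sentence, last = [], None
    for _ in range(length):
        last = rng.choice(suggest(following, last))
        sentence.append(last)
    return " ".join(sentence)


# Hypothetical typing history; only the first sentence is quoted from the essay.
history = (
    "The only one thing that sounds good is that it doesn't sound good . "
    "I think that it sounds good . It's the only thing that I think ."
)
model = build_bigrams(history)
print(suggest(model, None))
print(write_sentence(model))
```

Because the table is built only from what has already been typed, every suggestion is a recombination of the writer's own past phrasing, which is the feedback loop the dialogue relies on.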

To restate my philosophical reflections on this work at the time: the AI-enacted language predictions are shaped by my language usage habits, which are tracked through all my typing interactions with the keyboard interface. As I dialogue with this predicted "future-me," the once-clear distinction between myself and the machine begins to blur, gradually eroded line by line. The resulting output is marked by occasional self-contradiction, exemplified by phrases like: "The only one thing that sounds good is that it doesn't sound good."

Mikhail Bakhtin's concept of dialogism posits that the ideological becoming of a human being "is the process of selectively assimilating the words of others."3 In this process, human agency relinquishes authorship to the machine, while the machine is tasked with learning from the human's dissolving subjectivity. The dialectic of the mutual becoming of "I-It" and "self-other" entwine like a double helix. The ghost in the machine feels like kin.

Reflecting on this work, I see that what I did was close to the inner workings of a chatbot enabled by the word embedding algorithms employed in the now-ubiquitous large language models. My early intervention in re-enacting a chatbot fits the method I have dubbed "Smaller, Slower, Sloppier" in this writing, which aims to unravel my later process of working in and with AI. These methods guide my engagement with consumer-grade AI-powered functionalities, open-sourced data and codebases in creating images, text, and sound.

The solemnity of the title "Smaller, Slower, and Sloppier" intentionally mirrors the rhetoric of tech behemoths, with their grandiose promises of machines that are "Smarter, Brighter, Mightier."4 The portrayal of machines as intelligent (or "smart") and capable of almost omnipotent feats, marketed for mass consumption, promotes a techno-optimistic and extractivist worldview further entrenched by colonial capitalist ideologies. Ostensibly accessible technologies such as consumer-grade AI-powered functionalities, with their seductive promises of convenient and flawless prognostication, project an immaculate certainty that perpetuates the narrative of techno-capitalism driven by the relentless pursuit of progress and control.

Amid this glossy veneer, I discerned an opportunity for subversion. "Smaller, Slower, and Sloppier" proposes a reductivist stance, highlighting three tendencies in my artistic process. "Smaller Data" refers to collecting or feeding data in a lightweight manner, eschewing the need for vast datasets (big data) in favour of rigorous and commonsensical results. "Slower Training" involves training models in a shorter or somewhat inadequate way, deviating from swift and exhaustive training processes. "Sloppier Rendering" abstains from meticulous fine-tuning of parameters, embracing unpredictability and imperfection. This practice beckons forth a reimagining of our relationship with technology, challenging the norm that technological progress must strive for optimization and efficiency to the detriment of human intuition and machinic labour. My works remix the aberrations arising from these methods, also illuminating how technological hiccups catalyze a reconfiguration of my imaginative horizons and perception.

To illustrate, I will delve into three projects developed around the same time: ERROAR!#4 The Oral Logic (2019), Deep Aware Triads - orishormonospina (2019) and Garrulous Guts (2019).

ERROAR!#4 The Oral Logic, 2019 (single-channel HD video (3’25”, sound), interactive web-based application, laser-engraved mirror, 3D prints, generative poetry on paper scroll, algorithmic composition in collaboration with Jason Doell). Documentation at Ed Video booth, Supermarket Art Fair 2019 in Stockholm, Sweden.

In studying the early forays of experimentation in AI under the purview of DARPA, I happened upon a riveting anecdote—an account detailing the first case of virtual cannibalism enacted by AI entities, a consequence of inadequate directives by human researchers.5 The ramifications of this encounter resulted in the multimedia installation ERROAR!#4 The Oral Logic (2019), prompting my contemplation on the metaphorical resonance of cannibalism that transcends mere cultural symbolism to underscore a broader discourse on the cyborgian body. In this context, artificial intelligence becomes emblematic of a "cannibalized" human agency, or vice versa, a concept further elucidated by Catherine Malabou's observation that the notion of intelligence is inherently a collaborative effort between humans and machines.6

Apart from a single-channel video that re-animates the story, ERROAR!#4 The Oral Logic also features a generative text, "Past Said Lore by Sir. Salad Poet," and an interactive web application. To mirror the Christian genesis narrative in the DARPA experiment, I trained a character-level recurrent neural network language model on the text of John Milton's Paradise Lost.7 Because I deliberately set the parameters to values below the threshold for optimal predictions, the resulting text is comprehensible yet notably irrational, contributing to the surrealism surrounding the narrative of virtual cannibalism recounted in the anecdote. The web application employs a real-time YOLO object detection system trained on datasets of various objects, capable of detecting 80 classes of objects with predicted probabilities.8 By positioning a webcam and a mirror facing each other at eye level, viewers are encouraged to engage with the application, which misidentifies human faces as objects. This objectification of humans by erroneously functioning software initiates an endless feedback loop between the mirror and the camera, symbolizing a metaphorical autophagy akin to the closed loop of a human skeleton consuming its tailbone.

Deep Aware Triads - orishormonospina, 2019 (giclée print triptych).

I started the Deep Aware Triads series as an attempt to learn about artificial neural networks designed to replicate biological neural networks in animal brains.9 In computer vision, a supervised machine learning model employing artificial neural networks facilitates object detection, as exemplified by YOLO in the previous instance. Throughout the training process, the model's parameters, including weights and biases, are iteratively tuned to minimize a loss function that measures the disparity between predicted outputs and ground truth labels that construct referential databases for the machine to validate a target or objective usually defined by humans.10
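
The tuning loop described above can be made concrete with the smallest possible case. The sketch below is an assumption-laden illustration and not the training code of YOLO or of any model used in these works: it fits a single weight and a single bias to four hypothetical ground-truth pairs by gradient descent on a mean squared error loss. The data, the learning rate, and the function names are invented for the example.

```python
def predict(weight, bias, x):
    """A one-neuron 'model': a weighted input plus a bias."""
    return weight * x + bias


def mean_squared_error(weight, bias, samples):
    """Loss: average squared gap between predictions and ground-truth labels."""
    return sum((predict(weight, bias, x) - y) ** 2 for x, y in samples) / len(samples)


def train(samples, steps=200, learning_rate=0.05):
    """Iteratively nudge weight and bias in the direction that lowers the loss."""
    weight, bias = 0.0, 0.0
    losses = []
    n = len(samples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (predict(weight, bias, x) - y) * x for x, y in samples) / n
        grad_b = sum(2 * (predict(weight, bias, x) - y) for x, y in samples) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
        losses.append(mean_squared_error(weight, bias, samples))
    return weight, bias, losses


# Hypothetical (input, ground-truth label) pairs that follow y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

_, _, brief = train(samples, steps=5)
_, _, thorough = train(samples, steps=2000)
print("loss after 5 steps:", brief[-1])
print("loss after 2000 steps:", thorough[-1])
```

Stopping the loop after only a handful of steps leaves the loss comparatively high. In this toy setting, that is one way to picture what it means to train a model inadequately on purpose.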

Drawing on an ultra-simplified diagram of an artificial neural network I fictionalized to culminate in a single definitive goal or representation, I aimed to metaphorically mirror the training procedure by which these networks would identify "complex phenomena" within the context of Deep Aware Triads.

A gif image depicting the diagram used in Deep Aware Triads.

For instance, orishormonospina (2019) originated from my intrigue with how digital technologies shape embodied experiences. I chose to unfold this intricate interplay through three threads or subject matters: oris (oral), hormono (hormonal), and spina (spinal). With each subject matter serving as the target node in the output layer, I identified two facets (or dimensions)11 of the subject matter that I deemed relevant as the input layer and elucidated their connections, resulting in three nodes as the hidden layer. I compiled image databases from royalty-free stock photograph archives and medical research for each subject matter to populate these nodes in my diagram with visual lexicons.
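
Read as a network, the diagram has two input nodes, three hidden nodes, and one output node. In the artwork these nodes are populated with images rather than numbers, so the following is only a numerical analogy. It is a minimal sketch of a forward pass through a network of that shape, and every weight, bias, and input value in it is a hypothetical placeholder.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Pass two inputs through three hidden nodes to a single output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
    return hidden, output


# Hypothetical values: two "facets" feeding three connections and one target.
facets = [0.8, 0.3]
hidden_weights = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.4]]  # one row per hidden node
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [0.6, -0.3, 0.8]
output_bias = 0.05

hidden, target = forward(facets, hidden_weights, hidden_biases, output_weights, output_bias)
print(hidden, target)
```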

Subsequently, I crafted collages of these images, utilizing the content-aware fill function in Adobe Photoshop and Illustrator to smear the borders of digital images with machine-calculated pixels. The content-aware fill function, designed to swiftly fill an area of an image with new information based on surrounding pixels, offered an AI-driven approach to removing objects or extending backgrounds without the need for manual cloning and filling.12 Instead of seeking clean and accurate predictions, I welcomed miscalculations by setting small sampling areas and low adaptation values. Images encroached upon one another as if they were virtual bodies, giving rise to unexpected and enigmatic brushstrokes and textures that are repetitions of a small number of predicted pixels. Resembling a microscopic perspective, the final works are digital paintings exuding intricate details and evoking algorithmic pareidolia that could be seen as false positive results or “hallucinations” in the typical implementation of artificial neural networks.
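
Adobe's fill algorithm is proprietary, so the sketch below does not reproduce it. It is a deliberately crude stand-in, written under the assumption that a fill can be imitated by copying known pixels from a window around the hole. It shows why a small sampling area yields repetition: the fewer source pixels are available, the more often the same values recur in the filled region. The function name, the grid, and the radius parameter are all hypothetical.

```python
import random


def sloppy_fill(image, hole, sample_radius=1, seed=0):
    """Fill each pixel in `hole` with a value copied from a nearby known pixel.

    image: 2D list of pixel values; hole: set of (row, col) positions to fill.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    rows = [r for r, _ in hole]
    cols = [c for _, c in hole]
    # Sampling area: the hole's bounding box grown by sample_radius, minus the hole.
    top = max(min(rows) - sample_radius, 0)
    bottom = min(max(rows) + sample_radius, height - 1)
    left = max(min(cols) - sample_radius, 0)
    right = min(max(cols) + sample_radius, width - 1)
    sampling_area = [
        (r, c)
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
        if (r, c) not in hole
    ]
    if not sampling_area:
        raise ValueError("no known pixels available to sample from")
    filled = [row[:] for row in image]
    for r, c in sorted(hole):
        src_r, src_c = rng.choice(sampling_area)
        filled[r][c] = image[src_r][src_c]
    return filled


# Hypothetical 6x6 greyscale grid with a 4x4 hole in the middle.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
hole = {(r, c) for r in range(1, 5) for c in range(1, 5)}
for row in sloppy_fill(image, hole, sample_radius=1):
    print(row)
```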

In real-world applications, AI data labelling often relies on low-paid crowd workers,13 highlighting the ethical precarity inherent in the global division of digital labour. In the creation of this project, I view my method of annotation as an act akin to translation, whereby the curation of such data serves to encode the semiotic associations surrounding the phenomena on a symbolic level, surpassing mere iconicity. The representational authority of stock photography in contemporary visual culture, which reinforces power structures in cultural production,14 adds another stratum of annotation. As I navigated this complex use of stock photography imagery and meaning-making, I became acutely aware of the layers of significance embedded within each curated datum.

While artificial neural networks mimic human neuro-functions through mathematical abstraction, represented as “lower-dimensional diagrams of the biological architecture responsible for cognition in organic neural tissue,”15 my approach within the Deep Aware Triads series veers towards a recursive biomimicry. The digital paintings transform the probabilistic nature of computational feasibility into a metaphysical inquiry: what constitutes the “ground truth” of something entangled within myriad other entities, undergoing perpetual shifts in meaning and becoming?

A 10-minute clip was generated by Garrulous Guts 1.0 on April 28th, 2024.

Configured as a digestive system, the installation iteration of Garrulous Guts comprises multiple components. A two-channel 3D animation depicting the gastrointestinal tract, overlaid with texts, is projected onto the wall and the white air ducts covering two subwoofers that play back an algorithmic composition. The low-frequency sounds push the speakers' membranes, causing the capsules containing antibiotics and anti-estrogen hormone powders to pulsate.

In the sound design for this installation, I employed WaveNet, a deep neural network created by Google DeepMind for synthesizing raw audio waveforms.16 WaveNet was initially developed for text-to-speech or speech-to-text functions such as voice assistants. I trained an instance of it on an English speech corpus commonly used in ESL education. In the experiments, I discovered that when the model was trained on this dataset for an insufficient duration and number of iterations, the results were glitched, stuttering, and unrecognizable English speech sounds. The new quasi-English speech that emerged from this process is noisy and asemic. It completely defeats the purpose of learning English or of building a speech recognition algorithm. It reassembles a tongue in the in-between space where one is acquiring a new language. It prompts hesitation, disruption, and unlearning, like vomit.

I gathered sound samples from three additional categories: amplified lung murmurs, sounds of vomiting, and purportedly therapeutic meditation music for alleviating vomiting symptoms. These sounds were then alchemized with the indecipherable English speech using a ChucK program devised by my collaborator Jason Doell, employing an aleatory algorithm to produce an endlessly generative composition. While the digital paintings in the Deep Aware Triads series are output artworks,17 the composition in Garrulous Guts is executable code, generating distinct outputs of infinite duration when activated at runtime.
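
The difference between a fixed output and executable code can be sketched with a generator that never ends. The actual composition is a ChucK program by Jason Doell, and its logic is not described here. What follows is a hypothetical Python analogy in which the file names, the parameter ranges, and the uniform random choices are all assumptions. It only schedules events and does not play audio.

```python
import itertools
import random

# Hypothetical file names for the four sample categories named in the essay.
SAMPLE_POOLS = {
    "quasi_english_speech": ["speech_01.wav", "speech_02.wav"],
    "lung_murmurs": ["lungs_01.wav", "lungs_02.wav"],
    "vomiting": ["vomit_01.wav"],
    "meditation_music": ["meditation_01.wav", "meditation_02.wav"],
}


def compose(pools, seed=None):
    """Yield an endless stream of chance-determined sound events."""
    rng = random.Random(seed)
    categories = sorted(pools)
    while True:  # never terminates on its own: the piece has no fixed length
        category = rng.choice(categories)
        yield {
            "category": category,
            "sample": rng.choice(pools[category]),
            "wait_s": round(rng.uniform(0.0, 4.0), 2),  # pause before playing
            "gain": round(rng.uniform(0.2, 1.0), 2),    # playback volume
        }


# Take only the first five events; the generator itself would run forever.
for event in itertools.islice(compose(SAMPLE_POOLS, seed=7), 5):
    print(event)
```

Each run without a fixed seed produces a different sequence, so the work exists as a set of rules for producing outputs, not as any one recording.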

ERROAR!#4 The Oral Logic (2019), Deep Aware Triads - orishormonospina (2019), and Garrulous Guts (2019) eventually coalesced into my solo exhibition project, titled The Oral Logic, in 2019. As we journey into the electronic expanse, we, as consumers and feeders, ingest, ruminate, metabolize, and assimilate electronic culture across all domains. From ingestion to articulation, from the molecular to the epistemological, our body and psyche are tethered to the cyborgian economy. Massive networks mobilized by planetary computation exhibit both affective and cognitive capacities. In this context, The Oral Logic scrutinized the poetics and politics of human-machine coupling, prompting inquiry into the trajectory of this symbiotic relationship. And Smaller, Slower, Sloppier offers a framework, to err with machines, for exchanging agencies and enfolding contingencies. Abort and divert together from hegemonic forces while building a morphing network of meaning-becoming.

Endnotes

  1. As an ESL (English as a Second Language) learner who is fond of poetic noises, I have developed a rather complex workflow for writing in accessible English, often with the aid of AI-powered tools while working within an internet-connected interface. Most recently, my aid mix includes free-tier Grammarly, DeepL, and ChatGPT-3.5, and I use Obsidian (in Markdown language) as my primary writing interface. This workflow typically involves recursive translation, light paraphrasing and grammar checks. Although the workflow is mainly intended for a more formal expression in an Anglophone environment rather than creative exploration, the raw outputs generated by these tools often feed back into my thought process, prompting new sequences of words and ideas. Thus, I consider all my published writings in English as a collaboration with cybernetic intelligence.
  2. In this brief writing, the term "artificial intelligence" is employed in its broadest sense, encompassing various subfields such as machine learning and deep learning. While each of these domains has its own distinct definition and applications, the overarching concept of “artificial intelligence” here refers to intelligence exhibited by computer systems that often involve human intelligence.
  3. M.M. Bakhtin, The Dialogic Imagination: Four Essays, ed. by M. Holquist and C. Emerson (Austin: University of Texas Press, 1981), 341.
  4. "Apple Watch Series 9," Apple, accessed April 30, 2024, https://www.apple.com/apple-watch-series-9/.
  5. "Creepy Artificial Intelligence," Cube Development, accessed April 30, 2024, https://medium.com/cube-dev/creepy-artificial-intelligence-ebc3f76179a8/.
  6. Catherine Malabou, Morphing Intelligence: From IQ Measurement to Artificial Brains (New York: Columbia University Press, 2021).
  7. "Text Generation with an RNN," TensorFlow, accessed April 30, 2024, https://www.tensorflow.org/text/tutorials/text_generation.
  8. "Object Detector," ml5.js, accessed April 30, 2024, https://learn.ml5js.org/#/reference/object-detector.
  9. K. Allado-McDowell, "Designing Neural Media," Berliner Festspiele, accessed April 30, 2024, https://www.berlinerfestspiele.de/en/gropius-bau/programm/journal/2023/k-allado-mcdowell-designing-neural-media.
  10. "Ground Truth," Data Science Dictionary, Domino Data Lab, accessed April 30, 2024, https://domino.ai/data-science-dictionary/ground-truth.
  11. Allado-McDowell, "Designing Neural Media."
  12. "Content-Aware Fill in Photoshop," Adobe, accessed April 30, 2024, https://www.adobe.com/ca/products/photoshop/content-aware-fill.html.
  13. "Millions of Workers Are Training AI Models for Pennies," Wired, accessed April 30, 2024, https://www.wired.com/story/millions-of-workers-are-training-ai-models-for-pennies/.
  14. Paul Frosh, Inside the Image Factory: Stock Photography and Cultural Production (New York: Palgrave Macmillan, 2018).
  15. Allado-McDowell, "Designing Neural Media."
  16. "WaveNet: A Generative Model for Raw Audio," DeepMind, accessed April 30, 2024, https://deepmind.com/discover/article/wavenet-a-generative-model-for-raw-audio.
  17. Nick Montfort, "A Platform Poetics / Computational Art, Material and Formal Specificities, and 101 BASIC POEMS," DOI: 10.7273/ba27-s438.