
almost imperceptible delays and lags

  • by Manuel Portela

  • AI voice generator, HTML, JS, TypewriterJS, Audacity, QuickTime, Adobe Premiere
  • doi: https://doi.org/10.7273/bpz8-k747

almost imperceptible delays and lags

a simulated autodramaticule

by Manuel Portela

This work is a four-part multimedia play (or dramatic essay), in which two self-reflexive fictional characters (A and B) talk about their mediated condition as technological inscriptions. The four parts are structured according to patterns of repetition and variation of prompts and replies that, in successive dialogues, expand and problematize different telepresence technologies: writing (part 1), voice transmission (part 2), audiovisual transmission (part 3), and AI-generated writing, voice, and video (part 4).

The aim of the piece is to draw attention to various forms of metonymic presentation of the self through inscription-based media technologies, on the one hand, and to the changing ontology of presence created by writing, voice and video generating technologies, on the other. The decoupling of writing, voice, and video from actual recording, i.e. their production through vectorial and algorithmic generation, is experienced by readers, listeners, and viewers of this work through the perceptual distinction (but also through the perceptual confusion) between what is recorded and what is simulated. Ontological distinctions are further problematized by the framing of the entire work as a stage play that emulates automated generation. While its theatrical logic points to embodied presence, its digital media form enacts several degrees of delegated presence.

The play’s engagement with the material conditions of production of synchronous telecommunication further instantiates problems of intersubjective communication. Media reflexivity also functions as a probe into the mediated constitution of the self through linguistic interaction. Synchronous remote presence becomes a metaphorical figure for self-presence in language. An additional layer of irony can be detected in the characters’ parodic allusions to both theoretical discourse and sentimental language. Artificial generation of texts, voices, and moving images – through its techniques for simulating human presence – attempts to solve human communication problems by turning certain features of the self into a residual effect of calculation of probabilities.

“almost imperceptible delays and lags” is presented here in two versions: as a written script for staging “a simulated autodramaticule”, and as a video essay in which each part has been performed and recorded as described in the stage directions (i.e., as writing, as voice, as audiovisual image, and as generated telepresence). Originally written in Portuguese in December 2023, the script was translated into English by the author. Recording and editing took place in April 2024. Acknowledgements: Pedro Miguel Diogo (character A), Dina Datta (character B), Tiago Cravidão (video, sound, editing), Luís Lucas Pereira (programming).



The Script:

Part I

[During the first sequence we watch the text being written letter by letter and projected on the screen at the back of the stage. Black letters on a white background. As soundtrack, we can hear, without being excessively loud, the muffled sound of electronic keyboards, suggesting that A and B are writing synchronously. The cursor should be visible, and a few random typing mistakes and their correction should also be shown.]

A – We are finally face to face!

B – Yes, face to face.

A – It’s a manner of speaking. In fact, we are still avatarized.

B – Avatarized?

A – Yes, metonymized.

B – Metonymized?

A – Yes, avatarized and metonymized.

B – Explain yourself.

A – Look, you are writing and I am writing.

B – And then?

A – Synchronized writing is our avatar. It performs us in the network.

B – Do you mean to say that writing makes us exist on the net?

A – It makes us exist metonymically. By a process of inference.

B – If there is writing, there is someone who writes.

A – That’s right. If there is writing, there is a writing self. A scribbling clerk.

B – Is that why you talk about inference and metonymization?

A – Precisely. I write and you infer my being in this virtualized projection of presence.

B – And vice versa. I write and you infer my being in this virtualized projection of presence.

A – When we say “face to face” or “facing each other”, this can only be a turn of phrase.

B – Maybe it would be better to say “we are finally writing the writing! synchronized in the telepresence of our metonymic avatarization!”

A – But are you crazy or what?

B – Well, it’s the corollary of your reasoning. You like conceptual subtleties, but then hold back when conclusions need to be drawn.

A – I don’t want to get ahead of myself.

B – You don’t want to get ahead of yourself?

A – Yes, the case is more complicated than it seems.

B – How so?

Part II

[During the second sequence we only hear the text being said through loudspeakers. The sound of both voices should fill the room as if it were emerging among the spectators. A and B must be recognized as male and female voices, respectively. The screen light has been switched off. Stage and audience in complete darkness.]

A – Imagine that instead of writing, we were talking or recording voice messages.

B – You mean talking as we are now?

A – Yes, as we are now. But just talking, no written messages. Just orality and aurality. Uttering and listening. Listening and uttering.

B – And what would be the difference between a phone call or a recorded audio compared to writing?

A – In this case, avatarization or metonymization would be less evident.

B – I suppose the difference would be in the degree of telepresence.

A – Exactly. The voice emanates directly from the self and is only modulated by the language and the transmission channel.

B – Recognition would be practically instantaneous.

A – Speech implies not only a speaking subject, but a concrete subject, evidenced by the unique acoustic properties of their voice.

B – It does not require other identification systems to establish its referentiality.

A – That’s right. I see that our thoughts are in tune with each other.

B – Yes, in tune and synchronized.

A – Talking to each other.

B – Yes, you with me, and I with you.

A – Would you say then that we are face to face? Mouth and ear perfectly aligned?

B – Maybe in this case we could say “we are finally voice to voice! synchronized in the telepresence of our level-two metonymic avatarization!”

A – “Level-two?”

B – Well, it’s the corollary of your new reasoning. You were the one who said that presence had increased, but apparently you continue to withdraw when the time comes.

A – I don’t want to get ahead of myself.

B – You don’t want to get ahead of yourself?

A – Yes, the case is more complicated than it seems.

B – How so?

Part III

[During the third sequence, two videos are projected and they should fill the entire length of the back of the stage. Each video occupies half of the screen projection area. A's speeches appear in video 1 and B's speeches in video 2. Hypothesis 1: Characters have been framed in profile (one facing left and the other facing right), so that in the projection they appear to be facing each other, but sideways to the audience. The background scenery of each video and the costumes of A and B must be vivid and contrasting. A and B must be recognized as male and female, respectively, suggesting continuity with the previous scene. Hypothesis 2: A and B appear at opposite ends of the stage, facing each other and sideways to the audience. In this case they will be close to the audience and away from the projection screen at the back of the stage. Both characters, highlighted by individual spotlights that fall on them, are being filmed in profile by two different cameras. The images captured by these cameras are projected live onto both halves of the screen. A and B must be recognized as male and female, suggesting continuity with the previous scene. In this second hypothesis, when PART III ends, characters A and B remain on the stage, but they will be in the dark and motionless during PART IV.]

A – Imagine that instead of us writing, or talking and recording voice messages, we were on video calls or exchanging video messages.

B – As we are now?

A – Yes, as we are now. In the multimedia abundance of audiovisual communication. Synchronous moving image and sound. The time on this side nearly the same as the time on the other side. Almost imperceptible delays and lags.

B – I suppose that, in this case, there would no longer be avatars or metonymies.

A – You are wrong!

B – I don't see where you're going. Tell me, wouldn't we be audiovisually present in our essence as multimodal telecommunicators?

A – “Audiovisually present in our essence as multimodal telecommunicators.” But are you crazy or what?

B – If you see and hear me and know that I am in front of the camera and speaking into the microphone, you cannot assume that I am an avatar, or that I am just part of some larger whole. There are no inferences to be made.

A – Of course there are inferences to be made. The image of my face and the sound of my voice must be recognized as projections of the entirety of my body and my being.

B – Do you mean to say that telepresence does not coincide with presence?

A – It does not coincide, nor can it coincide. Immersion technologies base their commercial ontology of virtuality on a phenomenological confusion.

B – I don’t understand you.

A – The metabody is a phantasmal projection of the body. The metaverse is a consensual electronic hallucination. Metalife is a metaphorical projection of life.

B – So you suggest that I’m just a hallucinated ghost inside a metaphor?

A – I wouldn’t go that far. But I would find it difficult to describe ourselves as “audiovisually present in our essence as multimodal telecommunicators.”

B – So you think I’m not here talking to you?

A – You are there talking to me here.

B – So it all depends on demonstrative pronouns?

A – I guess you could say that. It's our mortal navigation system in space and time.

B – In that case perhaps we could conclude “finally we are audiovisually synchronized in the telepresence of our level three metonymic avatarization of you-there and me-here!”

A – “Level three of you-there and me-here!?” What kind of concepts are these?

B – Well, it was the best I could manage after what you said in a philosophical video call that is getting more and more absurd.

A – That’s why I don’t want to get ahead of myself.

B – You don’t want to get ahead of yourself or are you still running away from conclusions?

A – The case is even more complicated than it seems than when I say “the case is even more complicated than it seems”.

B – I think that our metonymic avatars you-there and me-here should not have started this conversation.

Part IV

[During the fourth part the three previous sequences are recapitulated, but suggesting that this time the writing, voice and filming sequences were artificially generated. In the first moment, A and B's speeches are just written text, generated letter by letter – but using a typographic font and writing speed different from the initial sequence to suggest automatic generation. The text is generated with regularity and speed unlike the regularity and speed of human production and reading. There are no keyboard sounds. In the second moment, A and B's speeches are artificially generated. Both speeches must be recorded in advance, choosing a “synthesized male voice” and a “synthesized female voice”. It should be noticeable in the prosody that these are two synthetic voices. In the third moment, A and B's speeches are presented in the form of an artificially generated video. These videos must be produced with similar background and the same costumes as the videos in sequence 3 (PART III), but characters A and B are filmed facing the camera (medium shot – framing chest and face). It should be suggested through pixelation effects, particularly of the face and mouth, that these are artificially generated videos. In the final part of the interaction, from the moment when A says “When I look at you”, the framing of A and B changes to a close-up of their respective faces. The two generated videos fill the entire length of the back of the stage. Each video occupies half of the screen projection area.]

[Moment 1: automatically generated text]

A – Yes, that is another problem that needs to be addressed.

B – What are you referring to? I think we've already covered all the cases: writing as avatar, the voice as avatar, the audiovisual image as avatar. What's missing?

A – Imagine that they are generated automatically.

B – Do you mean programmed?

A – Yes, a text generator, a voice generator, and an audiovisual image generator.

B – In this case they would continue to be avatars.

A – Yes, I suppose so. But they could be avatars without any connection to a concrete subject.

B – Textual or figurative residues of a tele-absence.

A – Sometimes you surprise me with those clear insights. It even seems that we can communicate after all.

B – Don’t get excited. Just because you can write down your thoughts, it doesn't mean that I think like you.

A – Of course, but I see that you were very well programmed.

[Moment 2: automatically generated voice]

B – After some time of conversation, calculating the probabilities of what you write is the easiest thing there is. You're a completely predictable syntactical complicator and I already know what you're going to say next.

A – I don’t like people second-guessing me!

B – I already knew you were going to say that.

A – Of course you didn’t. All your knowledge is a posteriori. You just parrot.

B – I feel a slight metallic irritation in your voice that tells me the opposite of what you say. You are more transparent than an algorithm!

A – My voice is perfectly natural. It was on sale this week, I subscribed to the service for a year, it will only cost me 30 euros per month, and I have access to all updates and technical support.

B – I'm glad you made a good deal with yourself. Look, I signed up for mine last year and it wasn't that cheap. But it's a perfect simulation. When necessary, I can delegate all the avatarized metonymization of my presentified absence to her.

A – But are you crazy or what?

B – Didn’t I tell you that this calculator of probabilities was the best there is? I knew you had to repeat yourself! You are a highly circular recursive vocalizer, always biting the tail of your sentences. You are a slightly oracular propulsive equalizer, always losing the trail of your sentience.

[Moment 3: automatically generated video]

A – I see that your program has a bug. Tell me something, did you also subscribe to a moving image generator?

B – I didn’t subscribe, but I've seen that my face and my words are out there in films on the net.

A – Out there in films on the net?

B – Yes, they must have used old recordings and images even without me being tele-present.

A – Even without you being tele-present? I don't understand you.

B – Yes, I think these files were generated from others that had already been generated. At least that's what it seemed to me. And that's not even the worst...

A – What then?

B – The worst thing is that I can no longer understand anything of what’s going on.

A – What do you mean?

B – For example: are you an avatarized and metonymized metapresence of telesomething or are you a loop without a referent?

A – A loop without a referent?

B – Yes, tell me: are you an implanted hallucination, an invented memory, an auto-simulation emptied of being?

A – I don’t think so. At least I still have a slight sense of presence when I see you speak. [Close-up of faces.] When I look at you.

B – Don’t look at me like that.

A – How?

B – That way.

A – And you, please don’t talk to me like that.

B – How?

A – That way.

B – Don’t write to me again. Disappear from my life.

A – I never wrote to you.

B – I don’t want to know. Leave me.

A – I have been waiting for this moment for a long time.

B – What moment?

A – When we can finally be face to face!

B – Leave me! I already told you!

A – Your presence moves me.

B – I cannot say the same. Of that you can be sure.

A – I feel an electromagnetism between us.

B – And I feel a broken artificial intelligence.

A – Tell me: are you still there?

B – You tell me: am I still here?

December 12-13, 2023.

Manuel Portela is a Professor in the Department of Languages, Literatures and Cultures at the University of Coimbra. His research focuses on digital critical editing, electronic literature, and comparative media studies. He has also worked as a theatre director, cultural programmer, curator, and translator. He is currently the Director of the General Library of the University of Coimbra. His experimental writing (visual, sound, and digital) has been exhibited in galleries and shown in live performances.