AI voice between anthropocentrism and posthumanism: Alexa and voice cloning

Domenico Napolitano

doi:10.1386/jivs_00053_1

Posthuman Voices: Channels across Time and Shared Memories

ISSN: 2057-0341
E-ISSN: 2057-035X


Affiliation: Suor Orsola Benincasa University of Naples (ISNI: 0000000119427707)
Source: Journal of Interdisciplinary Voice Studies, Volume 7, Issue 1 (Posthuman Voices: Channels across Time and Shared Memories), August 2022, pp. 35–49
DOI: https://doi.org/10.1386/jivs_00053_1
Language: English
- Received: 28 Oct 2021
- Accepted: 13 Feb 2022
- Published online: 01 Aug 2022

Abstract

This article deals with the groundbreaking phenomenon of AI voice, highlighting two possible meanings that are often not problematized: the voice embedded in AI-based devices and the voice created using AI algorithms. To clarify the distinctions and intersections between these two meanings, the article adopts an approach inspired by media archaeology and social constructionism. It argues that AI voice as a social phenomenon is constructed through the interaction of a discursive level of representations and a non-discursive level of material practices and operations. The interaction of these two levels produces a tension between anthropocentrism and posthumanism that is characteristic of AI voice. This tension is investigated through two case studies: a commercial for the Amazon Alexa smart speaker and the phenomenon of ‘voice cloning’. The first shows how, at a discursive level, the ‘voice in the machine’ is represented as a way to ‘personify’ AI technology. The second, which consists in the possibility of reproducing the features of an embodied and personal voice, shows how the materialization of that cultural idea depends on the technical possibilities and material practices required by data-driven algorithms.


Citation: Napolitano, Domenico (2022), ‘AI voice between anthropocentrism and posthumanism: Alexa and voice cloning’, Journal of Interdisciplinary Voice Studies, 7:1, pp. 35–49, https://doi.org/10.1386/jivs_00053_1.


Article Type: Article

Keyword(s): AI; media archaeology; posthumanism; speech synthesis; speech-impairment; voice assistants
