Spoken data can be stored in a range of formats, namely audio, video and text. For the purposes of corpus-building, spoken language is typically transcribed in written form and stored as digital text. As with multi-modal annotation coding schemes, there is no one-size-fits-all standard set of conventions for the written transcription of spoken discourse. Transcription decisions are based on (1) what to capture and (2) how to capture it. Since transcription is, according to Cook, “infinitely delicate and infinitely expandable” (1990: 1), a fundamental issue is not where to start but where to stop in describing all of the elements that contribute to interaction and its context.
One of the most commonly used bases for transcription is the Jefferson Transcription system, used widely in the field of Conversation Analysis (Sacks et al. 1974; described in detail in Hepburn and Bolden 2017). This scheme tends towards detail, capturing not simply words but the minutiae of discourse, from vowel length to pauses, overlaps and pitch.
While clearly articulated speech requires few special conventions, there is considerable variation in how features of recorded spoken discourse such as environmental sounds, pauses and interruptions are transcribed. It is therefore advisable to decide in advance how these elements are represented (if at all) in a transcription.
When devising transcription conventions, the level and focus of detail to be transcribed is usually determined by the research that is to be undertaken with the transcription. For example, if a qualitative conversation analysis is anticipated, then it is appropriate to include the symbols that are standard in conversation analysis for marking such features of spoken discourse as overlapping dialogue.
Likewise, if a researcher intends to look at prosodic features of spoken discourse, then they will be interested in transcribing features such as rising and falling intonation, which are usually transcribed using upward and downward arrows. However, as the inclusion of each of these elements adds time to the transcription, features that will not be analysed may be disregarded. Accordingly, it is important to have a sense from the outset of which features are to be analysed, prior to determining transcription conventions. Once this has been decided, it is best practice to adapt a previously constructed and applied set of conventions. This means that, while there is variety in transcription conventions across corpora, there are core symbols and representations that remain consistent because conventions have been adapted from one another over time. There is a caveat here, however, since, as Jefferson points out, “… one cannot know what one will find until one finds it” (2004: 15).
It may be beneficial to engage in some close reading of the data in a corpus before settling on conventions entirely, as it may be found that certain conventions do not fit the discourse being analysed. For example, for this project (IVO) we discussed using representations of hesitation devices such as uhm and er before engaging with our corpus; on doing so, we found that, in our data, such vocalisations could be standardised with two representations: uh and um. Once this had been decided, the process of selecting which representation to use became much easier, and we understood that further nuance would not be required.
Following these considerations, we settled upon a set of conventions that was determined by:
- The type of analysis we predicted we would undertake based on our research questions;
- Engagement with the corpus to provide a sense of what symbols and representations would suffice to appropriately transcribe it; and
- Previous transcription systems.
The second and third of these considerations resulted in an iterative process of checking what was sufficient for our corpus against previous sets of conventions. The conventions we drew upon were those used by the Cambridge and Nottingham Corpus of Discourse in English (McCarthy 1998), CorCenCC (Knight et al. 2020) and the Spoken British National Corpus 2014 (Love 2020). We adopted a simple, broad transcription approach. The following example shows some of the conventions we adopted. Speakers were given a number and speaker turns were identified by <>. Overlaps were marked with + and interruptions with =. Paralinguistic features such as laughing and coughing were marked in square brackets, short pauses were marked with … and unclear utterances were marked ![unclear]/! (see also below).
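To give a purely illustrative sense of how transcripts marked up in this way can be interrogated once stored as plain text, the short Python sketch below searches an invented line (not taken from our corpus) for the paralinguistic, overlap and interruption markers described above.

```python
import re

# Invented line using the broad conventions described above: a numbered speaker
# tag, an overlap marker (+), a short pause (...) and a paralinguistic marker.
line = "<S003> yeah I think + we could ... [laughs] move that to next week"

# Paralinguistic features are marked in square brackets, e.g. [laughs], [cough].
paralinguistic = re.findall(r"\[([^\]]+)\]", line)

# Overlaps are marked with + and interruptions with =.
overlaps = line.count("+")
interruptions = line.count("=")

print(paralinguistic)            # ['laughs']
print(overlaps, interruptions)   # 1 0
```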
In addition, following engagement with our corpus and recognising the relevance of screen sharing for marking turning points in discourse, we decided to annotate the beginning of screen sharing with [screen share start] and the end with [screen share stop]. Within these phases, it became evident that changing slides or scrolling also marked points of topic change or other relevant discourse boundaries, so we included [screen change] at these points. As these examples show, knowledge of your data and consideration of the relevance of certain nonverbal elements will determine how necessary it is to include them in your transcription.
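As a hedged sketch of how these annotations might be used in practice (the lines below are invented, not drawn from our data), screen-sharing phases can be recovered by scanning for the start, stop and change markers:

```python
# Invented mini-transcript illustrating the screen-sharing annotations above.
transcript = [
    "<S001> okay let me just share my screen [screen share start]",
    "<S001> so this is the overview slide",
    "<S001> [screen change] and here you can see the timeline",
    "<S002> thanks that's really clear [screen share stop]",
]

sharing = False
for line in transcript:
    if "[screen share start]" in line:
        sharing = True                       # a screen-sharing phase begins
    if sharing and "[screen change]" in line:
        print("possible topic change:", line)
    if "[screen share stop]" in line:
        sharing = False                      # the phase ends
```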
Depending on the ethical considerations that apply to a given research project, there will be standards of anonymisation that need to be adhered to in order to avoid the content of a corpus being traced to the individuals involved in it. The process involves replacing the name of a person, place, institution, event or other item that might reveal identities with a code or replacement term. It is important that these codes are searchable in corpus software but cannot be confused with other items. Thus, unique codes should be given that include a tag identifying the type of item being replaced. For example, for participants in the data who are addressed by other participants, we used anon (anonymisation) followed by an underscore (_), a dollar sign ($) and the numerical code assigned to the participant, all within square brackets. See the extract below for how this looks in a transcript. In this extract, speaker 19 <S019> is referring to other participants who are attending the meeting:
When anonymising people who are not participants but who are referred to in the meeting, we used a different type of shortened code: FN for first name and SN for surname. We also anonymised any detail which might reveal information about participants; for example, for institutions we used INST followed by a number, for projects we used PR, and for places we used PL followed by a number. See the extract below for how this looks in our orthographic transcript.
These codes allow us to carry out various searches in our corpus without compromising the anonymity of participants. We kept a detailed reference list of all anonymisations.
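As the actual extracts are not reproduced here, the following Python sketch uses invented lines to show how typed anonymisation codes of this kind can be retrieved and counted; the exact surface form of the codes (the numbering of FN/SN and the padding of the participant number) is an assumption made for illustration.

```python
import re
from collections import Counter

# Invented lines; the numbering of FN/SN codes is assumed for illustration.
lines = [
    "<S019> could [anon_$5] send the minutes to FN1 SN1 at INST2",
    "<S007> the PR1 workshop will be held in PL3 next month",
]

# Participant-address codes follow the [anon_$<number>] convention.
addressed = [m for line in lines for m in re.findall(r"\[anon_\$(\d+)\]", line)]

# Other anonymisation codes carry a tag identifying the type of item replaced.
pattern = re.compile(r"\b(FN|SN|INST|PR|PL)(\d+)\b")
counts = Counter(m.group(1) for line in lines for m in pattern.finditer(line))

print(addressed)  # ['5']
print(counts)     # Counter({'FN': 1, 'SN': 1, 'INST': 1, 'PR': 1, 'PL': 1})
```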
Distinguishing representative symbols from one another can be important for later analysis. For example, by using a specific symbol ($) for participants when they are referred to in the dialogue, we are able to search for all instances of participants being directly addressed by other participants in our data. If we used this symbol for all first names, including those of people who are not participants, searches would simply return all mentions of first names, whether they belong to participants or not. These codes, in turn, need to be distinguished from the symbols that signify the speaker responsible for a turn: for each speaker turn we used angled brackets and an upper case S (for speaker) followed by the speaker’s numerical identifier, for example <S019>. Using angled brackets (which are XML-readable) allows us to hide or show speaker identifiers in the corpus tools that we use to analyse the data. They also allow us to search on a specific speaker’s usage.
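A minimal sketch of how these XML-style speaker tags support speaker-specific searches might look as follows; the turns and speaker numbers below are hypothetical.

```python
import re
from collections import defaultdict

# Invented turns; speaker tags follow the <S###> convention described above.
text = (
    "<S019> shall we move on to the next agenda item "
    "<S003> yes that works for me "
    "<S019> great let's start with the report"
)

# Split the running text into (speaker, utterance) pairs using the tags.
turns = re.findall(r"<(S\d+)>\s*(.*?)(?=<S\d+>|$)", text)

# Collect every utterance produced by a given speaker, e.g. S019.
by_speaker = defaultdict(list)
for speaker, utterance in turns:
    by_speaker[speaker].append(utterance.strip())

print(by_speaker["S019"])
```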
As mentioned, for non-verbal sounds such as laughing, coughing or sneezing, and for technical noises or issues, we use square brackets with the activity inside, e.g. [cough]. These square brackets make it quick and easy to search the data for such non-verbal behaviour. As these are recorded virtual meetings, there are many examples of technical issues within them. The example extract below shows how this looks in our transcript:
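The extract itself is not reproduced in this section. Purely as an illustration of the kind of search these square brackets support, the sketch below counts non-verbal and technical markers across a few invented lines (the marker labels used here are assumptions, not a list of the labels in our data).

```python
import re
from collections import Counter

# Invented lines; the specific marker labels are assumed for illustration.
lines = [
    "<S004> sorry you cut out there [technical issue] can you repeat that",
    "<S011> [laughs] yes of course [cough] as I was saying",
    "<S004> thanks [laughs]",
]

# Count everything marked in square brackets across the transcript lines.
markers = Counter(m for line in lines for m in re.findall(r"\[([^\]]+)\]", line))
print(markers)  # Counter({'laughs': 2, 'technical issue': 1, 'cough': 1})
```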
References
Hepburn, A., and Bolden, G. B. (2017). Transcribing for Social Research. Thousand Oaks, CA: Sage.
Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (ed.), Conversation Analysis: Studies from the First Generation (pp. 13-31). Amsterdam: John Benjamins.
Knight, D., Morris, S., Fitzpatrick, T., Rayson, P., Spasić, I., Thomas, E-M., Lovell, A., Morris, J., Evas, J., Stonelake, M., Arman, L., Davies, J., Ezeani, I., Neale, S., Needs, J., Piao, S., Rees, M., Watkins, G., Williams, L., Muralidaran, V., Tovey-Walsh, B., Anthony, L., Cobb, T., Deuchar, M., Donnelly, K., McCarthy, M. and Scannell, K. (2020). CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes – the National Corpus of Contemporary Welsh. Cardiff University, http://doi.org/10.17035/d.2020.0119878310
Love, R. (2020). Overcoming challenges in corpus construction: The spoken British National Corpus 2014. London: Routledge.
McCarthy, M. (1998). Spoken language and applied linguistics. Cambridge: Cambridge University Press.
Sacks, H., Schegloff, E. A. and Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696-735.