Transcription rules for

### CONTENTS

- 1: Viewing This FAQ
- 2: New Question?
- 3: What We Need You To Do
- 4: Transcription Procedure
- 5: Using The System
- 6: Things That Might Bother You
- 7: Transcription Rules (Summary)
- 8: Transcription Rules
- 9: Tags
- 10: Using Tags
- 11: Timesavers
- 12: Miscellaneous Questions And Points
- 13: Spellings For O2 Data

### 1: VIEWING THIS FAQ

- Use command-plus (Mac) or ctrl-plus (Windows) to zoom in.
- Use command-minus (Mac) or ctrl-minus (Windows) to zoom out.
- Use command-f (Mac) or ctrl-f (Windows) to search for a section title and jump to it.

### 2: NEW QUESTION?

Email StJohn with your new question.

### 3: WHAT WE NEED YOU TO DO

We produce speech recognition systems. These systems interact with callers and attempt to understand them, and they must be tested using real data produced by actual people. Your job is to help us transcribe a large number of interactions between callers and the system.

When transcription is complete, a speech scientist will build a speech system and test it using your work. The scientist will rarely listen to the audio files. Instead, he/she will read your transcription and compare it to the result that the system produces. This helps him/her quickly find the source of a problem with the speech system. Our transcription conventions describe each recording more completely, which helps him/her do this.

For example, let's say that you listen to an audio file and transcribe it as "help me with something else [noise]". The speech system, meanwhile, thinks that the noise is speech and transcribes the audio incorrectly as "help me with something else delivery". When the scientist compares the two results, he/she sees:

TRANSCRIPTION: help me with something else [noise]

SYSTEM RESULT: help me with something else delivery

This makes the problem obvious, and the scientist can then change the speech system to make it more accurate.

Normally, we run a basic version of the speech system on the audio first.
It is often faster for you to correct its guesses than to type everything out yourself.

### 4: TRANSCRIPTION PROCEDURE

1) Press tab to play/repeat the audio item. This also activates the text box that you type into.
2) Press enter to copy down the transcription produced by the speech system (shown in the "original utterance" line) if it is close enough to be useful. This only works if the "modified utterance" line and the text box are empty. Sometimes the original utterance line will contain only "blank" (meaning the system couldn't guess anything); this won't be copied down if you press enter.
3) Transcribe the item (or edit the transcription produced by the speech system) by typing into the text box. Press enter to move this text into the "modified utterance" line. Use the right/left arrow keys to move between words in the modified utterance. If there is text in the text box, the right/left arrow keys will instead move through that text. Similarly, the backspace key deletes the highlighted word in the modified utterance, but if there is text in the text box, it will instead delete a character in the text box.
4) Press enter to save the transcription (it is sent off to the webserver). The icon will turn from white to green.
5) Press enter to move to the next item (this will often work even if the next item is shown as a spinning orange disc).
6) Use the up/down arrow keys to move between items. This doesn't work if there's any text in the text box.

Almost every function can be done via a keyboard shortcut rather than by using the mouse. While transcribing, press ctrl-space to show/hide the list of shortcuts.

Spelling errors and stray capitals, numerals or punctuation don't cause too much trouble. I've written proofreading code that catches 99% of these and allows me to correct them in a reasonable amount of time.
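The proofreading code mentioned above is not part of this FAQ. Purely as an illustration, the following Python sketch shows the kind of check such a script might run against the formatting rules in section 7 (lower case, no numerals, no punctuation except hyphens and apostrophes, with [ ], ( ) and ~ allowed for tags, guesses and cut-offs). The function name `find_problems` and its messages are hypothetical; this is not the actual proofreading code.

```python
import re

# Characters removed before looking for stray symbols: letters, digits, spaces,
# hyphens, apostrophes, and the markers for tags [ ], best guesses ( ) and
# cut-offs ~. Capitals and digits are removed here because they are reported
# by their own checks below.
ALLOWED_OR_CHECKED = re.compile(r"[a-zA-Z0-9 '\-\[\]()~]")


def find_problems(text: str) -> list[str]:
    """Return human-readable problems found in one transcription (hypothetical helper)."""
    problems = []
    if any(ch.isupper() for ch in text):
        problems.append("contains capital letters")
    if any(ch.isdigit() for ch in text):
        problems.append("contains numerals")
    # Whatever is left after removing the characters above is stray
    # punctuation or a symbol such as "." "," "£" or "$".
    stray = sorted(set(ALLOWED_OR_CHECKED.sub("", text)))
    if stray:
        problems.append("contains punctuation/symbols: " + " ".join(stray))
    return problems


if __name__ == "__main__":
    print(find_problems("My postcode is CB12 5AQ."))
    print(find_problems("my postcode is c b twelve five a q"))
```

Run as a script, the first call reports capitals, numerals and the full stop, and the second returns an empty list. Accented letters would also be reported as symbols, which errs on the side of flagging too much for a person to review.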
An easy mistake is to press enter to move to the next item and then, as soon as you hear the audio and before you have typed anything, press enter again. This copies down the speech system's transcription, and if you then add your own transcription, the result is a "double transcription" that is very hard for me to catch and correct.

If the waveform shows a small bubble near the end, please listen to it. Sometimes it is quiet speech, not just noise.

Remember, we care most about the raw transcription of words, not the tags.

### 5: USING THE SYSTEM

Our system currently only works in the Google Chrome browser.

Your transcriptions are "saved" (sent to the webserver) when you press enter to move to the next item. The item's icon will turn from white to green.

You need a good internet connection to use our system. Sometimes, not all the audio files load. I find it is best to do the ones I can, then reload, then do the ones that were not loaded the first time. A spinning orange disc indicates that an item has not been loaded. Please note that often the item has in fact loaded successfully, but the icon does not change from an orange disc to a white tick. It's worth trying to proceed normally through the orange discs.

I am afraid that occasionally some transcriptions sent to the webserver may be lost in transit. When you press enter at the end of a job to move to the next one, your current job will be reloaded, with the lost transcriptions in white rather than green. This is irritating, but fixing it would require serious development work. A better internet connection means more of your transcriptions will reach the webserver.

Ways to improve your connection:

- Be closer to your wifi router.
- Change the position of your computer (wifi signal strength varies even within a single room).
- Connect to your router with an ethernet cable (this is the best option).

You can use this website to check your web connection speed:
http://www.speedtest.net/

Internally, the transcriptions are stored in batches of about 2000 items, which are delivered to you in groups of 50. When we reach the end of a batch, I sweep up the remaining open jobs and complete them. Your transcription total is unaffected by this (it counts the items you transcribed, not the jobs you completed), so you don't need to make an effort to avoid leaving jobs half-completed.

### 6: THINGS THAT MIGHT BOTHER YOU

1. **Transcription can be boring.** -- We recommend using tomato-timer.com and setting the timer to 5 or 10 minutes, then taking a 1-minute break. Experiment until you find a comfortable working pattern that you can maintain for long periods of time. Every so often you should stand up and walk around the room. It's much, much better to produce a consistent rate of output than to do mad dashes and exhaust yourself.
2. **Ergonomics** -- A good working position is essential. You should not be hunched over your screen. Ideally, your screen should be raised to eye height and you should type using an external keyboard. You should have room to stretch out your legs and lean against the back of your chair.
3. **Swearing/Anger** -- Sometimes the people who call into a speech system are angry (or become angry when using the system). They might swear and be generally rude and aggressive. Even though you know it is not directed at you personally, this can affect your mood or make you uncomfortable. Unfortunately, if they are talking directly to the system, this must still be transcribed. If it helps, think of how a customer service representative must keep an even temper when dealing with many clients in many different moods. Deal with the difficult customer, then take a short break.

### 7: TRANSCRIPTION RULES (SUMMARY)

- It's more important to transcribe all the words said to the system than to get the tags exactly right.
- If you're not sure/happy about a transcription, add the tag [unsure] once anywhere in the transcription.
- Use lower case throughout. Spelled-out letters should be in lower case, separated by spaces.
- No punctuation or special symbols except hyphens and apostrophes.
- No abbreviations unless spoken, even common ones like "mr". Write titles in full, e.g. "mister".
- Use full words for everything, including numbers, dates, amounts etc.
- Always type dictionary words, apart from this set of non-dictionary words: ok, yeah, yep, nope, dunno, wanna, lemme, gimme, gonna, gotta, innit, ain't
- Type exactly what you hear, even if it is ungrammatical or skips or repeats words.
- While viewing the transcription page you can use ctrl-space to access the list of shortcuts.
  - Example shortcut: If you type in numbers by themselves, e.g. "123", and press enter, they are expanded to "one two three".

### 8: TRANSCRIPTION RULES

- It's more important to transcribe all the words said to the system than to get the tags exactly right.
- If you're not sure/happy about a transcription, add the tag [unsure] once anywhere in the transcription.
- While viewing the transcription page you can use ctrl-space to access the list of shortcuts.
  - Example shortcut: If you type in numbers by themselves, e.g. "123", and press enter, they are expanded to "one two three". Note that this only works if the text you type in consists only of digits. "123" will work, but "123hello" or "12 york street" will not.
- Use lower case for all transcriptions. Spelled letters should be in lower case, separated by spaces, with no period after them.
  - e.g. "my postcode is c b twelve five a q" NOT: "My postcode is CB12 5AQ."
  - e.g. "my name is isa spelt i s a" NOT: "My name is Isa, spelt I-S-A."
  - Letter names should never be written out as words, e.g. don't transcribe "h n j" as "aitch en jay".
- No punctuation or special symbols except hyphens and apostrophes where these would normally be used, e.g.
i'm not sure, i don't know, mother-in-law, self-evident
  - Q: Should I always use hyphens? For example, when a caller says "thirty-first of april" or "twenty-second of may"? A: No. Generally, don't make an effort to use hyphens. For the speech system programmer, they don't help and sometimes hinder.
- Use full words for everything, including numbers, dates, amounts etc.
  - Numbers
    - e.g. "oh", "zero", "nothing" or "nought" - whichever was spoken NOT: "0"
    - e.g. "a hundred" or "one hundred" - whichever was spoken NOT: "100"
    - e.g. "twenty four double three" NOT: "24 33"
  - Percentages
    - e.g. "six point oh four percent", "two percent" NOT: 6.04%, 2%
  - Currency amounts: type all units as words, with no symbols like £ or $.
    - e.g. "twelve pounds ten p", "a hundred and five u s dollars" NOT: £12.10, US$105
  - Dates
    - e.g. "monday january the eighteenth two thousand and twelve" NOT: "Monday, January 18th 2012."
- No abbreviations unless spoken as such, even common ones such as "st" and "dr", and titles.
  - e.g. mister, missus, miss, doctor, reverend, professor NOT: Mr, Mrs, Ms, Dr, Rev, Prof
  - e.g. saint louis street NOT: St Louis St
  - e.g. doctor fosters drive NOT: Dr Fosters Dr
- Type exactly what you hear, even if it is ungrammatical or skips or repeats words.
  - e.g. "ok right so [mispronunciation] the code hang on it [hesitation] it's two one zero five" NOT: "OK, right, sooo the code - hang on - it's 2105."
- Mispronounced words: spell these correctly but put the tag [mispronunciation] after the word. Don't invent spellings to try to represent the way mispronounced words sound.
  - e.g. "i'd like reservations [mispronunciation] oops sorry reservations" NOT: "I'd like reversations, oops sorry, reservations"
  - Long-drawn-out speech (because of speaker uncertainty), e.g. "yeeeess well", is transcribed as "yes [mispronunciation] well".
- Please try to guess word fragments.
For example, if someone says "advi advisor", it should be transcribed as "advisor [fragment] advisor".
  - You can also use [fragment] if the first part of the word was not said by the caller. "change my [noise] ariff" should be transcribed as "change my [noise] tariff [fragment]".
  - If you're not sure about the completion, enclose the word in parentheses to show that it's a guess. For example, you can transcribe "change my [noise] riff" as "change my [noise] (tariff) [fragment]".
  - If you can't easily complete the fragment, transcribe it with no changes, e.g. "speak to so [fragment] help me with something else" (the fragment could be "somebody" or "someone").
- Speech systems cope with this set of non-dictionary words: ok, yeah, yep, nope, dunno, wanna, lemme, gimme, gonna, innit, ain't
  - e.g. the caller says "dunno" instead of "don't know". Transcribe this as "dunno". However, you should transcribe "don wanna" as "don't [mispronunciation] wanna". The system won't understand new contractions; it only copes with the contractions that someone has programmed into it.
  - do NOT use: yea, yeap, nopey, duno, wana, leme, gime, gunna
- If someone (who is not the caller) is speaking during some or all of the interaction, mark this as [side-speech], not [background-noise]. For the system, speech is very different from a steady background noise.
- If a word has been cut off by the start or end of the audio recording, use a tilde to mark the cut-off. "o i'm calling to say i'm not happy" should be transcribed as "~o i'm calling to say i'm not happy". If the cut-off occurs at the end, put the tilde on the right-hand side of the last word, e.g. "speak to some" should be transcribed as "speak to some~".
  - If you can, include a best guess for words cut off by the start or end of the audio recording. You need to enclose the guess in parentheses to show that it's a guess.
For example, "lo i'm calling to say i'm not happy" could be transcribed as "~(hello) i'm calling to say i'm not happy", and "make a paym" could be transcribed as "make a (payment)~". A space between the parenthesis and the tilde is fine, e.g. "make a (payment) ~".

### 9: TAGS

We call the annotations in square brackets "tags". All of these tags can be added via a keyboard shortcut. While transcribing, press ctrl-space to show/hide the list of shortcuts.

**Tag List**

~ () [accent] [background-noise] [bad-audio] [breath-noise] [cough] [dtmf] [fragment] [hangup] [hesitation] [mispronunciation] [noise] [no-speech] [pause] [prompt-echo] [side-speech] [skipped] [unintelligible] [unsure]

**Tag Descriptions**

- ~ = audio cut-off: A word is cut off by the start or end of the audio file.
- () = best guess: If the word is not completely clear, but you can guess it, enclose it in parentheses. If you can't easily guess it, put [unintelligible].
- [accent] Caller has a strong accent or speech dialect.
- [background-noise] Any noticeable background noise that continues for a while. Examples: music without any singing, machinery humming.
- [bad-audio] Any audio quality/distortion issues. Examples: audio fading out (and in), distortion, breaking up of phone line, muffled speech due to bad phone line. Only mark this once per transcription. Ideally, put it at the point where the audio quality first degrades, but this is not crucial.
- [breath-noise] Loud breath, sigh, wind on microphone.
- [cough] Cough, clear throat, laugh, sneeze.
- [dtmf] The sound(s) of telephone touch tones being pressed on a telephone keypad. If you've ever listened to someone texting where each touch of a key makes an audible beep, that's the sound.
- [fragment] The caller did not say a complete word. Add this tag directly after the partially-spoken word. For example, if someone says "advi advisor", transcribe this as "advisor [fragment] advisor".
- [hangup] Any audible hangup noise.
- [hesitation] "um", "er", "uh", "uh-huh", "mm-hm", etc.
- [mispronunciation] The caller mispronounced a word. Add this tag directly after the mispronounced word.
- [noise] Door slam, car horn, something dropped on the floor.
- [no-speech] The audio item does not contain anything recognisable as speech. Nonetheless, [dtmf], [noise], [hangup] and other events should be marked if they occur.
- [pause] Long pause (e.g. at least 2 seconds) during speech by the caller to the system.
- [prompt-echo] The audio contains some or all of a speech system prompt. A prompt is anything the system says to a caller, e.g. "Please say your account number". The O2 prompt wording before the recording is (roughly): "To help us make some improvements, in just a few words, please tell us why you are calling today, for example, to top up or to check on an upgrade". There are several different versions. Any other recordings, like TV/train announcements, should be [side-speech].
- [side-speech] Any speech by the caller not directed at the system. Any intelligible speech from a bystander. Examples: people talking, background speech, radio, TV, automatic train announcements, baby noises, dogs barking, indistinct background hubbub (e.g. coffeeshop noise), music with singing.
- [skipped] Skip the transcription entirely if the caller "rambles" for more than (approximately) 20 words or if the transcription is too long to fit properly on the transcription page. If the item is fewer than about 20 words and you can't understand it, use the tag [unintelligible].
- [unintelligible] Use this to mark any speech that you can't understand.
- [unsure] If you're not sure/happy about a transcription, add the tag [unsure] to it. This only needs to be added once.

### 10: USING TAGS

Transcribing the actual words that are spoken is much more important than getting the tags right. That said, you should learn all the tags and how they're used (over time and with practice).
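Unhyphenated or misspelt tags are the kind of slip a proofreading script can catch. The Python below is a hypothetical sketch, not the actual code. It checks bracketed tags against the tag list in section 9 and uses the standard-library `difflib.get_close_matches` to suggest the nearest valid tag. It also flags a tag repeated twice in a row (section 11) and [skipped] mixed with other text. The name `check_tags` is an assumption for illustration.

```python
import difflib
import re

# Square-bracket tags from the tag list in section 9.
KNOWN_TAGS = [
    "accent", "background-noise", "bad-audio", "breath-noise", "cough",
    "dtmf", "fragment", "hangup", "hesitation", "mispronunciation", "noise",
    "no-speech", "pause", "prompt-echo", "side-speech", "skipped",
    "unintelligible", "unsure",
]


def check_tags(text: str) -> list[str]:
    """Return warnings about unknown, repeated or misplaced tags (hypothetical helper)."""
    warnings = []
    # Catch every [...] span, including misspelt ones with spaces inside.
    for tag in re.findall(r"\[([^\]]*)\]", text):
        if tag not in KNOWN_TAGS:
            close = difflib.get_close_matches(tag, KNOWN_TAGS, n=1)
            hint = f" (did you mean [{close[0]}]?)" if close else ""
            warnings.append(f"unknown tag [{tag}]{hint}")
    # The same tag never needs to appear twice in a row.
    tokens = text.split()
    for prev, curr in zip(tokens, tokens[1:]):
        if prev == curr and prev.startswith("[") and prev.endswith("]"):
            warnings.append(f"repeated tag {curr}")
    # [skipped] must be the whole of the transcription.
    if "[skipped]" in tokens and text.strip() != "[skipped]":
        warnings.append("[skipped] should be the only thing in the transcription")
    return warnings


if __name__ == "__main__":
    print(check_tags("[breath noise] pay a bill"))
    print(check_tags("[side-speech] [side-speech] pay a bill"))
    print(check_tags("[side-speech] pay a bill [side-speech]"))
```

The first call suggests [breath-noise] and the second reports the repeated [side-speech]. The third returns an empty list, because the two [side-speech] tags are separated by speech to the system.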
- Please use the shortcut keys for tags rather than typing them out yourself. If you miss out the hyphen ("[breath noise]") or misspell the tag ("[breah-noise]"), I can catch this with code, but I still have to correct it.
- When there's only a noise and no actual speech, please put [noise] [no-speech]. However, for a cough or a breath noise by itself, don't add [no-speech]; transcribe these as just [cough] or [breath-noise]. Essentially, if the sound is made by a human or animal (hesitation, cough, sneeze, breathing, side speech, baby crying, dog barking, unintelligible), it's fine by itself. If there is no human sound in the audio (dtmf, noise, background noise, hangup), then add [no-speech].
- Don't use [no-speech] if anyone talks in the background, however quietly or briefly.
- [no-speech] is fine by itself if background noise is faint.
- Please don't use [skipped] if you can't understand the caller but the call is quite short, e.g. 5-8 words. Use [unintelligible] instead. Add [unsure] if you think I might be able to understand the person.
- If you use [skipped], please don't add anything else to it. [skipped] should be the whole of the transcription.
- Don't transcribe irrelevant speech and then label it as not directed at the system, e.g. "[side-speech] I'll see you at the cinema". Within a transcription, [side-speech] takes the place of a word/phrase/sentence that was not relevant and not necessary to transcribe.
- [mispronunciation] is only used immediately after the word that was mispronounced.
- [fragment] is only used immediately after the word that was only partly spoken.
- Please don't use [noise] or [fragment] to transcribe a short segment of speech that can't be understood. Use [unintelligible] instead.
- It is much better to put [unintelligible] than to guess unreliably. Put parentheses () around the word if you can guess it easily.
- You don't need to put [hesitation] unless the caller actually makes a noise, e.g. "uh", "um", "er" etc.
If they pause for e.g. 2-3 seconds, put [pause]. However, you only need to use [pause] for the caller, not for other people, so just put "[side-speech]" instead of "[side-speech] [pause] [side-speech]".
- Please use tildes (~) only at the start and end of a transcription, never in the middle. They denote words cut off by the start/end of the audio file itself, not words cut off by the caller or by audio quality.
- Telephone touchpad sounds are transcribed as [dtmf], not [noise]. If there are multiple consecutive touch-tone beeps, you only need to put [dtmf] once.

### 11: TIMESAVERS

Here is a list of things that will save you time, decrease your frustration, and increase your hourly rate of pay:

1) There is never any need to use the same tag twice in a row. If there are multiple consecutive noises, put [noise] once. The same applies to [dtmf], [hesitation], [cough], [breath-noise], [side-speech], and [unintelligible]. For example, "[side-speech] [side-speech] pay a bill" should just be transcribed as "[side-speech] pay a bill". However, "[side-speech] pay a bill [side-speech]" should not be shortened to "[side-speech] pay a bill", as the [side-speech] tags are separated by speech to the system.
2) Use the auto-complete. It is shared among all transcribers. To add a phrase to the auto-complete, type it into the text box and press ctrl-enter. To delete an entry, type it in (or select it using the auto-complete) and press ctrl-minus. The deletion won't take effect until you reload the page. The auto-complete doesn't handle single words, only phrases. Also, I'm afraid that every time we move to a new batch (about once every 2000 items), the auto-complete is wiped. Note: Sometimes a batch is much smaller or larger, e.g. 80 or 3000 items.
3) If you type in numbers by themselves, e.g. "123", and press enter, they are expanded to "one two three". Note that this only works if the text you type in consists only of digits. "123" will work, but "123hello" or "12 york street" will not.
4) Don't transcribe any side speech that is unrelated to the system; mark it as [side-speech] instead. Generally, you can guess whether the caller is talking to someone else or to the system.
5) Use [skipped] for anything over about 20 words said directly to the system. The rule of thumb is that if it's taking you 4-5 times longer than normal to transcribe an item, skip it. If the transcription is too long to fit on our display, skip it. If the item is fewer than about 20 words and you can't understand it, use the tag [unintelligible].

In general, please do remember that almost everything has a keyboard shortcut. Using these will save you a lot of time.

### 12: MISCELLANEOUS QUESTIONS AND POINTS

- Q: Should I use "ok" or "okay"? A: "ok", because it's shorter. But don't bother correcting "okay" to "ok".
- Normally, when someone pronounces something incorrectly (or in a very thick regional accent), we mark this with [mispronunciation], e.g. "help me with summat else" is transcribed as "help me with something [mispronunciation] else". However, there are a couple of exceptions:
  - The transcription "sort me bill out" is fine, as "me" is an actual word, not just a mispronunciation.
  - The "g" is often left off the ending "ing". This is not a problem, as it's very close to the standard pronunciation. So "nothin" can be transcribed as "nothing", rather than "nothing [mispronunciation]". Similarly, when someone shortens "to" to "t", e.g. "going t- the shop", writing "to" without the mispronunciation tag is fine. However, "i fink so" should be transcribed as "i think [mispronunciation] so", as the pronunciation has changed quite a lot.
- "bout" is often said instead of "about". This should be transcribed as "about [mispronunciation]". Similarly, people often say "cos" or "cause" instead of "because". Again, we write this as "because [mispronunciation]".
Logically, these should be fragments, but they are so widely accepted that we treat them as common mispronunciations.
- The caller says "Yeah that'll do". Q: Should I put "yeah that will [mispronunciation] do"? A: No. The system can cope with standard grammatical contractions such as "that'll" for "that will".
- Q: Should I correct "nuffin" to "nothing [mispronunciation]"? A: Yes.
- The caller says "I'll call ya". Q: How should I transcribe this? A: "i'll call you [mispronunciation]".
- The caller says "mm-hm" or "uh-huh".
  - Mark these as [hesitation]. They do actually mean "yes", but the system is never going to be able to distinguish these reliably from normal hesitations.
- The caller says "zee" or "zed" to indicate the letter "z". Q: Should I indicate which pronunciation the caller used? A: No. Transcribe both as "z". Someone will train the pronunciation system to handle both pronunciations of "z".

### 13: SPELLINGS FOR O2 DATA

List of spellings for data from O2.

1) Spelling rules:
   - add-on (noun)
   - advisor (not adviser)
   - bolt-on (not bolt on)
   - broadband
   - enquiry (not inquiry)
   - landline
   - log in (verb) (not log-in, login)
   - login (noun) (not log-in, log in)
   - mix up (verb)
   - mix-up (noun)
   - passcode
   - pre-authorise
   - prepay, prepayment, prepaid
   - sign in
   - top up (verb)
   - top-up (noun)
   - upgrade
   - username
   - voicemail
2) O2 vocabulary spelling rules:
   - o two NOT o2, O2, oh two, oh to...
   - my o two
   - o two recycle (O2 Recycle)
   - o two refresh (O2 Refresh)
   - o two world chat (O2 World Chat)
   - apple ipad pro (Apple iPad Pro)
   - four g (4G)
   - g b (GB = gigabytes)
   - gigabytes
   - h t c (HTC = a mobile phone manufacturing company)
   - iphone
   - iphone six s plus
   - l g g four ("LG G4")
   - n s p c c (NSPCC = National Society for the Prevention of Cruelty to Children.
O2 have a free child safety helpline)
   - pac code ("Porting Authorisation Code", for users wanting to keep their number when changing network providers)
   - pay and go
   - pay as you go
   - pay monthly
   - pebble steel (Pebble Steel - smartwatch)
   - puk code ("Personal Unblocking Key", for unblocking a SIM card)
   - sim card
   - sixty four g b (GB = gigabytes)
   - smartwatch
   - sony xperia z three (Sony Xperia Z3)
   - store and share (O2 Store & Share - save files in the cloud and access from any device)
   - three g (3G)
   - tu go (TU Go app for Pay Monthly and Business plans - uses wifi for calls and messages when there's no signal)
   - wifi
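Several entries above explicitly rule out a variant spelling. The Python below is a hypothetical sketch of a checker for a handful of those variants. The table `RULED_OUT` and the function `flag_o2_spellings` are illustrative assumptions. They cover only a few entries and are not a complete or official list.

```python
import re

# A few variants that the section 13 lists explicitly rule out, mapped to the
# preferred spelling. Illustrative only; not a complete list.
RULED_OUT = {
    "adviser": "advisor",
    "inquiry": "enquiry",
    "bolt on": "bolt-on",
    "log-in": "log in (verb) or login (noun)",
    "o2": "o two",
    "oh two": "o two",
}


def flag_o2_spellings(text: str) -> list[tuple[str, str]]:
    """Return (found, preferred) pairs for a person to review; nothing is changed."""
    hits = []
    for variant, preferred in RULED_OUT.items():
        # \b stops a variant matching inside a longer word;
        # IGNORECASE also catches "O2".
        pattern = r"\b" + re.escape(variant) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((variant, preferred))
    return hits


if __name__ == "__main__":
    print(flag_o2_spellings("i'd like to speak to an adviser about my O2 bill"))
    print(flag_o2_spellings("i want an o two bolt-on"))
```

The checker only flags matches for a person to review rather than correcting them, because a ruled-out form is sometimes legitimate. For example, "oh two" is correct when a caller reads out the digits of a phone number (section 8), and "log-in" can only be fixed once you know whether it is a verb or a noun.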
