The SpiCE Corpus

SpiCE is an open-access corpus of conversational bilingual Speech in Cantonese and English. SpiCE includes high-quality audio recordings of 30-minute interviews in each language with 34 early bilinguals, along with accompanying transcriptions and language background information. The corpus was first released in May 2021. Detailed information about the corpus is provided on the design and transcription pages, as well as in Khia's dissertation.


Citing the corpus

When referring to the corpus in the body of your text or presentation, you can call it the "SpiCE Corpus" (please keep the capitalization exactly as shown), but please spell out "Speech in Cantonese and English" at least once!

You can cite the corpus directly (preferred):

    author = {Johnson, Khia A.},
    publisher = {Scholars Portal Dataverse},
    title = {{SpiCE: Speech in Cantonese and English}},
    UNF = {UNF:6:c6HNIwwpBuQOA349cyCu7w==},
    year = {2021},
    version = {V1},
    doi = {10.5683/SP2/MJOXP3},
    url = {}

Alternatively, cite the paper that introduces the corpus:

    title = {SpiCE: A new open-access corpus of conversational bilingual speech in Cantonese and English},
    author = {Johnson, Khia A. and Babel, Molly and Fong, Ivan and Yiu, Nancy},
    booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
    month = may,
    year = {2020},
    address = {Marseille, France},
    publisher = {European Language Resources Association},
    url = {},
    pages = {4089--4095},
    language = {English},
    ISBN = {979-10-95546-34-4},