LINGUISTICS

Corpora

Asian SignBank

The Asian SignBank is one of the research outputs of two projects funded by the Nippon Foundation: “Practical Dictionaries for Asian-Pacific Sign Languages (2003-2008)” and “Asia Pacific Sign Linguistics Research and Training Program (2006-2012)”. It documents sign language data contributed by researchers from the Asian countries participating in the research and training program. The prototype was developed jointly by the Department of Linguistics and Modern Languages and the Department of Systems Engineering and Engineering Management at CUHK. The research team is currently focusing on data input and on developing computational procedures to generate sign language dictionaries from the archived data. The Asian SignBank is built on Sign Linguistics as its philosophical foundation, and its main purpose is to facilitate componential analysis of sign entries. Its applications include sign language learning and sign language documentation.

Research reports/papers based on the corpus (titles and/or links):

Corpus Development

Tang, Gladys and Cheng Ka Yiu. 2005. The Nippon Foundation Asian SignBank. An International Symposium – Instructional Technology and Education of the Deaf, National Technical Institute for the Deaf, Rochester Institute of Technology, New York, USA, 27-30 June 2005.

Phonology

Mak, Joe and Gladys Tang. 2008. Movement – Simultaneity And Dynamicity In The Phonological System Of Hong Kong Sign Language. The First SignTyp Conference, University of Connecticut, Storrs, USA, 26-28 June 2008.

Child Hong Kong Sign Language Corpus

The Child HKSL Corpus was established by Gladys Tang in 2002 as part of the research project “Development of Hong Kong Sign Language by Deaf Children”, supported by the Hong Kong Research Grants Council (grant #4278/01H). A donation from Dr. Alex Yasumoto currently supports the continuing development of the corpus. This corpus is the first of its kind in Asia. Its major goal is to facilitate research on sign language acquisition, a growing but understudied field.

Phonology

Wong, Yuet On. 2008. Acquisition of Handshape in Hong Kong Sign Language: A Case Study. MPhil thesis, the Chinese University of Hong Kong.

Morphology/Syntax

Tang, Gladys and Scholastica Lam. 2006. Child Acquisition of Verbs in Hong Kong Sign Language. Paper presented at the 11th Symposium on Contemporary Linguistics, Tianjin, China, 13-15 October 2006.

Tang, Gladys, Scholastica Lam, Felix Sze, Prudence Lau and Jafi Lee. 2006. Acquisition of Verb Agreement in Hong Kong Sign Language. Paper presented at TISLR 9, Florianópolis, Santa Catarina, Brazil, 6-9 December 2006.

Tang, Gladys. 2007. The Acquisition of Perfective Aspect in Hong Kong Sign Language. Paper presented at the Workshop on Acquisition of Functional Categories in Asian Languages, 26 December 2007, the Chinese University of Hong Kong.

Lam, Scholastica and Gladys Tang. 2008. The Effect of Input Ambiguity on Head Directionality in Early HKSL. Paper presented at the Conference on Bilingual Acquisition in Early Childhood, the Chinese University of Hong Kong, 11-12 December 2008.

Lam, Wai Sze. 2009. Early Phrase Structure in Hong Kong Sign Language: A Case Study. Doctoral dissertation, the Chinese University of Hong Kong.

Hong Kong Cantonese Child Language Corpus (CANCORP)

Established and made public in 1996, this corpus was the joint effort of Thomas Hun-tak Lee (Principal Investigator, CUHK), Colleen Wong (HKPU) and Samuel Cheung-Shing Leung (then HKU, now HKIEd), supported by an earmarked grant from the Hong Kong Research Grants Council (“The development of grammatical competence in Cantonese-speaking children”, 1991-1993, CUHK 335/95H). The following research students and assistants made a pivotal contribution to the project: Alice Shuk-yee Cheung, Patricia Man, Kitty Ka-Sinn Szeto, and Cathy Sin-Ping Wong.

The corpus is a longitudinal record of the early language development of 8 Cantonese-speaking children, each of whom was observed for one year, starting when they were between one and a half and two years old. Four of the children are male and four are female.

The corpus contains 171 files coded in the CHAT format (Codes for the Human Analysis of Transcripts), totalling 14 megabytes, and is tagged with 33 part-of-speech labels. The transcripts record conversational exchanges between the children and various adults, mostly the investigators, and often caretakers and other members of the family as well.

There are several versions of the corpus. The original version contains transcripts with the utterances given in Chinese characters in the main tier and part-of-speech tags in a subsidiary tier. An updated version of the corpus in this format can be downloaded from the following URLs.

http://www.arts.cuhk.edu.hk/~lal/

http://humanum.arts.cuhk.edu.hk/~cancorp/
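For readers who want to work with the transcripts programmatically, the main-tier and subsidiary-tier layout described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it relies on the general CHAT conventions that header lines start with `@`, main-tier lines with `*`, and dependent-tier lines with `%`. The sample lines, the `%mor` tier name, and the tag names are invented for the example and are not taken from CANCORP, and `parse_chat` and `Utterance` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str                                # speaker code, e.g. "CHI"
    text: str                                   # main-tier utterance
    tiers: dict = field(default_factory=dict)   # dependent tiers by name


def parse_chat(lines):
    """Group each main-tier line with the dependent tiers that follow it."""
    utterances = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and file headers such as @Begin
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append(Utterance(speaker.strip(), text.strip()))
        elif line.startswith("%") and utterances:
            name, _, content = line[1:].partition(":")
            utterances[-1].tiers[name.strip()] = content.strip()
    return utterances


# Invented sample in CHAT-like layout (not real CANCORP data or tags).
SAMPLE = [
    "@Begin",
    "*CHI:\t我 要 波波 .",
    "%mor:\tpron verb noun .",
    "*INV:\t好 .",
    "@End",
]

for utt in parse_chat(SAMPLE):
    print(utt.speaker, utt.text, utt.tiers)
```

The sketch ignores continuation lines and other details of the full CHAT specification, so it should be checked against the actual files before use.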

Another version of the corpus, distributed as the zipped file ‘LeeWongLeung.zip’, contains transcripts whose part-of-speech tags are laid out in a format different from that of the original CANCORP corpus. This version, produced by Paul Fletcher’s research group at HKU, can be downloaded from the CHILDES URL below.

http://childes.psy.cmu.edu/data/EastAsian/Cantonese/