Generally speaking, a corpus is a collection of texts, and linguists have long used corpora in their research. Nowadays, electronic corpora are available for many languages:
Electronic Corpora
Corpus of Contemporary American English (COCA)
http://corpus.byu.edu/coca/x.asp?w=1024&h=600
British National Corpus
http://www.natcorp.ox.ac.uk/
CORIS (Corpus di Italiano Scritto)
http://dslo.unibo.it/coris_ita.html
German National Corpus
http://www.dwds.de/
Russian Reference Corpus (BOKR)
http://ruscorpora.ru/
Since the 1990s, corpus-based research has been applied to language teaching. Corpora give both teachers and language learners quick access to authentic examples of word usage and to frequency information. In addition, most corpora contain textual (register, genre, domain) and sociolinguistic (user gender and age) metadata as well as part-of-speech tagging.
Corpus-analysis tools provide access to electronic corpora and present the information retrieved in different ways. Word-frequency lists show which words occur in a corpus and how often each one occurs. The lists can be sorted by order of first occurrence in the corpus, alphabetically, or by frequency. Lemmatized lists group all related word forms under a single lemma (for example, go, goes, went and gone under go). Stop lists contain units to be ignored by the computer. Ignoring function words, for example, gives a better idea of the semantic content of the corpus.
Corpus-based research has also contributed to the development of bridge bilingual dictionaries with definitions in the target language, dictionaries of collocations that show how words combine in sentences, and learner dictionaries. Corpus-based learner dictionaries use a limited number of lexical units to define words, which makes the definitions easier for learners to understand. Moreover, by using corpora, lexicographers can provide authentic examples in learner dictionaries instead of invented ones.
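The word-frequency lists, stop lists and lemmatized lists described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular corpus tool: the sample sentence, the tiny stop list `STOP_WORDS`, the hand-written lemma table `LEMMAS` and all function names are assumptions made for illustration only. Real tools rely on much larger stop lists and on a proper lemmatizer.

```python
import re
from collections import Counter

# Hypothetical stop list of function words (real stop lists are much longer).
STOP_WORDS = {"the", "on", "and"}

# Hypothetical lemma table mapping inflected forms to their lemma.
LEMMAS = {"cats": "cat", "sat": "sit", "sits": "sit"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(text, stop_words=frozenset()):
    """Count every token that is not on the stop list."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def lemmatize(freqs, lemma_map):
    """Merge the counts of all word forms that share a lemma."""
    merged = Counter()
    for word, count in freqs.items():
        merged[lemma_map.get(word, word)] += count
    return merged


def by_frequency(freqs):
    """Most frequent first; ties are broken alphabetically."""
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))


def alphabetical(freqs):
    """Sort the (word, count) pairs by word."""
    return sorted(freqs.items())


sample = "The cat sat on the mat. The cats sit on the mat and the dog sits."
freqs = frequency_list(sample, STOP_WORDS)
print(alphabetical(freqs))
print(by_frequency(lemmatize(freqs, LEMMAS)))
# → [('sit', 3), ('cat', 2), ('mat', 2), ('dog', 1)]
```

Without the stop list, "the" would head the frequency-ranked list and say little about the content of the text, which is the effect of ignoring function words mentioned above.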
Concordancers are corpus-analysis tools of major importance: they retrieve all the occurrences of a unit together with its context. They can be either monolingual or bilingual.
Monolingual concordancers operate on monolingual texts. The information retrieved can be displayed in a number of ways. The most common display format is KWIC (key word in context), which shows the key word in the centre of each line. The lines can be sorted by order of appearance in the corpus, or alphabetically by the words preceding or following the key word.
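A KWIC display with the sorting options mentioned above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names, the `width` parameter (number of context words on each side) and the sample text are hypothetical, and the context window here is allowed to cross sentence boundaries, which a real concordancer may or may not do.

```python
import re


def kwic(text, keyword, width=2):
    """Return one (left, key, right) row per occurrence of the keyword.

    `left` and `right` are lists of up to `width` context tokens.
    Rows come back in order of appearance in the text.
    """
    tokens = re.findall(r"\w+", text.lower())
    key = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == key:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, token, right))
    return rows


def sort_by_right(rows):
    """Sort alphabetically by the words following the key word."""
    return sorted(rows, key=lambda row: row[2])


def sort_by_left(rows):
    """Sort alphabetically by the words preceding the key word,
    starting with the word closest to it."""
    return sorted(rows, key=lambda row: row[0][::-1])


def format_rows(rows, column=20):
    """Right-align the left context so the key words line up."""
    return [
        f"{' '.join(left):>{column}}  {key}  {' '.join(right)}"
        for left, key, right in rows
    ]


sample = "A strong wind blew. He drinks strong tea daily. She made a strong case."
rows = kwic(sample, "strong")
for line in format_rows(sort_by_right(rows)):
    print(line)
```

Sorting by the right-hand context groups together the nouns that follow the key word (here case, tea, wind), which is what makes recurring patterns easy to spot in a concordance.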
Collocation generators identify collocations by applying the mutual information (MI) formula. Word pairs with high MI scores are more likely to be collocations than pairs with low MI scores. However, the MI formula has certain limitations: it may fail to recognize a collocation if the number of co-occurrences within a corpus is too small. The collocations can be displayed in alphabetical or frequency-ranked order. A collocation may also contain one or more intervening words, provided they fall within a user-specified span.
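The text does not spell out the MI formula, so the sketch below assumes one common formulation, MI(x, y) = log2(f(x, y) · N / (f(x) · f(y))), where f(x, y) is the number of times y follows x within the span, f(x) and f(y) are the word frequencies, and N is the corpus size in tokens. The function name `mi_scores`, the `span` and `min_count` parameters and the sample sentence are hypothetical. The `min_count` threshold is one simple way of handling pairs whose co-occurrence counts are too small to be reliable.

```python
import math
from collections import Counter


def mi_scores(tokens, span=1, min_count=1):
    """Score ordered pairs (x, y) where y occurs up to `span` tokens after x.

    span=1 considers adjacent words only; span=2 also allows one
    intervening word, and so on. Pairs that co-occur fewer than
    `min_count` times are left out.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1:i + 1 + span]:
            pair_freq[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_freq.items():
        if f_xy >= min_count:
            # Assumed formula: log2( f(x,y) * N / (f(x) * f(y)) )
            scores[(x, y)] = math.log2(f_xy * n / (word_freq[x] * word_freq[y]))
    return scores


text = ("he drinks strong tea and she drinks strong tea "
        "but the tea is cold and the room is cold")
tokens = text.split()

scores = mi_scores(tokens, span=1, min_count=2)
# Frequency-ranked display: highest MI first, ties broken alphabetically.
for pair, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, round(score, 2))
```

With `min_count=1`, a pair seen only once, such as ("he", "drinks"), would outscore ("strong", "tea") in this sample, which illustrates why MI scores based on very few co-occurrences need to be treated with caution.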
Furthermore, corpus-based grammars have improved the quality of grammatical descriptions by providing real corpus examples.
Corpus-based analysis of TEFL syllabuses and teaching materials has shown that most English textbooks lack realistic examples of language usage. As a result, new corpus-based courses present real-life examples that show how vocabulary and grammar are actually used.
The corpus-based approach is also used in language testing.
A learner corpus comprises data produced by learners of a foreign language, which helps researchers better understand the process of second language acquisition. Findings from learner corpora can inform curriculum design, materials development and teaching methodology. Learner corpora can also help learners to analyse their own or their classmates’ writing and correct errors.
In conclusion, teachers should be trained to take advantage of corpus-analysis tools and given access to appropriate corpus resources. In addition, corpora should be openly accessible to learners.