Hirosaki University, Faculty of Humanities and Social Sciences
Cultural Creation Program, Multicultural Coexistence Course


Seminar and Laboratory Introductions

International Coexistence Studies Laboratory / BUTLER ALASTAIR JAMES (Associate Professor)

Welcome to Parsed Corpus Studies!

Hello everyone. My laboratory is named “Parsed Corpus Studies”. Parsed Corpus Studies deals with computer-readable collections of spoken and written data that are enriched with additional information, called ANNOTATION. These collections are used to investigate a large number of issues, for example: whether word dependencies reflect differences between cultures, the effects of historical developments, paths of child language acquisition, and grammatical structures and dependencies.
 As an example, suppose you wish to discover all the things that horses are described as doing in a given collection of Japanese texts. For this, you need to find all the action-describing words (event predicates) where the doer of the action (the logical subject) is 馬 (or a noun phrase formed on 馬).
 Besides expressions that are fairly reliably grammatical subjects, such as 馬が, there are other expressions that may or may not be subjects: 馬は, 馬も, 馬さえ, 馬なら, 馬に, 馬から, 馬の, etc. Furthermore, there are sentences such as 「馬が追い手を振りきって、走って、山へ逃げた」 where 馬 is the subject of three verbs (event predicates), but only appears local to one.
 Also, there might be pronouns that refer to previously mentioned horses, and you would like to find the event predicates these pronouns associate with as well.
 On top of this, there are expressions like 鳴く馬 and しっぽが白い馬, which depict horses as things that whinny or that have white tails, but where the status of 馬 as subject is not directly inferable by reference to particle marking or word order alone.
 With a parsed corpus, you can find all the event predicates in relation to which 馬 (or a noun phrase formed on 馬, or a pronoun co-referent with 馬) is a subject by referring to annotation information about syntactic structure, annotation information specifying grammatical role, and annotation information specifying semantic relations.
 The above is just one example of the kind of microscopic investigation of word-relations that becomes possible with a parsed corpus. What is especially exciting is that, with the corpora that already exist and that we continue to build, we can do this for substantial amounts of written and spoken data, notably for the English and Japanese languages, over varying time periods, and for every word-relation that occurs in the data collections.
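 To make the horse example a little more concrete, here is a minimal sketch in Python of how such a search might look. It is only an illustration under stated assumptions: the bracketed trees and the labels IP-MAT, IP-ADV, PP-SBJ, PP-OB1 and VB are simplified stand-ins written for this example, not necessarily the tag set of any corpus named on this page, and the rule that a clause without its own subject inherits the subject of the enclosing clause is a toy convention.

```python
import re

# Toy trees written for this example; the labels are simplified assumptions.
TREE = ("(IP-MAT (PP-SBJ (NP (N 馬)) (P が))"
        " (IP-ADV (PP-OB1 (NP (N 追い手)) (P を)) (VB 振りきって))"
        " (IP-ADV (VB 走って))"
        " (PP (NP (N 山)) (P へ))"
        " (VB 逃げた))")
OBJ_TREE = "(IP-MAT (PP-SBJ (NP (N 男)) (P が)) (PP-OB1 (NP (N 馬)) (P を)) (VB 見た))"


def parse(text):
    """Parse a bracketed tree into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing bracket
        return node

    return read()


def words(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def subject_predicates(node, noun, inherited=False, found=None):
    """Collect the verbs of clauses whose subject (own or inherited) contains noun."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    label, children = node[0], node[1:]
    if label.startswith("IP"):
        subjects = [c for c in children
                    if not isinstance(c, str) and c[0].endswith("-SBJ")]
        if subjects:
            # The clause has its own subject, so it overrides any inherited one.
            inherited = any(noun in words(s) for s in subjects)
        if inherited:
            found.extend(w for c in children
                         if not isinstance(c, str) and c[0] == "VB"
                         for w in words(c))
    for child in children:
        subject_predicates(child, noun, inherited, found)
    return found


print(subject_predicates(parse(TREE), "馬"))      # ['逃げた', '振りきって', '走って']
print(subject_predicates(parse(OBJ_TREE), "馬"))  # []
```

 In the first tree, 馬 appears only once but is found as the subject of all three verbs, while in the second tree 馬 is an object and nothing is returned. Cases such as 鳴く馬 or pronouns that refer back to a horse are not covered by this sketch; they would need the further layers of annotation described above.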

About the seminar and graduation thesis

In this department, before entering the third year, you have to decide which laboratory to belong to. In my laboratory, I accept any form of research that is linked to Parsed Corpora. The range of interests of the seminar students is very wide and inter-disciplinary.
 While each seminar student advances his or her own research, everyone needs the foundation on which the seminar is based: using data from Parsed Corpora. Therefore, in the third-year seminar, we focus on acquiring the basic knowledge that is necessary no matter what kind of research you plan to do. This includes learning how to search and utilise existing corpus data, such as the British National Corpus (BNC; http://www.natcorp.ox.ac.uk/), the Treebank Semantics Parsed Corpus (TSPC; http://www.compling.jp/ajb129/tspc.html), the Oxford-NINJAL Parsed Corpus of Old Japanese (ONCOJ; https://oncoj.ninjal.ac.jp/), and the NINJAL Parsed Corpus of Modern Japanese (NPCMJ; http://npcmj.ninjal.ac.jp/). This will involve learning about systems of word class tagging/morphological analysis, and syntactic and semantic layers of annotation.
 Your research will need Parsed Corpus data as its point of departure, but such data may not yet exist unless you build it yourself. For building corpora, we try to harness computers as much as possible, and focus on what is possible from the UNIX command line with the AWK programming language. As a part of this, you will be studying example programs and writing programs of your own. With this background in place, we then gradually work towards specialised research for each person and narrow down the theme for a graduation thesis.
 In the fourth-year seminar, writing a graduation thesis is the main task. This work proceeds while there are various other things to do, such as other courses to study and job hunting. Keeping on top of everything is not easy. Consequently, we aim for slow but steady progress, notably by collecting and reading books and research papers that further your thesis aims. Your graduation thesis will clearly reflect how much related literature and material you have collected and read. It's a big task, but as the culmination of your undergraduate studies at university, the satisfaction of completing your graduation thesis will be a lifelong pleasure.
