Laboratorio de Lingüística Informática
At LLI, work on the Japanese language focuses on two linguistic resources: a Japanese corpus and a dictionary of basic vocabulary. Both resources were developed as part of the research of Professor Chieko Kimura. Other researchers, including Shin Abe, Kengo Matsui and Marta Garrote, also participated in their development.
Spoken Japanese Corpus
The spoken Japanese corpus is the result of the research of Professor Chieko Kimura. It comprises more than 12 hours of recordings, divided into three groups according to the kind of interaction: monologues, dialogues and conversations. The essential data of the corpus are shown below:
C-ORAL-JAPON
Type | File | Characters | Duration | Location |
Conversation | jcv01 | 2,690 | 0:13:01 | Tokyo |
Conversation | jcv02 | 1,505 | 0:07:49 | Tokyo |
Conversation | jcv03 | 1,913 | 0:09:49 | Tokyo |
Conversation | jcv04 | 2,311 | 0:10:51 | Tokyo |
Conversation | jcv05 | 1,856 | 0:08:48 | Tokyo |
Conversation | jcv06 | 4,120 | 0:18:21 | Tokyo |
Dialogue | jdl01 | 5,419 | 0:30:37 | Madrid |
Dialogue | jdl02 | 7,520 | 0:37:59 | Madrid |
Dialogue | jdl03 | 2,977 | 0:13:56 | Madrid |
Dialogue | jdl04 | 1,234 | 0:06:12 | Tokyo |
Dialogue | jdl05 | 2,615 | 0:10:38 | Madrid |
Dialogue | jdl06 | 2,297 | 0:09:53 | Madrid |
Dialogue | jdl07 | 2,976 | 0:18:08 | Madrid |
Dialogue | jdl08 | 3,901 | 0:22:36 | Madrid |
Dialogue | jdl09 | 3,012 | 0:14:43 | Madrid |
Dialogue | jdl010 | 3,328 | 0:17:07 | Madrid |
Dialogue | jdl011 | 1,462 | 0:07:09 | Madrid |
Dialogue | jdl012 | 1,452 | 0:06:23 | Madrid |
Dialogue | jdl013 | 3,112 | 0:16:35 | Madrid |
Dialogue | jdl014 | 2,905 | 0:12:42 | Madrid |
Dialogue | jdl015 | 2,648 | 0:15:27 | Tokyo |
Dialogue | jdl016 | 2,750 | 0:13:16 | Tokyo |
Dialogue | jdl017 | 1,405 | 0:07:08 | Tokyo |
Monologue | jmn01 | 2,041 | 0:11:18 | Madrid |
Monologue | jmn02 | 2,887 | 0:16:20 | Tokyo |
Monologue | jmn03 | 1,482 | 0:10:17 | Tokyo |
Monologue | jmn04 | 7,552 | 0:38:47 | Tokyo |
Monologue | jmn05 | 2,867 | 0:22:45 | Tokyo |
Monologue | jmn06 | 7,683 | 0:52:31 | Tokyo |
Monologue | jmn07 | 3,170 | 1:05:53 | Tokyo |
Monologue | jmn08 | 2,962 | 0:19:37 | Tokyo |
Monologue | jmn09 | 9,898 | 0:59:43 | Shizuoka |
Monologue | jmn010 | 979 | 0:08:08 | Shizuoka |
Monologue | jmn011 | 948 | 0:05:24 | Shizuoka |
Monologue | jmn012 | 1,171 | 0:23:28 | Shizuoka |
Monologue | jmn013 | 1,409 | 0:09:23 | Shizuoka |
Monologue | jmn014 | 644 | 0:04:19 | Shizuoka |
Monologue | jmn015 | 10,016 | 0:50:50 | Madrid |
Monologue | jmn016 | 4,177 | 0:27:09 | Madrid |
Total | | 125,294 | 12:35:00 | |
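The character and duration totals in the table can be checked mechanically. The following is a minimal Python sketch, assuming each recording is held as a (file, characters, duration) tuple copied from the table; the helper names `parse_duration`, `format_duration` and `summarize` are hypothetical and are not part of any LLI tooling. For brevity it covers only the six conversation recordings (jcv01 to jcv06).

```python
# Illustrative sketch: sum character counts and durations for corpus rows.
# Helper names are hypothetical; the data are the jcv01-jcv06 rows of the table.

def parse_duration(text):
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_duration(total_seconds):
    """Convert a number of seconds back into an 'H:MM:SS' string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def summarize(rows):
    """Return (total characters, total duration as 'H:MM:SS') for the rows."""
    total_chars = sum(int(chars.replace(",", "")) for _, chars, _ in rows)
    total_seconds = sum(parse_duration(duration) for _, _, duration in rows)
    return total_chars, format_duration(total_seconds)


# (file, characters, duration) for the conversation recordings
CONVERSATIONS = [
    ("jcv01", "2,690", "0:13:01"),
    ("jcv02", "1,505", "0:07:49"),
    ("jcv03", "1,913", "0:09:49"),
    ("jcv04", "2,311", "0:10:51"),
    ("jcv05", "1,856", "0:08:48"),
    ("jcv06", "4,120", "0:18:21"),
]

chars, duration = summarize(CONVERSATIONS)
print(chars, duration)  # prints: 14395 1:08:39
```

Passing all 39 rows of the table to `summarize` in the same way should reproduce the Total row (125,294 characters, 12:35:00).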
Currently, the work is focused on the following aims: