Korean Machine Translation

				     Jun Yang

			         jun@glue.umd.edu


Abstract: This paper describes a project on Korean machine translation.  The
project was completed in one semester, so it covers only part of a Korean
machine translation system rather than the complete work.  The goal was to
produce two files: one containing the rules of the Korean language and the
other containing Korean dictionary entries.  Both files are used by PC-KIMMO,
a program for computational phonology and morphology that is typically used
to build morphological parsers for natural language processing systems.


1  Introduction

PC-KIMMO is a program for doing computational phonology and morphology.  It is
typically used to build morphological parsers for natural language processing
systems.  PC-KIMMO is described in the book "PC-KIMMO: a two-level processor for
morphological analysis" by Evan L. Antworth, published by the Summer Institute
of Linguistics (1990).  KGEN is an auxiliary program for PC-KIMMO; it was
developed by Nathan Miles as a part-time project.
	
	The phonological component of PC-KIMMO is based on a rule formalism called
two-level phonology.  A typical two-level rule looks like this:

	y:i => @:C __ +:0

PC-KIMMO cannot directly use rules written in this high-level notation.
Two-level rules must first be translated into finite state tables such as this:

	   	@	y	+	@
		C	i	0	@
	1:	2	0	1	1
	2:	2	3	2	1
	3.	0	0	1	0
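To make the table concrete, here is a minimal Python sketch (not PC-KIMMO's own code) that runs it over a sequence of lexical:surface pairs. It assumes that the subset C contains the consonant letters, that "@" matches any symbol, and that each pair is assigned to the most specific column that matches it. The English-style input spy+s / spi0s is chosen only for illustration.

```python
# Columns of the y:i => @:C __ +:0 table as (lexical, surface) headers.
SUBSETS = {"C": set("bcdfghjklmnpqrstvwxyz")}  # assumed consonant subset
COLUMNS = [("@", "C"), ("y", "i"), ("+", "0"), ("@", "@")]
TABLE = {
    1: [2, 0, 1, 1],
    2: [2, 3, 2, 1],
    3: [0, 0, 1, 0],
}
FINAL_STATES = {1, 2}  # "1:" and "2:" are final; "3." is not


def side_score(header: str, symbol: str) -> int:
    """How specifically one header matches one symbol (-1 means no match)."""
    if header == symbol:
        return 2
    if symbol in SUBSETS.get(header, set()):
        return 1
    if header == "@":
        return 0
    return -1


def column_for(pair: tuple[str, str]) -> int:
    """Index of the most specific matching column (@:@ always matches)."""
    best, best_score = -1, -1
    for index, (lexical, surface) in enumerate(COLUMNS):
        left, right = side_score(lexical, pair[0]), side_score(surface, pair[1])
        if left >= 0 and right >= 0 and left + right > best_score:
            best, best_score = index, left + right
    return best


def accepts(pairs: list[tuple[str, str]]) -> bool:
    """Run the table from state 1; a 0 entry rejects the input at once."""
    state = 1
    for pair in pairs:
        state = TABLE[state][column_for(pair)]
        if state == 0:
            return False
    return state in FINAL_STATES


spies = [("s", "s"), ("p", "p"), ("y", "i"), ("+", "0"), ("s", "s")]
print(accepts(spies))  # True
```

Under these assumptions the same y:i pair is rejected after a vowel (as in toy+s), and also at the end of a word where no +:0 follows, because the run stops in the non-final state 3.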

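In such a table, a state number followed by a colon is a final state, one
followed by a period is a non-final state, and 0 means that the input is
rejected.  The finite state tables can then be used as rules in PC-KIMMO.
KGEN creates them for PC-KIMMO: it takes two-level rules as input and produces
the corresponding finite state tables as output.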


2  Two-Level Rule

Two-level rules are the format that KIMMO uses, so the first task is to convert
the Korean rules into two-level rules.  A typical Korean rule, as originally
given, is:

	l  -->  0  /  __  +  n

It means that when a syllable ending in "l" is followed, across the morpheme
boundary "+", by a syllable starting with "n", the "l" becomes "0"; that is, it
disappears.  The KIMMO format for this rule is:

	l:0 => __ +:. n:n

The character "." is used as a morphological delimiter in Korean.
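Read generatively, the original rule can be sketched in a few lines of Python. This only illustrates what the rule does; it is not how PC-KIMMO applies rules. It assumes the lexical form marks the morpheme boundary with "+", and it writes the remaining boundary as "-" to match the glossing style used in the examples later in this paper.

```python
import re


def apply_l_deletion(form: str) -> str:
    """Apply l --> 0 / __ + n to a lexical form such as "kel+nikka".

    The lookahead keeps "+n" in place and removes only the l.
    """
    without_l = re.sub(r"l(?=\+n)", "", form)
    # Show the remaining morpheme boundary as "-" for readability.
    return without_l.replace("+", "-")


print(apply_l_deletion("kel+nikka"))  # ke-nikka
print(apply_l_deletion("mwul+ese"))   # mwul-ese (no n after the boundary)
```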


3  Finite State Table

KGEN accepts as input a file of two-level rules and produces as output a
PC-KIMMO rules file containing the finite state tables.  The KGEN input file
contains three sections: subset specifications, feasible pairs, and rules.

	The subsets section of the KGEN input file is optional.  It declares subset
names and the alphabetic characters each one stands for.  For example, to
declare a subset for the vowels (a, e, i, o, and u), you can write:

	SUBSET V  a e i o u

where "V" is the subset name you create.

	The pairs section declares all feasible pairs used in the description.
This includes both default correspondences (such as a:a and b:b) and special
correspondences (such as y:i and s:0).  The pairs section is obligatory.  Here
is an example:

	PAIRS	a e i o u
		a e i o u

	PAIRS	a k l n p p s t u u
		i 0 0 0 o w 0 l l 0
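Each column of a PAIRS declaration pairs the lexical symbol in the first row with the surface symbol below it. The hypothetical Python helper below (not part of KGEN) shows that reading by zipping the two rows; it assumes the keyword PAIRS appears only at the start of the first row.

```python
def feasible_pairs(lexical_row: str, surface_row: str) -> list[str]:
    """Zip the two rows of a PAIRS declaration into lexical:surface pairs."""
    lexical = lexical_row.split()
    if lexical and lexical[0] == "PAIRS":
        lexical = lexical[1:]  # drop the keyword itself
    surface = surface_row.split()
    if len(lexical) != len(surface):
        raise ValueError("both rows must list the same number of symbols")
    return [f"{lex}:{surf}" for lex, surf in zip(lexical, surface)]


print(feasible_pairs("PAIRS a k l n p p s t u u", "i 0 0 0 o w 0 l l 0"))
# ['a:i', 'k:0', 'l:0', 'n:0', 'p:o', 'p:w', 's:0', 't:l', 'u:l', 'u:0']
```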

	A rule is declared with the keyword RULE and must be written all on one
line; for example,

	RULE	l:0 => __ +:. n:n

The position of the target pair within its environment is marked by one or
more underline characters.  White space (spaces and tabs, but not new lines)
may be used freely to improve readability.
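The layout of a rule line (the target pair, an operator, then an environment in which one run of underlines marks where the target goes) can be illustrated with a small Python parser. This is a hypothetical sketch, not KGEN's own parser; it assumes tokens are separated by white space and recognizes the two-level operators =>, <=, <=> and /<=.

```python
from dataclasses import dataclass

OPERATORS = {"=>", "<=", "<=>", "/<="}


@dataclass
class TwoLevelRule:
    target: str      # e.g. "l:0"
    operator: str    # one of OPERATORS
    left: list[str]  # context before the underline
    right: list[str] # context after the underline


def parse_rule(line: str) -> TwoLevelRule:
    """Split a line such as 'RULE l:0 => __ +:. n:n' into its parts."""
    tokens = line.split()
    if tokens and tokens[0] == "RULE":
        tokens = tokens[1:]
    target, operator, *environment = tokens
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    # The underline run is the only token made entirely of "_" characters.
    slots = [i for i, token in enumerate(environment) if set(token) == {"_"}]
    if len(slots) != 1:
        raise ValueError("the environment needs exactly one run of underlines")
    slot = slots[0]
    return TwoLevelRule(target, operator, environment[:slot], environment[slot + 1:])


print(parse_rule("RULE l:0 => __ +:. n:n"))
# TwoLevelRule(target='l:0', operator='=>', left=[], right=['+:.', 'n:n'])
```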


4  Dictionary Entry

The output of KGEN serves as the Korean rules file for KIMMO.  In addition to
the rules, KIMMO needs Korean dictionary entries that accompany the
morphological rules.  For example, here are the entries corresponding to
"RULE  l:0 => __ +:. n:n":

	ROOTS:

	kel /ENDING "(cat v) 	(root kel-ta) 	(gloss hang)"
	kil /ENDING "(cat v) 	(root kil-ta) 	(gloss be_long)"
	kal /ENDING "(cat v) 	(root kal-ta) 	(gloss grind)"
	tal /ENDING "(cat v)	(root tal-ta)	(gloss attach connect)"

	ENDINGS:

	+nikka /End "(gloss since)"

where "cat v" means that the category is verb, "root kel-ta" means that the
root (citation form) is "kel-ta", and "gloss hang" means that the word means
"hang".  The ending "+nikka" replaces the citation ending of the verb ("ta" in
this case) and adds its own meaning ("since").  So "kel" + "nikka" surfaces as
"ke-nikka" because of the rule, and it literally means "hang since".  In
natural English order, however, the meaning is "since <subject> hangs".  I will
not discuss this detail of Korean grammar, since it was not part of this
project.  By adding entries like these, I provided the dictionary entries for
KIMMO.
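The entry format above can be read mechanically: a form, a continuation class after "/", and a quoted list of (feature value) pairs. The Python sketch below is a hypothetical reader for that format, not PC-KIMMO's lexicon loader. It assumes feature values never contain ")" and applies the l-deletion rule to show how a root and the +nikka ending combine.

```python
import re

ENTRY = re.compile(r'^\s*(\S+)\s+/(\S+)\s+"(.*)"\s*$')
FEATURE = re.compile(r"\((\w+)\s+([^)]*)\)")


def parse_entry(line: str) -> dict[str, str]:
    """Read 'kel /ENDING "(cat v) (root kel-ta) (gloss hang)"' into a dict."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"not a lexicon entry: {line!r}")
    form, continuation, features = match.groups()
    entry = {"form": form, "next": continuation}
    for name, value in FEATURE.findall(features):
        entry[name] = value.strip()
    return entry


def combine(root: dict[str, str], ending: dict[str, str]) -> tuple[str, str]:
    """Join a root and an ending, applying l --> 0 / __ + n."""
    lexical = root["form"] + ending["form"]  # "kel" + "+nikka"
    surface = re.sub(r"l(?=\+n)", "", lexical).replace("+", "-")
    return surface, f'{root["gloss"]} {ending["gloss"]}'


kel = parse_entry('kel /ENDING "(cat v) (root kel-ta) (gloss hang)"')
nikka = parse_entry('+nikka /End "(gloss since)"')
print(combine(kel, nikka))  # ('ke-nikka', 'hang since')
```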


5  Conclusions

Having PC-KIMMO and KGEN already available made work on Korean machine
translation easier than I expected.  In the future, I may take on harder work,
such as studying how KIMMO and KGEN are programmed and improving them where
possible.

APPENDIX 1: Morphology Rules

Here is KOREAN.TXT from Jun:

!; KOREAN.RUL 4-DEC-96
!; Tables generated by KGEN
!; By Jun S Yang
!
;NULL 0
;ANY  @
;BOUNDARY #
!
;SECTION 1: Subsets

SUBSET C    b c d f g h j k l m n p q r s t v w x y z    ; consonants
SUBSET V    a e i o u   ; vowels

;SECTION 2: Feasible Pairs

; Consonant defaults
PAIRS  b c d f g h j k l m n p q r s t v w x y z
       b c d f g h j k l m n p q r s t v w x y z

; Vowel defaults
PAIRS  a e i o u
       a e i o u

; Special correspondences
PAIRS  + + + + a k l n p p s t u u
       . i k u i 0 0 0 o w 0 l l 0

;SECTION 3: Rule Syntax

; Rule 1
RULE l:0 => ___ +:. n

; Rule 2
RULE p:o => __ +:. V
RULE p:w => __ +:u V

; Rule 3
RULE t:l => __ +:. V

; Rule 4
RULE u:l => l __ +:. a
RULE u:l => l __ +:. e

; Rule 5
RULE s:0 => __ +:. V

; Rule 6
RULE k:0 => C +:. ___ a:i
RULE l:0 => C +:. ___ u l

; Rule 7
RULE l:l => C +:u ___ o
RULE u:0 => l +:. ___ l o

; Rule 8
RULE w:w => C +:k ___ a
RULE l:l => C +:i ___ a n g

; Rule 9
RULE n:0 => C +:. ___ u n

; Rule 10
RULE n:n => C +:i ___ a

END

APPENDIX 2: Kimmo Automata

Here is KOREAN.RUL from Jun:

; KOREAN.RUL 4-DEC-96
; Tables generated by KGEN
; By Jun S Yang


ALPHABET
     b c d f g h j k l m n p q r s t v w x y z a e i o u + . 
NULL 0
ANY @
BOUNDARY #
SUBSET C   b c d f g h j k l m n p q r s t v w x y z 
SUBSET V   a e i o u 

RULE "defaults" 1 31
    b c d f g h j k l m n p q r s t v w x y z a e i o u + + + + @
    b c d f g h j k l m n p q r s t v w x y z a e i o u . i k u @
 1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

RULE "defaults" 1 11
    a k l n p p s t u u @
    i 0 0 0 o w 0 l l 0 @
 1: 1 1 1 1 1 1 1 1 1 1 1 

RULE " l:0 => ___ +:. n" 3 4

      l  +  n   @
      0  .  n   @
 1:   2  1  1   1
 2.   0  3  0   0
 3.   0  0  1   0


RULE " p:o => __ +:. V" 3 4

      p  +  V   @
      o  .  V   @
 1:   2  1  1   1
 2.   0  3  0   0
 3.   0  0  1   0


RULE " p:w => __ +:u V" 3 4

      p  +  V   @
      w  u  V   @
 1:   2  1  1   1
 2.   0  3  0   0
 3.   0  0  1   0


RULE " t:l => __ +:. V" 3 4

      t  +  V   @
      l  .  V   @
 1:   2  1  1   1
 2.   0  3  0   0
 3.   0  0  1   0


RULE " u:l => l __ +:. a" 4 5

      u  l  +  a   @
      l  l  .  a   @
 1:   0  2  1  1   1
 2:   3  2  1  1   1
 3.   0  0  4  0   0
 4.   0  0  0  1   0


RULE " u:l => l __ +:. e" 4 5

      u  l  +  e   @
      l  l  .  e   @
 1:   0  2  1  1   1
 2:   3  2  1  1   1
 3.   0  0  4  0   0
 4.   0  0  0  1   0


RULE " s:0 => __ +:. V" 3 4

      s  +  V   @
      0  .  V   @
 1:   2  1  1   1
 2.   0  3  0   0
 3.   0  0  1   0


RULE " k:0 => C +:. ___ a:i" 4 5

      k  C  +  a   @
      0  C  .  i   @
 1:   0  2  1  1   1
 2:   0  2  3  1   1
 3:   4  2  1  1   1
 4.   0  0  0  1   0


RULE " l:0 => C +:. ___ u l" 5 6

      l  C  +  u  l   @
      0  C  .  u  l   @
 1:   0  2  1  1  2   1
 2:   0  2  3  1  2   1
 3:   4  2  1  1  2   1
 4.   0  0  0  5  0   0
 5.   0  0  0  0  2   0


RULE " l:l => C +:u ___ o" 4 5

      l  C  +  o   @
      l  C  u  o   @
 1:   0  2  1  1   1
 2:   0  2  3  1   1
 3:   4  2  1  1   1
 4.   0  0  0  1   0


RULE " u:0 => l +:. ___ l o" 5 5

      u  l  +  o   @
      0  l  .  o   @
 1:   0  2  1  1   1
 2:   0  2  3  1   1
 3:   4  2  1  1   1
 4.   0  5  0  0   0
 5.   0  0  0  1   0


RULE " w:w => C +:k ___ a" 4 5

      w  C  +  a   @
      w  C  k  a   @
 1:   0  2  1  1   1
 2:   0  2  3  1   1
 3:   4  2  1  1   1
 4.   0  0  0  1   0


RULE " l:l => C +:i ___ a n g" 6 7

      l  C  +  a  n  g   @
      l  C  i  a  n  g   @
 1:   0  2  1  1  2  2   1
 2:   0  2  3  1  2  2   1
 3:   4  2  1  1  2  2   1
 4.   0  0  0  5  0  0   0
 5.   0  0  0  0  6  0   0
 6.   0  0  0  0  0  2   0


RULE " n:0 => C +:. ___ u n" 5 6

      n  C  +  u  n   @
      0  C  .  u  n   @
 1:   0  2  1  1  2   1
 2:   0  2  3  1  2   1
 3:   4  2  1  1  2   1
 4.   0  0  0  5  0   0
 5.   0  0  0  0  2   0


RULE " n:n => C +:i ___ a" 4 5

      n  C  +  a   @
      n  C  i  a   @
 1:   0  2  1  1   1
 2:   0  2  3  1   1
 3:   4  2  1  1   1
 4.   0  0  0  1   0

END
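Because tables like these are easy to mistype, a simple consistency check can help. The hypothetical Python sketch below reads one table in the layout shown above and verifies that the declared numbers of states and columns match the body. It assumes one table per string (a header, a lexical row, a surface row, then one row per state), with ":" marking final states and "." non-final ones.

```python
def parse_table(text: str) -> dict:
    """Parse and check one RULE table: header, two symbol rows, state rows."""
    rows = [row for row in text.splitlines() if row.strip()]
    header = rows[0]
    first, last = header.index('"'), header.rindex('"')
    name = header[first + 1:last].strip()
    n_states, n_columns = map(int, header[last + 1:].split())
    lexical, surface = rows[1].split(), rows[2].split()
    if not len(lexical) == len(surface) == n_columns:
        raise ValueError(f"{name}: header promises {n_columns} columns")
    transitions: dict[int, list[int]] = {}
    finals: set[int] = set()
    for row in rows[3:]:
        label, *cells = row.split()
        state = int(label.rstrip(":."))
        if label.endswith(":"):
            finals.add(state)  # ":" marks a final state
        if len(cells) != n_columns:
            raise ValueError(f"{name}: state {state} has {len(cells)} cells")
        targets = [int(cell) for cell in cells]
        if any(t < 0 or t > n_states for t in targets):
            raise ValueError(f"{name}: state {state} jumps to an unknown state")
        transitions[state] = targets
    if sorted(transitions) != list(range(1, n_states + 1)):
        raise ValueError(f"{name}: expected states 1 to {n_states}")
    return {"name": name, "columns": list(zip(lexical, surface)),
            "transitions": transitions, "finals": finals}


L0_TABLE = '''RULE " l:0 => ___ +:. n" 3 4

      l  +  n   @
      0  .  n   @
 1:   2  1  1   1
 2.   0  3  0   0
 3.   0  0  1   0
'''
table = parse_table(L0_TABLE)
print(table["columns"])  # [('l', '0'), ('+', '.'), ('n', 'n'), ('@', '@')]
print(table["finals"])   # {1}
```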

APPENDIX 3: Conversion of Morphology Rules into Kimmo Rule Format

This appendix shows all the Morphology Rules that were converted into
Kimmo Rule Format.  For example, the first rule is:

                
l --> 0 / ___ + n 

The Kimmo format for this rule is:

l:0 => ___ +:. n:n

Note that I used the character "." since it serves as the morphological
delimiter in Korean.

In addition, we will have to work on Korean dictionary entries that
accompany the morphological rules.  For example, here are the entries
corresponding to the first Korean rule:

ROOTS:

kel /ENDING "(cat v) (root kel-ta) (gloss hang)"
kil /ENDING "(cat v) (root kil-ta) (gloss be_long)"
kal /ENDING "(cat v) (root kal-ta) (gloss grind)"
tal /ENDING "(cat v) (root tal-ta) (gloss attach connect)"

ENDINGS:

+nikka /End "(gloss since)"


------------------------------

2.2. Irregular verbs

1. l --> 0 / ___ + n  

a) nol-ta ('play', 'take a rest')
   aitul-i  cip-eyse  no-nikka  nemwu sikkulepta
 children-Nom house-in play-since very  noisy
(It's very noisy since the children are playing in the house.)

b) kel-ta ('hang')
  ke-nikka
  hang-since

c) kil-ta ('is long')
   ki-nikka
 is long-since

d) kal-ta ('grind')
   ka-nikka
 grind-since

e) tal-ta ('attach' 'connect')

f) pel-ta ('earn' 'get')
 ton-ul pe-nikka
money-Acc earn-since

g) mwul-ta ('bite')

h) sal-ta ('live' 'stay alive')

i) cwul-ta ('diminish' 'lessen')

j) nul-ta ('increase')

k) mel-ta ('is far away')

2.  p --> o / __ + V(owel)
    p --> wu / __ + V

a) tep-ta ('is hot')
 nalssi-ka te-wu-myen  swuyeng-ul ha-ca
 weather-Nom is hot-if  swimming-Acc do-Propositive
(If the weather is hot, let's go swimming.)

b) kop-ta ('is beautiful' 'is elegant')
 kop-ase --> kowu-ase --> ko-wase  (Here, you can see that after /p/ is
changed to /wu/, the vowels contract, resulting in /wase/ rather than
/wuase/.  Such contractions are very common.)

kop-ase  --> ko-wase
is beautiful-because

c) komap-ta ('be grateful')

   komap-ase --> komawase
is grateful-because

d) nwup-ta ('lie')
   nwup-e -->  nwu-we
  lie-and

e) cwup-ta ('pick up')

f) chwup-ta ('is cold')

g) kip-ta ('sew/mend')
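The p-irregular alternation, including the contraction described under (b), can be sketched as follows. This is an illustrative Python helper, not one of the KGEN rules in Appendix 1. It assumes the stem is already known to be p-irregular (some p-final stems in Korean are regular, so a real lexicon would need a flag) and writes morpheme breaks with "-" as in the examples above.

```python
VOWELS = set("aeiou")


def p_irregular(stem: str, ending: str) -> str:
    """Attach an ending to a p-irregular stem such as tep- or kop-."""
    if not (stem.endswith("p") and ending[:1] in VOWELS):
        return f"{stem}-{ending}"         # consonant-initial ending: no change
    base = stem[:-1]                      # p --> wu replaces the final p
    if ending.startswith("u"):
        return f"{base}-wu-{ending[1:]}"  # tep + umyen -> te-wu-myen
    return f"{base}-w{ending}"            # wu + a/e contracts: kop + ase -> ko-wase


print(p_irregular("tep", "umyen"))  # te-wu-myen
print(p_irregular("kop", "ase"))    # ko-wase
print(p_irregular("nwup", "e"))     # nwu-we
```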


3. t --> l / __ + V(owel)
 
a) mwut-ta ('inquire' 'ask')
   mwut-ese --> mwul-ese
  ask-and
 John-eykey mwul-ese hwakin-haca
    -to    ask-and   confirm-Propositive
(Let's ask John and confirm it.)

b) ket-ta ('walk')
  ket-ese --> kel-ese

c) kit-ta ('draw' (water) )
 mwul-ul  kil-ese  ka-ca
 water-Acc draw-and go-Propositive
(Let's draw water (from a well) and go.)

d) sit-ta ('load')
sit-ese --> sil-ese

e) tut-ta ('listen' 'hear')
tut-ese --> tul-ese


4. u --> l / l __ + a (or e)

a) kilu-ta ('raise' 'breed')
   kilu-e --> kill-e
  Mary-nun so-lul kill-e  ton-ul pelessta
     -Top cow-Acc breed-and money-Acc  made
(Mary bred cows and made money.)

b) nalu-ta ('carry')
  nalu-a --> nall-a

c) kwulu-ta ('roll')
  kwulu-e  --> kwull-e

d) hulu-ta ('flow' 'stream')
 hulu-e --> hull-e
 flow-and

e) nwulu-ta ('press')
  nwulu-e  --> nwull-e
 press-and

f) ccilu-ta ('poke')
  ccilu-e --> ccill-e

g) kolu-ta ('choose')
  kolu-a --> koll-a
 choose-and

5. s --> 0 / __ + V(owel)

a) cis-ta ('build')
  cis-umyen  --> ci-umyen
  cip-ul ci-umyen  na-nun  kot isa-lul hal-keta
 house-Acc build-if/when I-Top soon moving-Acc do-will
(If (they) build the house, I will move there very soon.)

b) kus-ta ('draw' (a line) )
 kus-umyen --> ku-umyen

c) is-ta ('connect')
  is-ese --> i-ese
 connect-and

d) pwus-ta ('pour')
 pwus-umyen --> pwu-umyen

e) ces-ta ('stir')
 ces-ese --> ce-ese
 stir-and
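Rules 3, 4 and 5 share one shape: a stem-final segment changes or disappears before a vowel-initial ending. The Python sketch below collects them in a small table. It is illustrative only, and like the rules themselves it assumes the stem is known to be irregular (many t-final and s-final stems in Korean are regular, so a lexicon flag would be needed in practice).

```python
import re

# (pattern on the stem, replacement, letters that may start the ending)
ALTERNATIONS = [
    (r"t$", "l", "aeiou"),     # rule 3: mwut + ese  -> mwul-ese
    (r"(?<=l)u$", "l", "ae"),  # rule 4: kilu + e    -> kill-e
    (r"s$", "", "aeiou"),      # rule 5: cis + umyen -> ci-umyen
]


def inflect(stem: str, ending: str) -> str:
    """Apply the first matching stem alternation, then join with '-'."""
    for pattern, replacement, triggers in ALTERNATIONS:
        if ending and ending[0] in triggers and re.search(pattern, stem):
            stem = re.sub(pattern, replacement, stem)
            break
    return f"{stem}-{ending}"


for stem, ending in [("mwut", "ese"), ("kilu", "e"), ("nalu", "a"), ("cis", "umyen")]:
    print(inflect(stem, ending))  # mwul-ese, kill-e, nall-a, ci-umyen
```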


6. ka --> i / C(onsonant) + ___
   Nominative case marker: ka/ i, kkeyse(-honorific)

   lul --> ul / C + ___
   Accusative case marker: lul/ ul

   Genitive case marker: uy

7.  Postpositions:
    lo --> ulo / C + ___ (Exception: ulo --> lo / l + ___ )

    Instrumental lo/ ulo
    Reason or Source: lo/ ulo, ey
    Status: lo/ ulo
    Resultative: lo/ ulo
    Locative

         (a) (default): ey

         (b) [+Animate] (=Dative): ey-key, hanthey, kkey (-honorific)

         (c) Source: ey-se, ey-key-se, hanthey-se, pwuthe (-beginning point)

        *(d) Direction: lo/ ulo

         (e) Ending point: kkaci

         (f) Eventive: eyse

    Temporal
         (a) Eventive: ey

         (b) Beginning point: eyse, pwuthe

         (c) Ending point: kkaci

     Measure: ey

8.  Postpositions:
    wa --> kwa / C + ___
    lang --> ilang / C + ___

    Comitative: wa/ kwa, lang/ ilang, hako

9.  Delimiter:
    nun --> un / C + ___

    Topic: nun/ un

         cf: Topic marking of the subject is required when the sentence is
         classified as a depictive statement.

    Only: man

    Too: to

    Even: cocha, mace

    Each: mata

    
10. Postpositions:
    na --> ina / C + ___
    Unselective or Emphasis: na/ ina

    Amount: (Only in interrogative sentences): na/ ina

     ** When a delimiter is attached to the subject/ object NP, the
     nominative/ accusative case marker on the NP must be deleted. (Case
     III.(2) above is an exception.)


        - John-un /*John-i-un     Mary-to/ *Mary-ul-to     coahanta
              -Top      -Nom-Top      -too       -Acc-too      like
          `John likes Mary, too.'
          (`John, even Mary likes him.')


     ** Delimiters can be attached to adverbs/ verbs as well as nouns.


11. Others


      (1) Comparative: pota, kathi, chelem

      (2) Coordinate conjunction

          (a) And: wa/ kwa, lang/ ilang, hako

          (b) Or: na/ ina
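The consonant-conditioned choices in items 6 to 10 (ka/i, lul/ul, lo/ulo, wa/kwa, lang/ilang, nun/un, na/ina) can be expressed as one lookup, as in the Python sketch below. It is illustrative rather than part of the KIMMO rules. It treats a noun as vowel-final if its last letter is a, e, i, o, u or y (in this romanization a word-final y belongs to a vowel such as ay or ey), and it applies the lo exception after l from item 7.

```python
VOWEL_FINALS = set("aeiouy")  # final y belongs to a vowel (ay, ey, uy)

# vowel-final form -> form used after a consonant (items 6 to 10)
AFTER_CONSONANT = {
    "ka": "i",        # nominative
    "lul": "ul",      # accusative
    "lo": "ulo",      # instrumental, direction, status, ...
    "wa": "kwa",      # comitative / "and"
    "lang": "ilang",  # comitative / "and"
    "nun": "un",      # topic
    "na": "ina",      # "or", emphasis
}


def attach_particle(noun: str, particle: str) -> str:
    """Choose the allomorph of a particle and attach it with '-'."""
    if particle == "lo" and noun.endswith("l"):
        return f"{noun}-lo"  # exception: ulo --> lo / l + __
    if noun[-1] not in VOWEL_FINALS:
        particle = AFTER_CONSONANT.get(particle, particle)
    return f"{noun}-{particle}"


print(attach_particle("aitul", "ka"))  # aitul-i
print(attach_particle("so", "lul"))    # so-lul
print(attach_particle("John", "nun"))  # John-un
```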


======================================================================
Verbal Endings


We undertook an analysis of Verbal Endings from (Ihm et al., 1988).  In
the description below, unless specified otherwise, the initial vowel of the
verbal ending (= `u' or `e') is deleted when the verbal stem ends with a
vowel. (Verbs whose stems end with an `-o'/`-wu' sound are exceptions to this
generalization.)  Also, the initial vowel `e' of verbal ending changes to `a'
when the last syllable of verbal stem includes `-a' or `-o' sound.
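The two generalizations above can be sketched as a single Python function. It is an illustration only and ignores the irregular stems discussed earlier. It reads the `-o'/`-wu' exception as blocking deletion of an initial `e' only (so po + ese stays po-ase, while po + umyen still becomes po-myen); that reading is an interpretation, not something the text states. The stems cap- ('catch') and mek- ('eat') are added examples.

```python
VOWELS = "aeiou"


def last_vowel(stem: str) -> str:
    """Return the last vowel letter of the stem, or '' if it has none."""
    for letter in reversed(stem):
        if letter in VOWELS:
            return letter
    return ""


def attach_ending(stem: str, ending: str) -> str:
    """Attach a verbal ending, applying vowel deletion and e --> a."""
    first = ending[:1]
    vowel_final = stem[-1] in VOWELS
    o_or_wu_final = stem.endswith(("o", "wu"))
    if first == "u" and vowel_final:
        ending = ending[1:]        # ka + umyen -> ka-myen
    elif first == "e" and vowel_final and not o_or_wu_final:
        ending = ending[1:]        # ka + ese -> ka-se
    elif first == "e" and last_vowel(stem) in ("a", "o"):
        ending = "a" + ending[1:]  # cap + ese -> cap-ase
    return f"{stem}-{ending}" if ending else stem


for stem, ending in [("ka", "umyen"), ("cap", "ese"), ("mek", "ese"), ("po", "ese")]:
    print(attach_ending(stem, ending))  # ka-myen, cap-ase, mek-ese, po-ase
```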


   I.Terminative Endings

     *Terminative endings represent the mood type of sentences. They are
     classified based on the speech level. Speech level is mainly determined
     by the hearer's age, social status etc. relative to the speaker's.


      (1) Declarative

          (a) [super high]: `upnita'

          (b) [high]: `eyo', `ciyo'

          (c) [mid-low]: `so', `ne'

         (d) [low]: `ta', `e'

     (2) Interrogative

         (a) [super high]: `upnikka'

         (b) [high]: `eyo', `nayo'

         (c) [mid-low]: `swu', `na'

         (d) [low]: `ni', `nya'

     (3) Imperative

         (a) [super high]: `useyyo'

         (b) [high]: `eyo'

         (c) [mid-low]: `key'

         (d) [low]: `ela', `e'

     (4) Propositive

         (a) [super high]: `siciyo'

         (b) [high]: `upsita', `ciyo'

         (c) [mid-low]: `use'

         (d) [low]: `ca'


 II.Adnominal Endings (-involving either a relative clause or a complex
    NP clause)


     (1) Present tense

         (a) `un': for adjectival verbs

         (b) `nun': otherwise

              - kem-un    koyangi
                black-Adnm    cat
                `A cat which is black'/'a black cat'


              - chayk-ul    ilk-nun  salam
                 book-Acc read-Adnm  man
                `A man who is reading a book'


     (2) Future tense: `ul'

     (3) Past tense

         (a) (ess)`ten': implying reminiscence

         (b) `un': used only with non-adjectival verbs


III.Adverbial Endings


     (1) reason or cause: `se' `nikka' --->for, as

     (2) weak contrast: `nuntey' --->while

     (3) conditional: `umyen' `ketun' --->if, when

     (4) purpose: `lyeko' `le' --->in order to (do), for (do)ing

     (5) prerequisite:`(e/a)ya' `(e/a)yaman' --->only when, only if

     (6) goal: `tolok' --->so that (one) may/can (do)

     (7) concurrence: `umyense' `umye'

     (8) contrast: `ciman' --->although (cf: coordinate conjunction `ciman')

     (9) separate action: `ta(ka)'

     (10) greater degree:`ulswulok' --->the more (....the more)

     (11) immediate sequence: `ca' `camaca' ---> as soon as


 IV. Nominal Endings `um' and `ki'


      (1) [+tense]: `um'

      (2) [-tense]: `ki'


     *`um' must be accompanied by a case marker.

     *`um' generally occurs with factive predicates, and `ki' occurs with
     nonfactive predicates.


  V. Coordinate Conjunction


      (1) and: `ko', `se'

          *`se' is used when the first conjunct precedes the second one in
          time sequence or when the first conjunct is subordinate to the
          second one. (cf:`kose')

      (2) or: `kena'

      (3) but: `ciman', `una'


 VI. (Quotative) Complementizer: `ko'

     *`ko' must be preceded by terminative endings

======================================================================
Korean Auxiliary Verbs


Our analysis of Korean auxiliary verbs from (Ihm et al., 1988) revealed that
such verbs are classified primarily into two groups: One corresponds to an
aspectual specification, and the other corresponds to the representation of
a state which is different from the present.

  I.Aspectual Specification

    *V = main verb stem


    (1a) V-e `peli-ta': completion

    (1b) V-e `nay-ta': accomplishment

    (1c) V-ko `mal-ta': perfective(?)(-something is done at last)

    (2a) V-e `noh-ta': completion + duration

    (2b) V-e `twu-ta': duration

    (2c) V-e `kaci-ko':  duration(?)  (-must be used in the form of
         `conjunction')

    (3a) V-kon `ha-ta': habitual

    (3b) V-e `tay-ta': repetition(?)

     (4) V-ko `iss-ta': progressive


 II.The State differing from the present


     (1) V-ko `siph-ta': hope (-want/hope to V)

     (2) V-na `siph-ta': (speaker's) guess

     (3) V-nunka `ha-ta': (speaker's) guess
         V-na `ha-ta'

     (4) V-un `tus-siph-ta': (speaker's) expect/ guess
         V-ul `tus-siph-ta'

     (5) V-un `tus-ha-ta': (speaker's) expect/ guess
         V-ul `tus-ha-ta'

     (6) V-eya `ha-ta': obligation (-have to V)

     (7) V-un `cheyha-ta': pretense (-pretend to V)

     (8) V-ul `ppenha-ta': almost (-almost did something, but no success/
         completion) ---> past tense required

     (9) V-ulye(ko) `ha-ta': is about to V

    (10) V-koca `ha-ta': volition

    (11) V-ulkka `ha-ta': plan(-not decisive)

    (12) V-ul `manha-ta': is worthwhile to V

    (13) V-na `po-ta': (speaker's) guess --->tense marker is not allowed


III.Others


     (1) V-e `cwu-ta': of benefit (-did something for others)

     (2) V-e `po-ta': trial

     (3) V-ci `anh-ta': negation (-does not V)