A stronger formalism for specifying lexicalised arguments in Walenty – a valence dictionary of Polish [WG1]

Agnieszka Patejuk, Elżbieta Hajnicz, Adam Przepiórkowski, Marcin Woliński
Institute of Computer Science, Polish Academy of Sciences

WALENTY

About
- contains over 50 000 schemata for over 11 300 lemmata (as of 5/09/2014)
- describes valence of verbs, but also nouns and adjectives
- created on the basis of attested data (from NKJP, from the web)
- open source, available from: zil.ipipan.waw.pl/Walenty

Formalism
- syntactic positions (separated by “+”) are sets (enclosed in “{}”)
- realisations of the position are members of the relevant set (separated by “;”)
- realisations belong to the same set if they may be coordinated

PL subj{np(str)} + obj{np(str)} + {np(inst)} + {prepnp(o,loc); prepncp(o,loc,że)}
EN subj{np(str)} + obj{np(str)} + {np(inst)} + {prepnp(about,loc); prepncp(about,loc,that)}

EARLIER FORMALISM: DRAWBACKS
- too few lexicalised phrase types: fixed, comprepnp, NP, PP
- predefined, imprecise modification patterns (no individual adjustment)
  - natr: modification not allowed
  - atr: modification allowed (though not necessary)
  - ratr: modification required (often possessive, NP or adjective)
  - batr: specific modification required (possessive: swój or własny, ‘own’)
- comprepnp: treated uniformly despite differences in behaviour
- fixed: no information about category

EXTENDED FORMALISM: IMPROVEMENTS

Lexicalised arguments of any category
- lexicalised phrases of any base phrase type used in Walenty
- metacategory lex:
  - base category as first parameter
  - followed by constraints appropriate for relevant category (always: lemma)

EX możemy z nim (*piękne(go)) konie/*konia kraść
   can with he.inst beautiful horse.acc.pl/sg steal.inf
   ‘He’s a trustworthy friend.’ (lit.
   ‘We can steal horses with him.’)
PL subj{np(str)} + {lex(infp(imperf),'kraść',ratr({lex(np(str),pl,'koń',natr)} + {prepnp(z,inst)}))}
EN subj{np(str)} + {lex(infp(imperf),'steal',ratr({lex(np(str),pl,'horse',natr)} + {prepnp(with,inst)}))}

- fixed: used as last resort, category provided explicitly

EX Po lekturze włosy stanęły mi dęba/*dębu.
   after reading hair stood I.dat oak.gen
PL fixed(advp(misc),'dęba')
EN fixed(advp(misc),'oak')

Redefinition and refinement of modification patterns
- only 3 base patterns: natr, atr, ratr
- atr and ratr: range of possible dependents can be constrained individually
- constraints can be imposed arbitrarily deep: specify dependents of dependents

EX przyjęli obu z (szeroko) *(otwartymi) ramionami
   welcomed both.acc with wide open.pl.inst arms.pl.inst
   ‘They welcomed both with arms wide-open.’
PL subj{np(str)} + obj{np(str)} + {prepnp(do,gen)} + {xp(mod); lex(prepnp(z,inst),pl,XOR('ramię','ręka'),
   ratr({lex(adjp(agr),agr,agr,'otwarty',atr({lex(advp(misc),'szeroko',natr)}))}))}
EN subj{np(str)} + obj{np(str)} + {prepnp(to,gen)} + {xp(mod); lex(prepnp(with,inst),pl,XOR('arm','hand'),
   ratr({lex(adjp(agr),agr,agr,'open',atr({lex(advp(misc),'wide',natr)}))}))}

Rich syntax
- lists with disjunction: OR (non-exclusive) vs XOR (exclusive)

EX mogą spróbować swoich własnych sił w projektowaniu wnętrz
   can try self’s.acc.pl own.acc.pl strength.acc.pl in design interior
   ‘They can pit their strength against interior design.’ (lit.
   ‘They can try their own strengths in interior design.’)
PL {lex(np(gen),pl,'siła',ratr({lex(adjp(agr),agr,agr,OR('własny','swój'),natr)}))}
EN {lex(np(gen),pl,'strength',ratr({lex(adjp(agr),agr,agr,OR('own','self’s'),natr)}))}

- (r)atr vs (r)atr1: allow/require any number of modifiers vs exactly one

EX Umknęło tylko (Pańskiej) uwadze, że rzecz dotyczy ubezpieczeń
   escaped only your.sg attention.sg that matter concerns insurance
   ‘It only escaped your attention, Sir, that the matter concerns insurance.’
PL subj{cp(że)} + {lex(np(gen),sg,'uwaga',atr1({possp}))}
EN subj{cp(that)} + {lex(np(gen),sg,'attention',atr1({possp}))}

- coordination (semi-colons) vs concatenation (commas) in possp

EX Umknęło (Pańskiej własnej) / (*mojej twojej) uwadze, że ...
   escaped your.sg own.sg my.sg your.sg attention.sg that
   ‘It escaped your own attention, Sir, that...’
EX Umknęło Pańskiej i mojej/*własnej (własnej) uwadze, że ...
   escaped your.sg and my/own.sg own.sg attention.sg that
   ‘It escaped my and your (own) attention, Sir, that...’
PL {lex(adjp(agr),agr,agr,
   OR('mój';'twój';'nasz';'wasz';'czyj';'czyjś';'czyjkolwiek';'niczyj';'swój';'cudzy';'pański','własny'),natr)} + {np(gen)}
EN {lex(adjp(agr),agr,agr,
   OR('my';'your.SG';'our';'your.PL';'whose';'sb’s';'anyone’s';'nobody’s';'self’s';'sb else’s';'your.HON','own'),natr)} + {np(gen)}

Complex prepositions: comprepnp
- syntactic sugar for lexicalised PP with individual constraints
- abbreviation used in Walenty, definition (expansion) in realisation list

EX Około ośmiuset milionów cierpi z powodu niedożywienia.
   around 800 million.pl suffer from reason.gen malnutrition.gen
   ‘Around 800 million people suffer because of malnutrition.’ (lit. ‘Around 800 million people suffer from the reason of malnutrition.’)
EX w Polsce cierpi z tego powodu ok. 10 do 20 proc. populacji
   in Poland suffer from this.gen reason.gen around 10 to 20 % population
   ‘In Poland around 10-20% of population suffer because of this.’ (lit.
   ‘In Poland around 10-20% of population suffer from this reason.’)
PL {lex(prepnp(z,gen), ,'powód', ratr({np(gen);ncp(gen,że)} + {adjp(agr)}))}
EN {lex(prepnp(from,gen), ,'reason', ratr({np(gen);ncp(gen,that)} + {adjp(agr)}))}

EX ten fakt nie pojawił się w trakcie negocjacji
   this fact neg appeared refl in course.inst negotiations.gen
   ‘This fact has not surfaced in the course of the negotiations.’
EX ten fakt nie pojawił się w *tym trakcie
   this fact neg appeared refl in this course
PL {lex(prepnp(w,inst),sg,'trakt', ratr({np(gen);ncp(gen,że)}))}
EN {lex(prepnp(in,inst),sg,'course', ratr({np(gen);ncp(gen,that)}))}

FURTHER DEVELOPMENT
- to be added (by the end of 2015):
  - semantic layer
  - selectional preferences
- future plans: paradigmatic restrictions, pragmatic information, alignment with English and Czech

{aep,hajnicz,adamp,wolinski}@ipipan.waw.pl
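
ILLUSTRATION: READING A SCHEMA

The notation described under Formalism (positions separated by “+”, realisations separated by “;” inside “{}”) can be split mechanically, provided that separators nested inside lexicalised arguments are skipped. The Python sketch below is only an illustration of that idea, not part of the Walenty tooling: the function names split_top_level and parse_schema are hypothetical, and the code assumes that lemmata contain no brackets or separators of their own.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside () or {}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_schema(schema):
    """Return a list of (label, realisations) pairs, one per position.

    The label is 'subj', 'obj' or '' (hypothetical representation of an
    unlabelled position); realisations are kept as unparsed strings.
    """
    positions = []
    for position in split_top_level(schema, "+"):
        label, brace, rest = position.partition("{")
        if not brace or not rest.endswith("}"):
            raise ValueError(f"not a position: {position!r}")
        body = rest[:-1]  # drop the closing brace of the position
        positions.append((label.strip(), split_top_level(body, ";")))
    return positions


if __name__ == "__main__":
    example = ("subj{np(str)} + obj{np(str)} + {np(inst)} + "
               "{prepnp(o,loc); prepncp(o,loc,że)}")
    for label, realisations in parse_schema(example):
        print(label or "-", realisations)
```

Tracking bracket depth rather than splitting on every “+” or “;” is what keeps a nested schema such as ratr({...} + {...}) inside a single realisation of its position.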