A stronger formalism for specifying
lexicalised arguments in Walenty
– a valence dictionary of Polish [WG1]
Agnieszka Patejuk, Elżbieta Hajnicz, Adam Przepiórkowski, Marcin Woliński
Institute of Computer Science, Polish Academy of Sciences
WALENTY
About
- contains over 50 000 schemata for over 11 300 lemmata (as of 5/09/2014)
- describes valence of verbs, but also nouns and adjectives
- created on the basis of attested data (from NKJP, from the web)
- open source, available from: zil.ipipan.waw.pl/Walenty
Formalism
- syntactic positions (separated by “+”) are sets (enclosed in “{}”)
- realisations of the position are members of the relevant set (separated by “;”)
- realisations belong to the same set if they may be coordinated
PL subj{np(str)} + obj{np(str)} + {np(inst)}
+ {prepnp(o,loc); prepncp(o,loc,że)}
EN subj{np(str)} + obj{np(str)} + {np(inst)}
+ {prepnp(about,loc); prepncp(about,loc,that)}
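The bracket conventions above can be read mechanically with a nesting-aware split. The following Python sketch is illustrative only: `split_top_level` and `parse_schema` are hypothetical helper names, not part of any Walenty tooling, and the schema is assumed to be joined onto a single line.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside () or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_schema(schema):
    """Return a list of (label, realisations) pairs, one per position."""
    positions = []
    for position in split_top_level(schema, "+"):
        # Anything before the first "{" is the position label (subj, obj or empty).
        label, _, rest = position.partition("{")
        body = rest.rsplit("}", 1)[0]
        positions.append((label.strip(), split_top_level(body, ";")))
    return positions


schema = ("subj{np(str)} + obj{np(str)} + {np(inst)} "
          "+ {prepnp(o,loc); prepncp(o,loc,że)}")
parsed = parse_schema(schema)
for label, realisations in parsed:
    print(label or "(unlabelled)", realisations)
```

The last position comes back as one set with two realisations, which reflects the claim that `prepnp(o,loc)` and `prepncp(o,loc,że)` may be coordinated.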
EARLIER FORMALISM: DRAWBACKS
- too few lexicalised phrase types: fixed, comprepnp, NP, PP
- predefined, imprecise modification patterns (no individual adjustment)
  - natr: modification not allowed
  - atr: modification allowed (though not necessary)
  - ratr: modification required (often possessive, NP or adjective)
  - batr: specific modification required (possessive: swój or własny, ‘own’)
- comprepnp: treated uniformly despite differences in behaviour
- fixed: no information about category
EXTENDED FORMALISM: IMPROVEMENTS
Lexicalised arguments of any category
- lexicalised phrases of any base phrase type used in Walenty
- metacategory lex:
  - base category as first parameter
  - followed by constraints appropriate for relevant category (always: lemma)
EX możemy z    nim     (*piękne(go)) konie/*konia    kraść
   can    with he.inst beautiful     horse.acc.pl/sg steal.inf
   ‘He’s a trustworthy friend.’ (lit. ‘We can steal horses with him.’)
PL subj{np(str)} + {lex(infp(imperf),’kraść’,ratr
({lex(np(str),pl,’koń’,natr)} + {prepnp(z,inst)}))}
EN subj{np(str)} + {lex(infp(imperf),’steal’,ratr
({lex(np(str),pl,’horse’,natr)} + {prepnp(with,inst)}))}
- fixed: used as last resort, category provided explicitly
EX Po    lekturze włosy stanęły mi    dęba/*dębu.
   after reading  hair  stood   I.dat oak.gen
PL fixed(advp(misc),’dęba’)
EN fixed(advp(misc),’oak’)
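As a rough illustration of how a lex argument nests (base category first, then category-specific constraints, the lemma, and a modification pattern that may itself contain positions), here is a minimal Python sketch. The `Lex` class and `render` function are hypothetical names introduced for this example, not Walenty's own data model, and straight apostrophes are used around lemmata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lex:
    category: str                # base phrase type, e.g. "np(str)"
    constraints: List[str]       # category-specific constraints, e.g. ["pl"]
    lemma: str                   # the lemma is always present
    pattern: str = "natr"        # modification pattern
    # Each inner list is one position: a set of plain strings or nested Lex objects.
    positions: List[list] = field(default_factory=list)


def render(arg):
    """Serialise a plain realisation (str) or a Lex object to schema notation."""
    if isinstance(arg, str):
        return arg
    parts = [arg.category, *arg.constraints, f"'{arg.lemma}'"]
    modification = arg.pattern
    if arg.positions:
        rendered = " + ".join(
            "{" + ";".join(render(a) for a in position) + "}"
            for position in arg.positions
        )
        modification += f"({rendered})"
    parts.append(modification)
    return "lex(" + ",".join(parts) + ")"


horses = Lex("np(str)", ["pl"], "koń")
steal = Lex("infp(imperf)", [], "kraść", "ratr",
            [[horses], ["prepnp(z,inst)"]])
print(render(steal))
# lex(infp(imperf),'kraść',ratr({lex(np(str),pl,'koń',natr)} + {prepnp(z,inst)}))
```

Because `render` recurses into nested `Lex` objects, the same structure covers dependents of dependents at any depth.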
Redefinition and refinement of modification patterns
- only 3 base patterns: natr, atr, ratr
- atr and ratr: range of possible dependents can be constrained individually
- constraints can be imposed arbitrarily deep: specify dependents of dependents
EX przyjęli obu      z    (szeroko) *(otwartymi) ramionami
   welcomed both.acc with wide      open.pl.inst arms.pl.inst
   ‘They welcomed both with arms wide-open.’
PL subj{np(str)} + obj{np(str)} + {prepnp(do,gen)} + {xp(mod);
lex(prepnp(z,inst),pl,XOR(’ramię’,’ręka’),
ratr({lex(adjp(agr),agr,agr,’otwarty’,
atr({lex(advp(misc),’szeroko’,natr)}))}))}
EN subj{np(str)} + obj{np(str)} + {prepnp(to,gen)} + {xp(mod);
lex(prepnp(with,inst),pl,XOR(’arm’,’hand’),
ratr({lex(adjp(agr),agr,agr,’open’,
atr({lex(advp(misc),’wide’,natr)}))}))}
Rich syntax
- lists with disjunction: OR (non-exclusive) vs XOR (exclusive)
EX mogą spróbować swoich        własnych   sił             w  projektowaniu wnętrz
   can  try       self’s.gen.pl own.gen.pl strength.gen.pl in design        interior
   ‘They can pit their strength against interior design.’
   (lit. ‘They can try their own strengths in interior design.’)
PL {lex(np(gen),pl,’siła’,ratr
({lex(adjp(agr),agr,agr,OR(’własny’,’swój’),natr)}))}
EN {lex(np(gen),pl,’strength’,ratr
({lex(adjp(agr),agr,agr,OR(’own’,’self’s’),natr)}))}
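One way to read the OR/XOR distinction is as a constraint on which of the listed lemmata may co-occur in a single realisation. The Python sketch below encodes that reading; it is an interpretation of the poster's wording rather than Walenty's actual validation logic, and `lemmas_ok` is a hypothetical name.

```python
def lemmas_ok(operator, allowed, observed):
    """Check observed lemmata against an OR/XOR list.

    Assumed reading: OR accepts any non-empty subset of the allowed
    lemmata, XOR accepts exactly one of them.
    """
    observed = set(observed)
    if not observed or not observed <= set(allowed):
        return False
    if operator == "XOR":
        return len(observed) == 1
    if operator == "OR":
        return True
    raise ValueError(f"unknown operator: {operator}")


# OR('własny','swój') admits both adjectives together: swoich własnych sił
print(lemmas_ok("OR", ["własny", "swój"], ["swój", "własny"]))
# XOR('ramię','ręka') admits one noun or the other, not both
print(lemmas_ok("XOR", ["ramię", "ręka"], ["ramię", "ręka"]))
```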
- (r)atr vs (r)atr1: allow/require any number of modifiers vs exactly one
EX Umknęło tylko (Pańskiej) uwadze,      że   rzecz  dotyczy  ubezpieczeń
   escaped only  your.sg    attention.sg that matter concerns insurance
   ‘It only escaped your attention, Sir, that the matter concerns insurance.’
PL subj{cp(że)} + {lex(np(dat),sg,’uwaga’,atr1({possp}))}
EN subj{cp(that)} + {lex(np(dat),sg,’attention’,atr1({possp}))}
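The modification patterns can be summarised as bounds on the number of modifiers. The Python sketch below assumes the following reading of the poster: natr allows none, atr allows any number, ratr requires at least one, atr1 allows at most one, and ratr1 requires exactly one. The function name `modifier_count_ok` is hypothetical.

```python
# (minimum, maximum) number of modifiers per pattern; None means unbounded.
BOUNDS = {
    "natr": (0, 0),
    "atr": (0, None),
    "ratr": (1, None),
    "atr1": (0, 1),
    "ratr1": (1, 1),
}


def modifier_count_ok(pattern, count):
    """Return True if `count` modifiers satisfy the given pattern."""
    low, high = BOUNDS[pattern]
    return count >= low and (high is None or count <= high)


# atr1({possp}) on 'uwaga': the possessive is optional, but only one is allowed
for n in range(3):
    print(n, modifier_count_ok("atr1", n))
```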
- coordination (semi-colons) vs concatenation (commas) in possp
EX Umknęło (Pańskiej własnej) / (*mojej twojej) uwadze,      że ...
   escaped your.sg   own.sg   / my.sg   your.sg attention.sg that
   ‘It escaped your own attention, Sir, that...’
EX Umknęło Pańskiej i   mojej/*własnej (własnej) uwadze,      że ...
   escaped your.sg  and my/own.sg      own.sg    attention.sg that
   ‘It escaped my and your (own) attention, Sir, that...’
PL {lex(adjp(agr),agr,agr,
OR(’mój’;’twój’;’nasz’;’wasz’;’czyj’;’czyjś’;’czyjkolwiek’;
’niczyj’;’swój’;’cudzy’;’pański’,’własny’),natr)} +
{np(gen)}
EN {lex(adjp(agr),agr,agr,
OR(’my’;’your.SG’;’our’;’your.PL’;’whose’;’sb’s’;’anyone’s’;
’nobody’s’;’self’s’;’sb else’s’;’your.HON’,’own’),natr)} +
{np(gen)}
Complex prepositions: comprepnp
- syntactic sugar for lexicalised PP with individual constraints
- abbreviation used in Walenty, definition (expansion) in realisation list
EX Około  ośmiuset milionów   cierpi z    powodu     niedożywienia.
   around 800      million.pl suffer from reason.gen malnutrition.gen
   ‘Around 800 million people suffer because of malnutrition.’
   (lit. ‘Around 800 million people suffer from the reason of malnutrition.’)
EX w  Polsce cierpi z    tego     powodu     ok.    10 do 20 proc. populacji
   in Poland suffer from this.gen reason.gen around 10 to 20 %     population
   ‘In Poland around 10-20% of the population suffer because of this.’
   (lit. ‘In Poland around 10-20% of the population suffer from this reason.’)
PL {lex(prepnp(z,gen),_,’powód’,
ratr({np(gen);ncp(gen,że)} + {adjp(agr)}))}
EN {lex(prepnp(from,gen),_,’reason’,
ratr({np(gen);ncp(gen,that)} + {adjp(agr)}))}
EX ten  fakt nie pojawił  się  w  trakcie    negocjacji
   this fact neg appeared refl in course.loc negotiations.gen
   ‘This fact has not surfaced in the course of the negotiations.’
EX ten  fakt nie pojawił  się  w  *tym trakcie
   this fact neg appeared refl in this course
PL {lex(prepnp(w,loc),sg,’trakt’,
ratr({np(gen);ncp(gen,że)}))}
EN {lex(prepnp(in,loc),sg,’course’,
ratr({np(gen);ncp(gen,that)}))}
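Treating comprepnp as syntactic sugar amounts to a table lookup from the abbreviation to its lex expansion. The Python sketch below shows the idea under stated assumptions: the abbreviation spelling `comprepnp(z powodu)`, the `_` placeholder in the number slot, and the names `REALISATIONS` and `expand` are all illustrative and not taken from the poster.

```python
# Hypothetical realisation list: abbreviation -> expansion.
# The "_" in the number slot is an assumed placeholder for "unspecified".
REALISATIONS = {
    "comprepnp(z powodu)":
        "lex(prepnp(z,gen),_,'powód',"
        "ratr({np(gen);ncp(gen,że)} + {adjp(agr)}))",
}


def expand(realisation, table=REALISATIONS):
    """Replace an abbreviation by its expansion; leave anything else untouched."""
    return table.get(realisation, realisation)


position = ["np(str)", "comprepnp(z powodu)"]
expanded = [expand(r) for r in position]
for before, after in zip(position, expanded):
    print(before, "->", after)
```

Keeping the expansions in one table means each complex preposition can carry its own constraints without changing the schemata that use the abbreviation.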
FURTHER DEVELOPMENT
- to be added (by the end of 2015):
  - semantic layer
  - selectional preferences
- future plans: paradigmatic restrictions, pragmatic information, alignment with English and Czech
{aep,hajnicz,adamp,wolinski}@ipipan.waw.pl