Semantic analysis: predicates and arguments
One of the things needed for the semantic analysis of a sentence is the extraction of its predicates and their arguments. While trying to find out how to do this, I came across several linguistic techniques that are involved in this task. I thought it might be interesting for you to see what's available in this field. It was certainly interesting to me, and it even helped me understand some of the things that always troubled me in high school :)
Introduction
Let's start with an example. Take this simple sentence:
John Milton wrote Paradise Lost.
Using predicate logic we can write this sentence as follows:
write(john-milton, paradise-lost)
This representation of the sentence has the advantage that it can be stored easily in a relational database. Note that the past tense has been lost in this representation. Also note that the token 'john-milton' is a constant that represents that which is named "John Milton" in the English sentence. And finally, notice that 'write' has two arguments (or variables): the first one is assigned to john-milton, the second to paradise-lost. At this point it is not clear why there are two arguments and not three or four. It is also not clear why john-milton needs to be in the first position and paradise-lost in the second.
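To make the remark about relational databases a bit more concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name write_fact, the column names arg1 and arg2, and the helper written_by are my own assumptions for illustration; they are not part of any standard.

```python
import sqlite3

# Assumed schema: one table per predicate, one column per argument position.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE write_fact (arg1 TEXT, arg2 TEXT)")

# write(john-milton, paradise-lost)
conn.execute(
    "INSERT INTO write_fact (arg1, arg2) VALUES (?, ?)",
    ("john-milton", "paradise-lost"),
)


def written_by(author):
    """Return the second argument of every write fact whose first argument is author."""
    rows = conn.execute(
        "SELECT arg2 FROM write_fact WHERE arg1 = ? ORDER BY arg2", (author,)
    )
    return [row[0] for row in rows]


print(written_by("john-milton"))
```

The columns are called arg1 and arg2 on purpose: nothing in the predicate itself says what each position means, which is exactly the unclarity mentioned above.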
But the actual sentence I want to be able to parse is:
John Milton wrote Paradise Lost in the sixteen fifties.
The sentence is true: many scholars estimate that the epic poem was written in London around 1650-1660.
Syntactically, this sentence looks like this:
S
+-- NP
|   +-- noun: John Milton
|
+-- VP
    +-- verb: wrote
    +-- NP
        +-- NP
        |   +-- noun: Paradise Lost
        |
        +-- PP
            +-- preposition: in
            +-- NP
                +-- determiner: the
                +-- noun: sixteen fifties
This sentence doesn't fit predicate logic well, because we cannot simply add extra first-order sentences to represent the extra information ("in the sixteen fifties").
But let's try anyway:
write(john-milton, paradise-lost, decade-1650)
This looks good, but what would the following sentence look like?
John Milton wrote Paradise Lost in London.
write(john-milton, paradise-lost, london)
No, that can't be right, because that is the position where the time used to be.
Predicate logic
A representation that solves this problem is like this:
isa(e, writing)
writer(e, john-milton)
writee(e, paradise-lost)
time(e, decade-1650)
As you can see, an extra constant 'e' (for event) is introduced to link the different sentences together. But that's not all: along with e comes the constant 'writing', and the predicates 'isa', 'writer', 'writee', and 'time'.
What happens here is that the single predicate 'write' is split up and the concept of role is introduced. Each of the arguments of the predicate gets its own role, and each of these roles is given a predicate. For example, argument 1 in the original predicate 'write' now has the role of 'writer' and argument 2 has the role of 'writee'. By naming the roles explicitly rather than implicitly, it is possible to extend them arbitrarily. We can now add both 'time' and 'place' as extra roles.
However, calling these roles 'writer' and 'writee' is suboptimal. These roles can only be used in relation to the predicate 'write'. Whereas the role 'time' can be used to compare very different types of events, the role of 'writer' can only be used in the context of writing. Changing this role to 'agent' would generalize it and allow it to be used in a survey of all events in which a given person was the agent, for example. And the fact that the person is a writer can still be deduced from the fact that he or she is the agent in the event of 'writing'.
If we follow this reasoning, we end up with:
isa(e, writing)
agent(e, john-milton)
patient(e, paradise-lost)
time(e, decade-1650)
This raises the question of what this finite set of roles should be.
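The event-based representation above can be sketched as a plain list of facts in Python. This is only an illustration under my own assumptions: the triple layout (predicate, event, value) and the helper names add_role, events_with and describe are made up for this example.

```python
# Each fact is a (predicate, event, value) triple, e.g. agent(e, john-milton).
facts = [
    ("isa", "e", "writing"),
    ("agent", "e", "john-milton"),
    ("patient", "e", "paradise-lost"),
    ("time", "e", "decade-1650"),
]


def add_role(facts, role, event, value):
    """Attach an extra role to an event without touching the existing facts."""
    facts.append((role, event, value))


def events_with(facts, role, value):
    """Return all events in which the given value plays the given role."""
    return sorted({ev for pred, ev, val in facts if pred == role and val == value})


def describe(facts, event):
    """Collect everything known about one event as a role -> value mapping."""
    return {pred: val for pred, ev, val in facts if ev == event}


# Adding the place is just one more fact; no argument position has to be reused.
add_role(facts, "place", "e", "london")

print(events_with(facts, "agent", "john-milton"))
print(describe(facts, "e"))
```

Because 'agent' is not tied to writing, events_with can be used for the survey of all events in which a given person was the agent.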
Scholastic semantic analysis
In school we are taught that a sentence can be analysed in two ways: syntactically (in Dutch: taalkundige analyse) and semantically (redekundige analyse). Semantic analysis labels the parts of the sentence as 'subject', 'predicate', 'object', 'complement', and 'adjunct'.
Our sentence is analysed as:
subject: john-milton
predicate: write
object: paradise-lost
adjunct: decade-1650
This analysis produces general roles for the arguments in the sentence (subject, predicate, etc.). Several refinements of the roles are discerned, like 'infinitive' and 'indirect object'. These roles stay very close to the surface form, however. This is especially clear in passive sentences like:
Paradise Lost was written by John Milton in the sixteen fifties.
This produces paradise-lost as the subject and john-milton as the object:
subject: paradise-lost
predicate: write
object: john-milton
adjunct: decade-1650
This is not very useful for our purpose. We want the roles to say something about the objects in the sentence, not about their form or their place in the sentence.
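To show why the surface labels alone are not enough, here is a small Python sketch that maps the two analyses above onto the same agent and patient. It assumes that the voice of the sentence is already known and that the analysis uses the labels shown in this post; the function name to_roles is hypothetical.

```python
def to_roles(analysis, voice):
    """Map surface labels (subject, object) to agent and patient for a known voice."""
    if voice == "active":
        agent, patient = analysis["subject"], analysis["object"]
    elif voice == "passive":
        # In the passive analysis above, subject and object have swapped places.
        agent, patient = analysis["object"], analysis["subject"]
    else:
        raise ValueError(f"unknown voice: {voice}")

    roles = {"predicate": analysis["predicate"], "agent": agent, "patient": patient}
    if "adjunct" in analysis:
        roles["adjunct"] = analysis["adjunct"]  # left unlabeled in this sketch
    return roles


active = {
    "subject": "john-milton",
    "predicate": "write",
    "object": "paradise-lost",
    "adjunct": "decade-1650",
}
passive = {
    "subject": "paradise-lost",
    "predicate": "write",
    "object": "john-milton",
    "adjunct": "decade-1650",
}

print(to_roles(active, "active") == to_roles(passive, "passive"))
```

Both sentences end up with the same roles, which is the kind of representation we are looking for. Note that the adjunct still has no meaningful role here.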
Grammatical case
Using grammatical case (in Dutch: naamvallen), the roles of arguments are expressed in a sentence through various forms. To express the fact that the book was written in London, we use the preposition 'in'. This is the locative case. The English language also uses word position to mark case. The subject of a sentence is placed first and denotes the nominative case. The direct object is placed second and denotes the accusative case.
nominative: john-milton
accusative: paradise-lost
locative: london
I wanted to add the right case for 'decade-1650', and what I found was 'accusative of duration of time'. This result is a little meagre if you ask me.
Theta roles
Theta roles are syntactic in nature. They describe the number and type of the arguments of a verb. The number of theta roles of a verb is fixed. So this formalism stays quite close to the original predicate logic expression, except that it 'names' the arguments. An example will make this clear:
write(agent: john-milton, theme: paradise-lost)
Only 'required' arguments are used as theta roles. So the adjunct 'in the sixteen fifties' cannot be represented. It is not considered to be part of the argument structure of the verb.
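The fixed number of theta roles can be sketched in Python as a lookup table per verb. The table THETA_GRIDS and the function assign_theta_roles are hypothetical names for this example, and the single entry for 'write' is taken from the expression above.

```python
# Assumed theta grids: for each verb, the names of its required arguments, in order.
THETA_GRIDS = {
    "write": ("agent", "theme"),
}


def assign_theta_roles(verb, arguments):
    """Name the arguments of a verb; reject anything that is not a required argument."""
    grid = THETA_GRIDS[verb]
    if len(arguments) != len(grid):
        raise ValueError(
            f"'{verb}' takes exactly {len(grid)} arguments, got {len(arguments)}"
        )
    return dict(zip(grid, arguments))


print(assign_theta_roles("write", ["john-milton", "paradise-lost"]))
```

Passing 'decade-1650' as a third argument raises a ValueError in this sketch, which mirrors the fact that the adjunct has no place in the argument structure of the verb.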
Thematic relations
Thematic relations are similar to theta roles, but their intention is semantic rather than syntactic. They also assign roles to adjuncts (such as 'in the sixteen fifties').
A list of the most important thematic relations is:
- Agent
- Experiencer
- Theme
- Patient
- Instrument
- Force
- Location
- Direction
- Recipient
- Source
- Time
- Beneficiary
- Manner
- Purpose
- Cause
In our example this would make:
predicate: write
agent: john-milton
patient: paradise-lost
time: decade-1650
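As a rough sketch of how this could look in code, the list of thematic relations can be turned into a Python Enum, with a small frame that holds one predicate and its roles. The class names Relation and Frame and the method assign are my own inventions for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """The thematic relations listed above."""
    AGENT = "agent"
    EXPERIENCER = "experiencer"
    THEME = "theme"
    PATIENT = "patient"
    INSTRUMENT = "instrument"
    FORCE = "force"
    LOCATION = "location"
    DIRECTION = "direction"
    RECIPIENT = "recipient"
    SOURCE = "source"
    TIME = "time"
    BENEFICIARY = "beneficiary"
    MANNER = "manner"
    PURPOSE = "purpose"
    CAUSE = "cause"


@dataclass
class Frame:
    """One predicate together with the roles of its arguments and adjuncts."""
    predicate: str
    roles: dict = field(default_factory=dict)

    def assign(self, relation, value):
        # Relation(...) raises ValueError for names outside the list, like 'writer'.
        self.roles[Relation(relation)] = value
        return self


frame = (
    Frame("write")
    .assign("agent", "john-milton")
    .assign("patient", "paradise-lost")
    .assign("time", "decade-1650")
)
print(frame.roles[Relation.TIME])
```

Unlike the theta role sketch, the number of roles is not fixed per verb here, so a location such as 'london' can be added with one more call to assign.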
Summary and conclusion
Some things to take away:
- A simple sentence has a single predicate and some arguments
- There are required and optional arguments. The optional ones are not always considered 'real'
- There are restrictions to the contents of an argument in a given role
- Each argument has a role with respect to the predicate
- How do you recognize different roles in a sentence?
- There are different predicates with the same name, e.g., write(a, b) and write(a, b, c)
I really like thematic relations. They give me the feeling of being 'right', so I will go with them.